CA3151219A1 - Data loading method, device, computer equipment and storage medium - Google Patents
Data loading method, device, computer equipment and storage medium
- Publication number
- CA3151219A1
- Authority
- CA
- Canada
- Prior art keywords
- sub
- loading
- data
- tables
- loaded
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/957—Browsing optimisation, e.g. caching or content distillation
- G06F16/9574—Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44521—Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present application relates to a data loading method, and corresponding device, computer equipment and storage medium. The method comprises: obtaining sharding identifications of data to be loaded, and determining sub-tables of the data to be loaded in a database according to the sharding identifications; placing table information of the various sub-tables of the data to be loaded in a task queue with the sub-tables as dimensions; and pushing loading requests of a number corresponding to the number of the sub-tables of the data to be loaded to a message queue monitored by a server cluster, so that, when any server in the server cluster monitors any one of the loading requests, one piece of the table information is obtained from the task queue, and data in a target sub-table to which the obtained table information corresponds is loaded to cache.
Description
DATA LOADING METHOD, DEVICE, COMPUTER EQUIPMENT AND STORAGE MEDIUM
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present application relates to the field of data processing technology, and more particularly to a data loading method, and corresponding device, computer equipment and storage medium.
Description of Related Art
[0002] With the development of data processing technology, data caching techniques have come into being. The principle of caching is that, when a server system needs to read a piece of data, the data is first searched for in the cache of the server; if it is found in the cache, it is read immediately, and if it is not found in the cache, it is read relatively slowly from the database or other regions.
Accordingly, the hit rate of the cache affects the performance of the system to a large extent.
[0003] However, for data of certain types in industries such as e-commerce, the entire data set may become hotspot data at a certain point in time, and the entire data of the given type must be fully cached in order to satisfy the performance requirements of the system.
In the traditional caching method, total caching is generally performed only when the data volume is relatively small; when the volume of data to be loaded is relatively large, total caching causes uneven distribution of pressure across the various application servers in a cluster, thereby wasting server hardware resources.
SUMMARY OF THE INVENTION
[0004] In view of the above technical problem, there is an urgent need for a data loading method, and a corresponding device, computer equipment and storage medium, capable of balancing the pressure allocated to the various servers in a cluster in the scenario of total cache data loading.
[0005] There is provided a data loading method that comprises:
[0006] obtaining sharding identifications of data to be loaded, and determining sub-tables of the data to be loaded in a database according to the sharding identifications;
[0007] placing table information of the various sub-tables of the data to be loaded in a task queue with the sub-tables as dimensions; and
[0008] pushing loading requests of a number corresponding to the number of the sub-tables of the data to be loaded to a message queue monitored by a server cluster, so that, when any server in the server cluster monitors any one of the loading requests, one piece of the table information is obtained from the task queue, and data in a target sub-table to which the obtained table information corresponds is loaded to cache.
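By way of illustration only, the following Python sketch outlines the pusher side described in paragraphs [0005] to [0008]. It assumes a redis-py client for the task queue and a caller-supplied publish function standing in for the message-queue client; the key format, the fields of the table information, and the assumption that the sharding identifications resolve to a count of sub-libraries and sub-tables are illustrative choices, not details fixed by the present application.

```python
# Minimal sketch of the pusher side ([0005]-[0008]); names and key formats are
# illustrative assumptions, not taken from this application.
import json
import redis

r = redis.Redis()

def push_loading_tasks(cache_type, db_count, table_count, publish):
    """Enqueue one task per sub-table and push one loading request per sub-table."""
    task_queue_key = f"FULL_CACHE_QUEUE:{cache_type}"

    # Determine the sub-tables from the sharding identifications (here assumed to
    # resolve to db_count sub-libraries, each holding table_count sub-tables),
    # and place one piece of table information per sub-table in the task queue.
    for table_num in range(table_count):       # round by round: one table per library
        for db_num in range(db_count):
            table_info = {"dbNum": db_num, "tableNum": table_num}
            r.rpush(task_queue_key, json.dumps(table_info))

    # Push loading requests of a number equal to the number of sub-tables to the
    # message queue monitored by the server cluster (publish() stands in for
    # whatever message-queue client is actually used).
    for _ in range(db_count * table_count):
        publish({"cacheType": cache_type})
```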
[0009] In one embodiment, the step of placing table information of the various sub-tables of the data to be loaded in a task queue includes: screening out the same number of sub-tables round by round from various sub-libraries, respectively, to which the sharding identifications correspond; and placing table information of the screened sub-tables sequentially in the task queue according to the screening order.
[0010] In one embodiment, the step of screening out the same number of sub-tables round by round from various sub-libraries, respectively, to which the sharding identifications correspond includes: screening out the same number of sub-tables from various sub-libraries, respectively, to which the sharding identifications correspond according to sub-library sequencing at a current screening round; and performing a next round of screening out the same number of sub-tables from various sub-libraries, respectively, to which the sharding identifications correspond according to sub-library sequencing when the current screening round is not the last screening round.
[0011] In one embodiment, the method further comprises: obtaining loading mode information of the various sub-tables of the data to be loaded; modifying a loading status of first-type sub-tables as a waiting loading status, wherein the first-type sub-tables are sub-tables that employ a breakpoint continuing loading mode determined according to the loading mode information and that have not completed loading; and deleting data of second-type sub-tables in breakpoint continuing cache, wherein the second-type sub-tables are sub-tables that employ a loading-from-the-beginning loading mode determined according to the loading mode information.
[0012] In one embodiment, before one piece of the table information is obtained from the task queue, the method further comprises: extracting identification information of a requested server cluster from the monitored loading request; and entering the step of obtaining one piece of the table information from the task queue when the identification information of the requested server cluster matches with identification information of the server cluster.
[0013] In one embodiment, before data in a target sub-table to which the obtained table information corresponds is loaded to cache, the method further comprises:
determining a current concurrent volume of the sub-library in which the target sub-table locates;
entering the step of loading data in a target sub-table to which the obtained table information corresponds to cache when the current concurrent volume is smaller than a preset concurrency threshold; and placing the obtained table information back in a queue tail of the task queue when the current concurrent volume is greater than the preset concurrency threshold.
[0014] In one embodiment, the step of loading data in a target sub-table to which the obtained table information corresponds to cache includes: performing total loading from first data of the target sub-table when the target sub-table is a first-type sub-table;
and obtaining historical loading information of the target sub-table, and loading the target sub-table from data loaded when the previous loading was interrupted as recorded in the historical loading information, when the target sub-table is a second-type sub-table.
[0015] In one embodiment, the method further comprises: recording loading information of various sub-tables that have not completed loading, and storing the loading information of various sub-tables that have not completed loading in the breakpoint continuing cache, when a data loading operation is interrupted.
[0016] In one embodiment, the method further comprises: performing capacity expanding or capacity reducing adjustment on the servers in the server cluster according to the number of the loading requests.
[0017] There is provided a data loading device that comprises:
[0018] a sub-table determining module, for obtaining sharding identifications of data to be loaded, and determining sub-tables of the data to be loaded in a database according to the sharding identifications;
[0019] a task placing module, for placing table information of the various sub-tables of the data to be loaded in a task queue with the sub-tables as dimensions; and
[0020] a request pushing module, for pushing loading requests of a number corresponding to the number of the sub-tables of the data to be loaded to a message queue monitored by a server cluster, so that, when any server in the server cluster monitors any one of the loading requests, one piece of the table information is obtained from the task queue, and data in a target sub-table to which the obtained table information corresponds is loaded to cache.
[0021] There is provided a computer equipment that comprises a memory, a processor and a computer program stored on the memory and operable on the processor, and the steps of the aforementioned data loading method are realized when the processor executes the computer program.
[0022] There is provided a computer-readable storage medium storing a computer program thereon, and the steps of the aforementioned data loading method are realized when the computer program is executed by a processor.
[0023] In the aforementioned data loading method, device, computer equipment and storage medium, by determining corresponding sub-tables of data to be loaded in a database, inserting table information of various sub-tables in a task queue with the sub-tables as dimensions, and pushing loading requests of a number corresponding to the number of the sub-tables to a message queue, when various servers in a cluster monitor the loading requests in the message queue, it is made possible to obtain table information from the task queue and to load to cache the data in the corresponding sub-table according to the table information. Therefore, the various servers in the cluster do not entirely bear the loading pressure of the total data, rather, partial data loading pressure is respectively borne with sub-tables as dimensions, so that equilibrium of pressure allocated to various servers in the cluster is enhanced, and the utilization rate of the entire system hardware resource is enhanced.
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] Fig. 1 is a view illustrating the application environment for a data loading method in an embodiment;
[0025] Fig. 2 is a flowchart schematically illustrating a data loading method in an embodiment;
[0026] Fig. 3 is a view schematically illustrating the overall architecture of a primary machine room in a concrete example of application;
[0027] Fig. 4 is a view schematically illustrating the overall architecture of a sub machine room in a concrete example of application;
[0028] Fig. 5 is a flowchart schematically illustrating a backstage message pusher of a total cache data loading method in a concrete example of application;
[0029] Fig. 6 is a flowchart schematically illustrating a backstage message monitor of a total cache data loading method in a concrete example of application;
[0030] Fig. 7 is a flowchart schematically illustrating calibration of cache and data consistency in the database in a concrete example of application;
[0031] Fig. 8 is a block diagram illustrating the structure of a data loading device in an embodiment; and
[0032] Fig. 9 is a view illustrating the internal structure of a computer equipment in an embodiment.
DETAILED DESCRIPTION OF THE INVENTION
[0033] To make more lucid and clear the objectives, technical solutions and advantages of the present application, the present application is described in greater detail below with reference to accompanying drawings and embodiments. As should be understood, the specific embodiments described here are merely meant to explain the present application, rather than to restrict the present application.
[0034] As can be understood, the wording "and/or" as used in the present application describes the association relation of associated objects, and expresses the existence of three possible relations, for instance, A and/or B can express the three circumstances of the single existence of A, the simultaneous existence of A and B, and the single existence of B. The sign "/" generally expresses the relation of "or" between the associated objects before the sign and after the sign.
[0035] The data loading method provided by the present application is applicable to the application environment as shown in Fig. 1. Current server 102 obtains sharding identifications of data to be loaded, determines sub-tables of the data to be loaded in a database according to the sharding identifications, places table information of the various sub-tables of the data to be loaded in a task queue with the sub-tables as dimensions, and pushes loading requests of a number corresponding to the number of the sub-tables of the data to be loaded to message queue 106 monitored by server cluster 104, so that, when any server in server cluster 104 monitors any one of the loading requests, one piece of the table information is obtained from the task queue, and data in a sub-table to which the obtained table information corresponds is loaded to cache.
[0036] Current server 102 can be embodied as an independent server or a server cluster consisting of a plurality of servers, and server 102 can also be embodied as any one or more of the servers in server cluster 104.
[0037] In one embodiment, as shown in Fig. 2, there is provided a data loading method; the method is explained with an example of its being applied to the current server in Fig. 1, and comprises the following steps.
[0038] Step S202 - obtaining sharding identifications of data to be loaded, and determining sub-tables of the data to be loaded in a database according to the sharding identifications.
[0039] The data to be loaded indicates the total data, required to be loaded to cache, to which a business of a certain type corresponds. The data to be loaded can be designated by a user through a loading client end. Sharding means splitting an originally independent database into plural sub-libraries and splitting a larger datasheet into plural sub-tables, so that the data volume of a single database and the data volume of a single datasheet become smaller, whereby the objective of enhancing database performance is achieved.
Sharding information can include sub-library number information and sub-table number information, etc.
[0040] Specifically, the current server can, based on a business type designated by the user, take the total data to which the business type corresponds as the data to be loaded and obtain sharding information of the data to be loaded; according to this sharding information, all the corresponding database sub-libraries of the data to be loaded and all the corresponding datasheet sub-tables in the various sub-libraries can be determined.
[0041] Step S204 - placing table information of the various sub-tables of the data to be loaded in a task queue with the sub-tables as dimensions.
[0042] The table information is information descriptive of features of the datasheet sub-tables and capable of instructing the server to enquire or locate a corresponding sub-table in the database; for instance, the table information can include such information as sub-library numbers, sub-table numbers or sub-table names, etc. The task queue indicates a component used to temporarily store tasks, and can be a redis queue.
[0043] Specifically, the current server can take each separate sub-table as an independent dimension, and place table information of the various sub-tables of the data to be loaded sequentially in the task queue; for example, such placing can be according to the order of the sub-table numbers of the various sub-tables, or according to a sequence defined by the user.
[0044] Step S206 - pushing loading requests of a number corresponding to the number of the sub-tables of the data to be loaded to a message queue monitored by a server cluster, so that, when any server in the server cluster monitors any one of the loading requests, one piece of the table information is obtained from the task queue, and data in a target sub-table to which the obtained table information corresponds is loaded to cache.
[0045] The message queue indicates a container that stores messages during the process of transmitting the messages. The target sub-table indicates a sub-table to which the table information obtained by the server in the server cluster corresponds.
[0046] Specifically, the current server generates a corresponding number of loading requests according to the number of sub-tables of the data to be loaded, packages the loading requests into messages, and pushes the messages to the message queue, wherein the number of the packaged messages is the same as the number of the sub-tables.
The various servers in the server cluster monitor the message queue; when any server in the cluster monitors any one of the loading requests, the information invoking mechanism of the task queue can be followed to obtain one piece of the table information from the task queue, for instance, one piece of table information can be sequentially obtained from the queue head of the task queue, a corresponding sub-table is located in the database according to the obtained table information to serve as a target sub-table, and data stored in the target sub-table is loaded to cache. The messages packaged according to the various loading requests can further contain loading mode information, identification information of a requested server cluster designated by the user, etc.
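As a companion to the pusher sketch above, a simplified handler on one server of the cluster might look like the following; the message fields, key names and the load_table_to_cache helper are hypothetical stand-ins for whatever the deployment actually uses.

```python
# Sketch of one server handling a loading request ([0044]-[0046]); all names are
# illustrative assumptions.
import json
import redis

r = redis.Redis()

def on_loading_request(message, load_table_to_cache):
    """Called when this server observes a loading request on the message queue."""
    cache_type = message["cacheType"]

    # Obtain one piece of table information from the head of the task queue.
    raw = r.lpop(f"FULL_CACHE_QUEUE:{cache_type}")
    if raw is None:
        return  # nothing left to load

    table_info = json.loads(raw)

    # Locate the target sub-table in the database according to the table
    # information and load its data to cache.
    load_table_to_cache(cache_type, table_info["dbNum"], table_info["tableNum"])
```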
[0047] In the aforementioned data loading method, by determining sub-tables of corresponding datasheets of data to be loaded in a database, inserting table information of various sub-tables in a task queue with the sub-tables as dimensions, and pushing loading requests of a number corresponding to the number of the sub-tables to a message queue, when any server in a cluster monitors any one of the loading requests in the message queue, it is made possible to obtain one piece of table information from the task queue and to load to cache the data in the corresponding sub-table according to the table information.
Therefore, the various servers in the cluster do not entirely bear the loading pressure of the total data, rather, partial data loading pressure is respectively borne with sub-tables as dimensions, so that equilibrium of pressure allocated to various servers in the cluster is enhanced, and the utilization rate of the entire system hardware resource is enhanced.
[0048] In one embodiment, the step of placing table information of the various sub-tables of the data to be loaded in a task queue includes: screening out the same number of sub-tables round by round from various sub-libraries, respectively, to which the sharding identifications correspond; and placing table information of the screened sub-tables sequentially in the task queue according to the screening order.
[0049] In this embodiment, partial sub-tables are respectively screened out of the various sub-libraries to which the sharding identifications correspond at each round; the number of sub-tables screened out of each sub-library in each round is the same as in the other rounds, and table information of the screened sub-tables is sequentially placed in the task queue according to the screening order. Preferably, it is possible to screen out one sub-table from each of the various sub-libraries respectively according to the sequence of the sub-table numbers.
[0050] In one embodiment, the step of screening out the same number of sub-tables round by round from various sub-libraries, respectively, to which the sharding identifications correspond includes: screening out the same number of sub-tables from various sub-libraries, respectively, to which the sharding identifications correspond according to sub-library sequencing at a current screening round; and performing a next round of screening out the same number of sub-tables from various sub-libraries, respectively, to which the sharding identifications correspond according to sub-library sequencing when the current screening round is not the last screening round. The sub-library sequencing can be sequencing according to sub-library numbers, and can also be preset random sequencing.
[0051] For instance, the 0th library 1st table, 1st library 1st table, 2nd library 1st table, up to the (N-1)th library 1st table (where N is the number of the sub-libraries) can be screened out at the first round, and the table information of the 1st tables in the various sub-libraries is sequentially inserted into the task queue according to the screening order, namely according to the sequence of 0th library 1st table, 1st library 1st table, 2nd library 1st table... (N-1)th library 1st table; the 0th library 2nd table, 1st library 2nd table, 2nd library 2nd table, up to the (N-1)th library 2nd table are screened out at the second round, and the table information of the 2nd tables in the various sub-libraries is sequentially inserted into the task queue according to the screening order, cycling round by round, until the table information of the entire sub-tables of the entire sub-libraries has been placed in the task queue when the last round ends.
[0052] In this embodiment, by sequentially placing the sub-tables screened out of the various sub-libraries round by round in the task queue, and keeping the number of sub-tables screened out of each sub-library at each round the same, it is made possible to lessen the pressure borne by any single sub-library, to enhance the equilibrium of pressure allocated to the various sub-libraries of the database, and hence to enhance data loading stability.
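The round-by-round screening order of this embodiment can be sketched as a small generator, as below; the tuple layout is an illustrative assumption.

```python
# Round-by-round screening order ([0049]-[0051]): in each round, one sub-table is
# taken from every sub-library before moving to the next round.
def screening_order(db_count, tables_per_db):
    for table_num in range(1, tables_per_db + 1):     # round 1, 2, ..., n
        for db_num in range(db_count):                # 0th library, 1st library, ...
            yield (db_num, table_num)                 # (sub-library, sub-table)

# For 3 sub-libraries with 2 sub-tables each, the order is:
# (0, 1), (1, 1), (2, 1), (0, 2), (1, 2), (2, 2)
print(list(screening_order(3, 2)))
```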
[0053] In one embodiment, the method further comprises: obtaining loading mode information of the various sub-tables of the data to be loaded; modifying a loading status of first-type sub-tables as a waiting loading status, wherein the first-type sub-tables are sub-tables that employ a breakpoint continuing loading mode determined according to the loading mode information and that have not completed loading; and deleting data of second-type sub-tables in breakpoint continuing cache, wherein the second-type sub-tables are sub-tables that employ a loading-from-the-beginning loading mode determined according to the loading mode information.
[0054] In this embodiment, the user can designate the loading modes of the various sub-tables of the data to be loaded, and the current server obtains the designated loading mode information of the various sub-tables of the data to be loaded, and makes preparation work before data loading according to the loading mode information. More specifically, the current server takes the sub-tables that employ a breakpoint continuing loading mode and that have not completed loading as first-type sub-tables according to the loading mode information, takes sub-tables that employ a loading-from-the-beginning loading mode as second-type sub-tables, and deletes cache data of the second-type sub-tables in the breakpoint continuing cache.
[0055] In a practical application scenario, if the breakpoint continuing loading mode is employed, the loading status of a sub-table that had not completed loading when the previous loading ended might be pause (loading paused), complete (loading completed) or fail (loading failed). Since a redis lock is set up before loading to avoid repetitive loadings, the condition for judging whether to end the loading task and to delete the redis lock is that the loading statuses of all sub-tables are pause, complete or fail; if the loading status is not reset, the redis lock might be deleted before the loading is actually ended. Therefore, by resetting the loading status of the sub-table that employs the breakpoint continuing loading mode as waiting (waiting to be loaded) before the data is loaded in this embodiment, it is made possible to enhance data loading precision, and to avoid data loading omission caused by early release of the redis lock.
[0056] On the other hand, in this embodiment, the current server can further delete the data of the sub-tables that employ the loading-from-the-beginning loading mode in the breakpoint continuing cache, whereby it is made possible to avoid repeated loadings of data, and to further enhance data loading precision.
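Purely as an example, the preparation work of this embodiment could be sketched as follows; the key format (including underscores and ordering) loosely mirrors the breakpoint continuing cache structure described later in this application, and the function, field and status names are assumptions.

```python
# Preparation before loading ([0053]-[0056]); key layout and names are
# illustrative assumptions.
import json
import redis

r = redis.Redis()

def prepare_sub_tables(cache_type, sub_tables):
    """sub_tables: iterable of dicts like {"dbNum": 0, "tableNum": 1, "mode": "resume"}."""
    for t in sub_tables:
        key = f"BREAK_POINT_RESUME:{cache_type}:{t['dbNum']}"
        field = str(t["tableNum"])
        if t["mode"] == "resume":
            # Breakpoint continuing mode: reset the loading status to "waiting" so
            # the redis lock is not released before loading has actually finished.
            record = json.loads(r.hget(key, field) or "{}")
            record["status"] = "waiting"
            r.hset(key, field, json.dumps(record))
        else:
            # Loading-from-the-beginning mode: drop any stale breakpoint record to
            # avoid repeated loading of already-cached data.
            r.hdel(key, field)
```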
[0057] In one embodiment, before one piece of the table information is obtained from the task queue, the method further comprises: extracting identification information of a requested server cluster from the monitored loading request; and entering the step of obtaining one piece of the table information from the task queue when the identification information of the requested server cluster matches with identification information of the server cluster.
[0058] In this embodiment, the loading request contains identification information of requested server clusters; for example, such identification information may be the machine room number, etc. A requested server cluster indicates a server cluster (machine room) that is requested to execute a data loading task. Specifically, the user can designate a server cluster that executes the task according to requirements; when the server cluster that monitors the loading request is not the server cluster designated by the user, the loading task exits. In this embodiment, when there are plural server clusters at the same time, flexibly selecting and switching the server cluster that executes the task can be supported, and the loading task can be controlled and transferred according to practical resource circumstances, so that data loading efficiency is enhanced.
[0059] In one embodiment, before data in a target sub-table to which the obtained table information corresponds is loaded to cache, the method further comprises:
determining a current concurrent volume of the sub-library in which the target sub-table locates;
entering the step of loading data in a target sub-table to which the obtained table information corresponds to cache when the current concurrent volume is smaller than a preset concurrency threshold; and placing the obtained table information back in a queue tail of the task queue when the current concurrent volume is greater than the preset concurrency threshold.
[0060] In this embodiment, the server in the cluster determines a current concurrent volume of the sub-library to which the obtained table information corresponds according to the obtained table information, and determines whether to immediately load the data according to the current concurrent volume. When the current concurrent volume is greater than a preset concurrency threshold, it is further possible to base on the current resource circumstance of the system to wait for a certain time before placing the table information back in the task queue. In this embodiment, by monitoring the current concurrent volume of the sub-library, it is made possible to achieve control of flows of the various sub-libraries, to thereby balance the concurrent volumes of the various sub-libraries, to control the pressure of the separate library, and to enhance equilibrium of pressure allocated to the various sub-libraries.
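One possible sketch of this per-sub-library flow control is shown below; the application does not prescribe how the current concurrent volume is tracked, so the counter key, the threshold value and the helper names are assumptions.

```python
# Per-sub-library flow control before loading ([0059]-[0060]); the concurrency
# counter and key names are illustrative assumptions.
import json
import redis

r = redis.Redis()
CONCURRENCY_THRESHOLD = 4  # preset concurrency threshold (assumed value)

def try_load(cache_type, table_info, load_table_to_cache):
    counter_key = f"LOAD_CONCURRENCY:{cache_type}:{table_info['dbNum']}"
    current = int(r.get(counter_key) or 0)

    if current >= CONCURRENCY_THRESHOLD:
        # Sub-library is busy: put the table information back at the queue tail
        # so another loading request picks it up later.
        r.rpush(f"FULL_CACHE_QUEUE:{cache_type}", json.dumps(table_info))
        return False

    r.incr(counter_key)
    try:
        load_table_to_cache(cache_type, table_info["dbNum"], table_info["tableNum"])
    finally:
        r.decr(counter_key)
    return True
```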
[0061] In one embodiment, the step of loading data in a target sub-table to which the obtained table information corresponds to cache includes: performing total loading from first data of the target sub-table when the target sub-table is a first-type sub-table;
and obtaining historical loading information of the target sub-table, and loading the target sub-table from data loaded when the previous loading was interrupted as recorded in the historical loading information, when the target sub-table is a second-type sub-table.
[0062] In this embodiment, it is not only supported to perform total loading on the data in the target sub-table, but also supported to continually transmit the data from breakpoint in the target sub-table, and this embodiment can load the data in the target sub-table with a corresponding mode according to different loading modes employed by the target sub-table. Accordingly, it is made possible to enhance data loading flexibility and controllability, and the breakpoint continuing loading mode can continually transmit data according to the historical loading information, so that loading time is shortened, and loading speed is enhanced.
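The two loading paths of this embodiment might be sketched as follows, assuming the historical loading information records the last loaded id; fetch_batch and cache_put are hypothetical helpers.

```python
# Full loading vs. resuming from the recorded breakpoint ([0061]-[0062]);
# fetch_batch and cache_put are hypothetical helpers.
def load_target_sub_table(table_info, mode, breakpoint_record,
                          fetch_batch, cache_put, batch_size=1000):
    breakpoint_record = breakpoint_record if breakpoint_record is not None else {}

    # Full loading starts from the first data of the target sub-table; the
    # breakpoint mode resumes from the id recorded when loading was interrupted.
    last_id = 0 if mode == "full" else breakpoint_record.get("id", 0)

    while True:
        rows = fetch_batch(table_info, after_id=last_id, limit=batch_size)
        if not rows:
            break
        for row in rows:
            cache_put(row)                      # load the row into cache
        last_id = rows[-1]["id"]
        breakpoint_record["id"] = last_id       # record progress for later resumption
    return last_id
```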
[0063] In one embodiment, the method further comprises: recording loading information of various sub-tables that have not completed loading, and storing the loading information of various sub-tables that have not completed loading in the breakpoint continuing cache, when a data loading operation is interrupted.
[0064] This embodiment can support pausing and resuming operations of the loading process, by recording loading information of various sub-tables that have not completed loading when the data loading operation is interrupted, when the loading instruction is received again, it is possible to directly invoke historical loading information recorded in the previous interruption from the breakpoint continuing cache, and to complete continued transmission according to the record, whereby data loading controllability and checkability are enhanced, and loading information loss brought about by abnormal interruption is avoided.
[0065] In one embodiment, the method further comprises: performing capacity expanding or capacity reducing adjustment on the servers in the server cluster according to the number of the loading requests.
[0066] This embodiment can dynamically adjust the capacity of the servers in the server cluster that monitors the message queue according to the data volume of the data to be loaded, namely the number of loading requests generated. For instance, when the number of the loading requests is relatively large, the capacity of the servers can be expanded, so as to enhance the speed of total data loading to cache, and to enable the system performance to quickly satisfy the business requirement; when the number of the loading requests is relatively small, the capacity of the servers can be reduced, so as to reduce hardware resource wastage.
[0067] Moreover, the method can further comprise the following steps: judging whether the data of a certain business type already loaded to cache is consistent with the data of this business type in the database both before and after data loading when data loading has been completed, in the case the data in the two are inconsistent, dynamically adjusting the data in the cache, so as to maintain consistent the data in the cache with the data in the database. For instance, when the data in the database is deleted, the data in the cache is correspondingly deleted, when data is updated in the database, the data is correspondingly updated in the cache.
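A minimal sketch of this calibration step, assuming the data can be compared by a key field; all helper names are hypothetical.

```python
# Keeping cache consistent with the database after loading ([0067]);
# db_rows, cache_keys, cache_get, cache_delete and cache_put are hypothetical helpers.
def calibrate(db_rows, cache_keys, cache_get, cache_delete, cache_put):
    db_by_key = {row["key"]: row for row in db_rows}

    # Data deleted from the database is deleted from the cache.
    for key in cache_keys:
        if key not in db_by_key:
            cache_delete(key)

    # Data updated in the database is correspondingly updated in the cache.
    for key, row in db_by_key.items():
        if cache_get(key) != row:
            cache_put(key, row)
```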
[0068] The data loading method dealt with in the present application is further explained in detail below in conjunction with a concrete example of application. Refer to Figs. 3 to 4, of which Fig. 3 is a view schematically illustrating the overall architecture of a primary machine room in a concrete example of application, and Fig. 4 is a view schematically illustrating the overall architecture of a sub machine room in a concrete example of application. Logics for performing total cache data loading method based on the above overall architectures are as schematically shown in Figs. 5 to 7. Fig. 5 is a flowchart schematically illustrating a backstage message pusher of a total cache data loading method, and Fig. 6 is a flowchart schematically illustrating a backstage message monitor of a total cache data loading method. Fig. 7 is a flowchart schematically illustrating calibration of cache and data consistency in the database after completion of loading.
Specifically, the total cache data loading logics are described as follows:
[0069] 1. The backstage can select the machine room required for loading, the total cache type to be loaded, and the (loading-from-the-beginning or breakpoint continuing) loading mode via page.
[0070] 2. A redis lock is set up to prevent business data of the same type from being loaded for several times.
[0071] 3. The loading mode of the total cache selected via page is judged.
[0072] 4. In the case of the loading-from-the-beginning loading mode, breakpoint continuing redis cache is deleted via a microservice framework.
[0073] 5. In the case of the breakpoint continuing loading mode, status in the breakpoint continuing redis cache is set as waiting via the microservice framework.
[0074] 6. The numbers of the entire sub-libraries and sub-tables are calculated according to the type of the business data to be totally cached as designated, and insertion into a redis task queue is effected with the dimensional order of separate tables; for instance, the insertion rule can be as follows: insertion is effected by polling with the mode of one single table at a time according to the sub-library number, i.e., table information is sequentially inserted into the task queue by the order of firstly inserting the 0th library 1st table, secondly inserting the 1st library 1st table, thirdly inserting the 2nd library 1st table... in the Nth time inserting the (N-1)th library 1st table in the first round; firstly inserting the 0th library 2nd table, secondly inserting the 1st library 2nd table, thirdly inserting the 2nd library 2nd table... in the Nth time inserting the (N-1)th library 2nd table in the second round, and so on and so forth, firstly inserting the 0th library nth table, secondly inserting the 1st library nth table, thirdly inserting the 2nd library nth table... in the Nth time inserting the (N-1)th library nth table in the nth round. Single library pressure can be reduced, and pressure on each sub-library is balanced in the loading process.
[0075] 7. Loading requests are pushed to the message queue according to the type of the business data and the number of the sub-tables.
[0076] 8. A backstage application monitors messages in the message queue.
[0077] 9. It is judged whether the loading request information in the messages contains identification information of the current machine room in which the current application locates.
[0078] 10. If the loading request information does not contain identification information of the current machine room in which the current application locates, the process directly exits.
[0079] 11. If the loading request information contains identification information of the current machine room in which the current application locates, one piece of table information is obtained from the redis task queue to which the business type corresponds according to the business type, and it is judged whether the number of executed single libraries of the database sub-libraries to which the table information corresponds (in the case of the breakpoint continuing loading mode, this can be the number of entries whose breakpoint continuing status is running) reaches the concurrency threshold; if the concurrency threshold is reached, the table information is inserted back into the queue tail of the redis task queue after having lain dormant for 1000 ms, and the loading request is pushed anew to the message queue; if the concurrency threshold is not reached, the following two circumstances can be identified.
[0080] As regards the loading-from-the-beginning loading mode: the corresponding sub-table to be loaded is located according to the table information, data in the sub-table is loaded to the redis cluster starting from the first data in the sub-table, and the database is enquired once both before and after loading to the redis cluster to compare whether data in the database and data in the cache are consistent. The table information can include data types, sub-library numbers, sub-table numbers, table names, etc.
[0081] As regards the breakpoint continuing loading mode:
[0082] 1) Loading information of the corresponding sub-table to be loaded is obtained in the redis breakpoint continuing cache, status of the table is updated as running, starttime and execution ip are recorded, starttime of the current batch is recorded, data is traversed from the breakpoint continuing id of the sub-table: database data is stored in cache value, current time is stored in cache loadTime field, and cache key is set as never invalidated.
[0083] 2) Starting id is searched out and data is enquired once again (for calibration).
[0084] 3) Data in the database and data in the cache are compared as to their difference according to starttime, starting id and update time of the data, data in the cache is deleted if data is reduced, and data in the cache is updated if data is updated.
[0085] 4) The loading information is updated, for instance, the id location loaded to in the breakpoint continuing cache is updated as the last id of this batch.
[0086] 5) It is judged whether traversing of the table is ended, if traversing of the table is ended, status in the breakpoint continuing redis of this table is set as complete, and endTime and ip are recorded. If traversing of the table is not ended, a task manual interruption switch is judged, if the task manual interruption switch is On, status in the breakpoint continuing redis of this table is set as pause, endTime and ip are recorded; if the task manual interruption switch is Off, cycle begins to traverse ids of the next batch.
[0087] 6) It is checked whether breakpoint continuing statuses of the entire sub-libraries and sub-tables of the cache type are complete, if breakpoint continuing statuses of the entire sub-libraries and sub-tables of the cache type are complete, redis lock is deleted, and total loading of this table exits; if breakpoint continuing statuses of some sub-libraries and sub-tables of the cache type are not complete, it is judged whether breakpoint continuing statuses of the entire sub-libraries and sub-tables still have data of the waiting or running status.
[0088] 7) If none of breakpoint continuing statuses of the entire sub-libraries and sub-tables is the waiting or running status, redis lock is firstly deleted and total cache loading of this table exits, otherwise total cache loading of this table directly exits.
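Condensing steps 1) to 7) above, a breakpoint continuing traversal loop might be sketched as follows; the key format loosely follows the Hash structure shown below, while the batch query, status handling details and other names are assumptions.

```python
# Condensed sketch of the breakpoint continuing traversal (steps 1-7); field
# names follow the Hash structure below, other names are assumptions.
import json
import time
import redis

r = redis.Redis()

def resume_load_table(cache_type, db_num, table_num, fetch_batch, cache_put,
                      manual_stop, batch_size=1000):
    key = f"BREAK_POINT_RESUME:{cache_type}:{db_num}"
    field = str(table_num)
    record = json.loads(r.hget(key, field) or "{}")
    record.update(status="running", startTime=time.time())
    r.hset(key, field, json.dumps(record))

    last_id = record.get("id", 0)
    while True:
        rows = fetch_batch(db_num, table_num, after_id=last_id, limit=batch_size)
        if not rows:
            record["status"] = "complete"      # traversal of the table ended
            break
        for row in rows:
            cache_put(row)                     # store database data in cache
        last_id = rows[-1]["id"]
        record["id"] = last_id                 # breakpoint: last id of this batch
        r.hset(key, field, json.dumps(record))
        if manual_stop():                      # task manual interruption switch
            record["status"] = "pause"
            break

    record["endTime"] = time.time()
    r.hset(key, field, json.dumps(record))
```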
[0089] The breakpoint continuing redis cache data structure (Hash structure) is as shown in the following Table, and is used for progress monitoring and for pausing and resuming the loading process:
[0090]
Key: BREAK POINT RESUME: CACHE TYPE: SUB-LIBRARY NUMBER
Field: SUB-TABLE NUMBER
Value: {id, status, errorMsg, startTime, endTime, ip} (status is one of waiting, running, pause, complete, fail)
[0091] The redis task queue data structure (an orderly task queue is created to equally share database pressure) is as shown in the following Table:
[0092]
Key: FULL CACHE QUEUE:CACHE TYPE
Value: {tableNum, dbNum, tableName}, {tableNum, dbNum, tableName}, {tableNum, dbNum, tableName} ...
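For concreteness, the two structures above could be written and read with redis-py roughly as in the sketch below; the cache type "price", the field values and the exact key formatting are examples only.

```python
# Example of the two redis structures above; concrete values are illustrative.
import json
import redis

r = redis.Redis()

# Breakpoint continuing cache: a Hash keyed by cache type and sub-library number,
# one field per sub-table, holding the loading progress record.
r.hset("BREAK_POINT_RESUME:price:0", "1", json.dumps(
    {"id": 10500, "status": "running", "errorMsg": "",
     "startTime": "2022-03-07 10:00:00", "endTime": "", "ip": "10.0.0.12"}))

# Task queue: an ordered list of table information, one entry per sub-table.
r.rpush("FULL_CACHE_QUEUE:price",
        json.dumps({"tableNum": 1, "dbNum": 0, "tableName": "t_price_1"}),
        json.dumps({"tableNum": 1, "dbNum": 1, "tableName": "t_price_1"}))

print(r.hgetall("BREAK_POINT_RESUME:price:0"))
print(r.lrange("FULL_CACHE_QUEUE:price", 0, -1))
```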
[0093] The data loading method dealt with in the above concrete example of application achieves the following characteristics:
[0094] 1. Total cache loading can make full use of application server resource, equally share application server pressure, and quicken the loading progress at any time through capacity expansion of the application server.
[0095] 2. Query and control of statuses of the total cache loading process are supported, database pressure is equally shared, and database pressure is controlled through single-library threshold.
[0096] Before use of the data loading method dealt with in the above concrete example of application: it takes 30 minutes to totally cache-load one hundred million pieces of data; pressure is unevenly distributed across the various application servers and various databases during total cache loading; hardware resource wastage is engendered; and it is impossible for system performance to quickly meet business requirements.
[0097] After use of the data loading method dealt with in the above concrete example of application: it takes less than 5 minutes to totally cache-load one hundred million pieces of data; pressure is evenly distributed across the various application servers and various databases during total cache loading; loading progress can be quickened through capacity expansion of the application servers; hardware resource is maximally utilized; and system performance is enabled to quickly meet business requirements.
[0098] As should be understood, although the various steps in the flowcharts of Figs. 2 and 5-7 are sequentially displayed as indicated by arrows, these steps are not necessarily executed in the sequences indicated by arrows. Unless otherwise explicitly noted in this paper, execution of these steps is not restricted by any sequence, as these steps can also be executed in other sequences (than those indicated in the drawings). Moreover, at least partial steps in the flowcharts of Figs. 2 and 5-7 may include plural sub-steps or multi-phases, these sub-steps or phases are not necessarily completed at the same timing, but can be executed at different timings, and these sub-steps or phases are also not necessarily sequentially performed, but can be performed in turns or alternately with other steps or with at least some of sub-steps or phases of other steps.
[0099] In one embodiment, as shown in Fig. 8, there is provided a data loading device that comprises a sub-table determining module 810, a task placing module 820 and a request pushing module 830, of which:
[0100] the sub-table determining module 810 is employed for obtaining sharding identifications of data to be loaded, and determining sub-tables of the data to be loaded in a database according to the sharding identifications;
[0101] the task placing module 820 is employed for placing table information of the various sub-tables of the data to be loaded in a task queue with the sub-tables as dimensions; and
[0102] the request pushing module 830 is employed for pushing loading requests of a number corresponding to the number of the sub-tables of the data to be loaded to a message queue monitored by a server cluster, so that, when any server in the server cluster monitors any one of the loading requests, one piece of the table information is obtained from the task queue, and data in a target sub-table to which the obtained table information corresponds is loaded to cache.
[0103] In one embodiment, the task placing module 820 screens out the same number of sub-tables round by round from various sub-libraries, respectively, to which the sharding identifications correspond, and places table information of the screened sub-tables sequentially in the task queue according to the screening order.
[0104] In one embodiment, the task placing module 820 screens out the same number of sub-tables from various sub-libraries, respectively, to which the sharding identifications correspond according to sub-library sequencing at a current screening round, and performs a next round of screening out the same number of sub-tables from various sub-libraries, respectively, to which the sharding identifications correspond according to sub-library sequencing when the current screening round is not the last screening round.
[0105] In one embodiment, the sub-table determining module 810 is further employed for obtaining loading mode information of the various sub-tables of the data to be loaded;
modifying a loading status of first-type sub-tables as a waiting loading status, wherein the first-type sub-tables are sub-tables that employ a breakpoint continuing loading mode determined according to the loading mode information and that have not completed loading; and deleting data of second-type sub-tables in breakpoint continuing cache, wherein the second-type sub-tables are sub-tables that employ a loading-from-the-beginning loading mode determined according to the loading mode information.
[0106] In one embodiment, the device further comprises a data loading module 840 that, before one piece of the table information is obtained from the task queue, extracts identification information of a requested server cluster from the monitored loading request, and enters the step of obtaining one piece of the table information from the task queue when the identification information of the requested server cluster matches with identification information of the server cluster.
[0107] In one embodiment, before data in a target sub-table to which the obtained table information corresponds is loaded to cache, the data loading module 840 is further employed for determining a current concurrent volume of the sub-library in which the target sub-table locates; entering the step of loading data in a target sub-table to which the obtained table information corresponds to cache when the current concurrent volume is smaller than a preset concurrency threshold; and placing the obtained table information back in a queue tail of the task queue when the current concurrent volume is greater than the preset concurrency threshold.
[0108] In one embodiment, the data loading module 840 performs total loading from first data of the target sub-table when the target sub-table is a first-type sub-table, and obtains historical loading information of the target sub-table, and loads the target sub-table from data loaded when the previous loading was interrupted as recorded in the historical loading information, when the target sub-table is a second-type sub-table.
[0109] In one embodiment, the device further comprises an information recording module 850 that records loading information of various sub-tables that have not completed loading, and stores the loading information of various sub-tables that have not completed loading in the breakpoint continuing cache, when a data loading operation is interrupted.
[0110] In one embodiment, the data loading module 840 is further employed for performing capacity expanding or capacity reducing adjustment on the servers in the server cluster according to the number of the loading requests.
[0111] Specific definitions relevant to the data loading device may be inferred from the aforementioned definitions to the data loading method, while no repetition is made in this context. The various modules in the aforementioned data loading device can be wholly or partly realized via software, hardware, and a combination of software with hardware.
The various modules can be embedded in the form of hardware in a processor in a computer equipment or independent of any computer equipment, and can also be stored in the form of software in a memory in a computer equipment, so as to facilitate the processor to invoke and perform operations corresponding to the aforementioned various modules.
[0112] In one embodiment, a computer equipment is provided; the computer equipment can be a server, and its internal structure can be as shown in Fig. 9. The computer equipment comprises a processor, a memory, a network interface, and a database connected to each other via a system bus. The processor of the computer equipment is employed to provide computing and controlling capabilities. The memory of the computer equipment includes a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores therein an operating system, a computer program and a database. The internal memory provides an environment for the running of the operating system and the computer program in the nonvolatile storage medium. The database of the computer equipment is employed to store data to be loaded. The network interface of the computer equipment is employed to connect to an external terminal via a network for communication. The computer program realizes a data loading method when it is executed by a processor.
[0113] As understandable to persons skilled in the art, the structure illustrated in Fig. 9 is merely a block diagram of the partial structure relevant to the solution of the present application, and does not constitute any restriction on the computer equipment to which the solution of the present application is applied, as the specific computer equipment may comprise more or fewer component parts than those illustrated in Fig. 9, may combine certain component parts, or may have a different layout of component parts.
[0114] In one embodiment, there is provided a computer equipment that comprises a memory, a processor and a computer program stored on the memory and operable on the processor, and the following steps are realized when the processor executes the computer program:
obtaining sharding identifications of data to be loaded, and determining sub-tables of the data to be loaded in a database according to the sharding identifications;
placing table information of the various sub-tables of the data to be loaded in a task queue with the sub-tables as dimensions; and pushing loading requests of a number corresponding to the number of the sub-tables of the data to be loaded to a message queue monitored by a server cluster, so that, when any server in the server cluster monitors any one of the loading requests, one piece of the table information is obtained from the task queue, and data in a target sub-table to which the obtained table information corresponds is loaded to cache.
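For readers who prefer code, a minimal Python sketch of this producer-side flow follows; shard_to_tables, the queue objects and the message fields are assumptions for illustration rather than the application's concrete interfaces.

```python
# Illustrative sketch only: derive sub-tables from sharding identifications,
# enqueue table information per sub-table, and push one loading request per
# sub-table to a message queue watched by the server cluster.

from collections import deque

def schedule_loading(sharding_ids, shard_to_tables, task_queue: deque, message_queue: deque) -> int:
    # Determine the sub-tables of the data to be loaded from the sharding IDs.
    sub_tables = [t for sid in sharding_ids for t in shard_to_tables(sid)]
    # The task queue is keyed by sub-table: one entry of table information each.
    for table in sub_tables:
        task_queue.append({"sub_library": table["sub_library"], "table": table["name"]})
    # Push as many loading requests as there are sub-tables to be loaded.
    for _ in sub_tables:
        message_queue.append({"type": "loading_request", "cluster_id": "cluster-A"})
    return len(sub_tables)
```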
[0115] In one embodiment, when the processor executes the computer program to realize the step of placing table information of the various sub-tables of the data to be loaded in a task queue, the following steps are specifically realized: screening out the same number of sub-tables round by round from various sub-libraries, respectively, to which the sharding identifications correspond; and placing table information of the screened sub-tables sequentially in the task queue according to the screening order.
[0116] In one embodiment, when the processor executes the computer program to realize the step of screening out the same number of sub-tables round by round from various sub-libraries, respectively, to which the sharding identifications correspond, the following steps are specifically realized: screening out the same number of sub-tables from various sub-libraries, respectively, to which the sharding identifications correspond according to sub-library sequencing at a current screening round; and performing a next round of screening out the same number of sub-tables from various sub-libraries, respectively, to which the sharding identifications correspond according to sub-library sequencing when the current screening round is not the last screening round.
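A minimal Python sketch of this round-by-round screening is given below; the per-round batch size and the use of alphabetical sub-library sequencing are assumptions for illustration.

```python
# Illustrative sketch only: each round takes the same number of sub-tables from
# every sub-library, in a fixed sub-library order, until all are queued.

from collections import deque

def enqueue_round_robin(sub_libraries: dict, task_queue: deque, per_round: int = 1):
    """sub_libraries maps a sub-library name to its ordered list of sub-tables."""
    offsets = {name: 0 for name in sub_libraries}
    while any(offsets[name] < len(tables) for name, tables in sub_libraries.items()):
        for name in sorted(sub_libraries):  # Assumed sub-library sequencing.
            start = offsets[name]
            for table in sub_libraries[name][start:start + per_round]:
                task_queue.append({"sub_library": name, "table": table})
            offsets[name] = start + per_round
    return task_queue
```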
[0117] In one embodiment, when the processor executes the computer program, the following steps are further realized: obtaining loading mode information of the various sub-tables of the data to be loaded; modifying a loading status of first-type sub-tables to a waiting loading status, wherein the first-type sub-tables are sub-tables that employ a breakpoint continuing loading mode determined according to the loading mode information and that have not completed loading; and deleting data of second-type sub-tables in breakpoint continuing cache, wherein the second-type sub-tables are sub-tables that employ a loading-from-the-beginning loading mode determined according to the loading mode information.
[0118] In one embodiment, before the processor executes the computer program to realize the step of obtaining one piece of the table information from the task queue, the following steps are further realized: extracting identification information of a requested server cluster from the monitored loading request; and entering the step of obtaining one piece of the table information from the task queue when the identification information of the requested server cluster matches identification information of the server cluster.
[0119] In one embodiment, before the processor executes the computer program to realize the step of loading data in a target sub-table to which the obtained table information corresponds to cache, the following steps are further realized: determining a current concurrent volume of the sub-library in which the target sub-table is located;
entering the step of loading data in a target sub-table to which the obtained table information corresponds to cache when the current concurrent volume is smaller than a preset concurrency threshold; and placing the obtained table information back in a queue tail of the task queue when the current concurrent volume is greater than the preset concurrency threshold.
[0120] In one embodiment, when the processor executes the computer program to realize the step of loading data in a target sub-table to which the obtained table information corresponds to cache, the following steps are specifically realized:
performing total loading from the first data of the target sub-table when the target sub-table is a first-type sub-table; and, when the target sub-table is a second-type sub-table, obtaining historical loading information of the target sub-table and continuing to load the target sub-table from the data that had been loaded when the previous loading was interrupted, as recorded in the historical loading information.
[0121] In one embodiment, when the processor executes the computer program, the following steps are further realized: recording loading information of various sub-tables that have not completed loading, and storing the loading information of various sub-tables that have not completed loading in the breakpoint continuing cache, when a data loading operation is interrupted.
[0122] In one embodiment, when the processor executes the computer program, the following step is further realized: performing capacity expanding or capacity reducing adjustment on the servers in the server cluster according to the number of the loading requests.
[0123] In one embodiment, there is provided a computer-readable storage medium storing thereon a computer program, and the following steps are realized when the computer program is executed by a processor: obtaining sharding identifications of data to be loaded, and determining sub-tables of the data to be loaded in a database according to the sharding identifications; placing table information of the various sub-tables of the data to be loaded in a task queue with the sub-tables as dimensions; and pushing loading requests of a number corresponding to the number of the sub-tables of the data to be loaded to a message queue monitored by a server cluster, so that, when any server in the server cluster monitors any one of the loading requests, one piece of the table information is obtained from the task queue, and data in a target sub-table to which the obtained table information corresponds is loaded to cache.
[0124] In one embodiment, when the computer program is executed by a processor to realize the step of placing table information of the various sub-tables of the data to be loaded in a task queue, the following steps are specifically realized: screening out the same number of sub-tables round by round from various sub-libraries, respectively, to which the sharding identifications correspond; and placing table information of the screened sub-tables sequentially in the task queue according to the screening order.
[0125] In one embodiment, when the computer program is executed by a processor to realize the step of screening out the same number of sub-tables round by round from various sub-libraries, respectively, to which the sharding identifications correspond, the following steps are specifically realized: screening out the same number of sub-tables from various sub-libraries, respectively, to which the sharding identifications correspond according to sub-library sequencing at a current screening round; and performing a next round of screening out the same number of sub-tables from various sub-libraries, respectively, to which the sharding identifications correspond according to sub-library sequencing when the current screening round is not the last screening round.
[0126] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: obtaining loading mode information of the various sub-tables of the data to be loaded; modifying a loading status of first-type sub-tables to a waiting loading status, wherein the first-type sub-tables are sub-tables that employ a breakpoint continuing loading mode determined according to the loading mode information and that have not completed loading; and deleting data of second-type sub-tables in breakpoint continuing cache, wherein the second-type sub-tables are sub-tables that employ a loading-from-the-beginning loading mode determined according to the loading mode information.
[0127] In one embodiment, before the computer program is executed by a processor to realize the step of obtaining one piece of the table information from the task queue, the following steps are further realized: extracting identification information of a requested server cluster from the monitored loading request; and entering the step of obtaining one piece of the table information from the task queue when the identification information of the requested server cluster matches identification information of the server cluster.
[0128] In one embodiment, before the computer program is executed by a processor to realize the step of loading data in a target sub-table to which the obtained table information corresponds to cache, the following steps are further realized: determining a current concurrent volume of the sub-library in which the target sub-table is located;
entering the step of loading data in a target sub-table to which the obtained table information corresponds to cache when the current concurrent volume is smaller than a preset concurrency threshold; and placing the obtained table information back in a queue tail of the task queue when the current concurrent volume is greater than the preset concurrency threshold.
[0129] In one embodiment, when the computer program is executed by a processor to realize the step of loading data in a target sub-table to which the obtained table information corresponds to cache, the following steps are specifically realized:
performing total loading from the first data of the target sub-table when the target sub-table is a first-type sub-table; and, when the target sub-table is a second-type sub-table, obtaining historical loading information of the target sub-table and continuing to load the target sub-table from the data that had been loaded when the previous loading was interrupted, as recorded in the historical loading information.
[0130] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: recording loading information of various sub-tables that have not completed loading, and storing the loading information of various sub-tables that have not completed loading in the breakpoint continuing cache, when a data loading operation is interrupted.
[0131] In one embodiment, when the computer program is executed by a processor, the following step is further realized: performing capacity expanding or capacity reducing adjustment on the servers in the server cluster according to the number of the loading requests.
[0132] As comprehensible to persons ordinarily skilled in the art, the entire or partial flows in the methods according to the aforementioned embodiments can be completed via a computer program instructing relevant hardware, the computer program can be stored in a nonvolatile computer-readable storage medium, and the computer program can include the flows as embodied in the aforementioned various methods when executed. Any reference to the memory, storage, database or other media used in the various embodiments provided by the present application can include nonvolatile and/or volatile memory. The nonvolatile memory can include a read-only memory (ROM), a programmable ROM (PROM), an electrically programmable ROM (EPROM), an electrically erasable and programmable ROM (EEPROM) or a flash memory. The volatile memory can include a random access memory (RAM) or an external cache memory. By way of explanation rather than restriction, the RAM is obtainable in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
[0133] The technical features of the aforementioned embodiments can be combined arbitrarily; for the sake of brevity, not all possible combinations of the technical features in the aforementioned embodiments are described, but all such combinations should be considered to fall within the scope recorded in this Description as long as the combined technical features are not mutually contradictory.
[0134] The foregoing embodiments merely express several modes of execution of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be understood as restrictions on the scope of the invention patent. It should be pointed out that persons with ordinary skill in the art may further make various modifications and improvements without departing from the conception of the present application, and all of these pertain to the protection scope of the present application.
Accordingly, the patent protection scope of the present application shall be based on the attached Claims.
Claims (10)
1. A data loading method, characterized in comprising:
obtaining sharding identifications of data to be loaded, and determining sub-tables of the data to be loaded in a database according to the sharding identifications;
placing table information of the various sub-tables of the data to be loaded in a task queue with the sub-tables as dimensions; and pushing loading requests of a number corresponding to the number of the sub-tables of the data to be loaded to a message queue monitored by a server cluster, so that, when any server in the server cluster monitors any one of the loading requests, one piece of the table information is obtained from the task queue, and data in a target sub-table to which the obtained table information corresponds is loaded to cache.
2. The method according to Claim 1, characterized in that the step of placing table information of the various sub-tables of the data to be loaded in a task queue includes:
screening out the same number of sub-tables round by round from various sub-libraries, respectively, to which the sharding identifications correspond; and placing table information of the screened sub-tables sequentially in the task queue according to the screening order;
preferably, the step of screening out the same number of sub-tables round by round from various sub-libraries, respectively, to which the sharding identifications correspond includes:
screening out the same number of sub-tables from various sub-libraries, respectively, to which the sharding identifications correspond according to sub-library sequencing at a current screening round; and performing a next round of screening out the same number of sub-tables from various sub-libraries, respectively, to which the sharding identifications correspond according to sub-library sequencing when the current screening round is not the last screening round.
3. The method according to Claim 1, characterized in further comprising:
obtaining loading mode information of the various sub-tables of the data to be loaded;
modifying a loading status of first-type sub-tables to a waiting loading status, wherein the first-type sub-tables are sub-tables that employ a breakpoint continuing loading mode determined according to the loading mode information and that have not completed loading;
and deleting data of second-type sub-tables in breakpoint continuing cache, wherein the second-type sub-tables are sub-tables that employ a loading-from-the-beginning loading mode determined according to the loading mode information.
4. The method according to Claim 1, characterized in that, before one piece of the table information is obtained from the task queue, the method further comprises:
extracting identification information of a requested server cluster from the monitored loading request; and entering the step of obtaining one piece of the table information from the task queue when the identification information of the requested server cluster matches identification information of the server cluster.
5. The method according to Claim 1, characterized in that, before data in a target sub-table to which the obtained table information corresponds is loaded to cache, the method further comprises:
determining a current concurrent volume of the sub-library in which the target sub-table is located;
entering the step of loading data in a target sub-table to which the obtained table information corresponds to cache when the current concurrent volume is smaller than a preset concurrency threshold; and placing the obtained table information back in a queue tail of the task queue when the current concurrent volume is greater than the preset concurrency threshold.
6. The method according to Claim 3, characterized in that the step of loading data in a target sub-table to which the obtained table information corresponds to cache includes:
performing total loading from the first data of the target sub-table when the target sub-table is a first-type sub-table; and, when the target sub-table is a second-type sub-table, obtaining historical loading information of the target sub-table and continuing to load the target sub-table from the data that had been loaded when the previous loading was interrupted, as recorded in the historical loading information.
7. The method according to any of Claims 1 to 6, characterized in further comprising:
recording loading information of various sub-tables that have not completed loading, and storing the loading information of various sub-tables that have not completed loading in the breakpoint continuing cache, when a data loading operation is interrupted; and/or performing capacity expanding or capacity reducing adjustment on the servers in the server cluster according to the number of the loading requests.
8. A data loading device, characterized in comprising:
a sub-table determining module, for obtaining sharding identifications of data to be loaded, and determining sub-tables of the data to be loaded in a database according to the sharding identifications;
a task placing module, for placing table information of the various sub-tables of the data to be loaded in a task queue with the sub-tables as dimensions; and a request pushing module, for pushing loading requests of a number corresponding to the number of the sub-tables of the data to be loaded to a message queue monitored by a server cluster, so that, when any server in the server cluster monitors any one of the loading requests, one piece of the table information is obtained from the task queue, and data in a target sub-table to which the obtained table information corresponds is loaded to cache.
9. A computer equipment, comprising a memory, a processor and a computer program stored on the memory and operable on the processor, characterized in that the method steps according to any of Claims 1 to 7 are realized when the processor executes the computer program.
10. A computer-readable storage medium, storing a computer program thereon, characterized in that the method steps according to any of Claims 1 to 7 are realized when the computer program is executed by a processor.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110249090.8A CN113065084B (en) | 2021-03-08 | 2021-03-08 | Data loading method and device, computer equipment and storage medium |
CN202110249090.8 | 2021-03-08 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
CA3151219A1 true CA3151219A1 (en) | 2022-09-08 |
Family
ID=76559909
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA3151219A Pending CA3151219A1 (en) | 2021-03-08 | 2022-03-07 | Data loading method, device, computer equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113065084B (en) |
CA (1) | CA3151219A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114816586B (en) * | 2022-06-28 | 2022-09-27 | 深圳高灯计算机科技有限公司 | Visible data loading method and device, computer equipment and storage medium |
CN116244538B (en) * | 2023-01-31 | 2023-11-21 | 彭志勇 | File caching method and loading method based on serviceworker |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106844397B (en) * | 2015-12-07 | 2020-05-12 | 阿里巴巴集团控股有限公司 | Task transmission method, device and system based on sub-base and sub-table |
CN111258741B (en) * | 2020-02-14 | 2022-08-19 | 江苏苏宁物流有限公司 | Warehouse task execution method, distributed server cluster and computer equipment |
CN111309467B (en) * | 2020-02-24 | 2023-07-14 | 拉扎斯网络科技(上海)有限公司 | Task distribution method and device, electronic equipment and storage medium |
2021
- 2021-03-08 CN CN202110249090.8A patent/CN113065084B/en active Active
2022
- 2022-03-07 CA CA3151219A patent/CA3151219A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
CN113065084B (en) | 2022-12-23 |
CN113065084A (en) | 2021-07-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107943594B (en) | Data acquisition method and device | |
US9560165B2 (en) | BT offline data download system and method, and computer storage medium | |
CN111708586B (en) | Application starting configuration item loading method and device, computer equipment and storage medium | |
US10609174B2 (en) | Parallel prefetching log/meta stream sub-portions to recreate partition states in a distributed computing system | |
CA3151219A1 (en) | Data loading method, device, computer equipment and storage medium | |
CN111464615A (en) | Request processing method, device, server and storage medium | |
US9594801B2 (en) | Systems and methods for allocating work for various types of services among nodes in a distributed computing system | |
CN111078733B (en) | Batch task processing method, device, computer equipment and storage medium | |
CN107689976B (en) | File transmission method and device | |
CN111159233B (en) | Distributed caching method, system, computer equipment and storage medium | |
CN109697112B (en) | Distributed intensive one-stop operating system and implementation method | |
CN104144202B (en) | Access method, system and the device of Hadoop distributed file system | |
CN110457059A (en) | A kind of sequence number generation method and device based on redis | |
CN110795171B (en) | Service data processing method, device, computer equipment and storage medium | |
US8793527B1 (en) | Apparatus and method for handling partially inconsistent states among members of a cluster in an erratic storage network | |
CN112148736B (en) | Method, device and storage medium for caching data | |
CN115118612B (en) | Resource quota management method, device, computer equipment and storage medium | |
CN107181773A (en) | Data storage and data managing method, the equipment of distributed memory system | |
CN112783835A (en) | Index management method and device and electronic equipment | |
CN111669629A (en) | Video CDN node instant capacity expansion method, scheduler and CND storage system | |
CN113448739B (en) | Data processing method and device | |
CN112818021B (en) | Data request processing method, device, computer equipment and storage medium | |
CN112115166B (en) | Data caching method and device, computer equipment and storage medium | |
CN111147226A (en) | Data storage method, device and storage medium | |
CN111324668B (en) | Database data synchronous processing method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| EEER | Examination request | Effective date: 20220916 |
Effective date: 20220916 |