CN113821501B

CN113821501B - Data archiving method and device

Info

Publication number: CN113821501B
Application number: CN202110923731.3A
Authority: CN
Inventors: 刘超; 赵国庆; 曾琳铖曦; 吴海英
Original assignee: Mashang Xiaofei Finance Co Ltd
Current assignee: Mashang Xiaofei Finance Co Ltd
Priority date: 2021-08-12
Filing date: 2021-08-12
Publication date: 2023-05-16
Anticipated expiration: 2041-08-12
Also published as: CN113821501A

Abstract

The invention discloses a data archiving method and device, which relate to the technical field of data processing, and mainly comprise the following steps: determining query sample sentences corresponding to the archiving requirements of an archiving task, wherein different archiving requirements correspond to different query sample sentences, and conditional items are arranged in the query sample sentences and are used for limiting archiving of data meeting corresponding conditions; assigning values to the condition items in the query sample statement according to the archiving condition information of the archiving task to form a query statement; executing the query statement in a database corresponding to the archiving task; and archiving the data queried according to the query statement. According to the technical scheme, the data meeting the specific archiving requirement in the database can be archived, so that the data meeting the specific condition in the database is separated from the database, and a storage space is reserved for the database.

Description

Data archiving method and device

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a data archiving method and apparatus.

Background

In internet applications, the data volume of some core data tables in a database may grow over time, and the performance of the database may degrade as these tables occupy more storage. Often only a small portion of the data in these large tables is frequently queried and used, while most of it is rarely queried or used. There is currently no efficient way to strip this rarely queried and used data from the database.

Disclosure of Invention

In view of this, the present invention provides a data archiving method and apparatus, and is mainly aimed at archiving data meeting specific archiving requirements in a database.

In order to achieve the above purpose, the present invention mainly provides the following technical solutions:

In a first aspect, the present invention provides a data archiving method, the method comprising:

determining a query sample statement corresponding to the archiving requirement of an archiving task, wherein different archiving requirements correspond to different query sample statements, and condition items are set in the query sample statement to restrict archiving to data meeting the corresponding conditions;

assigning values to the condition items in the query sample statement according to the archiving condition information of the archiving task to form a query statement;

executing the query statement in a database corresponding to the archiving task;

and archiving the data queried according to the query statement.

In a second aspect, the present invention provides a data archiving apparatus, the apparatus comprising:

a selecting unit, configured to determine a query sample statement corresponding to the archiving requirement of an archiving task, wherein different archiving requirements correspond to different query sample statements, and condition items are set in the query sample statement to restrict archiving to data meeting the corresponding conditions;

a generating unit, configured to assign values to the condition items in the query sample statement according to the archiving condition information of the archiving task to form a query statement;

an execution unit, configured to execute the query statement in the database corresponding to the archiving task;

and an archiving unit, configured to archive the data queried by the query statement.

In a third aspect, the present invention provides a computer readable storage medium, where the storage medium includes a stored program, where the program, when executed, controls a device in which the storage medium is located to perform the data archiving method of the first aspect.

In a fourth aspect, the present invention provides a storage management apparatus comprising:

a memory for storing a program;

a processor, coupled to the memory, for executing the program to perform the data archiving method of the first aspect.

By means of the above technical scheme, when an archiving task exists, the data archiving method and device provided by the invention determine the query sample statement corresponding to the archiving requirement of the archiving task, and assign values to the condition items in the query sample statement according to the archiving condition information of the archiving task to form a query statement. The query statement is then executed in the database corresponding to the archiving task, and the data it returns is archived. Because the query statement used to select the data to be archived is built from the archiving requirement and the archiving condition information, the scheme can archive the data in the database that meets a specific archiving requirement, so that data meeting specific conditions is separated from the database and storage space is freed for the database.

The foregoing is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 illustrates a flow chart of a method for archiving data in accordance with one embodiment of the present invention;

FIG. 2 is a flow chart of a method for archiving data in accordance with another embodiment of the present invention;

FIG. 3 is a schematic diagram of a data archiving apparatus according to one embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a data archiving device according to another embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

In practical applications, the data volume of some core data tables in a database may grow over time, and the performance of the database may degrade as these tables occupy more storage. Often only a small portion of the data in these large tables is frequently queried and used, while most of it is rarely queried or used yet occupies a large amount of storage space in the database. At present, archiving can only be performed on all the data in a data table, not on the data in the table that meets specific conditions, so there is no effective way to strip rarely queried and used data from the database. In order to strip such data from the database and archive it on cheaper storage, the embodiments of the invention provide a data archiving method and device, which are described in detail below.

As shown in fig. 1, an embodiment of the present invention provides a data archiving method, which mainly includes:

101. Determine a query sample statement corresponding to the archiving requirement of an archiving task, wherein different archiving requirements correspond to different query sample statements, and condition items are set in the query sample statement to restrict archiving to data meeting the corresponding conditions.

In order to archive data in a database that meets specific archiving requirements, a query sample statement needs to be set. The query sample statement is described below in three respects:

First, to meet different archiving requirements, a plurality of query sample statements need to be set, with each query sample statement corresponding to one archiving requirement; that is, different archiving requirements correspond to different query sample statements. Illustratively, if the archiving requirements are multi-table data aggregation archiving, single-table conditional archiving, and multi-table associated archiving, then each of the three has its own corresponding query sample statement.

Second, since different types of databases restrict the types of statements they accept, the type of the query sample statement should meet the requirements of the database. Illustratively, if the database requires SQL statements, the query sample statement is an SQL statement; if the database requires Oracle statements, the query sample statement is an Oracle statement.

Third, in order to archive only data satisfying a specific condition, condition items are set in the query sample statement described in this embodiment. The condition items restrict archiving to data meeting the corresponding conditions: assigning values to the condition items forms a query statement that queries only the data in the database that meets the specific conditions. The number and type of condition items may be determined based on specific service requirements and are not specifically limited in this embodiment. The condition items may include, but are not limited to, at least one of: data table ID, row ID, data field (e.g., time, name, etc.). Illustratively, the condition items set in query sample statement 1 are the data table ID, the row ID, and a data field. After these condition items are assigned, so that the data table ID is "table", the row ID is 1-10, and the data field is "successful", the resulting query statement queries the rows in the data table whose row ID is 1-10 and whose status flag is "successful".

Based on the above three-part description of the query sample statement, one specific example is given below. The query sample statement corresponding to the archiving requirement "single-table conditional archiving" is "select {table} where {id} >= ? AND {id} <= ? and {create_time} >= ? AND {create_time} < ?". The parts in braces are the condition items, which are replaced as the archiving requirement changes: table is the data table ID, id is the row ID, and create_time is the data field "time". The query sample statement is an SQL statement.

In order to complete the archiving task quickly, when an archiving task exists, the query sample statement is selected directly according to the archiving requirement of the archiving task, so that data meeting the specific archiving requirement can be archived quickly using that query sample statement.

So that the user can archive data intuitively, each archiving requirement is presented to the user in an interactive interface. When the user selects an archiving requirement, the query sample statement corresponding to the selected requirement is taken as the query sample statement for the archiving task.

102. Assign values to the condition items in the query sample statement according to the archiving condition information of the archiving task to form a query statement.

The archiving condition information consists of the archiving conditions that the user sets for the archiving task, and is used to define the range of data to be archived.

To make it easy for the user to set the archiving condition information, an interactive interface for collecting it can be displayed to the user, and the data the user enters in that interface for the archiving task is taken as the archiving condition information. The archiving condition information contains the values of the condition items in the selected query sample statement; these values are assigned to the condition items, thereby limiting the range of data in the database that needs to be archived.

Illustratively, the query sample statement is "select {table} where {id} >= ? AND {id} <= ? and {create_time} >= ? AND {create_time} < ?". The archiving condition information includes: the data table ID "order", the row ID "1-10", and the data field (time) "2021-01-01 00:00:00 to 2021-01-12 00:00:00". The query statement formed after assigning the condition items in the query sample statement is "select {order} where {id} >= 1 and {id} <= 10 and {create_time} >= 2021-01-01 00:00:00 AND {create_time} < 2021-01-12 00:00:00".
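The selection and assignment described in steps 101 and 102 can be sketched as follows. This is a minimal illustration, not the patented implementation: the registry `QUERY_SAMPLES`, the function `build_query`, the requirement key `single_table_conditional`, and the `SELECT * FROM ...` wording of the template are all assumptions made for the example. The sketch also assumes that brace placeholders name identifiers (table and columns) while the `?` markers are filled with bound values.

```python
from string import Formatter

# Hypothetical registry: archiving requirement -> query sample statement.
# The SQL wording is an assumption; only the placeholder idea comes from the text.
QUERY_SAMPLES = {
    "single_table_conditional": (
        "SELECT * FROM {table} WHERE {id} >= ? AND {id} <= ? "
        "AND {create_time} >= ? AND {create_time} < ?"
    ),
}


def build_query(requirement, names, values):
    """Return (sql, params) for one archiving task.

    names  -- identifiers that replace the brace placeholders
    values -- bound parameters for the '?' markers, in order
    """
    sample = QUERY_SAMPLES[requirement]
    # Collect the placeholder names that the sample statement uses.
    needed = {field for _, field, _, _ in Formatter().parse(sample) if field}
    missing = needed - names.keys()
    if missing:
        raise ValueError(f"unassigned condition items: {sorted(missing)}")
    # Identifiers cannot be bound as parameters, so accept only plain names.
    for ident in names.values():
        if not ident.isidentifier():
            raise ValueError(f"unsafe identifier: {ident!r}")
    sql = sample.format(**names)
    if sql.count("?") != len(values):
        raise ValueError("number of values does not match the '?' markers")
    return sql, tuple(values)
```

Calling `build_query("single_table_conditional", {"table": "order", "id": "id", "create_time": "create_time"}, [1, 10, "2021-01-01 00:00:00", "2021-01-12 00:00:00"])` yields a statement and a parameter tuple that a database driver could execute together. Note that `order` is a reserved word in SQL, so a real table with that name would need quoting.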

In practical applications, when the archiving condition information of an archiving task covers a large volume of data, forming only one query statement from it makes the query slow and occupies database threads for a long time, which affects normal use of the database. To avoid this, before values are assigned to the condition items in the query sample statement, it is determined whether the archiving task needs to be segmented. The determination proceeds as follows:

Step one: determine the condition item used for segment determination from the condition items of the query sample statement, and extract the value corresponding to that condition item from the archiving condition information.

The condition item used for segment determination may be selected by the user based on service requirements; for example, it may be the row ID or a data field. Illustratively, the data field is time.

Step two: determine whether the value meets the preset value requirement.

Different conditional items for segment determination correspond to different value requirements, which can be determined based on the performance of the database, so as to ensure that the performance of the database is not affected when the database executes the query statement.

Step three: if the value does not meet the requirement, split the archiving task into a plurality of segmentation tasks.

If the value corresponding to the condition item for segment determination does not meet the preset value requirement, this indicates that forming only one query statement from the archiving condition information would make the query slow, occupy database threads for a long time, and affect normal use of the database. The archiving task therefore needs to be split into a plurality of segmentation tasks.

A specific method for splitting an archiving task into a plurality of segmentation tasks is described below; it includes at least the following two approaches. In the first, a segmentation rule is set for the value corresponding to the condition item for segment determination, and the archiving task is split based on that rule. Illustratively, the data field is time and the segmentation rule is a time increment: with a start time of 2021-01-01 00:00:00, a time step of 5 minutes, and an end time of 2021-01-01 00:20:00, the archiving task is split into 4 segmentation tasks. For example, the time range of the first segmentation task is 2021-01-01 00:00:00 to 2021-01-01 00:05:00, the time range of the next is 2021-01-01 00:05:00 to 2021-01-01 00:10:00, and so on. In the second, the total data quantity corresponding to the archiving task is determined from the value corresponding to the condition item for segment determination, the number of segmentation tasks is then determined from a preset segment data quantity and the total data quantity, and the archiving task is split based on that number, where the segment data quantity is the maximum data quantity allowed for each segment.
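The second approach, splitting by a preset segment data quantity, might look like the sketch below. It assumes the condition item for segment determination is the row ID, that IDs are contiguous so the total data quantity can be derived from the ID range, and that the number of segments is the total divided by the segment data quantity, rounded up. The function name `split_id_range` is hypothetical.

```python
import math


def split_id_range(first_id, last_id, max_rows_per_segment):
    """Split an inclusive row-ID range into inclusive sub-ranges.

    Assumes contiguous IDs, so the total data quantity is the range length.
    """
    if first_id > last_id:
        raise ValueError("first_id must not exceed last_id")
    if max_rows_per_segment <= 0:
        raise ValueError("max_rows_per_segment must be positive")
    total = last_id - first_id + 1
    # Number of segmentation tasks from the total and the per-segment maximum.
    count = math.ceil(total / max_rows_per_segment)
    # Spread the rows evenly; no segment exceeds the preset maximum.
    size = math.ceil(total / count)
    ranges = []
    start = first_id
    while start <= last_id:
        end = min(start + size - 1, last_id)
        ranges.append((start, end))
        start = end + 1
    return ranges
```

For instance, `split_id_range(1, 10, 4)` returns `[(1, 4), (5, 8), (9, 10)]`, and each pair can serve as the row-ID values of one segmentation task.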

Step four: determine the archiving condition information corresponding to each segmentation task based on the archiving condition information of the archiving task.

The condition item for segment determination is determined from the condition items of the query sample statement, the value corresponding to it is extracted from the archiving condition information, and the archiving condition information of each segmentation task is determined from that value and the number of segmentation tasks. That is, the value corresponding to the condition item for segment determination is split according to the number of segmentation tasks to form the archiving condition information of each segmentation task.

For example, if the number of segmentation tasks is 4 and the values corresponding to the condition item for segment determination are 2021-01-01 00:00:00 and 2021-01-01 00:20:00, then with a time step of 5 minutes the time range of the first segmentation task is 2021-01-01 00:00:00 to 2021-01-01 00:05:00, the time range of the next segmentation task is 2021-01-01 00:05:00 to 2021-01-01 00:10:00, and so on.

Step five: assign values to the condition items in the query sample statement according to the archiving condition information of each segmentation task, forming a query statement for each segmentation task.

The archiving condition information of a segmentation task contains the values of the condition items in the query sample statement. Assigning these values to the condition items limits the range of to-be-archived data that a single query statement can query in the database.

The archiving task is split into a plurality of segmentation tasks, each with its own query statement, and these query statements spread the query load on the database. The archiving task can thus be completed using the query statements of the segmentation tasks without slow queries, without occupying database threads for a long time, and without affecting normal use of the database.

Step six: if the value meets the requirement, determine that the archiving task is executed without splitting.

If the value corresponding to the condition item for segment determination meets the preset value requirement, this indicates that a single query statement formed from the archiving condition information will not make the query slow, occupy database threads for a long time, or affect normal use of the database. In this case, the archiving condition information is used directly to assign values to the condition items in the query sample statement.
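Steps one to six can be sketched for the time-based case as follows. The sketch assumes that the condition item for segment determination is the time field, that the "preset value requirement" is a maximum time span (`max_span`), and that segments are half-open intervals so they line up with a `>= ? AND < ?` condition. The function name `plan_time_segments` and both parameters are illustrative, not taken from the source.

```python
from datetime import datetime, timedelta


def plan_time_segments(start, end, step, max_span):
    """Return half-open (lower, upper) time ranges for one archiving task.

    If the requested span satisfies the assumed requirement (span <= max_span),
    the task is not split and a single range is returned.
    """
    if start >= end:
        raise ValueError("start must be earlier than end")
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    # Steps two and six: the value meets the requirement, so no split.
    if end - start <= max_span:
        return [(start, end)]
    # Steps three and four: walk the range in fixed time increments.
    segments = []
    lower = start
    while lower < end:
        upper = min(lower + step, end)
        segments.append((lower, upper))
        lower = upper
    return segments
```

With a start of 2021-01-01 00:00:00, an end of 2021-01-01 00:20:00, a 5-minute step, and a 5-minute maximum span, the function returns four ranges, the first ending at 00:05:00. Each range then supplies the time values for one segmentation task's query statement (step five).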

103. Execute the query statement in the database corresponding to the archiving task.

There may be one or more databases corresponding to the archiving task. Each of them must grant permission to execute the query statement so that the query statement can be used to query data.

The execution of query statements in the database corresponding to the archiving task may be performed according to the following principles:

First, if the archiving task is not split, the query statement is executed directly in the database corresponding to the archiving task.

Second, if the archiving task is not split, the query statement can be executed in the database according to a preset pause period: execution of the query statement is suspended when the pause period is reached and resumes after the pause period ends. This reduces the impact of query operations on database performance during data queries.

Third, if the archiving task is split into a plurality of segmentation tasks, a corresponding execution time is set for the query statement of each segmentation task, and each query statement is executed in the database corresponding to the archiving task at its execution time.

Fourth, if the archiving task is split into a plurality of segmentation tasks, the execution order of the query statements of the segmentation tasks is determined, and each query statement is executed in the database corresponding to the archiving task in that order. Because the segmentation tasks are executed sequentially, with the next one starting only after the previous one ends, no segmentation task is missed and the integrity of the whole archiving task is ensured.
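The sequential execution described in the fourth principle can be sketched as below, using Python's built-in `sqlite3` module purely as a stand-in for the database. The function name `run_segments` is hypothetical. The optional pause between consecutive segment statements is an adaptation made for this sketch: the text describes a pause period for an unsplit task, whereas here the pause is applied between segments to give the database time for other work.

```python
import sqlite3
import time


def run_segments(conn, sql, segment_params, pause_seconds=0.0, sleep=time.sleep):
    """Execute one query per segmentation task, strictly in order.

    conn           -- an open DB-API connection (sqlite3 in this sketch)
    sql            -- query statement with '?' markers
    segment_params -- one parameter tuple per segmentation task
    pause_seconds  -- assumed pause inserted between consecutive segments
    """
    rows = []
    for index, params in enumerate(segment_params):
        # Pause before every segment except the first.
        if index and pause_seconds > 0:
            sleep(pause_seconds)
        # The next segment starts only after this call has returned.
        rows.extend(conn.execute(sql, params).fetchall())
    return rows
```

The `sleep` parameter exists only so that callers (or tests) can substitute their own waiting function; by default the standard `time.sleep` is used.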

104. Archive the data queried by the query statement.

Archiving the data queried by the query statement means extracting the data that meets certain conditions from the database and archiving it into a specified database. In this way, rarely queried and used data can be stripped from the original database and archived on cheaper storage.

The specific process of archiving the data queried by the query statement is as follows: determine the archive data among the data queried by the query statement, generate a corresponding primary key for each piece of archive data, combine each piece of archive data and its primary key into a key-value pair, and archive the key-value pair to a specified database. The archive data may be all of the data queried by the query statement or only part of it, as determined by service requirements. The specific type of the specified database may also be determined by service requirements and is not specifically limited in this embodiment. The specified database may be, for example, an HBase database or a TiDB database, which has a simple data storage model and supports mass data storage.
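A minimal sketch of this key-value archiving follows. The source does not specify how the primary key is generated, so the rule used here (table name plus row ID, joined by a colon) is purely an assumption, as are the function names `primary_key` and `archive_rows`. A plain dictionary stands in for the specified database, and each row is serialized as JSON.

```python
import json


def primary_key(table, row):
    """Assumed primary-key rule: '<table>:<row id>'."""
    return f"{table}:{row['id']}"


def archive_rows(store, table, rows):
    """Write each queried row into the store as a key-value pair.

    store -- a dict standing in for the specified database
    rows  -- queried rows, each a dict that contains an 'id' entry
    Returns the primary keys that were written, in order.
    """
    keys = []
    for row in rows:
        key = primary_key(table, row)
        # The value is the serialized row; sort_keys keeps the output stable.
        store[key] = json.dumps(row, sort_keys=True)
        keys.append(key)
    return keys
```

Because the key is derived deterministically from the row, archiving the same row twice overwrites the earlier pair instead of duplicating it.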

In order to provide more ways to query the archive data later, after the key-value pairs are archived to the specified database, a corresponding secondary key can be generated for the specific data in each piece of archive data, an association can be established between each secondary key and the primary key of the corresponding archive data, and the specific data and its secondary key can be combined into a key-value pair. The specific data here is data for which a secondary index is required; it can be specified by the user based on how they need to query the archived data.

When the archive data is later queried and used, the secondary key makes it possible to query the archive data not only by primary key but also by specific data: the secondary key is found from the specific data, the primary key is found from the correspondence between the secondary key and the primary key, and the archive data is then retrieved by that primary key. This way of archiving therefore supports multidimensional queries on the archived data.

So that the secondary keys meet the user's needs, before a secondary key is generated for the specific data in each piece of archive data, a plurality of secondary key generation strategies for the archive data are displayed, and the secondary key is generated according to the strategy the user selects. A secondary key generation strategy determines which specific data in the archive data is used to generate the secondary key.

To save the user the time of setting up a secondary index, a plurality of secondary key generation strategies are provided directly for the user to choose from. Each strategy tells the user which designated data in the archive data can have a secondary key. Any data in the archive data for which a secondary key can be set may serve as designated data.

For example, if the archive data includes age data and date-of-birth data, both may be designated data, and two secondary key generation strategies are displayed: one generates a secondary key for the age data, and the other generates a secondary key birthday for the date-of-birth data. When the user selects the strategy corresponding to the date-of-birth data, the secondary key birthday is generated for the date-of-birth data.
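The secondary-key mechanism can be illustrated with the sketch below. It assumes a strategy is simply the name of the field whose value becomes the secondary key, that a secondary key has the form `<field>:<value>`, and that the index maps each secondary key to the primary keys of the matching archive data. `STRATEGIES`, `build_secondary_index`, and `find_by_secondary` are hypothetical names, and plain dictionaries stand in for the specified database.

```python
# Hypothetical strategies: strategy name -> field used to build the secondary key.
STRATEGIES = {"age": "age", "birthday": "birthday"}


def build_secondary_index(store, strategy):
    """Map secondary keys to primary keys for the user-selected strategy.

    store -- dict of primary key -> archived record (a dict of fields)
    """
    field = STRATEGIES[strategy]
    index = {}
    for pk, record in store.items():
        if field in record:
            secondary = f"{field}:{record[field]}"
            # Several records may share one secondary key.
            index.setdefault(secondary, []).append(pk)
    return index


def find_by_secondary(store, index, field, value):
    """Resolve specific data -> secondary key -> primary keys -> records."""
    return [store[pk] for pk in index.get(f"{field}:{value}", [])]
```

A lookup therefore takes two hops: the specific data identifies the secondary key, and the secondary key leads to the primary keys under which the archive data is stored.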

According to the data archiving method provided by this embodiment of the invention, when an archiving task exists, the query sample statement corresponding to the archiving requirement of the archiving task is determined, and values are assigned to the condition items in the query sample statement according to the archiving condition information of the archiving task to form a query statement. The query statement is then executed in the database corresponding to the archiving task, and the data it returns is archived. Because the query statement used to select the data to be archived is built from the archiving requirement and the archiving condition information, the data in the database that meets a specific archiving requirement can be archived, so that data meeting specific conditions is separated from the database and storage space is freed for the database.

Further, according to the method shown in fig. 1, another embodiment of the present invention further provides a data archiving method, as shown in fig. 2, where the method mainly includes:

201. Determine a query sample statement corresponding to the archiving requirement of an archiving task, wherein different archiving requirements correspond to different query sample statements, and condition items are set in the query sample statement.

202. Assign values to the condition items in the query sample statement according to the archiving condition information of the archiving task to form a query statement.

203. Execute the query statement in the database corresponding to the archiving task.

204. Archive the data queried by the query statement.

205. Determine a check sample statement corresponding to the archiving requirement, wherein different archiving requirements correspond to different check sample statements, and check items are set in the check sample statement.

In order to ensure the quality of the archived data, a check sample statement needs to be set, and a check statement is formed from it to check the archived data. The check sample statement is described below in three respects:

First, to meet different archiving requirements, a plurality of check sample statements need to be set, so that different archiving requirements correspond to different check sample statements. Illustratively, if the archiving requirements are multi-table data aggregation archiving, single-table conditional archiving, and multi-table associated archiving, then each of the three has its own corresponding check sample statement. It should be noted that, to simplify configuration, the check sample statement may reuse the query sample statement.

Second, since different types of databases restrict the types of statements they accept, the type of the check sample statement should meet the requirements of the database. Illustratively, if the database requires SQL statements, the check sample statement is an SQL statement; if the database requires Oracle statements, the check sample statement is an Oracle statement.

Third, in order to check only the archived data satisfying a specific condition, check items are set in the check sample statement in this embodiment. Assigning values to the check items forms a check statement that queries only the data in the database that meets the specific conditions. The number and type of check items may be determined based on specific service requirements and are not specifically limited in this embodiment. The check items may include, but are not limited to, at least one of: data table ID, row ID, data field (e.g., time, name, etc.).

Based on the above three-part description of the check sample statement, one specific example is given below. The check sample statement corresponding to the archiving requirement "single-table conditional archiving" is "select {table} where {id} >= ? AND {id} <= ? and {create_time} >= ? AND {create_time} < ?". The parts in braces are the check items: table is the data table ID, id is the row ID, and create_time is the data field "time". The check sample statement is an SQL statement.

In order to rapidly complete the archival data inspection, when an archival task exists, after a query sample statement is directly selected according to the archival requirement of the archival task, a corresponding inspection sample statement is selected so as to utilize the query sample statement to archive the data meeting the specific archival requirement in the database, and then utilize the inspection sample statement to inspect the archived data in time.

206. Assigning values to the check items in the check sample statement according to the archiving condition information of the archiving task to form the check statement.

The forming process of the check statement is substantially the same as the forming process of the query statement in the step 102, and will not be described herein.
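The formation of a check statement from a check sample statement can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that the bracketed items name identifiers (table or column names) and that each `?` is later bound to a value from the archiving condition information. The function `form_statement`, the whitelist `ALLOWED_IDENTIFIERS`, and the table name `orders` are hypothetical names introduced here.

```python
import re

# Hypothetical whitelist: identifiers that may be substituted into a sample statement.
ALLOWED_IDENTIFIERS = {"orders", "id", "create_time", "update_time"}


def form_statement(sample: str, items: dict[str, str]) -> str:
    """Replace each {item} in a sample statement with a validated identifier."""

    def substitute(match: re.Match) -> str:
        name = items[match.group(1)]
        if name not in ALLOWED_IDENTIFIERS:
            raise ValueError(f"identifier not allowed: {name}")
        return name

    return re.sub(r"\{(\w+)\}", substitute, sample)


# Assumed shape of a single-table check sample statement.
CHECK_SAMPLE = (
    "select * from {table} where {id} >= ? and {id} <= ? "
    "and {create_time} >= ? and {create_time} < ?"
)

statement = form_statement(
    CHECK_SAMPLE, {"table": "orders", "id": "id", "create_time": "create_time"}
)
# Values taken from the archiving condition information are bound to the ? marks
# when the statement is executed, in the order in which they appear.
params = (1, 1000, "2021-01-01", "2021-02-01")
```

Keeping the values as bound parameters rather than splicing them into the text is a design choice of this sketch; it lets the same formed statement be reused for every segment of a task.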

207. Executing the check statement in the database corresponding to the archiving task.

The process of executing the check statement is substantially the same as the process of executing the query statement in step 103, and will not be described here.

208. Checking the archived data according to the data queried by the check statement.

The process of checking the archived data according to the data queried by the check statement comprises: comparing the data queried by the check statement with the archived data, and performing supplementary archiving for any data that exists in the data queried by the check statement but does not exist in the archived data.

How the two sets of data are compared depends on how the archived data is stored, so there are the following two comparison methods:

First, the archive data is stored only in the form of key-value pairs. After the check data is queried using the check statement, a corresponding primary key is generated for each piece of queried check data, using the same rule as that used to generate the primary keys of the archive data. Each piece of check data and its primary key are combined into a key-value pair, and the key-value pair is archived to a specified database. The check data and the archive data are compared by comparing their primary keys.

If the primary key of a piece of check data does not exist in the specified database where the archive data is located, the key-value pair corresponding to that primary key is supplemented into the specified database where the archive data is located.

If the primary key of a piece of check data exists in the specified database where the archive data is located, the data corresponding to that primary key has been archived successfully.
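The primary-key comparison described above can be sketched as follows. This is an illustration under stated assumptions: the specified database is modeled as an in-memory dict, and the key rule in `primary_key` (a hash of table name and row ID) is hypothetical. The only requirement taken from the text is that check data and archive data use the same key rule.

```python
import hashlib


def primary_key(record: dict) -> str:
    """Hypothetical key rule: SHA-256 of 'table:id'. Must match the archive's rule."""
    raw = f"{record['table']}:{record['id']}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def check_and_supplement(check_rows: list[dict], archive: dict[str, dict]) -> list[str]:
    """Supplement every check row whose primary key is missing from the archive.

    Returns the primary keys that had to be added; an empty list means every
    checked row was already archived successfully.
    """
    added = []
    for row in check_rows:
        key = primary_key(row)
        if key not in archive:
            archive[key] = row  # supplementary archiving of the missing pair
            added.append(key)
    return added


# Usage: row 1 is already archived, row 2 is missing and gets supplemented.
rows = [{"table": "orders", "id": 1}, {"table": "orders", "id": 2}]
archive = {primary_key(rows[0]): rows[0]}
missing = check_and_supplement(rows, archive)
```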

Second, the archive data is stored in the form of a secondary index. After the check data is queried using the check statement, a corresponding primary key is generated for each piece of queried check data, using the same rule as that used to generate the primary keys of the archive data. Each piece of check data and its primary key are combined into a key-value pair, and the key-value pair is archived to a specified database. A corresponding secondary key is generated for specific data in each piece of check data, the association between each secondary key and the primary key of the corresponding check data is established, and the specific data in each piece of check data and its secondary key are combined into a key-value pair. The secondary keys are generated by the same rule as that used for the archive data.

If the primary key of a piece of check data exists in the specified database where the archive data is located, it is checked whether the primary key has a corresponding secondary key; if it does, it is checked whether the secondary key exists in the specified database where the archive data is located. If the secondary key exists in the specified database, the data has been archived successfully. If the secondary key does not exist in the specified database, the secondary key is supplemented into the specified database where the archive data is located.
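The secondary-key check can be sketched in the same spirit. The data model here is an assumption: `store` maps primary keys to archived records, and `index` maps each secondary key to the set of primary keys associated with it. The key formats and the function name `supplement_secondary_keys` are hypothetical.

```python
def supplement_secondary_keys(
    check_rows: list[dict],
    store: dict[str, dict],
    index: dict[str, set[str]],
    field: str,
) -> list[str]:
    """Ensure each archived check row is reachable through its secondary key.

    Returns the secondary keys that had to be supplemented.
    """
    added = []
    for row in check_rows:
        pk = f"{row['table']}:{row['id']}"  # hypothetical primary key rule
        if pk not in store:
            continue  # a missing primary key is handled by the primary-key check
        sk = f"{field}:{row[field]}"  # hypothetical secondary key rule
        linked = index.setdefault(sk, set())
        if pk not in linked:
            linked.add(pk)  # supplement the secondary key association
            added.append(sk)
    return added


# Usage: both rows are archived, but only the first one is indexed by status.
rows = [
    {"table": "orders", "id": 1, "status": "successful"},
    {"table": "orders", "id": 2, "status": "failed"},
]
store = {"orders:1": rows[0], "orders:2": rows[1]}
index = {"status:successful": {"orders:1"}}
repaired = supplement_secondary_keys(rows, store, index, "status")
```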

209. Determining a delete sample statement corresponding to the archiving requirement, wherein different archiving requirements correspond to different delete sample statements, and delete items are arranged in the delete sample statement.

In order to clear data from the database in time and free storage space for the database, delete sample statements need to be set; a delete sample statement is used to form a delete statement that deletes data from the database. The delete sample statement is described in detail in the following three aspects:

First, to meet different archiving requirements, multiple delete sample statements need to be set, so that different archiving requirements correspond to different delete sample statements. Illustratively, if the archiving requirements are multi-table data aggregation archiving, single-table conditional archiving, and multi-table associated archiving, then each of these three requirements has its own corresponding delete sample statement.

Second, since different types of databases restrict the types of statements they accept, the type of the delete sample statement should meet the requirements of the database. Illustratively, if the database requires SQL statements, the delete sample statement is an SQL statement; if the database requires Oracle statements, the delete sample statement is an Oracle statement.

Third, in order to delete only the data that has been successfully archived, the delete sample statement in this embodiment is provided with delete items; by assigning values to the delete items, a delete statement is formed that performs the delete operation only on the data in the database that has been successfully archived. The delete items may include, but are not limited to, at least one of: data table ID, row ID, data field (e.g., time, name, etc.).

Based on the above description of the delete sample statement, one specific example is listed below: the delete sample statement corresponding to the archiving requirement "single-table conditional archive" is "delete from {table} where {id} >= ? AND {id} <= ? and {create_time} >= ? AND {create_time} < ?". The bracketed parts of the delete sample statement are the delete items: table is the data table ID, id is the row ID, and create_time is the data field "time". The delete sample statement is an SQL statement.

210. Assigning values to the delete items in the delete sample statement based on the successfully archived data to form a delete statement.

In order to delete exactly the successfully archived data, the delete items in the delete sample statement are assigned values based on the successfully archived data. The specific process is: determine the archiving condition information corresponding to the successfully archived data, and then use that archiving condition information to assign values to the delete items in the delete sample statement to form a delete statement.

The precondition for forming the delete statement is as follows: a timed poll automatically picks up the archiving tasks that have been written successfully, checked successfully, and are set to automatic deletion. It is then judged whether the preset deletion time point has been reached; if so, the delete statement is formed and the original data in the database that has already been archived is cleaned up; if not, the process exits and waits for the next poll.
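The polling precondition can be sketched as a simple filter over task records. The `ArchiveTask` fields below are hypothetical; the text only states that a task must be written successfully, checked successfully, set to automatic deletion, and past its preset deletion time point.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ArchiveTask:
    """Hypothetical task record; field names are assumptions for illustration."""
    task_id: int
    written: bool          # archive write succeeded
    checked: bool          # check step succeeded
    auto_delete: bool      # task is configured for automatic deletion
    delete_at: datetime    # preset deletion time point


def tasks_ready_for_deletion(tasks: list[ArchiveTask], now: datetime) -> list[ArchiveTask]:
    """Return the tasks for which a delete statement may be formed in this poll."""
    return [
        t for t in tasks
        if t.written and t.checked and t.auto_delete and now >= t.delete_at
    ]


# Usage: one poll at a fixed time; only task 1 meets every precondition.
now = datetime(2021, 8, 12, 3, 0)
tasks = [
    ArchiveTask(1, True, True, True, datetime(2021, 8, 12, 2, 0)),
    ArchiveTask(2, True, False, True, datetime(2021, 8, 12, 2, 0)),  # check failed
    ArchiveTask(3, True, True, True, datetime(2021, 8, 13, 2, 0)),   # not yet due
]
ready = tasks_ready_for_deletion(tasks, now)
```

Tasks that are not ready are simply left for the next poll, which mirrors the exit-and-wait branch described above.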

211. Executing the delete statement in the database corresponding to the archiving task.

After the delete statement is executed, the original data in the database that has already been archived is cleaned up, so that data in the database is cleared in time and storage space is freed for the database.

Further, the above-mentioned query sample statement, check sample statement, and delete sample statement of FIGS. 1-2 are described below in terms of three specific archival requirements.

Example 1, the archive requirement is single-table conditional archive, i.e., archive only data in a single table that satisfies a particular condition.

Archiving requirement: archive the data in the table whose status field is "successful".

Query sample statement: select * from {order} where {id} >= ? AND {id} <= ? and {create_time} >= ? AND {create_time} < ? and status = 'successful'

Check sample statement: select * from {order} where {id} >= ? AND {id} <= ? and {create_time} >= ? AND {create_time} < ? and status = 'successful'

Delete sample statement: delete from {order} where {id} >= ? AND {id} <= ? and {create_time} >= ? AND {create_time} < ? and status = 'successful'

The bracketed information is the condition items, which change according to the archiving requirement.

select * from {order} indicates that data is acquired from the order table, and where is followed by the conditions for acquiring the data. delete from {order} indicates that data is deleted from the order table, and where is followed by the conditions for deleting the data.

and {id} >= ? AND {id} <= ? indicates the start ID and the end ID, limiting the statement to a fixed ID range; each ? is replaced by a specific value once values have been assigned to the condition items.

and {create_time} >= ? AND {create_time} < ? indicates the start time and end time parameters, limiting the time range of the query. The create_time field can be replaced with another time field, such as update_time, which indicates that data is archived according to the update time dimension.

and status = 'successful' indicates that only data whose status is successful is queried.
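Example 1 can be exercised end to end with an in-memory SQLite database. This is a sketch under assumptions: the sample statements are taken to bound an ID range and a creation-time range, the archive is modeled as a dict, and the table is named `orders` because `order` is a reserved word in SQL. None of these names come from the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (id integer primary key, create_time text, status text)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        (1, "2021-01-05", "successful"),
        (2, "2021-01-20", "failed"),
        (3, "2021-02-10", "successful"),
    ],
)

# Shared condition part of the query, check, and delete statements.
where = (
    "where id >= ? and id <= ? and create_time >= ? and create_time < ? "
    "and status = 'successful'"
)
params = (1, 100, "2021-01-01", "2021-02-01")

# Query and archive (the archive is a plain dict keyed by row id in this sketch).
rows = conn.execute(f"select * from orders {where}", params).fetchall()
archive = {row[0]: row for row in rows}

# Check: every row returned by the check statement must be present in the archive.
checked = conn.execute(f"select * from orders {where}", params).fetchall()
all_archived = all(row[0] in archive for row in checked)

# Delete the source rows only after the check has passed.
if all_archived:
    conn.execute(f"delete from orders {where}", params)

remaining = [r[0] for r in conn.execute("select id from orders order by id")]
```

Only row 1 falls inside both ranges with a successful status, so it is the only row archived and removed; rows 2 and 3 stay in the source table.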

Example 2: multi-table data aggregation archive

Archiving requirement: the data has a one-to-one correspondence, i.e., one order corresponds to one order detail record. The order table and the order_detail table are associated according to the order ID, the data of the two tables is merged, and the data is archived according to the creation time dimension, where order.id corresponds to order_detail.orderId.

Query sample statement: select * from order ord left join order_detail ord_detail on ord.id = ord_detail.orderId where {ord.id} >= ? AND {ord.id} <= ? and {ord.create_time} >= ? AND {ord.create_time} < ?

Check sample statement: select * from order ord left join order_detail ord_detail on ord.id = ord_detail.orderId where {ord.id} >= ? AND {ord.id} <= ? and {ord.create_time} >= ? AND {ord.create_time} < ?

select * from order ord left join order_detail ord_detail on ord.id = ord_detail.orderId queries the data of the order table and the order_detail table, associated according to the order ID.

Delete sample statement: [1] select id from order ord left join order_detail ord_detail on ord.id = ord_detail.orderId where {ord.id} >= ? AND {ord.id} <= ? and {ord.create_time} >= ? AND {ord.create_time} < ?

[2] delete from {order} where {id} in (${primaryKeys})

[3] delete from {order_detail} where {orderId} in (${primaryKeys})

The configuration rule of the delete sample statement is: the first statement must query the set of association keys, and the following statements then clean up the other tables according to the association keys. [2] in the delete sample statement deletes the archived data of the order table according to the primary key id: delete from {order} indicates which table's data is deleted, and where {id} in (${primaryKeys}) indicates that data is deleted according to the id range. [3] in the delete sample statement deletes the archived data of the order_detail table according to the associated orderId key: delete from {order_detail} indicates which table's data is deleted, and where {orderId} in (${primaryKeys}) indicates that data is deleted according to the orderId range.
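The rule that the first statement collects association keys and the later statements consume them through `${primaryKeys}` can be sketched as follows. The helper `expand_primary_keys` is hypothetical, as is the choice to expand the placeholder into one `?` per key; the table `orders` again stands in for `order`, which is a reserved word.

```python
import sqlite3


def expand_primary_keys(sample: str, keys: list) -> tuple[str, tuple]:
    """Replace ${primaryKeys} with one bound parameter per association key."""
    if not keys:
        raise ValueError("no association keys to expand")
    placeholders = ", ".join("?" for _ in keys)
    return sample.replace("${primaryKeys}", placeholders), tuple(keys)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table orders (id integer primary key, create_time text);
    create table order_detail (orderId integer, item text);
    insert into orders values (1, '2021-01-05'), (2, '2021-03-01');
    insert into order_detail values (1, 'a'), (2, 'b');
    """
)

# Statement [1]: query the set of association keys.
keys = [
    row[0]
    for row in conn.execute(
        "select ord.id from orders ord "
        "left join order_detail ord_detail on ord.id = ord_detail.orderId "
        "where ord.id >= ? and ord.id <= ? "
        "and ord.create_time >= ? and ord.create_time < ?",
        (1, 100, "2021-01-01", "2021-02-01"),
    )
]

# Statements [2] and [3]: delete from each table by the collected keys.
for sample in (
    "delete from orders where id in (${primaryKeys})",
    "delete from order_detail where orderId in (${primaryKeys})",
):
    sql, params = expand_primary_keys(sample, keys)
    conn.execute(sql, params)

orders_left = conn.execute("select id from orders").fetchall()
details_left = conn.execute("select orderId, item from order_detail").fetchall()
```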

Example 3: multi-table data association archiving

Archiving requirement: the data has a one-to-many correspondence, i.e., one order corresponds to a plurality of payment records. When the order table data is archived, the plurality of payment records (pay_details) corresponding to the order number (id) are archived together, where order.id corresponds to order_detail.orderId.

Query sample statement: [1] select {id} from order where {id} >= ? AND {id} <= ? and {create_time} >= ? AND {create_time} < ?

[2] select * from order where {id} in (${primaryKeys})

[3] select * from pay_detail where {orderId} in (${primaryKeys})

……

Check sample statement: [1] select {id} from order where {id} >= ? AND {id} <= ? and {create_time} >= ? AND {create_time} < ?

[2] select * from {order} where {id} in (${primaryKeys})

[3] select * from {order_detail} where {orderId} in (${primaryKeys})

……

The check sample statement can query the data of all association tables of the current association key that was archived together. Its statement configuration rule is: the first statement must query the set of association keys, and the following statements then query the associated archived data from the other tables based on the association keys. [2] in the check sample statement queries data of the order table according to the primary key id: select * from {order} indicates which table is queried, and where {id} in (${primaryKeys}) indicates that order data is queried according to the range of association key ids. [3] in the check sample statement queries data of the order_detail table according to the associated orderId key: select * from {order_detail} indicates which table is queried, and where {orderId} in (${primaryKeys}) indicates that order_detail data is queried according to the range of association key orderIds.

Delete sample statement: [1] select {id} from order where {id} >= ? AND {id} <= ? and {create_time} >= ? AND {create_time} < ?

[2] delete from order where {id} in (${primaryKeys})

[3] delete from pay_detail where {orderId} in (${primaryKeys})

……

The delete sample statement can delete the data of all association tables of the current association key. Its statement configuration rule is: the first statement must query the set of association keys, and the following statements then delete the associated archived data from the other tables based on the association keys. [2] in the delete sample statement deletes data of the order table according to the primary key id: delete from {order} indicates which table's data is deleted, and where {id} in (${primaryKeys}) indicates that order data is deleted according to the range of association key ids. [3] in the delete sample statement deletes data of the order_detail table according to the associated orderId key: delete from {order_detail} indicates which table's data is deleted, and where {orderId} in (${primaryKeys}) indicates that order_detail data is deleted according to the range of association key orderIds.
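For the one-to-many case of Example 3, the query side can be sketched as a chain of three statements whose results are grouped so that an order and its payment records are archived together. The grouping into `bundles` is an assumption about what "archived together" could look like; the tables `orders` and `pay_detail` and their columns are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table orders (id integer primary key, create_time text);
    create table pay_detail (orderId integer, amount integer);
    insert into orders values (1, '2021-01-05'), (2, '2021-01-10'), (3, '2021-05-01');
    insert into pay_detail values (1, 10), (1, 20), (2, 5), (3, 7);
    """
)

# [1] Query the set of association keys within the ID and time ranges.
keys = [
    row[0]
    for row in conn.execute(
        "select id from orders where id >= ? and id <= ? "
        "and create_time >= ? and create_time < ? order by id",
        (1, 100, "2021-01-01", "2021-02-01"),
    )
]
marks = ", ".join("?" for _ in keys)

# [2] Query the order rows for those keys.
order_rows = conn.execute(
    f"select * from orders where id in ({marks}) order by id", keys
).fetchall()

# [3] Query the payment records that belong to those orders.
pay_rows = conn.execute(
    f"select * from pay_detail where orderId in ({marks}) order by orderId, amount", keys
).fetchall()

# Group each order with its payment records so they can be archived as one unit.
bundles = {row[0]: {"order": row, "pay_details": []} for row in order_rows}
for pay in pay_rows:
    bundles[pay[0]]["pay_details"].append(pay)
```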

Further, according to the above method embodiment, another embodiment of the present invention further provides a data archiving device, as shown in fig. 3, including:

a selecting unit 31, configured to determine a query sample statement corresponding to the archiving requirement of an archiving task, where different archiving requirements correspond to different query sample statements, and the query sample statement is provided with condition items that are used to limit archiving to the data meeting the corresponding conditions;

a generating unit 32, configured to assign values to the condition items in the query sample statement according to the archiving condition information of the archiving task, so as to form a query statement;

an execution unit 33, configured to execute the query statement in a database corresponding to the archiving task;

an archiving unit 34, configured to archive the data queried according to the query statement.

When an archiving task exists, the data archiving device provided by the invention determines the query sample statement corresponding to the archiving requirement of the archiving task, and assigns values to the condition items in the query sample statement according to the archiving condition information of the archiving task to form the query statement. The device then executes the query statement in the database corresponding to the archiving task and archives the data returned by the query statement. Therefore, according to the scheme provided by the embodiment of the invention, the query statement for querying the data to be archived is set according to the archiving requirement and the archiving condition information, so that the data meeting the specific archiving requirement in the database can be archived and separated from the database, freeing storage space for the database.

Alternatively, as shown in fig. 4, the generating unit 32 includes:

a splitting module 321, configured to split the archiving task into a plurality of segmentation tasks;

a determining module 322, configured to determine archive condition information corresponding to each of the segmentation tasks based on the archive condition information;

a generating module 323, configured to assign values to the condition items in the query sample statement according to the archive condition information corresponding to each segmentation task, so as to form a query statement corresponding to each segmentation task.

Optionally, as shown in fig. 4, the splitting module 321 is configured to determine a condition item for segmentation decision from the condition items of the query sample statement; extracting a value corresponding to a condition item for segment judgment from the archiving condition information; determining whether the value meets a preset value requirement; if not, splitting the archiving task into a plurality of segmentation tasks.

Optionally, as shown in fig. 4, the splitting module 321 is specifically configured to determine a condition item selected by a user as a condition item for segmentation determination.

Optionally, as shown in fig. 4, a determining module 322 is configured to determine a condition item for segment determination from condition items of the query sample statement; extracting a value corresponding to a condition item for segment judgment from the archiving condition information; and determining the archiving condition information corresponding to each segmentation task according to the extracted value and the number of the segmentation tasks.
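The splitting and determining modules can be sketched for the case where the condition item used for segmentation is an ID range. Two things are assumptions here: that the preset value requirement is an upper bound on the size of the range (`max_span`), and that segments are made as equal as possible. The functions `needs_split` and `split_range` are hypothetical names.

```python
def needs_split(start: int, end: int, max_span: int) -> bool:
    """True when the inclusive range [start, end] exceeds the preset span limit."""
    return (end - start + 1) > max_span


def split_range(start: int, end: int, segments: int) -> list[tuple[int, int]]:
    """Split the inclusive range [start, end] into contiguous inclusive sub-ranges.

    Each returned (low, high) pair is the archive condition information of one
    segmentation task; sizes differ by at most one.
    """
    if segments < 1 or end < start:
        raise ValueError("invalid range or segment count")
    total = end - start + 1
    segments = min(segments, total)  # never create empty segments
    base, extra = divmod(total, segments)
    ranges = []
    low = start
    for i in range(segments):
        size = base + (1 if i < extra else 0)
        ranges.append((low, low + size - 1))
        low += size
    return ranges


# Usage: IDs 1..10 exceed a span limit of 5, so the task is split into 3 segments.
if needs_split(1, 10, max_span=5):
    segment_conditions = split_range(1, 10, 3)
```

Each pair can then be bound to the start and end ID condition items of the same query sample statement, giving one query statement per segmentation task.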

Optionally, as shown in fig. 4, the determining module 322 is specifically configured to determine the condition item selected by the user as the condition item for segmentation determination.

Optionally, as shown in fig. 4, the archiving unit 34 is configured to set a corresponding execution time for each query statement of the segmented task; and executing each query statement in a database corresponding to the archiving task based on the execution time of each query statement. Or, the archiving unit 34 is configured to determine an execution order of the query statement of each of the segmented tasks; and executing each query statement in a database corresponding to the archiving task based on the execution sequence of each query statement.

Optionally, as shown in fig. 4, the apparatus further includes:

a checking unit 35, configured to determine a check sample statement corresponding to the archiving requirement after the archiving unit 34 performs archiving processing on the data queried by the query statement, where different archiving requirements correspond to different check sample statements, and the check sample statement is provided with check items; assign values to the check items in the check sample statement according to the archiving condition information of the archiving task to form the check statement; execute the check statement in a database corresponding to the archiving task; and check the archived data according to the data queried by the check statement.

Optionally, as shown in fig. 4, the checking unit 35 is specifically configured to compare the data queried by the check statement with the data after archiving processing; and carrying out supplementary record archiving on the data which does not exist in the data after archiving processing and exists in the data queried by the check statement.

Optionally, as shown in fig. 4, an archiving unit 34 is configured to determine archived data in the data queried by the query statement; generating a corresponding primary key for each piece of archive data; combining each archive data and its corresponding primary key into a key value pair; and archiving the key value pairs to a specified database.

Optionally, as shown in fig. 4, the archiving unit 34 is further configured to generate, after archiving the key value pair to the specified database, a corresponding secondary key for the specific data in each archived data; establishing the association between each secondary key and the corresponding primary key of the archive data; and combining the specific data in each archive data and the corresponding secondary keys thereof into key value pairs.

Optionally, as shown in fig. 4, the archiving unit 34 is further configured to, before generating a corresponding secondary key for the specific data in each piece of archived data, expose a plurality of secondary key generating policies of the archived data, where the secondary key generating policies are used to determine the specific data from the archived data to be used to generate the secondary key; and generating a corresponding secondary key for specific data in the archive data according to a secondary key generation strategy selected by a user.
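The archiving unit's key-value storage with a user-selected secondary key generation strategy can be sketched as follows. Everything named here is hypothetical: the strategy table, the key formats, and the use of JSON as the stored value are choices made for illustration only.

```python
import json

# Hypothetical strategies: each one picks the specific data used for the secondary key.
SECONDARY_KEY_STRATEGIES = {
    "by_customer": lambda rec: f"customer:{rec['customer']}",
    "by_status": lambda rec: f"status:{rec['status']}",
}


def archive_records(
    records: list[dict],
    strategy: str,
    store: dict[str, str],
    index: dict[str, list[str]],
) -> None:
    """Archive records as key-value pairs and link each secondary key to its primary key."""
    make_secondary = SECONDARY_KEY_STRATEGIES[strategy]
    for rec in records:
        pk = f"orders:{rec['id']}"  # hypothetical primary key rule
        store[pk] = json.dumps(rec, sort_keys=True)
        index.setdefault(make_secondary(rec), []).append(pk)


# Usage: the user selects the "by_customer" strategy.
store: dict[str, str] = {}
index: dict[str, list[str]] = {}
archive_records(
    [
        {"id": 1, "customer": "c1", "status": "successful"},
        {"id": 2, "customer": "c1", "status": "failed"},
    ],
    "by_customer",
    store,
    index,
)
```

A lookup by customer then goes through `index` to the primary keys and from there to the archived values, which is the two-step access a secondary index is meant to provide.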

Optionally, as shown in fig. 4, the apparatus further includes:

a deletion unit 36, configured to determine a delete sample statement corresponding to the archiving requirement after the archiving unit 34 performs archiving processing on the data queried by the query statement, where different archiving requirements correspond to different delete sample statements, and delete items are set in the delete sample statement; assign values to the delete items in the delete sample statement based on the successfully archived data to form a delete statement; and execute the delete statement in the database corresponding to the archiving task.

Optionally, as shown in fig. 4, the apparatus further includes:

and a determining unit 37, configured to determine, as the archiving condition information, data for the archiving task input by the user in the interactive interface.

In the data archiving device provided by the embodiment of the present invention, a detailed description of a method adopted in the operation process of each functional module may refer to a detailed description of a corresponding method of the method embodiments of fig. 1-2, which is not repeated herein.

Further, according to the above embodiment, another embodiment of the present invention further provides a computer readable storage medium, where the storage medium includes a stored program, and when the program runs, the device where the storage medium is located is controlled to execute the data archiving method described in fig. 1 or fig. 2.

Further, according to the above embodiment, another embodiment of the present invention further provides a storage management device, including:

a memory for storing a program;

a processor, coupled to the memory, for executing the program to perform the data archiving method described in fig. 1 or fig. 2.

In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.

It will be appreciated that the relevant features of the methods and apparatus described above may be referenced to one another. In addition, "first", "second", and the like in the above embodiments are used to distinguish the embodiments and do not represent the relative merits of the embodiments.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.

The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The required structure for a construction of such a system is apparent from the description above. In addition, the present invention is not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the present invention.

In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.

Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some or all of the components in the data archiving method and device according to embodiments of the present invention may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present invention can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order. These words may be interpreted as names.

Claims

1. A method of archiving data, the method comprising:

determining query sample sentences corresponding to the archiving requirements of an archiving task, wherein different archiving requirements correspond to different query sample sentences, and conditional items are arranged in the query sample sentences and are used for limiting archiving of data meeting corresponding conditions; the archiving requirement is any one of the following: multi-table data associated archiving, multi-table data aggregation archiving and single-table condition archiving; the corresponding query sample statement of the multi-table data associated archiving is a query sample statement for the multi-table data associated archiving; the corresponding query sample statement of the multi-table data aggregation archive is a query sample statement used for the multi-table data aggregation archive; the query sample sentences corresponding to the single-table condition archiving are the query sample sentences used for the single-table condition archiving;

splitting the archiving task into a plurality of segmentation tasks, wherein the splitting is performed when the value corresponding to the condition item for segmentation judgment does not meet the preset value requirement, and the value requirement is determined based on the performance of the database;

based on the archiving condition information of the archiving task, determining the archiving condition information corresponding to each segmentation task;

according to the values of the condition items included in the archiving condition information corresponding to each segmentation task, respectively assigning values to the condition items in the query sample statement to form a query statement corresponding to each segmentation task;

sequentially executing each query statement in a database corresponding to the archiving task according to the execution time or the execution sequence of each query statement;

and archiving the data queried according to the query statement.

2. The method of claim 1, wherein splitting the archiving task into a plurality of segmentation tasks comprises:

determining a condition item for segmentation judgment from the condition items of the query sample statement;

extracting the value corresponding to the condition item for segmentation judgment from the archiving condition information;

determining whether the value meets a preset value requirement;

if not, splitting the archiving task into a plurality of segmentation tasks.

3. The method of claim 1, wherein determining the archiving condition information corresponding to each segmentation task based on the archiving condition information of the archiving task comprises:

determining the archiving condition information corresponding to each segmentation task according to the extracted value and the number of the segmentation tasks.
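
As a hedged illustration of the splitting described in claims 2 and 3, the following sketch assumes that the condition item used for segmentation judgment is a date range and that the preset value requirement is a maximum span in days (max_days). Both assumptions, and the function name split_range, are hypothetical; the claims only require that the value requirement be determined based on database performance.

```python
import math
from datetime import date, timedelta


def split_range(start, end, max_days):
    """Split the half-open range [start, end) into segments.

    If the span already meets the requirement (span <= max_days), the task
    is not split. Otherwise the number of segmentation tasks is computed
    first, and the range is divided as evenly as possible among them.
    """
    if max_days <= 0:
        raise ValueError("max_days must be positive")
    span = (end - start).days
    if span <= max_days:
        return [(start, end)]
    count = math.ceil(span / max_days)   # number of segmentation tasks
    step = math.ceil(span / count)       # days per segment, never above max_days
    segments = []
    for i in range(count):
        seg_start = start + timedelta(days=i * step)
        seg_end = min(seg_start + timedelta(days=step), end)
        segments.append((seg_start, seg_end))
    return segments


segments = split_range(date(2021, 1, 1), date(2021, 4, 1), max_days=31)
```

Each resulting pair can then serve as the archiving condition information of one segmentation task, so that no single query statement scans more than max_days of data.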

4. A method according to claim 2 or 3, wherein determining the condition item for segmentation judgment from the condition items of the query sample statement comprises:

determining the condition item selected by the user as the condition item for segmentation judgment.

5. A method according to any one of claims 1-3, wherein sequentially executing the query statement in the database corresponding to the archiving task according to the execution time or execution order of each query statement comprises:

setting a corresponding execution time for the query statement of each segmentation task;

executing each query statement in a database corresponding to the archiving task based on the execution time of each query statement;

or,

executing the query statement in a database corresponding to the archiving task, including:

determining the execution order of the query statement of each segmentation task;

and executing each query statement in a database corresponding to the archiving task based on the execution order of each query statement.
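
One possible reading of the two alternatives in claim 5 is that each query statement carries an ordering key, either an execution time or a sequence number, and the statements are run one at a time in ascending key order. The sketch below assumes this reading; the function execute_sequentially and the tuple layout (order_key, sql, params) are hypothetical, and the sketch does not wait for a scheduled time to arrive.

```python
import sqlite3


def execute_sequentially(conn, tasks):
    """Run query statements one at a time in ascending order of their key.

    tasks is an iterable of (order_key, sql, params). The key may be an
    execution time or a plain sequence number, as long as all keys are
    mutually comparable.
    """
    results = []
    for _key, sql, params in sorted(tasks, key=lambda task: task[0]):
        results.append(conn.execute(sql, params).fetchall())
    return results
```

Running the segments one after another, rather than concurrently, keeps the load that archiving places on the database bounded by the size of a single segment.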

6. The method of claim 1, wherein after archiving the data queried according to the query statement, the method further comprises:

determining a check sample statement corresponding to the archiving requirement, wherein different archiving requirements correspond to different check sample statements, and check items are set in the check sample statement;

assigning values to the check items in the check sample statement according to the archiving condition information of the archiving task to form a check statement;

executing the check statement in a database corresponding to the archiving task;

and checking the data after the archiving processing according to the data queried by the check statement.

7. The method of claim 6, wherein checking the data after the archiving processing according to the data queried by the check statement comprises:

comparing the data queried by the check statement with the data after archiving processing;

and performing supplementary archiving of the data that exists in the data queried by the check statement but does not exist in the data after the archiving processing.
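
The comparison and supplementary archiving of claim 7 might be sketched as below. The sketch assumes, purely for illustration, that each row's first column is a unique identifier and that the archived data is held in a dictionary keyed by that identifier; the function name supplement_missing is hypothetical.

```python
def supplement_missing(checked_rows, archived):
    """Archive rows returned by the check statement that are absent from
    the archived data, and return the identifiers that were added.

    checked_rows: rows returned by the check statement (first column = id).
    archived: dict mapping id -> row, updated in place.
    """
    added = []
    for row in checked_rows:
        row_id = row[0]
        if row_id not in archived:
            archived[row_id] = row
            added.append(row_id)
    return added
```

For example, if the check statement returns rows with identifiers 1, 2, and 3 while only 1 and 3 were archived, the call adds row 2 to the archive and returns a list containing 2.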

8. The method according to any one of claims 1-3, 6-7, wherein archiving the data queried according to the query statement comprises:

determining archive data in the data queried by the query statement;

generating a corresponding primary key for each piece of archive data;

combining each archive data and its corresponding primary key into a key value pair;

and archiving the key value pairs to a specified database.
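
A minimal sketch of the key-value combination in claim 8 follows. It assumes, hypothetically, that the primary key is built from the table name and the row's id column, that the value is the row serialized as JSON, and that a plain dictionary stands in for the specified database; the claims do not fix any of these choices.

```python
import json


def to_key_value_pairs(table, columns, rows):
    """Generate a primary key for each archived row and pair it with the
    serialized row. Assumes one of the columns is named 'id'."""
    pairs = {}
    for row in rows:
        record = dict(zip(columns, row))
        primary_key = f"{table}:{record['id']}"
        pairs[primary_key] = json.dumps(record, sort_keys=True)
    return pairs


pairs = to_key_value_pairs("orders", ["id", "amount"], [(1, 10.0), (2, 20.0)])
```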

9. The method of claim 8, wherein after archiving the key-value pairs to a specified database, the method further comprises:

generating a corresponding secondary key for specific data in the archive data, wherein the specific data is data with a secondary index generation requirement;

establishing the association between each secondary key and the corresponding primary key of the archive data;

and combining the specific data in each archive data and the corresponding secondary keys thereof into key value pairs.
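
One way to picture the secondary keys of claim 9 is as a secondary index that maps a key derived from a chosen field back to the primary keys of the archive data. The sketch below takes this interpretation as an assumption; the key layout "table:field:value" and the function build_secondary_index are hypothetical.

```python
def build_secondary_index(table, field, records_by_primary_key):
    """Build secondary-key -> primary-key associations for one field.

    records_by_primary_key maps each primary key to the archived record
    (a dict). The chosen field plays the role of the specific data that
    requires a secondary index.
    """
    index = {}
    for primary_key, record in records_by_primary_key.items():
        secondary_key = f"{table}:{field}:{record[field]}"
        index.setdefault(secondary_key, []).append(primary_key)
    return index
```

With such an index, archived orders could be looked up by a non-primary attribute such as a customer identifier without scanning every key-value pair.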

10. The method of claim 9, wherein prior to generating a corresponding secondary key for particular data in each of the archived data, the method further comprises:

displaying a plurality of secondary key generation policies for the archive data, wherein a secondary key generation policy is used to determine, from the archive data, the specific data for which secondary keys are generated;

and generating a corresponding secondary key for the specific data in the archive data according to the secondary key generation policy selected by the user.

11. The method of any of claims 1-3, 6-7, wherein after archiving the data queried according to the query statement, the method further comprises:

determining a deletion sample statement corresponding to the archiving requirement, wherein different archiving requirements correspond to different deletion sample statements, and deletion items are arranged in the deletion sample statement;

assigning a value to the deletion item in the deletion sample statement based on the successfully archived data to form a deletion statement;

and executing the deletion statement in the database corresponding to the archiving task.
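
The deletion step of claim 11 might be sketched as follows, assuming a hypothetical deletion sample statement whose deletion item is a list of identifiers of successfully archived rows. The table name orders, the constant DELETE_SAMPLE, and the function delete_archived are illustrative assumptions only.

```python
import sqlite3

# Hypothetical deletion sample statement; {placeholders} marks the deletion item.
DELETE_SAMPLE = "DELETE FROM orders WHERE id IN ({placeholders})"


def delete_archived(conn, archived_ids):
    """Form a deletion statement from the ids of successfully archived rows,
    execute it, and return the number of rows deleted."""
    ids = list(archived_ids)
    if not ids:
        return 0  # nothing was archived, so nothing is deleted
    placeholders = ", ".join("?" for _ in ids)
    cursor = conn.execute(DELETE_SAMPLE.format(placeholders=placeholders), ids)
    conn.commit()
    return cursor.rowcount
```

Because only identifiers of successfully archived data are bound into the statement, rows whose archiving failed remain in the source database.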

12. The method according to any one of claims 1-3, 6-7, wherein the method further comprises:

and determining the data input by the user in the interactive interface for the archiving task as the archiving condition information.

13. A data archiving apparatus, the apparatus comprising:

a selecting unit, configured to determine a query sample statement corresponding to the archiving requirement of an archiving task, wherein different archiving requirements correspond to different query sample statements, and condition items are set in the query sample statement and are used to restrict archiving to data meeting the corresponding conditions; the archiving requirement is any one of the following: multi-table data associated archiving, multi-table data aggregation archiving, and single-table condition archiving; the query sample statement corresponding to multi-table data associated archiving is a query sample statement used for multi-table data associated archiving; the query sample statement corresponding to multi-table data aggregation archiving is a query sample statement used for multi-table data aggregation archiving; the query sample statement corresponding to single-table condition archiving is a query sample statement used for single-table condition archiving;

a generation unit, comprising: a splitting module, configured to split the archiving task into a plurality of segmentation tasks, wherein the splitting is performed when the value corresponding to the condition item used for segmentation judgment does not meet a preset value requirement, and the value requirement is determined based on the performance of the database; a determining module, configured to determine the archiving condition information corresponding to each segmentation task based on the archiving condition information of the archiving task; and a generating module, configured to assign values to the condition items in the query sample statement according to the values of the condition items included in the archiving condition information corresponding to each segmentation task, to form a query statement corresponding to each segmentation task;

an execution unit, configured to sequentially execute each query statement in the database corresponding to the archiving task according to the execution time or the execution order of each query statement;

14. A computer-readable storage medium, characterized in that the storage medium comprises a stored program, wherein the program, when run, controls a device in which the storage medium is located to perform the data archiving method of any one of claims 1 to 12.

15. A storage management device, the storage management device comprising:

a memory for storing a program;

a processor coupled to the memory for running the program to perform the data archiving method of any one of claims 1 to 12.