WO2019169764A1 - Electronic device, data chain archiving method and system, and storage medium - Google Patents

Info

Publication number
WO2019169764A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
partition
temporary table
database
partitions
Application number
PCT/CN2018/089458
Inventor
余明浩 (Yu Minghao)
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Application filed by Ping An Technology (Shenzhen) Co., Ltd.
Priority to SG11202002842YA
Priority to US16/632,552 (US11106649B2)
Publication of WO2019169764A1 (Chinese)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
        • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
            • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
                • G06F16/278 Data partitioning, e.g. horizontal or vertical partitioning
            • G06F16/22 Indexing; Data structures therefor; Storage structures
                • G06F16/221 Column-oriented storage; Management thereof
                • G06F16/2282 Tablespace storage structures; Management thereof
            • G06F16/24 Querying
                • G06F16/245 Query processing
        • G06F16/10 File systems; File servers
            • G06F16/11 File system administration, e.g. details of archiving or snapshots
                • G06F16/113 Details of archiving

Definitions

  • the present application relates to the field of communications technologies, and in particular, to an electronic device, a data chain archiving method, system, and storage medium.
  • the purpose of the present application is to provide an electronic device, a data chain archiving method and system, and a storage medium, aiming to archive data in a database quickly and efficiently.
  • the present application provides an electronic device including a memory and a processor coupled to the memory, the memory storing a processing system operable on the processor, the processing system implementing the following steps when executed by the processor:
  • a first writing step: dividing the original data table in the database into a preset first number of partitions, and using a preset second number of parallel threads to write, according to a predetermined first processing manner, the completed data in each partition into a pre-established first temporary table, the first quantity being greater than or equal to the second quantity;
  • a second writing step: after the completed data in every partition have all been written into the first temporary table, using a single thread to obtain the data in the first temporary table that have an upper/lower-level (parent-child) association, and writing the associated data into a pre-established second temporary table;
  • the present application further provides a data chain archiving method, and the data chain archiving method includes:
  • the first quantity is greater than or equal to the second quantity
  • the data in each partition in the second temporary table is archived into a pre-established archive table in the database, and the data in the original data table corresponding to the data in the archive table is deleted.
  • the third quantity is greater than or equal to the fourth quantity.
  • the present application further provides a data chain archiving system, the data chain archiving system comprising:
  • a first writing module configured to divide the original data table in the database into a preset first number of partitions and to use a preset second number of parallel threads to write, according to a predetermined first processing manner, the completed data in each partition into a pre-established first temporary table, the first quantity being greater than or equal to the second quantity;
  • a second writing module configured to, after the completed data in each partition have all been written into the first temporary table, use a single thread to acquire the data in the first temporary table that have an upper/lower-level association, and write the associated data into a pre-established second temporary table;
  • an archiving module configured to, after the single thread has finished processing all the data in the first temporary table, divide the second temporary table into a preset third number of partitions, use a preset fourth number of parallel threads to archive, according to a predetermined second processing manner, the data in each partition of the second temporary table into a pre-established archive table in the database, and delete the data in the original data table corresponding to the data written into the archive table, the third quantity being greater than or equal to the fourth quantity.
  • the present application also provides a computer readable storage medium having stored thereon a processing system that, when executed by a processor, implements the steps of the method described above.
  • the beneficial effects of the present application are as follows: when data in the database are archived, chained multi-threading is used in three steps, each of which must be completed before the next can start. In the first step, the original data table is partitioned and multiple parallel threads write data into a temporary table; in the second step, a single thread filters out the eligible data; and in the third step, the temporary table is partitioned and multiple parallel threads archive the data into the archive table of the database. Data in the database are thereby archived quickly and efficiently, keeping the performance of the database system stable.
  • FIG. 1 is a schematic diagram of a hardware architecture of an embodiment of an electronic device according to the present application.
  • FIG. 2 is a diagram showing an example embodiment of parent node data in the original data table of a database;
  • FIG. 3 is a diagram showing an example of an embodiment of child node data under the parent node data shown in FIG. 2;
  • FIG. 4 is a schematic flowchart of a method for data chain archiving according to an embodiment of the present application.
  • the electronic device 1 is an apparatus capable of automatically performing numerical calculation and/or information processing in accordance with instructions set or stored in advance.
  • the electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
  • the electronic device 1 may include, but is not limited to, a memory 11, a processor 12 and a network interface 13 communicably connected to each other through a system bus, and the memory 11 stores a processing system operable on the processor 12. It should be noted that FIG. 1 only shows the electronic device 1 with components 11-13; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
  • the memory 11 includes a memory and at least one type of readable storage medium.
  • the memory provides a cache for the operation of the electronic device 1;
  • the readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read only memory (ROM), an electrically erasable programmable read only memory (EEPROM), a programmable read only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, or the like.
  • in some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1; in other embodiments, the non-volatile storage medium may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a smart memory card (SMC), a Secure Digital (SD) card or a flash card equipped on the electronic device 1.
  • the readable storage medium of the memory 11 is generally used to store the operating system and various types of application software installed in the electronic device 1, such as the program code of the processing system in an embodiment of the present application. In addition, the memory 11 can also be used to temporarily store various types of data that have been output or are to be output.
  • the processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments.
  • the processor 12 is typically used to control the overall operation of the electronic device 1, such as performing control and processing related to data interaction or communication with other devices.
  • the processor 12 is configured to run the program code stored in the memory 11 or to process data, for example to run the processing system.
  • the network interface 13 may comprise a wireless network interface or a wired network interface, which is typically used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the processing system is stored in the memory 11 and includes at least one computer readable instruction stored in the memory 11, the at least one computer readable instruction being executable by the processor 12 to implement the methods of various embodiments of the present application;
  • the at least one computer readable instruction can be classified into different logic modules depending on the functions implemented by its various parts.
  • the electronic device of the present application hosts a large database, such as an Oracle database or a MySQL database, and a large amount of data is stored in the original data table of the database; the data is archived so that no large backlog of data accumulates in the original data table.
  • the data is archived by a chained multi-threading method.
  • "chained" refers to a multi-step operation in which each step must be completed before the next can be performed: in the first step, the original data table is partitioned and parallel threads quickly write the completed data into a temporary table;
  • in the second step, a single thread further filters out the data that conform to a predetermined rule, and in the third step, the temporary table is partitioned and multiple parallel threads quickly archive the data. The present application can thus archive data quickly and efficiently and ensure the stability of the data in the database system.
  • a first writing step: dividing the original data table in the database into a preset first number of partitions, and using a preset second number of parallel threads to write, according to a predetermined first processing manner, the completed data in each partition into a pre-established first temporary table, the first quantity being greater than or equal to the second quantity;
  • for example, the original data table can be divided into 32 partitions, and 8 parallel threads, each processing the data in 4 partitions, write the completed data in each partition into the first temporary table.
  • alternatively, the original data table can be divided into another number of partitions and the partitioned data processed by another number of parallel threads, as long as the number of partitions is greater than or equal to the number of parallel threads.
  • the predetermined first processing manner is that each parallel thread processes the data of at least one partition, so that no thread is idle and the completed data in each partition are written into the pre-established first temporary table more quickly.
  • preferably, each parallel thread processes the same number of partitions.
  • each piece of data in the original data table includes a process id, a processing-completed identifier, a superior process id, a partition number and other fields, as shown in Table 1 below:
  • the data in each partition are fetched one by one and the processing-completed identifier of each piece of data is read. If the identifier is the completion identifier, the data is completed data and is written into the first temporary table; if it is an incomplete identifier, the data is retained in the original data table. In Table 1 above, for example, the processing-completed flag of the data whose process id is "parentProcessId" is "Yes", so that data is completed data and is written into the first temporary table.
  • the step of dividing the original data table in the database into a preset first number of partitions includes: establishing a partition function; determining the partition boundary values and the partition basis column based on the first quantity; inputting the determined boundary values and partition basis column into the partition function; and partitioning the original data table based on the partition function into which they have been input.
  • the partition function is an independent object in the database that maps the rows of the original data table to a set of partitions. When the partition function is created, the boundary values of the data partitions and the partition basis column are specified; the partition function is then executed to partition the original data table.
  • a second writing step: after the completed data in every partition have all been written into the first temporary table, using a single thread to obtain the data in the first temporary table that have an upper/lower-level (parent-child) association, and writing the associated data into a pre-established second temporary table;
  • the write operation of this step is performed only after the first writing step has finished.
  • much of the data in the database is associated within a business process, that is, it has an upper/lower-level relationship, and such data loses its specific meaning if it is split up. With multi-threaded processing, associated data might be assigned to different threads and thus split up. Because the data must not be split, in the write operation of this step a single thread preprocesses the data in the first temporary table and writes the data having upper/lower-level associations into the pre-established second temporary table, ensuring that associated data are not split up.
  • the second writing step specifically includes:
  • using a single thread, obtaining the process id of each piece of data in the first temporary table and analyzing whether each piece of data has parent node data and child node data associated with that parent node data; if parent node data and associated child node data exist, obtaining the processing-completed identifier of the parent node data; and if that identifier is the completion identifier, writing the data, as data having an upper/lower-level association, into the second temporary table.
  • the parent node data are, for example, the nodes shown in FIG. 2: survey, entering adjustment condition check, adjustment, quality check and payment.
  • taking "entering adjustment condition check" as the parent node data, its corresponding child node data are "entering adjustment check rule" and "waiting for the adjustment condition to be satisfied".
  • each parent node is assigned a parent node processing ID (parentProcessId), and each child node under that parent node is assigned a child node processing ID (sonProcessId); the data of a parent node and of its corresponding child nodes are data with an upper/lower-level relationship.
  • field legend: parentProcessId = parent node processing ID; childProcessId = child node processing ID.
  • the two pieces of data in Table 2 are both completed data and entered the first temporary table in the first writing step. The second writing step analyzes these two pieces of data: it first obtains the data through the process ID and finds parent node data and child node data, that is, data having an upper/lower-level association. After the parent node data is confirmed to be completed data, the child node data is necessarily completed data as well, and the two pieces of data are written into the second temporary table.
  • of the two pieces of data in Table 3, the child node data is completed data and entered the first temporary table in the first writing step, whereas the parent node data is incomplete data and could not be written into the first temporary table.
  • the second writing step analyzes this piece of child node data and finds that it has an upper/lower-level association, but its parent node data, being unfinished, does not exist in the first temporary table; therefore, this child node data cannot be written into the second temporary table.
  • for example, the second temporary table may be divided into 32 partitions, and 8 parallel threads, each processing the data in 4 partitions, archive the data in each partition of the second temporary table into a pre-established archive table in the database.
  • the second temporary table may be divided into other numbers of partitions, and the partitioned data may be processed by other numbers of parallel threads as long as the number of partitions is greater than or equal to the number of parallel threads.
  • the predetermined second processing manner may be the same as or different from the predetermined first processing manner; in it, each parallel thread processes the data of at least one partition, so that no thread is idle and the data in each partition are archived into the archive table more quickly. Preferably, each parallel thread processes the same number of partitions.
  • following the same principle as the partitioning of the original data table described above, dividing the second temporary table into a preset third number of partitions includes: establishing a second partition function; determining the partition boundary values and the partition basis column based on the third quantity; inputting the determined boundary values and partition basis column into the second partition function; and partitioning the second temporary table based on the second partition function into which they have been input.
  • after the data in each partition of the second temporary table have been archived into the archive table in the database, the data in the archive table are classified and saved.
  • when executed by the processor, the processing system further implements the following steps: when an instruction to query data in the database is received, querying the original data table in the database according to the query instruction; if the corresponding data are not found in the original data table, querying the archive table in the database.
  • in the traditional scheme, data in the database are archived into an archive library, that is, into another database; because the original data table and the archive table are then in different databases, the archive table cannot be queried directly after the original data table has been queried.
  • in this embodiment, the archive table is established in the same database as the original data table. Since both tables are in the same database, a query on the database can cover both the original data table and the archive table.
  • when data are queried, the original data table in the database is queried first; if the corresponding data are not found there, the archive table in the database is queried. This supports queries on the archive table, improves its availability and improves service performance.
  • the present application archives data in a database by chained multi-threading in three steps, each of which must be completed before the next can be performed. In the first step, the original data table is partitioned and multiple parallel threads write data into a temporary table; in the second step, a single thread filters out the eligible data; and in the third step, the temporary table is partitioned and multiple parallel threads archive the data into the archive table of the database. Data in the database are thereby archived quickly and efficiently, keeping the performance of the database system stable.
  • FIG. 4 is a schematic flowchart of a data chain archiving method according to an embodiment of the present application.
  • the data chain archiving method includes the following steps:
  • Step S1: dividing the original data table in the database into a preset first number of partitions, and using a preset second number of parallel threads to write, according to a predetermined first processing manner, the completed data in each partition into a pre-established first temporary table, the first quantity being greater than or equal to the second quantity;
  • for example, the original data table can be divided into 32 partitions, and 8 parallel threads, each processing the data in 4 partitions, write the completed data in each partition into the first temporary table.
  • alternatively, the original data table can be divided into another number of partitions and the partitioned data processed by another number of parallel threads, as long as the number of partitions is greater than or equal to the number of parallel threads.
  • the predetermined first processing manner is that each parallel thread processes the data of at least one partition, so that no thread is idle and the completed data in each partition are written into the pre-established first temporary table more quickly.
  • preferably, each parallel thread processes the same number of partitions.
  • the data in each partition are fetched one by one and the processing-completed identifier of each piece of data is read. If the identifier is the completion identifier, the data is completed data and is written into the first temporary table; if it is an incomplete identifier, the data is retained in the original data table. In Table 1 above, for example, the processing-completed flag of the data whose process id is "parentProcessId" is "Yes", so that data is completed data and is written into the first temporary table.
  • the step of dividing the original data table in the database into a preset first number of partitions includes: establishing a partition function; determining the partition boundary values and the partition basis column based on the first quantity; inputting the determined boundary values and partition basis column into the partition function; and partitioning the original data table based on the partition function into which they have been input.
  • the partition function is an independent object in the database that maps the rows of the original data table to a set of partitions. When the partition function is created, the boundary values of the data partitions and the partition basis column are specified; the partition function is then executed to partition the original data table.
  • Step S2: after the completed data in every partition have all been written into the first temporary table, using a single thread to obtain the data in the first temporary table that have an upper/lower-level association, and writing the associated data into a pre-established second temporary table;
  • the write operation of this step is performed only after step S1 has finished.
  • much of the data in the database is associated within a business process, that is, it has an upper/lower-level relationship, and such data loses its specific meaning if it is split up. With multi-threaded processing, associated data might be assigned to different threads and thus split up. Because the data must not be split, in the write operation of this step a single thread preprocesses the data in the first temporary table and writes the data having upper/lower-level associations into the pre-established second temporary table, ensuring that associated data are not split up.
  • step S2 specifically includes:
  • using a single thread, obtaining the process id of each piece of data in the first temporary table and analyzing whether each piece of data has parent node data and child node data associated with that parent node data; if parent node data and associated child node data exist, obtaining the processing-completed identifier of the parent node data; and if that identifier is the completion identifier, writing the data, as data having an upper/lower-level association, into the second temporary table.
  • the parent node data are, for example, the nodes shown in FIG. 2: survey, entering adjustment condition check, adjustment, quality check and payment.
  • taking "entering adjustment condition check" as the parent node data, its corresponding child node data are "entering adjustment check rule" and "waiting for the adjustment condition to be satisfied".
  • each parent node is assigned a parent node processing ID (parentProcessId), and each child node under that parent node is assigned a child node processing ID (sonProcessId); the data of a parent node and of its corresponding child nodes are data with an upper/lower-level relationship.
  • field legend: parentProcessId = parent node processing ID; childProcessId = child node processing ID.
  • the data generated in a scenario is as shown in Table 2 above.
  • the two pieces of data in Table 2 are both completed data and entered the first temporary table in step S1. Step S2 analyzes these two pieces of data: it first obtains the data through the process ID and finds parent node data and child node data, that is, data having an upper/lower-level association. After the parent node data is confirmed to be completed data, the child node data is necessarily completed data as well, and the two pieces of data are written into the second temporary table.
  • the data generated in another scenario is as shown in Table 3 above.
  • of the two pieces of data in Table 3, the child node data is completed data and entered the first temporary table in step S1, whereas the parent node data is incomplete data and could not be written into the first temporary table.
  • step S2 analyzes this piece of child node data and finds that it has an upper/lower-level association, but its parent node data, being unfinished, does not exist in the first temporary table; therefore, this child node data cannot be written into the second temporary table.
  • Step S3: after the single thread has finished processing all the data in the first temporary table, dividing the second temporary table into a preset third number of partitions, using a preset fourth number of parallel threads to archive, according to a predetermined second processing manner, the data in each partition of the second temporary table into a pre-established archive table in the database, and deleting the data in the original data table corresponding to the data written into the archive table, the third quantity being greater than or equal to the fourth quantity.
  • for example, the second temporary table may be divided into 32 partitions, and 8 parallel threads, each processing the data in 4 partitions, archive the data in each partition of the second temporary table into a pre-established archive table in the database.
  • the second temporary table may be divided into other numbers of partitions, and the partitioned data may be processed by other numbers of parallel threads as long as the number of partitions is greater than or equal to the number of parallel threads.
  • the predetermined second processing manner may be the same as or different from the predetermined first processing manner; in it, each parallel thread processes the data of at least one partition, so that no thread is idle and the data in each partition are archived into the archive table more quickly. Preferably, each parallel thread processes the same number of partitions.
  • following the same principle as the partitioning of the original data table described above, dividing the second temporary table into a preset third number of partitions includes: establishing a second partition function; determining the partition boundary values and the partition basis column based on the third quantity; inputting the determined boundary values and partition basis column into the second partition function; and partitioning the second temporary table based on the second partition function into which they have been input.
  • after the data in each partition of the second temporary table have been archived into the archive table in the database, the data in the archive table are classified and saved.
  • the processing system when executed by the processor, the following steps are further implemented: when receiving an instruction for querying data in the database, querying the data in the database according to the query instruction If the corresponding data is not queried in the original data table, the data in the archive table in the database is queried.
  • the traditional scheme is to archive the data in the database into the archive library, that is, in another database, the original data table and the archive table are in different databases, so the archive table cannot be directly queried after querying the original data table.
  • the archive table of this embodiment is established in the same database as the original data table. Since both the original data table and the archive table are in the database, when the data in the database is queried, the original data table and the archive table can be queried.
  • the data first query the original data table in the database, if the corresponding data is not queried in the original data table, then query the archive table in the database, so that it supports the archive table query, improve the availability of the archive table, improve the service performance.
  • The present application archives data in a database in a chained, multi-threaded manner consisting of three operations, each of which starts only after the previous one has completed.
    1. The original data table is partitioned, and multiple parallel threads write data into a temporary table.
    2. A single thread filters out the qualifying data.
    3. The temporary table is partitioned, and multiple parallel threads archive the data into the archive table of the same database.
    Data in the database can thus be archived quickly and efficiently, keeping database system performance stable.
  • The present application also provides a computer-readable storage medium storing a processing system. When executed by a processor, the processing system implements the steps of the data chain archiving method described above.
  • The methods of the foregoing embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. In many cases the former is the better implementation.
  • On this understanding, the technical solution of the present application may be embodied as a software product, in essence or in the part that contributes to the prior art. The software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc). It includes several instructions that cause a terminal device (for example a mobile phone, a computer, a server, an air conditioner, or a network device) to perform the methods described in the embodiments of the present application.
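The rule above requires the partition count to be at least the thread count, and prefers that each thread take the same number of partitions. A small helper can illustrate this. It is a sketch under assumptions: `assign_partitions` is a hypothetical name, and giving each thread a contiguous block of partition numbers is only one possible scheme. The text does not say how partitions are mapped to threads.

```python
def assign_partitions(num_partitions: int, num_threads: int) -> list[list[int]]:
    """Split partition numbers 0..num_partitions-1 into num_threads contiguous groups.

    Hypothetical helper: the only rules taken from the text are that the number of
    partitions must be >= the number of threads (so no thread is idle) and that an
    equal share per thread is preferred.
    """
    if num_threads <= 0:
        raise ValueError("need at least one thread")
    if num_partitions < num_threads:
        raise ValueError("partition count must be >= thread count, or some threads would be idle")
    base, extra = divmod(num_partitions, num_threads)
    groups, start = [], 0
    for t in range(num_threads):
        size = base + (1 if t < extra else 0)  # spread any remainder over the first threads
        groups.append(list(range(start, start + size)))
        start += size
    return groups
```

With the 32-partition, 8-thread configuration, `assign_partitions(32, 8)` gives each thread four partitions: [0, 1, 2, 3] for the first thread and [28, 29, 30, 31] for the last. When the counts do not divide evenly, the remainder is spread one partition at a time over the first threads, so thread loads differ by at most one partition.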

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

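This application relates to an electronic apparatus, a data chain archiving method, a system, and a storage medium. The method includes three steps:
1. An original data table in a database is divided into a preset first number of partitions. A preset second number of parallel threads write the completed data in each partition into a first temporary table, according to a predetermined first processing manner.
2. After that writing is finished, a single thread obtains the data in the first temporary table that has superior-subordinate (parent-child) associations and writes it into a second temporary table.
3. After all the data in the first temporary table has been processed, the second temporary table is divided into a preset third number of partitions. A preset fourth number of parallel threads archive the data in each partition of the second temporary table into an archive table pre-established in the database, according to a predetermined second processing manner. The data that has been written into the archive table is then deleted from the original data table.
The application can archive data in a database quickly and efficiently.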

Description

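Electronic apparatus, data chain archiving method, system and storage medium
Priority Claim
This application claims priority under the Paris Convention to Chinese patent application No. CN201810183496.9, filed on March 6, 2018 and entitled "Electronic apparatus, data chain archiving method and storage medium". The entire content of that Chinese application is incorporated herein by reference.
Technical Field
The present application relates to the field of communication technologies, and in particular to an electronic apparatus, a data chain archiving method, a system, and a storage medium.
Background
In enterprises and institutions, the backlog of data in databases keeps growing as business volume grows. A large backlog degrades database system performance and directly affects the user experience and the timeliness of user operations. A reasonable and efficient solution for archiving data is therefore needed. The conventional solution moves data directly to an archive database. Because the data volume is very large and the operation spans two databases, archiving is slow and cannot keep database system performance stable.
Summary
The purpose of the present application is to provide an electronic apparatus, a data chain archiving method, a system, and a storage medium that archive data in a database quickly and efficiently.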
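To achieve the above purpose, the present application provides an electronic apparatus comprising a memory and a processor connected to the memory. The memory stores a processing system executable on the processor, and the processing system implements the following steps when executed by the processor:
a first writing step: dividing an original data table in a database into a preset first number of partitions, and writing, by a preset second number of parallel threads according to a predetermined first processing manner, the completed data in each partition into a pre-established first temporary table, the first number being greater than or equal to the second number;
a second writing step: after the completed data in all partitions has been written into the first temporary table, obtaining, by a single thread, the data having superior-subordinate associations in the first temporary table, and writing that data into a pre-established second temporary table;
an archiving step: after the single thread has finished processing all the data in the first temporary table, dividing the second temporary table into a preset third number of partitions; archiving, by a preset fourth number of parallel threads according to a predetermined second processing manner, the data in each partition of the second temporary table into an archive table pre-established in the database; and deleting from the original data table the data that has been written into the archive table, the third number being greater than or equal to the fourth number.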
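To achieve the above purpose, the present application further provides a data chain archiving method, comprising:
S1: dividing an original data table in a database into a preset first number of partitions, and writing, by a preset second number of parallel threads according to a predetermined first processing manner, the completed data in each partition into a pre-established first temporary table, the first number being greater than or equal to the second number;
S2: after the completed data in all partitions has been written into the first temporary table, obtaining, by a single thread, the data having superior-subordinate associations in the first temporary table, and writing that data into a pre-established second temporary table;
S3: after the single thread has finished processing all the data in the first temporary table, dividing the second temporary table into a preset third number of partitions; archiving, by a preset fourth number of parallel threads according to a predetermined second processing manner, the data in each partition of the second temporary table into an archive table pre-established in the database; and deleting from the original data table the data that has been written into the archive table, the third number being greater than or equal to the fourth number.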
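To achieve the above purpose, the present application further provides a data chain archiving system, comprising:
a first writing module, configured to divide an original data table in a database into a preset first number of partitions, and to write, by a preset second number of parallel threads according to a predetermined first processing manner, the completed data in each partition into a pre-established first temporary table, the first number being greater than or equal to the second number;
a second writing module, configured to, after the completed data in all partitions has been written into the first temporary table, obtain by a single thread the data having superior-subordinate associations in the first temporary table, and write that data into a pre-established second temporary table;
an archiving module, configured to, after the single thread has finished processing all the data in the first temporary table, divide the second temporary table into a preset third number of partitions; archive, by a preset fourth number of parallel threads according to a predetermined second processing manner, the data in each partition of the second temporary table into an archive table pre-established in the database; and delete from the original data table the data that has been written into the archive table, the third number being greater than or equal to the fourth number.
The present application further provides a computer-readable storage medium storing a processing system. When executed by a processor, the processing system implements the steps of the above method.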
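Beneficial effects: the present application archives data in a database in a chained, multi-threaded manner consisting of three operations. Each operation starts only after the previous one has completed.
1. The original data table is partitioned, and multiple parallel threads write data into a temporary table.
2. A single thread filters out the qualifying data.
3. The temporary table is partitioned, and multiple parallel threads archive the data into the archive table of the same database.
Data in the database can thus be archived quickly and efficiently, keeping database system performance stable.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the hardware architecture of an embodiment of the electronic apparatus of the present application;
FIG. 2 is an example diagram of an embodiment of parent node data in the original data table of the database;
FIG. 3 is an example diagram of an embodiment of child node data under the parent node data shown in FIG. 2;
FIG. 4 is a schematic flowchart of an embodiment of the data chain archiving method of the present application.
Detailed Description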
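To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here only explain the present application and are not intended to limit it. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the protection scope of the present application.
Descriptions involving "first", "second", and the like in this application are for descriptive purposes only. They do not indicate or imply relative importance, and they do not implicitly indicate the number of the technical features concerned. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. The technical solutions of the various embodiments may be combined with one another, provided that a person of ordinary skill in the art can realize the combination. A combination that is contradictory or cannot be realized shall be considered not to exist and is outside the protection scope claimed by this application.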
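FIG. 1 is a schematic diagram of the hardware structure of an embodiment of the electronic apparatus of the present application. The electronic apparatus 1 is a device that automatically performs numerical computation and/or information processing according to preset or stored instructions. It may be a computer, a single network server, a server group composed of multiple network servers, or a cloud of many hosts or network servers based on cloud computing. Cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
In this embodiment, the electronic apparatus 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13, communicatively connected to each other via a system bus. The memory 11 stores a processing system executable on the processor 12. FIG. 1 shows only the electronic apparatus 1 with components 11-13. Not all of the illustrated components are required, and more or fewer components may be implemented instead.
The memory 11 includes internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the electronic apparatus 1. The readable storage medium may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the readable storage medium may be an internal storage unit of the electronic apparatus 1, such as its hard disk. In other embodiments, it may be an external storage device of the electronic apparatus 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic apparatus 1. In this embodiment, the readable storage medium of the memory 11 generally stores the operating system and the application software installed on the electronic apparatus 1, such as the program code of the processing system in an embodiment of the present application. The memory 11 may also temporarily store data that has been output or is to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 12 generally controls the overall operation of the electronic apparatus 1, for example the control and processing related to data exchange or communication with other devices. In this embodiment, the processor 12 runs the program code stored in the memory 11 or processes data, for example by running the processing system.
The network interface 13 may include a wireless or wired network interface. It is generally used to establish communication connections between the electronic apparatus 1 and other electronic devices.
The processing system is stored in the memory 11 and includes at least one computer-readable instruction stored there. The processor 12 executes these instructions to implement the methods of the embodiments of the present application. The instructions can be divided into different logical modules according to the functions their parts implement.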
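A large database, such as an Oracle database or a MySQL database, is installed on the electronic apparatus of the present application. The original data table of the database holds a large amount of data, which is archived so that the table does not accumulate a large backlog. Archiving uses a chained, multi-threaded approach. "Chained" means that archiving consists of several operations and each operation starts only after the previous one has completed:
1. The original data table is partitioned, and multiple parallel threads quickly write the completed data into a temporary table.
2. A single thread further filters out the data that meets predetermined rules, so that related data is not broken up.
3. The temporary table is partitioned, and multiple parallel threads quickly archive the data.
The present application archives data quickly and efficiently and keeps the data in the database system stable.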
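The chained structure, in which each operation starts only after the previous one has finished, can be sketched with Python's `concurrent.futures`. The function `run_chain` and its parameters are hypothetical. The step bodies are passed in as plain callables, so the sketch shows only the ordering and threading model, not any database access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def run_chain(
    step1_tasks: Sequence[Callable[[], None]],
    step2: Callable[[], None],
    step3_tasks: Sequence[Callable[[], None]],
    workers1: int,
    workers3: int,
) -> None:
    """Run the three archiving operations strictly one after another (the 'chain')."""
    # Operation 1: fan out over a thread pool; list(...) waits for every task
    # and re-raises the first exception a task raised.
    with ThreadPoolExecutor(max_workers=workers1) as pool:
        list(pool.map(lambda task: task(), step1_tasks))
    # Operation 2: runs on the calling thread only, after all of operation 1 is done.
    step2()
    # Operation 3: fan out again, only after operation 2 has returned.
    with ThreadPoolExecutor(max_workers=workers3) as pool:
        list(pool.map(lambda task: task(), step3_tasks))
```

Leaving each `with ThreadPoolExecutor(...)` block waits for all submitted tasks, which acts as the barrier between operations. Operation 2 runs on a single thread, so related records are handled in one place.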
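In an embodiment, the processing system implements the following steps when executed by the processor 12:
First writing step: dividing an original data table in a database into a preset first number of partitions, and writing, by a preset second number of parallel threads according to a predetermined first processing manner, the completed data in each partition into a pre-established first temporary table, the first number being greater than or equal to the second number;
In a specific embodiment, the original data table may be divided into 32 partitions. Eight parallel threads, each processing the data in 4 partitions, write the completed data in each partition into the first temporary table. In other embodiments, the original data table may be divided into another number of partitions and processed by another number of parallel threads, as long as the number of partitions is greater than or equal to the number of parallel threads. Under the predetermined first processing manner, each parallel thread processes the data of at least one partition, so no thread is idle. To write the completed data into the pre-established first temporary table faster, each parallel thread preferably processes the same number of partitions.
Each record in the original data table includes the corresponding process id, a flag indicating whether processing is completed, the parent process id, the partition number, other fields, and so on, as shown in Table 1 below: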
Figure PCTCN2018089458-appb-000001
Table 1
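Each parallel thread fetches the records of the original data table one by one, in order, and reads each record's completion flag. If the flag is the completed flag, the whole record is written into the first temporary table. If the flag is the not-completed flag, the record stays in the original data table. In Table 1 above, the record whose process id is "parentProcessId" has its completion flag set to "Yes". It is therefore completed data and is written into the first temporary table.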
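The first writing step can be illustrated with a minimal in-memory sketch. It assumes the original data table is represented as a dict from partition number to lists of record dicts, and that thread groups come from an assignment such as 32 partitions over 8 threads. The function name `first_write_step` and the record keys (`process_id`, `completed`, `parent_id`, `partition_no`) are hypothetical stand-ins for the fields of Table 1.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def first_write_step(partitions, thread_groups):
    """Copy completed records from every partition into a first temporary table.

    partitions: dict partition_no -> list of record dicts (stand-in for the original table).
    thread_groups: list of partition-number lists, one list per worker thread.
    """
    temp1 = []
    lock = threading.Lock()  # temp1 is shared by all worker threads

    def worker(partition_nos):
        for p in partition_nos:
            for record in partitions.get(p, []):  # fetch records in order, one by one
                if record["completed"]:  # completed flag: copy the whole record
                    with lock:
                        temp1.append(dict(record))
                # not completed: the record simply stays in the original table

    with ThreadPoolExecutor(max_workers=len(thread_groups)) as pool:
        list(pool.map(worker, thread_groups))
    return temp1
```

Completed records are copied, not moved. They stay in the original data table until the archiving step deletes them.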
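In a preferred embodiment, dividing the original data table in the database into the preset first number of partitions includes the following. A partition function is created, and the partition boundary values and the partitioning column are determined based on the first number. The determined boundary values and partitioning column are input into the partition function, which is then used to partition the original data table. A partition function is an independent object in the database that maps the rows of the original data table to a set of partitions. The boundary values and the partitioning column are specified when the partition function is created, and the function is then executed to partition the original data table.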
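A partition function can be pictured as a set of boundary values plus a lookup that maps a value of the partitioning column to a partition number. This sketch makes two assumptions the text does not state: the partitioning column is an integer split into equal-width ranges, and a value equal to a boundary goes to the partition on its right (as with RANGE RIGHT in SQL Server). The text only says that the boundary values are derived from the first number. `make_partition_function` is a hypothetical name.

```python
from bisect import bisect_right


def make_partition_function(lo: int, hi: int, n: int):
    """Return (boundaries, partition_of) for n equal-width ranges over [lo, hi].

    Assumes an integer partitioning column; a value equal to a boundary belongs
    to the partition on the boundary's right.
    """
    if n < 1 or hi < lo:
        raise ValueError("need n >= 1 and hi >= lo")
    span = hi - lo + 1
    boundaries = [lo + span * i // n for i in range(1, n)]  # n - 1 boundary values

    def partition_of(value: int) -> int:
        # Number of boundaries <= value, i.e. a partition index in 0..n-1.
        return bisect_right(boundaries, value)

    return boundaries, partition_of
```

For values 0 to 99 and four partitions, the boundaries are 25, 50 and 75. In general, n partitions need n - 1 boundary values.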
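Second writing step: after the completed data in all partitions has been written into the first temporary table, obtaining, by a single thread, the data having superior-subordinate associations in the first temporary table, and writing that data into a pre-established second temporary table;
This write operation starts only after the completed data in all partitions has been written into the first temporary table, that is, after the previous write operation has completed. Much of the data in the database is related through the process flow, i.e., it has superior-subordinate associations, and such data loses its meaning if it is processed separately. With multi-threaded processing, related records might be assigned to different threads and broken up. Because this data must not be broken up, this step preprocesses the data in the first temporary table with a single thread and writes the associated data into the pre-established second temporary table. This keeps related data together.
In an embodiment, the second writing step specifically includes the following. A single thread obtains the process id of each record in the first temporary table and analyzes whether the record has parent node data and child node data associated with that parent. If both exist, the processing flag of the parent node data is obtained. If that flag is the completed flag, the record is written into the second temporary table as data having a superior-subordinate association.
In a specific example, FIG. 2 shows parent node data such as survey, entry-to-adjustment condition check, claim adjustment, quality inspection, and payment disbursement. FIG. 3 takes "entry-to-adjustment condition check" as the parent node data. The corresponding child node data are "entry-to-adjustment check rules" and "waiting for the entry-to-adjustment condition to be met".
Throughout the process data, each parent node is assigned a parent node process ID (parentProcessId), and each child node under a parent node is assigned a child node process ID (sonProcessId). The data under a parent node and the data under its child nodes have a superior-subordinate association.
Table 2 below shows the data generated in one scenario: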
Process ID | Completed | Parent process ID | Partition No. | Other fields
parentProcessId | Yes | (none) | 1 | …
sonProcessId | Yes | parentProcessId | 1 | …
Table 2
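Both records in Table 2 are completed data, so the first writing step puts both into the first temporary table. The second writing step then analyzes the two records. It first obtains each record by its process ID and finds that there is both parent node data and child node data, i.e., data with a superior-subordinate association. Once the parent node data is confirmed as completed, its child node data is necessarily completed as well, and both records are written into the second temporary table.
Table 3 below shows the data generated in another scenario: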
Process ID | Completed | Parent process ID | Partition No. | Other fields
parentProcessId | No | (none) | 1 | …
sonProcessId | Yes | parentProcessId | 1 | …
Table 3
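In Table 3, the child node record is completed data, so the first writing step puts it into the first temporary table. The parent node record is not completed and cannot be written there. The second writing step analyzes the child node record and finds that it has a superior-subordinate association. Its parent node data, however, is not in the first temporary table, because the parent is not completed. The child node record therefore cannot be written into the second temporary table.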
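The single-threaded check of the second writing step can be applied to the scenarios of Tables 2 and 3. This is a sketch with hypothetical record keys (`process_id`, `completed`, `parent_id`) and a hypothetical function name. A parent that is absent from the first temporary table is treated as not completed, because only completed records reach that table in the first writing step.

```python
def second_write_step(temp1):
    """Single-threaded pass that selects parent/child records whose parent is completed.

    temp1: list of record dicts from the first temporary table.
    Records with no parent/child relation inside temp1 are not selected here.
    """
    by_id = {r["process_id"]: r for r in temp1}
    has_children = {r["parent_id"] for r in temp1 if r["parent_id"] is not None}
    temp2 = []
    for r in temp1:  # one thread, records processed in order
        if r["parent_id"] is not None:  # child record: look up its parent in temp1
            parent = by_id.get(r["parent_id"])
        elif r["process_id"] in has_children:  # parent record with children in temp1
            parent = r
        else:
            continue
        if parent is not None and parent["completed"]:
            temp2.append(r)
    return temp2
```

For the Table 2 records, both the parent and the child are selected. For Table 3, only the child reached the first temporary table. Its parent cannot be found there, so nothing is written into the second temporary table.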
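Archiving step: after the single thread has finished processing all the data in the first temporary table, dividing the second temporary table into a preset third number of partitions; archiving, by a preset fourth number of parallel threads according to a predetermined second processing manner, the data in each partition of the second temporary table into an archive table pre-established in the database; and deleting from the original data table the data that has been written into the archive table, the third number being greater than or equal to the fourth number.
This archiving operation starts only after the single thread has finished processing all the data in the first temporary table, that is, after the previous write operation has completed.
In a specific embodiment, the second temporary table may be divided into 32 partitions. Eight parallel threads, each processing the data in 4 partitions, archive the data in each partition of the second temporary table into the pre-established archive table in the database. In other embodiments, the second temporary table may be divided into another number of partitions and processed by another number of parallel threads, as long as the number of partitions is greater than or equal to the number of parallel threads. The predetermined second processing manner may be the same as or different from the predetermined first processing manner. Under it, each parallel thread processes the data of at least one partition, so no thread is idle. To archive the data in each partition into the archive table faster, each parallel thread preferably processes the same number of partitions.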
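Here is a minimal sketch of the archiving step with parallel threads. It assumes the original data table and the archive table are modelled as dicts keyed by process id, and that a lock serializes the shared writes. `archive_step` and its parameters are hypothetical. In a real database, each thread would typically use its own connection and rely on the database's own locking.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def archive_step(temp2_partitions, thread_groups, original, archive):
    """Move every record of the second temporary table into the archive table.

    temp2_partitions: dict partition_no -> list of record dicts (key 'process_id').
    thread_groups: list of partition-number lists, one list per worker thread.
    original / archive: dicts process_id -> record, standing in for two tables
    that live in the same database.
    """
    lock = threading.Lock()

    def worker(partition_nos):
        for p in partition_nos:
            for record in temp2_partitions.get(p, []):
                with lock:  # write to the archive, then delete the same row from the original
                    archive[record["process_id"]] = dict(record)
                    original.pop(record["process_id"], None)

    with ThreadPoolExecutor(max_workers=len(thread_groups)) as pool:
        list(pool.map(worker, thread_groups))
```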
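In a preferred embodiment, dividing the second temporary table into the preset third number of partitions follows the same principle as partitioning the original data table. A second partition function is created, and the partition boundary values and the partitioning column are determined based on the third number. The determined boundary values and partitioning column are input into the second partition function, which is then used to partition the second temporary table.
In a preferred embodiment, the data in the archive table may further be stored by category. This is done after the data in each partition of the second temporary table has been archived into the archive table, so that later queries can find the required data quickly.
In a preferred embodiment, the processing system further implements the following step when executed by the processor. On receiving an instruction to query data in the database, it queries the original data table in the database according to that instruction. If the corresponding data is not found in the original data table, it queries the data in the archive table in the database.
The conventional solution archives data into an archive database, i.e., a separate database. The original data table and the archive table are therefore in different databases, and the archive table cannot be queried directly after the original data table. In this embodiment, the archive table is created in the same database as the original data table, so a query against the database can cover both tables. The original data table is queried first. If the corresponding data is not found there, the archive table in the same database is queried. Archive-table queries are thus supported, which improves the availability of the archive table and the service performance.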
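Because both tables live in the same database, the query fallback needs only one connection. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the production database. The table names `original_table` and `archive_table`, the two columns, and the function name are hypothetical.

```python
import sqlite3


def query_record(conn: sqlite3.Connection, process_id: str):
    """Look up a record in the original table first, then fall back to the archive table."""
    row = conn.execute(
        "SELECT process_id, completed FROM original_table WHERE process_id = ?",
        (process_id,),
    ).fetchone()
    if row is None:  # not found in the original table: query the archive table
        row = conn.execute(
            "SELECT process_id, completed FROM archive_table WHERE process_id = ?",
            (process_id,),
        ).fetchone()
    return row  # None if the record is in neither table
```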
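Compared with the prior art, the present application archives data in a database in a chained, multi-threaded manner consisting of three operations. Each operation starts only after the previous one has completed.
1. The original data table is partitioned, and multiple parallel threads write data into a temporary table.
2. A single thread filters out the qualifying data.
3. The temporary table is partitioned, and multiple parallel threads archive the data into the archive table of the same database.
Data in the database can thus be archived quickly and efficiently, keeping database system performance stable.
FIG. 4 is a schematic flowchart of an embodiment of the data chain archiving method of the present application. The method includes the following steps:
Step S1: dividing an original data table in a database into a preset first number of partitions, and writing, by a preset second number of parallel threads according to a predetermined first processing manner, the completed data in each partition into a pre-established first temporary table, the first number being greater than or equal to the second number;
In a specific embodiment, the original data table may be divided into 32 partitions. Eight parallel threads, each processing the data in 4 partitions, write the completed data in each partition into the first temporary table. In other embodiments, the original data table may be divided into another number of partitions and processed by another number of parallel threads, as long as the number of partitions is greater than or equal to the number of parallel threads. Under the predetermined first processing manner, each parallel thread processes the data of at least one partition, so no thread is idle. To write the completed data into the pre-established first temporary table faster, each parallel thread preferably processes the same number of partitions.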
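The statement an S1 worker would issue for its own group of partitions can be sketched in SQL through Python's `sqlite3`. SQLite has no native table partitioning, so a `partition_no` column stands in for the partitions. The schema and the `copy_completed` name are assumptions.

```python
import sqlite3


def copy_completed(conn: sqlite3.Connection, partition_nos) -> None:
    """Copy completed rows of the given partitions into temp_table_1 (hypothetical schema)."""
    placeholders = ",".join("?" * len(partition_nos))  # e.g. "?,?,?,?" for four partitions
    conn.execute(
        "INSERT INTO temp_table_1 SELECT * FROM original_table "
        f"WHERE completed = 1 AND partition_no IN ({placeholders})",
        list(partition_nos),  # partition numbers are bound as parameters, not spliced in
    )
```

In a database that supports concurrent writers, each worker would hold its own connection and call this once for its group, for example partitions 0 to 3 for the first of 8 threads.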
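Each record in the original data table includes the corresponding process id, completion flag, parent process id, partition number, other fields, and so on, as shown in Table 1 above; the details are not repeated here.
Each parallel thread fetches the records of the original data table one by one, in order, and reads each record's completion flag. If the flag is the completed flag, the whole record is written into the first temporary table. If the flag is the not-completed flag, the record stays in the original data table. In Table 1 above, the record whose process id is "parentProcessId" has its completion flag set to "Yes". It is therefore completed data and is written into the first temporary table.
In a preferred embodiment, dividing the original data table in the database into the preset first number of partitions includes the following. A partition function is created, and the partition boundary values and the partitioning column are determined based on the first number. The determined boundary values and partitioning column are input into the partition function, which is then used to partition the original data table. A partition function is an independent object in the database that maps the rows of the original data table to a set of partitions. The boundary values and the partitioning column are specified when the partition function is created, and the function is then executed to partition the original data table.
Step S2: after the completed data in all partitions has been written into the first temporary table, obtaining, by a single thread, the data having superior-subordinate associations in the first temporary table, and writing that data into a pre-established second temporary table;
This write operation starts only after the completed data in all partitions has been written into the first temporary table, that is, after the previous write operation has completed. Much of the data in the database is related through the process flow, i.e., it has superior-subordinate associations, and such data loses its meaning if it is processed separately. With multi-threaded processing, related records might be assigned to different threads and broken up. Because this data must not be broken up, this step preprocesses the data in the first temporary table with a single thread and writes the associated data into the pre-established second temporary table. This keeps related data together.
In an embodiment, step S2 specifically includes the following. A single thread obtains the process id of each record in the first temporary table and analyzes whether the record has parent node data and child node data associated with that parent. If both exist, the processing flag of the parent node data is obtained. If that flag is the completed flag, the record is written into the second temporary table as data having a superior-subordinate association.
In a specific example, FIG. 2 shows parent node data such as survey, entry-to-adjustment condition check, claim adjustment, quality inspection, and payment disbursement. FIG. 3 takes "entry-to-adjustment condition check" as the parent node data. The corresponding child node data are "entry-to-adjustment check rules" and "waiting for the entry-to-adjustment condition to be met".
Throughout the process data, each parent node is assigned a parent node process ID (parentProcessId), and each child node under a parent node is assigned a child node process ID (sonProcessId). The data under a parent node and the data under its child nodes have a superior-subordinate association.
Table 2 above shows the data generated in one scenario. Both records in Table 2 are completed data, so step S1 puts both into the first temporary table. Step S2 then analyzes the two records. It first obtains each record by its process ID and finds that there is both parent node data and child node data, i.e., data with a superior-subordinate association. Once the parent node data is confirmed as completed, its child node data is necessarily completed as well, and both records are written into the second temporary table.
Table 3 above shows the data generated in another scenario. The child node record in Table 3 is completed data, so step S1 puts it into the first temporary table. The parent node record is not completed and cannot be written there. Step S2 analyzes the child node record and finds that it has a superior-subordinate association. Its parent node data, however, is not in the first temporary table, because the parent is not completed. The child node record therefore cannot be written into the second temporary table.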
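The S2 selection can also be expressed as one set-based statement executed on a single connection, instead of a row-by-row loop. This sketch assumes a hypothetical schema shared by both temporary tables, and `select_related` is a hypothetical name. It selects children whose parent is present and completed in the first temporary table, plus completed parents that have at least one child there.

```python
import sqlite3


def select_related(conn: sqlite3.Connection) -> None:
    """Write parent/child-related rows with a completed parent into temp_table_2."""
    conn.execute(
        """
        INSERT INTO temp_table_2
        -- children whose parent is in temp_table_1 and completed
        SELECT c.* FROM temp_table_1 AS c
        JOIN temp_table_1 AS p ON p.process_id = c.parent_id
        WHERE p.completed = 1
        UNION
        -- completed parents with at least one child in temp_table_1
        SELECT p.* FROM temp_table_1 AS p
        WHERE p.completed = 1
          AND EXISTS (SELECT 1 FROM temp_table_1 AS c WHERE c.parent_id = p.process_id)
        """
    )
```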
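Step S3: after the single thread has finished processing all the data in the first temporary table, dividing the second temporary table into a preset third number of partitions; archiving, by a preset fourth number of parallel threads according to a predetermined second processing manner, the data in each partition of the second temporary table into an archive table pre-established in the database; and deleting from the original data table the data that has been written into the archive table, the third number being greater than or equal to the fourth number.
This archiving operation starts only after the single thread has finished processing all the data in the first temporary table, that is, after the previous write operation has completed.
In a specific embodiment, the second temporary table may be divided into 32 partitions. Eight parallel threads, each processing the data in 4 partitions, archive the data in each partition of the second temporary table into the pre-established archive table in the database. In other embodiments, the second temporary table may be divided into another number of partitions and processed by another number of parallel threads, as long as the number of partitions is greater than or equal to the number of parallel threads. The predetermined second processing manner may be the same as or different from the predetermined first processing manner. Under it, each parallel thread processes the data of at least one partition, so no thread is idle. To archive the data in each partition into the archive table faster, each parallel thread preferably processes the same number of partitions.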
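Writing a partition into the archive table and deleting the same rows from the original data table can be done in one transaction. The sketch below uses `sqlite3`, where using the connection as a context manager commits both statements or rolls both back. The table names and the `archive_partition` helper are hypothetical.

```python
import sqlite3


def archive_partition(conn: sqlite3.Connection, partition_no: int) -> None:
    """Archive one partition of temp_table_2 atomically (hypothetical schema)."""
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute(
            "INSERT INTO archive_table SELECT * FROM temp_table_2 WHERE partition_no = ?",
            (partition_no,),
        )
        conn.execute(
            "DELETE FROM original_table WHERE process_id IN "
            "(SELECT process_id FROM temp_table_2 WHERE partition_no = ?)",
            (partition_no,),
        )
```

Each of the fourth number of threads would call this for every partition in its group. A failure in one partition then leaves that partition's rows in the original data table and out of the archive table.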
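In a preferred embodiment, dividing the second temporary table into the preset third number of partitions follows the same principle as partitioning the original data table. A second partition function is created, and the partition boundary values and the partitioning column are determined based on the third number. The determined boundary values and partitioning column are input into the second partition function, which is then used to partition the second temporary table.
In a preferred embodiment, the data in the archive table may further be stored by category. This is done after the data in each partition of the second temporary table has been archived into the archive table, so that later queries can find the required data quickly.
In a preferred embodiment, the processing system further implements the following step when executed by the processor. On receiving an instruction to query data in the database, it queries the original data table in the database according to that instruction. If the corresponding data is not found in the original data table, it queries the data in the archive table in the database.
The conventional solution archives data into an archive database, i.e., a separate database. The original data table and the archive table are therefore in different databases, and the archive table cannot be queried directly after the original data table. In this embodiment, the archive table is created in the same database as the original data table, so a query against the database can cover both tables. The original data table is queried first. If the corresponding data is not found there, the archive table in the same database is queried. Archive-table queries are thus supported, which improves the availability of the archive table and the service performance.
Compared with the prior art, the present application archives data in a database in a chained, multi-threaded manner consisting of three operations. Each operation starts only after the previous one has completed.
1. The original data table is partitioned, and multiple parallel threads write data into a temporary table.
2. A single thread filters out the qualifying data.
3. The temporary table is partitioned, and multiple parallel threads archive the data into the archive table of the same database.
Data in the database can thus be archived quickly and efficiently, keeping database system performance stable.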
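The present application further provides a computer-readable storage medium storing a processing system. When executed by a processor, the processing system implements the steps of the data chain archiving method described above.
The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.
From the above description, a person skilled in the art will understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. In many cases the former is the better implementation. On this understanding, the technical solution of the present application may be embodied as a software product, in essence or in the part that contributes to the prior art. The software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc). It includes several instructions that cause a terminal device (for example a mobile phone, a computer, a server, an air conditioner, or a network device) to perform the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not limit its patent scope. Any equivalent structural or process transformation made using the specification and drawings of the present application, and any direct or indirect application in other related technical fields, likewise falls within the patent protection scope of the present application.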

Claims (20)

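  1. An electronic apparatus, characterized in that the electronic apparatus comprises a memory and a processor connected to the memory, the memory storing a processing system executable on the processor, and the processing system, when executed by the processor, implements the following steps:
    a first writing step: dividing an original data table in a database into a preset first number of partitions, and writing, by a preset second number of parallel threads according to a predetermined first processing manner, the completed data in each partition into a pre-established first temporary table, the first number being greater than or equal to the second number;
    a second writing step: after the completed data in all partitions has been written into the first temporary table, obtaining, by a single thread, the data having superior-subordinate associations in the first temporary table, and writing the data having superior-subordinate associations into a pre-established second temporary table;
    an archiving step: after the single thread has finished processing all the data in the first temporary table, dividing the second temporary table into a preset third number of partitions; archiving, by a preset fourth number of parallel threads according to a predetermined second processing manner, the data in each partition of the second temporary table into an archive table pre-established in the database; and deleting from the original data table the data that has been written into the archive table, the third number being greater than or equal to the fourth number.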
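  2. The electronic apparatus according to claim 1, wherein the processing system, when executed by the processor, further implements the following step:
    upon receiving an instruction to query data in the database, querying the original data table in the database according to the query instruction, and if the corresponding data is not found in the original data table, querying the data in the archive table in the database.
  3. The electronic apparatus according to claim 1 or 2, wherein the second writing step specifically comprises:
    obtaining, by a single thread, the process id of each record in the first temporary table, and analyzing whether each record has parent node data and child node data associated with the parent node data;
    if parent node data and child node data associated with the parent node data exist, obtaining the processing flag of the parent node data;
    if the processing flag of the parent node data is a completed flag, writing the record into the second temporary table as data having a superior-subordinate association.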
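  4. The electronic apparatus according to claim 1 or 2, wherein the step of dividing the original data table in the database into the preset first number of partitions comprises:
    creating a first partition function, determining boundary values of the partitions and a partitioning column based on the first number, inputting the determined boundary values and partitioning column into the first partition function, and partitioning the original data table with the first partition function into which the boundary values and partitioning column have been input;
    the step of dividing the second temporary table into the preset third number of partitions comprises:
    creating a second partition function, determining boundary values of the partitions and a partitioning column based on the third number, inputting the determined boundary values and partitioning column into the second partition function, and partitioning the second temporary table with the second partition function into which the boundary values and partitioning column have been input.
  5. The electronic apparatus according to claim 1 or 2, wherein the processing system, when executed by the processor, further implements the following step: storing the data in the archive table by category.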
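  6. A data chain archiving method, characterized in that the data chain archiving method comprises:
    S1, dividing an original data table in a database into a preset first number of partitions, and writing, by a preset second number of parallel threads according to a predetermined first processing manner, the completed data in each partition into a pre-established first temporary table, the first number being greater than or equal to the second number;
    S2, after the completed data in all partitions has been written into the first temporary table, obtaining, by a single thread, the data having superior-subordinate associations in the first temporary table, and writing the data having superior-subordinate associations into a pre-established second temporary table;
    S3, after the single thread has finished processing all the data in the first temporary table, dividing the second temporary table into a preset third number of partitions; archiving, by a preset fourth number of parallel threads according to a predetermined second processing manner, the data in each partition of the second temporary table into an archive table pre-established in the database; and deleting from the original data table the data that has been written into the archive table, the third number being greater than or equal to the fourth number.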
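  7. The data chain archiving method according to claim 6, further comprising, after step S3:
    upon receiving an instruction to query data in the database, querying the original data table in the database according to the query instruction, and if the corresponding data is not found in the original data table, querying the data in the archive table in the database.
  8. The data chain archiving method according to claim 6 or 7, wherein step S2 specifically comprises:
    obtaining, by a single thread, the process id of each record in the first temporary table, and analyzing whether each record has parent node data and child node data associated with the parent node data;
    if parent node data and child node data associated with the parent node data exist, obtaining the processing flag of the parent node data;
    if the processing flag of the parent node data is a completed flag, writing the record into the second temporary table as data having a superior-subordinate association.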
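  9. The data chain archiving method according to claim 6 or 7, wherein the step of dividing the original data table in the database into the preset first number of partitions comprises:
    creating a first partition function, determining boundary values of the partitions and a partitioning column based on the first number, inputting the determined boundary values and partitioning column into the first partition function, and partitioning the original data table with the first partition function into which the boundary values and partitioning column have been input;
    the step of dividing the second temporary table into the preset third number of partitions comprises:
    creating a second partition function, determining boundary values of the partitions and a partitioning column based on the third number, inputting the determined boundary values and partitioning column into the second partition function, and partitioning the second temporary table with the second partition function into which the boundary values and partitioning column have been input.
  10. The data chain archiving method according to claim 6 or 7, further comprising: storing the data in the archive table by category.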
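  11. A data chain archiving system, characterized in that the data chain archiving system comprises:
    a first writing module, configured to divide an original data table in a database into a preset first number of partitions, and to write, by a preset second number of parallel threads according to a predetermined first processing manner, the completed data in each partition into a pre-established first temporary table, the first number being greater than or equal to the second number;
    a second writing module, configured to, after the completed data in all partitions has been written into the first temporary table, obtain by a single thread the data having superior-subordinate associations in the first temporary table, and write the data having superior-subordinate associations into a pre-established second temporary table;
    an archiving module, configured to, after the single thread has finished processing all the data in the first temporary table, divide the second temporary table into a preset third number of partitions; archive, by a preset fourth number of parallel threads according to a predetermined second processing manner, the data in each partition of the second temporary table into an archive table pre-established in the database; and delete from the original data table the data that has been written into the archive table, the third number being greater than or equal to the fourth number.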
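  12. The data chain archiving system according to claim 11, further comprising a query module configured to, upon receiving an instruction to query data in the database, query the original data table in the database according to the query instruction, and if the corresponding data is not found in the original data table, query the data in the archive table in the database.
  13. The data chain archiving system according to claim 11 or 12, wherein the second writing module is specifically configured to: obtain, by a single thread, the process id of each record in the first temporary table, and analyze whether each record has parent node data and child node data associated with the parent node data; if parent node data and child node data associated with the parent node data exist, obtain the processing flag of the parent node data; and if the processing flag of the parent node data is a completed flag, write the record into the second temporary table as data having a superior-subordinate association.
  14. The data chain archiving system according to claim 11 or 12, wherein the first writing module is specifically configured to: create a first partition function, determine boundary values of the partitions and a partitioning column based on the first number, input the determined boundary values and partitioning column into the first partition function, and partition the original data table with the first partition function into which the boundary values and partitioning column have been input;
    the archiving module is specifically configured to: create a second partition function, determine boundary values of the partitions and a partitioning column based on the third number, input the determined boundary values and partitioning column into the second partition function, and partition the second temporary table with the second partition function into which the boundary values and partitioning column have been input.
  15. The data chain archiving system according to claim 11 or 12, further comprising a classification module configured to store the data in the archive table by category.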
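  16. A computer-readable storage medium, characterized in that a processing system is stored on the computer-readable storage medium, and the processing system, when executed by a processor, implements the following steps:
    a first writing step: dividing an original data table in a database into a preset first number of partitions, and writing, by a preset second number of parallel threads according to a predetermined first processing manner, the completed data in each partition into a pre-established first temporary table, the first number being greater than or equal to the second number;
    a second writing step: after the completed data in all partitions has been written into the first temporary table, obtaining, by a single thread, the data having superior-subordinate associations in the first temporary table, and writing the data having superior-subordinate associations into a pre-established second temporary table;
    an archiving step: after the single thread has finished processing all the data in the first temporary table, dividing the second temporary table into a preset third number of partitions; archiving, by a preset fourth number of parallel threads according to a predetermined second processing manner, the data in each partition of the second temporary table into an archive table pre-established in the database; and deleting from the original data table the data that has been written into the archive table, the third number being greater than or equal to the fourth number.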
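  17. The computer-readable storage medium according to claim 16, wherein the processing system, when executed by the processor, further implements the following step:
    upon receiving an instruction to query data in the database, querying the original data table in the database according to the query instruction, and if the corresponding data is not found in the original data table, querying the data in the archive table in the database.
  18. The computer-readable storage medium according to claim 16 or 17, wherein the second writing step specifically comprises:
    obtaining, by a single thread, the process id of each record in the first temporary table, and analyzing whether each record has parent node data and child node data associated with the parent node data;
    if parent node data and child node data associated with the parent node data exist, obtaining the processing flag of the parent node data;
    if the processing flag of the parent node data is a completed flag, writing the record into the second temporary table as data having a superior-subordinate association.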
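  19. The computer-readable storage medium according to claim 16 or 17, wherein the step of dividing the original data table in the database into the preset first number of partitions comprises:
    creating a first partition function, determining boundary values of the partitions and a partitioning column based on the first number, inputting the determined boundary values and partitioning column into the first partition function, and partitioning the original data table with the first partition function into which the boundary values and partitioning column have been input;
    the step of dividing the second temporary table into the preset third number of partitions comprises:
    creating a second partition function, determining boundary values of the partitions and a partitioning column based on the third number, inputting the determined boundary values and partitioning column into the second partition function, and partitioning the second temporary table with the second partition function into which the boundary values and partitioning column have been input.
  20. The computer-readable storage medium according to claim 16 or 17, wherein the processing system, when executed by the processor, further implements the following step: storing the data in the archive table by category.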
PCT/CN2018/089458 2018-03-06 2018-06-01 Electronic apparatus, data chain archiving method, system and storage medium WO2019169764A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
SG11202002842YA SG11202002842YA (en) 2018-03-06 2018-06-01 Electronic apparatus, data chain archiving method, system and storage medium
US16/632,552 US11106649B2 (en) 2018-03-06 2018-06-01 Electronic apparatus, data chain archiving method, system and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810183496.9A CN108470045B (zh) 2018-03-06 2018-03-06 Electronic apparatus, data chain archiving method and storage medium
CN201810183496.9 2018-03-06

Publications (1)

Publication Number Publication Date
WO2019169764A1 true WO2019169764A1 (zh) 2019-09-12

Family

ID=63265117

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/089458 WO2019169764A1 (zh) 2018-03-06 2018-06-01 Electronic apparatus, data chain archiving method, system and storage medium

Country Status (4)

Country Link
US (1) US11106649B2 (zh)
CN (1) CN108470045B (zh)
SG (1) SG11202002842YA (zh)
WO (1) WO2019169764A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110908978A (zh) * 2019-11-06 2020-03-24 中盈优创资讯科技有限公司 Database data structure verification method and apparatus

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
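CN110457255B (zh) * 2019-07-05 2023-11-21 中国平安人寿保险股份有限公司 Data archiving method, server and computer-readable storage medium
CN110825695A (zh) * 2019-11-04 2020-02-21 泰康保险集团股份有限公司 Data processing method and apparatus, medium and electronic device
CN111506573B (zh) * 2020-03-16 2024-03-12 中国平安人寿保险股份有限公司 Database table partitioning method and apparatus, computer device and storage medium
CN114880409A (zh) * 2022-06-15 2022-08-09 中银金融科技有限公司 Data table archiving method and related apparatus
CN116842223B (zh) * 2023-08-29 2023-11-10 天津鑫宝龙电梯集团有限公司 Working condition data management method, apparatus, device and medium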

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103262074A (zh) * 2010-11-16 2013-08-21 赛贝斯股份有限公司 Parallel repartitioning index scan
CN103778176A (zh) * 2012-10-18 2014-05-07 西门子公司 Long-term archiving of data in an MES system
CN105279261A (zh) * 2015-10-23 2016-01-27 北京京东尚科信息技术有限公司 Dynamically scalable database archiving method and system
CN105808633A (zh) * 2016-01-08 2016-07-27 平安科技(深圳)有限公司 Data archiving method and system
CN106383897A (zh) * 2016-09-28 2017-02-08 平安科技(深圳)有限公司 Database capacity calculation method and apparatus

Family Cites Families (17)

Publication number Priority date Publication date Assignee Title
US6223182B1 (en) * 1998-06-30 2001-04-24 Oracle Corporation Dynamic data organization
US7624120B2 (en) * 2004-02-11 2009-11-24 Microsoft Corporation System and method for switching a data partition
CN101197876B (zh) * 2006-12-06 2012-02-29 中兴通讯股份有限公司 Method and system for multidimensional analysis of message-type service data
CN101593202B (zh) * 2009-01-14 2013-05-01 中国人民解放军国防科学技术大学 Database hash join method based on a shared-cache multi-core processor
CN103176988A (zh) * 2011-12-21 2013-06-26 上海博腾信息科技有限公司 SaaS-based data migration system
GB2520361A (en) * 2013-11-19 2015-05-20 Ibm Method and system for a safe archiving of data
US9830373B2 (en) * 2015-01-06 2017-11-28 Entit Software Llc Data transfer requests with data transfer policies
US10409770B1 (en) * 2015-05-14 2019-09-10 Amazon Technologies, Inc. Automatic archiving of data store log data
US9823982B1 (en) * 2015-06-19 2017-11-21 Amazon Technologies, Inc. Archiving and restoration of distributed database log records
CN105095384B (zh) * 2015-07-01 2018-09-14 北京京东尚科信息技术有限公司 Data carry-over method and apparatus
US10198495B1 (en) * 2015-09-25 2019-02-05 Wells Fargo Bank, N.A. Configurable database management
CN105589968A (zh) * 2015-12-25 2016-05-18 中国银联股份有限公司 Data aggregation system and method
CN105760505A (zh) * 2016-02-23 2016-07-13 浪潮软件集团有限公司 Hive-based historical data analysis and archiving method
CN107305554A (zh) * 2016-04-20 2017-10-31 泰康保险集团股份有限公司 Data query processing method and apparatus
CN107016007B (zh) * 2016-06-06 2020-04-28 阿里巴巴集团控股有限公司 Method and apparatus for big data processing based on a data warehouse
CN106776837A (zh) * 2016-11-25 2017-05-31 国云科技股份有限公司 MongoDB-based method for correlation analysis of real-time securities transactions
CN108009223B (zh) * 2017-11-24 2021-12-07 中体彩科技发展有限公司 Method and apparatus for detecting consistency of transaction data

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN110908978A (zh) * 2019-11-06 2020-03-24 中盈优创资讯科技有限公司 Database data structure verification method and apparatus
CN110908978B (zh) * 2019-11-06 2022-09-13 中盈优创资讯科技有限公司 Database data structure verification method and apparatus

Also Published As

Publication number Publication date
US20200242098A1 (en) 2020-07-30
SG11202002842YA (en) 2020-04-29
US11106649B2 (en) 2021-08-31
CN108470045B (zh) 2020-02-18
CN108470045A (zh) 2018-08-31

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11/12/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18909002

Country of ref document: EP

Kind code of ref document: A1