CN114706797B - Method for recovering storage space - Google Patents


Info

Publication number
CN114706797B
Authority
CN
China
Prior art keywords
data
column
block
row
target table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210638814.2A
Other languages
Chinese (zh)
Other versions
CN114706797A (en)
Inventor
刘秀鹏
李卓印
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Nankai University General Data Technologies Co ltd
Original Assignee
Tianjin Nankai University General Data Technologies Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Nankai University General Data Technologies Co ltd
Priority to CN202210638814.2A
Publication of CN114706797A
Application granted
Publication of CN114706797B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023 Free address space management
    • G06F12/0253 Garbage collection, i.e. reclamation of unreferenced memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof

Abstract

The invention provides a method for efficiently recovering storage space, comprising the following specific steps: the client sends recovery task information to the cluster coordinator, and the cluster coordinator applies for an exclusive lock on the target table to be processed; the cluster coordinator sends the recovery task information to the coordinator node; according to the received recovery task information, the coordinator node processes the disk data that contains holes column by column, either row by row or block by block, generates valid data columns, and deletes the invalid data of the original columns; the coordinator node returns the deletion result to the cluster coordinator; and the exclusive lock on the target table is released. The method solves the problem that, in big data scenarios, deleted table data continues to occupy disk space and forms data holes, and the problem that, when disk resources are insufficient, data holes and data expansion could previously be resolved only by dumping the table data.

Description

Method for recovering storage space
Technical Field
The invention belongs to the field of database processing, and particularly relates to a method for recycling a storage space.
Background
A database is a repository for storing data. It can hold hundreds of millions or even billions of records, which occupy a large amount of disk space. In a clustered database, a user may access any coordinator node through the various access methods that the cluster provides. DDL and DML are the data definition language and the data manipulation language, respectively; table definitions and table data in the database can be defined and modified through DDL and DML. A DDL or DML command is sent to a coordinator node, which distributes the corresponding processing logic to each computing unit for processing. After a large number of DML operations, disk space is wasted and the data expands. Deleting table data with Delete does not return the disk space; it leaves data holes on the disk, so the deleted data continues to occupy disk space. As further DML operations add new data, the free disk space keeps shrinking, which puts pressure on storage and degrades the performance of full-table scans.
Disclosure of Invention
In view of this, the present invention aims to provide a method for recovering storage space, so as to solve the following problem: a database holds a large amount of data that occupies a large amount of disk space, and after data is deleted from a table with Delete, the data still occupies disk space that is not released, which causes data holes on the disk and a shortage of disk resources.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
A method for recovering storage space comprises the following specific steps:
S1: the client sends recovery task information to the cluster coordinator; the recovery task information comprises the target table information and the execution mode for recovering the table space, and the execution modes are row-level recovery and block-level recovery;
S2: the cluster coordinator applies for an exclusive lock on the target table to be processed;
S3: the cluster coordinator sends the recovery task information to the coordinator node;
S4: according to the received recovery task information, the coordinator node detects whether data has been deleted from the target table and confirms the data holes, processes the disk data that contains holes column by column, either row by row or block by block, generates valid data columns, and deletes the invalid data of the original columns;
S5: the coordinator node returns the deletion result to the cluster coordinator;
S6: the exclusive lock on the target table is released.
Further, in step S4, the coordinator node detects, according to the received recovery task information, whether data has been deleted from the target table and confirms the data holes, processes the hole data in the target table row by row or block by block to form new data blocks, and deletes the original data columns. The specific method is as follows:
S41: detecting whether data has been deleted from the target table, and confirming the data holes;
S42: if the execution mode in the recovery task information received by the coordinator node is row-level recovery, processing the hole data row by row for each column to be processed in the target table to generate valid data columns; if the execution mode is block-level recovery, processing the hole data block by block to generate valid data columns;
S43: rebuilding a hash index for each new valid data column;
S44: deleting the invalid data and the temporary data produced in steps S41 to S43.
Further, in step S41, the data holes are confirmed as follows: a data block consists of a fixed number of records, and each record has a flag in the data block that indicates whether the record is valid. Deleted (invalid) records are flagged as 1, and the deletion ratio of a block is obtained by dividing the number of records flagged as 1 by the total number of records in the block.
Further, in step S42, row-level recovery processes the hole data row by row for each column to be processed in the target table and builds new valid data columns. The specific method is as follows:
S4201: acquiring the number of columns of the target table, and generating in memory, for each column of the target table, a corresponding new column for storing valid data;
S4202: constructing in memory the attributes of each new column from the structure of the corresponding original column of the target table, and creating the metadata information of the new column;
S4203: checking the data of each original column of the target table row by row and judging whether each record is valid; valid data is placed into the corresponding new column, and invalid data is left unprocessed in the original column;
S4204: when the amount of valid data in the new column reaches a set threshold, replacing the corresponding original column data in the target table with the valid data in the new column;
S4205: repeating steps S4203 to S4204 until all data has been processed.
Further, the metadata information includes the definition of the table data and the data information.
Further, in step S42, block-level recovery processes the hole data block by block and builds new valid data columns. The specific method is as follows:
S4211: acquiring the number of columns of the target table, and generating in memory, for each column of the target table, a corresponding new column for storing valid data;
S4212: constructing in memory the attributes of each new column from the structure of the corresponding original column of the target table, and creating the metadata information of the new column;
S4213: comparing the proportion of valid data in the data block to be processed with the set data block reuse rate; if the proportion of valid data is greater than or equal to the set reuse rate, the data block is placed directly, as valid data, into a new data block of the new column without row-by-row checking;
if the proportion of valid data is less than the set reuse rate, the data of the block is processed row by row: valid data is placed into the corresponding new column, and invalid data is left unprocessed in the original column;
S4214: when the amount of valid data in the new column reaches a set threshold, replacing the corresponding original column data in the target table with the valid data in the new column;
S4215: repeating steps S4213 to S4214 until all data has been processed.
Compared with the prior art, the method for recovering storage space has the following beneficial effects:
(1) The method solves the problem that, in a big data scenario, data still occupies disk space after it is deleted from a table and thereby forms data holes; it also solves the problem that, when disk resources are insufficient, data holes and data expansion could otherwise be resolved only by dumping the table data.
(2) With row-level recovery, the original order of the data in the database is preserved; with block-level recovery, the table space is recovered faster. Both modes eliminate the holes between data and reduce disk usage, and the user can choose between them as required.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
Fig. 1 is a schematic flow chart of a method for recovering storage space according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention. Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless otherwise specified.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those of ordinary skill in the art through specific situations.
The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
As shown in Fig. 1, a method for recovering storage space includes the following specific steps:
S1: the client sends recovery task information to the cluster coordinator; the recovery task information comprises the target table information and the execution mode for recovering the table space, and the execution modes are row-level recovery and block-level recovery; row-level recovery is specified with FULL, and DC block-level recovery is specified with BLOCK_REUSE_RATIO = num, where num is the reuse rate, in the range (0, 100];
S2: the cluster coordinator applies for an exclusive lock on the target table to be processed, which prevents concurrent operations on the same target table until the lock is released;
S3: the cluster coordinator sends the recovery task information to the coordinator node;
S4: according to the received recovery task information, the coordinator node detects whether data has been deleted from the target table and confirms the data holes, processes the disk data that contains holes column by column, either row by row or block by block, generates valid data columns, and deletes the invalid data of the original columns;
S5: the coordinator node returns the deletion result to the cluster coordinator;
S6: the exclusive lock on the target table is released.
Once the exclusive lock is released, the target table can be operated on normally again. This process recovers the storage space and resolves the data holes and the data expansion.
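Steps S1 to S6 can be sketched as follows. This is a minimal single-process illustration, not the patented implementation: the names `RecoveryTask`, `ClusterCoordinator`, and `node_worker` are hypothetical, an in-process `threading.Lock` stands in for the cluster-wide exclusive table lock, and the coordinator node's work (S3 to S5) is reduced to a callable. The only details taken from the description are the two execution modes and the (0, 100] range of the reuse rate.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class RecoveryTask:
    """Recovery task information sent by the client (S1). Hypothetical layout."""
    table: str
    mode: str                                   # "row" or "block"
    block_reuse_ratio: Optional[float] = None   # percent, block mode only

    def validate(self) -> None:
        if self.mode not in ("row", "block"):
            raise ValueError(f"unknown execution mode: {self.mode!r}")
        if self.mode == "block":
            ratio = self.block_reuse_ratio
            # The description gives the range of the reuse rate as (0, 100].
            if ratio is None or not (0 < ratio <= 100):
                raise ValueError("block_reuse_ratio must be in (0, 100]")


class ClusterCoordinator:
    """Stand-in for the cluster coordinator (assumption: one process)."""

    def __init__(self, node_worker: Callable[[RecoveryTask], str]) -> None:
        self._node_worker = node_worker          # plays the coordinator node
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()           # protects the lock table

    def _table_lock(self, table: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(table, threading.Lock())

    def recover(self, task: RecoveryTask) -> str:
        task.validate()
        lock = self._table_lock(task.table)
        # S2: apply for the exclusive lock; refuse concurrent recovery.
        if not lock.acquire(blocking=False):
            raise RuntimeError(
                f"table {task.table!r} is locked by another recovery task")
        try:
            # S3 to S5: forward the task and collect the node's result.
            return self._node_worker(task)
        finally:
            # S6: release the lock even if the node fails.
            lock.release()
```

Releasing the lock in a `finally` clause is a design choice of this sketch: it keeps a failed recovery from leaving the target table locked.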
In step S4, the coordinator node detects, according to the received recovery task information, whether data has been deleted from the target table and confirms the data holes, processes the hole data in the target table column by column, either row by row or block by block, to form new data blocks, and deletes the original data columns. The specific steps are as follows:
S41: detecting whether data has been deleted from the target table, and confirming the data holes;
S42: if the execution mode in the recovery task information received by the coordinator node is row-level recovery, processing the hole data row by row for each column to be processed in the target table to generate valid data columns; if the execution mode is block-level recovery, processing the hole data block by block to generate valid data columns;
S43: rebuilding a hash index for each new valid data column;
S44: deleting the invalid data and the temporary data produced in steps S41 to S43.
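Step S43 rebuilds a hash index for each new valid data column, but the description does not say how the index is laid out. The sketch below assumes the simplest possible form, a mapping from each column value to the row positions where it occurs in the compacted column; `rebuild_hash_index` is a hypothetical name used only for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rebuild_hash_index(column: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    """Map every value in a compacted column to its new row positions.

    Assumption: the index is a plain value -> positions hash map. Row
    positions change when holes are removed, which is why the index has
    to be rebuilt from the new column rather than copied from the old one.
    """
    index: Dict[Hashable, List[int]] = defaultdict(list)
    for position, value in enumerate(column):
        index[value].append(position)
    return dict(index)
```

For a compacted column `["a", "b", "a"]`, the rebuilt index maps `"a"` to positions 0 and 2 and `"b"` to position 1.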
In step S41, the data holes are confirmed as follows: a data block consists of a fixed number of records, and each record has a flag in the data block that indicates whether the record is valid. Deleted (invalid) records are flagged as 1, and the deletion ratio of a block is obtained by dividing the number of records flagged as 1 by the total number of records in the block.
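The deletion ratio described above is a simple count over the per-record flags. A minimal sketch, assuming the flags of one block are available as a sequence of integers (the function names are hypothetical):

```python
from typing import Sequence


def deletion_ratio(flags: Sequence[int]) -> float:
    """Fraction of records in one block whose flag is 1 (deleted)."""
    if not flags:
        raise ValueError("a data block must contain at least one record")
    deleted = sum(1 for flag in flags if flag == 1)
    return deleted / len(flags)


def has_holes(flags: Sequence[int]) -> bool:
    """A block contains data holes if any record is flagged as deleted."""
    return any(flag == 1 for flag in flags)
```

For example, a block of four records with flags `[0, 1, 1, 0]` has a deletion ratio of 2 / 4 = 0.5.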
In step S42, row-level recovery processes the hole data row by row for each column to be processed in the target table and builds new valid data columns. The specific method is as follows:
S4201: acquiring the number of columns of the target table, and generating in memory, for each column of the target table, a corresponding new column for storing valid data;
S4202: constructing in memory the attributes of each new column from the structure of the corresponding original column of the target table, and creating the metadata information of the new column;
S4203: checking the data of each original column of the target table row by row and judging whether each record is valid; valid data is placed into the corresponding new column, and invalid data is left unprocessed in the original column;
S4204: when the amount of valid data in the new column reaches a set threshold, replacing the corresponding original column data in the target table with the valid data in the new column;
S4205: repeating steps S4203 to S4204 until all data has been processed.
Row-level recovery preserves the order of the data.
The metadata information includes the definition of the table data and the data information.
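Steps S4201 to S4205 can be sketched in memory as follows. This is an illustration under stated assumptions, not the patented implementation: columns are plain Python lists, the per-row flags are shared by all columns, the function name `row_level_recover` and the parameter `flush_threshold` are hypothetical, and "replacing the original column data" is modelled by appending each flushed buffer to the output columns.

```python
from typing import Dict, List, Sequence


def row_level_recover(columns: Dict[str, List[object]],
                      deleted: Sequence[int],
                      flush_threshold: int = 2) -> Dict[str, List[object]]:
    """Compact every column row by row, keeping rows whose flag is not 1."""
    # S4201/S4202: one new in-memory column per original column.
    buffers: Dict[str, List[object]] = {name: [] for name in columns}
    compacted: Dict[str, List[object]] = {name: [] for name in columns}
    buffered_rows = 0

    def flush() -> None:
        # S4204: the buffered valid data replaces original column data.
        for name, buffer in buffers.items():
            compacted[name].extend(buffer)
            buffer.clear()

    # S4203/S4205: inspect the rows one by one until all are processed.
    for row, flag in enumerate(deleted):
        if flag == 1:
            continue                    # invalid data stays behind
        for name, values in columns.items():
            buffers[name].append(values[row])
        buffered_rows += 1
        if buffered_rows >= flush_threshold:
            flush()
            buffered_rows = 0

    flush()                             # write out the final partial buffer
    return compacted
```

Because rows are visited in their stored order and flushed in that same order, the surviving rows keep their relative order, which matches the statement that row-level recovery preserves the order of the data.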
In step S42, block-level recovery processes the hole data block by block and builds new valid data columns. The specific method is as follows:
S4211: acquiring the number of columns of the target table, and generating in memory, for each column of the target table, a corresponding new column for storing valid data;
S4212: constructing in memory the attributes of each new column from the structure of the corresponding original column of the target table, and creating the metadata information of the new column;
S4213: comparing the proportion of valid data in the data block to be processed with the set data block reuse rate; if the proportion of valid data is greater than or equal to the set reuse rate, the data block is placed directly, as valid data, into a new data block of the new column without row-by-row checking;
if the proportion of valid data is less than the set reuse rate, the data of the block is processed row by row: valid data is placed into the corresponding new column, and invalid data is left unprocessed in the original column;
S4214: when the amount of valid data in the new column reaches a set threshold, replacing the corresponding original column data in the target table with the valid data in the new column;
S4215: repeating steps S4213 to S4214 until all data has been processed.
Block-level recovery improves the efficiency of table space recovery.
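The decision in step S4213 can be sketched for a single column as follows. The `Block` class, the function `block_level_recover`, and the `block_size` parameter are hypothetical names introduced for illustration; the sketch assumes that a reused block is carried over unchanged (including any deleted records it still contains) and that rows salvaged from sparse blocks are packed into fresh blocks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    values: List[object]
    deleted: List[int]          # 1 marks a deleted (invalid) record

    def valid_percent(self) -> float:
        if not self.deleted:
            return 0.0
        return 100.0 * self.deleted.count(0) / len(self.deleted)


def block_level_recover(blocks: List[Block], reuse_ratio: float,
                        block_size: int) -> List[Block]:
    """Reuse dense blocks whole; repack valid rows of sparse blocks."""
    if not (0 < reuse_ratio <= 100):
        raise ValueError("reuse_ratio must be in (0, 100]")
    result: List[Block] = []
    pending = Block([], [])                     # block currently being filled
    for block in blocks:
        if block.valid_percent() >= reuse_ratio:
            result.append(block)                # no row-by-row check
            continue
        for value, flag in zip(block.values, block.deleted):
            if flag == 1:
                continue                        # hole: leave it behind
            pending.values.append(value)
            pending.deleted.append(0)
            if len(pending.values) == block_size:
                result.append(pending)
                pending = Block([], [])
    if pending.values:
        result.append(pending)                  # final partially filled block
    return result
```

In this sketch a repacked block can be emitted after a later block that was reused whole, so the original row order is not guaranteed. That is consistent with the description, which attributes order preservation to row-level recovery and speed to block-level recovery. A lower reuse rate skips more row-by-row work but leaves more holes inside the reused blocks.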
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (4)

1. A method for recovering storage space, comprising the following specific steps:
S1: the client sends recovery task information to the cluster coordinator; the recovery task information comprises the target table information and the execution mode for recovering the table space, and the execution modes are row-level recovery and block-level recovery;
S2: the cluster coordinator applies for an exclusive lock on the target table to be processed;
S3: the cluster coordinator sends the recovery task information to the coordinator node;
S4: according to the received recovery task information, the coordinator node detects whether data has been deleted from the target table and confirms the data holes, processes the disk data that contains holes column by column, either row by row or block by block, generates valid data columns, and deletes the invalid data of the original columns; the specific method comprises the following steps:
S41: detecting whether data has been deleted from the target table, and confirming the data holes; the data holes are confirmed as follows: a data block consists of a fixed number of records, and each record has a flag in the data block that indicates whether the record is valid;
deleted records are flagged as 1, and the deletion ratio of a block is obtained by dividing the number of records flagged as 1 by the total number of records in the block;
S42: if the execution mode in the recovery task information received by the coordinator node is row-level recovery, processing the hole data row by row for each column to be processed in the target table to generate valid data columns; if the execution mode is block-level recovery, processing the hole data block by block to generate valid data columns;
S43: rebuilding a hash index for each new valid data column;
S44: deleting the invalid data and the temporary data produced in steps S41 to S43;
S5: the coordinator node returns the deletion result to the cluster coordinator;
S6: the exclusive lock on the target table is released.
2. The method for recovering storage space according to claim 1, wherein: in step S42, row-level recovery processes the hole data row by row for each column to be processed in the target table and builds new valid data columns; the specific method is as follows:
S4201: acquiring the number of columns of the target table, and generating in memory, for each column of the target table, a corresponding new column for storing valid data;
S4202: constructing in memory the attributes of each new column from the structure of the corresponding original column of the target table, and creating the metadata information of the new column;
S4203: checking the data of each original column of the target table row by row and judging whether each record is valid; valid data is placed into the corresponding new column, and invalid data is left unprocessed in the original column;
S4204: when the amount of valid data in the new column reaches a set threshold, replacing the corresponding original column data in the target table with the valid data in the new column;
S4205: repeating steps S4203 to S4204 until all data has been processed.
3. The method for recovering storage space according to claim 2, wherein: the metadata information includes the definition of the table data and the data information.
4. The method for recovering storage space according to claim 1, wherein: in step S42, block-level recovery processes the hole data block by block and builds new valid data columns; the specific method is as follows:
S4211: acquiring the number of columns of the target table, and generating in memory, for each column of the target table, a corresponding new column for storing valid data;
S4212: constructing in memory the attributes of each new column from the structure of the corresponding original column of the target table, and creating the metadata information of the new column;
S4213: comparing the proportion of valid data in the data block to be processed with the set data block reuse rate; if the proportion of valid data is greater than or equal to the set reuse rate, the data block is placed directly, as valid data, into a new data block of the new column without row-by-row checking;
if the proportion of valid data is less than the set reuse rate, the data of the block is processed row by row: valid data is placed into the corresponding new column, and invalid data is left unprocessed in the original column;
S4214: when the amount of valid data in the new column reaches a set threshold, replacing the corresponding original column data in the target table with the valid data in the new column;
S4215: repeating steps S4213 to S4214 until all data has been processed.
CN202210638814.2A 2022-06-08 2022-06-08 Method for recovering storage space Active CN114706797B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210638814.2A CN114706797B (en) 2022-06-08 2022-06-08 Method for recovering storage space

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210638814.2A CN114706797B (en) 2022-06-08 2022-06-08 Method for recovering storage space

Publications (2)

Publication Number Publication Date
CN114706797A (en) 2022-07-05
CN114706797B (en) 2022-09-16

Family

ID=82178118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210638814.2A Active CN114706797B (en) 2022-06-08 2022-06-08 Method for recovering storage space

Country Status (1)

Country Link
CN (1) CN114706797B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503260A (en) * 2016-11-18 2017-03-15 北京奇虎科技有限公司 A kind of method and apparatus of the effective memory space for improving data base
WO2018127116A1 (en) * 2017-01-09 2018-07-12 腾讯科技(深圳)有限公司 Data cleaning method and apparatus, and computer-readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112835534B (en) * 2021-02-26 2022-08-02 上海交通大学 Garbage recycling optimization method and device based on storage array data access

Also Published As

Publication number Publication date
CN114706797A (en) 2022-07-05

Similar Documents

Publication Publication Date Title
US6950834B2 (en) Online database table reorganization
CN110473100B (en) Transaction processing method and device based on blockchain system
JP2002525755A (en) Method and apparatus for reorganizing an active DBMS table
CN110888837B (en) Object storage small file merging method and device
CN101382949B (en) Management method for database table and apparatus
US7818749B2 (en) Data processing method, data processing apparatus, and data processing program
CN107783988A (en) The locking method and equipment of a kind of directory tree
CN1848118A (en) Apparatus and method for a managing file system
CN111857890B (en) Service processing method, system, device and medium
CN106294886A (en) A kind of method and system of full dose extracted data from HBase
CN110727724A (en) Data extraction method and device, computer equipment and storage medium
CN106649146A (en) Memory release method and apparatus
CN110750517B (en) Data processing method, device and equipment of local storage engine system
CN114706797B (en) Method for recovering storage space
CN109819013A (en) A kind of block chain memory capacity optimization method based on cloud storage
US7051051B1 (en) Recovering from failed operations in a database system
CN112711649A (en) Database multi-field matching method, device, equipment and storage medium
CN115951832A (en) Method and system for merging intelligent small files aiming at object storage
CN107291574B (en) Backup data recovery primary key generation method based on interpretation system
US20220083522A1 (en) Data processing method, apparatus, electronic device, and computer storage medium
CN110990394B (en) Method, device and storage medium for counting number of rows of distributed column database table
CN112286873A (en) Hash tree caching method and device
US20060106880A1 (en) Managing free space in file systems
CN114020707B (en) Storage space recovery method, storage medium, and program product
CN112711627B (en) Data importing method, device and equipment of Greemplum database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant