WO2018145227A1 - Managing data records in object-based cloud storage systems - Google Patents

Managing data records in object-based cloud storage systems

Info

Publication number
WO2018145227A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage
target
status indicator
storage medium
data record
Application number
PCT/CN2017/000154
Other languages
French (fr)
Inventor
Kuien LIU
Haozhou WANG
Yu Yang
Ming Li
Yandong Yao
Original Assignee
Pivotal Software, Inc.
Application filed by Pivotal Software, Inc.
Priority to PCT/CN2017/000154
Publication of WO2018145227A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • This disclosure relates to object-based cloud storage systems.
  • In object-based cloud storage systems, a client system, e.g., a data warehouse, interacts with a storage system that manages data as storage objects.
  • Conventional object-based cloud storage systems allow the following operations by a client system on the storage objects in the storage system: downloading a storage object, uploading a storage object, and deleting a storage object. Deleting one or more data records of a storage object without deleting the storage object in such object-based cloud storage systems requires downloading the storage object to the client system, deleting the one or more data records from the downloaded storage object, uploading the storage object with the one or more data records deleted, and deleting the previous copy of the storage object.
  • The storage object can have a large size. Downloading and uploading the storage object may have a high cost in terms of network bandwidth.
  • This specification describes techniques for deleting data records in object-based cloud storage systems.
  • Upon receiving a request to delete a target data record of a data object, the system modifies a status indicator field of a status indicator file of the data object corresponding to the target data record to indicate that the target data record is deleted.
  • This specification discloses how to delete a target data record of a data object without having to incur the cost of uploading an entire copy of the data object to the storage medium.
  • This specification also describes techniques for vacuuming a non-transitory computer storage medium by merging active data records of data objects that are eligible for vacuum into target merge files.
  • The data objects that are eligible for vacuum are determined based on a vacuum strategy.
  • Vacuuming a storage medium can reduce the amount of data stored on the storage medium.
  • The subject matter described in this specification can be implemented in various embodiments so as to realize one or more of the following advantages.
  • The disclosed techniques improve upon conventional techniques by reducing the cost of deleting data records from storage media.
  • The disclosed operations reduce the total number of data objects on the storage medium and the total amount of data stored on storage media.
  • FIG. 1 is a block diagram illustrating an example distributed computer storage system.
  • FIG. 2 illustrates an example process of deleting data records in an object-based cloud storage system.
  • FIG. 3 is a flowchart of an example process of vacuuming a storage medium.
  • FIG. 4 is a flowchart of an example process of processing requests for deleting data records in an object-based cloud storage system.
  • FIG. 1 is a block diagram illustrating an example distributed computer storage system 100.
  • The distributed computer storage system 100 enables deleting data records stored on a cloud-based computer storage medium and includes multiple components, which are described below.
  • Each component of the distributed computer storage system 100 can be implemented on one or more computers each including one or more computer processors.
  • The distributed computer storage system 100 includes a data warehouse 101 and a cloud storage medium 102.
  • The data warehouse 101 can request operations that include retrieval and/or modification of data on the cloud storage medium 102 by communicating with the cloud storage medium 102 through a network protocol, such as a network protocol implemented by a representational state transfer application programming interface (REST API).
  • An example of an operation that the data warehouse 101 can request from the cloud storage medium 102 is an operation that requests deletion of one or more data records in one or more storage objects of the cloud storage medium 102. Deleting data records in storage objects of the cloud storage medium 102 is described in greater detail below with reference to FIG. 2.
  • The data warehouse 101 can include a master node 111 and one or more segment nodes, such as segment node A (141A), segment node B (141B), and segment node C (141C). Both the master node 111 and the segment nodes 141A-C can communicate with the cloud storage medium 102 and request retrieval and/or modification of data stored on the cloud storage medium 102.
  • An interconnect switch 131 enables communications between the master node 111 and the segment nodes 141A-C as well as communications between the individual segment nodes 141A-C.
  • The master node 111 and the individual segment nodes 141A-C can each be computer nodes with separate operating system, processing unit, storage unit, and memory unit components.
  • The master node 111 can receive a query, process the query to identify tasks that require retrieval and/or modification of data on the cloud storage medium 102, divide each task into subtasks, and assign the subtasks to individual segment nodes 141A-C.
  • The individual segment nodes 141A-C can perform the subtasks assigned to them by the master node 111 by communicating with the cloud storage medium 102.
  • The segment nodes 141A-C can perform their subtasks in parallel.
  • A data access module, such as data access module A (112A) or data access module B (112B), is a component of the cloud storage medium 102 that enables retrieving and/or modifying the data stored on the cloud storage medium 102.
  • Each data access module 112A-B can access at least a portion of the data stored on the cloud storage medium 102. In the example illustrated in FIG. 1, data access module A (112A) can access storage object A (122A), storage object B (122B), and storage object C (122C), as well as status indicator file A (132A), status indicator file B (132B), and status indicator file C (132C), while data access module B (112B) can access storage object C (122C) and storage object D (122D), as well as status indicator file C (132C) and status indicator file D (132D).
  • The cloud storage medium 102 stores data in one or more storage objects, such as storage objects 122A-D, where each storage object 122A-D includes one or more data records.
  • A storage object 122A-D is a unit of storage in an object-based storage model. In such a model, each storage object 122A-D is typically managed and accessed through a unique identifier, rather than through a hierarchical file system or through division of data into blocks within sectors and tracks.
  • The cloud storage medium 102 also stores one or more status indicator files, such as status indicator files 132A-D.
  • Each status indicator file 132A-D is associated with a corresponding storage object 122A-D and stores information about whether each data record in the storage object 122A-D is deleted.
  • Each status indicator file 132A-D may include a respective status indicator field, e.g., a bit, for each data record in the corresponding storage object 122A-D, where the value of each status indicator field denotes whether the corresponding data record is deleted.
  • A data record in a storage object may be marked as deleted in the corresponding status indicator file even though the data record is still in the storage object. This enables processing a request to delete a data record as an atomic operation that changes the status indicator field associated with that data record. In contrast, some conventional techniques process a request to delete a data record from a storage object by physically deleting the data record, which may require uploading a modified version of the storage object at a substantial cost in efficiency and bandwidth.
  • FIG. 2 illustrates an example process 200 of deleting data records in an object-based cloud storage system.
  • Process 200 can be implemented by a system of one or more computers located in one or more locations.
  • A distributed computer storage system, e.g., the distributed computer storage system 100 of FIG. 1, appropriately programmed in accordance with this specification, can perform process 200.
  • Process 200 will be described with reference to the distributed computer storage system 100 of FIG. 1.
  • A database engine 201 receives an input query that requests deleting one or more data records and communicates (221) that query to the master node 111.
  • Examples of the input query include a query that requests erasing data records, such as a DELETE statement in structured query language (SQL).
  • Other examples of the input query include a query that requests replacing one or more data records with one or more other data records, such as an UPDATE statement in SQL, if performing that query on the cloud storage medium 102 requires deleting the data records being modified and inserting new data records with the modified content.
  • The master node 111 requests (222) the status indicator files 132 for the storage objects 122 whose data records are affected by the input query by communicating with a data access module 112 of the cloud storage medium 102.
  • The data access module 112 fetches (223) the status indicator files 132 for the master node 111.
  • The master node 111 requests each status indicator file 132 from the particular data access module 112 responsible for retrieving that status indicator file 132.
  • That data access module fetches the respective status indicator file 132 for the master node 111.
  • The master node 111 divides (224) the task requested by the input query into subtasks based on the status indicator files 132. For instance, the master node 111 may define a subtask that includes deleting all or a portion of the data records of a particular storage object 122.
  • The master node 111 assigns (225) each subtask to a segment node 141.
  • The master node 111 may use a load balancing technique when assigning subtasks so that each segment node 141 is assigned work in accordance with its relative capacity at the time of assignment.
  • The load balancing may improve the efficiency of processing the input query.
  • A segment node 141 may request (226) the storage objects 122 affected by its subtask through the data access module 112.
  • The data access module 112 fetches (227) the storage objects 122 for the respective segment node.
  • The segment node 141 filters out (228) the data records in the storage objects 122 that the input query requests to delete. This may provide the data warehouse 101 with updated copies of the storage objects 122 that have been modified in accordance with the delete request.
  • A segment node 141 performing a subtask updates (229) the status indicator files 132 of the storage objects 122 affected by the delete request in the input query.
  • The segment node 141 can receive the status indicator files 132 for the affected storage objects 122 from the master node 111 through the interconnect switch 131 of FIG. 1.
  • The segment node 141 can then modify each status indicator file 132 by designating the data records affected by the delete request as deleted, for instance by setting the status indicator field associated with each affected data record, e.g., a corresponding bit, to a value, e.g., zero, denoting that the respective data record is deleted.
  • The segment node 141 can upload the modified status indicator files 132 to the cloud storage medium 102 by communicating with the data access modules 112. At this stage, the segment node 141 does not need to upload the modified storage object 122.
  • After performing a subtask, a segment node 141 returns (230) the results of the subtask to the database engine 201.
  • The results may include an indication that the subtask was performed successfully. For example, if the subtask is to delete a data record from a storage object 122, the results may include an indication that the data record was successfully deleted, even though the data record remains in the storage object 122.
  • FIG. 3 is a flowchart of an example process 300 of vacuuming a storage medium.
  • The process 300 enables vacuuming a storage medium (e.g., the cloud storage medium 102 of FIG. 1) by merging one or more storage objects stored on the storage medium based on the data records in each storage object that are marked as deleted. Accordingly, vacuuming reduces the total number of storage objects stored on the storage medium.
  • The process 300 will be described as being performed by a system of one or more computers located in one or more locations.
  • A distributed computer storage system, e.g., the distributed computer storage system 100 of FIG. 1, appropriately programmed in accordance with this specification, can perform the process 300.
  • The system obtains (302) a vacuum aggressiveness strategy.
  • A vacuum aggressiveness strategy indicates which storage objects on the storage medium are eligible for vacuum (i.e., will be merged to vacuum the storage medium).
  • Examples of vacuum aggressiveness strategies include full, active, and lazy strategies explained below.
  • The system determines (304) the storage objects that are eligible for vacuum based on the vacuum aggressiveness strategy.
  • The aggressiveness can have multiple levels.
  • A dirty storage object is a storage object whose corresponding status indicator file has at least one status indicator field that denotes that a corresponding data record is deleted.
  • A deletion ratio associated with a storage object is the ratio of the number of status indicator fields denoting a deleted data record to the total number of status indicator fields in the status indicator file associated with the storage object.
  • A weighted deletion ratio for a storage object is the product of the deletion ratio for the storage object and a measure of the size of the storage object, i.e., a size indicator, based on a count of the one or more data records in the storage object.
  • The system vacuums (306) the computer storage medium.
  • Vacuuming the computer storage medium includes merging the one or more storage objects eligible for vacuum based on the vacuum aggressiveness strategy.
  • The system creates a merge target storage object for each group of one or more storage objects eligible for vacuum. For each group, the system merges into the group's merge target storage object those data records in the group's storage objects that are not marked as deleted according to the corresponding status indicator files. In some implementations, the merging for each group is performed in parallel with the merging for other groups to obtain efficiency gains from parallelization.
  • The storage objects eligible for vacuum and their corresponding status indicator files are then deleted.
  • FIG. 4 is a flowchart of an example process 400 of processing requests for deleting data records in an object-based cloud storage system.
  • The process 400 can be performed by a system of one or more server computers configured to provide a cloud-based data storage service.
  • The process 400 will be described as being performed by a system of one or more computers located in one or more locations.
  • A distributed computer storage system, e.g., the distributed computer storage system 100 of FIG. 1, appropriately programmed in accordance with this specification, can perform the process 400.
  • The system receives (402) a request to delete a target data record from a non-transitory computer storage medium.
  • The non-transitory computer storage medium stores the following.
  • The non-transitory computer storage medium stores one or more storage objects.
  • Each of the one or more storage objects is a data object.
  • Each of the one or more storage objects conforms to an object-based cloud storage model.
  • Each of the one or more storage objects includes one or more data records.
  • The non-transitory computer storage medium stores a status indicator file for each storage object.
  • Each status indicator file for a storage object includes a status indicator field for each data record in the storage object.
  • Each status indicator file has a bitmap format.
  • Each status indicator field is a bit. The value of the bit denotes whether a data record is deleted.
  • The object-based cloud storage model prevents direct modification of content in each of the one or more storage objects stored on the non-transitory computer storage medium.
  • The system identifies (404) a target storage object of the one or more storage objects that includes the target data record.
  • The system identifies (406) a target status indicator file associated with the target storage object.
  • The system modifies (408) a status indicator field of the target status indicator file that is associated with the target data record, while maintaining the target data record in the target storage object.
  • The system downloads the target status indicator file from the non-transitory computer storage medium to the one or more server computers.
  • The system modifies the downloaded target status indicator file by changing the status indicator field associated with the target data record from a first value to a second value.
  • The first value denotes that the target data record is not deleted, while the second value denotes that the target data record is deleted.
  • The system then uploads the modified target status indicator file from the one or more server computers to the non-transitory computer storage medium.
  • The system receives (410) a request to access the one or more data records in the target storage object.
  • The system prevents (412) the target data record from being accessed in accordance with the modified status indicator field associated with the target data record.
  • In response to the request to delete, the system prevents the target storage object from being downloaded from the non-transitory computer storage medium to the one or more server computers.
  • The system also prevents the target storage object from being uploaded from the one or more server computers to the non-transitory computer storage medium in response to the request to delete.
  • The system downloads the target storage object and the target status indicator file from the non-transitory computer storage medium to the one or more server computers. For each requested data record in the target storage object, the system determines whether the corresponding status indicator field denotes that the data record is deleted. The system prevents the data record from being accessed if the corresponding status indicator field denotes that the data record is deleted.
  • Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
  • Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, data processing apparatus.
  • The program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
  • The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • The term "data processing apparatus" refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
  • The apparatus can also be or further include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
  • The apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
  • A computer program, which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • A computer program may, but need not, correspond to a file in a file system.
  • A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code.
  • A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output.
  • The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
  • Computers suitable for the execution of a computer program include, by way of example, general or special purpose microprocessors or both, or any other kind of central processing unit.
  • A central processing unit will receive instructions and data from a read-only memory or a random access memory or both.
  • The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.
  • A computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical, or optical disks.
  • However, a computer need not have such devices.
  • A computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
  • The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • To provide for interaction with a user, embodiments can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • A computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to
  • Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components.
  • The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.
  • The computing system can include clients and servers.
  • A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • A server transmits data, e.g., an HTML page, to a user device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the user device, which acts as a client.
  • Data generated at the user device, e.g., a result of the user interaction, can be received from the user device at the server.
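
The FIG. 2 flow described in the list above, in which the master node 111 divides a delete request into per-object subtasks (224) and assigns them to segment nodes according to relative capacity (225), can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the specification does not name a load balancing technique, so the greedy "lowest load per unit of capacity" rule, the hypothetical `SegmentNode` class, and its `capacity` values are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentNode:
    """Hypothetical stand-in for a segment node such as 141A-C."""
    name: str
    capacity: float  # assumed relative capacity; larger means it can take more work
    load: int = 0    # number of record deletions assigned so far
    subtasks: list = field(default_factory=list)


def divide_into_subtasks(records_to_delete):
    """Step (224): group (object_id, record_index) pairs into one subtask per storage object."""
    subtasks = {}
    for object_id, record_index in records_to_delete:
        subtasks.setdefault(object_id, []).append(record_index)
    return subtasks


def assign_subtasks(subtasks, nodes):
    """Step (225): greedily give each subtask to the node with the lowest load/capacity."""
    # Handle the largest subtasks first so the greedy choice balances better.
    for object_id, indices in sorted(subtasks.items(), key=lambda kv: -len(kv[1])):
        node = min(nodes, key=lambda n: n.load / n.capacity)
        node.subtasks.append((object_id, sorted(indices)))
        node.load += len(indices)
    return {n.name: n.subtasks for n in nodes}


nodes = [SegmentNode("141A", 2.0), SegmentNode("141B", 1.0), SegmentNode("141C", 1.0)]
deletes = [("122A", 3), ("122A", 7), ("122A", 9), ("122B", 1), ("122C", 0), ("122C", 4)]
plan = assign_subtasks(divide_into_subtasks(deletes), nodes)
print(plan)
```

Here node 141A, with twice the assumed capacity, receives the largest subtask, and the two smaller subtasks go to 141B and 141C. Each segment node would then update only the status indicator files of its assigned storage objects.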

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Systems, methods, and computer program products for deleting data records in object-based cloud storage systems. Upon receiving a request to delete a target data record of a data object from a non-transitory computer storage medium, a system modifies a status indicator field of a status indicator file of the data object corresponding to the target data record to indicate that the target data record is deleted. In conventional techniques, deleting a target data record of a data object from a non-transitory computer storage medium requires uploading a modified copy of the data object to the storage medium. This specification discloses how to delete a target data record of a data object without having to incur the cost of uploading an entire copy of the data object to the storage medium.

Description

MANAGING DATA RECORDS IN OBJECT-BASED CLOUD STORAGE SYSTEMS
This disclosure relates to object-based cloud storage systems.
In object-based cloud storage systems, a client system, e.g., a data warehouse, interacts with a storage system that manages data as storage objects. Conventional object-based cloud storage systems allow the following operations by a client system on the storage objects in the storage system: downloading a storage object, uploading a storage object, and deleting a storage object. Deleting one or more data records of a storage object without deleting the storage object in such object-based cloud storage systems requires downloading the storage object to the client system, deleting the one or more data records from the downloaded storage object, uploading the storage object with the one or more data records deleted, and deleting the previous copy of the storage object. The storage object can have a large size. Downloading and uploading the storage object may have a high cost in terms of network bandwidth.
SUMMARY
This specification describes techniques for deleting data records in object-based cloud storage systems. Upon receiving a request to delete a target data record of a data object from a non-transitory computer storage medium of an object-based cloud storage system, the system modifies a status indicator field of a status indicator file of the data object corresponding to the target data record to indicate that the target data record is deleted. This specification discloses how to delete a target data record of a data object without having to incur the cost of uploading an entire copy of the data object to the storage medium.
This specification also describes techniques for vacuuming a non-transitory computer storage medium by merging active data records of data objects that are eligible for vacuum into target merge files. The data objects that are eligible for vacuum are determined based on a vacuum strategy. Vacuuming a storage medium can reduce the amount of data stored on the storage medium.
The subject matter described in this specification can be implemented in various embodiments so as to realize one or more of the following advantages. The disclosed techniques improve upon conventional techniques by reducing the cost of deleting data records from storage media. The disclosed vacuuming operations reduce the total number of data objects on the storage medium and the total amount of data stored on storage media.
The details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating an example distributed computer storage system.
FIG. 2 illustrates an example process of deleting data records in an object-based cloud storage system.
FIG. 3 is a flowchart of an example process of vacuuming a storage medium.
FIG. 4 is a flowchart of an example process of processing requests for deleting data records in an object-based cloud storage system.
Like reference numbers and designations in the various drawings indicate like elements.
DETAILED DESCRIPTION
FIG. 1 is a block diagram illustrating an example distributed computer storage system 100. The distributed computer storage system 100 enables deleting data records stored on a cloud-based computer storage medium and includes multiple components, which are described below. Each component of the distributed computer storage system 100 can be implemented on one or more computers each including one or more computer processors.
The distributed computer storage system 100 includes a data warehouse 101 and a cloud storage medium 102. The data warehouse 101 can request operations that include retrieval and/or modification of data on the cloud storage medium 102 by communicating with the cloud storage medium 102 through a network protocol, such as a network protocol implemented by a representational state transfer application programming interface (REST API).
An example of an operation that the data warehouse 101 can request from the cloud storage medium 102 is an operation that requests deletion of one or more data  records in one or more storage objects of the cloud storage medium 102. Deleting data records in storage objects of the cloud storage medium 102 is described in greater detail below with reference to FIG. 2.
The data warehouse 101 can include a master node 111 and one or more segment nodes, such as segment node A (141A), segment node B (141B), and segment node C (141C). Both the master node 111 and the segment nodes 141A-C can communicate with the cloud storage medium 102 and request retrieval and/or modification of data stored on the cloud storage medium 102. An interconnect switch 131 enables communications between the master node 111 and the segment nodes 141A-C as well as communications between the individual segment nodes 141A-C. The master node 111 and the individual segment nodes 141A-C can each be a computer node with its own operating system, processing unit, storage unit, and memory unit.
The master node 111 can receive a query, process the query to identify tasks that require retrieval and/or modification of data on the cloud storage medium 102, divide each task into subtasks, and assign the subtasks to individual segment nodes 141A-C. The individual segment nodes 141A-C can perform the subtasks assigned to them by the master node 111 by communicating with the cloud storage medium 102. In some implementations, the segment nodes 141A-C can perform their subtasks in parallel.
The master node 111 and the segment nodes 141A-C communicate with the cloud storage medium 102 through a data access module of the cloud storage medium 102. A data access module, such as data access module A (112A) or data access module B (112B), is a component of the cloud storage medium 102 that enables retrieving and/or modifying the data stored on the cloud storage medium 102. Each data access module 112A-B can access at least a portion of the data stored on the cloud storage medium 102. In the example illustrated in FIG. 1, data access module A (112A) can access storage object A (122A), storage object B (122B), and storage object C (122C) as well as status indicator file A (132A), status indicator file B (132B), and status indicator file C (132C), while data access module B (112B) can access storage object C (122C) and storage object D (122D) as well as status indicator file C (132C) and status indicator file D (132D).
The cloud storage medium 102 stores data in one or more storage objects, such as storage objects 122A-D, where each storage object 122A-D includes one or more data records. A storage object 122A-D is a unit of storage in an object-based storage model. In such an object-based storage model, each storage object 122A-D is typically managed and accessed through a unique identifier, rather than through a hierarchical file system or through division of data into blocks within sectors and tracks.
The cloud storage medium 102 also stores one or more status indicator files, such as status indicator files 132A-D. Each status indicator file 132A-D is associated with a corresponding storage object 122A-D and stores information about whether each data record in the storage object 122A-D is deleted. For instance, each status indicator file 132A-D associated with a corresponding storage object 122A-D may include a respective status indicator field, e.g., a bit, corresponding to each data record in the storage object 122A-D, respectively, where the value of each status indicator field denotes whether the corresponding data record is deleted.
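The bitmap form of a status indicator file can be sketched in Python as follows. This is a minimal illustration, not a format defined by the specification: it assumes one bit per data record, stored least-significant-bit first within each byte, with 1 meaning active and 0 meaning deleted (matching the later example in which a field is set to zero to denote deletion). The class name `StatusBitmap` is hypothetical.

```python
class StatusBitmap:
    """Bitmap status indicator file: one status indicator field (bit) per record.

    Assumed convention: bit value 1 = record active, 0 = record deleted.
    Bits are stored least-significant-bit first within each byte.
    """

    def __init__(self, num_records: int) -> None:
        self.num_records = num_records
        # Every record starts out active, so every bit is initially 1.
        self.bits = bytearray(b"\xff" * ((num_records + 7) // 8))

    def is_deleted(self, index: int) -> bool:
        self._check(index)
        return not (self.bits[index // 8] >> (index % 8)) & 1

    def mark_deleted(self, index: int) -> None:
        self._check(index)
        # Clear this record's bit; the data record itself is not touched.
        self.bits[index // 8] &= ~(1 << (index % 8)) & 0xFF

    def deleted_count(self) -> int:
        return sum(self.is_deleted(i) for i in range(self.num_records))

    def to_bytes(self) -> bytes:
        return bytes(self.bits)

    @classmethod
    def from_bytes(cls, num_records: int, data: bytes) -> "StatusBitmap":
        bitmap = cls(num_records)
        bitmap.bits = bytearray(data)
        return bitmap

    def _check(self, index: int) -> None:
        if not 0 <= index < self.num_records:
            raise IndexError(f"record index {index} out of range")
```

Because each record costs one bit, the status indicator file for a storage object holding one million records is about 125 KB, which is typically far smaller than the storage object itself.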
A data record in a storage object may be marked as deleted in the corresponding status indicator file even though the data record is still in the storage object. This enables processing a request to delete a data record in an atomic operation by changing the relevant status indicator field associated with a data record. In contrast, some conventional techniques process a request to delete a data record from a storage object by physically deleting the data record, which may require uploading a modified version of the storage object at a substantial efficiency and bandwidth cost.
FIG. 2 illustrates an example process 200 of deleting data records in an object-based cloud storage system. Process 200 can be implemented by a system of one or more computers located in one or more locations. For example, a distributed computer storage system, e.g., the distributed computer storage system 100 of FIG. 1, appropriately programmed in accordance with this specification can perform process 200. Process 200 will be described with reference to the distributed computer storage system 100 of FIG. 1.
A database engine 201 receives an input query that requests deleting one or more data records and communicates (221) that query to the master node 111. Examples of the input query include a query that requests erasing data records, such as a DELETE statement in structured query language (SQL). Other examples of the input query include a query that requests replacing one or more data records with one or more other data records, such as an SQL UPDATE statement, if performing that query on the cloud storage medium 102 requires deleting the data records being modified and inserting new data records with the modified content.
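A sketch of how DELETE and UPDATE queries might reduce to record-level work under this scheme. It assumes, hypothetically, that each data record is addressed by a `(storage object key, record index)` pair and that rows are Python dicts; an UPDATE becomes a deletion of the old version plus an insertion of the new one, as described above. Where the inserted records are written is left open, as in the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical addressing scheme: (storage object key, record index).
RecordId = Tuple[str, int]
Row = Dict[str, object]


@dataclass
class WritePlan:
    deletes: List[RecordId] = field(default_factory=list)  # records to mark deleted
    inserts: List[Row] = field(default_factory=list)       # new records to write


def plan_delete(rows: Dict[RecordId, Row], predicate: Callable[[Row], bool]) -> WritePlan:
    """DELETE: every matching record is only marked deleted."""
    return WritePlan(deletes=[rid for rid, row in rows.items() if predicate(row)])


def plan_update(rows: Dict[RecordId, Row], predicate: Callable[[Row], bool],
                changes: Row) -> WritePlan:
    """UPDATE: mark the old version deleted and insert a modified copy."""
    plan = WritePlan()
    for rid, row in rows.items():
        if predicate(row):
            plan.deletes.append(rid)
            plan.inserts.append({**row, **changes})
    return plan
```

Neither plan touches the bytes of an existing storage object: deletions become status indicator changes, and new content goes into new records.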
The master node 111 requests (222) the status indicator files 132 for the storage objects 122 whose data records are affected by the input query through communicating with a data access module 112 of the cloud storage medium 102. In response to the request, the data access module 112 fetches (223) the status indicator files 132 for the master node 111.
In some implementations, the master node 111 requests each status indicator file 132 from a particular data access module 112 responsible for retrieval of the respective status indicator file 132. In response, the particular data access module fetches the respective status indicator file 132 for the master node 111.
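One way to route status indicator file requests to data access modules, sketched under assumptions: the module-to-file mapping below mirrors the FIG. 1 example, and when more than one module can reach a file (status indicator file C (132C) here), the sketch picks the module with the fewest requests assigned so far. The tie-breaking rule and the function name `route_requests` are illustrative only.

```python
from typing import Dict, Iterable, List, Set

# Mirrors the FIG. 1 example: status indicator files reachable by each
# data access module (modules 112A and 112B can both reach 132C).
MODULE_ACCESS: Dict[str, Set[str]] = {
    "112A": {"132A", "132B", "132C"},
    "112B": {"132C", "132D"},
}


def route_requests(files: Iterable[str],
                   access: Dict[str, Set[str]] = MODULE_ACCESS) -> Dict[str, List[str]]:
    """Group status indicator file requests by the data access module serving them."""
    plan: Dict[str, List[str]] = {}
    for name in files:
        candidates = sorted(m for m, reachable in access.items() if name in reachable)
        if not candidates:
            raise LookupError(f"no data access module can reach {name}")
        # Prefer the module with the fewest requests so far; ties go to the
        # first module in name order (an illustrative rule, not from the text).
        module = min(candidates, key=lambda m: len(plan.get(m, [])))
        plan.setdefault(module, []).append(name)
    return plan
```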
The master node 111 divides (224) the task requested by the input query into subtasks based on the status indicator files 132. For instance, the master node 111 may define a subtask that includes deleting all or a portion of the data records of a particular storage object 122.
The master node 111 assigns (225) each subtask to a segment node 141. In some implementations, the master node 111 may use a load balancing technique in assigning subtasks to segment nodes 141 to ensure that each individual segment node 141 is assigned a subtask in accordance with its relative capacity at the time of such assignment. The load balancing may serve to enhance the efficiency of processing the input query.
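The division and assignment steps might look like the following sketch. It makes one subtask per affected storage object, as in the example above, and approximates the load balancing with a greedy rule: the largest subtask goes to the segment node whose load relative to its capacity is currently lowest. Treating capacity as a fixed per-node weight and measuring subtask size in records are assumptions; the specification does not fix a particular load-balancing technique.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def divide_into_subtasks(targets: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """One subtask per storage object: the indexes of its records to delete."""
    subtasks: Dict[str, List[int]] = {}
    for object_key, index in targets:
        subtasks.setdefault(object_key, []).append(index)
    return subtasks


def assign_subtasks(subtasks: Dict[str, List[int]],
                    capacity: Dict[str, float]) -> Dict[str, List[str]]:
    """Greedy balancing: largest subtask first, to the node with the lowest
    load relative to its (assumed fixed) capacity."""
    heap = [(0.0, node) for node in sorted(capacity)]
    heapq.heapify(heap)
    assignment: Dict[str, List[str]] = {node: [] for node in capacity}
    for object_key in sorted(subtasks, key=lambda k: (-len(subtasks[k]), k)):
        relative_load, node = heapq.heappop(heap)
        assignment[node].append(object_key)
        relative_load += len(subtasks[object_key]) / capacity[node]
        heapq.heappush(heap, (relative_load, node))
    return assignment
```

For example, with six target records spread over storage objects 122A, 122B, and 122C, and segment node 141B given twice the capacity of 141A, 141A receives the largest subtask (122C) while 141B receives the other two.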
To perform a subtask assigned to it, a segment node 141 may request (226) the storage objects 122 affected by the subtask through the data access module 112. The data access module 112 fetches (227) the storage objects 122 for the respective segment node. The segment node 141 then filters out (228), from the fetched storage objects 122, the data records that the input query requests to delete. This may provide the data warehouse 101 with updated copies of the storage objects 122 that have been modified in accordance with the request to delete in the input query.
A segment node 141 performing a subtask also updates (229) the status indicator files 132 of the storage objects 122 affected by the request to delete in the input query. The segment node 141 can receive the status indicator files 132 for the affected storage objects 122 from the master node 111 through the interconnect switch 131 (of FIG. 1). The segment node 141 can then modify each status indicator file 132 by designating, in the status indicator file 132, the data records affected by the request to delete in the input query as deleted, for instance by setting a status indicator field associated with each affected data record, e.g., a corresponding bit, to a value, e.g., zero, denoting that the respective data record is deleted. After modifying the status indicator files 132, the segment node 141 can upload the modified status indicator files 132 to the cloud storage medium 102 by communicating with the data access modules 112. At this stage, the segment node 141 does not need to upload the modified storage objects 122.
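A minimal sketch of this step, assuming a hypothetical `InMemoryObjectStore` that, like the object-based model described here, only supports reading and writing whole objects. Status indicator files are stored as bitmaps under an assumed `<object key>.status` naming convention, with bit value 0 denoting a deleted record. The point of the sketch is the traffic pattern: only the small status indicator file is downloaded and re-uploaded, while the storage object is never transferred.

```python
from typing import Dict, Iterable


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store: whole-object get/put only."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.uploaded_bytes = 0  # running total of bytes sent by put()

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def put(self, key: str, data: bytes) -> None:
        self.uploaded_bytes += len(data)
        self.objects[key] = data


def status_key(object_key: str) -> str:
    return object_key + ".status"  # assumed naming convention


def mark_records_deleted(store: InMemoryObjectStore, object_key: str,
                         indexes: Iterable[int]) -> None:
    """Download the status bitmap, clear one bit per deleted record, upload it.

    The storage object itself is neither downloaded nor re-uploaded.
    """
    bits = bytearray(store.get(status_key(object_key)))
    for i in indexes:
        bits[i // 8] &= ~(1 << (i % 8)) & 0xFF  # bit value 0 = deleted
    store.put(status_key(object_key), bytes(bits))
```

Deleting two records from a 1 MB storage object with 16 records thus uploads 2 bytes instead of the whole object.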
After performing a subtask, a segment node 141 returns (230) the results of the subtask to the database engine 201. The results may include an indication of the successful performance of the subtask. For example, if the subtask is to delete a data record from a storage object 122, the results may include an indication that the data record is successfully deleted, even when the data record remains in the storage object 122.
FIG. 3 is a flowchart of an example process 300 of vacuuming a storage medium. The process 300 enables vacuuming a storage medium (e.g., the cloud storage medium 102 of FIG. 1) by merging one or more storage objects stored on the storage medium based on the data records in each storage object that are marked as deleted. Accordingly, vacuuming reduces the total number of storage objects stored on the storage medium. For convenience, the process 300 will be described as being performed by a system of one or more computers located in one or more locations. For example, a distributed computer storage system, e.g., the distributed computer storage system 100 of FIG. 1, appropriately programmed in accordance with this specification, can perform the process 300.
The system obtains (302) a vacuum aggressiveness strategy. A vacuum aggressiveness strategy indicates which storage objects on the storage medium are eligible for vacuum (i.e., will be merged to vacuum the storage medium). Examples of vacuum aggressiveness strategies include full, active, and lazy strategies explained below.
The system determines (304) the storage objects that are eligible for vacuum based on the vacuum aggressiveness strategy. The aggressiveness can have multiple levels.
For example, under the most aggressive vacuum aggressiveness strategy, designated as a “full” vacuum aggressiveness strategy, any “dirty” storage object is eligible for vacuum. A dirty storage object is a storage object whose corresponding status indicator file has at least one status indicator field that denotes that a corresponding data record is deleted.
In another example, under a less aggressive vacuum aggressiveness strategy designated as an “active” vacuum aggressiveness strategy, any storage object whose “deletion ratio” exceeds a threshold deletion ratio is eligible for vacuum. The deletion ratio of a storage object is the ratio of the number of status indicator fields denoting a deleted data record to the total number of status indicator fields in the status indicator file associated with the storage object.
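Written as a formula, with symbols introduced here for illustration: for a storage object $o$ whose status indicator file has $n_o$ status indicator fields, $d_o$ of which denote a deleted data record, and a threshold deletion ratio $\tau$,

```latex
r_o = \frac{d_o}{n_o}, \qquad
o \text{ is eligible under the ``active'' strategy} \iff r_o > \tau
```

For example, a storage object with 250 of its 1000 records marked deleted has $r_o = 0.25$ and is eligible when $\tau = 0.2$.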
In yet another example, under the least aggressive vacuum aggressiveness strategy, designated as a “lazy” vacuum aggressiveness strategy, storage objects are sorted in accordance with their weighted deletion ratios, and the top K storage objects in the sorted list, i.e., those having the highest weighted deletion ratios, are eligible for vacuum, where K is a predetermined natural number. The weighted deletion ratio of a storage object is the product of the deletion ratio of the storage object and a size indicator of the storage object, i.e., a measure of its size based on a count of the one or more data records in the storage object.
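The three strategies can be sketched as a single selection function. This is an illustration under stated assumptions: the per-object counts come from the status indicator files, the size indicator is assumed to be the raw record count (with which the weighted deletion ratio reduces to the number of deleted records; other size measures would also fit the description), and the lazy strategy here skips objects with no deleted records. The names `ObjectStats` and `eligible_for_vacuum`, and the default `threshold` and `k`, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectStats:
    key: str
    total_fields: int    # status indicator fields, one per data record
    deleted_fields: int  # fields that denote a deleted data record

    @property
    def deletion_ratio(self) -> float:
        return self.deleted_fields / self.total_fields if self.total_fields else 0.0

    @property
    def weighted_deletion_ratio(self) -> float:
        # Assumed size indicator: the raw record count.
        return self.deletion_ratio * self.total_fields


def eligible_for_vacuum(stats: List[ObjectStats], strategy: str,
                        threshold: float = 0.2, k: int = 1) -> List[str]:
    if strategy == "full":    # any dirty object
        return [s.key for s in stats if s.deleted_fields > 0]
    if strategy == "active":  # deletion ratio above the threshold
        return [s.key for s in stats if s.deletion_ratio > threshold]
    if strategy == "lazy":    # top k by weighted deletion ratio
        dirty = [s for s in stats if s.deleted_fields > 0]  # assumption: skip clean objects
        ranked = sorted(dirty, key=lambda s: (-s.weighted_deletion_ratio, s.key))
        return [s.key for s in ranked[:k]]
    raise ValueError(f"unknown vacuum strategy: {strategy}")
```

On sample data with objects 122A (250 of 1000 records deleted), 122B (5 of 10), 122C (0 of 500), and 122D (20 of 200), the full strategy selects 122A, 122B, and 122D; the active strategy with a threshold of 0.2 selects 122A and 122B; and the lazy strategy with K = 1 selects 122A, because it has far more deleted records than the small 122B despite a lower deletion ratio.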
The system vacuums (306) the computer storage medium. Vacuuming the computer storage medium includes merging the one or more storage objects eligible for vacuum based on the vacuum aggressiveness strategy.
In some implementations, the system creates a merge target storage object for each group of one or more storage objects eligible for vacuum. For each group, the system merges into the group's merge target storage object the data records of the group's storage objects that are not marked as deleted in the corresponding status indicator files. In some implementations, the merging for each group is performed in parallel with the merging for other groups to obtain efficiency gains from parallelization.
In some implementations, after performing the merge of storage objects in merge target storage objects, the storage objects eligible for vacuum and their corresponding status indicator files are deleted.
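A sketch of the merge step over an in-memory model, assuming each storage object is a list of records and each status indicator file is a list of booleans where `True` marks a deleted record. Groups are merged in parallel with a thread pool, and the vacuumed objects and their status indicator files are removed only after every merge target exists. The function names and the `groups` layout (merge target key mapped to the keys it absorbs) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Records = List[str]
DeletedFlags = List[bool]  # True = record marked deleted (assumed encoding)


def merge_group(members: Sequence[Tuple[Records, DeletedFlags]]) -> Records:
    """Collect the active records of every storage object in one group."""
    return [record
            for records, flags in members
            for record, deleted in zip(records, flags)
            if not deleted]


def vacuum(objects: Dict[str, Records], status: Dict[str, DeletedFlags],
           groups: Dict[str, List[str]]) -> None:
    """groups maps each merge target key to the eligible objects it absorbs."""
    with ThreadPoolExecutor() as pool:
        futures = {target: pool.submit(merge_group, [(objects[k], status[k]) for k in keys])
                   for target, keys in groups.items()}
        merged = {target: f.result() for target, f in futures.items()}
    # Write every merge target (with a fresh, all-active status file) first...
    for target, records in merged.items():
        objects[target] = records
        status[target] = [False] * len(records)
    # ...and only then delete the vacuumed objects and their status files.
    for keys in groups.values():
        for key in keys:
            del objects[key]
            del status[key]
```

With purely in-memory data the thread pool gains little; the parallelism pays off when each merge involves downloading and uploading objects over the network.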
FIG. 4 is a flowchart of an example process 400 of processing requests for deleting data records in an object-based cloud storage system. The process 400 can be performed by a system of one or more server computers configured to provide a cloud-based data storage service. For convenience, the process 400 will be described as being performed by a system of one or more computers located in one or more locations. For example, a distributed computer storage system, e.g., the distributed computer storage system 100 of FIG. 1, appropriately programmed in accordance with this specification, can perform the process 400.
The system receives (402) a request to delete a target data record from a non-transitory computer storage medium. The non-transitory computer storage medium stores one or more storage objects. Each storage object is a data object that conforms to an object-based cloud storage model and includes one or more data records. The non-transitory computer storage medium also stores a status indicator file for each storage object. Each status indicator file for a storage object includes a status indicator field for each data record in the storage object.
In some implementations, the target data record is a row in a database table. In some implementations, each status indicator file has a bitmap format. In those implementations, each status indicator field is a bit. The value of the bit denotes whether a data record is deleted.
In some implementations, the object-based cloud storage model prevents direct modification of content in each of the one or more storage objects stored on the non-transitory computer storage medium.
The system identifies (404) a target storage object of the one or more storage objects that includes the target data record. The system identifies (406) a target status indicator file associated with the target storage object.
The system modifies (408) a status indicator field of the target status indicator file that is associated with the target data record while the system maintains the target data record in the target storage object.
In some implementations, the system downloads the target status indicator file from the non-transitory computer storage medium to the one or more server computers. The system then modifies the downloaded target status indicator file by changing the status indicator field associated with the target data record in the downloaded target status indicator file from a first value to a second value. The first value denotes that the target data record is not deleted, while the second value denotes that the target data record is deleted. The system then uploads the modified target status indicator file from the one or more server computers to the non-transitory computer storage medium.
The system receives (410) a request to access the one or more data records in the target storage object. The system prevents (412) the target data record from being accessed in accordance with the modified status indicator field associated with the target data record.
In some implementations, the system prevents the target storage object from being downloaded from the non-transitory computer storage medium to the one or more server computers. The system also prevents the target storage object from being uploaded from the one or more server computers to the non-transitory computer storage medium in response to the request to delete.
In some implementations, the system downloads the target storage object and the target status indicator file from the non-transitory computer storage medium to the one or more server computers. For each requested data record in the target storage object, the system determines whether the corresponding status indicator field denotes that the data record is deleted. The system prevents the data record from being accessed if the corresponding status indicator field denotes that the data record is deleted.
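The read-side check can be sketched as a filter applied after downloading a storage object and its status indicator file. The sketch assumes a bitmap status indicator file with one bit per record, least-significant bit first, where 0 denotes a deleted record; `is_deleted` and `read_records` are hypothetical helpers, not an API of any storage service.

```python
from typing import List, Optional, Sequence


def is_deleted(status_bits: bytes, index: int) -> bool:
    # Assumed layout: bit `index % 8` of byte `index // 8`; 0 means deleted.
    return not (status_bits[index // 8] >> (index % 8)) & 1


def read_records(records: Sequence[str], status_bits: bytes,
                 requested: Optional[Sequence[int]] = None) -> List[str]:
    """Return the requested records, withholding any marked deleted."""
    indexes = range(len(records)) if requested is None else requested
    return [records[i] for i in indexes if not is_deleted(status_bits, i)]
```

For instance, with the status byte `0xfa` (bits 0 and 2 cleared), records 0 and 2 remain physically present in the storage object but are never returned to the requester.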
Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively, or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can also be or further include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A computer program, which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for the execution of a computer program include, by way of example, general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user’s device in response to requests received from the web browser.
Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data, e.g., an HTML page, to a user device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the user device, which acts as a client. Data generated at the user device, e.g., a result of the user interaction, can be received from the user device at the server.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.
What is claimed is:

Claims (20)

  1. A computer-implemented method comprising:
    receiving a request to delete a target data record from a non-transitory computer storage medium, wherein:
    the non-transitory computer storage medium stores one or more storage objects, each of the one or more storage objects being a data object conforming to an object-based cloud storage model and including one or more data records,
    the non-transitory computer storage medium further stores, for each of the one or more storage objects, a corresponding status indicator file, and
    each corresponding status indicator file includes, for each of the one or more data records in the respective storage object, a respective status indicator field;
    identifying a target storage object of the one or more storage objects that includes the target data record;
    identifying a corresponding target status indicator file associated with the target storage object;
    modifying a corresponding status indicator field of the corresponding target status indicator file associated with the target data record while maintaining the target data record in the target storage object;
    receiving a request to access the one or more data records in the target storage object; and
    preventing the target data record from being accessed in accordance with the modified status indicator field associated with the target data record, wherein the method is performed by one or more server computers configured to provide a cloud-based data storage service.
  2. The computer-implemented method of claim 1, wherein the object-based cloud storage model prevents direct modification of content in each of the one or more storage objects stored on the non-transitory computer storage medium.
  3. The computer-implemented method of claim 1, further comprising:
    preventing the target storage object from being downloaded from the non-transitory computer storage medium to the one or more server computers and being uploaded from the one or more server computers to the non-transitory computer storage medium in response to the request to delete.
  4. The computer-implemented method of claim 1, wherein the target data record is a row in a database table.
  5. The computer-implemented method of claim 1, wherein each status indicator file has a bitmap format.
  6. The computer-implemented method of claim 1, wherein modifying the corresponding status indicator field of the respective target status indicator file associated with the target data record comprises:
    downloading the target status indicator file from the non-transitory computer storage medium to the one or more server computers;
    modifying the downloaded target status indicator file by changing the corresponding status indicator field associated with the target data record in the downloaded target status indicator file from a first value to a second value, wherein the first value denotes that the corresponding status indicator field is not deleted and the second value denotes that the corresponding status indicator field is deleted; and
    uploading the modified target status indicator file from the one or more server computers to the non-transitory computer storage medium.
  7. The computer-implemented method of claim 1, wherein preventing the target data record from being accessed comprises, in response to receiving the request to access the one or more data records in the target storage object:
    downloading the target storage object and the target status indicator file from the non-transitory computer storage medium to the one or more server computers;
    for each requested data record in the target storage object, determining whether the corresponding status indicator field denotes that the data record is deleted; and
    preventing the data record from being accessed if the corresponding status indicator field denotes that the data record is deleted.
  8. The computer-implemented method of claim 1, further comprising:
    obtaining a vacuum aggressiveness strategy, wherein vacuuming the non-transitory computer storage medium includes merging active data records of one or more eligible storage objects into one or more merge target storage objects;
    determining, from the one or more storage objects stored on the non-transitory computer storage medium, one or more storage objects that are eligible for vacuum based on the vacuum aggressiveness strategy; and
    vacuuming the non-transitory computer storage medium based on the vacuum aggressiveness strategy including merging the one or more storage objects that are eligible for vacuum.
  9. The computer-implemented method of claim 8, wherein the vacuum aggressiveness strategy is a full strategy, wherein the full strategy instructs that any of the one or more storage objects whose corresponding status indicator file has at least one status indicator field that denotes that a corresponding data record is deleted is eligible for vacuum.
  10. The computer-implemented method of claim 8, wherein:
    the vacuum aggressiveness strategy is an active strategy, wherein the active strategy instructs that a storage object of the one or more storage objects whose corresponding deletion ratio exceeds a threshold deletion ratio is eligible for vacuum;
    the method further comprises:
    generating, for each storage object stored on the non-transitory computer storage medium, the corresponding deletion ratio based on each status indicator field associated with each corresponding data record in the corresponding status indicator file associated with the respective storage object, and
    identifying the threshold deletion ratio; and
    determining the one or more storage objects that are eligible for vacuum comprises determining that any of the one or more storage objects stored on the non-transitory computer storage medium whose respective deletion ratio exceeds the threshold deletion ratio are eligible for vacuum.
  11. The computer-implemented method of claim 8, wherein:
    the vacuum aggressiveness strategy is a lazy strategy, wherein the lazy strategy instructs that a predetermined number of the one or more storage objects having a highest corresponding deletion ratio among a list of the one or more storage objects sorted by corresponding weighted deletion ratios are eligible for vacuum;
    the method further comprises, for each storage object stored on the non-transitory computer storage medium,
    generating the respective deletion ratio based on each status indicator field associated with each corresponding data record in the corresponding status indicator file associated with the respective storage object,
    generating a respective size indicator based on a count of the one or more data records in the respective storage object, and
    generating a respective weighted deletion ratio based on the respective deletion ratio associated with the respective storage object and the respective size indicator associated with the respective storage object; and
    determining the one or more storage objects that are eligible for vacuum is performed based on each weighted deletion ratio associated with each storage object.
  12. The computer-implemented method of claim 11, further comprising:
    ranking the one or more storage objects stored on the non-transitory computer storage medium based on each respective weighted deletion ratio associated with each storage object.
  13. The computer-implemented method of claim 8, wherein merging the one or more storage objects that are eligible for vacuum comprises:
    creating the one or more merge target storage objects, each of the one or more merge target storage objects being associated with a corresponding group of one or more storage objects that are eligible for vacuum;
    for each storage object of each group of one or more storage objects that are eligible for vacuum, determining one or more active data records in the respective storage object that are not deleted; and
    merging each of the one or more active data records in each storage object of each group of one or more storage objects that are eligible for vacuum into the respective merge target storage object for the group.
  14. The computer-implemented method of claim 13, wherein each merging is performed in parallel with other merging operations.
  15. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
    receiving a request to delete a target data record from a non-transitory computer storage medium, wherein:
    the non-transitory computer storage medium stores one or more storage objects, each of the one or more storage objects being a data object conforming to an object-based cloud storage model and including one or more data records,
    the non-transitory computer storage medium further stores, for each of the one or more storage objects, a corresponding status indicator file, and
    each corresponding status indicator file includes, for each of the one or more data records in the respective storage object, a respective status indicator field;
    identifying a target storage object of the one or more storage objects that includes the target data record;
    identifying a corresponding target status indicator file associated with the target storage object;
    modifying a corresponding status indicator field of the corresponding target status indicator file associated with the target data record while maintaining the target data record in the target storage object;
    receiving a request to access the one or more data records in the target storage object; and
    preventing the target data record from being accessed in accordance with the modified status indicator field associated with the target data record.
  16. The system of claim 15, the operations further comprising:
    preventing the target storage object from being downloaded from the non-transitory computer storage medium to the one or more server computers and being uploaded from the one or more server computers to the non-transitory computer storage medium in response to the request to delete.
  17. The system of claim 15, wherein modifying the corresponding status indicator field of the respective target status indicator file associated with the target data record comprises:
    downloading the target status indicator file from the non-transitory computer storage medium to the one or more server computers;
    modifying the downloaded target status indicator file by changing the corresponding status indicator field associated with the target data record in the downloaded target status indicator file from a first value to a second value, wherein the first value denotes that the corresponding status indicator field is not deleted and the second value denotes that the corresponding status indicator field is deleted; and
    uploading the modified target status indicator file from the one or more server computers to the non-transitory computer storage medium.
  18. The system of claim 15, the operations further comprising:
    obtaining a vacuum aggressiveness strategy, wherein vacuuming the non-transitory computer storage medium includes merging active data records of one or more eligible storage objects into one or more merge target storage objects;
    determining, from the one or more storage objects stored on the non-transitory computer storage medium, one or more storage objects that are eligible for vacuum based on the vacuum aggressiveness strategy; and
    vacuuming the non-transitory computer storage medium based on the vacuum aggressiveness strategy including merging the one or more storage objects that are eligible for vacuum.
  19. A non-transitory computer storage medium encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising:
    receiving a request to delete a target data record from a non-transitory computer storage medium, wherein:
    the non-transitory computer storage medium stores one or more storage objects, each of the one or more storage objects being a data object conforming to an object-based cloud storage model and including one or more data records,
    the computer storage medium further stores, for each of the one or more storage objects, a corresponding status indicator file, and
    each corresponding status indicator file includes, for each of the one or more data records in the respective storage object, a respective status indicator field;
    identifying a target storage object of the one or more storage objects that includes the target data record;
    identifying a corresponding target status indicator file associated with the target storage object;
    modifying a corresponding status indicator field of the corresponding target status indicator file associated with the target data record while maintaining the target data record in the target storage object;
    receiving a request to access the one or more data records in the target storage object; and
    preventing the target data record from being accessed in accordance with the modified status indicator field associated with the target data record.
  20. The computer storage medium of claim 19, the operations further comprising:
    obtaining a vacuum aggressiveness strategy, wherein vacuuming the non-transitory computer storage medium includes merging active data records of one or more eligible storage objects into one or more merge target storage objects;
    determining, from the one or more storage objects stored on the non-transitory computer storage medium, one or more storage objects that are eligible for vacuum based on the vacuum aggressiveness strategy; and
    vacuuming the non-transitory computer storage medium based on the vacuum aggressiveness strategy including merging the one or more storage objects that are eligible for vacuum.
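The soft-delete mechanism of claims 1, 5, 6 and 7 can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory object store that only supports whole-object `put`/`get` (mirroring the "no direct modification" property of claim 2), a hypothetical `.status` key suffix for the status indicator file, JSON-encoded rows, and a bitmap in which bit `i` set to 1 means record `i` is deleted. None of these names or encodings come from the patent; they are illustrative assumptions only.

```python
import json


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store: whole objects can be
    put or fetched, but never modified in place."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]


def status_key(object_key):
    # Hypothetical naming convention for the status indicator file.
    return object_key + ".status"


def write_storage_object(store, object_key, rows):
    """Upload a storage object plus an all-zero bitmap (no record deleted)."""
    store.put(object_key, json.dumps(rows).encode("utf-8"))
    store.put(status_key(object_key), bytes((len(rows) + 7) // 8))


def delete_record(store, object_key, row_index):
    """Soft delete: only the small status file is downloaded, changed and
    re-uploaded; the storage object holding the record is left untouched."""
    bitmap = bytearray(store.get(status_key(object_key)))  # download
    bitmap[row_index // 8] |= 1 << (row_index % 8)         # bit 0 -> 1 (deleted)
    store.put(status_key(object_key), bitmap)              # upload


def is_deleted(bitmap, row_index):
    return bool((bitmap[row_index // 8] >> (row_index % 8)) & 1)


def read_active_records(store, object_key):
    """Download the object and its status file; hide rows marked deleted."""
    rows = json.loads(store.get(object_key).decode("utf-8"))
    bitmap = store.get(status_key(object_key))
    return [row for i, row in enumerate(rows) if not is_deleted(bitmap, i)]
```

For example, writing three rows, deleting row 1 and then reading returns only rows 0 and 2, while the bytes of the stored object itself are unchanged. A delete therefore costs one round trip on a bitmap of roughly one bit per record instead of rewriting the whole storage object.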
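Claims 8 to 14 describe selecting storage objects for vacuum under a full, active or lazy strategy and then merging their surviving records, possibly in parallel. The Python sketch below models each storage object in memory as a pair of (rows, deleted flags). The log2-based weighting used for the lazy strategy and the fixed-size grouping of eligible objects are hypothetical choices, since the claims name a weighted deletion ratio and groups but define neither. A real vacuum would also upload each merge target with a fresh status indicator file and remove the source objects; that step is omitted here.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def deletion_ratio(flags):
    """Fraction of records whose status indicator marks them deleted."""
    return sum(flags) / len(flags) if flags else 0.0


def eligible_full(objects):
    # Full strategy: any object with at least one deleted record.
    return [key for key, (_, flags) in objects.items() if any(flags)]


def eligible_active(objects, threshold):
    # Active strategy: deletion ratio strictly above the threshold.
    return [key for key, (_, flags) in objects.items()
            if deletion_ratio(flags) > threshold]


def weighted_deletion_ratio(flags):
    # ASSUMPTION: the claims give no formula; this hypothetical weighting
    # scales the deletion ratio by log2(1 + record count).
    return deletion_ratio(flags) * math.log2(1 + len(flags))


def eligible_lazy(objects, top_n):
    # Lazy strategy: rank by weighted deletion ratio, keep the top_n.
    ranked = sorted(objects, key=lambda k: weighted_deletion_ratio(objects[k][1]),
                    reverse=True)
    # Extra guard (not from the claims): skip objects with nothing deleted.
    return [key for key in ranked[:top_n] if any(objects[key][1])]


def merge_group(objects, group):
    """Collect the active (not deleted) records of a group into one merge target."""
    merged = []
    for key in group:
        rows, flags = objects[key]
        merged.extend(row for row, deleted in zip(rows, flags) if not deleted)
    return merged


def vacuum(objects, eligible, group_size=2):
    # ASSUMPTION: eligible objects are grouped in fixed-size chunks.
    groups = [eligible[i:i + group_size] for i in range(0, len(eligible), group_size)]
    # Groups are independent, so their merges can run in parallel.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda group: merge_group(objects, group), groups))
```

With the active strategy, raising the threshold trades reclaimed space for fewer merge operations; the lazy strategy instead caps the number of objects vacuumed per run regardless of how many have deletions.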
PCT/CN2017/000154 2017-02-13 2017-02-13 Managing data records in object-based cloud storage systems WO2018145227A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/000154 WO2018145227A1 (en) 2017-02-13 2017-02-13 Managing data records in object-based cloud storage systems

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/000154 WO2018145227A1 (en) 2017-02-13 2017-02-13 Managing data records in object-based cloud storage systems

Publications (1)

Publication Number Publication Date
WO2018145227A1 (en) 2018-08-16

Family

ID=63106910

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/000154 WO2018145227A1 (en) 2017-02-13 2017-02-13 Managing data records in object-based cloud storage systems

Country Status (1)

Country Link
WO (1) WO2018145227A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111966867A (en) * 2020-08-18 2020-11-20 北京金山云网络技术有限公司 Object deleting method, data processing method and device
US11848764B1 (en) 2023-01-20 2023-12-19 International Business Machines Corporation Ordered object processing in cloud object storage

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130097275A1 (en) * 2011-10-14 2013-04-18 Verizon Patent And Licensing Inc. Cloud-based storage deprovisioning
CN103491124A (en) * 2012-06-14 2014-01-01 中兴通讯股份有限公司 Method for processing multimedia message data and distributed cache system
CN103747103A (en) * 2014-01-24 2014-04-23 沈文策 Data processing method and device based on cloud storage system
US20140149794A1 (en) * 2011-12-07 2014-05-29 Sachin Shetty System and method of implementing an object storage infrastructure for cloud-based services
CN104184812A (en) * 2014-08-20 2014-12-03 四川九成信息技术有限公司 Multi-point data transmission method based on private cloud

Similar Documents

Publication Publication Date Title
US11556501B2 (en) Determining differences between two versions of a file directory tree structure
US10204133B2 (en) Optimizing update operations in in-memory database systems
AU2016405587B2 (en) Splitting and moving ranges in a distributed system
US10671606B2 (en) Materialized query tables with shared data
US8214388B2 (en) System and method for adding a storage server in a distributed column chunk data store
US20190121901A1 (en) Database Sharding
US20140304242A1 (en) Storage system for eliminating duplicated data
US10310748B2 (en) Determining data locality in a distributed system using aggregation of locality summaries
US11580148B2 (en) Document storage and management
WO2018145227A1 (en) Managing data records in object-based cloud storage systems
CN111475279B (en) System and method for intelligent data load balancing for backup
CN113127438B (en) Method, apparatus, server and medium for storing data
US11455309B2 (en) Partition key adjustment based on query workload
US20210248162A1 (en) Parallel data transfer from one database to another database
US11394780B2 (en) System and method for facilitating deduplication of operations to be performed
US20240126786A1 (en) Partitioning data in a versioned database
US11816088B2 (en) Method and system for managing cross data source data access requests
US11907195B2 (en) Relationship analysis using vector representations of database tables
US20240004837A1 (en) Deleting data in a versioned database
US11748002B2 (en) Highly concurrent data store allocation
US11573960B2 (en) Application-based query transformations
US20240303249A1 (en) Automatic data federation/replication toggle
US20160154796A1 (en) Method, system and apparatus for dynamically controlling web-based media galleries

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17895972

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17895972

Country of ref document: EP

Kind code of ref document: A1