US20200387511A1 - Architecture of hybrid in-memory and paged dictionary - Google Patents

Architecture of hybrid in-memory and paged dictionary

Info

Publication number
US20200387511A1
Authority
US
United States
Prior art keywords
value
dictionary
page
different
computing devices
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/931,063
Inventor
Reza Sherkat
Colin FLORENDO
Chaitanya Gottipati
Bernhard Scheirle
Carsten Thiel
Prasanta Ghosh
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SAP SE
Original Assignee
SAP SE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SAP SE filed Critical SAP SE
Priority to US15/931,063 priority Critical patent/US20200387511A1/en
Assigned to SAP SE reassignment SAP SE ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GHOSH, PRASANTA, GOTTIPATI, CHAITANYA, FLORENDO, COLIN, SHERKAT, REZA, THIEL, CARSTEN, SCHEIRLE, BERNHARD
Publication of US20200387511A1 publication Critical patent/US20200387511A1/en

Classifications

    • G06F16/221 Column-oriented storage; Management thereof
    • G06F16/2379 Updates performed during online database operations; commit processing
    • G06F12/0223 User address space allocation, e.g. contiguous or non-contiguous base addressing
    • G06F12/0246 Memory management in block erasable non-volatile memory, e.g. flash memory
    • G06F12/0875 Caches with dedicated cache, e.g. instruction or stack
    • G06F12/0882 Cache access modes; Page mode
    • G06F16/2219 Large Object storage; Management thereof
    • G06F16/2228 Indexing structures
    • G06F16/2237 Vectors, bitmaps or matrices
    • G06F16/2255 Hash tables
    • G06F16/2272 Management of indexing structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/24552 Database cache management
    • G06F16/24558 Binary matching operations
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/064 Management of blocks
    • G06F3/0673 Single storage device
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F2212/1024 Latency reduction
    • G06F2212/1044 Space efficiency improvement

Definitions

  • Hybrid in-memory and paged storage configurations allow storage of data in dynamic storage devices as well as persistent storage.
  • providing on-demand access to such data requires loading the data stored in persistent storage into in-memory storage. This can leave a large memory footprint and can be burdensome on a computing system.
  • database columns can store compressed values rather than the entire value of data.
  • a dictionary corresponding to a database column can store the entire value.
  • the compressed value can act as a value identifier in the dictionary.
  • Conventional systems load the entire dictionary into memory when attempting to retrieve the value corresponding to a value ID. But this can be burdensome on a computing system and waste operational resources, such as memory.
  • FIG. 1 is a block diagram of an architecture of in-memory and paged dictionary, according to some embodiments.
  • FIG. 2 is a block diagram of an example helper vector, in-memory dictionary, and paged dictionary, according to some embodiments.
  • FIG. 3 is a flowchart illustrating a process for loading a page of a dictionary for executing a query, according to some embodiments.
  • FIG. 4 is a flowchart illustrating a process for retrieving a value from a dictionary using a value ID, according to some embodiments.
  • FIG. 5 is a flowchart illustrating a process for generating a new helper vector and value ID index, according to some embodiments.
  • FIG. 6 is an example computer system useful for implementing various embodiments.
  • a server receives a query to be executed.
  • the query includes a value for executing the query.
  • the server executes a query on a dictionary to retrieve a value ID corresponding to the value.
  • the dictionary includes pages including value IDs and corresponding values.
  • the server executes a binary search on a helper vector of the dictionary based on the value.
  • the helper vector includes the last value for each page of a dictionary.
  • the server identifies a page of the dictionary including the value.
  • the server loads the page into temporary memory and retrieves the value ID of the value from the page.
  • the server executes the query on a column using the value ID.
  • This configuration provides for loading the relevant page which includes the requested value rather than an entire dictionary into memory (in some embodiments, plural relevant pages are loaded). This solves the technical problem of reducing the burden on the computing system. Furthermore, this reduces the memory footprint and improves operational efficiency.
  • the described architecture also provides a unified persistency format and the ability to use transient paged and in-memory structures.
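The flow summarized above can be sketched end to end, from the binary search on the helper vector through the final column scan. All names and the data layout here are illustrative assumptions, not the patent's implementation:

```python
from bisect import bisect_left

def execute_query(value, helper_vector, load_page, column):
    """Find the value ID for `value` by loading one dictionary page,
    then execute the query against the compressed column."""
    # 1. Binary-search the sorted per-page last values to pick the page.
    page_no = bisect_left(helper_vector, value)
    # 2. Load only that page; here a page maps value -> value ID.
    page = load_page(page_no)
    value_id = page[value]
    # 3. Scan the column of value IDs for matching rows.
    return [row for row, vid in enumerate(column) if vid == value_id]
```

With three values per page as in the examples below, a query for 2.50 loads only the first page and then scans the column for value ID 2, rather than loading the whole dictionary.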
  • FIG. 1 is a block diagram of an architecture of in-memory and paged dictionary, according to some embodiments.
  • the architecture can include a server 100 , database 128 , and client device 130 .
  • Server 100 can be in communication with database 128 and client device 130 .
  • Server 100 , database 128 , and client device 130 can be connected through wired connections, wireless connections, or a combination of wired and wireless connections.
  • the network can be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless wide area network (WWAN), a metropolitan area network (MAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a cellular telephone network, a wireless network, a WiFi network, a WiMax network, any other type of network, or a combination of two or more such networks.
  • Server 100 can include a database management system (DMS) 102 , temporary memory 104 , main memory 108 , and secondary memory 123 .
  • main memory 108 can be Random Access Memory (RAM) or any other type of dynamic memory.
  • Secondary memory 123 can be persistent memory such as a non-volatile storage device.
  • Temporary memory 104 can be a buffer or cache memory. The temporary memory 104 may reside within the main memory 108 .
  • Each of the main memory 108, secondary memory 123, and temporary memory 104 can be configured to store partial or entire copies of a dictionary corresponding to a column of the database 128.
  • the column of the database 128 can be configured to store data including compressed versions of certain values (i.e., a value ID) rather than the entire value.
  • the dictionary can store the value ID and the corresponding entire value.
  • the dictionary can store the value and the value ID may be inferred based on the position of the value in the dictionary.
  • the dictionary can be used to correlate and retrieve values of different value IDs so that these value IDs can be looked up in the column.
  • the dictionary can include multiple pages of value IDs and values.
  • An example dictionary 116 can be stored in secondary memory 123 .
  • Dictionary 116 can include pages 106, 118, and 120.
  • an example dictionary 124 can be stored in main memory 108 .
  • Dictionary 124 can store a copy of pages 106-a, 118-a, and 120-a, which are copies of pages 106, 118, and 120, stored separately or in contiguous memory space.
  • dictionary 124 can be an in-memory version of dictionary 116 .
  • main memory 108 can be dynamic memory.
  • dictionary 124 may be an in-memory dictionary which offers in-memory processing.
  • Secondary memory 123 may be a persistent storage device.
  • dictionary 116 may be a paged dictionary which offers paged processing. This may be referred to as a hybrid storage configuration or hybrid column store.
  • This hybrid storage configuration offers in-memory processing for performance critical operations and buffer-managed paged processing for less critical, economical data access.
  • This hybrid capability extends up from a unified persistence format which can be used to load paged or in-memory primitive structures, which in turn form paged or in-memory column store structures (data vectors, dictionaries 116 or 124 , and indexes) that are ultimately arranged by the hybrid storage configuration or hybrid column store according to each column's load unit configuration.
  • Hybrid columns can be configured to load all in-memory structures, all paged structures, or a mixture of both.
  • a user can control which portions of the dictionary are stored in main memory 108 or secondary memory 123 , in whichever configuration necessary.
  • a user can use client device 130 to manage the storage of the dictionaries 116 and 124 in main memory 108 and secondary memory 123 .
  • the storage of the dictionaries 116 and 124 can be automatically configured based on workload, data access algorithms, and intelligent algorithms.
  • Dictionary 124 can be referred to as an in-memory dictionary and dictionary 116 can be a paged dictionary.
  • an entire copy of dictionary 116 can be stored in main memory 108 .
  • a copy of a portion of dictionary 116 can be stored in main memory 108 .
  • different portions of a dictionary can be stored in main memory 108 and secondary memory 123 .
  • columns of database 128 can also be stored in a similar configuration as described above.
  • the columns of database 128 can be stored in secondary memory 123 and an entire copy of columns can be stored in main memory 108 .
  • the columns of database 128 can be stored in secondary memory 123 and a copy of a portion of the columns of database 128 can be stored in main memory 108 .
  • different portions of the columns can be stored in main memory 108 and secondary memory 123 .
  • Dictionary 116 can be a multi-page vector.
  • the multi-page vector can be a type of paged primitive that provides a uniform interface with in-memory counterparts to read-write algorithms. This allows the codebase to seamlessly operate on either format with minimal adaptation, hiding the details of operating on data in native store extension (NSE). Furthermore, paged primitives can be enriched with auxiliary and performant search structures and indexes.
  • a multi-page vector is a large vector that can be stored on a single page chain with fixed-size pages. Having a fixed number of objects per page simplifies identifying the page that has the content for a given vector position.
  • the multi-page vector is configured to store more than one vector on each page chain, with each vector having its own metadata. Once a vector is sorted, each multi-page vector can be extended with a helper structure (e.g., helper vector) to facilitate search and to avoid loading pages that are guaranteed not to contain a value satisfying the search.
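Because a multi-page vector keeps a fixed number of objects per page, the page holding a given vector position can be found with simple integer arithmetic; a minimal sketch (the function name is illustrative):

```python
def locate(position, objects_per_page):
    """Map a 0-based vector position to (page number, offset within page)."""
    return position // objects_per_page, position % objects_per_page

# With 3 objects per page, position 4 lies on page 1 at offset 1.
```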
  • Dictionary 116 can include a value ID index 126 .
  • the value ID index 126 can be a data structure that stores the last value ID of each page of the dictionary 116 for data types of variable length.
  • dictionary 124 may not include a value ID index.
  • the value ID can act as the identifier of the value.
  • the columns of database 128 can store a value ID representing the value, while the dictionaries can store both the value ID and value.
  • Each page of dictionary 116 and dictionary 124 can store value IDs and the corresponding values.
  • each page of dictionary 116 can include a fixed number of value IDs and values.
  • page 106, page 118, and page 120 can each include three value IDs and their corresponding values.
  • Dictionary 116 and dictionary 124 can be used to retrieve a given value ID so that the value ID can be used to execute a query on a given column.
  • the secondary memory 123 can also include a helper vector 122 .
  • Helper vector can also be a paged primitive.
  • helper vector 122 can be an arbitrary-sized item. For an arbitrary-sized item, a block is fully stored on a single page. The number of blocks stored on a page depends on block sizes. For data items larger than the page size, the data is internally divided into multiple blocks.
  • Helper vector 122 can also be referred to as a compact memory resident sorted vector. Helper vector 122 is sorted and stores the last value of each page stored in dictionary 116 . Helper vector 122 can be used to quickly identify a particular page including a given value.
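Since the dictionary pages are sorted, the helper vector is simply the last value of each page collected in order; a hypothetical sketch:

```python
def build_helper_vector(pages):
    """pages: the sorted pages of a sorted dictionary, each a list of values."""
    return [page[-1] for page in pages]

pages = [[1.56, 2.50, 3.14], [4.98, 6.11, 6.50], [6.70, 8.88, 10.88]]
helper_vector = build_helper_vector(pages)  # [3.14, 6.50, 10.88]
```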
  • DMS 102 can include a wrapper 110 , unification engine 112 , paging engine 114 , and query engine 115 .
  • DMS 102 can receive and process query requests by retrieving the value ID for a given value (or the value for a given value ID) from a given page of a dictionary and executing the query.
  • Wrapper 110 can be configured to determine whether the dictionary to be queried is stored in main memory 108 or secondary memory 123 .
  • Unification engine 112 can generate a new helper vector and value ID index in response to a save operation of the dictionary from main memory 108 to secondary memory 123.
  • Paging engine 114 can be configured to load a particular page of dictionary 116 into temporary memory 104 .
  • Query engine 115 can be configured to execute queries against a given column.
  • DMS 102 can receive a request to execute a query including a given value, 2.50.
  • the query may be received from client device 130 .
  • the request can be to retrieve all plants which are 2.50 inches tall.
  • Wrapper 110 can determine whether to search dictionary 116 or dictionary 124 for the value ID corresponding to 2.50.
  • the request can indicate whether to query the in-memory dictionary (e.g., dictionary 124 ) or paged dictionary (e.g., dictionary 116 ).
  • wrapper 110 can determine that dictionary 116 is to be queried.
  • Paging engine 114 can execute a binary search on helper vector 122 to identify the particular page of dictionary 116 on which the value, 2.50, is located. To execute the binary search, paging engine 114 compares the target value (2.50) to the middle value of the helper vector. If they are not equal, the half in which the target value cannot lie is eliminated and the search continues on the remaining half. Paging engine 114 repeats this process, taking the middle element to compare to the target value, until the target value is found or a single value remains. As described above, helper vector 122 stores the last value of each page in dictionary 116 and the corresponding page. In the event a single value remains, paging engine 114 can determine that 2.50 is on the page corresponding to the remaining value. In this example, paging engine 114 can determine that 2.50 is included in page 106.
  • Paging engine 114 can load page 106 from secondary memory 123 to temporary memory 104 (which can be from the buffer cache) and retrieve the value ID, 2, for the value 2.50. By doing so, the paging engine 114 avoids having to load the entire dictionary 116 into memory. This reduces the memory footprint.
  • Query engine 115 can execute the query for retrieving all plants that are 2.50 inches, using the value ID, 2.
  • DMS 102 can receive a request to materialize a value corresponding to the value identifier observed in a data vector at the desired row position.
  • the request can include the value ID. Since each page includes a fixed number of values, the paging engine 114 can identify the particular page that includes the value based on the value ID included in the request and the fixed number of values per page.
  • the dictionary 116 can include values that are variable size data types (e.g., VARCHAR). Each page can include a different number of value IDs and corresponding values.
  • the value ID index 126 can store the last value ID of each page.
  • when DMS 102 receives a request to materialize a value corresponding to the value ID observed in a data vector at the desired row position, the paging engine 114 can identify the particular page that includes the value based on the value ID index 126 and the value ID included in the request. For example, the paging engine 114 may execute a binary search on the value ID index 126 using the value identifier to identify the page that includes the value.
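Both materialization paths can be sketched: with fixed-size values, arithmetic on the value ID locates the page directly; with variable-size values, a binary search on the value ID index does. Helper names here are assumptions for illustration:

```python
from bisect import bisect_left

def page_for_value_id_fixed(value_id, values_per_page):
    """Value IDs start at 1, so integer division locates the page."""
    return (value_id - 1) // values_per_page

def page_for_value_id_variable(value_id, value_id_index):
    """value_id_index holds the last value ID stored on each page."""
    return bisect_left(value_id_index, value_id)
```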
  • wrapper 110 can receive a request to execute a query from an upper layer.
  • the upper layer may be an API, which is agnostic to the configuration dictionary 116 and dictionary 124 . That is, the upper layer may query the dictionary 116 or 124 without distinguishing the dictionary 116 or 124 .
  • the lower layer includes services or APIs that are configured to distinguish dictionary 116 or 124 when querying either dictionary. In an example, these requests may specify a particular dictionary to query.
  • the wrapper 110 can determine which dictionary to query. This allows the upper layer to make a single API call to a dictionary and the wrapper 110 can identify which implementation of the dictionary, in-memory (e.g., dictionary 124 ) or paged (e.g., dictionary 116 ) to query.
  • wrapper 110 can allow a query for a value that matches a certain pattern rather than a specific value. Furthermore, the wrapper 110 allows such algorithms to be implemented and maintained once, independent of the storage location (in-memory vs paged).
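The wrapper's role can be sketched as a thin dispatch layer, so the upper layer issues a single call without knowing which implementation backs the dictionary. The class and method names are illustrative assumptions:

```python
class DictionaryWrapper:
    """Routes dictionary lookups to the in-memory or paged implementation."""

    def __init__(self, in_memory_dict, paged_dict, use_paged=False):
        self._in_memory = in_memory_dict
        self._paged = paged_dict
        self._use_paged = use_paged  # e.g. derived from the column's load unit config

    def get_value_id(self, value):
        # The upper layer calls this without distinguishing the storage format.
        target = self._paged if self._use_paged else self._in_memory
        return target.get_value_id(value)
```

This keeps search algorithms (exact match, pattern match) implemented once, independent of storage location.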
  • unification engine 112 determines that dictionary 124 is to be stored in paged format and can generate auxiliary data structures.
  • the auxiliary data structures can include metadata such as a value ID index 126 and a value index (i.e., the helper vector 122 ).
  • the value ID index 126 stores the last value ID on each page.
  • the value index stores the last value on each page.
  • the save operation occurs any time there is a delta merge, optimized compression, or any data definition language operation. This ensures when there is a write operation performed on dictionary 124 , the data persisted to the secondary memory 123 is accurately reflected in the value ID index 126 and helper vector 122 , so that the next received query is processed accurately.
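When the dictionary is persisted in paged format (e.g., after a delta merge), both auxiliary structures can be rebuilt in one pass over the pages; a hypothetical sketch:

```python
def build_auxiliary_structures(pages):
    """pages: list of pages, each a list of (value_id, value) pairs
    in dictionary order. Returns (value_id_index, helper_vector):
    the last value ID and the last value of each page."""
    value_id_index = [page[-1][0] for page in pages]
    helper_vector = [page[-1][1] for page in pages]
    return value_id_index, helper_vector
```

Regenerating both structures on every save keeps them consistent with the persisted dictionary, so the next query is processed accurately.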
  • FIG. 2 is a block diagram of an example helper vector, in-memory dictionary, and paged dictionary, according to some embodiments.
  • dictionary 116 can include pages 106 , 118 , and 120 . Additionally, dictionary 116 can include a value ID index 126 , and a helper vector 122 .
  • the value ID index 126 can include the last value ID for each page. In the event the values stored in the dictionary 116 are of a variable-size data type, the value ID index 126 can be used to identify the page including a value using the corresponding value ID.
  • the helper vector 122 can include the last value for each page.
  • page 106 can include value IDs 1, 2, 3 and corresponding values 1.56, 2.50, 3.14.
  • Page 118 can include value IDs 4, 5, 6 and corresponding values 4.98, 6.11, and 6.50.
  • Page 120 can include value IDs 7, 8, 9 and corresponding values 6.70, 8.88, and 10.88.
  • Helper vector 122 can include the last value of 3.14 for page 106 , 6.50 as the last value of page 118 , and 10.88 as the last value for page 120 .
  • the DMS 102 as shown in FIG. 1 receives a query including the value 8.88.
  • the paging engine 114 as shown in FIG. 1 can execute a binary search on helper vector 122 to identify the page number and subsequently the value ID.
  • the paging engine can identify the middle value of the helper vector 122 , which is 6.50.
  • the paging engine can determine that 8.88 is greater than 6.50 so the paging engine would eliminate pages 106 and 118 .
  • the last remaining page is page 120 .
  • the paging engine can load page 120 into temporary memory 104 as shown in FIG. 1 .
  • the paging engine can determine the value ID for value 8.88 is 8.
  • the DMS 102 receives a request to materialize a value based on a value ID 5.
  • the paging engine can determine that each page stores 3 value IDs. Based on this, the paging engine can determine page 106 stores the first three value IDs, page 118 stores the next three value IDs, and page 120 stores the last three value IDs.
  • the paging engine can determine that value ID 5 is located on page 118 .
  • the paging engine can load page 118 into temporary memory and determine the corresponding value for value ID 5 is 6.11.
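The two FIG. 2 lookups above can be reproduced end to end with the example data (representing pages as Python lists is an illustrative assumption):

```python
from bisect import bisect_left

pages = [
    [(1, 1.56), (2, 2.50), (3, 3.14)],   # page 106
    [(4, 4.98), (5, 6.11), (6, 6.50)],   # page 118
    [(7, 6.70), (8, 8.88), (9, 10.88)],  # page 120
]
helper_vector = [3.14, 6.50, 10.88]  # last value on each page

# Value -> value ID: binary-search the helper vector, load one page.
page_no = bisect_left(helper_vector, 8.88)          # page index 2 (page 120)
value_id = {v: i for i, v in pages[page_no]}[8.88]  # value ID 8

# Value ID -> value: three value IDs per page, so arithmetic suffices.
row_page = (5 - 1) // 3                             # page index 1 (page 118)
value = dict(pages[row_page])[5]                    # value 6.11
```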
  • FIG. 3 is a flowchart illustrating a process for loading a page of a dictionary for executing a query, according to an embodiment.
  • Method 300 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps can be performed simultaneously, or in a different order than shown in FIG. 3, as will be understood by a person of ordinary skill in the art.
  • Method 300 shall be described with reference to FIG. 1 . However, method 300 is not limited to that example embodiment.
  • DMS 102 receives a request to execute a query including a value.
  • the request can be directed to executing a query against a particular column.
  • columns may not store entire values. Rather, columns can store value IDs representing the value.
  • the request can be received from the client device 130 .
  • the paging engine 114 queries a dictionary for a value ID corresponding to the value received in the request.
  • the dictionary includes multiple pages including value IDs and the corresponding values.
  • the wrapper 110 can determine whether to query an in-memory dictionary (dictionary 124 ) or a paged dictionary (dictionary 116 ). In this embodiment, the paged dictionary is queried.
  • the paged dictionary includes a helper vector 122 .
  • the paging engine 114 executes a binary search on the helper vector 122 of the dictionary 116 using the value.
  • the helper vector 122 includes a last value for each page of a dictionary 116 .
  • the binary search can search for a value in the helper vector based on the last value for each page.
  • the paging engine 114 identifies a page of the dictionary including the value.
  • the paging engine identifies the page by executing the binary search on the helper vector. For example, based on the binary search, the paging engine 114 can eliminate pages on which the value is not included and, by process of elimination, identify the page on which the value is included.
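The elimination search above can be sketched with a standard binary search over the sorted per-page last values. The list layout and function name here are illustrative assumptions, not the patent's implementation:

```python
import bisect

def find_page(helper_vector, value):
    """Binary-search the sorted per-page last values to find the index
    of the only page that can contain `value`."""
    # bisect_left returns the first page whose last value is >= value;
    # all earlier pages are eliminated because their last value < value.
    idx = bisect.bisect_left(helper_vector, value)
    if idx == len(helper_vector):
        raise KeyError(f"{value} exceeds the dictionary's largest value")
    return idx

# Helper vector from FIG. 2: last values of pages 106, 118, and 120.
helper = [3.14, 6.50, 10.88]
print(find_page(helper, 8.88))  # 2, i.e. page 120
```

Because the helper vector is sorted, the search inspects O(log n) entries, and no dictionary page has to be loaded until the single candidate page is known.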
  • the paging engine 114 loads the page into temporary memory 104 .
  • Temporary memory 104 can be a buffer or cache memory.
  • the page can include a list of value IDs and corresponding values.
  • the paging engine 114 retrieves the value ID of the value from the page loaded in the temporary memory.
  • the paging engine 114 can execute a query to retrieve the value ID from the page. Since only a single page is loaded into temporary memory, executing this query places little burden on operational resources.
  • the query engine 115 executes the requested query on the column using the value ID.
  • the query engine 115 can receive the results from the query and transmit the results to the client device 130 .
  • FIG. 4 is a flowchart illustrating a process for retrieving a value from a dictionary using a value ID, according to an embodiment.
  • Method 400 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps can be performed simultaneously, or in a different order than shown in FIG. 4 , as will be understood by a person of ordinary skill in the art.
  • Method 400 shall be described with reference to FIG. 1 . However, method 400 is not limited to that example embodiment.
  • the DMS 102 receives a request to materialize a value using a value ID.
  • a (paged) dictionary 116 can include multiple pages of value IDs and corresponding values.
  • in the event the dictionary 116 includes values of a fixed data type, the dictionary 116 can store a fixed number of value IDs and corresponding values per page.
  • the paging engine 114 determines the number of value IDs per page in the dictionary. As described above, each page of the dictionary can store a fixed number of value IDs and values.
  • the paging engine 114 identifies the page including the requested value based on the value ID and the number of value IDs per page in the dictionary. For example, paging engine 114 can divide the value ID by the number of value IDs per page (and take the floor of the result) to identify the page including the value ID and requested value.
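Under the assumption that value IDs start at 1 and each page holds a fixed number of entries, as in the FIG. 2 example, the page computation in the step above reduces to a single floor division. This sketch is illustrative only:

```python
def page_for_value_id(value_id, values_per_page):
    """For a fixed-size dictionary, compute the page index holding a
    value ID. Assumes value IDs start at 1, as in the FIG. 2 example."""
    return (value_id - 1) // values_per_page

# With 3 value IDs per page: IDs 1-3 on page index 0 (page 106),
# IDs 4-6 on page index 1 (page 118), IDs 7-9 on page index 2 (page 120).
print(page_for_value_id(5, 3))  # 1, i.e. page 118
```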
  • the paging engine 114 loads the identified page into the temporary memory 104 .
  • temporary memory 104 can be a buffer or cache configured to store data for a short amount of time.
  • the paging engine 114 retrieves the value corresponding to the value ID from the page stored in the temporary memory 104 .
  • the paging engine 114 can execute a query to retrieve the value from the page.
  • FIG. 5 is a flowchart illustrating a process for generating a new value ID index and helper vector based on an event, according to an embodiment.
  • Method 500 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps can be performed simultaneously, or in a different order than shown in FIG. 5 , as will be understood by a person of ordinary skill in the art.
  • Method 500 shall be described with reference to FIG. 1 . However, method 500 is not limited to that example embodiment.
  • the unification engine 112 detects an event causing an update to the values and value IDs in the dictionary.
  • a merge, optimized compression, or DDL operation can be executed on the in-memory dictionary (e.g., dictionary 124 ).
  • the dictionary 124 can be persisted or saved from the main memory 108 to the secondary memory 123 . This can cause the values or value IDs to be updated in the dictionary 116 .
  • the unification engine 112 identifies a change in the last values for each page in the dictionary based on the event. For example, based on a merge, more data can be added in the dictionary. This can shift the positions of the values such that the last value for each page can change.
  • the unification engine 112 identifies a change in the last value ID of each page in the dictionary. Continuing with the earlier example, based on a merge, more data can be added in the dictionary. This can shift the positions of the value IDs such that the last value ID for each page can change.
  • the unification engine 112 generates a new helper vector to reflect the changed last values for each page in the dictionary.
  • the new helper vector can store the updated last values for each page in the dictionary.
  • the new helper vector will be persisted in the secondary memory 123 .
  • the unification engine 112 generates a new value ID index to reflect the changed last value IDs for each page in the dictionary.
  • the new value ID index can store the updated last value ID for each page in the dictionary in the event the dictionary includes values of variable size data types.
  • the new value ID index will be persisted in the secondary memory 123 .
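A minimal sketch of regenerating both auxiliary structures after such an event, assuming each page is available as an ordered list of (value ID, value) pairs — an illustrative layout, not the persisted format:

```python
def rebuild_indexes(pages):
    """Regenerate the helper vector (last value per page) and the
    value ID index (last value ID per page) after a merge/save."""
    helper_vector = [page[-1][1] for page in pages]
    value_id_index = [page[-1][0] for page in pages]
    return helper_vector, value_id_index

# Pages from the FIG. 2 example.
pages = [
    [(1, 1.56), (2, 2.50), (3, 3.14)],   # page 106
    [(4, 4.98), (5, 6.11), (6, 6.50)],   # page 118
    [(7, 6.70), (8, 8.88), (9, 10.88)],  # page 120
]
helper, vid_index = rebuild_indexes(pages)
print(helper)     # [3.14, 6.50, 10.88]
print(vid_index)  # [3, 6, 9]
```

Rebuilding both structures in one pass over the page chain keeps them consistent with whatever the merge wrote, which is what allows the next query to be answered correctly.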
  • Computer system 600 can be used, for example, to implement method 300 of FIG. 3 , method 400 of FIG. 4 , and method 500 of FIG. 5 .
  • Computer system 600 can be at least part of server 100 as shown in FIG. 1 .
  • computer system 600 can identify a page from dictionary 116 or 124 as shown in FIGS. 1-2 , load the page, and retrieve a requested value from dictionary 116 or 124 .
  • Computer system 600 can be any computer capable of performing the functions described herein.
  • Computer system 600 includes one or more processors (also called central processing units, or CPUs), such as a processor 604 .
  • Processor 604 is connected to a communication infrastructure or bus 606 .
  • One or more processors 604 can each be a graphics processing unit (GPU).
  • a GPU is a processor that is a specialized electronic circuit designed to process mathematically intensive applications.
  • the GPU can have a parallel structure that is efficient for parallel processing of large blocks of data, such as mathematically intensive data common to computer graphics applications, images, videos, etc.
  • Computer system 600 also includes user input/output device(s) 603 , such as monitors, keyboards, pointing devices, etc., that communicate with communication infrastructure 606 through user input/output interface(s) 602 .
  • Computer system 600 also includes a main or primary memory 608 , such as random access memory (RAM).
  • Main memory 608 can include one or more levels of cache.
  • Main memory 608 has stored therein control logic (i.e., computer software) and/or data.
  • Computer system 600 can also include one or more secondary storage devices or memory 610 .
  • Secondary memory 610 can include, for example, a hard disk drive 612 and/or a removable storage device or drive 614 .
  • Removable storage drive 614 can be a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, tape backup device, and/or any other storage device/drive.
  • Removable storage drive 614 can interact with a removable storage unit 618 .
  • Removable storage unit 618 includes a computer usable or readable storage device having stored thereon computer software (control logic) and/or data.
  • Removable storage unit 618 can be a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, and/or any other computer data storage device.
  • Removable storage drive 614 reads from and/or writes to removable storage unit 618 in a well-known manner.
  • secondary memory 610 can include other means, instrumentalities or other approaches for allowing computer programs and/or other instructions and/or data to be accessed by computer system 600 .
  • Such means, instrumentalities or other approaches can include, for example, a removable storage unit 622 and an interface 620 .
  • the removable storage unit 622 and the interface 620 can include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface.
  • Computer system 600 can further include a communication or network interface 624 .
  • Communication interface 624 enables computer system 600 to communicate and interact with any combination of remote devices, remote networks, remote entities, etc. (individually and collectively referenced by reference number 628 ).
  • communication interface 624 can allow computer system 600 to communicate with remote devices 628 over communications path 626 , which can be wired and/or wireless, and which can include any combination of LANs, WANs, the Internet, etc. Control logic and/or data can be transmitted to and from computer system 600 via communication path 626 .
  • a tangible, non-transitory apparatus or article of manufacture comprising a tangible, non-transitory computer useable or readable medium having control logic (software) stored thereon is also referred to herein as a computer program product or program storage device.
  • control logic when executed by one or more data processing devices (such as computer system 600 ), causes such data processing devices to operate as described herein.
  • references herein to “one embodiment,” “an embodiment,” “an example embodiment,” or similar phrases indicate that the embodiment described can include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it would be within the knowledge of persons skilled in the relevant art(s) to incorporate such feature, structure, or characteristic into other embodiments whether or not explicitly mentioned or described herein. Additionally, some embodiments can be described using the expressions “coupled” and “connected,” along with their derivatives. These terms are not necessarily intended as synonyms for each other.
  • Coupled can also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

Disclosed herein are system, method, and computer program product embodiments for identifying and loading a relevant page of a dictionary into temporary memory. An embodiment operates by receiving a query to be executed. The query includes a value for executing the query. The server queries a dictionary to retrieve a value ID. The server executes a binary search on a helper vector of the dictionary based on the value. The helper vector includes a last value for each page of a dictionary. The server identifies a page of the dictionary including the value. The server loads the page into temporary memory and retrieves the value ID of the value from the page. The server executes the query on a column using the value ID.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims priority to U.S. Provisional Application No. 62/858,693, filed on Jun. 7, 2019, the contents of which are incorporated herein in their entirety.
  • BACKGROUND
  • Hybrid in-memory and paged storage configurations allow storage of data in dynamic storage devices as well as persistent storage. However, providing on-demand access to such data requires loading the data stored in persistent storage to in-memory storage. This can leave a large memory footprint and can be burdensome on a computing system.
  • For example, to save space, database columns can store compressed values rather than the entire value of the data. A dictionary corresponding to a database column can store the entire value. The compressed value can act as a value identifier (value ID) in the dictionary. However, it can be difficult to retrieve a value from the dictionary, as the dictionary can be voluminous and can include multiple different pages. Conventional systems load the entire dictionary into memory when attempting to retrieve the value corresponding to a value ID. But this can be burdensome on a computing system and waste operational resources, such as memory.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The accompanying drawings are incorporated herein and form a part of the specification.
  • FIG. 1 is a block diagram of an architecture of in-memory and paged dictionary, according to some embodiments.
  • FIG. 2 is a block diagram of an example helper vector, in-memory dictionary, and paged dictionary, according to some embodiments.
  • FIG. 3 is a flowchart illustrating a process for loading a page of a dictionary for executing a query, according to some embodiments.
  • FIG. 4 is a flowchart illustrating a process for retrieving a value from a dictionary using a value ID, according to some embodiments.
  • FIG. 5 is a flowchart illustrating a process for generating a new helper vector and value ID index, according to some embodiments.
  • FIG. 6 is an example computer system useful for implementing various embodiments.
  • In the drawings, like reference numbers generally indicate identical or similar elements. Additionally, generally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears.
  • DETAILED DESCRIPTION
  • Provided herein are system, apparatus, device, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for identifying and loading a relevant page of a dictionary into temporary memory.
  • In an embodiment, a server receives a query to be executed. The query includes a value for executing the query. The server executes a query on a dictionary to retrieve a value ID corresponding to the value. The dictionary includes pages including value IDs and corresponding values. The server executes a binary search on a helper vector of the dictionary based on the value. The helper vector includes the last value for each page of a dictionary. The server identifies a page of the dictionary including the value. The server loads the page into temporary memory and retrieves the value ID of the value from the page. The server executes the query on a column using the value ID.
  • This configuration provides for loading the relevant page which includes the requested value, rather than an entire dictionary, into memory (in some embodiments, plural relevant pages are loaded). This solves the technical problem of reducing the burden on the computing system. Furthermore, this reduces the memory footprint and improves operational efficiency. The described architecture also provides for a unified persistency format and the ability to use transient paged and in-memory structures.
  • FIG. 1 is a block diagram of an architecture of in-memory and paged dictionary, according to some embodiments. In an embodiment, the architecture can include a server 100, database 128, and client device 130. Server 100 can be in communication with database 128 and client device 130. Server 100, database 128, and client device 130 can be connected through wired connections, wireless connections, or a combination of wired and wireless connections.
  • As an example server 100, database 128, and client device 130, can be connected through a network. The network can be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless wide area network (WWAN), a metropolitan area network (MAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a cellular telephone network, a wireless network, a WiFi network, a WiMax network, any other type of network, or a combination of two or more such networks.
  • Server 100 can include a database management system (DMS) 102, temporary memory 104, main memory 108, and secondary memory 123. As an example, main memory 108 can be Random Access Memory (RAM) or any other type of dynamic memory. Secondary memory 123 can be persistent memory such as a non-volatile storage device. Temporary memory 104 can be a buffer or cache memory. The temporary memory 104 may reside within the main memory 108.
  • Each of the main memory 108, secondary memory 123, and temporary memory 104 can be configured to store portion or entire copies of a dictionary corresponding to a column of the database 128. The column of the database 128 can be configured to store data including compressed versions of certain values (i.e., a value ID) rather than the entire value. The dictionary can store the value ID and the corresponding entire value. Alternatively, the dictionary can store the value and the value ID may be inferred based on the position of the value in the dictionary. In this regard, the dictionary can be used to correlate and retrieve values of different value IDs so that these value IDs can be looked up in the column.
  • The dictionary can include multiple pages of value IDs and values. An example dictionary 116 can be stored in secondary memory 123. Dictionary 116 can include pages 106, 118, and 120. Also, an example dictionary 124 can be stored in main memory 108. Dictionary 124 can store pages 106-a, 118-a, and 120-a, which are copies of pages 106, 118, and 120, separately or in contiguous memory space. In this regard, dictionary 124 can be an in-memory version of dictionary 116. As described above, main memory 108 can be dynamic memory. In this regard, dictionary 124 may be an in-memory dictionary which offers in-memory processing. Secondary memory 123 may be a persistent storage device. In this regard, dictionary 116 may be a paged dictionary which offers paged processing. This may be referred to as a hybrid storage configuration or hybrid column store.
  • This hybrid storage configuration offers in-memory processing for performance critical operations and buffer-managed paged processing for less critical, economical data access. This hybrid capability extends up from a unified persistence format which can be used to load paged or in-memory primitive structures, which in turn form paged or in-memory column store structures (data vectors, dictionaries 116 or 124, and indexes) that are ultimately arranged by the hybrid storage configuration or hybrid column store according to each column's load unit configuration. Hybrid columns can be configured to load all in-memory structures, all paged structures, or a mixture of both.
  • It can be appreciated, a user can control which portions of the dictionary are stored in main memory 108 or secondary memory 123, in whichever configuration necessary. For example, a user can use client device 130 to manage the storage of the dictionaries 116 and 124 in main memory 108 and secondary memory 123. Alternatively, the storage of the dictionaries 116 and 124 can be automatically configured based on workload, data access algorithms, and intelligent algorithms. Dictionary 124 can be referred to as an in-memory dictionary and dictionary 116 can be a paged dictionary.
  • In an example, an entire copy of dictionary 116 can be stored in main memory 108. Alternatively, a copy of a portion of dictionary 116 can be stored in main memory 108. In another example, different portions of a dictionary can be stored in main memory 108 and secondary memory 123.
  • Furthermore, columns of database 128 can also be stored in a similar configuration as described above. In an example, the columns of database 128 can be stored in secondary memory 123 and an entire copy of columns can be stored in main memory 108. Alternatively, the columns of database 128 can be stored in secondary memory 123 and a copy of a portion of the columns of database 128 can be stored in main memory 108. In another example, different portions of the columns can be stored in main memory 108 and secondary memory 123.
  • Dictionary 116 can be a multi-page vector. The multi-page vector can be a type of paged primitive that provides a uniform interface with in-memory counterparts to read-write algorithms. This allows the codebase to seamlessly operate on either format with minimum adaption, hiding the details of operating on data in native store extension (NSE). Furthermore, paged primitives can be enriched with auxiliary and performant search structures and indexes.
  • A multi-page vector is a large vector that can be stored on a single page chain with fixed-size pages. Having a fixed number of objects per page simplifies identifying the page that has the content for a given vector position. The multi-page vector is configured to store more than one vector on each page chain, with each vector having its own metadata. Once a vector is sorted, each multi-page vector can be extended with a helper structure (e.g., helper vector) to facilitate search and to avoid loading pages that are guaranteed not to have a value that satisfies the search.
  • Dictionary 116 can include a value ID index 126. The value ID index 126 can be a data structure that stores the last value ID of each page of the dictionary 116 for data types of variable length. In an embodiment, dictionary 124 may not include a value ID index. The value ID can act as the identifier of the value. As described above, the columns of database 128 can store a value ID representing the value, while the dictionaries can store both the value ID and value. Each page of dictionary 116 and dictionary 124 can store value IDs and the corresponding values.
  • The values can be fixed data types such as integers, doubles, longs, shorts, or the like. Furthermore, in the event the values are fixed size data types, each page of dictionary 116 can include a fixed number of value IDs and values. As a non-limiting example, page 106, page 118, and page 120 can each include three value IDs and their corresponding values. Dictionary 116 and dictionary 124 can be used to retrieve a given value ID so that the value ID can be used to execute a query on a given column.
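As an illustrative sketch of this dictionary encoding (assumed names, not the patent's API), the column stores compact value IDs while the sorted dictionary maps each position back to the full value:

```python
# Sorted dictionary from the FIG. 2 example: position implies the value ID.
dictionary = [1.56, 2.50, 3.14, 4.98, 6.11, 6.50, 6.70, 8.88, 10.88]

def value_id_for(value):
    """Value IDs start at 1; position in the sorted dictionary implies the ID."""
    return dictionary.index(value) + 1

def materialize(value_id):
    """Recover the full value for a value ID stored in a column."""
    return dictionary[value_id - 1]

# A column stores small value IDs instead of the full values.
column = [value_id_for(v) for v in [2.50, 8.88, 2.50]]
print(column)          # [2, 8, 2]
print(materialize(5))  # 6.11
```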
  • The secondary memory 123 can also include a helper vector 122. Helper vector can also be a paged primitive. In particular, helper vector 122 can be an arbitrary sized item. For an arbitrary sized item, a block is fully stored on a single page. The number of blocks stored on a page depends on block sizes. For data items larger than the page size, data is internally divided into multiple blocks.
  • Helper vector 122 can also be referred to as a compact memory resident sorted vector. Helper vector 122 is sorted and stores the last value of each page stored in dictionary 116. Helper vector 122 can be used to quickly identify a particular page including a given value.
  • DMS 102 can include a wrapper 110, unification engine 112, paging engine 114, and query engine 115. DMS 102 can receive and process query requests by retrieving the value ID of a given value, or the value for a given value ID, from a given page of a dictionary and executing the query. Wrapper 110 can be configured to determine whether the dictionary to be queried is stored in main memory 108 or secondary memory 123. Unification engine 112 can generate a new helper vector and value index in response to an event, such as a save operation of the dictionary from main memory 108 to secondary memory 123. Paging engine 114 can be configured to load a particular page of dictionary 116 into temporary memory 104. Query engine 115 can be configured to execute queries against a given column.
  • As a non-limiting example, DMS 102 can receive a request to execute a query including a given value, 2.50. The query may be received from client device 130. For example, the request can be to retrieve all plants which are 2.50 inches tall. Wrapper 110 can determine whether to search dictionary 116 or dictionary 124 for the value ID corresponding to 2.50. In some instances, the request can indicate whether to query the in-memory dictionary (e.g., dictionary 124) or paged dictionary (e.g., dictionary 116). In this example, wrapper 110 can determine that dictionary 116 is to be queried.
  • Paging engine 114 can execute a binary search on helper vector 122 to identify a particular page of dictionary 116 on which the value, 2.50, is located. To execute the binary search, paging engine 114 compares the target value (2.50) to the middle value of the helper vector. If they are not equal, the half in which the value cannot lie is eliminated and the search continues on the remaining half. Paging engine 114 repeats this process taking the middle element to compare to the target value, and repeating this until the target value is found or there is a single value remaining. As described above, helper vector 122 stores the last value of each page in dictionary 116 and the corresponding page. In the event, there is a single value remaining, paging engine 114 can determine that the 2.50 is on the page corresponding to the remaining value. In this example, the paging engine 114 can determine that 2.50 is included in page 106.
  • Paging engine 114 can load page 106 from secondary memory 123 to temporary memory 104 (which can be from the buffer cache) and retrieve the value ID, 2, for the value 2.50. By doing so, the paging engine 114 avoids having to load the entire dictionary 116 into memory. This reduces the memory footprint. Query engine 115 can execute the query for retrieving all plants that are 2.50 inches, using the value ID, 2.
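The end-to-end lookup described above, loading only the single candidate page, can be sketched as follows using the FIG. 2 data; the in-memory structures here stand in for pages held in secondary memory:

```python
import bisect

# Paged dictionary from FIG. 2: per-page (value_id, value) pairs and the
# helper vector of per-page last values (illustrative layout).
pages = {
    106: [(1, 1.56), (2, 2.50), (3, 3.14)],
    118: [(4, 4.98), (5, 6.11), (6, 6.50)],
    120: [(7, 6.70), (8, 8.88), (9, 10.88)],
}
page_ids = [106, 118, 120]
helper_vector = [3.14, 6.50, 10.88]

def lookup_value_id(value):
    """Binary-search the helper vector, load only the one candidate page,
    then scan that page for the value's ID."""
    page_no = page_ids[bisect.bisect_left(helper_vector, value)]
    page = pages[page_no]  # stands in for loading into temporary memory
    for value_id, v in page:
        if v == value:
            return value_id
    raise KeyError(value)

print(lookup_value_id(2.50))  # 2, found on page 106
```

The point of the sketch is that only one page is ever touched per lookup, mirroring the reduced memory footprint the text describes.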
  • In an embodiment, DMS 102 can receive a request to materialize a value corresponding to the value identifier observed in a data vector at the desired row position. The request can include the value ID. Since each page includes a fixed amount of values, the paging engine 114 can identify a particular page that includes the value based on the value ID included in the request and the fixed amount of values per page.
  • In an alternative embodiment, the dictionary 116 can include values that are variable size data types (e.g., VARCHAR). Each page can include a different number of value IDs and corresponding values. The value ID index 126 can store the last value ID of each page. In this regard, in the event DMS 102 receives a request to materialize a value corresponding to the value ID observed in a data vector at the desired row position, the paging engine 114 can identify a particular page that includes the value based on the value ID index 126 and the value ID included in the request. For example, the paging engine 114 may execute a binary search on the value ID index 126 using the value identifier to identify the page that includes the value.
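For variable-size values the fixed-per-page division no longer applies, so the page is found by binary search over the value ID index instead. This is a hedged sketch with illustrative data and names:

```python
import bisect

# Variable-size values: pages may hold different numbers of entries, so the
# value ID index stores the last value ID of each page (illustrative data).
value_id_index = [2, 6, 9]   # last value ID on each page, in page order
page_ids = [106, 118, 120]

def page_for_value_id(value_id):
    """Binary-search the value ID index instead of dividing, since the
    per-page entry count is no longer fixed."""
    return page_ids[bisect.bisect_left(value_id_index, value_id)]

print(page_for_value_id(5))  # 118
```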
  • In an embodiment, wrapper 110 can receive a request to execute a query from an upper layer. The upper layer may be an API, which is agnostic to the configuration dictionary 116 and dictionary 124. That is, the upper layer may query the dictionary 116 or 124 without distinguishing the dictionary 116 or 124. The lower layer includes services or APIs that are configured to distinguish dictionary 116 or 124 when querying either dictionary. In an example, these requests may specify a particular dictionary to query. The wrapper 110 can determine which dictionary to query. This allows the upper layer to make a single API call to a dictionary and the wrapper 110 can identify which implementation of the dictionary, in-memory (e.g., dictionary 124) or paged (e.g., dictionary 116) to query. This allows for more complex queries on dictionary 116 and 124. For example, wrapper 110 can allow a query for a value that matches a certain pattern rather than a specific value. Furthermore, the wrapper 110 allows such algorithms to be implemented and maintained once, independent of the storage location (in-memory vs paged).
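A minimal sketch of this dispatch pattern, with assumed class and method names (the patent does not specify the API): the upper layer makes one call, and the wrapper selects the in-memory or paged implementation behind it.

```python
class InMemoryDictionary:
    """Toy in-memory dictionary: direct value -> value ID lookup."""
    def __init__(self, values):
        self._ids = {v: i + 1 for i, v in enumerate(values)}

    def value_id_for(self, value):
        return self._ids[value]

class PagedDictionary(InMemoryDictionary):
    # A real paged lookup would binary-search a helper vector and load one
    # page; the same toy lookup stands in so the dispatch is the focus.
    pass

class DictionaryWrapper:
    """Upper layers call one API; the wrapper picks the implementation."""
    def __init__(self, in_memory, paged, use_paged=False):
        self._impl = paged if use_paged else in_memory

    def value_id_for(self, value):
        return self._impl.value_id_for(value)

values = [1.56, 2.50, 3.14]
wrapper = DictionaryWrapper(InMemoryDictionary(values), PagedDictionary(values))
print(wrapper.value_id_for(2.50))  # 2
```

Keeping the dispatch in one place is what lets search algorithms be implemented and maintained once, independent of whether the column's dictionary is in-memory or paged.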
  • In an embodiment, when dictionary 124 is being saved or persisted from main memory 108 to secondary memory 123, unification engine 112 determines that dictionary 124 is being stored in paged format and can generate auxiliary data structures. The auxiliary data structures can include metadata such as a value ID index 126 and a value index (i.e., the helper vector 122). The value ID index 126 is the last value ID on a page. The value index is the last value on a page. The save operation occurs any time there is a delta merge, optimized compression, or any data definition language operation. This ensures when there is a write operation performed on dictionary 124, the data persisted to the secondary memory 123 is accurately reflected in the value ID index 126 and helper vector 122, so that the next received query is processed accurately.
  • FIG. 2 is a block diagram of an example helper vector, in-memory dictionary, and paged dictionary, according to some embodiments. As described in FIG. 1, dictionary 116 can include pages 106, 118, and 120. Additionally, dictionary 116 can include a value ID index 126 and a helper vector 122. The value ID index 126 can include the last value ID for each page. In the event the values stored in the dictionary 116 are of a variable size data type, the value ID index 126 can be used to identify a page including a value using the corresponding value ID. The helper vector 122 can include the last value for each page.
  • As a non-limiting example, page 106 can include value IDs 1, 2, 3 and corresponding values 1.56, 2.50, 3.14. Page 118 can include value IDs 4, 5, 6 and corresponding values 4.98, 6.11, and 6.50. Page 120 can include value IDs 7, 8, 9 and corresponding values 6.70, 8.88, and 10.88. Helper vector 122 can include the last value of 3.14 for page 106, 6.50 as the last value of page 118, and 10.88 as the last value for page 120.
  • In the event the DMS 102, as shown in FIG. 1, receives a query including the value 8.88, the paging engine 114 as shown in FIG. 1 can execute a binary search on helper vector 122 to identify the page number and subsequently the value ID. The paging engine can identify the middle value of the helper vector 122, which is 6.50. The paging engine can determine that 8.88 is greater than 6.50, so the paging engine can eliminate pages 106 and 118. The last remaining page is page 120. The paging engine can load page 120 into temporary memory 104 as shown in FIG. 1. The paging engine can determine that the value ID for value 8.88 is 8.
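The lookup just walked through can be sketched with the FIG. 2 data (an illustrative sketch; the function name and fixed page size are assumptions). The binary search over the helper vector selects the one page to load, and the value ID is read from that page:

```python
import bisect

# FIG. 2 example data: three pages of three values each.
PAGES = [[1.56, 2.50, 3.14], [4.98, 6.11, 6.50], [6.70, 8.88, 10.88]]
HELPER = [3.14, 6.50, 10.88]  # last value of each page
VALUES_PER_PAGE = 3


def value_id_for(value):
    """Find a value's value ID, loading only the page that contains it."""
    page_no = bisect.bisect_left(HELPER, value)  # binary search on last values
    page = PAGES[page_no]                        # only this page is loaded
    return page_no * VALUES_PER_PAGE + page.index(value) + 1
```

For 8.88 the search eliminates pages 106 and 118 (helper entries 3.14 and 6.50 are both smaller), leaving page 120, where the value ID is 8.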
  • In the event the DMS 102 receives a request to materialize a value based on a value ID of 5, the paging engine can determine that each page stores 3 value IDs. Based on this, the paging engine can determine that page 106 stores the first three value IDs, page 118 stores the next three value IDs, and page 120 stores the last three value IDs. The paging engine can determine that value ID 5 is located on page 118. The paging engine can load page 118 into temporary memory and determine that the corresponding value for value ID 5 is 6.11.
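When every page holds a fixed number of entries, materialization needs only integer arithmetic to pick the page. A minimal sketch using the FIG. 2 data (names are illustrative):

```python
PAGES = [[1.56, 2.50, 3.14], [4.98, 6.11, 6.50], [6.70, 8.88, 10.88]]
VALUES_PER_PAGE = 3


def materialize(value_id):
    """Return the value for a (1-based) value ID, loading one page."""
    page_no = (value_id - 1) // VALUES_PER_PAGE  # floor division picks the page
    page = PAGES[page_no]                        # only this page is loaded
    return page[(value_id - 1) % VALUES_PER_PAGE]
```

For value ID 5, floor(4 / 3) = 1 selects the second page (page 118), and offset 1 within it yields 6.11.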
  • FIG. 3 is a flowchart illustrating a process for loading a page of a dictionary for executing a query, according to an embodiment. Method 300 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps can be performed simultaneously, or in a different order than shown in FIG. 3, as will be understood by a person of ordinary skill in the art.
  • Method 300 shall be described with reference to FIG. 1. However, method 300 is not limited to that example embodiment.
  • In 302, DMS 102 receives a request to execute a query including a value. The request can be directed to executing a query against a particular column. As described above, in some embodiments, columns may not store entire values. Rather, columns can store value IDs representing the value. For example, the request can be received from the client device 130.
  • In 304, the paging engine 114 queries a dictionary for a value ID corresponding to the value received in the request. The dictionary includes multiple pages including value IDs and the corresponding values. Before executing the query, the wrapper 110 can determine whether to query an in-memory dictionary (dictionary 124) or a paged dictionary (dictionary 116). In this embodiment, the paged dictionary is queried. The paged dictionary includes a helper vector 122.
  • In 306, the paging engine 114 executes a binary search on the helper vector 122 of the dictionary 116 using the value. The helper vector 122 includes a last value for each page of a dictionary 116. The binary search can search for a value in the helper vector based on the last value for each page.
  • In 308, the paging engine 114 identifies a page of the dictionary including the value. The paging engine identifies the page by executing the binary search on the helper vector. For example, based on the binary search, the paging engine 114 can eliminate pages on which the value is not included and, by process of elimination, identify the page on which the value is included.
  • In 310, the paging engine 114 loads the page into temporary memory 104. Temporary memory 104 can be a buffer or cache memory. The page can include a list of value IDs and corresponding values.
  • In 312, the paging engine 114 retrieves the value ID of the value from the page loaded in the temporary memory. When the page is loaded into memory, the paging engine 114 can execute a query to retrieve the value ID from the page. Since only a single page is loaded into temporary memory, executing this query is not burdensome on the operational resources.
  • In 314, the query engine 116 executes the requested query on the column using the value ID. The query engine 116 can receive the results from the query and transmit the results to the client device 130.
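Step 314 operates on the column itself, which stores value IDs rather than full values. A minimal sketch of an equality scan under that encoding (the function name and the simple predicate are assumptions for illustration):

```python
def scan_column(column_value_ids, value_id):
    """Return row positions whose dictionary-encoded entry matches value_id."""
    return [row for row, vid in enumerate(column_value_ids) if vid == value_id]
```

Because the predicate compares small integer codes instead of full values, the scan never needs to decode the column.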
  • FIG. 4 is a flowchart illustrating a process for retrieving a value from a dictionary using a value ID, according to an embodiment. Method 400 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps can be performed simultaneously, or in a different order than shown in FIG. 4, as will be understood by a person of ordinary skill in the art.
  • Method 400 shall be described with reference to FIG. 1. However, method 400 is not limited to that example embodiment.
  • In 402, the DMS 102 receives a request to materialize a value using a value ID. As described above, a (paged) dictionary 116 can include multiple pages of value IDs and corresponding values. In the event the dictionary 116 includes values of a fixed-size data type, the dictionary 116 can store a fixed number of value IDs and corresponding values per page.
  • In 404, the paging engine 114 determines a number of values per page in the dictionary. As described above, each page of the dictionary can store a fixed number of value IDs and values.
  • In 406, the paging engine 114 identifies a page including the requested value based on the value ID and the number of value IDs per page in the dictionary. For example, paging engine 114 can divide the value ID (less one) by the number of value IDs per page and take the floor of the result to obtain the zero-based index of the page including the value ID and requested value; with three value IDs per page, value ID 5 yields floor(4/3) = 1, the second page.
  • In 408, the paging engine 114 loads the identified page into the temporary memory 104. As described above, temporary memory 104 can be buffer or cache configured to store data for a short amount of time.
  • In 410, the paging engine 114 retrieves the value corresponding to the value ID from the page stored in the temporary memory 104. When the page is loaded into memory, the paging engine 114 can execute a query to retrieve the value from the page.
  • FIG. 5 is a flowchart illustrating a process for generating a new value ID index and helper vector based on an event, according to an embodiment. Method 500 can be performed by processing logic that can comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps can be performed simultaneously, or in a different order than shown in FIG. 5, as will be understood by a person of ordinary skill in the art.
  • Method 500 shall be described with reference to FIG. 1. However, method 500 is not limited to that example embodiment.
  • In 502, the unification engine 112 detects an event causing an update to the values and value IDs in the dictionary. For example, a merge, optimized compression, or DDL operation can be executed on the in-memory dictionary (e.g., dictionary 124). The dictionary 124 can be persisted or saved from the main memory 108 to the secondary memory 123. This can cause the values or value IDs to be updated in the dictionary 116.
  • In 504, the unification engine 112 identifies a change in the last values for each page in the dictionary based on the event. For example, based on a merge, more data can be added in the dictionary. This can shift the positions of the values such that the last value for each page can change.
  • In 506, the unification engine 112 identifies a change in the last value ID of each page in the dictionary. Continuing with the earlier example, based on a merge, more data can be added in the dictionary. This can shift the positions of the value IDs such that the last value ID for each page can change.
  • In 508, the unification engine 112 generates a new helper vector to reflect the changed last values for each page in the dictionary. The new helper vector can store the updated last values for each page in the dictionary. The new helper vector will be persisted in the secondary memory 123.
  • In 510, the unification engine 112 generates a new value ID index to reflect the changed last value IDs for each page in the dictionary. The new value ID index can store the updated last value ID for each page in the dictionary, in the event the dictionary includes values of variable-size data types. The new value ID index will be persisted in the secondary memory 123.
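The outcome of method 500 can be sketched as follows (an assumed implementation for illustration; the function name, page size, and merge strategy are not from the patent). After a merge adds values, the pages are rebuilt and a fresh helper vector and value ID index are derived from the new page boundaries:

```python
def rebuild_after_merge(old_pages, new_values, per_page=3):
    """Re-page a sorted dictionary after a merge and rebuild its metadata."""
    merged = sorted({v for page in old_pages for v in page} | set(new_values))
    pages = [merged[i:i + per_page] for i in range(0, len(merged), per_page)]
    helper_vector = [page[-1] for page in pages]  # new last value per page
    value_id_index, count = [], 0
    for page in pages:                            # new last value ID per page
        count += len(page)
        value_id_index.append(count)
    return pages, helper_vector, value_id_index
```

Note how inserting a single value can shift every subsequent page boundary, which is why both structures must be regenerated rather than patched in place.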
  • Various embodiments can be implemented, for example, using one or more computer systems, such as computer system 600 shown in FIG. 6. Computer system 600 can be used, for example, to implement method 300 of FIG. 3, 400 of FIG. 4, and 500 of FIG. 5. Furthermore, computer system 600 can be at least part of server 100 as shown in FIG. 1. For example, computer system 600 can identify a page from dictionary 116 or 124 as shown in FIGS. 1-2, load the page and retrieve a requested value from dictionary 116 and 124. Computer system 600 can be any computer capable of performing the functions described herein.
  • Computer system 600 includes one or more processors (also called central processing units, or CPUs), such as a processor 604. Processor 604 is connected to a communication infrastructure or bus 606.
  • One or more processors 604 can each be a graphics processing unit (GPU). In an embodiment, a GPU is a processor that is a specialized electronic circuit designed to process mathematically intensive applications. The GPU can have a parallel structure that is efficient for parallel processing of large blocks of data, such as mathematically intensive data common to computer graphics applications, images, videos, etc.
  • Computer system 600 also includes user input/output device(s) 603, such as monitors, keyboards, pointing devices, etc., that communicate with communication infrastructure 606 through user input/output interface(s) 602.
  • Computer system 600 also includes a main or primary memory 608, such as random access memory (RAM). Main memory 608 can include one or more levels of cache. Main memory 608 has stored therein control logic (i.e., computer software) and/or data.
  • Computer system 600 can also include one or more secondary storage devices or memory 610. Secondary memory 610 can include, for example, a hard disk drive 612 and/or a removable storage device or drive 614. Removable storage drive 614 can be a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, tape backup device, and/or any other storage device/drive.
  • Removable storage drive 614 can interact with a removable storage unit 618. Removable storage unit 618 includes a computer usable or readable storage device having stored thereon computer software (control logic) and/or data. Removable storage unit 618 can be a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, and/or any other computer data storage device. Removable storage drive 614 reads from and/or writes to removable storage unit 618 in a well-known manner.
  • According to an exemplary embodiment, secondary memory 610 can include other means, instrumentalities or other approaches for allowing computer programs and/or other instructions and/or data to be accessed by computer system 600. Such means, instrumentalities or other approaches can include, for example, a removable storage unit 622 and an interface 620. Examples of the removable storage unit 622 and the interface 620 can include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface.
  • Computer system 600 can further include a communication or network interface 624. Communication interface 624 enables computer system 600 to communicate and interact with any combination of remote devices, remote networks, remote entities, etc. (individually and collectively referenced by reference number 628). For example, communication interface 624 can allow computer system 600 to communicate with remote devices 628 over communications path 626, which can be wired and/or wireless, and which can include any combination of LANs, WANs, the Internet, etc. Control logic and/or data can be transmitted to and from computer system 600 via communication path 626.
  • In an embodiment, a tangible, non-transitory apparatus or article of manufacture comprising a tangible, non-transitory computer useable or readable medium having control logic (software) stored thereon is also referred to herein as a computer program product or program storage device. This includes, but is not limited to, computer system 600, main memory 608, secondary memory 610, and removable storage units 618 and 622, as well as tangible articles of manufacture embodying any combination of the foregoing. Such control logic, when executed by one or more data processing devices (such as computer system 600), causes such data processing devices to operate as described herein.
  • Based on the teachings contained in this disclosure, it will be apparent to persons skilled in the relevant art(s) how to make and use embodiments of this disclosure using data processing devices, computer systems and/or computer architectures other than that shown in FIG. 6. In particular, embodiments can operate with software, hardware, and/or operating system implementations other than those described herein.
  • It is to be appreciated that the Detailed Description section, and not any other section, is intended to be used to interpret the claims. Other sections can set forth one or more but not all exemplary embodiments as contemplated by the inventor(s), and thus, are not intended to limit this disclosure or the appended claims in any way.
  • While this disclosure describes exemplary embodiments for exemplary fields and applications, it should be understood that the disclosure is not limited thereto. Other embodiments and modifications thereto are possible, and are within the scope and spirit of this disclosure. For example, and without limiting the generality of this paragraph, embodiments are not limited to the software, hardware, firmware, and/or entities illustrated in the figures and/or described herein. Further, embodiments (whether or not explicitly described herein) have significant utility to fields and applications beyond the examples described herein.
  • Embodiments have been described herein with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined as long as the specified functions and relationships (or equivalents thereof) are appropriately performed. Also, alternative embodiments can perform functional blocks, steps, operations, methods, etc. using orderings different than those described herein.
  • References herein to “one embodiment,” “an embodiment,” “an example embodiment,” or similar phrases, indicate that the embodiment described can include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it would be within the knowledge of persons skilled in the relevant art(s) to incorporate such feature, structure, or characteristic into other embodiments whether or not explicitly mentioned or described herein. Additionally, some embodiments can be described using the expressions “coupled” and “connected” along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, some embodiments can be described using the terms “connected” and/or “coupled” to indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, can also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
  • The breadth and scope of this disclosure should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims (20)

What is claimed is:
1. A computer-implemented method comprising:
receiving, by one or more computing devices, a query including a value;
querying, by the one or more computing devices, a dictionary for a value ID corresponding to the value, using the value, wherein the dictionary includes a plurality of pages including value IDs and corresponding values;
executing, by the one or more computing devices, a binary search on a helper vector of the dictionary using the value, wherein the helper vector includes a last value for each page of a dictionary;
identifying, by the one or more computing devices, a page of the dictionary including the value;
loading, by the one or more computing devices, the page into temporary memory;
retrieving, by the one or more computing devices, the value ID of the value from the page; and
executing, by the one or more computing devices, the query on a column using the value ID.
2. The method of claim 1, wherein each page of the dictionary includes a predetermined number of value IDs.
3. The method of claim 2, further comprising:
receiving, by the one or more computing devices, a request to retrieve a different value, the request including a different value ID;
identifying, by the one or more computing devices, a different page including the different value based on the different value ID and the predetermined number of value IDs per page in the dictionary;
loading, by the one or more computing devices, the different page into the temporary memory; and
retrieving, by the one or more computing devices, the different value.
4. The method of claim 1, further comprising:
determining, by the one or more computing devices, that the dictionary is stored in a persistent storage device.
5. The method of claim 1, wherein the value is a fixed data type.
6. The method of claim 1, wherein the dictionary is a multi-page vector.
7. The method of claim 1, further comprising:
detecting, by the one or more computing devices, an event causing an update in the values and value IDs in the dictionary;
identifying, by the one or more computing devices, a change in the last values for each page in the dictionary based on the event; and
generating, by the one or more computing devices, a new helper vector to reflect the changed last values for each page in the dictionary.
8. The method of claim 1, further comprising loading, by the one or more computing devices, the helper vector into the temporary memory in response to receiving the query.
9. A system comprising:
a memory; and
at least one processor coupled to the memory and configured to:
receive a query including a value;
query a dictionary for a value ID corresponding to the value, using the value, wherein the dictionary includes a plurality of pages including value IDs and corresponding values;
execute a binary search on a helper vector of the dictionary using the value, wherein the helper vector includes a last value for each page of a dictionary;
identify a page of the dictionary including the value;
load the page into temporary memory;
retrieve the value ID of the value from the page; and
execute the query on a column using the value ID.
10. The system of claim 9, wherein each page of the dictionary includes a predetermined number of value IDs.
11. The system of claim 10, the at least one processor further configured to:
receive a request to retrieve a different value, the request including a different value ID;
identify a different page including the different value based on the different value ID and the predetermined number of value IDs per page in the dictionary;
load the different page into the temporary memory; and
retrieve the different value.
12. The system of claim 9, the at least one processor further configured to:
determine that the dictionary is stored in a persistent storage device.
13. The system of claim 9, wherein the dictionary is a multi-page vector.
14. The system of claim 9, the at least one processor further configured to:
detect an event causing an update in the values and value IDs in the dictionary;
identify a change in the last values for each page in the dictionary based on the event; and
generate a new helper vector to reflect the changed last values for each page in the dictionary.
15. The system of claim 9, the at least one processor configured to:
load the helper vector into the temporary memory in response to receiving the query.
16. A non-transitory computer-readable device having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising:
receiving a query including a value;
querying a dictionary for a value ID corresponding to the value, using the value, wherein the dictionary includes a plurality of pages including value IDs and corresponding values;
executing a binary search on a helper vector of the dictionary using the value, wherein the helper vector includes a last value for each page of the dictionary;
identifying a page of the dictionary including the value;
loading the page into temporary memory;
retrieving the value ID of the value from the page; and
executing the query on a column using the value ID.
17. The non-transitory computer-readable device of claim 16, wherein each page of the dictionary includes a predetermined number of value IDs.
18. The non-transitory computer-readable device of claim 17, the operations further comprising:
receiving a request to retrieve a different value, the request including a different value ID;
identifying a different page including the different value based on the different value ID and the predetermined number of value IDs per page in the dictionary;
loading the different page into the temporary memory; and
retrieving the different value.
19. The non-transitory computer-readable device of claim 16, the operations further comprising:
determining that the dictionary is stored in a persistent storage device.
20. The non-transitory computer-readable device of claim 16, the operations further comprising:
detecting an event causing an update in the values and value IDs in the dictionary;
identifying a change in the last values for each page in the dictionary based on the event; and
generating a new helper vector to reflect the changed last values for each page in the dictionary.
US15/931,063 2019-06-07 2020-05-13 Architecture of hybrid in-memory and paged dictionary Abandoned US20200387511A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/931,063 US20200387511A1 (en) 2019-06-07 2020-05-13 Architecture of hybrid in-memory and paged dictionary

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962858693P 2019-06-07 2019-06-07
US15/931,063 US20200387511A1 (en) 2019-06-07 2020-05-13 Architecture of hybrid in-memory and paged dictionary

Publications (1)

Publication Number Publication Date
US20200387511A1 true US20200387511A1 (en) 2020-12-10

Family

ID=73650063

Family Applications (13)

Application Number Title Priority Date Filing Date
US16/857,982 Active 2040-10-29 US11514027B2 (en) 2019-06-07 2020-04-24 Paged hybrid LOBs
US16/858,344 Active 2040-06-04 US11194789B2 (en) 2019-06-07 2020-04-24 Content agnostic memory pageable storage model
US16/863,860 Abandoned US20200387451A1 (en) 2019-06-07 2020-04-30 Hybrid data storage and load system with position lookup
US16/863,834 Active 2041-02-26 US11514028B2 (en) 2019-06-07 2020-04-30 Hybrid data storage and load system with ROWID lookup
US16/866,766 Active US11151126B2 (en) 2019-06-07 2020-05-05 Hybrid column store providing both paged and memory-resident configurations
US15/931,063 Abandoned US20200387511A1 (en) 2019-06-07 2020-05-13 Architecture of hybrid in-memory and paged dictionary
US15/931,179 Active 2040-05-27 US11341120B2 (en) 2019-06-07 2020-05-13 Hash composite indexes
US16/890,020 Active 2041-10-27 US11726985B2 (en) 2019-06-07 2020-06-02 Hybrid in-memory/pageable spatial column data
US16/893,703 Active 2040-10-02 US11372845B2 (en) 2019-06-07 2020-06-05 In-place load unit conversion
US16/893,697 Active US11386082B2 (en) 2019-06-07 2020-06-05 Space efficient vector for columnar data storage
US17/496,108 Active US11755565B2 (en) 2019-06-07 2021-10-07 Hybrid column store providing both paged and memory-resident configurations
US17/540,921 Active US11663200B2 (en) 2019-06-07 2021-12-02 Content agnostic memory pageable storage model
US18/216,218 Pending US20230350879A1 (en) 2019-06-07 2023-06-29 Hybrid in-memory/pageable spatial column data

Family Applications Before (5)

Application Number Title Priority Date Filing Date
US16/857,982 Active 2040-10-29 US11514027B2 (en) 2019-06-07 2020-04-24 Paged hybrid LOBs
US16/858,344 Active 2040-06-04 US11194789B2 (en) 2019-06-07 2020-04-24 Content agnostic memory pageable storage model
US16/863,860 Abandoned US20200387451A1 (en) 2019-06-07 2020-04-30 Hybrid data storage and load system with position lookup
US16/863,834 Active 2041-02-26 US11514028B2 (en) 2019-06-07 2020-04-30 Hybrid data storage and load system with ROWID lookup
US16/866,766 Active US11151126B2 (en) 2019-06-07 2020-05-05 Hybrid column store providing both paged and memory-resident configurations

Family Applications After (7)

Application Number Title Priority Date Filing Date
US15/931,179 Active 2040-05-27 US11341120B2 (en) 2019-06-07 2020-05-13 Hash composite indexes
US16/890,020 Active 2041-10-27 US11726985B2 (en) 2019-06-07 2020-06-02 Hybrid in-memory/pageable spatial column data
US16/893,703 Active 2040-10-02 US11372845B2 (en) 2019-06-07 2020-06-05 In-place load unit conversion
US16/893,697 Active US11386082B2 (en) 2019-06-07 2020-06-05 Space efficient vector for columnar data storage
US17/496,108 Active US11755565B2 (en) 2019-06-07 2021-10-07 Hybrid column store providing both paged and memory-resident configurations
US17/540,921 Active US11663200B2 (en) 2019-06-07 2021-12-02 Content agnostic memory pageable storage model
US18/216,218 Pending US20230350879A1 (en) 2019-06-07 2023-06-29 Hybrid in-memory/pageable spatial column data

Country Status (1)

Country Link
US (13) US11514027B2 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11514027B2 (en) * 2019-06-07 2022-11-29 Sap Se Paged hybrid LOBs
CN112732174A (en) * 2020-12-25 2021-04-30 北京金山云网络技术有限公司 Data processing method and device, electronic equipment and storage medium
US11550762B2 (en) 2021-02-24 2023-01-10 Sap Se Implementation of data access metrics for automated physical database design
EP4341831A1 (en) * 2021-05-20 2024-03-27 Cinchy Inc. System for enabling temporal structural metadata changes using virtual addressing of data
US20220398235A1 (en) * 2021-06-11 2022-12-15 Actian Corporation Method and apparatus for storing object tokens in a database
US11687560B2 (en) * 2021-07-16 2023-06-27 International Business Machines Corporation Database replication using adaptive compression
CN118020057A (en) * 2021-09-29 2024-05-10 美光科技公司 Early detection of compression state using inline metadata
US20230135151A1 (en) * 2021-10-28 2023-05-04 Netapp Inc. Persistent metafile used to cache slot header metadata for improved read performance of objects within an object store
US20240086392A1 (en) * 2022-09-14 2024-03-14 Sap Se Consistency checks for compressed data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120221528A1 (en) * 2011-01-14 2012-08-30 Sap Ag Logging scheme for column-oriented in-memory databases
US20130124491A1 (en) * 2011-11-11 2013-05-16 Gerald Pepper Efficient Pipelined Binary Search
US20140222418A1 (en) * 2012-04-30 2014-08-07 Martin Richtarsky Fixed string dictionary
US20160012089A1 (en) * 2014-07-10 2016-01-14 Reza Sherkat Main memory database management using page index vectors

Family Cites Families (79)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5625711A (en) * 1994-08-31 1997-04-29 Adobe Systems Incorporated Method and apparatus for producing a hybrid data structure for displaying a raster image
US5729637A (en) * 1994-08-31 1998-03-17 Adobe Systems, Inc. Method and apparatus for producing a hybrid data structure for displaying a raster image
US6307962B1 (en) * 1995-09-01 2001-10-23 The University Of Rochester Document data compression system which automatically segments documents and generates compressed smart documents therefrom
US6112286A (en) * 1997-09-19 2000-08-29 Silicon Graphics, Inc. Reverse mapping page frame data structures to page table entries
US20020099691A1 (en) * 1998-06-24 2002-07-25 Michael Dean Lore Method and apparatus for aggregation of data in a database management system
US6191710B1 (en) 1998-12-30 2001-02-20 Intel Corp. Data compression and decompression method and system for data compression and decompression
US7054504B2 (en) * 1999-02-25 2006-05-30 Ludwig Lester F Relative optical path phase reconstruction in the correction of misfocused images using fractional powers of the fourier transform
US20030041110A1 (en) * 2000-07-28 2003-02-27 Storymail, Inc. System, Method and Structure for generating and using a compressed digital certificate
US20020199096A1 (en) * 2001-02-25 2002-12-26 Storymail, Inc. System and method for secure unidirectional messaging
US7089567B2 (en) * 2001-04-09 2006-08-08 International Business Machines Corporation Efficient RPC mechanism using XML
US8413205B2 (en) * 2001-09-19 2013-04-02 Tvworks, Llc System and method for construction, delivery and display of iTV content
US8042132B2 (en) * 2002-03-15 2011-10-18 Tvworks, Llc System and method for construction, delivery and display of iTV content
US7383414B2 (en) * 2004-05-28 2008-06-03 Oracle International Corporation Method and apparatus for memory-mapped input/output
US8200887B2 (en) * 2007-03-29 2012-06-12 Violin Memory, Inc. Memory management system and method
JP5041458B2 (en) * 2006-02-09 2012-10-03 本田技研工業株式会社 Device for detecting three-dimensional objects
FR2899353B1 (en) * 2006-03-31 2008-06-27 Infovista Sa Sa MEMORY MANAGEMENT SYSTEM FOR REDUCING MEMORY FRAGMENTATION
US8032499B2 (en) 2007-05-21 2011-10-04 Sap Ag Compression of tables based on occurrence of values
US8427552B2 (en) * 2008-03-03 2013-04-23 Videoiq, Inc. Extending the operational lifetime of a hard-disk drive used in video data storage applications
US20090327621A1 (en) * 2008-06-27 2009-12-31 Microsoft Corporation Virtual memory compaction and compression using collaboration between a virtual memory manager and a memory manager
US8495036B2 (en) * 2008-10-24 2013-07-23 Microsoft Corporation Blob manipulation in an integrated structured storage system
US10509304B2 (en) * 2008-11-12 2019-12-17 Tseng-Lu Chien LED projection light has features
US9292549B2 (en) * 2008-12-22 2016-03-22 Sap Se Method and system for index serialization
US8326893B2 (en) * 2009-02-24 2012-12-04 International Business Machines Corporation Allocating data sets to a container data set
US8935223B2 (en) * 2009-04-30 2015-01-13 Oracle International Corporation Structure of hierarchical compressed data structure for tabular data
US20120089775A1 (en) * 2010-10-08 2012-04-12 Sandeep Ranade Method and apparatus for selecting references to use in data compression
US8819056B2 (en) * 2010-11-19 2014-08-26 International Business Machines Corporation Facilitation of search, list, and retrieval operations on persistent data set using distributed shared memory
US9201652B2 (en) * 2011-05-03 2015-12-01 Qualcomm Incorporated Methods and apparatus for storage and translation of entropy encoded software embedded within a memory hierarchy
US20130346700A1 (en) * 2012-06-21 2013-12-26 Alexander I. Tomlinson Systems and methods for managing memory
DE102012022728A1 (en) * 2012-11-21 2014-05-22 Unify Gmbh & Co. Kg A method of controlling a flash memory for mass storage comprised of a communication device connectable to a host, and computer program product for executing the method
US9009439B2 (en) 2013-03-12 2015-04-14 Sap Se On-disk operations on fragments to support huge data sizes
US10474652B2 (en) * 2013-03-14 2019-11-12 Inpixon Optimizing wide data-type storage and analysis of data in a column store database
US8972465B1 (en) * 2013-03-15 2015-03-03 Emc Corporation Burst buffer appliance with small file aggregation
US20160085955A1 (en) * 2013-06-10 2016-03-24 Doosra, Inc. Secure Storing and Offline Transferring of Digitally Transferable Assets
US9128968B2 (en) * 2013-07-25 2015-09-08 Facebook, Inc. Systems and methods for data compression
US9350550B2 (en) * 2013-09-10 2016-05-24 M2M And Iot Technologies, Llc Power management and security for wireless modules in “machine-to-machine” communications
US10311154B2 (en) * 2013-09-21 2019-06-04 Oracle International Corporation Combined row and columnar storage for in-memory databases for OLTP and analytics workloads
US9684682B2 (en) * 2013-09-21 2017-06-20 Oracle International Corporation Sharding of in-memory objects across NUMA nodes
US9606921B2 (en) * 2013-09-21 2017-03-28 Oracle International Corporation Granular creation and refresh of columnar data
US9977802B2 (en) * 2013-11-21 2018-05-22 Sap Se Large string access and storage
US9384206B1 (en) * 2013-12-26 2016-07-05 Emc Corporation Managing data deduplication in storage systems
US9317208B2 (en) * 2013-12-31 2016-04-19 Sybase, Inc. Data row cache for an acid compliant in-memory row store in a page-based RDBMS engine
DE102014201571B4 (en) * 2014-01-29 2022-08-04 Carl Zeiss Meditec Ag Module for data mirroring in a visualization device, visualization device and method for adapting the device
US9342544B2 (en) * 2014-01-30 2016-05-17 International Business Machines Corporation Parallel load in a column-store database
DK3107429T3 (en) * 2014-02-20 2024-02-12 Mbl Ltd METHODS AND SYSTEMS FOR PREPARING FOOD IN A ROBOT COOKING KITCHEN
US9875180B2 (en) * 2014-02-24 2018-01-23 Sandisk Technologies Llc Systems and methods for managing storage compression operations
US9891831B2 (en) 2014-11-25 2018-02-13 Sap Se Dual data storage using an in-memory array and an on-disk page structure
US9898551B2 (en) 2014-11-25 2018-02-20 Sap Se Fast row to page lookup of data table using capacity index
US9811549B2 (en) * 2014-11-25 2017-11-07 Sap Se Applying a database transaction log record directly to a database table container
US20160148000A1 (en) * 2014-11-25 2016-05-26 Freescale Semiconductor, Inc. Method and apparatus for encoding image data
US9817858B2 (en) 2014-12-10 2017-11-14 Sap Se Generating hash values
US20160188176A1 (en) * 2014-12-31 2016-06-30 Valepro, LLC Systems and Methods for Resident Space Object Visualization
WO2016168211A1 (en) * 2015-04-13 2016-10-20 Risk Management Solutions, Inc. High performance big data computing system and platform
US10037270B2 (en) * 2015-04-14 2018-07-31 Microsoft Technology Licensing, Llc Reducing memory commit charge when compressing memory
US10552058B1 (en) * 2015-07-17 2020-02-04 Radian Memory Systems, Inc. Techniques for delegating data processing to a cooperative memory controller
US10204135B2 (en) * 2015-07-29 2019-02-12 Oracle International Corporation Materializing expressions within in-memory virtual column units to accelerate analytic queries
US10372706B2 (en) * 2015-07-29 2019-08-06 Oracle International Corporation Tracking and maintaining expression statistics across database queries
US9990308B2 (en) * 2015-08-31 2018-06-05 Oracle International Corporation Selective data compression for in-memory databases
US11249968B2 (en) * 2016-05-09 2022-02-15 Sap Se Large object containers with size criteria for storing mid-sized large objects
US10002010B2 (en) * 2016-05-13 2018-06-19 International Business Machines Corporation Multi-byte compressed string representation
US10691688B2 (en) * 2016-06-17 2020-06-23 Sap Se Cracking page-loadable columns for in-memory data management
US10733045B2 (en) * 2016-07-14 2020-08-04 Microsoft Technology Licensing, Llc Online repair of metadata for structured data including file systems
US11204895B1 (en) * 2016-09-28 2021-12-21 Amazon Technologies, Inc. Data payload clustering for data storage systems
US10642841B2 (en) * 2016-11-17 2020-05-05 Sap Se Document store utilizing partial object compression
US10762071B2 (en) * 2016-11-29 2020-09-01 Sap Se Value-ID-based sorting in column-store databases
US10901639B2 (en) * 2016-11-29 2021-01-26 Sap Se Memory allocation in multi-core processors
US11132128B2 (en) * 2017-03-24 2021-09-28 Veritas Technologies Llc Systems and methods for data placement in container-based storage systems
US10831787B2 (en) * 2017-06-30 2020-11-10 Sap Se Security of a computer system
WO2019000386A1 (en) * 2017-06-30 2019-01-03 Microsoft Technology Licensing, Llc Online schema change of range-partitioned index in distributed storage system
WO2019018727A1 (en) * 2017-07-21 2019-01-24 Washington University Methods and compositions for t cell activation
US10803040B2 (en) * 2017-08-28 2020-10-13 International Business Machines Corporation Efficient and accurate lookups of data by a stream processor using a hash table
FI20176151A1 (en) * 2017-12-22 2019-06-23 Vuolearning Ltd A heuristic method for analyzing content of an electronic document
US11550485B2 (en) * 2018-04-23 2023-01-10 Sap Se Paging and disk storage for document store
TWI671631B (en) * 2018-08-01 2019-09-11 大陸商深圳大心電子科技有限公司 Memory management method and storage controller
US10803087B2 (en) * 2018-10-19 2020-10-13 Oracle International Corporation Language interoperable runtime adaptable data collections
US11562085B2 (en) * 2018-10-19 2023-01-24 Oracle International Corporation Anisotropic compression as applied to columnar storage formats
US20200241792A1 (en) * 2019-01-29 2020-07-30 Sap Se Selective Restriction of Large Object Pages in a Database
US20200320159A1 (en) * 2019-03-22 2020-10-08 Robert R. Matthews Magnetic-enthalpy symbol H gasiform species dilute, or solution a heater fluid dynamic mechanic class system transfer the integral sub-plasma thermal cycle state of matter dilute, or solution containing the greater amount of solute synthesis extraction the solution pipeline standard a heater fluid.
US11514027B2 (en) * 2019-06-07 2022-11-29 Sap Se Paged hybrid LOBs
US11502705B2 (en) * 2019-06-21 2022-11-15 Sap Se Advanced database decompression

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120221528A1 (en) * 2011-01-14 2012-08-30 Sap Ag Logging scheme for column-oriented in-memory databases
US20130124491A1 (en) * 2011-11-11 2013-05-16 Gerald Pepper Efficient Pipelined Binary Search
US20140222418A1 (en) * 2012-04-30 2014-08-07 Martin Richtarsky Fixed string dictionary
US20160012089A1 (en) * 2014-07-10 2016-01-14 Reza Sherkat Main memory database management using page index vectors

Also Published As

Publication number Publication date
US11151126B2 (en) 2021-10-19
US20200387502A1 (en) 2020-12-10
US20200387305A1 (en) 2020-12-10
US11514027B2 (en) 2022-11-29
US20200387451A1 (en) 2020-12-10
US11386082B2 (en) 2022-07-12
US11755565B2 (en) 2023-09-12
US20200387490A1 (en) 2020-12-10
US11194789B2 (en) 2021-12-07
US20220092057A1 (en) 2022-03-24
US20200387495A1 (en) 2020-12-10
US20200387509A1 (en) 2020-12-10
US20220027354A1 (en) 2022-01-27
US11341120B2 (en) 2022-05-24
US11514028B2 (en) 2022-11-29
US11726985B2 (en) 2023-08-15
US11372845B2 (en) 2022-06-28
US20230350879A1 (en) 2023-11-02
US20200387487A1 (en) 2020-12-10
US20200387488A1 (en) 2020-12-10
US20200387486A1 (en) 2020-12-10
US11663200B2 (en) 2023-05-30

Similar Documents

Publication Publication Date Title
US20200387511A1 (en) Architecture of hybrid in-memory and paged dictionary
US9886464B2 (en) Versioned bloom filter
EP3518118A1 (en) Non-homogenous synopsis for efficient partition pruning
US10127281B2 (en) Dynamic hash table size estimation during database aggregation processing
US10810174B2 (en) Database management system, database server, and database management method
US11036517B2 (en) Database management system performing column operations using a set of SIMD processor instructions selected based on performance
EP3173947A1 (en) Paged inverted index
US9626428B2 (en) Apparatus and method for hash table access
US10565205B2 (en) Incrementally building hash collision tables
US10936595B2 (en) Deferring and/or eliminating decompressing database data
US10565204B2 (en) Hash collision tables for relational join operations
US11775496B2 (en) Data transfer and management system for in-memory database
US9449046B1 (en) Constant-vector computation system and method that exploits constant-value sequences during data processing
US20240070124A1 (en) Delta merge with location data
US9471634B2 (en) Execution of negated conditions using a bitmap
US20200311062A1 (en) Data Partitioning and Transfer System
CN114625548A (en) Look-ahead ranking for accelerated data extraction
US20180137173A1 (en) Fast Aggregation on Compressed Data

Legal Events

Date Code Title Description
AS Assignment

Owner name: SAP SE, GERMANY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SHERKAT, REZA;FLORENDO, COLIN;GOTTIPATI, CHAITANYA;AND OTHERS;SIGNING DATES FROM 20200427 TO 20200511;REEL/FRAME:052851/0401

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION