US20130060780A1 - Column Domain Dictionary Compression - Google Patents

Column Domain Dictionary Compression

Info

Publication number
US20130060780A1
Authority
US
United States
Prior art keywords
dictionary
column
tokens
query
base table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US13/224,327
Other versions
US10756759B2 (en)
Inventor
Tirthankar Lahiri
Chi-Kim Hoang
Dina Thomas
Kirk Meredith Edson
Subhradyuti Sarkar
Mark McAuliffe
Marie-Anne Neimat
Chih-Ping Wang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oracle International Corp
Priority to US13/224,327 (US10756759B2)
Assigned to ORACLE INTERNATIONAL CORPORATION. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: EDSON, KIRK MEREDITH, HOANG, CHI-KIM, LAHIRI, TIRTHANKAR, MCAULIFFE, MARK, NEIMAT, MARIE-ANNE, SARKAR, Subhradyuti, THOMAS, DINA, WANG, CHIH-PING
Priority to PCT/US2012/052547 (WO2013033030A1)
Publication of US20130060780A1
Application granted
Publication of US10756759B2
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/42 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code using table look-up for the coding or decoding process, e.g. using read-only memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/221 Column-oriented storage; Management thereof
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/3088 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing the use of a dictionary, e.g. LZ78

Definitions

  • the present application is related to database systems, and in particular, storing database data for efficient access.
  • Data in a relational or object-relational database is stored as relations comprising rows or tuples that share the same columns.
  • There are several formats for storing rows, including row-major and column-major.
  • row-major format column values of a single row are stored contiguously within a unit of persistent storage, such as a data block.
  • column-major format values of a column of multiple rows are stored contiguously.
  • Row-major format or column-major format are collectively referred to herein as major formats.
  • Each major format has its advantages.
  • column-major format is advantageous for queries that require scanning a single column to generate aggregate values for that column. Such queries occur often in the context of data warehousing and decision support systems.
  • row-major format is advantageous for queries that return rows with multiple columns or for modifications to a small number of rows.
  • the major format of a column and the compression technique used, if any, is referred to herein as a storage format.
  • Described herein are techniques that provide advantages of both row-major and column-major format.
  • FIG. 1 is a diagram depicting a table and dictionaries arranged according to column domain compression, according to an embodiment of the present invention.
  • FIG. 2 is a diagram depicting a computer system that may be used in an embodiment.
  • dictionary based compression techniques referred to herein as column domain dictionary compression.
  • a value is represented by a token, which is typically much smaller than the value the token represents.
  • a dictionary maps tokens to values. Occurrences of a value in a column are replaced with the token mapped by the dictionary to the value.
  • a value that is mapped to a token by a dictionary is referred to herein as the actual value.
  • the set of actual values that are to be encoded according to the dictionary is referred to herein as the domain of the dictionary.
  • column values in a set of one or more columns are the domain of a single dictionary.
  • Each column in the set is referred to as a base column with respect to the dictionary and the set is referred to as a column group.
  • a table that contains a base column for a dictionary is referred to herein as a base table with respect to the dictionary and the column group.
  • a dictionary may not only map a token to an actual value, but also to a count (“token count”) of the number of occurrences of the token in the column group.
  • the token count also represents the number of times the actual value mapped to the token occurs in the column group.
  • Such information may be used to compute queries on the base table.
  • queries on a base table may be rewritten to instead reference a dictionary. The query may then be computed much more efficiently, without having to access the base table.
  • column domain dictionary compression provides advantages of both row-major format and column-major format, as shall be explained later in greater detail.
  • An embodiment of the present invention may be implemented in a database system.
  • a database system includes at least one database server that manages a database.
  • a server such as a database server, is a combination of integrated software components and an allocation of computational resources, such as memory, a node, and processes on the node for executing the integrated software components, where the combination of the software and computational resources are dedicated to providing a particular type of function on behalf of clients of the server.
  • a database server governs and facilitates access to a particular database, processing requests by clients to access the database.
  • a database comprises data and metadata that is stored on a persistent or volatile memory mechanism, such as a set of hard disks.
  • Database metadata defines database objects, such as tables and columns therein, object tables, views, or complex types, such as object types, and functions.
  • Application clients interact with a database server to access data in the database managed by the database server.
  • Clients access the database data by submitting to the database server commands that cause the database server to perform operations on data stored in a database.
  • a database command may be in the form of a database statement that conforms to a database language, such as the Structured Query Language (SQL).
  • DML Data manipulation language
  • DDL Data definition language
  • instructions are issued to a database server to create or configure database objects, such as tables, views, or complex data types, or to control how DML statements are to be processed.
  • an application client issues database server commands via a database session.
  • a session such as a database session, is a particular connection established for a client to a server, such as a database instance, through which the client issues a series of requests (e.g., requests for execution of database statements).
  • Session state includes the data stored for a database session for the duration of the database session. Such data includes, for example, the identity of the client for which the session is established, and temporary variable values generated by processes and database components executing software within the database session.
  • a database server is an in-memory database server.
  • In-memory database servers are described in related patent applications Automated Integrated High Availability Of The In-Memory and Database Cache and the Backend Enterprise Database.
  • FIG. 1 depicts an example table and dictionaries used to illustrate column domain dictionary compression according to an embodiment.
  • FIG. 1 depicts two versions of a database table CUSTOMER, which is compressed according to column domain dictionary compression.
  • the two versions depicted are an uncompressed and a compressed version.
  • CUSTOMER includes four columns, which are CUSTID, NAME, STATE and ZIP. In the uncompressed version, NAME, STATE, and ZIP store strings.
  • CUSTID is an id column, such as a primary key.
  • An id value (“id”) in an id column is used herein to refer to the row that contains the id. For example, row 1 in uncompressed CUSTOMER is referring to the row with CUSTID value 1, which also contains the strings CHARLIE, CA, and 94301 in columns NAME, STATE, and ZIP, respectively.
  • NAME DICTIONARY is a relational table containing columns NAME ID, NAME VALUE, and TOKCNT.
  • NAME ID is an id column.
  • NAME VALUE is a column containing distinct values from NAME in CUSTOMER.
  • Values in NAME ID also serve as tokens for the actual values stored in NAME VALUE.
  • An id used as a token to encode an actual value may also be referred to herein as a token id.
  • Each row in NAME DICTIONARY maps a token id in NAME ID to the actual value in column NAME VALUE.
  • a token id in NAME ID is stored in lieu of the corresponding actual value to which the NAME ID value is mapped.
  • the string value CHARLIE is mapped to token id 0.
  • rows 1 and 4 contain the actual column value CHARLIE in column NAME, while in compressed CUSTOMER, rows 1 and 4 contain the token id 0 in lieu of CHARLIE.
  • Each row in a dictionary table holds a token count that is stored in column TOKCNT.
  • row 0 of NAME DICTIONARY contains the value 2 in TOKCNT, specifying that two rows in the base table CUSTOMER store the token id 0 in column NAME and representing that the column contains two occurrences of the actual value CHARLIE.
  • STATE and ZIP represent a compression group of two columns that are together encoded by STATE-ZIP DICTIONARY table.
  • STATE-ZIP DICTIONARY includes columns STATE-ZIP ID, STATE VALUE, ZIP VALUE, and TOKCNT.
  • STATE-ZIP ID holds token ids.
  • Each row in STATE-ZIP DICTIONARY maps a token id to a concatenation of the pair of column values in columns STATE VALUE and ZIP VALUE. For example, row 0 maps STATE-ZIP ID 0 to a concatenation of CA and 94301.
  • row 1 contains token id 0 to represent the combination of column values CA and 94301 in columns STATE and ZIP.
  • the token ids of a dictionary table are storage location identifiers that identify a storage location of a row within storage structures that store a dictionary table. For example, a row-id of a row identifies a data block storing the row and a storage location within the data block of where the row is stored. Using the row-id as token, a dictionary table row maps its row-id to an actual value. Given a row-id, the row may be accessed very quickly to access the mapped actual value.
  • CUSTOMER and NAME DICTIONARY and STATE-ZIP DICTIONARY are in-memory tables in an in-memory database server.
  • a row-id is an offset from a base memory address of the in-memory database.
  • column groups may be declaratively set up in a database by submitting DDL commands to a database server.
  • a COMPRESS clause in a CREATE TABLE DDL statement or an ALTER TABLE ADD statement can specify the one or more columns of the compression group and one or more parameters for compression.
  • the parameter may specify one of three threshold numbers of dictionary entries: 256, 64k, and 4G.
  • the size of the id used for each threshold is 8 bits, 16 bits, and 32 bits, respectively.
  • when a database server receives and/or processes the DDL statement, the database server configures itself for column domain compression according to the DDL statement.
  • Such configuration entails creating a dictionary for the specified column group in the statement.
  • the configuration also includes modifying database metadata to define the dictionary, to define the column group on the base table, and to associate the dictionary with the column group.
  • when the database server computes queries referencing and/or requiring access to a column group, the database server consults the dictionary to decode the tokens in the column group.
  • the dictionary is a dictionary table that may be accessed via queries in the way other database tables are accessed. All actual values contained in the column group are mapped to tokens according to the dictionary.
  • a base table with a compression group may be modified by delete, insert, and update operations, in response to, for example, database commands received by a database server.
  • Such operations may entail encoding a column value for the compression group according to a dictionary and storing the resultant token in the compression group, and modifying the dictionary to increment or decrement a token count or to insert a new entry for a new token for a new column value.
  • this operation causes the token count in TOKCNT to be incremented to 3 to reflect that the number of rows in CUSTOMER that hold token id 0 in NAME in lieu of the actual value CHARLIE is three.
  • NAME DICTIONARY in row 2
  • this operation causes TOKCNT to be decremented to 1.
  • a new row is inserted in NAME DICTIONARY with the values 3, DOE, and 1 in columns NAME ID, NAME VALUE, and TOKCNT, respectively.
  • an advantage of column-major format is that many values of columns can be efficiently scanned to perform operations such as aggregations.
  • a reason for this efficiency is that sets of column values are stored contiguously in blocks of persistent storage, while in row-major format the column values are not stored contiguously but are instead interleaved with the values of other columns of the table.
  • the column values of a column are stored more densely while in row-major the column values of a column are stored more sparsely.
  • values that are stored contiguously are of the same data type. Being the same data type may mean that contiguously stored values may share common properties that can be exploited to enhance compressibility for certain compression techniques.
  • a dictionary provides a form of storage that is akin to column-major format, and for a given compression group, provides advantages of data density and compressibility. These advantages are achieved even though the dictionary itself may be stored in row-major format. Assuming the dictionary has a significantly smaller number of columns than that of the base table, the density of rows is greater for the dictionary than for the base table, and, inherently, so is the density of column values. Second, the mapping of the column values to token counts is, in effect, a compressed representation that may be usable for certain types of computations. In fact, as described further below, the compressed representation is exploited to perform certain types of queries more efficiently.
  • Queries rewritten in this way may be executed more efficiently to provide the same result.
  • the rewrites are performed by a query optimizer of a database server.
  • a query optimizer evaluates a query to optimize execution of a query by rewriting the query and/or generating and selecting an execution plan for executing the query or a rewritten version thereof.
  • the query may be rewritten before generating an execution plan.
  • the rewritten query is semantically equivalent to the version of the query before rewrite.
  • a query is semantically equivalent to another when the queries request (or declare) the same results; computation of either should return the same result.
  • the fact that the dictionary table is a relational table facilitates query rewrites to access the dictionary table.
  • token_id(name) returns the token id contained in NAME.
  • There are other types of query rewrites that may be performed. Those are illustrated using different tables than those illustrated in FIG. 1, as described below.
  • Compression group x1 and y1 are defined for a table t1.
  • Table t1_compr is the dictionary table with columns x1 value and y1 value holding the actual values of x1 and y1, respectively, and column tokcnt storing the token counts.
  • SELECT SUM (X1) FROM T1 is rewritten to:
  • Compression group x2 is defined for a table t2.
  • t2_compr is a dictionary table with column x2 value holding the actual values for x2, and column tokcnt storing the token counts.
  • a data block is a unit of persistent storage used by a database server to store database records (e.g. to store rows of a table, to store column values of a column).
  • a data block containing the record is copied into a data block buffer in volatile memory of a database server.
  • a data block usually contains multiple rows, and control and formatting information, (e.g. offsets to sequences of bytes representing rows or other data structures, list of transactions affecting a row).
  • a data block is referred to as being atomic because, at least in part, a data block is the smallest unit of database data a database server may request from a persistent storage device. For example, when a database server seeks a row that is stored in a data block, the database server may only read the row from persistent storage by reading in the entire data block.
  • a dictionary is stored in the data block.
  • the domain of the dictionary is the data block and it may be used to tokenize column values of columns of rows in the data block.
  • the domain of the dictionary in a data block is the data block; the dictionary does not pertain to other data blocks.
  • the domain of a dictionary is the entire column.
  • the dictionary maps column values of a column (compression group) that are stored in multiple data blocks.
  • the dictionary is stored in a separate data structure (e.g. a database object, e.g. dictionary table), while in data block domain dictionary compression the dictionaries are not stored separately but in the same data structure at several levels, i.e. in the same table and even in the same atomic level of storage, the data block.
  • the techniques described herein are implemented by one or more special-purpose computing devices.
  • the special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination.
  • Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques.
  • the special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.
  • FIG. 2 is a block diagram that illustrates a computer system 200 upon which an embodiment of the invention may be implemented.
  • Computer system 200 includes a bus 202 or other communication mechanism for communicating information, and a hardware processor 204 coupled with bus 202 for processing information.
  • Hardware processor 204 may be, for example, a general purpose microprocessor.
  • Computer system 200 also includes a main memory 206, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 202 for storing information and instructions to be executed by processor 204.
  • Main memory 206 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 204.
  • Such instructions, when stored in non-transitory storage media accessible to processor 204, render computer system 200 into a special-purpose machine that is customized to perform the operations specified in the instructions.
  • Computer system 200 further includes a read only memory (ROM) 208 or other static storage device coupled to bus 202 for storing static information and instructions for processor 204.
  • a storage device 210, such as a magnetic disk or optical disk, is provided and coupled to bus 202 for storing information and instructions.
  • Computer system 200 may be coupled via bus 202 to a display 212, such as a cathode ray tube (CRT), for displaying information to a computer user.
  • An input device 214 is coupled to bus 202 for communicating information and command selections to processor 204.
  • Another type of user input device is cursor control 216, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 204 and for controlling cursor movement on display 212.
  • This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
  • Computer system 200 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 200 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 200 in response to processor 204 executing one or more sequences of one or more instructions contained in main memory 206. Such instructions may be read into main memory 206 from another storage medium, such as storage device 210. Execution of the sequences of instructions contained in main memory 206 causes processor 204 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
  • Non-volatile media includes, for example, optical or magnetic disks, such as storage device 210.
  • Volatile media includes dynamic memory, such as main memory 206.
  • Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.
  • Storage media is distinct from but may be used in conjunction with transmission media.
  • Transmission media participates in transferring information between storage media.
  • transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 202.
  • transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
  • Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 204 for execution.
  • the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer.
  • the remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
  • a modem local to computer system 200 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal.
  • An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 202.
  • Bus 202 carries the data to main memory 206, from which processor 204 retrieves and executes the instructions.
  • the instructions received by main memory 206 may optionally be stored on storage device 210 either before or after execution by processor 204.
  • Computer system 200 also includes a communication interface 218 coupled to bus 202.
  • Communication interface 218 provides a two-way data communication coupling to a network link 220 that is connected to a local network 222.
  • communication interface 218 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line.
  • communication interface 218 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN.
  • Wireless links may also be implemented.
  • communication interface 218 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • Network link 220 typically provides data communication through one or more networks to other data devices.
  • network link 220 may provide a connection through local network 222 to a host computer 224 or to data equipment operated by an Internet Service Provider (ISP) 226.
  • ISP 226 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 228.
  • Internet 228 uses electrical, electromagnetic or optical signals that carry digital data streams.
  • the signals through the various networks and the signals on network link 220 and through communication interface 218, which carry the digital data to and from computer system 200, are example forms of transmission media.
  • Computer system 200 can send messages and receive data, including program code, through the network(s), network link 220 and communication interface 218.
  • a server 230 might transmit a requested code for an application program through Internet 228, ISP 226, local network 222 and communication interface 218.
  • the received code may be executed by processor 204 as it is received, and/or stored in storage device 210, or other non-volatile storage for later execution.
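
Several of the definitions above describe mechanics that a short sketch may make more concrete: the three dictionary size thresholds and their id widths, token count maintenance on insert and delete, and the use of token counts to answer an aggregate without scanning the base table. The following Python sketch is illustrative only. The function names (`token_width_bits`, `insert_value`, `delete_value`, `sum_from_dictionary`), the list-based dictionary layout, and the sample numbers are assumptions, and the text here does not state what happens to an entry whose token count reaches zero, nor the exact form of the rewritten SUM query.

```python
def token_width_bits(max_entries):
    """Pick the id width for a dictionary capped at max_entries entries.

    Thresholds follow the text: 256 -> 8 bits, 64k -> 16 bits, 4G -> 32 bits.
    """
    for bits in (8, 16, 32):
        if max_entries <= 2 ** bits:
            return bits
    raise ValueError("more than 4G entries is not covered by the text")


def insert_value(dictionary, value):
    """Encode value for a new base-table row and return its token id.

    dictionary is a hypothetical list of [actual_value, token_count] rows;
    the list index plays the role of the token id. The linear scan keeps
    the sketch short; a real system would use an index or hash lookup.
    """
    for token, row in enumerate(dictionary):
        if row[0] == value:
            row[1] += 1              # existing entry: increment token count
            return token
    dictionary.append([value, 1])    # new value: new entry with count 1
    return len(dictionary) - 1


def delete_value(dictionary, token):
    """Account for a deleted base-table row that held the given token."""
    if dictionary[token][1] == 0:
        raise ValueError("token count is already zero")
    dictionary[token][1] -= 1        # entry is kept at zero (an assumption)


def sum_from_dictionary(dictionary):
    """SUM over the base column computed as sum(value * token_count)."""
    return sum(value * count for value, count in dictionary)


# Hypothetical numeric column x1 and the dictionary built from it.
x1 = [10, 20, 10, 30, 10]
x1_dict = []
tokens = [insert_value(x1_dict, v) for v in x1]

print(tokens)                        # token ids stored in the base column
print(sum_from_dictionary(x1_dict))  # same total as sum(x1)
```

The point of the last function is that the aggregate touches one dictionary row per distinct value rather than one base-table row per occurrence, which is where the efficiency described above would come from when a column has few distinct values.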

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In column domain dictionary compression, column values in one or more columns are tokenized by a single dictionary. The domain of the dictionary is the entire set of columns. A dictionary may not only map a token to a tokenized value, but also to a count (“token count”) of the number of occurrences of the token and corresponding tokenized value in the dictionary's domain. Such information may be used to compute queries on the base table.

Description

    FIELD OF THE INVENTION
  • The present application is related to database systems, and in particular, storing database data for efficient access.
  • BACKGROUND
  • Data in a relational or object-relational database is stored as relations comprising rows or tuples that share the same columns. There are several formats of storing rows, which are row-major and column-major. In row-major format, column values of a single row are stored contiguously within a unit of persistent storage, such as a data block. In column-major format, values of a column of multiple rows are stored contiguously. Row-major format or column-major format are collectively referred to herein as major formats.
  • Each major format has its advantages. For example, column-major format is advantageous for queries that require scanning a single column to generate aggregate values for that column. Such queries occur often in the context of data warehousing and decision support systems. On the other hand, row-major format is advantageous for queries that return rows with multiple columns or for modifications to a small number of rows. The major format of a column and the compression technique used, if any, is referred to herein as a storage format.
  • Described herein are techniques that provide advantages of both row-major and column-major format.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • In the drawings:
  • FIG. 1 is a diagram depicting a table and dictionaries arranged according to column domain compression, according to an embodiment of the present invention.
  • FIG. 2 is a diagram depicting a computer system that may be used in an embodiment.
  • DETAILED DESCRIPTION
  • In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
  • General Overview
  • Described herein are dictionary based compression techniques referred to herein as column domain dictionary compression. In dictionary based compression of a column, a value is represented by a token, which is typically much smaller than the value the token represents. A dictionary maps tokens to values. Occurrences of a value in a column are replaced with the token mapped by the dictionary to the value.
  • A value that is mapped to a token by a dictionary is referred to herein as the actual value. With respect to a dictionary, the set of actual values that are to be encoded according to the dictionary is referred to herein as the domain of the dictionary.
  • In column domain dictionary compression, column values in a set of one or more columns are the domain of a single dictionary. Each column in the set is referred to as a base column with respect to the dictionary and the set is referred to as a column group. Similarly, a table that contains a base column for a dictionary is referred to herein as a base table with respect to the dictionary and the column group.
  • A dictionary may map a token not only to an actual value, but also to a count (“token count”) of the number of occurrences of the token in the column group. The token count also represents the number of times the actual value mapped to the token occurs in the column group. Such information may be used to compute queries on the base table. According to an embodiment of the present invention, queries on a base table may be rewritten to instead reference a dictionary. The query may then be computed much more efficiently without having to access the base table.
  • Importantly, column domain dictionary compression provides advantages of both row-major format and column-major format, as shall be explained later in greater detail.
  • Exemplary Database System
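  • As an illustration only, the mapping of tokens to actual values and token counts can be sketched in Python. The function name compress_column and the sample names are hypothetical; using the list index as the token id is an assumption made for brevity, not a statement of how an embodiment stores tokens.

```python
def compress_column(values):
    """Return (tokens, dictionary) for one column.

    dictionary is a list of [actual_value, token_count]; the list index
    serves as the token id (an assumption of this sketch).
    """
    token_of = {}    # actual value -> token id
    dictionary = []  # token id -> [actual value, token count]
    tokens = []      # the column as stored: tokens in lieu of values
    for value in values:
        if value not in token_of:
            token_of[value] = len(dictionary)
            dictionary.append([value, 0])
        token = token_of[value]
        dictionary[token][1] += 1  # one more occurrence of this token
        tokens.append(token)
    return tokens, dictionary


# Hypothetical column contents.
names = ["CHARLIE", "BOB", "CHAPLIN", "CHARLIE", "BOB", "CHAPLIN"]
tokens, dictionary = compress_column(names)

# Decoding replaces each token with the actual value it maps to.
decoded = [dictionary[token][0] for token in tokens]
```

  • The token count of each entry equals the number of times its actual value occurs in the column, which is what later allows some queries to be answered from the dictionary alone.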
  • An embodiment of the present invention may be implemented in a database system. A database system includes at least one database server that manages a database. Generally, a server, such as a database server, is a combination of integrated software components and an allocation of computational resources, such as memory, a node, and processes on the node for executing the integrated software components, where the combination of the software and computational resources are dedicated to providing a particular type of function on behalf of clients of the server. A database server governs and facilitates access to a particular database, processing requests by clients to access the database.
  • A database comprises data and metadata that is stored on a persistent or volatile memory mechanism, such as a set of hard disks. Database metadata defines database objects, such as tables and columns therein, object tables, views, or complex types, such as object types, and functions.
  • Application clients interact with a database server to access data in the database managed by the database server. Clients access the database data by submitting to the database server commands that cause the database server to perform operations on data stored in a database. A database command may be in the form of a database statement that conforms to a database language, such as the Structured Query Language (SQL). There are many different versions of SQL; some versions are standard, some are proprietary, and there are a variety of extensions. Data manipulation language (“DML”) statements are issued to a database server to query or request changes to a database. Data definition language (“DDL”) instructions are issued to a database server to create or configure database objects, such as tables, views, or complex data types, or to control how DML statements are to be processed.
  • According to an embodiment, an application client issues database server commands via a database session. A session, such as a database session, is a particular connection established for a client to a server, such as a database instance, through which the client issues a series of requests (e.g., requests for execution of database statements).
  • For each database session established on a database instance, session state is maintained for the session. Session state includes the data stored for a database session for the duration of the database session. Such data includes, for example, the identity of the client for which the session is established, and temporary variable values generated by processes and database components executing software within the database session.
  • In an embodiment, a database server is an in-memory database server. In-memory database servers are described in related patent applications Automated Integrated High Availability Of The In-Memory and Database Cache and the Backend Enterprise Database.
  • Example Compressed Table and Dictionaries
  • FIG. 1 depicts an example table and dictionaries used to illustrate column domain dictionary compression according to an embodiment. FIG. 1 depicts two versions of a database table CUSTOMER: an uncompressed version and a version compressed according to column domain dictionary compression.
  • CUSTOMER includes four columns, which are CUSTID, NAME, STATE and ZIP. In the uncompressed version, NAME, STATE, and ZIP store strings. CUSTID is an id column, such as a primary key. An id value (“id”) in an id column is used herein to refer to the row that contains the id. For example, row 1 in uncompressed CUSTOMER is referring to the row with CUSTID value 1, which also contains the strings CHARLIE, CA, and 94301 in columns NAME, STATE, and ZIP, respectively.
  • In compressed CUSTOMER, column NAME is compressed using DICTIONARY NAME and columns STATE and ZIP are compressed using DICTIONARY STATE-ZIP. DICTIONARY NAME is a relational table containing columns NAME ID, NAME VALUE, and TOKCNT. NAME ID is an id column, and NAME VALUE is a column containing distinct values from NAME in CUSTOMER. Values in NAME ID also serve as a token for an actual value stored in NAME VALUE. An id used as a token to encode an actual value may also be referred to herein as a token id.
  • Each row in DICTIONARY NAME maps a token id in NAME ID to the actual value in column NAME VALUE. In compressed CUSTOMER, a token id in NAME ID is stored in lieu of the corresponding actual value to which the NAME ID value is mapped. For example, in DICTIONARY NAME, the string value CHARLIE is mapped to token id 0. In uncompressed CUSTOMER, rows 1 and 4 contain the actual column value CHARLIE in column NAME, while in compressed CUSTOMER, rows 1 and 4 contain the token id 0 in lieu of CHARLIE.
  • Each row in a dictionary table holds a token count that is stored in column TOKCNT. For example, row 0 of NAME DICTIONARY contains the value 2 in TOKCNT, specifying that two rows in the base table CUSTOMER store the token id 0 in column NAME and representing that the column contains two occurrences of the actual value CHARLIE.
  • Columns STATE and ZIP represent a compression group of two columns that are together encoded by STATE-ZIP DICTIONARY table. STATE-ZIP DICTIONARY includes columns STATE-ZIP ID, STATE VALUE, ZIP VALUE, and TOKCNT. STATE-ZIP ID holds token ids. Each row in STATE-ZIP DICTIONARY maps a token id to a concatenation of the pair of column values in columns STATE VALUE and ZIP VALUE. For example, row 0 maps STATE-ZIP ID 0 to a concatenation of CA and 94301. In compressed CUSTOMER, row 1 contains token id 0 to represent the combination of column values CA and 94301 in columns STATE and ZIP.
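  • A compression group of several columns can be sketched the same way, with each token mapped to a combination of values. The following Python sketch is illustrative only; the function compress_group, the column name STATE_ZIP_ID and the sample rows are hypothetical.

```python
def compress_group(rows, group_cols, token_col):
    """Replace the columns in group_cols with a single token column.

    Returns (compressed_rows, dictionary), where dictionary[token] holds
    the combination of actual values and its token count.
    """
    token_of = {}    # combination of values -> token id
    dictionary = []  # token id -> {"values": tuple, "tokcnt": int}
    compressed = []
    for row in rows:
        combo = tuple(row[col] for col in group_cols)
        if combo not in token_of:
            token_of[combo] = len(dictionary)
            dictionary.append({"values": combo, "tokcnt": 0})
        token = token_of[combo]
        dictionary[token]["tokcnt"] += 1
        # Keep the other columns and store the token in lieu of the group.
        kept = {k: v for k, v in row.items() if k not in group_cols}
        kept[token_col] = token
        compressed.append(kept)
    return compressed, dictionary


# Hypothetical rows.
customers = [
    {"CUSTID": 1, "STATE": "CA", "ZIP": "94301"},
    {"CUSTID": 2, "STATE": "NY", "ZIP": "10001"},
    {"CUSTID": 3, "STATE": "CA", "ZIP": "94301"},
]
compressed, state_zip = compress_group(customers, ("STATE", "ZIP"), "STATE_ZIP_ID")
```

  • One token per combination means a row stores a single small id for both columns, at the cost of one dictionary entry per distinct pair rather than per distinct value.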
  • In an embodiment, the token ids of a dictionary table are storage location identifiers that identify a storage location of a row within storage structures that store a dictionary table. For example, a row-id of a row identifies a data block storing the row and a storage location within the data block of where the row is stored. Using the row-id as token, a dictionary table row maps its row-id to an actual value. Given a row-id, the row may be accessed very quickly to access the mapped actual value.
  • In an embodiment, CUSTOMER and NAME DICTIONARY and STATE-ZIP DICTIONARY are in-memory tables in an in-memory database server. A row-id is an offset from a base memory address of the in-memory database.
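  • The idea of using a storage location as the token can be sketched with a byte buffer standing in for the in-memory database. This is a simplified illustration under stated assumptions: the fixed 16-byte record layout, the helper names append_entry and decode, and the sample values are all hypothetical.

```python
import struct

# Hypothetical fixed-size dictionary record: 12-byte value, 4-byte token count.
RECORD = struct.Struct("<12sI")


def append_entry(buf, value, tokcnt):
    """Append a dictionary row and return its offset, which serves as the token."""
    offset = len(buf)
    buf.extend(RECORD.pack(value.encode("ascii"), tokcnt))
    return offset


def decode(buf, token):
    """Read the dictionary row directly at the offset given by the token."""
    raw_value, tokcnt = RECORD.unpack_from(buf, token)
    return raw_value.rstrip(b"\x00").decode("ascii"), tokcnt


memory = bytearray()  # stands in for the in-memory database
tokens = [append_entry(memory, name, 2) for name in ("CHARLIE", "BOB", "CHAPLIN")]
```

  • Because the token is itself the location of the dictionary row, decoding needs no search or index lookup; it is a single read at a known offset.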
  • Setting Up and Maintaining Compression Groups
  • According to an embodiment, column groups may be declaratively set up in a database by submitting DDL commands to a database server. For example, a COMPRESS clause in a CREATE TABLE DDL statement or an ALTER TABLE ADD statement can specify the one or more columns of the compression group and one or more parameters for compression.
  • Among the one or more parameters is a maximum number of distinct values to tokenize. This number dictates the needed number of entries in a dictionary and the size (e.g., byte size) of the token id. According to an embodiment, the parameter may specify one of three threshold numbers: 256, 64k, and 4G, which correspond to that number of entries. The size of the id used for each number is respectively 8 bits, 16 bits and 32 bits.
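  • The relationship between the threshold and the token id size can be written out as a small Python sketch. It assumes that 64k means 65,536 and 4G means 4,294,967,296 entries, and that a requested maximum is rounded up to the next threshold; the function name token_width_bits is hypothetical.

```python
# (maximum number of distinct values, token id width in bits)
THRESHOLDS = ((256, 8), (64 * 1024, 16), (4 * 1024 ** 3, 32))


def token_width_bits(max_distinct):
    """Return the token id width needed for max_distinct dictionary entries."""
    if max_distinct < 1:
        raise ValueError("max_distinct must be positive")
    for limit, bits in THRESHOLDS:
        if max_distinct <= limit:
            return bits
    raise ValueError("more distinct values than the largest threshold allows")
```

  • Each threshold is exactly the number of distinct ids representable in the corresponding width (2^8, 2^16 and 2^32), so a column with few distinct values can be stored with one byte per row.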
  • According to an embodiment, when a database server receives and/or processes the DDL statement, the database server configures itself for column domain compression according to the DDL statement. Such configuration entails creating a dictionary for the specified column group in the statement. The configuration also includes modifying database metadata to define the dictionary, to define the column group on the base table, and to associate the dictionary with the column group. When a database server computes queries referencing and/or requiring access to a column group, the database server consults the dictionary to decode the tokens in the column group.
  • Preferably the dictionary is a dictionary table that may be accessed via queries in the way other database tables are accessed. All actual values contained in the column group are mapped to tokens according to the dictionary.
  • A base table with a compression group may be modified by delete, insert, and update operations, in response to, for example, database commands received by a database server. Such operations may entail encoding a column value for the compression group according to a dictionary and storing the resultant token in the compression group, and modifying the dictionary to increment or decrement a token count or to insert a new entry for a new token for a new column value.
  • For example, a row is inserted into CUSTOMER with values CUSTID=7, NAME=CHAPLIN, STATE=CA, and ZIP=94301. In NAME DICTIONARY in row 2, this operation causes the token count in TOKCNT to be incremented to 3 to reflect that the number of rows in CUSTOMER that hold token id 2 in NAME in lieu of the actual value CHAPLIN is three.
  • As another example, an update is performed on row 6 to change NAME to DOE. In NAME DICTIONARY in row 2, this operation causes TOKCNT to be decremented to 1. In addition, a new row is inserted in NAME DICTIONARY with the values 3, DOE, and 1 in columns NAME ID, NAME VALUE, and TOKCNT, respectively.
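  • The two maintenance examples above can be traced with a short Python sketch. It is illustrative only: the class ColumnDictionary, its method names, and the name BOB for token id 1 are hypothetical, and each example starts from the same assumed initial state in which every name occurs twice. What happens to an entry whose count reaches zero is not addressed here.

```python
class ColumnDictionary:
    """Dictionary for one compression group; the list index is the token id."""

    def __init__(self, entries=()):
        self.values = []     # token id -> actual value
        self.tokcnt = []     # token id -> token count
        self._token_of = {}  # actual value -> token id
        for value, count in entries:
            self._token_of[value] = len(self.values)
            self.values.append(value)
            self.tokcnt.append(count)

    def encode(self, value):
        """Return the token for value, adding an entry if it is new,
        and increment the token count."""
        token = self._token_of.get(value)
        if token is None:
            token = len(self.values)
            self._token_of[value] = token
            self.values.append(value)
            self.tokcnt.append(0)
        self.tokcnt[token] += 1
        return token

    def release(self, token):
        """Decrement the token count when a row stops using the token."""
        self.tokcnt[token] -= 1


def initial_state():
    names = ColumnDictionary([("CHARLIE", 2), ("BOB", 2), ("CHAPLIN", 2)])
    name_column = {1: 0, 2: 1, 3: 2, 4: 0, 5: 1, 6: 2}  # CUSTID -> token id
    return names, name_column


# Insert a row with CUSTID=7 and NAME=CHAPLIN.
names, name_column = initial_state()
name_column[7] = names.encode("CHAPLIN")
insert_result = (name_column[7], names.tokcnt[2])

# Update row 6 to change NAME to DOE.
names, name_column = initial_state()
names.release(name_column[6])
name_column[6] = names.encode("DOE")
update_result = (name_column[6], names.values[3], names.tokcnt[3], names.tokcnt[2])
```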
  • Column-Major Format Like Representation
  • As mentioned before, an advantage of column-major format is that many values of columns can be efficiently scanned to perform operations such as aggregations. A reason for this efficiency is that sets of column values are stored contiguously in blocks of persistent storage, while in row-major format the column values are not stored contiguously but are instead interleaved with the values of other columns of the table. In effect, in column-major format, the column values of a column are stored more densely while in row-major the column values of a column are stored more sparsely.
  • Further, in column-major format, values that are stored contiguously are of the same data type. Being the same data type may mean that contiguously stored values may share common properties that can be exploited to enhance compressibility for certain compression techniques.
  • Thus, under column-major format for a column, the data density and compressibility of data can be enhanced. As a result, to read a given set of column values, less data is required to be read from storage.
  • A dictionary provides a form of storage that is akin to column-major format, and for a given compression group, provides advantages of data density and compressibility. These advantages are achieved even though the dictionary itself may be stored in row-major format. First, assuming the dictionary has a significantly smaller number of columns than that of the base table, the density of rows is greater for the dictionary than for the base table, and, inherently, so is the density of column values. Second, the mapping of the column values to token counts is, in effect, a compressed representation that may be usable for certain types of computations. In fact, as described further below, the compressed representation is exploited to perform certain types of queries more efficiently.
  • Finally, while a dictionary is akin to column-major format in many respects, the advantages of row-major format are preserved because the base table is stored in row-major format. For example, entire rows may be retrieved from the base table without having to stitch column values read from many data blocks in storage.
  • Query Rewriting
  • Certain types of queries that require access to a base table can be rewritten to instead access a dictionary table. Queries rewritten in this way may be executed more efficiently to provide the same result.
  • According to an embodiment, the rewrites are performed by a query optimizer of a database server. A query optimizer evaluates a query to optimize execution of a query by rewriting the query and/or generating and selecting an execution plan for executing the query or a rewritten version thereof. The query may be rewritten before generating an execution plan.
  • More than one kind of rewrite may be performed. Generally, the rewritten query is semantically equivalent to the version of the query before rewrite. A query is semantically equivalent to another when the queries request (or declare) the same results; computation of either should return the same result. The fact that the dictionary table is a relational table facilitates query rewrites to access the dictionary table.
  • The following are examples of queries issued against table CUSTOMER that may be rewritten to access a dictionary table.
  • 1. Distinct Operation
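  • SELECT DISTINCT ZIP, STATE FROM CUSTOMER is rewritten to: SELECT “ZIP VALUE”, “STATE VALUE” FROM “STATE-ZIP DICTIONARY”. The DISTINCT operator can be dropped because each combination of values appears in the dictionary only once. Note that if the compression group for STATE-ZIP DICTIONARY included another column, a projection of only some of the dictionary's value columns could contain duplicates, so DISTINCT is retained and the query can be rewritten to: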
    • SELECT DISTINCT “NAME VALUE”, “STATE VALUE” FROM “STATE-ZIP DICTIONARY”.
  • 2. Aggregate Distinct
  • SELECT COUNT (DISTINCT NAME) FROM CUSTOMER is rewritten to:
    • SELECT COUNT (“NAME VALUE”) FROM “NAME DICTIONARY”.
  • 3. Complex Expression Evaluator
  • SELECT * FROM CUSTOMER WHERE NAME LIKE ‘% ABC %’ is rewritten to:
    • SELECT * FROM CUSTOMER WHERE TOKEN_ID(NAME) IN (SELECT ID FROM “NAME DICTIONARY” WHERE “NAME VALUE” LIKE ‘% ABC %’).
  • Note the expression token_id(name) returns the token id contained in NAME. There are other types of query rewrites that may be performed. Those are illustrated using different tables than those illustrated in FIG. 1 as described below.
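  • The first three rewrites can be checked against a small in-memory SQLite database. This Python sketch is illustrative only and makes several assumptions: the rows are hypothetical, identifiers use underscores instead of the quoted names above, a separate table customer_compressed stands in for the compressed CUSTOMER, and the token stored in its name column is compared directly in place of the token_id(name) expression.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Uncompressed base table (hypothetical rows).
CREATE TABLE customer (custid INTEGER PRIMARY KEY, name TEXT, state TEXT, zip TEXT);
INSERT INTO customer VALUES
  (1, 'CHARLIE', 'CA', '94301'), (2, 'BOB', 'NY', '10001'),
  (3, 'CHAPLIN', 'CA', '94301'), (4, 'CHARLIE', 'NY', '10001'),
  (5, 'BOB', 'CA', '94065'),     (6, 'CHAPLIN', 'CA', '94065');

-- Dictionary tables: token id, actual value(s), token count.
CREATE TABLE name_dictionary (name_id INTEGER PRIMARY KEY, name_value TEXT, tokcnt INTEGER);
INSERT INTO name_dictionary VALUES (0, 'CHARLIE', 2), (1, 'BOB', 2), (2, 'CHAPLIN', 2);
CREATE TABLE state_zip_dictionary
  (state_zip_id INTEGER PRIMARY KEY, state_value TEXT, zip_value TEXT, tokcnt INTEGER);
INSERT INTO state_zip_dictionary VALUES
  (0, 'CA', '94301', 2), (1, 'NY', '10001', 2), (2, 'CA', '94065', 2);

-- Compressed base table: tokens stored in lieu of the actual values.
CREATE TABLE customer_compressed (custid INTEGER PRIMARY KEY, name INTEGER, state_zip INTEGER);
INSERT INTO customer_compressed VALUES
  (1, 0, 0), (2, 1, 1), (3, 2, 0), (4, 0, 1), (5, 1, 2), (6, 2, 2);
""")


def rows(sql):
    """Run a query and return its rows in a deterministic order."""
    return sorted(con.execute(sql).fetchall())


# 1. Distinct operation: the dictionary already holds each combination once.
distinct_base = rows("SELECT DISTINCT zip, state FROM customer")
distinct_dict = rows("SELECT zip_value, state_value FROM state_zip_dictionary")

# 2. Aggregate distinct: counting dictionary rows counts distinct values.
count_base = con.execute("SELECT COUNT(DISTINCT name) FROM customer").fetchone()[0]
count_dict = con.execute("SELECT COUNT(name_value) FROM name_dictionary").fetchone()[0]

# 3. Complex expression: evaluate LIKE once per dictionary entry, then
#    filter the compressed table by token id.
like_base = rows("SELECT custid FROM customer WHERE name LIKE '%CHA%'")
like_dict = rows(
    "SELECT custid FROM customer_compressed WHERE name IN "
    "(SELECT name_id FROM name_dictionary WHERE name_value LIKE '%CHA%')"
)
```

  • In the third case the pattern match runs over three dictionary rows rather than six base rows; the saving grows with the ratio of rows to distinct values. The sketch also assumes no dictionary entry has a token count of zero, since such an entry would appear in the dictionary results but not in the base table.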
  • 4. Aggregation Query without Groupby:
  • A compression group of columns x1 and y1 is defined for a table t1. Table t1_compr is the dictionary table, with columns x1 value and y1 value holding the actual values of x1 and y1, respectively, and column tokcnt storing the token counts.
  • SELECT SUM (X1) FROM T1 is rewritten to:
    • SELECT SUM (“X1 VALUE” * TOKCNT) FROM T1_COMPR.
  • 5. Aggregation with Groupby:
  • Using the same example table T1:
  • SELECT SUM (X1) FROM T1 GROUP BY Y1 is rewritten to:
    • SELECT SUM (“X1 VALUE” * TOKCNT) FROM T1_COMPR GROUP BY “Y1 VALUE”;
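  • Both aggregation rewrites rely on the identity that summing a value once per occurrence equals summing the value multiplied by its token count. The following Python sketch checks this with SQLite; the rows are hypothetical, identifiers use underscores, the dictionary is derived from the uncompressed table purely for the comparison, and the grouping column is added to the select list so the groups can be matched up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t1 (x1 INTEGER, y1 TEXT);
INSERT INTO t1 VALUES (10, 'a'), (10, 'a'), (20, 'a'), (10, 'b'), (30, 'b'), (30, 'b');

-- One dictionary row per distinct (x1, y1) combination, with its token count.
CREATE TABLE t1_compr AS
  SELECT x1 AS x1_value, y1 AS y1_value, COUNT(*) AS tokcnt
  FROM t1 GROUP BY x1, y1;
""")

# 4. Aggregation without GROUP BY.
sum_base = con.execute("SELECT SUM(x1) FROM t1").fetchone()[0]
sum_dict = con.execute("SELECT SUM(x1_value * tokcnt) FROM t1_compr").fetchone()[0]

# 5. Aggregation with GROUP BY on another column of the same compression group.
grp_base = sorted(con.execute("SELECT y1, SUM(x1) FROM t1 GROUP BY y1").fetchall())
grp_dict = sorted(con.execute(
    "SELECT y1_value, SUM(x1_value * tokcnt) FROM t1_compr GROUP BY y1_value"
).fetchall())

dictionary_rows = con.execute("SELECT COUNT(*) FROM t1_compr").fetchone()[0]
```

  • Here six base rows collapse to four dictionary rows; the grouped rewrite works because y1 belongs to the same compression group, so each dictionary row falls entirely within one group.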
  • 6. Semi-Join
  • Compression group x2 is defined for a table t2. t2_compr is a dictionary table with column x2 value holding the actual values for x2, and column tokcnt storing the token counts.
  • SELECT * FROM T1 WHERE X1 IN (SELECT X2 FROM T2) is rewritten to:
    • SELECT * FROM T1, T2_COMPR WHERE X1=“X2 VALUE”;
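  • The semi-join rewrite works because the dictionary holds each value of x2 exactly once, so joining against it cannot duplicate rows of t1. A Python sketch with SQLite, using hypothetical rows and underscore identifiers, compares the original query, the rewritten query, and a plain join against t2. The sketch selects t1.* in the rewritten query so that only columns of t1 are compared.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t1 (x1 INTEGER, note TEXT);
INSERT INTO t1 VALUES (1, 'p'), (2, 'q'), (3, 'r'), (2, 's');

CREATE TABLE t2 (x2 INTEGER);
INSERT INTO t2 VALUES (2), (2), (3), (5);

-- Dictionary for the compression group x2: one row per distinct value.
CREATE TABLE t2_compr AS
  SELECT x2 AS x2_value, COUNT(*) AS tokcnt FROM t2 GROUP BY x2;
""")

semi_join = sorted(con.execute(
    "SELECT * FROM t1 WHERE x1 IN (SELECT x2 FROM t2)").fetchall())
rewritten = sorted(con.execute(
    "SELECT t1.* FROM t1, t2_compr WHERE x1 = x2_value").fetchall())

# For contrast: joining the base table t2 directly repeats rows of t1
# once per matching row of t2, which is not a semi-join.
plain_join = con.execute("SELECT t1.* FROM t1, t2 WHERE x1 = x2").fetchall()
```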
  • Column Domain Compression Versus Other Dictionary Compression
  • For purposes of disambiguating column domain dictionary compression from other possible forms of dictionary compression in database systems, a comparison is useful. One such other form is data block domain dictionary compression.
  • A data block is a unit of persistent storage used by a database server to store database records (e.g., to store rows of a table or column values of a column). When records are read from persistent storage, a data block containing the record is copied into a data block buffer in volatile memory of a database server. A data block usually contains multiple rows, as well as control and formatting information (e.g., offsets to sequences of bytes representing rows or other data structures, or a list of transactions affecting a row).
  • A data block is referred to as being atomic because, at least in part, a data block is the smallest unit of database data a database server may request from a persistent storage device. For example, when a database server seeks a row that is stored in a data block, the database server may only read the row from persistent storage by reading in the entire data block.
  • In data block domain dictionary compression, a dictionary is stored in the data block. The domain of the dictionary is the data block and it may be used to tokenize column values of columns of rows in the data block. In other words, the domain of the dictionary in a data block is the data block; the dictionary does not pertain to other data blocks.
  • In column domain dictionary compression, the domain of a dictionary is the entire column. In database systems where database data is stored in data blocks, the dictionary maps column values of a column (compression group) that are stored in multiple data blocks. Further, the dictionary is stored in a separate data structure (e.g., a database object such as a dictionary table), while in data block domain dictionary compression the dictionaries are not stored separately but in the same data structure at several levels, i.e., in the same table and even in the same atomic level of storage, the data block.
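  • The difference in dictionary domain can be made concrete with a short Python sketch. It is illustrative only: the column contents, the block size of four rows, and the helper build_dictionary are hypothetical.

```python
def build_dictionary(values):
    """Map each distinct value to a token id in order of first appearance."""
    token_of = {}
    for value in values:
        token_of.setdefault(value, len(token_of))
    return token_of


# Hypothetical column contents, stored four rows per data block.
column = ["CA", "NY", "CA", "TX", "NY", "CA", "TX", "CA"]
BLOCK_ROWS = 4
blocks = [column[i:i + BLOCK_ROWS] for i in range(0, len(column), BLOCK_ROWS)]

# Data block domain: one dictionary per block, valid only within that block.
block_dicts = [build_dictionary(block) for block in blocks]

# Column domain: a single dictionary covering every block of the column.
column_dict = build_dictionary(column)

block_entries = sum(len(d) for d in block_dicts)
column_entries = len(column_dict)
```

  • With per-block dictionaries the same value is stored once per block and the same token can mean different values in different blocks (token 0 is CA in the first block and NY in the second), so a per-block dictionary cannot answer a query about the whole column by itself. The single column-domain dictionary stores each value once and can.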
  • Hardware Overview
  • According to one embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.
  • For example, FIG. 2 is a block diagram that illustrates a computer system 200 upon which an embodiment of the invention may be implemented. Computer system 200 includes a bus 202 or other communication mechanism for communicating information, and a hardware processor 204 coupled with bus 202 for processing information. Hardware processor 204 may be, for example, a general purpose microprocessor.
  • Computer system 200 also includes a main memory 206, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 202 for storing information and instructions to be executed by processor 204. Main memory 206 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 204. Such instructions, when stored in non-transitory storage media accessible to processor 204, render computer system 200 into a special-purpose machine that is customized to perform the operations specified in the instructions.
  • Computer system 200 further includes a read only memory (ROM) 208 or other static storage device coupled to bus 202 for storing static information and instructions for processor 204. A storage device 210, such as a magnetic disk or optical disk, is provided and coupled to bus 202 for storing information and instructions.
  • Computer system 200 may be coupled via bus 202 to a display 212, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 214, including alphanumeric and other keys, is coupled to bus 202 for communicating information and command selections to processor 204. Another type of user input device is cursor control 216, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 204 and for controlling cursor movement on display 212. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
  • Computer system 200 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 200 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 200 in response to processor 204 executing one or more sequences of one or more instructions contained in main memory 206. Such instructions may be read into main memory 206 from another storage medium, such as storage device 210. Execution of the sequences of instructions contained in main memory 206 causes processor 204 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
  • The term “storage media” as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 210. Volatile media includes dynamic memory, such as main memory 206. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.
  • Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 202. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
  • Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 204 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 200 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 202. Bus 202 carries the data to main memory 206, from which processor 204 retrieves and executes the instructions. The instructions received by main memory 206 may optionally be stored on storage device 210 either before or after execution by processor 204.
  • Computer system 200 also includes a communication interface 218 coupled to bus 202. Communication interface 218 provides a two-way data communication coupling to a network link 220 that is connected to a local network 222. For example, communication interface 218 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 218 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 218 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • Network link 220 typically provides data communication through one or more networks to other data devices. For example, network link 220 may provide a connection through local network 222 to a host computer 224 or to data equipment operated by an Internet Service Provider (ISP) 226. ISP 226 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 228. Local network 222 and Internet 228 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 220 and through communication interface 218, which carry the digital data to and from computer system 200, are example forms of transmission media.
  • Computer system 200 can send messages and receive data, including program code, through the network(s), network link 220 and communication interface 218. In the Internet example, a server 230 might transmit a requested code for an application program through Internet 228, ISP 226, local network 222 and communication interface 218.
  • The received code may be executed by processor 204 as it is received, and/or stored in storage device 210, or other non-volatile storage for later execution.
  • In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The sole and exclusive indicator of the scope of the invention, and what is intended by the applicants to be the scope of the invention, is the literal and equivalent scope of the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction.

Claims (18)

1. A method comprising steps of:
storing a dictionary that maps tokens to column values contained in a compression group that includes a column of a base table, said dictionary mapping each token of said tokens to said column value of said column, said dictionary being stored and maintained as a data structure separate from said base table;
storing in said column said tokens in lieu of said column values; and
computing a query that conforms to a database language, wherein computing a query comprises decoding, according to said dictionary, a set of tokens to generate corresponding column values.
2. The method of claim 1, wherein said dictionary is a dictionary table.
3. The method of claim 2, the steps further including rewriting a second query, that references said base table but not said dictionary table, into a transformed query that references the dictionary table.
4. The method of claim 2, the steps further including receiving a DDL statement that defines said compression group on said base table.
5. A method comprising steps of:
storing a dictionary that maps tokens to column values contained in a compression group that includes multiple columns of a base table, said dictionary mapping each token of said tokens to a combination constituting a column value from each column of said multiple columns, said dictionary being stored and maintained as a data structure separate from said base table;
storing said tokens in said table in lieu of the combinations; and
computing a query that conforms to a database language, wherein computing a query comprises decoding, according to said dictionary, a set of tokens to generate corresponding column values for each column of said compression group.
6. The method of claim 5, wherein:
said dictionary is a dictionary table; and
the steps further include rewriting a second query, that references said base table but not said dictionary table, into a transformed query that references the dictionary table.
7. The method of claim 6, wherein said second query references one of said multiple columns but does not reference another of said multiple columns.
8. A method comprising steps of:
storing a dictionary table that maps tokens to column values contained in a compression group that includes a column of a base table, said dictionary mapping each token of said tokens to said column value of said column, said dictionary being stored and maintained as a data structure separate from said base table;
storing in said column said tokens in lieu of said column values; and
rewriting a query, that references said base table but not said dictionary table, into a transformed query that references the dictionary table.
9. The method of claim 8, the steps further including receiving a DDL statement that defines said compression group on said base table.
10. A non-transitory computer-readable storage medium storing one or more sequences of instructions, said one or more sequences of instructions, which, when executed by one or more processors, causes the one or more processors to perform steps of:
storing a dictionary that maps tokens to column values contained in a compression group that includes a column of a base table, said dictionary mapping each token of said tokens to said column value of said column, said dictionary being stored and maintained as a data structure separate from said base table;
storing in said column said tokens in lieu of said column values; and
computing a query that conforms to a database language, wherein computing a query comprises decoding, according to said dictionary, a set of tokens to generate corresponding column values.
11. The non-transitory computer-readable storage medium of claim 10, wherein said dictionary is a dictionary table.
12. The non-transitory computer-readable storage medium of claim 11, the steps further including rewriting a second query, that references said base table but not said dictionary table, into a transformed query that references the dictionary table.
13. The non-transitory computer-readable storage medium of claim 11, the steps further including receiving a DDL statement that defines said compression group on said base table.
14. A non-transitory computer-readable storage medium storing one or more sequences of instructions, said one or more sequences of instructions, which, when executed by one or more processors, causes the one or more processors to perform steps of:
storing a dictionary that maps tokens to column values contained in a compression group that includes multiple columns of a base table, said dictionary mapping each token of said tokens to a combination constituting a column value from each column of said multiple columns, said dictionary being stored and maintained as a data structure separate from said base table;
storing said tokens in said table in lieu of the combinations; and
computing a query that conforms to a database language, wherein computing a query comprises decoding, according to said dictionary, a set of tokens to generate corresponding column values for each column of said compression group.
15. The non-transitory computer-readable storage medium of claim 14, wherein:
said dictionary is a dictionary table; and
the steps further include rewriting a second query, that references said base table but not said dictionary table, into a transformed query that references the dictionary table.
16. The non-transitory computer-readable storage medium of claim 15, wherein said second query references one of said multiple columns but does not reference another of said multiple columns.
17. A non-transitory computer-readable storage medium storing one or more sequences of instructions, said one or more sequences of instructions, which, when executed by one or more processors, causes the one or more processors to perform steps of:
storing a dictionary table that maps tokens to column values contained in a compression group that includes a column of a base table, said dictionary mapping each token of said tokens to said column value of said column, said dictionary being stored and maintained as a data structure separate from said base table;
storing in said column said tokens in lieu of said column values; and
rewriting a query, that references said base table but not said dictionary table, into a transformed query that references the dictionary table.
18. The non-transitory computer-readable storage medium of claim 17, the steps further including receiving a DDL statement that defines said compression group on said base table.
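
The mechanism recited in claims 10 through 18 can be illustrated with a minimal sketch. It is not the patented implementation: it assumes SQLite through Python's standard `sqlite3` module, and the table names `base` and `dict_city_state`, the columns `city` and `state`, and the helpers `build_compressed_table` and `rewrite_query` are hypothetical names chosen for illustration. A dictionary table maps each token to a combination of values from a two-column compression group, the base table stores the token in lieu of the combination, and a query written against the base table alone is rewritten into a join that decodes tokens through the dictionary table.

```python
import sqlite3

# Hypothetical compression group: the (city, state) columns of a table named "base".
GROUP_COLUMNS = ("city", "state")


def build_compressed_table(rows):
    """Load (id, city, state) rows, storing one token per row instead of the pair."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dict_city_state (
            token INTEGER PRIMARY KEY,
            city  TEXT NOT NULL,
            state TEXT NOT NULL,
            UNIQUE (city, state)
        );
        CREATE TABLE base (
            id    INTEGER PRIMARY KEY,
            token INTEGER NOT NULL REFERENCES dict_city_state (token)
        );
        """
    )
    for row_id, city, state in rows:
        # Add the combination to the dictionary only if it is not already present.
        conn.execute(
            "INSERT OR IGNORE INTO dict_city_state (city, state) VALUES (?, ?)",
            (city, state),
        )
        (token,) = conn.execute(
            "SELECT token FROM dict_city_state WHERE city = ? AND state = ?",
            (city, state),
        ).fetchone()
        # The base table holds the token in lieu of the column values.
        conn.execute("INSERT INTO base (id, token) VALUES (?, ?)", (row_id, token))
    return conn


def rewrite_query(select_cols, where_col):
    """Rewrite 'SELECT <cols> FROM base WHERE <where_col> = ?' into a dictionary join."""
    def qualify(col):
        # Columns in the compression group are read from the dictionary table.
        return f"d.{col}" if col in GROUP_COLUMNS else f"b.{col}"

    cols = ", ".join(qualify(c) for c in select_cols)
    return (
        f"SELECT {cols} FROM base AS b "
        "JOIN dict_city_state AS d ON b.token = d.token "
        f"WHERE {qualify(where_col)} = ? ORDER BY b.id"
    )


ROWS = [(1, "San Jose", "CA"), (2, "Austin", "TX"), (3, "San Jose", "CA"), (4, "Fresno", "CA")]
conn = build_compressed_table(ROWS)
sql = rewrite_query(["id", "city"], "state")
decoded = conn.execute(sql, ("CA",)).fetchall()
dictionary_size = conn.execute("SELECT COUNT(*) FROM dict_city_state").fetchone()[0]
print(decoded)          # [(1, 'San Jose'), (3, 'San Jose'), (4, 'Fresno')]
print(dictionary_size)  # 3
```

In this sketch four base rows share three dictionary entries, so the repeated combination is stored once. Calling `rewrite_query(["id"], "state")` gives a query that references one column of the group but not the other, which corresponds to the situation described in claim 16.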
US13/224,327 2011-09-02 2011-09-02 Column domain dictionary compression Active US10756759B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US13/224,327 US10756759B2 (en) 2011-09-02 2011-09-02 Column domain dictionary compression
PCT/US2012/052547 WO2013033030A1 (en) 2011-09-02 2012-08-27 Column domain dictionary compression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/224,327 US10756759B2 (en) 2011-09-02 2011-09-02 Column domain dictionary compression

Publications (2)

Publication Number Publication Date
US20130060780A1 US20130060780A1 (en) 2013-03-07
US10756759B2 US10756759B2 (en) 2020-08-25

Family

ID=46852365

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/224,327 Active US10756759B2 (en) 2011-09-02 2011-09-02 Column domain dictionary compression

Country Status (2)

Country Link
US (1) US10756759B2 (en)
WO (1) WO2013033030A1 (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070271305A1 (en) * 2006-05-18 2007-11-22 Sivansankaran Chandrasekar Efficient piece-wise updates of binary encoded XML data
US8812523B2 (en) 2012-09-28 2014-08-19 Oracle International Corporation Predicate result cache
US20150058858A1 (en) * 2013-08-21 2015-02-26 Hasso-Plattner-Institut fur Softwaresystemtechnik GmbH Dynamic task prioritization for in-memory databases
US20150074151A1 (en) * 2013-09-11 2015-03-12 Microsoft Corporation Processing datasets with a dbms engine
US20150081453A1 (en) * 2012-04-24 2015-03-19 Jeffrey Taihana TUATINI High-throughput message generation
US20150113026A1 (en) * 2013-10-17 2015-04-23 Muhammed Sharique Rollover strategies in a n-bit dictionary compressed column store
US20150142818A1 (en) * 2013-11-21 2015-05-21 Colin FLORENDO Paged column dictionary
US20150178305A1 (en) * 2013-12-23 2015-06-25 Ingo Mueller Adaptive dictionary compression/decompression for column-store databases
US20160147776A1 (en) * 2014-11-25 2016-05-26 Colin FLORENDO Altering data type of a column in a database
US20160378783A1 (en) * 2015-06-29 2016-12-29 International Business Machines Corporation Query processing using a dimension table implemented as decompression dictionaries
US9648011B1 (en) * 2012-02-10 2017-05-09 Protegrity Corporation Tokenization-driven password generation
US9684639B2 (en) 2010-01-18 2017-06-20 Oracle International Corporation Efficient validation of binary XML data
US9697242B2 (en) 2014-01-30 2017-07-04 International Business Machines Corporation Buffering inserts into a column store database
US9760593B2 (en) 2014-09-30 2017-09-12 International Business Machines Corporation Data dictionary with a reduced need for rebuilding
US9928267B2 (en) 2014-06-13 2018-03-27 International Business Machines Corporation Hierarchical database compression and query processing
US9977802B2 (en) 2013-11-21 2018-05-22 Sap Se Large string access and storage
US10241979B2 (en) * 2015-07-21 2019-03-26 Oracle International Corporation Accelerated detection of matching patterns
US10303655B1 (en) * 2015-12-21 2019-05-28 EMC IP Holding Company LLC Storage array compression based on the structure of the data being compressed
US20200097571A1 (en) * 2018-09-25 2020-03-26 Salesforce.Com, Inc. Column data compression schemes for scaling writes and reads on database systems
US20200110820A1 (en) * 2018-10-09 2020-04-09 Oracle International Corporation Relational method for transforming unsorted sparse dictionary encodings into unsorted-dense or sorted-dense dictionary encodings
US10810198B2 (en) * 2017-09-26 2020-10-20 Oracle International Corporation Group determination based on multi-table dictionary codes
US11023430B2 (en) 2017-11-21 2021-06-01 Oracle International Corporation Sparse dictionary tree
US11023469B2 (en) 2017-11-29 2021-06-01 Teradata Us, Inc. Value list compression (VLC) aware qualification
US11126611B2 (en) 2018-02-15 2021-09-21 Oracle International Corporation Code dictionary generation based on non-blocking operations
US11139827B2 (en) 2019-03-15 2021-10-05 Samsung Electronics Co., Ltd. Conditional transcoding for encoded data
US11169995B2 (en) 2017-11-21 2021-11-09 Oracle International Corporation Relational dictionaries
US11461328B2 (en) 2020-09-21 2022-10-04 Oracle International Corporation Method for using a sematic model to transform SQL against a relational table to enable performance improvements
US11520790B2 (en) * 2020-09-17 2022-12-06 International Business Machines Corporation Providing character encoding
US11537594B2 (en) 2021-02-05 2022-12-27 Oracle International Corporation Approximate estimation of number of distinct keys in a multiset using a sample
US11580123B2 (en) * 2020-11-13 2023-02-14 Google Llc Columnar techniques for big metadata management
US11860830B2 (en) * 2013-09-21 2024-01-02 Oracle International Corporation Combined row and columnar storage for in-memory databases for OLTP and analytics workloads

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462334A (en) * 2014-12-03 2015-03-25 天津南大通用数据技术股份有限公司 Data compression method and device for packing database
US10340945B2 (en) * 2017-07-24 2019-07-02 iDensify LLC Memory compression method and apparatus
US20240086392A1 (en) * 2022-09-14 2024-03-14 Sap Se Consistency checks for compressed data

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6879986B1 (en) * 2001-10-19 2005-04-12 Neon Enterprise Software, Inc. Space management of an IMS database
US20070256061A1 (en) * 2006-04-26 2007-11-01 9Rays.Net, Inc. System and method for obfuscation of reverse compiled computer code
US20080294676A1 (en) * 2007-05-21 2008-11-27 Sap Ag Compression of tables based on occurrence of values
US20090141629A1 (en) * 2007-09-28 2009-06-04 Alcatel Lucent Circuit emulation service method and telecommunication system for implementing the method
US20100205198A1 (en) * 2009-02-06 2010-08-12 Gilad Mishne Search query disambiguation
US20120109910A1 (en) * 2008-07-31 2012-05-03 Microsoft Corporation Efficient column based data encoding for large-scale data storage
US20130018853A1 (en) * 2011-07-11 2013-01-17 Dell Products L.P. Accelerated deduplication

Family Cites Families (88)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6061697A (en) 1996-09-11 2000-05-09 Fujitsu Limited SGML type document managing apparatus and managing method
US6414610B1 (en) 1997-02-24 2002-07-02 Rodney J Smith Data compression
US6018747A (en) 1997-11-26 2000-01-25 International Business Machines Corporation Method for generating and reconstructing in-place delta files
US6674924B2 (en) 1997-12-30 2004-01-06 Steven F. Wright Apparatus and method for dynamically routing documents using dynamic control documents and data streams
US6671853B1 (en) 1999-07-15 2003-12-30 International Business Machines Corporation Method and system for selectively streaming markup language documents
US6721727B2 (en) 1999-12-02 2004-04-13 International Business Machines Corporation XML documents stored as column data
US6598055B1 (en) 1999-12-23 2003-07-22 International Business Machines Corporation Generic code for manipulating data of a structured object
US7031956B1 (en) 2000-02-16 2006-04-18 Verizon Laboratories Inc. System and method for synchronizing and/or updating an existing relational database with supplemental XML data
US6675355B1 (en) 2000-03-16 2004-01-06 Autodesk, Inc. Redline extensible markup language (XML) schema
US6883137B1 (en) 2000-04-17 2005-04-19 International Business Machines Corporation System and method for schema-driven compression of extensible mark-up language (XML) documents
US6898761B2 (en) 2000-05-01 2005-05-24 Raytheon Company Extensible markup language genetic algorithm
US6941510B1 (en) 2000-06-06 2005-09-06 Groove Networks, Inc. Method and apparatus for efficient management of XML documents
EP1307828B1 (en) 2000-08-02 2004-06-09 Philipp Kutter Xml-robot
WO2002057926A1 (en) 2001-01-19 2002-07-25 Orderware Solutions Limited Data transfer and/or transformation system and method
JP3894280B2 (en) 2001-02-02 2007-03-14 インターナショナル・ビジネス・マシーンズ・コーポレーション Encoding method of XML data, decoding method of encoded XML data, encoding system of XML data, decoding system of encoded XML data, program, and recording medium
CN100337407C (en) 2001-02-05 2007-09-12 捷通公司 Method and system for compressing structured descriptions of documents
US6804677B2 (en) 2001-02-26 2004-10-12 Ori Software Development Ltd. Encoding semi-structured data for efficient search and browsing
US7500017B2 (en) 2001-04-19 2009-03-03 Microsoft Corporation Method and system for providing an XML binary format
JP3832807B2 (en) 2001-06-28 2006-10-11 インターナショナル・ビジネス・マシーンズ・コーポレーション Data processing method and encoder, decoder and XML parser using the method
US6865599B2 (en) 2001-09-04 2005-03-08 Chenglin Zhang Browser-to-browser, dom-based, peer-to-peer communication with delta synchronization
US20030069881A1 (en) 2001-10-03 2003-04-10 Nokia Corporation Apparatus and method for dynamic partitioning of structured documents
US20030093626A1 (en) 2001-11-14 2003-05-15 Fister James D.M. Memory caching scheme in a distributed-memory network
US7143343B2 (en) 2002-04-11 2006-11-28 International Business Machines Corporation Dynamic creation of an application's XML document type definition (DTD)
US7454760B2 (en) 2002-04-22 2008-11-18 Rosebud Lms, Inc. Method and software for enabling n-way collaborative work over a network of computers
US7434163B2 (en) 2002-05-31 2008-10-07 Sap Aktiengesellschaft Document structures for delta handling in server pages
AU2003276815A1 (en) 2002-06-13 2003-12-31 Cerisent Corporation Xml-db transactional update system
US6996571B2 (en) 2002-06-28 2006-02-07 Microsoft Corporation XML storage solution and data interchange file format structure
DE60333238D1 (en) 2002-06-28 2010-08-12 Nippon Telegraph & Telephone Extraction of information from structured documents
US7340673B2 (en) 2002-08-29 2008-03-04 Vistaprint Technologies Limited System and method for browser document editing
US6965897B1 (en) 2002-10-25 2005-11-15 At&T Corp. Data compression method and apparatus
US7080094B2 (en) 2002-10-29 2006-07-18 Lockheed Martin Corporation Hardware accelerated validating parser
KR100636909B1 (en) 2002-11-14 2006-10-19 엘지전자 주식회사 Electronic document versioning method and updated information supply method using version number based on XML
US7090318B2 (en) 2002-11-26 2006-08-15 Tci Supply, Inc. System for a sliding door with a camber
US6976340B2 (en) 2002-12-05 2005-12-20 Venturedyne Ltd. Universal access port
US7350199B2 (en) 2003-01-17 2008-03-25 Microsoft Corporation Converting XML code to binary format
US20040148278A1 (en) 2003-01-22 2004-07-29 Amir Milo System and method for providing content warehouse
US6836778B2 (en) 2003-05-01 2004-12-28 Oracle International Corporation Techniques for changing XML content in a relational database
US7308458B2 (en) 2003-06-11 2007-12-11 Wtviii, Inc. System for normalizing and archiving schemas
US7519577B2 (en) 2003-06-23 2009-04-14 Microsoft Corporation Query intermediate language method and system
US7219330B2 (en) 2003-06-26 2007-05-15 Microsoft Corporation Extensible metadata
US7113942B2 (en) 2003-06-27 2006-09-26 Microsoft Corporation Scalable storage and processing of hierarchical documents
US7302489B2 (en) 2003-08-01 2007-11-27 Sap Ag Systems and methods for synchronizing data objects among participating systems via asynchronous exchange of messages
US7349913B2 (en) 2003-08-21 2008-03-25 Microsoft Corporation Storage platform for organizing, searching, and sharing data
US8150818B2 (en) 2003-08-25 2012-04-03 International Business Machines Corporation Method and system for storing structured documents in their native format in a database
US7571391B2 (en) 2003-10-17 2009-08-04 Sap Ag Selective rendering of user interface of computer program
US7634498B2 (en) 2003-10-24 2009-12-15 Microsoft Corporation Indexing XML datatype content system and method
US7315852B2 (en) 2003-10-31 2008-01-01 International Business Machines Corporation XPath containment for index and materialized view matching
US7165063B2 (en) 2003-11-19 2007-01-16 International Business Machines Corporation Context quantifier transformation in XML query rewrite
US7991786B2 (en) 2003-11-25 2011-08-02 International Business Machines Corporation Using intra-document indices to improve XQuery processing over XML streams
US8886614B2 (en) 2004-02-03 2014-11-11 Teradata Us, Inc. Executing a join plan using data compression
US7318063B2 (en) 2004-02-19 2008-01-08 Microsoft Corporation Managing XML documents containing hierarchical database information
US7269606B2 (en) 2004-02-26 2007-09-11 Sap Ag Automatic reduction of table memory footprint using column cardinality information
US7366735B2 (en) 2004-04-09 2008-04-29 Oracle International Corporation Efficient extraction of XML content stored in a LOB
US7440954B2 (en) 2004-04-09 2008-10-21 Oracle International Corporation Index maintenance for operations involving indexed XML data
US7493305B2 (en) 2004-04-09 2009-02-17 Oracle International Corporation Efficient queribility and manageability of an XML index with path subsetting
US7877356B1 (en) 2004-05-24 2011-01-25 Apple Inc. Retaining intermediate states of shared groups of objects and notification of changes to shared groups of objects
US7769904B2 (en) 2004-06-09 2010-08-03 L-3 Communications Integrated Systems L.P. Extensible binary mark-up language for efficient XML-based data communications and related systems and methods
US7260580B2 (en) 2004-06-14 2007-08-21 Sap Ag Binary XML
US7516121B2 (en) 2004-06-23 2009-04-07 Oracle International Corporation Efficient evaluation of queries using translation
US7627589B2 (en) 2004-08-10 2009-12-01 Palo Alto Research Center Incorporated High performance XML storage retrieval system and method
US7310648B2 (en) 2004-09-15 2007-12-18 Hewlett-Packard Development Company, L.P. System for compression of physiological signals
US7464082B2 (en) 2004-11-29 2008-12-09 International Business Machines Corporation Methods for de-serializing data objects on demand
WO2010049742A1 (en) 2004-12-01 2010-05-06 Computer Associates Think, Inc. Managing elements residing on legacy systems
US20060136508A1 (en) 2004-12-16 2006-06-22 Sam Idicula Techniques for providing locks for file operations in a database management system
US7586839B2 (en) 2004-12-16 2009-09-08 Lenovo Singapore Pte. Ltd. Peer to peer backup and recovery
US7945590B2 (en) 2005-01-06 2011-05-17 Microsoft Corporation Programmability for binding data
US20060167912A1 (en) 2005-01-25 2006-07-27 Microsoft Corporation Method and system for use of subsets in serialized documents
US7441185B2 (en) 2005-01-25 2008-10-21 Microsoft Corporation Method and system for binary serialization of documents
US8346737B2 (en) 2005-03-21 2013-01-01 Oracle International Corporation Encoding of hierarchically organized data for efficient storage and processing
US7730399B2 (en) 2005-04-22 2010-06-01 Microsoft Corporation Journal file reader
US8103880B2 (en) 2005-06-03 2012-01-24 Adobe Systems Incorporated Method for communication between computing devices using coded values
US7739586B2 (en) 2005-08-19 2010-06-15 Microsoft Corporation Encoding of markup language data
US7447865B2 (en) 2005-09-13 2008-11-04 Yahoo ! Inc. System and method for compression in a distributed column chunk data store
US20070067461A1 (en) 2005-09-21 2007-03-22 Savchenko Vladimir S Token streaming process for processing web services message body information
US20070079234A1 (en) 2005-09-30 2007-04-05 Microsoft Corporation Modeling XML from binary data
US8073841B2 (en) 2005-10-07 2011-12-06 Oracle International Corporation Optimizing correlated XML extracts
US7774321B2 (en) 2005-11-07 2010-08-10 Microsoft Corporation Partial XML validation
US9460064B2 (en) 2006-05-18 2016-10-04 Oracle International Corporation Efficient piece-wise updates of binary encoded XML data
US20080077606A1 (en) 2006-09-26 2008-03-27 Motorola, Inc. Method and apparatus for facilitating efficient processing of extensible markup language documents
US7844632B2 (en) 2006-10-18 2010-11-30 Oracle International Corporation Scalable DOM implementation
US7739251B2 (en) 2006-10-20 2010-06-15 Oracle International Corporation Incremental maintenance of an XML index on binary XML data
US7627566B2 (en) 2006-10-20 2009-12-01 Oracle International Corporation Encoding insignificant whitespace of XML data
US8010889B2 (en) 2006-10-20 2011-08-30 Oracle International Corporation Techniques for efficient loading of binary XML data
US8965864B2 (en) 2006-10-31 2015-02-24 Sap Se Method and system for efficient execution and rendering of client/server interactive applications
US7836037B2 (en) 2007-10-04 2010-11-16 Sap Ag Selection of rows and values from indexes with updates
US7831540B2 (en) 2007-10-25 2010-11-09 Oracle International Corporation Efficient update of binary XML content in a database system
US9842090B2 (en) 2007-12-05 2017-12-12 Oracle International Corporation Efficient streaming evaluation of XPaths on binary-encoded XML schema-based documents
US7840554B2 (en) 2008-03-27 2010-11-23 International Business Machines Corporation Method for evaluating a conjunction of equity and range predicates using a constant number of operations

Cited By (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9460064B2 (en) 2006-05-18 2016-10-04 Oracle International Corporation Efficient piece-wise updates of binary encoded XML data
US20070271305A1 (en) * 2006-05-18 2007-11-22 Sivansankaran Chandrasekar Efficient piece-wise updates of binary encoded XML data
US9684639B2 (en) 2010-01-18 2017-06-20 Oracle International Corporation Efficient validation of binary XML data
US9648011B1 (en) * 2012-02-10 2017-05-09 Protegrity Corporation Tokenization-driven password generation
US20150081453A1 (en) * 2012-04-24 2015-03-19 Jeffrey Taihana TUATINI High-throughput message generation
US9852453B2 (en) * 2012-04-24 2017-12-26 Responsys, Inc. High-throughput message generation
US8812523B2 (en) 2012-09-28 2014-08-19 Oracle International Corporation Predicate result cache
US20150058858A1 (en) * 2013-08-21 2015-02-26 Hasso-Plattner-Institut fur Softwaresystemtechnik GmbH Dynamic task prioritization for in-memory databases
US10089142B2 (en) * 2013-08-21 2018-10-02 Hasso-Plattner-Institut Fur Softwaresystemtechnik Gmbh Dynamic task prioritization for in-memory databases
US20150074151A1 (en) * 2013-09-11 2015-03-12 Microsoft Corporation Processing datasets with a dbms engine
US10133800B2 (en) * 2013-09-11 2018-11-20 Microsoft Technology Licensing, Llc Processing datasets with a DBMS engine
US11860830B2 (en) * 2013-09-21 2024-01-02 Oracle International Corporation Combined row and columnar storage for in-memory databases for OLTP and analytics workloads
US10152501B2 (en) * 2013-10-17 2018-12-11 Sybase, Inc. Rollover strategies in a n-bit dictionary compressed column store
US9489409B2 (en) * 2013-10-17 2016-11-08 Sybase, Inc. Rollover strategies in a N-bit dictionary compressed column store
US20150113026A1 (en) * 2013-10-17 2015-04-23 Muhammed Sharique Rollover strategies in a n-bit dictionary compressed column store
US9977801B2 (en) * 2013-11-21 2018-05-22 Sap Se Paged column dictionary
US11537578B2 (en) 2013-11-21 2022-12-27 Sap Se Paged column dictionary
US20150142818A1 (en) * 2013-11-21 2015-05-21 Colin FLORENDO Paged column dictionary
US9977802B2 (en) 2013-11-21 2018-05-22 Sap Se Large string access and storage
US10235377B2 (en) * 2013-12-23 2019-03-19 Sap Se Adaptive dictionary compression/decompression for column-store databases
US20150178305A1 (en) * 2013-12-23 2015-06-25 Ingo Mueller Adaptive dictionary compression/decompression for column-store databases
US10824596B2 (en) 2013-12-23 2020-11-03 Sap Se Adaptive dictionary compression/decompression for column-store databases
US9697242B2 (en) 2014-01-30 2017-07-04 International Business Machines Corporation Buffering inserts into a column store database
US9928267B2 (en) 2014-06-13 2018-03-27 International Business Machines Corporation Hierarchical database compression and query processing
US9760593B2 (en) 2014-09-30 2017-09-12 International Business Machines Corporation Data dictionary with a reduced need for rebuilding
US11023452B2 (en) * 2014-09-30 2021-06-01 International Business Machines Corporation Data dictionary with a reduced need for rebuilding
US20160147776A1 (en) * 2014-11-25 2016-05-26 Colin FLORENDO Altering data type of a column in a database
US10747737B2 (en) * 2014-11-25 2020-08-18 Sap Se Altering data type of a column in a database
US20160378833A1 (en) * 2015-06-29 2016-12-29 International Business Machines Corporation Query processing using a dimension table implemented as decompression dictionaries
US20160378783A1 (en) * 2015-06-29 2016-12-29 International Business Machines Corporation Query processing using a dimension table implemented as decompression dictionaries
US9953025B2 (en) * 2015-06-29 2018-04-24 International Business Machines Corporation Query processing using a dimension table implemented as decompression dictionaries
US9946705B2 (en) * 2015-06-29 2018-04-17 International Business Machines Corporation Query processing using a dimension table implemented as decompression dictionaries
US10241979B2 (en) * 2015-07-21 2019-03-26 Oracle International Corporation Accelerated detection of matching patterns
US10303655B1 (en) * 2015-12-21 2019-05-28 EMC IP Holding Company LLC Storage array compression based on the structure of the data being compressed
US10810198B2 (en) * 2017-09-26 2020-10-20 Oracle International Corporation Group determination based on multi-table dictionary codes
US11023430B2 (en) 2017-11-21 2021-06-01 Oracle International Corporation Sparse dictionary tree
US11169995B2 (en) 2017-11-21 2021-11-09 Oracle International Corporation Relational dictionaries
US11023469B2 (en) 2017-11-29 2021-06-01 Teradata Us, Inc. Value list compression (VLC) aware qualification
US11126611B2 (en) 2018-02-15 2021-09-21 Oracle International Corporation Code dictionary generation based on non-blocking operations
US11537571B2 (en) * 2018-09-25 2022-12-27 Salesforce, Inc. Column data compression schemes for scaling writes and reads on database systems
US20200097571A1 (en) * 2018-09-25 2020-03-26 Salesforce.Com, Inc. Column data compression schemes for scaling writes and reads on database systems
US11947515B2 (en) 2018-10-09 2024-04-02 Oracle International Corporation Relational method for transforming unsorted sparse dictionary encodings into unsorted-dense or sorted-dense dictionary encodings
US11379450B2 (en) * 2018-10-09 2022-07-05 Oracle International Corporation Relational method for transforming unsorted sparse dictionary encodings into unsorted-dense or sorted-dense dictionary encodings
US20200110820A1 (en) * 2018-10-09 2020-04-09 Oracle International Corporation Relational method for transforming unsorted sparse dictionary encodings into unsorted-dense or sorted-dense dictionary encodings
US20220060195A1 (en) * 2019-03-15 2022-02-24 Samsung Electronics Co., Ltd. Using predicates in conditional transcoder for column store
US11838035B2 (en) * 2019-03-15 2023-12-05 Samsung Electronics Co., Ltd. Using predicates in conditional transcoder for column store
US11139827B2 (en) 2019-03-15 2021-10-05 Samsung Electronics Co., Ltd. Conditional transcoding for encoded data
US11184021B2 (en) 2019-03-15 2021-11-23 Samsung Electronics Co., Ltd. Using predicates in conditional transcoder for column store
US11520790B2 (en) * 2020-09-17 2022-12-06 International Business Machines Corporation Providing character encoding
US11461328B2 (en) 2020-09-21 2022-10-04 Oracle International Corporation Method for using a sematic model to transform SQL against a relational table to enable performance improvements
US11580123B2 (en) * 2020-11-13 2023-02-14 Google Llc Columnar techniques for big metadata management
US12026168B2 (en) 2020-11-13 2024-07-02 Google Llc Columnar techniques for big metadata management
US11537594B2 (en) 2021-02-05 2022-12-27 Oracle International Corporation Approximate estimation of number of distinct keys in a multiset using a sample

Also Published As

Publication number Publication date
US10756759B2 (en) 2020-08-25
WO2013033030A1 (en) 2013-03-07

Similar Documents

Publication Publication Date Title
US10756759B2 (en) Column domain dictionary compression
US10216794B2 (en) Techniques for evaluating query predicates during in-memory table scans
US11036756B2 (en) In-memory key-value store for a multi-model database
US8892586B2 (en) Accelerated query operators for high-speed, in-memory online analytical processing queries and operations
US11010415B2 (en) Fixed string dictionary
US9298775B2 (en) Changing the compression level of query plans
US11789923B2 (en) Compression units in an index block
US10678792B2 (en) Parallel execution of queries with a recursive clause
US8046352B2 (en) Expression replacement in virtual columns
US20140372470A1 (en) On-the-fly encoding method for efficient grouping and aggregation
US9569485B2 (en) Optimizing database query
US10860579B2 (en) Query planning and execution with reusable memory stack
US20150302035A1 (en) Partial indexes for partitioned tables
US9646053B2 (en) OLTP compression of wide tables
US20160224579A1 (en) Workload aware data placement for join-based query processing in a cluster
US20120078860A1 (en) Algorithmic compression via user-defined functions
US8812523B2 (en) Predicate result cache
US10558661B2 (en) Query plan generation based on table adapter
US10366067B2 (en) Adaptive index leaf block compression
US10606833B2 (en) Context sensitive indexes
US9129001B2 (en) Character data compression for reducing storage requirements in a database system
US11860906B2 (en) Partition-local partition value identifier grouping

Legal Events

Date Code Title Description
AS Assignment

Owner name: ORACLE INTERNATIONAL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LAHIRI, TIRTHANKAR;HOANG, CHI-KIM;THOMAS, DINA;AND OTHERS;REEL/FRAME:026858/0082

Effective date: 20110901

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STCV Information on status: appeal procedure

Free format text: NOTICE OF APPEAL FILED

STCV Information on status: appeal procedure

Free format text: APPEAL BRIEF (OR SUPPLEMENTAL BRIEF) ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: AWAITING TC RESP., ISSUE FEE NOT PAID

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction
MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4