CN108140040A - Selective data compression of in-memory databases - Google Patents
- Publication number
- CN108140040A (application number CN201680057698.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- volatile memory
- value
- membership
- data portion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/12—Replacement control
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/214—Database migration support
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/065—Replication mechanisms
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
Abstract
Techniques are provided for persistently maintaining data in one format while making the data available to a database server in more than one format. The data may be maintained in volatile memory in a format that is independent of the on-disk format, to reduce the overhead associated with keeping the data synchronized with the on-disk copy of the data. The data to be maintained in volatile memory may be selected based on various factors. Once selected, the data may also be compressed to save space in volatile memory. The compression level may depend on one or more factors evaluated for the selected data. The factors and the compression level of the selected data may be evaluated periodically, and based on that evaluation, the selected data may be removed from volatile memory or its compression level changed accordingly.
Description
Technical field
The present invention relates to database systems and, more specifically, to selective data compression for in-memory databases.
Background
As volatile memory (sometimes referred to as "main memory") becomes cheaper and larger, more data can be cached in volatile memory from disk storage. Such caching provides faster access to the data and allows applications that use the data to perform their work more quickly.
However, making data accessible in volatile memory still poses many challenges. First, the amount of data in use is also increasing dramatically. In particular, fully caching a relatively large data set (commonly called "big data") in volatile memory would require an extremely large amount of volatile memory. Consequently, regardless of the size of volatile memory, there may still be data (in some cases a significant portion of the data) that cannot all be cached in volatile memory at once. That data must be accessed from disk storage as needed and loaded into the cache, replacing other data in the cache.
When a database system needs to operate on non-cached data, the data must first be read from disk storage into the database system's volatile memory. Once the data is loaded into volatile memory, the database system can operate on it. However, compared with accessing data already resident in volatile memory, reading data from disk storage typically incurs a significant performance penalty. Thus, when a database system must operate on non-cached data, it cannot realize a significant performance benefit from the fact that it has a large amount of volatile memory.
One approach to fitting more data into volatile memory is to compress the data before storing it there. Once compressed, the data occupies less space in volatile memory. However, not all data can be compressed significantly. Furthermore, if compressed data is accessed frequently, it must be decompressed frequently before use. That frequent decompression consumes computing resources that could otherwise be used for data operations, slowing those operations and, in turn, the applications that request them. Indiscriminate compression of the data cached in volatile memory therefore has clear drawbacks.
In addition, no matter at what level data is compressed when copied into volatile memory, at some point the database system will run out of volatile memory space for additional data. Thus, when volatile memory is at full capacity and the database system needs to operate on data stored only on disk, some data in volatile memory must be replaced to make room for the data coming from disk storage. The more frequent this replacement, the more computing resources are wasted moving data into and out of volatile memory. Minimizing the frequency of data replacement therefore contributes to the efficient performance of the database system.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have previously been conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Description of the Drawings
In the drawings:
Fig. 1 is a block diagram of a database system that concurrently maintains mirror-format data in volatile memory and persistent-format data on persistent storage, according to an embodiment;
Fig. 2A is a block diagram of an example table;
Fig. 2B is a block diagram illustrating how the data items of a table may be maintained in two formats simultaneously, one of which is an in-memory format, according to an embodiment;
Fig. 3A is a block diagram illustrating a process for selecting candidate data portions for mirroring, according to one or more embodiments;
Fig. 3B is a block diagram illustrating a process for designating mirror-format data portions to be removed from volatile memory, according to one or more embodiments;
Fig. 4 is a block diagram illustrating a process for selecting a compression level for a selected portion, according to one or more embodiments; and
Fig. 5 is a block diagram of a computer system upon which the techniques described herein may be implemented.
Detailed Description
In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
Overview
Different data formats have different benefits. The techniques described herein therefore maintain data persistently in one format, but make that data available to a database server in more than one format. In one embodiment, one format in which the data is made available for query processing is based on the on-disk format, while another format in which the data is made available for query processing is independent of the on-disk format.
The format that corresponds to the on-disk format is referred to herein as the "persistent format" or "PF". Data in the persistent format is referred to herein as PF data. An in-memory format that is independent of the on-disk format is referred to as the "mirror format" or "MF". Data in the mirror format is referred to herein as MF data. Further details about a database system that utilizes MF data and PF data are described in U.S. Patent Application No. 14/337,179, "Mirroring, In Memory, Data From Disk To Improve Query Performance", filed July 21, 2014, referred to herein as the "Mirroring Data Application", the entire contents of which are incorporated herein by reference.
According to one embodiment, the mirror format is completely independent of the persistent format. However, the MF data is initially built in volatile memory based on the persistently stored PF data, not based on any persistent MF structures. Because no persistent MF structures are needed, users of existing databases need not migrate the data or structures of their existing databases to another format. Thus, without performing any data migration, a conventional database system that uses uncompressed data in disk blocks can continue to use those disk blocks to persistently store the database system's data, while still obtaining the memory-space benefit that results from having a compressed representation of the data available in faster volatile memory.
The MF data mirrors data that already exists in the PF data. However, while all items in the MF data are mirror versions of corresponding items in the PF data (albeit organized in a different format), not all items in the PF data need be mirrored in the MF data. Thus, the MF data may be a subset of the PF data.
Because not all of the PF data must be mirrored in the MF data, selection criteria are used to automatically select the data portions, such as columns, to be mirrored from the PF data into the MF data. In an embodiment, various factors about the data portions of the PF data are used to determine which data portions of the PF data to mirror. For example, if a table has columns A, B, and C, and column A receives the most frequent read accesses, column A may be selected for mirroring from the PF data into the MF data. In this example, queries that use column A would see a performance boost, because accessing the MF data in volatile memory is faster than accessing the PF data in non-volatile memory. In addition to the access statistics of a data portion (as in the example above), other factors may be considered, such as the data statistics, operational statistics, and data type of the data portion.
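The selection step can be sketched in code. The following is a minimal illustration only, not the patent's algorithm: the statistics fields, memory budget, and greedy policy are all hypothetical, and a real system would weigh more factors than read frequency.

```python
from dataclasses import dataclass

@dataclass
class ColumnStats:
    """Access statistics for one column (all fields hypothetical)."""
    name: str
    read_accesses: int   # how often queries read this column
    size_bytes: int      # approximate footprint of the column

def select_for_mirroring(stats, memory_budget_bytes):
    """Greedily pick the most frequently read columns that fit the budget."""
    chosen, used = [], 0
    for col in sorted(stats, key=lambda c: c.read_accesses, reverse=True):
        if used + col.size_bytes <= memory_budget_bytes:
            chosen.append(col.name)
            used += col.size_bytes
    return chosen

cols = [ColumnStats("A", 9_000, 40_000),
        ColumnStats("B", 1_200, 40_000),
        ColumnStats("C", 300, 40_000)]
print(select_for_mirroring(cols, 80_000))   # column A first, then B
```

With the budget above, the most frequently read column A is mirrored first, matching the example in the text.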
Even where a query requires data that can only be satisfied by the PF data, the MF data may still be used to (a) satisfy part of the query, and/or (b) speed up the retrieval of the required data from the PF data. For example, the MF data may be used to identify the specific rows that must be retrieved from the PF data.
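As a sketch of case (b), an in-memory column vector can be scanned to identify the row ids that satisfy a predicate, so that only those rows need be fetched from the PF data. The column name, values, and the dictionary standing in for the on-disk rows are all hypothetical.

```python
# In-memory column vector for a hypothetical "price" column (row order matches the table).
price_vector = [10, 55, 7, 90, 55, 3]

# PF side: row-major records addressed by row id (a simplified stand-in for disk blocks).
pf_rows = {
    0: ("r1", 10), 1: ("r2", 55), 2: ("r3", 7),
    3: ("r4", 90), 4: ("r5", 55), 5: ("r6", 3),
}

def matching_row_ids(vector, predicate):
    """Scan the MF column vector; return ids of rows satisfying the predicate."""
    return [rid for rid, value in enumerate(vector) if predicate(value)]

# Only the matching rows are then read from the persistent format.
ids = matching_row_ids(price_vector, lambda v: v == 55)
rows = [pf_rows[rid] for rid in ids]
print(ids, rows)   # [1, 4] [('r2', 55), ('r5', 55)]
```

The in-memory scan narrows the fetch to two rows instead of a full scan of every block.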
In an embodiment, various factors may be used to determine whether to compress a data portion in the MF data, and at what compression level. The compression level dictates how much volatile memory space a data portion occupies, and how much resource the database system spends to decompress the data portion when it is retrieved. For example, some data types may require less computational overhead to decompress than others. Accordingly, the database system may automatically determine that the compression level for data portions of such a data type should be higher than that for other data types. On the other hand, different data portions may be accessed at different frequencies. To save database system resources, data portions that are accessed more frequently may be compressed at a lower level than those that are accessed infrequently. In addition to a data portion's access statistics and data type, other factors may include the historical performance statistics of similar data portions in the MF data.
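The trade-off just described (lighter compression for hotter data, heavier compression for types that are cheap to decompress) can be read as code. The thresholds and level names below are invented for illustration; the patent does not prescribe them.

```python
def choose_compression_level(accesses_per_hour, cheap_to_decompress):
    """
    Hotter data gets lighter compression so it is cheaper to serve;
    data types that decompress cheaply can tolerate heavier compression.
    All thresholds and level names are illustrative, not from the patent.
    """
    if accesses_per_hour > 1000:
        return "NONE"            # hottest data stays uncompressed
    if accesses_per_hour > 100:
        return "LOW"
    return "HIGH" if cheap_to_decompress else "MEDIUM"

print(choose_compression_level(5000, False))  # NONE
print(choose_compression_level(500, False))   # LOW
print(choose_compression_level(10, True))     # HIGH
print(choose_compression_level(10, False))    # MEDIUM
```

A periodic re-evaluation, as described later in the text, would simply rerun such a policy against fresh statistics.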
In an embodiment, the MF data may be in a column format or a row format, and that format may differ from that of the PF data. For example, the PF data may be row-major while the MF data is column-major, or vice versa. Regardless of the particular mirror format used, the mirror-format data is created in volatile memory based on existing PF structures (such as tables and indexes), without causing any change to the format of those structures.
General system architecture
Fig. 1 is a block diagram of a database management system according to one embodiment. Referring to Fig. 1, database system 100 includes volatile memory 102 and persistent storage 110. Volatile memory 102 generally represents the random access memory used by database system 100 and may be implemented by any number of memory devices. Typically, data stored in volatile memory 102 is lost when a failure occurs.
Persistent storage 110 generally represents any number of persistent storage devices, such as magnetic disks, FLASH memory, and/or solid state drives. Unlike volatile memory 102, data stored on persistent storage 110 is not lost when a failure occurs. Consequently, after a failure, the data on persistent storage 110 may be used to rebuild the data that was lost from volatile memory 102.
Database system 100 may receive queries from one or more database applications (not shown) for execution on database server 120. To execute a query, database system 100 may first convert the query, using query optimizer 126, into an ordered set of operators for accessing PF data 112 or MF data 104 in database server 120. Each operator may contain one or more data operations on the output data of another operator or on PF data 112 or MF data 104. Query optimizer 126 may generate permutations of sets of operators, referred to herein as "execution plans". In an embodiment, to ensure that an optimal execution plan is selected, statistics collector 128 of query optimizer 126 collects and maintains statistics about database system 100. The term "statistics" refers herein to any numeric representation that describes the database management system and the data stored in database system 100. The various types of statistics collected by statistics collector 128 are discussed further in the "Statistics" section.
PF data 112 resides on persistent storage 110 in PF data structures 108. PF structures 108 may be structures that organize PF data 112 at any level of organization, for example tables, columns, rows, row-major disk blocks, column-major disk blocks, etc.
Volatile memory 102 further includes a cache 106 of PF data. Within cache 106, data is stored in a format that is based on the format in which the data resides in PF data structures 108. For example, if the persistent format is row-major disk blocks, cache 106 may contain cached copies of row-major disk blocks.
MF data 104, on the other hand, is in a format that is unrelated to the persistent format. For example, where the persistent format is row-major uncompressed disk blocks, the mirror format may be column-major compression units. Because the mirror format differs from the persistent format, MF data 104 is produced by performing transformations on PF data 112.
Mirror data generator 124 may perform these transformations, both when volatile memory 102 is initially populated with MF data 104 (whether at startup or on demand) and when volatile memory 102 is repopulated with MF data 104 after a failure. In an embodiment, mirror data generator 124 may select data from PF data 112 based on one or more factors described below and mirror that data to generate MF data 104.
In an embodiment, transaction manager 122 maintains the in-memory MF data 104 transactionally consistent with PF data 112. The MF data 104 is transactionally consistent in that any data item provided to a transaction from MF data 104 will be the same version that would have been provided had the data item been provided from PF data 112. Furthermore, that version reflects all changes that were committed before the snapshot time of the transaction, and no changes committed after the snapshot time of the transaction. Thus, when a transaction that makes a change to a data item mirrored in MF data 104 commits, the change is made visible relative to both PF data 112 and MF data 104. On the other hand, if the transaction that made the change is aborted or rolled back, the change is rolled back relative to both PF data 112 and MF data 104.
In one embodiment, transaction manager 122 ensures consistency not only among the reads and writes of PF data 112, but also among the reads and writes of MF data 104. Because MF data 104 is kept current in a transactionally consistent manner, if the data required by a database operation is included in the in-memory MF data, the operation may be satisfied either from the in-memory MF data 104 or from PF data 112.
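Because the two copies are kept transactionally consistent, the server is free to route an operation to either one. A simplified routing check might look like the following sketch, in which the table name, column names, and the registry of mirrored columns are hypothetical.

```python
# Registry of which columns of each table are present in the MF data (hypothetical).
mirrored_columns = {"sales": {"price", "qty"}}

def route(table, needed_columns):
    """Serve from MF only if every column the operation needs is mirrored."""
    if set(needed_columns) <= mirrored_columns.get(table, set()):
        return "MF"
    return "PF"   # fall back to the persistent-format data

print(route("sales", ["price"]))            # MF
print(route("sales", ["price", "region"]))  # PF: "region" is not mirrored
```

A real planner would also weigh cost estimates, but the subset test captures the basic option the text describes.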
Significantly, the existence of MF data 104 may be transparent to the database applications that submit database commands to the database server that makes use of MF data 104. For example, the same applications designed to interact with database systems that operate exclusively on PF data 112 may, without modification, interact with a database server that maintains MF data 104 in addition to PF data 112. Further, transparently to those applications, that database server may use MF data 104 to more efficiently process some or all of those database commands.
Mirror-format data
MF data 104 may mirror all of PF data 112 or a subset thereof. A user may specify which portions of PF data 112 are "in-memory enabled". The specification may be made at any level of granularity. For example, the specification of what is in-memory enabled may be made at least at the following levels of granularity:
- the entire database
- specified tables
- specified columns
- specified row ranges
- specified partitions
- specified segments
- specified extents
- any combination of the above (e.g., specified columns and partitions)
Data at any of the granularity levels listed above is referred to herein as a "data portion". In an embodiment, mirror data generator 124 may recommend that the user in-memory enable certain data portions of PF data 112, or may automatically identify data portions as in-memory enabled.
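As an illustration only (the patent does not define a concrete syntax), a mixed-granularity enablement specification might be modeled as follows; the spec format, table names, and lookup logic are all invented.

```python
# Hypothetical in-memory enablement spec at several granularity levels.
inmemory_spec = {
    "tables": {"orders"},                  # whole table enabled
    "columns": {("sales", "price")},       # single column enabled
    "partitions": {("events", "p2024")},   # single partition enabled
}

def is_inmemory_enabled(table, column=None, partition=None):
    """True if the addressed data portion falls under any enabled granule."""
    if table in inmemory_spec["tables"]:
        return True
    if column is not None and (table, column) in inmemory_spec["columns"]:
        return True
    if partition is not None and (table, partition) in inmemory_spec["partitions"]:
        return True
    return False

print(is_inmemory_enabled("orders", column="total"))  # True: whole table enabled
print(is_inmemory_enabled("sales", column="price"))   # True: column-level rule
print(is_inmemory_enabled("sales", column="region"))  # False: not mirrored
```

Any data portion for which the check returns False would be served exclusively from the PF data.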
As described below, data that is in-memory enabled is converted into the mirror format by mirror data generator 124 and stored in volatile memory as MF data 104. Thus, when a query requires data that is in-memory enabled, the database server has the option of providing the data either from PF data 112 or from MF data 104. The conversion and loading may occur at database startup, or may occur lazily or on demand. Data that is not in-memory enabled is not mirrored in MF data 104. Consequently, when a query requires such data, the database server does not have the option of obtaining the data from MF data 104.
For purposes of explanation, it will be assumed that PF data structures 108 include the table 200 illustrated in Fig. 2A. Table 200 includes three columns c1-c3 and six rows r1-r6. While Fig. 2A graphically depicts how table 200 is logically organized on persistent storage 110, the actual format in which the data is physically stored may be quite different.
Specifically, Fig. 2B illustrates how the data that resides in table 200 may be physically organized on persistent storage 110. In this example, the data for table 200 is stored in three row-major disk blocks 202, 204, and 206. Block 202 stores the values of all columns of row r1, followed by the values of all columns of row r2. Block 204 stores the values of all columns of row r3, followed by the values of all columns of row r4. Finally, block 206 stores the values of all columns of row r5, followed by the values of all columns of row r6.
Copies of some of those disk blocks may be temporarily stored in cache 106. In the example illustrated in Fig. 2B, a cached copy 212 of block 204 resides in cache 106. Cache 106 may be managed using any one of a variety of cache management techniques, and the embodiments described herein are not limited to any particular cache management technique. In general, those techniques attempt to retain in volatile memory 102 copies of the disk blocks that are most likely to be requested in the near future. Consequently, when cache 106 runs out of space, cached copies of disk blocks that are less likely to be requested are replaced by copies of blocks that are more likely to be requested.
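A least-recently-used (LRU) policy is one common example of such a cache management technique. The sketch below illustrates the replacement behavior just described; it is a standard textbook mechanism, not the one claimed by the patent, and the block numbers reuse those of the Fig. 2B example.

```python
from collections import OrderedDict

class BlockCache:
    """Tiny LRU cache of disk-block copies keyed by block number."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def get(self, block_no):
        if block_no not in self.blocks:
            return None
        self.blocks.move_to_end(block_no)      # mark as recently used
        return self.blocks[block_no]

    def put(self, block_no, data):
        self.blocks[block_no] = data
        self.blocks.move_to_end(block_no)
        if len(self.blocks) > self.capacity:   # evict least recently used
            self.blocks.popitem(last=False)

cache = BlockCache(capacity=2)
cache.put(202, "rows r1,r2")
cache.put(204, "rows r3,r4")
cache.get(202)                  # touch 202, so 204 becomes least recently used
cache.put(206, "rows r5,r6")    # evicts block 204
print(list(cache.blocks))       # [202, 206]
```

The block deemed least likely to be requested again (here, the one touched longest ago) is the one replaced when space runs out.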
In contrast to the data in cache 106, mirror-format data 104 is not formatted in a manner that is based on the persistent format. In the illustrated example, mirror-format data 104 includes two column vectors 220 and 222. Each column vector stores a contiguous series of values from a single column of table 200. In this example, column vector 220 stores the values of column 1 of table 200, and column vector 222 stores the values of column 3 of table 200. In this example, MF data 104 mirrors a subset of PF data 112, because MF data 104 does not include a column vector for column 2 of table 200.
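A minimal sketch of the column-vector mirroring described above, with made-up table values; only columns 1 and 3 are extracted, matching vectors 220 and 222:

```python
# The PF table as rows (values are invented for illustration).
table = [
    (1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0),
    (4, "d", 40.0), (5, "e", 50.0), (6, "f", 60.0),
]

def build_column_vector(rows, col_index):
    """Copy one column of the row-major table into a contiguous vector."""
    return [row[col_index] for row in rows]

vector_220 = build_column_vector(table, 0)   # column 1 of table 200
vector_222 = build_column_vector(table, 2)   # column 3; column 2 is not mirrored
```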
Database system statistics
In an embodiment, statistics collector 128 collects various types of statistics about the data in database system 100 and about accesses to that data. Statistics collector 128 also collects statistics about the performance of database system 100. In one embodiment, statistics are collected for various types of objects (such as partitions, columns, tables, views and even indexes) and at different granularity levels.
Database system statistics are generally divided into data statistics, access statistics and system statistics. Each of these categories of statistics is described in greater detail below.
Data statistics
The term "data statistics" herein refers to numerical representations that quantify the distribution and storage characteristics of the data stored in the database system. In an embodiment, data statistics are collected for data portions of any granularity. Non-limiting examples of data statistics include:
Row count - describes the number of rows in the data portion.
Block count - describes the number of data blocks in the data portion.
Average row length - describes the average row length in the data portion.
Number of distinct values (NDV) - describes the number of unique values (the cardinality) in a column of the data portion.
Number of nulls - describes the number of null values in a column of the data portion.
Data distribution statistics - describe the distribution of the values in a column of the data portion. Data distribution statistics include the minimum, maximum, mean and median values of a column. In one embodiment, in addition to the median, the distribution statistics may also include more elaborate statistics on the frequency of values, referred to herein as histogram statistics. Histogram statistics are generated by placing the values of a column into buckets, based either on the values themselves or on the number of values in each bucket, and then storing statistics about those buckets. Based on the histogram statistics, the hot (frequent) and non-hot (infrequent) values and value ranges in a column can be determined, and those values and value ranges can be tracked as part of the data distribution statistics.
Index statistics - when the data portion is an index, index statistics include information about the index, such as the number of index levels, the number of index blocks, and the relationship between the index and the data blocks.
Other data statistics may include any combination of the above statistics produced by applying one or more statistical functions, such as the minimum, maximum, mean, median or standard deviation of one or more of the above statistics.
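The per-column statistics listed above could be computed along the following lines. The function name is an assumption, and the "histogram" here is a plain value-to-frequency map rather than the bucketed form the text describes:

```python
from collections import Counter

def data_statistics(column):
    """Compute simplified per-column statistics (illustrative only)."""
    non_null = [v for v in column if v is not None]
    hist = Counter(non_null)            # value -> frequency, a simplified histogram
    return {
        "num_rows": len(column),
        "ndv": len(hist),               # number of distinct values (cardinality)
        "num_nulls": column.count(None),
        "min": min(non_null),
        "max": max(non_null),
        "avg": sum(non_null) / len(non_null),
        "histogram": dict(hist),
    }

stats = data_statistics([5, 5, 7, None, 9, 5])
# e.g. three distinct non-null values and one null in a six-row column
```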
Access statistics
The term "access statistics" herein refers to numerical values related to the various types of accesses (such as reads or writes (modifications)) to data portions in the database system. The term "heat map" in this application refers to statistics that reflect the relative frequency of accesses to the various data portions in database system 100: data portions with recent and more frequent accesses are "hot," while data portions without recent accesses or with infrequent accesses are "cold." In an embodiment, the access statistic of a data portion is computed based on the number of accesses to the data portion and on how recent those accesses are. Accordingly, accesses to the data portion that are more recent in time are given more weight than earlier accesses. Based on the computed access statistics, some data portions are "hotter" than others. For example, if the data of a column in one data portion has been accessed five times within the last day, while a column in another data portion has been accessed only once, then the first data portion is "hotter" than the second.
In an embodiment, to distinguish access levels in a heat map, different ranges of access statistic values are designated as "hot" access and "cold" access. If the access statistic of a data portion falls within the "hot" range, the data portion is denoted as hot. Similarly, if the access statistic of a data portion falls within the "cold" range, the data portion is denoted as cold.
In an embodiment, the access statistic of a data portion is computed based on the access statistics of multiple data portions that are part of that data portion. The access statistics of the multiple data portions are aggregated to produce the access statistic of the containing data portion. The aggregation may be based on any one or a combination of aggregation functions, such as mean, median, minimum and maximum. For example, the heat map of a database includes an aggregation of the access statistics of the data portions that make up the database.
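One way to realize the recency weighting described above is an exponentially decaying per-access weight; the decay model, half-life and hot/cold threshold are invented here for illustration:

```python
import math

def access_score(access_times, now, half_life=86400.0):
    """Sum of accesses, each weighted by recency: newer accesses count more."""
    return sum(math.exp(-(now - t) / half_life) for t in access_times)

def classify(score, hot_threshold=2.0):
    """Map score ranges onto the 'hot'/'cold' designation described above."""
    return "hot" if score >= hot_threshold else "cold"

now = 1_000_000.0
recent = [now - 60, now - 120, now - 600]   # three accesses within minutes
stale = [now - 40 * 86400.0]                # one access, 40 days ago
```

Under this model the recently accessed portion scores close to its raw access count, while the stale portion's single access decays toward zero.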
Although "hot" and "cold" access levels are described herein for data portion access statistics, the approaches herein contemplate other embodiments with other access levels. For example, a range of access statistic values may be added to represent a "warm" level, where "warm" denotes that a data portion has more accesses than the "cold" level but fewer accesses than the "hot" level.
In an embodiment, separate heat maps are collected for the read-access and write-access types. The same data portion may have different access statistics depending on the access type. For example, the heat map statistics of a table containing log entries may indicate hot write access, because the application using the table frequently issues commands to database system 100 to add new log entries. But if this log table exists mainly for historical purposes, and applications of database system 100 are not actively requesting the table's data, then the log table has cold read access.
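Keeping separate read and write heat maps, as in the log-table example, might look like this counter-based sketch (the portion name and counts are hypothetical):

```python
def record_access(heat_maps, portion, access_type):
    """Track read and write accesses separately for each data portion."""
    counts = heat_maps.setdefault(portion, {"read": 0, "write": 0})
    counts[access_type] += 1

heat_maps = {}
for _ in range(100):                           # frequent inserts of new log entries
    record_access(heat_maps, "EventLog", "write")
record_access(heat_maps, "EventLog", "read")   # rarely queried for reads
# "EventLog" ends up write-hot but read-cold, as in the example above
```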
In an embodiment, access statistics are also collected for each operation type on a data portion. The term "operation statistics" refers to numerical representations describing measurements of the different types of operations performed on a data portion. When a query is processed by query optimizer 126, query optimizer 126 selects a particular execution plan for the query. The particular execution plan includes one or more operations to be performed by database system 100 on the data portions involved in the query. Accordingly, query optimizer 126 records information about the types of operations performed on each data portion in the query. Using this information from each query execution, statistics collector 128 computes the total number of operations of a particular type performed on a particular data portion, such as the number of index-based scans of table 200.
For example, database system 100 manages a database with a table "EventLog" that has an integer-type column "EventID," a variable-length string column "RecordText" and a datetime column "Timestamp." For each of these data portions, statistics collector 128 collects operation statistics. For example, when database system 100 receives the query "select * from EventLog where EventID=5," query optimizer 126 may generate different query execution plans for traversing the table, identifying the rows whose EventID value is 5, and returning those rows. In one case, the EventLog table supports an index-based traversal using an index on the EventID column. Accordingly, query optimizer 126 generates and selects an execution plan that includes a unique index scan operation on the EventID column to select the rows with the value 5. In response to this operation, statistics collector 128 increments the operation statistic for unique index scan operations on the EventID column data portion. In another case, if the EventLog table supports a full index traversal but not one based on the EventID column, query optimizer 126 selects an execution plan that includes a full index scan operation on the EventID column. Accordingly, statistics collector 128 increments the operation statistic for full index scan operations on the EventID column. In yet another scenario, EventLog is not indexed for index-based traversal at all, and query optimizer 126 therefore generates and selects a full scan operation on the EventID column. Accordingly, the operation statistic for full scans of the EventID column is incremented.
Other non-limiting examples of operations for which operation statistics are collected include: join operations, in which rows of data portions are joined to create another data portion; sort operations, in which the rows are ordered based on the values in a column of the data portion; and group operations, in which the values in a column are grouped and/or aggregated based on a function.
In an embodiment, operation statistics are relative, and are measured in comparison with the other types of operations performed on the same data portion. For example, query optimizer 126 keeps a count of all types of operations performed on the EventID column, and the operation statistic of each operation type is recalculated as a percentage of all operations on the column. Alternatively or additionally, operation statistics are absolute, and are measured as the absolute number of operations performed on the data portion. For example, as described above, each type of operation on the EventID column may cause the corresponding operation statistic to be incremented.
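The relative form of operation statistics described above can be derived from the absolute counts; the counts below are invented for illustration:

```python
def relative_op_stats(op_counts):
    """Recompute each operation type as a percentage of all operations."""
    total = sum(op_counts.values())
    return {op: 100.0 * n / total for op, n in op_counts.items()}

# Hypothetical absolute counts for operations on the EventID column.
eventid_ops = {"unique_index_scan": 60, "full_index_scan": 30, "full_scan": 10}
rel = relative_op_stats(eventid_ops)
```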
System statistics
The term "system statistics" herein refers to numerical representations of measurements of the various system resources of the database system. Non-limiting examples of system statistics include:
CPU usage - describes the consumption of the CPU.
CPU speed - describes the speed of the CPU in average CPU cycles per second.
I/O (input/output) seek time - describes the time to position a read of data from persistent storage.
I/O transfer speed - describes the rate at which the database system reads data from persistent storage in a single read request.
Maximum I/O throughput - describes the maximum data rate at which the database system reads from and writes to persistent storage.
Parallel I/O throughput - describes the average data rate at which the database system reads from and writes to persistent storage in parallel.
Single-block read time - describes the average time of a random read of a single block from persistent storage.
Multi-block read time - describes the average time of a sequential read of multiple blocks from persistent storage.
Multi-block read count - describes the average number of sequential blocks in a multi-block read.
Any combination of the above examples, including the application of one or more statistical functions such as the minimum, maximum, mean, median or standard deviation of one or more of the above statistics.
Term " performance statistics value " in this article refers to Database Systems in the number to being loaded into volatile memory
The numerical value of resource consumption when operation is performed according to part represents.Database Systems 100 use such as meter of CPU, memory and I/O
Resource is calculated to perform operation to the data portion of MF data.Database Systems measured using above system statistical value it is this for
The resource consumption of operation.In embodiment, for each action type of one or more data portions of MF data collection property
It can statistical value.Then it is performance statistics value is associated with the data portion for its collection performance statistics value and action type.Property
Can statistical value also with data portion is compressed in volatile storage when performing operation (be directed to the operation collection performance statistics value)
Compression level in device 102 is associated.Other information is also associated with performance statistics value, the data statistics value of such as data portion
With data type (data statistics value and data type for the data portion collect performance statistics value).
As a non-limiting example, statistics collector 128 collects CPU performance statistics for sort operations on a column stored in volatile memory 102. When database system 100 performs a sort operation on the column, statistics collector 128 records a measurement of the CPU usage of the operation and averages the collected measurement into the already-existing average CPU usage for sort operations on that column. If the column is recompressed in volatile memory 102 at a different level, statistics collector 128 also creates a new instance of the CPU performance statistic for the column. Thus, in such a case, separate performance statistics are associated with the different compression levels of the same column. Other examples of performance statistics are based on the I/O seek time, I/O transfer rate or single-block read time measured for each particular type of scan operation on a data portion (such as a full table scan or an index scan).
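A sketch of how performance statistics might be keyed by data portion, operation type and compression level, with a running average per key as the text describes. The class, key names and compression-level labels are assumptions:

```python
class PerfStats:
    """Running average of a cost metric keyed by (portion, operation, level)."""
    def __init__(self):
        self.stats = {}   # (portion, op, level) -> (count, mean)

    def record(self, portion, op, level, cost):
        count, mean = self.stats.get((portion, op, level), (0, 0.0))
        count += 1
        mean += (cost - mean) / count   # incremental (running) average
        self.stats[(portion, op, level)] = (count, mean)

ps = PerfStats()
ps.record("col_A", "sort", "HIGH", 10.0)   # two sorts at one compression level
ps.record("col_A", "sort", "HIGH", 20.0)
# Recompressing at a different level creates a separate statistic instance:
ps.record("col_A", "sort", "LOW", 5.0)
```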
Triggers for collecting statistics
Statistics may be collected in a number of different ways. For example, depending on the implementation, statistics are collected during the compilation phase of query processing, during the execution of a query, and/or when no query is being processed. In one embodiment, the collection of some statistics is triggered by database changes: for example, a change in database values (a new entry, row, column or table) or a change in system resources (additional processing capacity or memory) triggers the collection of statistics.
One approach for collecting access statistics is to record information about accesses to a data portion at access time. When a query executes and retrieves or modifies the data in a data portion, the execution also causes a record of usage information to be associated with the data portion. The usage information may include an access timestamp and the type of access/operation. To collect statistics on the accesses in database system 100, statistics collector 128 then reads the access information for the data portions and generates heat maps or operation statistics based on the access information. Alternatively, access statistics are updated during the query compilation phase. For example, before a query is executed, query optimizer 126 determines the types of accesses and operations in the query and the data portions that the operations will access. Accordingly, query optimizer 126 can directly update the access statistics of the data portions affected by the query execution.
Selecting which PF data to mirror
In an embodiment, the decision of which PF data to mirror, and when to load it, is based on a variety of factors. For example, if a system has a large amount of volatile memory 102 and a relatively small database, mirror data generator 124 may mirror the entire database. In such an example, all of the PF data would also be mirrored in MF data 104. On the other hand, if there is a relatively small amount of volatile memory 102 relative to the size of the database, the optimal approach may be to mirror only a very small fraction of the database. Thus, in an embodiment, the amount of free space in volatile memory 102 is one factor in selecting which data portions of PF data 112 to mirror.
In another embodiment, other factors used to determine whether to mirror a data portion of PF data 112 into MF data 104 in volatile memory 102 include the data types and statistics associated with the data portion. The term "candidate portion" refers to a data portion of PF data 112 whose statistics and data types are evaluated to determine whether to mirror the data portion. In an embodiment, a candidate portion is an in-memory-enabled data portion or a subdivision of an in-memory-enabled data portion. Subdivisions of a data portion may be specified at any of the granularity levels described above for data portions. According to such an embodiment, if a table is selected or automatically identified as an in-memory-enabled data portion, then the subdivisions making up the columns or rows of that table are also in-memory-enabled data portions.
In an embodiment, candidate portions are evaluated based on these factors, and "membership data" is generated for each candidate portion. The term "membership data" refers to any information used to evaluate whether to mirror a candidate portion in volatile memory and to reflect the estimated benefit of mirroring the candidate portion in volatile memory. In one example, the membership data is a numerical score from 0 to 100, where the score increases when a factor is evaluated affirmatively for mirroring and decreases when a factor is evaluated negatively. If a factor is evaluated as favoring the mirroring of a candidate portion, the membership data of the candidate portion is said to be "affirmatively modified." On the other hand, if a factor is evaluated as disfavoring the mirroring of the candidate portion, the membership data is negatively modified. Accordingly, based on the modified membership data, the candidate portions that would most improve overall system performance given the available memory are then selected for mirroring. A candidate portion that has been selected for mirroring is referred to herein as a "selected portion." The membership data of a selected portion may be modified as the factors change. In an embodiment, database system 100 determines, based on the changed membership data, to remove one or more selected portions from volatile memory 102.
Fig. 3A is a block diagram depicting a process for selecting candidate data portions for mirroring, according to one or more embodiments. One or more of the blocks described below may be omitted, repeated and/or performed in a different order. Accordingly, the specific arrangement of the blocks shown in Fig. 3A should not be construed as limiting the scope of the approach.
At block 305, a candidate portion to be evaluated for mirroring is selected from PF data 112. Initial membership data is generated for the candidate portion. The initial membership data may neither favor nor disfavor the mirroring of the candidate portion, or may be based on membership data from a previous evaluation of the candidate portion. At block 310, the access statistics of the candidate portion are evaluated. If the access statistics indicate that the candidate portion has a hot read-access heat map, the candidate portion is frequently accessed for reads. Accordingly, mirroring the candidate portion in volatile memory 102 would improve the performance of database system 100, and the membership data of the candidate portion is affirmatively modified to reflect this. On the other hand, if the candidate portion has a cold read-access heat map, the membership data is negatively modified.
Alternatively or additionally, the write-access heat map of the candidate portion is also evaluated at block 310. If the candidate portion has a hot write-access heat map, the candidate portion is frequently written, and thus, once mirrored, it would have to be frequently re-mirrored to avoid becoming stale in volatile memory 102, consuming resources of database system 100. Therefore, in this case, the membership data of the candidate portion is negatively modified. On the other hand, if the candidate portion has a cold write-access heat map, the membership data is affirmatively modified, reflecting that, once mirrored, the candidate portion is less likely to become stale and less likely to waste resources of database system 100 on re-mirroring.
For example, the previously described EventLog table and EventID column are denoted as in-memory-enabled data portions, and the EventLog table and EventID column are evaluated for loading into volatile memory based on corresponding membership data scores from 0 to 100. Initially, both the EventLog table and the EventID column are assigned a default score of 50, and database system 100 is configured to favor mirroring any data portion with a membership data score above 75.
The access statistics of the EventLog table and the EventID column are retrieved from statistics collector 128. The retrieved access statistics indicate that both data portions have hot read access. Accordingly, the membership data scores of the EventID and EventLog data portions are each increased by 10, to a value of 60. However, unlike the access statistics of the EventID column, the access statistics of the EventLog table indicate hot write access. Therefore, the score of EventLog is reduced by 10, back to a value of 50, while the score of EventID remains unchanged.
At block 315, the process evaluates the data types of the candidate portion. The data types of a candidate portion can affect the degree to which the candidate portion can be compressed in volatile memory 102, and thus the data types can affect the amount of memory the candidate portion occupies when mirrored into volatile memory 102 (as discussed further below). If the data types of the candidate portion compress well, the candidate portion occupies less memory in volatile memory 102, and the membership data of the candidate portion is therefore affirmatively modified. On the other hand, if the data types compress poorly, the candidate portion may occupy a large amount of memory when mirrored. Therefore, the membership data of a candidate portion with such data types is negatively modified. For example, large object (LOB) data types (such as the binary large object (BLOB) and character large object (CLOB) types) tend not to compress well and have very high data size limits. Accordingly, even when compressed, LOB-type data can occupy a large amount of memory. Thus, if a candidate portion contains a LOB data type, the membership data is negatively modified. Similarly, variable-length string data types may also occupy a large amount of memory, because the size limit on data of such types is large. Accordingly, although variable-length string data compresses better than LOB data, a candidate portion with such a data type may still be evaluated negatively. Conversely, date- and time-related data types generally have fixed size limits and generally compress well. Therefore, if a candidate portion has a date- or time-related data type, the membership data of the candidate portion is affirmatively modified. The membership data may be similarly evaluated for other data types.
Continuing with the evaluation example of the EventLog table and EventID column, the data types of the data portions are evaluated. For the EventLog table, all columns of the EventLog table (including the EventID column) are evaluated in order to modify the membership data of the EventLog table, while for the EventID column, only the data type of the EventID column is evaluated to modify the EventID membership data. Since the EventID column is of integer type, which does not affect the compression level, the evaluation does not change the EventID membership data score. In addition to the EventID column, the EventLog table also contains the datetime column Timestamp and the string column RecordText. Accordingly, since datetime and string columns compress well, while the integer type does not affect compression, the per-column membership data scores are averaged for the EventLog table. The average score is computed as 6.66 ((10+10+0)/3), and the membership data score of EventLog is correspondingly increased to 66.66.
At block 320, the operation statistics of the candidate portion are evaluated. If the candidate portion is accessed in such a way that mirroring it would improve access speed, the membership data of the candidate portion is affirmatively modified. For example, if, based on the operation statistics, the most common operations on the candidate portion are I/O-intensive, such as full table scans and sort operations, then mirroring the candidate portion would substantially improve performance, and the membership data of the candidate portion is affirmatively modified. On the other hand, if the common operations on the candidate portion require loading other data portions from the PF data, mirroring the candidate portion may not significantly improve the performance of those operations, and the membership data is negatively modified. For example, join and group operations are usually performed on more than one data portion; the process therefore disfavors candidate portions with such operations, and the membership data of those candidate portions is evaluated negatively.
Continuing with the evaluation example of the EventLog table and EventID column, the operation statistics of the EventLog table and EventID column are retrieved from statistics collector 128 to further evaluate the membership data. In this example, based on the operation statistics of EventLog, the most frequent operation on EventLog is a sort operation. Therefore, the membership data score of EventLog is further increased from the value 66.66 to the value 76.66.
On the other hand, in this example, the most frequent operation on the EventID column is a join operation. Since a join operation may require another data portion to be loaded into volatile memory for its evaluation, the membership data score of the EventID column is reduced from the value 60 to the value 50.
At block 325, the data statistics of the candidate portion are evaluated as another factor for mirroring. If the candidate portion contains a large amount of data that would occupy a large amount of memory in volatile memory 102, the membership data is negatively modified. On the other hand, if the candidate portion contains less data, or data that can be compressed well, the membership data is affirmatively modified. For example, the process evaluates the row count, block count and average row length statistics to determine the size of the data. The number of distinct values and the number of nulls in a column are evaluated to determine the degree to which the candidate portion would be compressed if it were mirrored. If the number of distinct values is low, the candidate portion will compress well, and the membership data is therefore affirmatively modified. If the number-of-nulls statistic is high, the candidate portion will also compress well, and the membership data is likewise affirmatively modified. Similarly, the data distribution statistics of the candidate portion are evaluated to determine whether the candidate portion will compress well. If the candidate portion has many frequently occurring values or value ranges, the candidate portion will compress well, and the membership data is therefore affirmatively modified. Conversely, if the candidate portion has many infrequent values or value ranges, the candidate portion will compress poorly, and the membership data is therefore negatively modified.
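The NDV- and null-based compressibility reasoning at block 325 could be captured by a heuristic of roughly this shape; the thresholds and score deltas below are invented purely for illustration:

```python
def compressibility_score(ndv, num_rows, num_nulls):
    """Return a hypothetical membership-score delta from compressibility hints:
    few distinct values or many nulls compress well; near-unique columns do not."""
    distinct_ratio = ndv / max(num_rows - num_nulls, 1)
    null_ratio = num_nulls / max(num_rows, 1)
    if distinct_ratio < 0.1 or null_ratio > 0.5:
        return 10            # favors mirroring: highly compressible
    if distinct_ratio > 0.9:
        return -10           # near-unique column (like EventID) compresses poorly
    return 0
```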
Continuing with the evaluation example of the EventLog table and EventID column, the data statistics of EventLog and the EventID column are retrieved from statistics collector 128 to further evaluate the membership data. The data statistics of the EventLog table and EventID column indicate a row count approximately equal to the average row count of the other data portions managed by database system 100 for which statistics collector 128 collects data statistics. However, since the EventID column contains a unique integer for each row, the EventID column has a high cardinality, while the other columns of EventLog have very low cardinality. Accordingly, the EventID column membership data score is reduced by 10, and the EventLog table membership data is modified by the average of the scores from evaluating the data statistics of each column. The data statistics of the Timestamp and RecordText columns favor mirroring, while the data statistics of the EventID column disfavor mirroring. Therefore, the EventLog table membership data score is increased by (10+10-10)/3 = 3.33, to a value of 80.
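The running score arithmetic of the walkthrough can be replayed as below. The deltas follow the running totals stated in the example (60, 66.66, 76.66, 80 for the table; 60, 50, 40 for the column); the clamping to the 0-100 range is an assumption:

```python
def evaluate_membership(deltas, initial=50.0):
    """Accumulate per-factor adjustments into a 0-100 membership score."""
    score = initial
    for d in deltas:
        score = max(0.0, min(100.0, score + d))
    return score

# EventLog table: hot read +10, data-type average +(10+10+0)/3,
# frequent sorts +10, data-statistics average +(10+10-10)/3
eventlog = evaluate_membership([10, (10 + 10 + 0) / 3, 10, (10 + 10 - 10) / 3])
# EventID column: hot read +10, integer type +0, join penalty -10,
# high-cardinality penalty -10
eventid = evaluate_membership([10, 0, -10, -10])

THRESHOLD = 75
mirror_eventlog = eventlog > THRESHOLD   # favored for mirroring
mirror_eventid = eventid > THRESHOLD     # disfavored
```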
At block 327, the membership data of the candidate portion is stored in association with the candidate portion. At block 330, if the membership data of the candidate portion favors the mirroring of the candidate portion, the process proceeds to block 335 to evaluate the system statistics of database system 100. Otherwise, the membership data of the candidate portion disfavors the mirroring of the candidate portion, and the process instead proceeds to block 305, where the next candidate portion is selected for evaluation of the factors discussed.
Continuing with the evaluation example of the EventLog table and EventID column, the membership data score of the EventLog table is evaluated as 80, while the membership data score of the EventID column is evaluated as 40. The EventLog membership data score exceeds 75, and thus the EventLog table is favored for mirroring into volatile memory 102, while the EventID column is disfavored for mirroring into volatile memory 102.
At block 335, the system statistics of database system 100 are evaluated to determine whether the system can handle mirroring the candidate portion into volatile memory 102 and accessing the candidate portion from volatile memory 102. Although accessing MF data is faster, it may require additional system resources because of the extra decompression step for MF data, as discussed further below. If the data portion becomes stale, the system spends further resources compressing the data portion. Accordingly, at block 335, based on the system statistics, the process determines whether database system 100 has the additional resources needed to access the candidate portion from volatile memory 102. For example, if CPU utilization is low and CPU speed is sufficient to access the candidate data portion from volatile memory 102, then the candidate portion is selected for mirroring and is accessed from volatile memory 102. Conversely, if CPU utilization is high and/or CPU speed is insufficient to access the candidate data portion from volatile memory, then the system statistics are re-evaluated at some later point in time. Similarly, in an embodiment, the process evaluates statistics related to the I/O and read-access times of the system to determine whether disk access speed is insufficient, such that more PF data needs to be mirrored to speed up data access. For example, if the maximum I/O throughput and/or parallel throughput of a disk is low, and the per-disk read and seek times are long, then mirroring more PF data will improve system performance, because the data will be accessed from volatile memory 102 rather than from persistent storage 110. However, if the statistics (such as maximum I/O throughput) are high and the multi-block/single-block read times are low, then database system 100 is serving data from persistent storage 110 at sufficient speed, and mirroring is therefore unnecessary and is deferred until later to avoid consuming system resources. If the system statistics evaluated at block 335 favor mirroring, then at block 340 the process proceeds to block 345 to determine the space available in volatile memory 102.
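The block-335 decision described above can be sketched as a simple predicate over system statistics. This is a minimal illustrative sketch; the field names and threshold values below are assumptions, not values taken from the patent:

```python
def favors_mirroring(stats, cpu_util_max=0.7, io_throughput_min=200.0):
    """Decide whether system statistics favor mirroring a candidate
    portion into volatile memory (illustrative thresholds only).

    stats: dict with 'cpu_util' (0..1 CPU utilization),
           'io_throughput' (MB/s), 'seek_time_ms' (per-read seek time).
    """
    # High CPU utilization: defer the decision and re-evaluate later.
    if stats["cpu_util"] > cpu_util_max:
        return False
    # Slow disk (low throughput, long seeks) favors mirroring more PF data.
    if stats["io_throughput"] < io_throughput_min or stats["seek_time_ms"] > 10.0:
        return True
    # Disk already serves data fast enough: defer mirroring.
    return False

# Example: an idle CPU combined with a slow disk favors mirroring.
decision = favors_mirroring(
    {"cpu_util": 0.2, "io_throughput": 80.0, "seek_time_ms": 12.0})
```

A real system would also fold in the multi-block/single-block read times mentioned above before committing memory to the mirror.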
At block 345, the space in volatile memory 102 is evaluated to determine whether the candidate portion can be stored in the free space in volatile memory 102. If, at block 345, the process determines that no storage space is available for the candidate portion, then the process continues at block 305 by selecting the next candidate portion. Otherwise, the process proceeds to block 350 to designate the candidate portion for mirroring in volatile memory 102.
In a related embodiment, if at block 345 the process determines that no storage space is available in volatile memory 102 for the candidate portion, then the process determines whether any data portion can be removed from volatile memory 102 so that storage space becomes available for the candidate portion. A data portion is removed from volatile memory 102 if its membership data has changed to disfavor mirroring, or if its membership data is inferior to the membership data of the candidate portion. A flowchart for removing a data portion is further described in FIG. 3B.
In an embodiment, a data portion includes subportions of data, and the subportions themselves are evaluated differently based on the factors, discussed above, for mirroring eligibility. A candidate portion may be evaluated as one to be mirrored into volatile memory 102, while a subportion of the candidate portion may be evaluated, based on the subportion's membership data, as one not to be mirrored into volatile memory 102. According to one embodiment, the candidate portion is modified to exclude the subportion evaluated as not to be mirrored. For example, if a table has been evaluated, based on the factors discussed above, as to be mirrored, and a particular column of the table has been evaluated as not to be mirrored, then the table is mirrored without the particular column; that is, apart from the particular column, the other columns of the table will be mirrored.
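The column-exclusion behavior described above can be sketched as a filter over a table's columns. The column names, score values, and the threshold of 75 below are illustrative assumptions based on the running example:

```python
def mirror_columns(table_columns, column_scores, threshold=75):
    """Return the columns of a table to mirror, excluding subportions
    whose membership-data score disfavors mirroring (illustrative)."""
    return [c for c in table_columns if column_scores.get(c, 0) >= threshold]

# EventID scores 40 (< 75), so the table is mirrored without it.
cols = mirror_columns(
    ["EventID", "EventName", "EventDate"],
    {"EventID": 40, "EventName": 80, "EventDate": 90})
```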
Continuing the example of the EventLog table and the EventID column, the EventLog table may be mirrored into volatile memory 102 without the EventID column of the EventLog table. Since the EventID column has been evaluated as disfavored for mirroring, whereas the EventLog table (of which the EventID column is a subportion) has been evaluated as favored for mirroring, the EventLog table may be mirrored without the EventID column.
According to other embodiments, a subportion of a candidate portion is evaluated as to be mirrored while the candidate portion itself is evaluated as not to be mirrored. In such embodiments, the candidate subportion is mirrored into volatile memory 102, and the other parts of the candidate portion are not mirrored.
In an embodiment, the process described for FIG. 3A is performed periodically in database system 100. The process may be executed at a time specified by a user of database system 100, or at a time determined by database system 100 based on system statistics. Alternatively, the process may be executed based on a user-specified trigger. Triggers may include the availability of space in volatile memory 102, or any of the aforementioned designated statistics of database system 100 crossing one or more thresholds.
FIG. 3B depicts, according to one or more embodiments, a block diagram of a process for designating mirrored data portions to be removed from the MF data. One or more of the blocks described below may be omitted, repeated, and/or performed in a different order. Accordingly, the specific arrangement of blocks shown in FIG. 3B should not be construed as limiting the scope of the method.
At block 365, a mirrored data portion to be evaluated for removal from volatile memory 102 is selected from MF data 104. Initial membership data is generated for the mirrored data portion. The initial membership data may neither favor nor disfavor removal of the mirrored data portion, or it may be based on previously evaluated membership data for the mirrored data portion. At block 370, similar to the evaluation described at block 310 of FIG. 3A, the access statistics of the mirrored data portion are evaluated. Similar to the modification of membership data described at block 310 of FIG. 3A, the membership data of the mirrored data portion is likewise modified.
At block 375, similar to the evaluation described at block 315 of FIG. 3A, the data type of the mirrored data portion is evaluated. Similar to the modification of membership data described at block 315 of FIG. 3A, the membership data of the mirrored data portion is likewise modified.
At block 380, similar to the evaluation described at block 320 of FIG. 3A, the operation statistics of the mirrored data portion are evaluated. Similar to the modification of membership data described at block 320 of FIG. 3A, the membership data of the mirrored data portion is likewise modified.
At block 385, similar to the evaluation described at block 325 of FIG. 3A, the data statistics of the mirrored data portion are evaluated. Similar to the modification of membership data described at block 325 of FIG. 3A, the membership data of the mirrored data portion is likewise modified.
At block 387, the membership data of the mirrored data portion is stored in association with the mirrored data portion. At block 390, if the membership data of the mirrored data portion favors removing the mirrored data portion from volatile memory 102, then the mirrored data portion is designated for removal at block 395. In one embodiment, the membership data of the mirrored data portion is compared with threshold membership data to determine whether the membership data of the mirrored data portion favors removing the mirrored data portion from volatile memory 102. In another embodiment, the membership data of the mirrored data portion is compared with the membership data of one or more selected portions. If, relative to the selected portions' membership data, the mirrored data portion's membership data is less favorable for mirroring, then the mirrored data portion is designated for removal from volatile memory 102. On the other hand, if the mirrored data portion's membership data is more favorable for mirroring than the selected portions' membership data, then the mirrored data portion is retained in volatile memory 102 as part of MF data 104.
In an embodiment, a data portion includes subportions of data, and the subportions themselves are evaluated differently based on the factors, discussed above, for removal eligibility from volatile memory. A candidate portion may be evaluated as one to be removed from volatile memory 102, while a subportion of the candidate portion may be evaluated, based on the subportion's membership data, as one to be retained in volatile memory 102. According to one embodiment, the mirrored portion is modified to exclude the subportion evaluated as to be retained in volatile memory 102. For example, if a table in volatile memory 102 has been evaluated, based on the factors discussed above, as to be removed, and a particular column of the table has been evaluated as to be retained in volatile memory 102, then the table is removed without the particular column; that is, apart from the particular column, the other columns of the table will be removed from volatile memory 102.
According to other embodiments, a subportion of a candidate portion is evaluated as to be removed from volatile memory 102, while the candidate portion itself is evaluated as to be retained in volatile memory 102. In such embodiments, the candidate subportion is removed from volatile memory 102, and the other parts of the candidate portion are retained in volatile memory.
In an embodiment, the process described for FIG. 3B is performed periodically in database system 100. The process may be executed at a time specified by a user of database system 100, or at a time determined by database system 100 based on system statistics. Alternatively, the process may be executed based on a user-specified trigger. Triggers may include the availability of space in volatile memory 102, or any of the aforementioned designated statistics of database system 100 crossing one or more thresholds.
Organization of MF Data
According to one embodiment, the selected portions within PF data 112 and the copies of those selected portions that form part of MF data 104 are formatted differently. Even though MF data 104 uses a format different from PF data 112, MF data 104 is organized in a manner that corresponds to the organization of PF data 112. For example, within persistent storage 110, PF data 112 may be stored in blocks that reside in extents, where the extents are in turn organized into segments. Under these circumstances, within volatile memory 102, MF data 104 may be organized based on the extents and/or segments to which the MF data 104 belongs. Thus, column vector 220 may be divided into vector portions, where each vector portion corresponds to a particular range of extents and/or segments.
Within an extent, data is typically ordered by rowid (row identifier). Similarly, in one embodiment, MF data 104 is ordered based on rowid. For example, the values in column vector 220 are ordered based on the same rowids used to order the PF data 112 in blocks 202, 204, and 206. Specifically, rowid r1 immediately precedes rowid r2, so r1c1 immediately precedes r2c1 in column vector 220, and r1c1 to r1c3 immediately precede r2c1 to r2c3 in block 202.
In alternative embodiments, some or all of the data items in MF data 104 are not ordered by rowid within MF data 104. Storing the data items in a different order may be useful if, for example, a different order produces better compression. As another example, a column vector may initially be ordered by rowid, but when new updates are "merged into" the column vector (as discussed in greater detail below), the updated values may be appended to the tail of the existing column vector to avoid the need to decompress and recompress the existing column vector.
When the data items in a column vector are ordered by rowid, an in-memory index on rowid may be built to quickly locate the data item associated with any given rowid within MF data 104. Whether or not the data items in a column vector are ordered based on rowid, a rowid-to-item mapping may be established by maintaining a vector of rowids in conjunction with the column vector. Other embodiments of the organization of MF data 104 are described in the Mirroring Data application.
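The rowid-to-item mapping described above can be sketched with a parallel rowid vector and a dictionary index. This is a minimal sketch under assumed names; the values and rowids are invented for illustration:

```python
# A column vector stored in an order other than rowid order, with a
# parallel rowid vector maintained alongside it (names illustrative).
column_vector = ["banana", "apple", "cherry"]   # item values
rowid_vector  = ["r2", "r1", "r3"]              # rowid of each slot

# Build the rowid-to-item index for fast lookup by rowid.
rowid_to_slot = {rid: i for i, rid in enumerate(rowid_vector)}

def item_for_rowid(rid):
    """Locate the data item associated with a given rowid."""
    return column_vector[rowid_to_slot[rid]]

value = item_for_rowid("r1")
```

When the column vector is already sorted by rowid, the same index can be replaced by a binary search over the rowid vector, trading memory for lookup cost.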
When a query to be executed by database system 100 is received, both MF data 104 and PF data 112 may be used to satisfy the query. In embodiments where the organization of MF data 104 corresponds to the organization of PF data 112, it is easier for the database server to split database operations between MF data 104 and PF data 112. For example, the database server may determine that MF data 104 is to be used to satisfy the query for one range of extents (e.g., extents 1 through 10), while PF data is to be used to satisfy the query for another range of extents (e.g., extents 11 through 20). Other embodiments of satisfying queries using MF data 104 are further described in the Mirroring Data application.
Compression
As mentioned above, MF data 104 may be compressed. However, according to one embodiment, not all MF data need be compressed in the same way or to the same degree. For example, if it is determined that the data from column c1 of table 200 is used frequently while the data from column c3 is used infrequently, then the data in column vector 220 may be lightly compressed or left uncompressed, while the data in column vector 222 is highly compressed.
The compression algorithm, and the compression level used by the algorithm for each portion of MF data 104, may be specified by a user or may be determined automatically by the database server based on a variety of factors. Possible compression algorithms include, but are not limited to, dictionary-based compression, run-length encoding (RLE), OZIP compression, and the like. OZIP compression is described in U.S. Patent Application 14/337,113, "OZIP Compression And Decompression," filed July 21, 2014, the contents of which are incorporated herein by reference.
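Of the algorithms listed, run-length encoding is simple enough to sketch directly; the following is a minimal, generic RLE, not the patent's implementation:

```python
def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))              # start a new run
    return runs

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

encoded = rle_encode(["A", "A", "A", "B", "B", "C"])
# encoded == [("A", 3), ("B", 2), ("C", 1)]
```

RLE is attractive for column vectors precisely because, as noted earlier, sorting a column can create long runs of repeated values and therefore better compression.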
The factors used by the database server to determine how each portion of MF data 104 is compressed may include, for example, how frequently each portion is accessed, how much data is in the portion, and how much volatile memory is available. In general, the more frequently a portion of MF data 104 is accessed, the less the data is compressed. As another general rule, the less volatile memory is available for storing MF data 104, and/or the larger the portion of MF data 104, the higher the compression.
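The two general rules above can be combined into a single heuristic. This is an illustrative sketch only; the access-frequency cutoffs, the 10x free-memory ratio, and the 0-9 level scale are assumptions, not values from the patent:

```python
def compression_level(access_freq, portion_bytes, free_bytes, max_level=9):
    """Pick a compression level: hotter portions compress less; larger
    portions and scarcer free memory compress more (illustrative)."""
    level = max_level
    # More frequent access -> lower level (cheaper per-access decompression).
    if access_freq > 1000:
        level -= 4
    elif access_freq > 100:
        level -= 2
    # Ample free memory relative to the portion -> relax compression further.
    if free_bytes > 10 * portion_bytes:
        level -= 2
    return max(0, min(max_level, level))

# A hot 1 MiB portion with 1 GiB free compresses lightly.
lvl = compression_level(access_freq=1500, portion_bytes=1 << 20,
                        free_bytes=1 << 30)
```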
FIG. 4 depicts, according to one or more embodiments, a block diagram of a process for selecting a compression level for a selected portion. One or more of the blocks described below may be omitted, repeated, and/or performed in a different order. Accordingly, the specific arrangement of blocks shown in FIG. 4 should not be construed as limiting the scope of the method.
At block 410, in an embodiment, the access statistics of the selected portion are evaluated. If the access statistics indicate that the selected portion has a hot heat map of read accesses, then the selected portion is frequently accessed from volatile memory 102. To boost the performance of database system 100, the compression level for the selected portion is lowered so that database system 100 spends fewer resources decompressing the selected portion on each access. On the other hand, if the candidate portion has a cold heat map of read accesses, the compression level is raised to save space in volatile memory 102.
At block 415, in an embodiment, the process evaluates the data type of the candidate portion. The data type of the selected portion may affect the degree to which the selected portion can be compressed in volatile memory 102. Some data types compress better than others, so a data portion of such a data type occupies less memory for the same amount of information. Thus, by evaluating the data type of the selected portion, the compression level can be adjusted to best suit the data type of the selected portion. For example, large object (LOB) data types, such as the binary large object (BLOB) and character large object (CLOB) types, are less amenable to compression, so the compression level of a selected portion with a LOB data type is lowered. Conversely, date- and time-related data types usually compress well. Thus, if the selected portion is of a date- or time-related data type, the compression level of the selected portion is raised. Other data types may be similarly evaluated to adjust the compression level of the selected portion.
At block 420, in an embodiment, the operation statistics for the selected portion are evaluated. If, based on the operation statistics of the selected portion, most operations on the selected portion are compute-intensive, then compressing the selected portion at a higher compression level may degrade the performance of database system 100. Decompressing the selected portion from a higher compression level would further aggravate the resource burden on database system 100, considering that the resource-intensive operations will add to the resource burden on database system 100 as well. On the other hand, if a lower compression level is used for the selected portion, the performance of database system 100 is less affected. Accordingly, if at block 420 the operation statistics indicate resource-intensive operations on the selected portion, the compression level for the selected portion is adjusted to a lower value. Operation statistics indicating that most operations on the selected portion are non-resource-intensive may cause the compression level to be raised. For example, if, based on the operation statistics of the selected portion, the most common operations are resource-intensive group-by or sort operations, then the compression level is adjusted to a lower value. However, if, based on the operation statistics, most operations on the selected portion are index-based scans, which are less CPU-intensive than full scans, then the compression level is raised.
At block 425, in an embodiment, the performance statistics of one or more data portions similar to the selected portion are evaluated. Similar data portions are data portions that database system 100 considers similar to the selected portion based on similarity in their data types and/or data statistics. For example, if the selected portion is of the datetime data type, and database system 100 has performance statistics for other data portions of the datetime data type, then the performance statistics of those similar data portions are selected for evaluation. Similarly, if other data portions have a similar number of null values, then the performance statistics of those similar data portions may be evaluated to determine the compression level of the selected portion. In various embodiments, the notion of similarity may vary. In some embodiments, data-type similarity is established only if the data types match, while in other embodiments similarity is established if the data types share substantial commonality (for example, char data types of different lengths, including varchar, are considered similar). Additionally, in some embodiments, a numeric percentage is used to determine similarity. For example, if the data statistics of different data portions match within 10% of the corresponding statistic's value, then the data portions are considered similar.
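The percentage-based similarity test just described can be sketched as follows; the choice of the larger magnitude as the baseline is an assumption, since the text does not say which value anchors the percentage:

```python
def similar(stat_a, stat_b, tolerance=0.10):
    """Two data-statistics values match if they agree within a
    percentage tolerance (10% in the example above)."""
    if stat_a == stat_b:
        return True
    baseline = max(abs(stat_a), abs(stat_b))  # assumed anchor for the percentage
    return abs(stat_a - stat_b) <= tolerance * baseline

# Null-value counts of 100 and 108 differ by well under 10%: similar.
match = similar(100, 108)
```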
The performance statistics of the similar data portions are then evaluated to determine the historical performance of operations at various compression levels. In an embodiment, if performance statistics are available for different compression levels, the process derives an optimal compression level. The optimal compression level may be obtained by comparing the performance statistics of the similar data portions at their different compression levels. Once the optimal compression level is determined, the compression level of the selected portion is raised, left unchanged, or lowered to more closely or exactly match the optimal compression level. For example, suppose the selected portion has a similar number of null values as two data portions in database system 100 that have associated performance statistics, and that the two data portions have compression levels of three (3) and nine (9), respectively. If, based on the performance statistics, the first similar data portion with compression level 9 consumes only negligibly more resources than the second similar data portion with compression level 3, then database system 100 selects compression level 9 as the optimal compression level. The compression level of the selected portion is then raised, left unchanged, or lowered to approach or match the optimal compression level of 9.
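The "negligibly more resources" comparison in the example can be sketched as picking the highest level whose measured cost stays near the cheapest observed cost. The 5% slack and the (level, cpu_cost) representation are illustrative assumptions:

```python
def optimal_level(similar_portions, slack=0.05):
    """Choose the highest compression level whose measured resource cost
    is not meaningfully worse than the cheapest level (illustrative).

    similar_portions: list of (compression_level, cpu_cost) pairs drawn
    from performance statistics of similar data portions.
    """
    # Baseline cost: the cheapest observed cost among similar portions.
    baseline = min(cost for _, cost in similar_portions)
    # Keep levels whose cost is within the slack of baseline; take the highest.
    candidates = [lvl for lvl, cost in similar_portions
                  if cost <= baseline * (1 + slack)]
    return max(candidates)

# Level 9 costs only negligibly more than level 3, so level 9 wins.
best = optimal_level([(3, 100.0), (9, 102.0)])
```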
In other embodiments, the performance statistics of the similar data portions are compared with predetermined thresholds on those statistics. If a performance statistic fails to meet a threshold, then the compression level of the selected portion is adjusted, relative to the compression level of the similar data portion, in a manner that brings the resource consumption of the selected portion closer to meeting the threshold. For example, a threshold may represent the maximum CPU usage for a type of operation. If the CPU-usage statistic in the performance statistics of the similar data portion is below the CPU-usage threshold, then the compression level of the selected portion is adjusted to be higher than the compression level of the similar data portion, or is left unchanged. Conversely, if the CPU-usage statistic is above the maximum CPU-usage threshold, then the compression level of the selected portion is adjusted to be lower than the compression level of the similar data portion, so as to consume fewer resources in volatile memory 102.
At block 430, once the compression level of the selected portion has been determined based on the evaluations of the preceding blocks, the selected portion is compressed at the determined compression level. The compressed selected portion is then loaded into volatile memory 102. In various embodiments, the decision about when to create MF data 104 is based on a variety of additional factors. For example, if sufficient time is available at system startup, the compressed selected portion is pre-loaded into volatile memory 102 on startup. In other embodiments, as further described in the Mirroring Data application, selected compressed portions are loaded on demand.
In an embodiment, blocks 410-430 of FIG. 4 may be repeated for any number of selected portions to determine the appropriate compression levels and load the selected portions into volatile memory 102.
In some embodiments, the process described for FIG. 4 is performed periodically in database system 100. The process may be executed at a time specified by a user of database system 100, or at a time determined by database system 100 based on its system statistics. Alternatively, the process may be executed based on a user-specified trigger. Triggers may include the availability of space in volatile memory 102, or any of the aforementioned designated statistics of database system 100 crossing one or more thresholds.
Repeating the process of FIG. 4 for selected portions of MF data 104 may result in changing the compression level of a selected portion already loaded in volatile memory 102. If the process concludes that a selected portion of MF data 104 in volatile memory 102 needs a compression level different from its current compression level, then the selected portion is recompressed at the new compression level and loaded back into volatile memory 102.
Even though data items may be compressed within MF data 104, it may not be necessary to decompress MF data 104 in order to use it. For example, as described in U.S. Patent Application 13/708,054, filed December 7, 2012, vector processing operations may be performed directly on the compressed values; the entire contents of that patent application are incorporated herein by reference. As also described in that application, after the compressed column vector values have been delivered to the CPU, decompression may be performed on-chip.
In some embodiments in which MF data 104 is compressed, MF data 104 is organized within volatile memory 102 into "in-memory compression units" (IMCUs). Each IMCU stores a different set of MF data, and that set of data may or may not correspond to a selected portion. A data-to-IMCU mapping indicates which selected portions are contained in each IMCU. In one embodiment, the data-to-IMCU mapping may be part of the metadata for the MF data. IMCUs are further described in the Mirroring Data application.
To determine whether MF data 104 has the data required to process a query and, if so, to find the MF data 104 required to process the query, the database server needs to know which PF data is mirrored in MF data 104 and, specifically, which specific PF data is mirrored by each IMCU. Therefore, according to one embodiment, metadata for MF data 104 is maintained in volatile memory 102, and that metadata includes the data-to-IMCU mapping. The metadata for MF data 104 is further described in the Mirroring Data application.
In some embodiments, mechanisms are provided for keeping mirror-format data 104 in sync with PF data 112 as updates, inserts, and deletes are performed on PF data 112. Keeping MF data 104 in sync with PF data 112 is further described in the Mirroring Data application.
To reduce the number of compression and decompression operations required to keep MF data 104 in sync, one embodiment uses journals to implicitly update MF data 104. Journals are further described in the Mirroring Data application.
Because MF data 104 is merely a mirror of some of the data in PF data 112 (albeit in a different format), all data items contained in MF data 104 are also contained in PF data 112. Therefore, for any query that requires access to data items mirrored in MF data 104, the database server has the choice of obtaining the data from MF data 104, from PF data 112, or partly from MF data 104 and partly from PF data 112. Various embodiments for determining, in response to a query, where to obtain the data are described in the Mirroring Data application.
Loading and removing MF data
Before queries can be serviced based on MF data 104, MF data 104 may be loaded into volatile memory 102 based on one or more "load" events in database system 100. In one embodiment, MF data 104 is pre-loaded into volatile memory at database system startup. For example, the pre-loading may be performed by background processes, before any database operation is executed against the memory-enabled data structures that contain the data items to be mirrored by MF data 104. MF data 104 may be created one IMCU at a time. In multi-instance environments, metadata in persistent storage may be used to determine which MF data is pre-loaded into which database instance. Such metadata may include, for example, an MF-data-to-IMCU mapping and an IMCU-to-instance mapping.
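The two mappings just mentioned can be sketched as simple lookup tables; the table, IMCU, and instance names below are invented for illustration:

```python
# Illustrative metadata: which MF data lives in which IMCU, and which
# database instance hosts each IMCU (all names are assumptions).
mf_data_to_imcu = {
    "table200.c1": "IMCU_1",
    "table200.c2": "IMCU_1",
    "table200.c3": "IMCU_2",
}
imcu_to_instance = {"IMCU_1": "instance1", "IMCU_2": "instance2"}

def hosting_instance(mf_part):
    """Determine which instance should pre-load a given MF data part."""
    return imcu_to_instance[mf_data_to_imcu[mf_part]]

host = hosting_instance("table200.c1")
```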
Instead of simply pre-loading MF data 104, some or all of MF data 104 may be generated at the time the corresponding PF data is accessed by a database operation. For example, assume that database instance 1 is assigned to host the column vectors for columns c1 and c2 of table 200. Rather than building and loading those column vectors on startup, database instance 1 may initially generate no MF data. Instead, database instance 1 may wait until a database command requires a scan of table 200. Because no MF data has been created yet, the scan is performed based entirely on PF data 112. During that scan, the values needed to construct the column vectors for c1 and c2 will be accessed. Therefore, the column vectors for c1 and c2 can be built at that time without incurring any additional disk accesses.
On-demand loading of MF data may be used in conjunction with pre-loading. For example, some of the MF data 104 to be hosted on instance 1 may be created at the time instance 1 starts up, while other portions of MF data 104 may be built as the data is accessed by queries.
In one embodiment, users may set configuration options to indicate which MF data to pre-load and which MF data to load on demand. In an alternative embodiment, the database server automatically determines which portions of MF data 104 are pre-loaded and which are loaded on demand. In general, the more frequently a data item is used, the more likely the database server will automatically pre-load the data item into MF data 104, so that even the first database operation that requires the data item has the option of obtaining the data from MF data 104.
Eventually, MF data 104 is removed from volatile memory 102 when database system 100 is powered down. However, according to one or more embodiments, data portions within MF data 104 that are designated for removal are removed from volatile memory 102 upon one or more "purge" events other than power-down. For example, database system 100 may have a scheduled task that periodically removes designated data portions from volatile memory 102. In another embodiment, the act of designating a data portion for removal may itself cause a purge event that removes the data portion. In yet another embodiment, a purge event is triggered by the available storage space of volatile memory 102 (or any other system statistic) crossing a threshold. For example, if volatile memory 102 has only 5% free memory, then a purge event is triggered to remove from volatile memory 102 the data portions designated for removal.
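The threshold-triggered purge in the last example can be sketched as follows. This is a minimal sketch: the flag-per-portion representation and the function names are assumptions, and only the 5% threshold comes from the example above:

```python
def should_purge(free_fraction, threshold=0.05):
    """Trigger a purge event when free volatile memory drops to or
    below the threshold (5% in the example above)."""
    return free_fraction <= threshold

def purge(mf_parts, free_fraction):
    """Remove the parts designated for removal when a purge event fires.

    mf_parts: dict mapping part name -> designated_for_removal flag.
    Returns the parts that remain in volatile memory.
    """
    if not should_purge(free_fraction):
        return mf_parts
    return {name: flag for name, flag in mf_parts.items() if not flag}

# At exactly 5% free memory the purge fires and drops part "c3".
remaining = purge({"c1": False, "c3": True}, free_fraction=0.05)
```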
Database Management Systems
A database management system (DBMS) manages a database. A DBMS may comprise one or more database servers. A database comprises database data and a database dictionary that are stored on a non-volatile storage mechanism, such as a set of hard disks. Database data may be stored in one or more data containers. Each container contains records. The data within each record is organized into one or more fields. In a relational DBMS, the data containers are referred to as tables, the records are referred to as rows, and the fields are referred to as columns. In object-oriented databases, the data containers are referred to as object classes, the records are referred to as objects, and the fields are referred to as attributes. Other database architectures may use other terminology.
In an embodiment, a DBMS may be connected to, or may include, a cluster of nodes that can store one or more tables. The DBMS may manage tables stored on the cluster of nodes in a manner similar to managing tables stored in persistent storage.
Users interact with a database server of a DBMS by submitting to the database server commands that cause the database server to perform operations on data stored in a database. A user may be one or more applications running on a client computer that interact with the database server. Multiple users may also be referred to herein collectively as a user.
As used herein, a "query" refers to a database command, and may be in the form of a database statement that conforms to a database language. In one embodiment, a database language for expressing the query is the Structured Query Language (SQL). There are many different versions of SQL; some versions are standard and some are proprietary, and there are a variety of extensions. Data definition language ("DDL") commands are issued to a database server to create or configure database objects, such as tables, views, or complex data types. SQL/XML is a common extension of SQL used when manipulating XML data in an object-relational database. Although the embodiments of the invention are described herein using the term "SQL", the invention is not limited to just this particular database query language, and may be used in conjunction with other database query languages and constructs.
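To make the notions of DDL commands and queries concrete, the following sketch (an illustration, not part of the patent) shows a client application submitting SQL commands to a database engine. Python's bundled sqlite3 module stands in for a full DBMS client driver.

```python
# Illustrative sketch: a client issues DDL and query commands over a connection.
import sqlite3

conn = sqlite3.connect(":memory:")  # establish a connection to the database
cur = conn.cursor()

# DDL command: create a database object (a table)
cur.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# Populate the table, then issue a query against it
cur.executemany("INSERT INTO emp (name, salary) VALUES (?, ?)",
                [("Alice", 90000.0), ("Bob", 75000.0)])
rows = cur.execute("SELECT name FROM emp WHERE salary > 80000").fetchall()
print(rows)  # [('Alice',)]
```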
A client may issue a series of requests, such as requests to execute queries, to a database server by establishing a database session, referred to herein as a "session." A session comprises a particular connection established for a client to a database server, such as a database instance, through which the client may issue a series of requests. The database server may maintain session state data about the session. The session state data reflects the current state of the session and may contain the identity of the user for which the session is established, the services used by the user, instances of object types, language and character set data, statistics about resource usage for the session, temporary variable values generated by processes executing software within the session, and storage for cursors, variables, and other information. The session state data may also contain execution plan parameters configured for the session.
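The kinds of fields that session state data may hold, as described above, can be sketched as a simple record. This is a hypothetical illustration; the field names are not drawn from any particular DBMS.

```python
# Hypothetical sketch of session state data; field names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class SessionState:
    user_identity: str                 # user for which the session is established
    services_used: list = field(default_factory=list)
    language: str = "en"
    character_set: str = "UTF-8"
    resource_usage_stats: dict = field(default_factory=dict)
    temp_variables: dict = field(default_factory=dict)   # per-session temporaries
    execution_plan_params: dict = field(default_factory=dict)

session = SessionState(user_identity="scott")
session.temp_variables["last_query_rows"] = 42
print(session.user_identity, session.temp_variables["last_query_rows"])  # scott 42
```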
A multi-node database management system is made up of interconnected nodes that share access to the same database. Typically, the nodes are interconnected via a network and share access, in varying degrees, to shared storage, e.g. shared access to a set of disk drives and the data blocks stored thereon. The nodes in a multi-node database system may be in the form of a group of computers (e.g. work stations, personal computers) that are interconnected via a network. Alternately, the nodes may be the nodes of a grid, which is composed of nodes in the form of server blades interconnected with other server blades on a rack.
Each node in a multi-node database management system hosts a database server. A server, such as a database server, is a combination of integrated software components and an allocation of computational resources, such as memory, a node, and processes on the node for executing the integrated software components, where the combination of the software and computational resources is dedicated to performing a particular function on behalf of one or more clients.
Resources from multiple nodes in a multi-node database system can be allocated to running the software of a particular database server. Each combination of the software and an allocation of resources from a node is a server that is referred to herein as a "server instance" or "instance." A database server may comprise multiple database instances, some or all of which may run on separate computers, including separate server blades.
Hardware Overview
According to one embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices, or any other device that incorporates hard-wired and/or program logic to implement the techniques.
For example, FIG. 5 is a block diagram that illustrates a computer system 500 upon which an embodiment of the invention may be implemented. Computer system 500 includes a bus 502 or other communication mechanism for communicating information, and a hardware processor 504 coupled with bus 502 for processing information. Hardware processor 504 may be, for example, a general purpose microprocessor.
Computer system 500 also includes a main memory 506 (also referred to herein as "volatile memory"), such as a random access memory (RAM) or other dynamic storage device, coupled to bus 502 for storing information and instructions to be executed by processor 504. Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504. Such instructions, when stored in non-transitory storage media accessible to processor 504, render computer system 500 into a special-purpose machine that is customized to perform the operations specified in the instructions.
Computer system 500 further includes a read only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504. A storage device 510, such as a magnetic disk, optical disk, or solid-state drive, is provided and coupled to bus 502 for storing information and instructions.
Computer system 500 may be coupled via bus 502 to a display 512, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 514, including alphanumeric and other keys, is coupled to bus 502 for communicating information and command selections to processor 504. Another type of user input device is cursor control 516, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 504 and for controlling cursor movement on display 512. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allow the device to specify positions in a plane.
Computer system 500 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which, in combination with the computer system, causes or programs computer system 500 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 500 in response to processor 504 executing one or more sequences of one or more instructions contained in main memory 506. Such instructions may be read into main memory 506 from another storage medium, such as storage device 510. Execution of the sequences of instructions contained in main memory 506 causes processor 504 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of, or in combination with, software instructions.
The term "storage media" as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical disks, magnetic disks, or solid-state drives, such as storage device 510. Volatile media includes dynamic memory, such as main memory 506. Common forms of storage media include, for example, a floppy disk, a flexible disk, a hard disk, a solid-state drive, magnetic tape or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, or any other memory chip or cartridge.
Storage media is distinct from, but may be used in conjunction with, transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire, and fiber optics, including the wires that comprise bus 502. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 504 for execution. For example, the instructions may initially be carried on a magnetic disk or solid-state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 500 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal, and appropriate circuitry can place the data on bus 502. Bus 502 carries the data to main memory 506, from which processor 504 retrieves and executes the instructions. The instructions received by main memory 506 may optionally be stored on storage device 510 either before or after execution by processor 504.
Computer system 500 also includes a communication interface 518 coupled to bus 502. Communication interface 518 provides a two-way data communication coupling to a network link 520 that is connected to a local network 522. For example, communication interface 518 may be an integrated services digital network (ISDN) card, a cable modem, a satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 518 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 518 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.
Network link 520 typically provides data communication through one or more networks to other data devices. For example, network link 520 may provide a connection through local network 522 to a host computer 524 or to data equipment operated by an Internet Service Provider (ISP) 526. ISP 526 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 528. Local network 522 and Internet 528 both use electrical, electromagnetic, or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 520 and through communication interface 518, which carry the digital data to and from computer system 500, are example forms of transmission media.
Computer system 500 can send messages and receive data, including program code, through the network(s), network link 520, and communication interface 518. In the Internet example, a server 530 might transmit a requested code for an application program through Internet 528, ISP 526, local network 522, and communication interface 518.
The received code may be executed by processor 504 as it is received, and/or stored in storage device 510 or other non-volatile storage for later execution.
In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The sole and exclusive indicator of the scope of the invention, and what is intended by the applicants to be the scope of the invention, is the literal and equivalent scope of the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction.
Claims (12)
1. A method comprising:
maintaining, by a database server, a database on persistent storage;
wherein the database includes:
a first data portion that has been designated as eligible for mirroring in volatile memory; and
a second data portion that has also been designated as eligible for mirroring in the volatile memory;
storing first membership data corresponding to the first data portion, wherein the first membership data reflects an estimated benefit of mirroring the first data portion in the volatile memory;
storing second membership data corresponding to the second data portion, wherein the second membership data reflects an estimated benefit of mirroring the second data portion in the volatile memory;
wherein the first membership data and the second membership data are determined by the database server based on one or more factors;
based on the first membership data, automatically determining that the first data portion should not be loaded into the volatile memory;
based on the second membership data, automatically determining that the second data portion should be loaded into the volatile memory; and
in response to a load event that causes data portions to be mirrored in the volatile memory, loading data from the second data portion into the volatile memory without loading any data from the first data portion into the volatile memory.
2. The method of claim 1, wherein the first data portion is a subset of the second data portion.
3. The method of claim 1, wherein loading the data from the second data portion into the volatile memory includes using the data from the second data portion to build an in-memory compression unit.
4. The method of claim 3, wherein, on the persistent storage, the data from the second data portion is organized in a row-major format, and, in the in-memory compression unit, the data from the second data portion is organized in a column-major format.
5. The method of claim 1, wherein the one or more factors include at least one of: an access pattern value associated with a specific data portion for which specific membership data is being determined, a data type of the specific data portion, an operation statistics value associated with the specific data portion, or a data statistics value associated with the specific data portion.
6. The method of claim 5,
wherein the access pattern value associated with the specific data portion includes a heat map statistics value of the specific data portion; and
the method further comprising:
evaluating the heat map statistics value of the specific data portion;
if the heat map statistics value indicates frequent read access to the specific data portion, modifying the specific membership data to increase the likelihood that the specific data portion will be loaded into the volatile memory; and
if the heat map statistics value indicates infrequent read access to the specific data portion, modifying the specific membership data to reduce the likelihood that the specific data portion will be loaded into the volatile memory.
7. The method of claim 5,
wherein the access pattern value associated with the specific data portion includes a heat map statistics value of the specific data portion; and
the method further comprising:
evaluating the heat map statistics value of the specific data portion; and
if the heat map statistics value indicates frequent write access to the specific data portion, modifying the specific membership data to reduce the likelihood that the specific data portion will be loaded into the volatile memory.
8. The method of claim 5, further comprising:
evaluating the data type of the specific data portion; and
if the data type is a BLOB type, modifying the specific membership data to reduce the likelihood that the specific data portion will be loaded into the volatile memory.
9. The method of claim 1, further comprising:
evaluating, for a particular data portion of the second data portion, at least one of:
an access pattern value of the particular data portion,
a data type of the particular data portion, or
performance statistics values for one or more compressed data portions that are similar to the particular data portion;
based on the evaluation, modifying a compression level of the particular data portion;
compressing data in the particular data portion at the modified compression level; and
loading the compressed data into the volatile memory.
10. The method of claim 9, wherein the performance statistics values for the one or more compressed data portions that are similar to the particular data portion are evaluated as indicating that a different compression level consumes fewer resources of the database server; and the compression level of the particular data portion is modified to approximate the different compression level.
11. The method of claim 1, wherein:
the database includes a third data portion that has been designated as eligible for mirroring in the volatile memory; and
the method further comprises:
storing third membership data corresponding to the third data portion, wherein the third membership data reflects an estimated benefit of mirroring the third data portion in the volatile memory;
while data from the third data portion is mirrored in the volatile memory and data from the first data portion is not mirrored in the volatile memory, the database server performing the steps of:
recalculating the first membership data;
recalculating the third membership data;
based on the first membership data, automatically determining that data from the first data portion should be loaded into the volatile memory;
performing a comparison between the third membership data and any one of:
the first membership data; or
a threshold;
based on the comparison, the database server determining that the data from the third data portion is to be removed from the volatile memory;
in response to an eviction event that causes copies of mirrored data portions to be removed from the volatile memory, removing the data from the third data portion from the volatile memory; and
in response to a load event that causes copies of data portions to be mirrored in the volatile memory, loading the data from the first data portion into the volatile memory.
12. One or more non-transitory storage media storing instructions which, when executed by one or more computing devices, cause performance of the method recited in any one of claims 1-11.
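The selection logic recited in claims 1 and 5-8 can be sketched as follows. This is an assumption-laden illustration, not the patented implementation: the scoring formula, the `LOAD_THRESHOLD` cutoff, and all names are hypothetical; only the overall shape (a membership score reflecting estimated mirroring benefit, adjusted by read/write heat-map statistics and data type, gating what is loaded on a load event) comes from the claims.

```python
# Hypothetical sketch of membership-based selective mirroring.
LOAD_THRESHOLD = 0.5  # illustrative cutoff, not from the patent

def membership_score(reads_per_hour, writes_per_hour, data_type):
    """Estimate the benefit of mirroring a data portion in volatile memory."""
    score = min(reads_per_hour / 100.0, 1.0)    # frequent reads raise the score
    score -= min(writes_per_hour / 100.0, 0.5)  # frequent writes lower it
    if data_type == "BLOB":
        score -= 0.6                            # large objects are penalized
    return max(score, 0.0)

def portions_to_load(portions):
    """On a load event, select only portions whose membership clears the threshold."""
    return [name for name, stats in portions.items()
            if membership_score(*stats) >= LOAD_THRESHOLD]

catalog = {
    "orders":   (120, 5, "ROW"),   # hot, read-mostly: load
    "archive":  (2, 0, "ROW"),     # cold: skip
    "doc_blob": (150, 1, "BLOB"),  # hot but a BLOB: penalized, skip
}
print(portions_to_load(catalog))  # ['orders']
```

The heat-map adjustments of claims 6 and 7 correspond to the read and write terms in `membership_score`, and claim 8 corresponds to the BLOB penalty.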
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310712612.2A CN116701398A (en) | 2015-08-31 | 2016-06-30 | Selective data compression of databases in memory |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/841,561 | 2015-08-31 | ||
US14/841,561 US9990308B2 (en) | 2015-08-31 | 2015-08-31 | Selective data compression for in-memory databases |
PCT/US2016/040448 WO2017039817A1 (en) | 2015-08-31 | 2016-06-30 | Selective data compression for in-memory databases |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310712612.2A Division CN116701398A (en) | 2015-08-31 | 2016-06-30 | Selective data compression of databases in memory |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108140040A true CN108140040A (en) | 2018-06-08 |
CN108140040B CN108140040B (en) | 2023-07-04 |
Family
ID=56497861
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201680057698.8A Active CN108140040B (en) | 2015-08-31 | 2016-06-30 | Selective data compression of databases in memory |
CN202310712612.2A Pending CN116701398A (en) | 2015-08-31 | 2016-06-30 | Selective data compression of databases in memory |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310712612.2A Pending CN116701398A (en) | 2015-08-31 | 2016-06-30 | Selective data compression of databases in memory |
Country Status (4)
Country | Link |
---|---|
US (2) | US9990308B2 (en) |
EP (1) | EP3345101B1 (en) |
CN (2) | CN108140040B (en) |
WO (1) | WO2017039817A1 (en) |
Family Cites Families (153)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3513246A (en) | 1967-04-24 | 1970-05-19 | Singer General Precision | Analog computer |
GB1332631A (en) | 1971-02-26 | 1973-10-03 | Ibm | Data processing system |
SE8307228D0 (en) | 1983-12-30 | 1983-12-30 | Grundstenen 16808 Ab | data compression |
USRE34052E (en) | 1984-05-31 | 1992-09-01 | International Business Machines Corporation | Data processing system with CPU register to register data transfers overlapped with data transfer to and from main storage |
US5255356A (en) | 1989-05-31 | 1993-10-19 | Microsoft Corporation | Method for hiding and showing spreadsheet cells |
US5604899A (en) | 1990-05-21 | 1997-02-18 | Financial Systems Technology Pty. Ltd. | Data relationships processor with unlimited expansion capability |
US5263145A (en) | 1990-05-24 | 1993-11-16 | International Business Machines Corporation | Method and means for accessing DASD arrays with tuned data transfer rate and concurrency |
US5544347A (en) | 1990-09-24 | 1996-08-06 | Emc Corporation | Data storage system controlled remote data mirroring with respectively maintained data indices |
US5506979A (en) | 1991-04-02 | 1996-04-09 | International Business Machines Corporation | Method and means for execution of commands accessing variable length records stored on fixed block formatted DASDS of an N+2 DASD synchronous array |
US5423010A (en) | 1992-01-24 | 1995-06-06 | C-Cube Microsystems | Structure and method for packing and unpacking a stream of N-bit data to and from a stream of N-bit data words |
EP0630506A4 (en) | 1992-03-17 | 1995-01-04 | Zoran Corporation | Image compression coder having improved bit rate control and block allocation. |
US5404510A (en) | 1992-05-21 | 1995-04-04 | Oracle Corporation | Database index design based upon request importance and the reuse and modification of similar existing indexes |
US5581778A (en) | 1992-08-05 | 1996-12-03 | David Sarnoff Researach Center | Advanced massively parallel computer using a field of the instruction to selectively enable the profiling counter to increase its value in response to the system clock |
US5794229A (en) | 1993-04-16 | 1998-08-11 | Sybase, Inc. | Database system with methodology for storing a database table by vertically partitioning all columns of the table |
US5581705A (en) | 1993-12-13 | 1996-12-03 | Cray Research, Inc. | Messaging facility with hardware tail pointer and software implemented head pointer message queue for distributed memory massively parallel processing system |
US5546575A (en) | 1994-05-23 | 1996-08-13 | Basil E. Potter & Associates, Inc. | Encoding method for compressing a tabular database by selecting effective compression routines for each field and structure of partitions of equal sized records |
US5680573A (en) | 1994-07-12 | 1997-10-21 | Sybase, Inc. | Method of buffering data objects in a database |
US7190284B1 (en) | 1994-11-16 | 2007-03-13 | Dye Thomas A | Selective lossless, lossy, or no compression of data based on address range, data type, and/or requesting agent |
JP3518034B2 (en) | 1995-03-30 | 2004-04-12 | 三菱電機株式会社 | Sorting method, sort processing device, and data processing device |
US5710909A (en) | 1996-01-23 | 1998-01-20 | International Business Machines Corporation | Data compression utilization method and apparatus for computer main store |
US5778430A (en) | 1996-04-19 | 1998-07-07 | Eccs, Inc. | Method and apparatus for computer disk cache management |
US5995080A (en) | 1996-06-21 | 1999-11-30 | Digital Equipment Corporation | Method and apparatus for interleaving and de-interleaving YUV pixel data |
US5870759A (en) | 1996-10-09 | 1999-02-09 | Oracle Corporation | System for synchronizing data between computers using a before-image of data |
US7225249B1 (en) | 1997-09-26 | 2007-05-29 | Mci, Llc | Integrated systems for providing communications network management services and interactive generating invoice documents |
US6587836B1 (en) | 1997-09-26 | 2003-07-01 | Worldcom, Inc. | Authentication and entitlement for users of web based data management programs |
US6360213B1 (en) | 1997-10-14 | 2002-03-19 | International Business Machines Corporation | System and method for continuously adaptive indexes |
US6047081A (en) | 1997-10-24 | 2000-04-04 | Imation Corp. | Image processing software system having configurable communication pipelines |
US20010049780A1 (en) | 1998-03-27 | 2001-12-06 | Shreekant Thakkar | Method and apparatus for performing integer operations in response to a result of a floating point operation |
US6216125B1 (en) | 1998-07-02 | 2001-04-10 | At&T Corp. | Coarse indexes for a data warehouse |
US6009432A (en) | 1998-07-08 | 1999-12-28 | Required Technologies, Inc. | Value-instance-connectivity computer-implemented database |
JP2002522821A (en) | 1998-08-06 | 2002-07-23 | トライメディア テクノロジーズ インク | Data processor and data processing method |
US6959300B1 (en) | 1998-12-10 | 2005-10-25 | At&T Corp. | Data compression method and apparatus |
US6624761B2 (en) | 1998-12-11 | 2003-09-23 | Realtime Data, Llc | Content independent data compression method and system |
US7129860B2 (en) | 1999-01-29 | 2006-10-31 | Quickshift, Inc. | System and method for performing scalable embedded parallel data decompression |
JP2000261674A (en) | 1999-03-09 | 2000-09-22 | Fujitsu Ltd | Image expanding device |
US6826522B1 (en) | 1999-06-21 | 2004-11-30 | Pts Corporation | Methods and apparatus for improved efficiency in pipeline simulation and emulation |
US8032409B1 (en) | 1999-11-22 | 2011-10-04 | Accenture Global Services Limited | Enhanced visibility during installation management in a network-based supply chain environment |
US6721727B2 (en) | 1999-12-02 | 2004-04-13 | International Business Machines Corporation | XML documents stored as column data |
US6671797B1 (en) | 2000-02-18 | 2003-12-30 | Texas Instruments Incorporated | Microprocessor with expand instruction for forming a mask from one bit |
CA2326805A1 (en) | 2000-11-24 | 2002-05-24 | Ibm Canada Limited-Ibm Canada Limitee | Method and apparatus for deleting data in a database |
US6917987B2 (en) | 2001-03-26 | 2005-07-12 | Intel Corporation | Methodology and mechanism for remote key validation for NGIO/InfiniBand™ applications |
US6745174B2 (en) | 2001-03-29 | 2004-06-01 | Hewlett-Packard Development Company, L.P. | Method of executing before-triggers in an active database |
US20020188830A1 (en) | 2001-06-01 | 2002-12-12 | Brian Boles | Bit replacement and extraction instructions |
US7092977B2 (en) | 2001-08-31 | 2006-08-15 | Arkivio, Inc. | Techniques for storing data based upon storage policies |
US6615206B1 (en) | 2001-09-28 | 2003-09-02 | Oracle International Corporation | Techniques for eliminating database table joins based on a join index |
US7076108B2 (en) | 2001-12-11 | 2006-07-11 | Gen Dow Huang | Apparatus and method for image/video compression using discrete wavelet transform |
US7149769B2 (en) | 2002-03-26 | 2006-12-12 | Hewlett-Packard Development Company, L.P. | System and method for multi-destination merge in a storage area network |
US7103608B1 (en) | 2002-05-10 | 2006-09-05 | Oracle International Corporation | Method and mechanism for storing and accessing data |
US7249118B2 (en) | 2002-05-17 | 2007-07-24 | Aleri, Inc. | Database system and methods |
US20050027729A1 (en) | 2002-05-22 | 2005-02-03 | Allan Kuchinsky | System and methods for visualizing and manipulating multiple data values with graphical views of biological relationships |
WO2003105517A1 (en) | 2002-06-07 | 2003-12-18 | Nokia Corporation | Supporting in a communication system a request for information on a mobile device |
US6842848B2 (en) | 2002-10-11 | 2005-01-11 | Sandbridge Technologies, Inc. | Method and apparatus for token triggered multithreading |
US6829602B2 (en) | 2002-12-12 | 2004-12-07 | Microsoft Corporation | System and method for using a compressed trie to estimate like predicates |
AU2003297152A1 (en) | 2002-12-16 | 2004-07-22 | Questerra Corporation | Method, system and program for network design, analysis, and optimization |
US7079056B2 (en) | 2003-01-15 | 2006-07-18 | Delphi Technologies, Inc. | Method of encoding and storing in a machine control computer a compressed data lookup table |
US7730292B2 (en) | 2003-03-31 | 2010-06-01 | Hewlett-Packard Development Company, L.P. | Parallel subword instructions for directing results to selected subword locations of data processor result register |
CN100547583C (en) | 2003-08-14 | 2009-10-07 | Oracle International Corporation | Automatic and dynamic provisioning of databases |
US7555497B2 (en) | 2003-08-21 | 2009-06-30 | Microsoft Corporation | Systems and methods for separating units of information manageable by a hardware/software interface system from their physical organization |
US7469266B2 (en) | 2003-09-29 | 2008-12-23 | International Business Machines Corporation | Method and structure for producing high performance linear algebra routines using register block data format routines |
US7383246B2 (en) | 2003-10-31 | 2008-06-03 | International Business Machines Corporation | System, method, and computer program product for progressive query processing |
US7047252B2 (en) | 2003-12-02 | 2006-05-16 | Oracle International Corporation | Complex computation across heterogenous computer systems |
US7693325B2 (en) | 2004-01-14 | 2010-04-06 | Hexagon Metrology, Inc. | Transprojection of geometry data |
US7349925B2 (en) | 2004-01-22 | 2008-03-25 | International Business Machines Corporation | Shared scans utilizing query monitor during query execution to improve buffer cache utilization across multi-stream query environments |
US20050210054A1 (en) | 2004-03-22 | 2005-09-22 | Michael Harris | Information management system |
US7496586B1 (en) | 2004-05-26 | 2009-02-24 | Sun Microsystems, Inc. | Method and apparatus for compressing data in a file system |
US7353219B2 (en) | 2004-05-28 | 2008-04-01 | International Business Machines Corporation | Determining validity ranges of query plans based on suboptimality |
US7565346B2 (en) | 2004-05-31 | 2009-07-21 | International Business Machines Corporation | System and method for sequence-based subspace pattern clustering |
US7707194B2 (en) | 2004-06-08 | 2010-04-27 | Sap Ag | Interface to lock a database row through a logical locking interface |
US8203972B2 (en) | 2004-06-30 | 2012-06-19 | Sap Ag | Method and system for compressing a tree |
US8046354B2 (en) | 2004-09-30 | 2011-10-25 | International Business Machines Corporation | Method and apparatus for re-evaluating execution strategy for a database query |
US9886492B2 (en) | 2004-12-22 | 2018-02-06 | Teradata Us, Inc. | Self-adjusting database-query optimizer |
US7499917B2 (en) | 2005-01-28 | 2009-03-03 | International Business Machines Corporation | Processing cross-table non-Boolean term conditions in database queries |
US7882122B2 (en) | 2005-03-18 | 2011-02-01 | Capital Source Far East Limited | Remote access of heterogeneous data |
US7814104B2 (en) | 2005-05-04 | 2010-10-12 | Oracle International Corporation | Techniques for partition pruning |
US7725595B1 (en) | 2005-05-24 | 2010-05-25 | The United States Of America As Represented By The Secretary Of The Navy | Embedded communications system and method |
US7496589B1 (en) | 2005-07-09 | 2009-02-24 | Google Inc. | Highly compressed randomly accessed storage of large tables with arbitrary columns |
US7861060B1 (en) | 2005-12-15 | 2010-12-28 | Nvidia Corporation | Parallel data processing systems and methods using cooperative thread arrays and thread identifier values to determine processing behavior |
US7558290B1 (en) | 2005-12-16 | 2009-07-07 | Narus, Inc. | Method and apparatus of data compression for computer networks |
US20070143248A1 (en) | 2005-12-19 | 2007-06-21 | Yahoo! Inc. | Method using query processing servers for query processing of column chunks in a distributed column chunk data store |
US7430553B2 (en) | 2005-12-30 | 2008-09-30 | Microsoft Corporation | Managing states with delta pager |
US7574560B2 (en) | 2006-01-03 | 2009-08-11 | Emc Corporation | Methods, systems, and computer program products for dynamic mapping of logical units in a redundant array of inexpensive disks (RAID) environment |
US8049760B2 (en) | 2006-02-06 | 2011-11-01 | Via Technologies, Inc. | System and method for vector computations in arithmetic logic units (ALUs) |
JP4813924B2 (en) | 2006-02-28 | 2011-11-09 | Hitachi, Ltd. | Database management system, storage device, disaster recovery system, and database backup method |
US7877373B2 (en) | 2006-06-30 | 2011-01-25 | Oracle International Corporation | Executing alternative plans for a SQL statement |
US7783862B2 (en) | 2006-08-07 | 2010-08-24 | International Characters, Inc. | Method and apparatus for an inductive doubling architecture |
US7961959B2 (en) | 2006-08-24 | 2011-06-14 | Dell Products L.P. | Methods and apparatus for reducing storage size |
US20080059492A1 (en) | 2006-08-31 | 2008-03-06 | Tarin Stephen A | Systems, methods, and storage structures for cached databases |
EP2153527A4 (en) | 2006-09-01 | 2010-09-08 | Pacbyte Software Pty Ltd | Method and system for transmitting a data file over a data network |
WO2008034213A1 (en) | 2006-09-18 | 2008-03-27 | Infobright Inc. | A method and system for data compression in a relational database |
US7552130B2 (en) | 2006-10-17 | 2009-06-23 | International Business Machines Corporation | Optimal data storage and access for clustered data in a relational database |
US7634512B2 (en) | 2006-10-20 | 2009-12-15 | Oracle International Corporation | Migrating temporary data of a session |
US8533216B2 (en) | 2006-10-30 | 2013-09-10 | Teradata Us, Inc. | Database system workload management method and system |
US8386444B2 (en) | 2006-12-29 | 2013-02-26 | Teradata Us, Inc. | Techniques for selective compression of database information |
US7664866B2 (en) | 2007-04-10 | 2010-02-16 | Apertio Limited | Sub-tree access control in network architectures |
US8671076B2 (en) | 2007-05-08 | 2014-03-11 | Bmc Software, Inc. | Database recovery using logs applied to consistent copies |
US8782075B2 (en) | 2007-05-08 | 2014-07-15 | Paraccel Llc | Query handling in databases with replicated data |
US7769729B2 (en) | 2007-05-21 | 2010-08-03 | Sap Ag | Block compression of tables with repeated values |
US8032499B2 (en) | 2007-05-21 | 2011-10-04 | Sap Ag | Compression of tables based on occurrence of values |
US20090006399A1 (en) | 2007-06-29 | 2009-01-01 | International Business Machines Corporation | Compression method for relational tables based on combined column and row coding |
US20090037700A1 (en) | 2007-07-30 | 2009-02-05 | Clear Falls Pty Ltd | Method and system for reactively assigning computational threads of control between processors |
US20090070786A1 (en) | 2007-09-11 | 2009-03-12 | Bea Systems, Inc. | Xml-based event processing networks for event server |
HUE042697T2 (en) | 2007-09-24 | 2019-07-29 | Hasso Plattner Inst Fuer Digital Engineering Ggmbh | ETL-less zero-redundancy system and method for reporting OLTP data |
US8392382B2 (en) | 2007-10-19 | 2013-03-05 | Oracle International Corporation | On-line transaction processing (OLTP) compression and re-compression of database data |
US7991794B2 (en) | 2007-12-18 | 2011-08-02 | Oracle International Corporation | Pipelining operations involving DML and query |
US7769726B2 (en) | 2007-12-31 | 2010-08-03 | Sap, Ag | Method for verification of data and metadata in a data repository |
US8688621B2 (en) | 2008-05-20 | 2014-04-01 | NetCee Systems, Inc. | Systems and methods for information compression |
US7979399B2 (en) | 2008-06-10 | 2011-07-12 | International Business Machines Corporation | Database journaling in a multi-node environment |
US8108361B2 (en) | 2008-07-31 | 2012-01-31 | Microsoft Corporation | Efficient column based data encoding for large-scale data storage |
US8099440B2 (en) | 2008-08-15 | 2012-01-17 | International Business Machines Corporation | Method for laying out fields in a database in a hybrid of row-wise and column-wise ordering |
US20100088309A1 (en) | 2008-10-05 | 2010-04-08 | Microsoft Corporation | Efficient large-scale joining for querying of column based data encoded structures |
US9507811B2 (en) | 2008-12-22 | 2016-11-29 | Oracle International Corporation | Compressed data page with uncompressed data fields |
US10152504B2 (en) | 2009-03-11 | 2018-12-11 | Actian Netherlands B.V. | Column-store database architecture utilizing positional delta tree update system and methods |
US8725707B2 (en) | 2009-03-26 | 2014-05-13 | Hewlett-Packard Development Company, L.P. | Data continuous SQL process |
US8401996B2 (en) | 2009-03-30 | 2013-03-19 | Commvault Systems, Inc. | Storing a variable number of instances of data objects |
US8434075B1 (en) | 2009-04-15 | 2013-04-30 | Teradata Us, Inc. | Branching optimization in a multi-database system |
US9667269B2 (en) | 2009-04-30 | 2017-05-30 | Oracle International Corporation | Technique for compressing XML indexes |
US8356060B2 (en) | 2009-04-30 | 2013-01-15 | Oracle International Corporation | Compression analyzer |
US8935223B2 (en) | 2009-04-30 | 2015-01-13 | Oracle International Corporation | Structure of hierarchical compressed data structure for tabular data |
US8645337B2 (en) | 2009-04-30 | 2014-02-04 | Oracle International Corporation | Storing compression units in relational tables |
US8583692B2 (en) | 2009-04-30 | 2013-11-12 | Oracle International Corporation | DDL and DML support for hybrid columnar compressed tables |
US8285709B2 (en) | 2009-05-12 | 2012-10-09 | Teradata Us, Inc. | High-concurrency query operator and method |
US20100306188A1 (en) | 2009-06-01 | 2010-12-02 | Microsoft Corporation | Persistent query plans |
US8296517B2 (en) | 2009-08-19 | 2012-10-23 | Oracle International Corporation | Database operation-aware striping technique |
US8832142B2 (en) | 2010-08-30 | 2014-09-09 | Oracle International Corporation | Query and exadata support for hybrid columnar compressed data |
US8868510B2 (en) | 2009-12-03 | 2014-10-21 | Sybase, Inc. | Managing data storage as an in-memory database in a database management system |
US8433684B2 (en) | 2010-03-30 | 2013-04-30 | Sybase, Inc. | Managing data backup of an in-memory database in a database management system |
US8516268B2 (en) | 2010-08-23 | 2013-08-20 | Raytheon Company | Secure field-programmable gate array (FPGA) architecture |
LU91726B1 (en) | 2010-09-10 | 2012-03-12 | Univ Saarland | A method of storing and accessing data in a database system |
US8260803B2 (en) | 2010-09-23 | 2012-09-04 | Hewlett-Packard Development Company, L.P. | System and method for data stream processing |
US8938644B2 (en) | 2010-12-03 | 2015-01-20 | Teradata Us, Inc. | Query execution plan revision for error recovery |
US8996463B2 (en) | 2012-07-26 | 2015-03-31 | Mongodb, Inc. | Aggregation framework system architecture and method |
US8880508B2 (en) | 2010-12-30 | 2014-11-04 | Sap Se | Processing database queries using format conversion |
US20120303633A1 (en) | 2011-05-26 | 2012-11-29 | International Business Machines Corporation | Systems and methods for querying column oriented databases |
US20120323971A1 (en) | 2011-06-14 | 2012-12-20 | Sybase, Inc. | Optimizing data storage and access of an in-memory database |
CN107545066B (en) | 2011-12-08 | 2021-01-15 | Oracle International Corporation | Techniques for maintaining column vectors of relational data within volatile memory |
US9792117B2 (en) | 2011-12-08 | 2017-10-17 | Oracle International Corporation | Loading values from a value vector into subregisters of a single instruction multiple data register |
US9697174B2 (en) | 2011-12-08 | 2017-07-04 | Oracle International Corporation | Efficient hardware instructions for processing bit vectors for single instruction multiple data processors |
US9342314B2 (en) | 2011-12-08 | 2016-05-17 | Oracle International Corporation | Efficient hardware instructions for single instruction multiple data processors |
US8918436B2 (en) | 2011-12-22 | 2014-12-23 | Sap Ag | Hybrid database table stored as both row and column store |
CN104126166A (en) | 2011-12-23 | 2014-10-29 | Intel Corporation | Systems, apparatuses and methods for performing vector packed unary encoding using masks |
CN104094218B (en) | 2011-12-23 | 2017-08-29 | Intel Corporation | Systems, apparatuses and methods for performing a conversion of a mask register into a series of index values in a vector register |
US8645356B2 (en) | 2012-03-28 | 2014-02-04 | International Business Machines Corporation | Adaptive query execution plan enhancement |
US20140040218A1 (en) | 2012-07-31 | 2014-02-06 | Hideaki Kimura | Methods and systems for an intent lock engine |
US8856484B2 (en) | 2012-08-14 | 2014-10-07 | Infinidat Ltd. | Mass storage system and methods of controlling resources thereof |
US20140075493A1 (en) | 2012-09-12 | 2014-03-13 | Avaya, Inc. | System and method for location-based protection of mobile data |
US9292569B2 (en) | 2012-10-02 | 2016-03-22 | Oracle International Corporation | Semi-join acceleration |
US10296462B2 (en) | 2013-03-15 | 2019-05-21 | Oracle International Corporation | Method to accelerate queries using dynamically generated alternate data formats in flash cache |
US9323799B2 (en) | 2013-09-21 | 2016-04-26 | Oracle International Corporation | Mechanism to run OLTP workload on in-memory database under memory pressure |
US9128972B2 (en) | 2013-09-21 | 2015-09-08 | Oracle International Corporation | Multi-version concurrency control on in-memory snapshot store of oracle in-memory database |
US9606921B2 (en) | 2013-09-21 | 2017-03-28 | Oracle International Corporation | Granular creation and refresh of columnar data |
US9703491B2 (en) * | 2014-05-30 | 2017-07-11 | Sandisk Technologies Llc | Using history of unaligned writes to cache data and avoid read-modify-writes in a non-volatile storage device |
US10572442B2 (en) * | 2014-11-26 | 2020-02-25 | Microsoft Technology Licensing, Llc | Systems and methods for providing distributed tree traversal using hardware-based processing |
US9990308B2 (en) | 2015-08-31 | 2018-06-05 | Oracle International Corporation | Selective data compression for in-memory databases |
2015
- 2015-08-31 US US14/841,561 patent/US9990308B2/en active Active

2016
- 2016-06-30 EP EP16741444.0A patent/EP3345101B1/en active Active
- 2016-06-30 CN CN201680057698.8A patent/CN108140040B/en active Active
- 2016-06-30 WO PCT/US2016/040448 patent/WO2017039817A1/en active Application Filing
- 2016-06-30 CN CN202310712612.2A patent/CN116701398A/en active Pending

2018
- 2018-05-14 US US15/979,130 patent/US10331572B2/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104321791A (en) * | 2012-03-20 | 2015-01-28 | Motorola Mobility LLC | Method and system for assessing and updating user-preference information |
CN104272386A (en) * | 2012-04-25 | 2015-01-07 | International Business Machines Corporation | Reducing power consumption by migration of data within tiered storage system |
WO2015041967A1 (en) * | 2013-09-21 | 2015-03-26 | Oracle International Corporation | Mirroring, in memory, data from disk to improve query performance |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108804042A (en) * | 2018-06-16 | 2018-11-13 | 王梅 | Method and system for dynamically processing data set based on shift-out in cache |
CN108804042B (en) * | 2018-06-16 | 2021-06-15 | 浙江力石科技股份有限公司 | Method and system for dynamically processing data set based on shift-out in cache |
CN113779008A (en) * | 2021-09-07 | 2021-12-10 | 杭州天宽科技有限公司 | Intelligent storage system for operation data of electric power intranet |
CN113779008B (en) * | 2021-09-07 | 2024-05-24 | 杭州天宽科技有限公司 | Intelligent storage system for operation data of electric power intranet |
CN115276665A (en) * | 2022-09-28 | 2022-11-01 | 江苏森信达生物科技有限公司 | Intelligent management method and system for bulk drugs |
CN115276665B (en) * | 2022-09-28 | 2022-12-20 | 江苏森信达生物科技有限公司 | Intelligent management method and system for bulk drugs |
Also Published As
Publication number | Publication date |
---|---|
US9990308B2 (en) | 2018-06-05 |
WO2017039817A1 (en) | 2017-03-09 |
US10331572B2 (en) | 2019-06-25 |
EP3345101A1 (en) | 2018-07-11 |
US20170060772A1 (en) | 2017-03-02 |
US20180260338A1 (en) | 2018-09-13 |
CN108140040B (en) | 2023-07-04 |
CN116701398A (en) | 2023-09-05 |
EP3345101B1 (en) | 2023-07-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108140040A (en) | Selective data compression for in-memory databases | |
US9767131B2 (en) | Hierarchical tablespace space management | |
US9858303B2 (en) | In-memory latch-free index structure | |
CN107851123B (en) | Materializing expressions within in-memory virtual column units to accelerate analytic queries |
EP3047397B1 (en) | Mirroring, in memory, data from disk to improve query performance | |
US9430390B2 (en) | Core in-memory space and object management architecture in a traditional RDBMS supporting DW and OLTP applications | |
CN104781812B (en) | Policy driven data placement and information lifecycle management | |
US9626411B1 (en) | Self-described query execution in a massively parallel SQL execution engine | |
CN110291518 (en) | Merge tree garbage metrics |
CN110268394A (en) | KVS tree | |
CN110383261 (en) | Stream selection for multi-stream storage devices |
US7418544B2 (en) | Method and system for log structured relational database objects | |
CN110268399 (en) | Merge tree modifications for maintenance operations |
KR20200053512A (en) | KVS tree database | |
CN109952569 (en) | Techniques for dictionary based join and aggregation |
US20160350302A1 (en) | Dynamically splitting a range of a node in a distributed hash table | |
CN107247778 (en) | System and method for implementing a scalable data storage service |
US20150242311A1 (en) | Hybrid dram-ssd memory system for a distributed database node | |
US10789234B2 (en) | Method and apparatus for storing data | |
Kvet et al. | Relational pre-indexing layer supervised by the DB_index_consolidator Background Process | |
US20070192273A1 (en) | Online data volume deletion | |
Kvet | Database Block Management using Master Index | |
EP3995972A1 (en) | Metadata processing method and apparatus, and computer-readable storage medium | |
Kvet | Identifying, Managing, and Accessing Undefined Tuple States in Relational Databases | |
US11966393B2 (en) | Adaptive data prefetch |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||