WO2015139193A1 - Method and apparatus for conversion of data storage formats - Google Patents

Method and apparatus for conversion of data storage formats

Info

Publication number
WO2015139193A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
storage format
database
storage
conversion
Prior art date
Application number
PCT/CN2014/073576
Other languages
French (fr)
Chinese (zh)
Inventor
李怀洲
姜旭栋
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to CN201480000190.5A priority Critical patent/CN105378716B/en
Priority to PCT/CN2014/073576 priority patent/WO2015139193A1/en
Publication of WO2015139193A1 publication Critical patent/WO2015139193A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/217Database tuning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Definitions

  • the present invention relates to the field of database system technologies, and in particular, to a method and an apparatus for converting a data storage format.
  • the data stored in the database has a certain storage format, and different storage formats can affect the performance of the database system.
  • the row storage structure can complete data loading quickly and adapts well to dynamic load, but it cannot support fast query processing, and its space utilization is not easily improved. Although a better compression ratio can be achieved through entropy coding and by exploiting column correlation, the complex data storage implementation increases decompression overhead.
  • the column storage structure stores the different fields of the same record separately, and reconstructing these records incurs a large overhead; however, column storage avoids reading unnecessary columns, and compressing similar data within one column achieves a higher compression ratio.
  • the embodiment of the invention provides a method and a device for converting a data storage format, which enable the database system to dynamically determine the underlying storage format of the database according to the load situation, realize the automatic adjustment and optimization function of the system, reduce the throughput of the query statement, and improve the storage space utilization of the system.
  • an embodiment of the present invention provides a controller, including:
  • a decision unit configured to: if a system performance indicator of the database storing the data in the first storage format satisfies a conversion condition set by the user, determine a second storage format required for storing the data in the database;
  • a storage format conversion unit configured to convert the storage format of the data in the database from the first storage format to the second storage format; and
  • a feedback unit configured to: determine whether the compression ratio of the database after the storage format conversion satisfies a first preset condition, sort the data in the database after the storage format conversion, and test whether the sorting time satisfies a second preset condition; if the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, provide external service; or, if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, send feedback information to the decision unit, so that the decision unit re-determines, according to the core indicator threshold to be set by the user in the feedback information, the second storage format required for storing the data in the database.
  • the decision unit is further configured to: before determining the second storage format required for storing data in the database, determine whether the system performance indicator meets a core indicator threshold set by the user; and, if the system performance indicator meets the core indicator threshold, determine whether the system performance indicator satisfies the conversion condition.
  • the decision unit is further configured to: if the system performance indicator does not satisfy the conversion condition, keep the format of the data stored in the database as the first storage format.
  • the controller further includes a data collection unit, and the data collection unit is configured to collect the system performance indicator.
  • the decision unit is further configured to: after determining the second storage format required for storing data in the database, determine, according to user configuration information, the conversion time at which the storage format of the data in the database is converted from the first storage format to the second storage format.
  • the storage format conversion unit is specifically configured to: reorganize the data in the database in a buffer according to the second storage format and the conversion time, and, if the amount of data in the buffer reaches the disk write threshold, write the data in the buffer to the disk.
  • the storage format conversion unit is specifically configured to: if the second storage format of the data in the buffer is single column storage, convert the storage format of the data in the buffer from the first storage format to the single column storage, and compress and store the data in the buffer; or, if the second storage format of the data in the buffer is row-column mixed storage or row storage, convert the storage format of the data in the buffer from the first storage format to the row-column mixed storage or the row storage, and store the data in the buffer.
  • the controller further includes a reading unit; the reading unit is configured to: before the storage format conversion unit reorganizes the data in the database in a buffer according to the second storage format and the conversion time, if the first storage format is row storage, read the data in the database by row.
  • the data collection unit is further configured to: if the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, collect, after the external service is provided, the data throughput of the database after the storage format conversion and the response time of query statements.
  • the system performance indicators include at least the amount of data, the average amount of data accessed by the query, the number of rows processed, the ratio of rows read, the column ratio of the average query, and the proportion of query statements.
  • an embodiment of the present invention provides a controller, including:
  • a processor configured to: if a system performance indicator of a database storing data in a first storage format satisfies a conversion condition set by a user, determine a second storage format required for storing data in the database; after the storage format of the data in the database is converted from the first storage format to the second storage format, determine whether the compression ratio of the database after the storage format conversion satisfies a first preset condition, sort the data in the database after the storage format conversion, and test whether the sorting time satisfies a second preset condition; and, if the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, provide external service, or, if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, re-determine, according to the core indicator threshold to be set by the user, the second storage format required for storing data in the database; and
  • a format converter for converting a storage format of data in the database from the first storage format to the second storage format determined by the processor.
  • the processor is further configured to: before determining the second storage format required for storing data in the database, determine whether the system performance indicator meets the core indicator threshold set by the user; and, if the system performance indicator meets the core indicator threshold, determine whether the system performance indicator satisfies the conversion condition.
  • the processor is further configured to: if the system performance indicator does not satisfy the conversion condition, keep the format of the data stored in the database as the first storage format.
  • the controller further includes a data collector
  • the data collector is configured to collect the system performance indicators.
  • the processor is further configured to: after determining the second storage format required for storing the data in the database, determine, according to the user configuration information, the conversion time at which the storage format of the data in the database is converted from the first storage format to the second storage format.
  • the format converter is specifically configured to: reorganize the data in the database in a buffer according to the second storage format and the conversion time, and, if the amount of data in the buffer reaches the disk write threshold, write the data in the buffer to the disk.
  • the format converter is specifically configured to: if the second storage format of the data in the buffer is single column storage, convert the storage format of the data in the buffer from the first storage format to the single column storage, and compress and store the data in the buffer; or, if the second storage format of the data in the buffer is row-column mixed storage or row storage, convert the storage format of the data in the buffer from the first storage format to the row-column mixed storage or the row storage, and store the data in the buffer.
  • the processor is further configured to: before the data in the database is reorganized in the buffer according to the second storage format and the conversion time, if the first storage format is row storage, read the data in the database by row.
  • the data collector is further configured to: if the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, collect, after the external service is provided, the data throughput of the database after the storage format conversion and the response time of query statements.
  • the system performance indicator includes at least a data volume, a query average access data amount, a processed row number, a read row number ratio, a query average access column ratio, and a proportion of the query statement.
  • an embodiment of the present invention provides a data storage format conversion method, including:
  • Step a: if a system performance indicator of the database storing the data in the first storage format satisfies a conversion condition set by the user, determining a second storage format required for storing the data in the database;
  • Step b: converting the storage format of the data in the database from the first storage format to the second storage format;
  • Step c: determining whether the compression ratio of the database after the storage format conversion satisfies the first preset condition, sorting the data in the database after the storage format conversion, and testing whether the sorting time satisfies the second preset condition;
  • Step d: if the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, providing external service; or, if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, re-executing the foregoing step according to the core indicator threshold to be set by the user.
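Steps a to d can be read as a convert, measure, and retry loop. The following is a minimal sketch under stated assumptions, not the patented implementation: the first preset condition is modelled as a minimum compression ratio, the second as a maximum sorting time, and "re-executing the step" is simplified to trying the next format from a hypothetical candidate list. All names (`Evaluation`, `convert_with_feedback`, `convert_and_measure`) are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Evaluation:
    """Measurements taken after a conversion (step c)."""
    compression_ratio: float
    sort_seconds: float


def convert_with_feedback(
    condition_met: bool,
    candidates: Sequence[str],
    convert_and_measure: Callable[[str], Evaluation],
    min_compression_ratio: float,
    max_sort_seconds: float,
) -> Optional[str]:
    """Return the accepted storage format, or None if nothing is accepted."""
    if not condition_met:
        # Step a: the user-set conversion condition is not satisfied.
        return None
    for fmt in candidates:
        # Step a picks a second storage format; steps b and c convert and measure.
        result = convert_and_measure(fmt)
        if (result.compression_ratio >= min_compression_ratio
                and result.sort_seconds <= max_sort_seconds):
            # Step d: both preset conditions hold, so service can start.
            return fmt
        # Step d (otherwise): feed back and re-determine the format.
    return None
```

The candidate list stands in for the decision logic that re-determines the second storage format; the source does not specify how the next format is chosen.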
  • the method before the determining the second storage format required for storing the data in the database, the method further includes:
  • in a second possible implementation of the third aspect, in combination with the first possible implementation of the third aspect, if the system performance indicator does not satisfy the conversion condition, the format of the data stored in the database is kept as the first storage format.
  • the method also includes:
  • the method further includes:
  • the converting of the storage format of the data in the database from the first storage format to the second storage format specifically includes:
  • the data in the buffer is written to disk.
  • the data in the buffer is written to the disk, and the method includes:
  • the second storage format of the data in the buffer is a single column storage, converting a storage format of the data in the buffer from the first storage format to the single column storage, and compressing and Storing data in the buffer;
  • the method further includes: if the first storage format is row storage, reading the data in the database by row.
  • the method further includes: collecting the data throughput of the database after the storage format conversion and the response time of query statements.
  • the performance indicators include at least the amount of data, the average amount of data accessed by the query, the number of rows processed, the ratio of rows read, the column ratio of the average query, and the proportion of query statements.
  • Embodiments of the present invention provide a data storage format conversion method and apparatus. If a system performance indicator of a database storing data in a first storage format satisfies a conversion condition set by a user, the controller determines a second storage format required for storing the data in the database and converts the storage format of the data in the database from the first storage format to the second storage format. The controller then determines whether the compression ratio of the database after the storage format conversion satisfies the first preset condition, sorts the data in the database after the storage format conversion, and tests whether the sorting time satisfies the second preset condition.
  • the controller continuously determines the optimal storage format of the data in the database system by monitoring the actual running data of the system; that is, the controller enables the database system to dynamically determine the storage format of its data according to the load situation. This solves the problems that the database administrator needs to modify the database storage format manually and that the storage space utilization of the system is low.
  • the self-determined data storage format reduces the throughput of the query statement and improves the storage space utilization of the system.
  • FIG. 1 is a schematic structural view 1 of a controller according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram 2 of a controller according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram 3 of a controller according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram 4 of a controller according to an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of a method for converting a data storage format according to an embodiment of the present invention.
  • FIG. 6 is a schematic flowchart of a method for converting a data storage format according to an embodiment of the present invention.
  • OLTP: On-Line Transaction Processing
  • OLAP: On-Line Analytical Processing
  • the advantages of the row storage structure are fast data loading and high adaptability to dynamic load, because row storage guarantees that all fields of the same record are on the same node.
  • the disadvantages of row storage are also obvious. For example, it does not support fast query processing, because when a query touches only a few columns of a table with many columns, it cannot skip reading the unnecessary columns. In addition, because columns holding different kinds of data values are mixed together, row storage does not easily achieve a very high compression ratio; that is, space utilization is hard to increase significantly. Although a better compression ratio can be obtained through entropy coding and by exploiting column correlation, the complex data storage implementation increases decompression overhead.
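The point about skipping unnecessary columns can be illustrated by counting how many stored values a single-column query has to touch in each layout. This is a toy sketch only; the function names and the in-memory lists are assumptions made for illustration and do not model disk pages or any particular database.

```python
from typing import Dict, List, Sequence, Tuple


def to_columns(rows: Sequence[Tuple], names: Sequence[str]) -> Dict[str, List]:
    """Pivot a row layout into one list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def values_touched_row_layout(rows: Sequence[Tuple]) -> int:
    """A row layout reads whole records, so every field is touched."""
    return sum(len(row) for row in rows)


def values_touched_column_layout(columns: Dict[str, List], wanted: Sequence[str]) -> int:
    """A column layout reads only the columns the query asks for."""
    return sum(len(columns[name]) for name in wanted)
```

For a three-column table and a query on one column, the column layout touches one third of the values that the row layout does, at the cost of having to reassemble records when whole rows are needed.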
  • PAX or RCFile storage methods optimize system performance by optimizing the underlying storage format.
  • the existing database's row, column, or row-column mixed storage format is completely dependent on the initial setup of the database when it is initialized, which is the underlying storage format of the database that the user specified when creating the database.
  • when the user needs to change the database storage format, the database administrator (DBA, Database Administrator) must modify it manually and offline, and an automatic adjustment and optimization function of the system is missing.
  • the invention provides a data storage format conversion method and device, which enable the database system to dynamically determine the underlying storage format of the database according to the load situation, realize the automatic adjustment and optimization function of the system, reduce the throughput of the query statement, and improve the storage space utilization of the system.
  • Embodiment 1
  • An embodiment of the present invention provides a controller, as shown in FIG. 1, including: a decision unit 10, configured to: if a system performance indicator of the database storing the data in the first storage format satisfies a conversion condition set by the user, determine a second storage format required for storing the data in the database;
  • a storage format conversion unit 11, configured to convert the storage format of the data in the database from the first storage format to the second storage format determined by the decision unit 10; and
  • a feedback unit 12, configured to: determine whether the compression ratio of the database after the storage format conversion satisfies the first preset condition, sort the data in the database after the storage format conversion, and test whether the sorting time satisfies the second preset condition; and, if the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, provide external service, or, if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, send feedback information to the decision unit 10, so that the decision unit 10 re-determines, according to the core indicator threshold to be set by the user in the feedback information, the second storage format required for storing the data in the database.
  • the decision unit 10 is further configured to: before determining the second storage format required for storing data in the database, determine whether the system performance indicator meets a core indicator threshold set by the user; and, if the system performance indicator meets the core indicator threshold, determine whether the system performance indicator satisfies the conversion condition.
  • the decision unit 10 is further configured to: if the system performance indicator does not satisfy the conversion condition, keep the format of the data stored in the database as the first storage format.
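The two-stage check performed by unit 10 (core indicator threshold first, conversion condition second, otherwise keep the first storage format) could be sketched as follows. This is a hedged illustration: the indicator names, the reading of "meets the threshold" as greater than or equal, and the choice to keep the first format when the core threshold is not met are all assumptions, since the source does not define them.

```python
from typing import Callable, Dict


def decide_format(
    indicators: Dict[str, float],
    core_thresholds: Dict[str, float],
    conversion_condition: Callable[[Dict[str, float]], bool],
    first_format: str,
    choose_second_format: Callable[[Dict[str, float]], str],
) -> str:
    """Return the storage format the database should use next."""
    # Stage 1: every core indicator must reach its user-set threshold
    # (assumed here to mean "greater than or equal").
    meets_core = all(
        indicators.get(name, 0.0) >= limit
        for name, limit in core_thresholds.items()
    )
    # Stage 2: only then is the user-set conversion condition evaluated.
    if meets_core and conversion_condition(indicators):
        return choose_second_format(indicators)
    # Otherwise the data stays in the first storage format.
    return first_format
```

Passing the conversion condition and the format chooser as callables keeps the sketch independent of any specific rule, such as the column access frequency example given later in the description.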
  • the controller further includes a data collection unit 13; the data collection unit 13 is further configured to collect the system performance indicator.
  • the determining unit 10 is further configured to: after determining the second storage format required for storing the data in the database, determine, according to the user configuration information, a storage format of the data in the database from the first The storage format is converted to a conversion time of the second storage format.
  • the storage format conversion unit 11 is specifically configured to: reorganize the data in the database in a buffer according to the second storage format and the conversion time determined by the decision unit 10, and, if the amount of data in the buffer reaches the disk write threshold, write the data in the buffer to the disk.
  • the storage format conversion unit 11 is specifically configured to: if the second storage format of the data in the buffer is single column storage, convert the storage format of the data in the buffer from the first storage format to the single column storage, and compress and store the data in the buffer; or, if the second storage format of the data in the buffer is row-column mixed storage or row storage, convert the storage format of the data in the buffer from the first storage format to the row-column mixed storage or the row storage, and store the data in the buffer.
  • the controller further includes a reading unit 14, configured to: before the storage format conversion unit 11 reorganizes the data in the database in the buffer according to the second storage format and the conversion time, if the first storage format is row storage, read the data in the database by row.
  • the data collection unit 13 is further configured to: if the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, collect, after the external service is provided, the data throughput of the database after the storage format conversion and the response time of query statements.
  • the system performance indicator includes at least a data volume, a query average access data amount, a ratio of the number of processed rows to the number of read rows, a query average access column ratio, and a proportion of query statements.
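The buffered reorganization described above, restricted to the single column storage case, can be sketched as follows. This is an illustration under assumptions rather than the source's implementation: the "disk" is an in-memory list, the disk write threshold is counted in buffered values, and JSON plus `zlib` stand in for whatever serialization and compression a real system would use.

```python
import json
import zlib
from typing import Dict, List, Sequence, Tuple


def reorganize_rows_to_columns(
    rows: Sequence[Tuple],
    column_names: Sequence[str],
    disk_write_threshold: int,
) -> List[Tuple[str, bytes]]:
    """Pivot rows into per-column buffers and flush compressed chunks."""
    disk: List[Tuple[str, bytes]] = []  # stand-in for the disk
    buffer: Dict[str, List] = {name: [] for name in column_names}
    buffered = 0  # number of values currently held in the buffer

    def flush() -> None:
        nonlocal buffered
        for name, values in buffer.items():
            if values:
                # Single column storage: compress each column chunk.
                chunk = zlib.compress(json.dumps(values).encode("utf-8"))
                disk.append((name, chunk))
                values.clear()
        buffered = 0

    for row in rows:  # data is read row by row from the row store
        for name, value in zip(column_names, row):
            buffer[name].append(value)
        buffered += len(column_names)
        if buffered >= disk_write_threshold:
            flush()
    flush()  # write whatever remains after the last row
    return disk
```

For row-column mixed storage or row storage, the source states that the data is converted and stored without mentioning compression, so a corresponding sketch would skip the `zlib.compress` call.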
  • the invention provides a controller, which mainly comprises a decision unit, a storage format conversion unit and a feedback unit. If the system performance indicator of the database storing the data in the first storage format satisfies the conversion condition set by the user, the decision unit determines a second storage format required for storing the data in the database, and then the storage format conversion unit sets the data in the database. The storage format is converted from the first storage format to the second storage format. Finally, the feedback unit determines whether the compression ratio of the database after the storage format conversion satisfies the first preset condition, and sorts the data in the database after the storage format conversion.
  • if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, the feedback information is sent to the decision unit, so that the decision unit re-determines, according to the core indicator threshold to be set by the user in the feedback information, the second storage format required for storing the data in the database.
  • the controller continuously determines the optimal storage format of the data in the database system by monitoring the actual running data of the system; that is, the controller enables the database system to dynamically determine the storage format of its data according to the load situation. This solves the current problems that the database administrator needs to modify the storage format manually and offline and that the storage space utilization of the system is low. The self-determined data storage format reduces the throughput of the query statement and at the same time improves the storage space utilization of the system.
  • the embodiment of the present invention provides a controller, as shown in FIG. 3, including: a processor 20, configured to: if a system performance indicator of a database storing data in a first storage format satisfies a conversion condition set by a user, determine a second storage format required for storing data in the database; after the format converter 21 converts the storage format of the data in the database from the first storage format to the second storage format, determine whether the compression ratio of the database after the storage format conversion satisfies the first preset condition, sort the data in the database after the storage format conversion, and test whether the sorting time satisfies the second preset condition; and, if the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, provide external service, or, if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, re-determine, according to the core indicator threshold to be set by the user, the second storage format required for storing data in the database; and
  • the format converter 21, configured to convert the storage format of the data in the database from the first storage format to the second storage format determined by the processor 20.
  • the processor 20 is further configured to: before determining the second storage format required for storing data in the database, determine whether the system performance indicator meets a core indicator threshold set by the user; and, if the system performance indicator meets the core indicator threshold, determine whether the system performance indicator satisfies the conversion condition.
  • the processor 20 is further configured to keep the format of the stored data in the database as the first storage format if the system performance indicator does not satisfy the conversion condition.
  • the controller further includes a data collector 22, and the data collector 22 is configured to collect the system performance indicators.
  • the processor 20 is further configured to: after determining the second storage format required for storing data in the database, determine, according to user configuration information, the conversion time at which the storage format of the data in the database is converted from the first storage format to the second storage format.
  • the format converter 21 is specifically configured to: reorganize the data in the database in a buffer according to the second storage format and the conversion time, and, if the amount of data in the buffer reaches the disk write threshold, write the data in the buffer to the disk.
  • the format converter 21 is specifically configured to: if the second storage format of the data in the buffer is single column storage, convert the storage format of the data in the buffer from the first storage format to the single column storage, and compress and store the data in the buffer; or, if the second storage format of the data in the buffer is row-column mixed storage or row storage, convert the storage format of the data in the buffer from the first storage format to the row-column mixed storage or the row storage, and store the data in the buffer.
  • the processor 20 is further configured to: before the format converter 21 reorganizes the data in the database in the buffer according to the second storage format and the conversion time, if the first storage format is row storage, read the data in the database by row.
  • the data collector 22 is further configured to: if the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, collect, after the external service is provided, the data throughput of the database after the storage format conversion and the response time of query statements.
  • the system performance indicator includes at least a data volume, a query average access data amount, a ratio of the number of processed rows to the number of read rows, a query average access column ratio, and a proportion of query statements.
  • Embodiments of the present invention provide a controller, which mainly includes a processor and a format converter. If the system performance indicator of the database storing the data in the first storage format satisfies the conversion condition set by the user, the controller determines a second storage format required for storing the data in the database and converts the storage format of the data in the database from the first storage format to the second storage format. The controller then determines whether the compression ratio of the database after the storage format conversion satisfies the first preset condition, sorts the data in the database after the storage format conversion, and tests whether the sorting time satisfies the second preset condition.
  • if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, the second storage format required for storing the data in the database is re-determined according to the core indicator threshold to be set by the user.
  • the controller continuously determines the optimal storage format of the data in the database system by monitoring the actual running data of the system; that is, the controller enables the database system to dynamically determine the storage format of its data according to the load situation. This solves the current problems that the database administrator needs to modify the storage format manually and offline and that the storage space utilization of the system is low. The self-determined data storage format reduces the throughput of the query statement and at the same time improves the storage space utilization of the system.
  • An embodiment of the present invention provides a data storage format conversion method. As shown in FIG. 5, the method includes:
  • the controller determines the second storage format required for storing the data in the database.
  • the existing database's row, column, or row and column hybrid storage format is completely dependent on the initial setting of the database initialization, that is, the underlying storage format of the database specified by the user when creating the database.
  • when the user needs to change the database storage format, the DBA must modify it manually and offline, and an automatic adjustment and optimization function is lacking.
  • the present invention provides a data storage format conversion method, which enables the database system to dynamically determine the underlying storage format of the database according to the load situation and realizes the automatic adjustment and optimization function of the system.
  • the OLTP application and the OLAP application of the database have advantages in write operations and read operations, respectively.
  • to address the respective shortcomings of OLTP and OLAP, various row-column combination storage methods have been developed.
  • for example, when the database is initialized, the data is stored in rows.
  • the controller needs to first determine the storage format required to store the data in the database, that is, to determine which format the data stored in the database needs to be stored.
  • the controller determines a second storage format required for storing the data in the database.
  • The system performance indicators are collected by the controller over a certain period of time and include at least the data volume, the average amount of data accessed per query, the ratio of processed rows to read rows, the average proportion of columns accessed per query, and the proportion of query statements.
  • The controller needs to determine whether the system performance indicators of the database satisfy the conversion condition set by the user. If they do not, the controller does no processing, and the data in the database remains stored in the first storage format.
  • For example, the conversion condition set by the user is that when the access frequency of any column (the number of accesses to the column divided by the number of accesses to the table) reaches 80%, the column can be converted to column storage. When the controller observes that the access frequency of the nth column in the database reaches 80%, the controller determines that the data of the nth column is to be stored in the column storage format.
  • Specifically, when determining the second storage format required for storing the data in the database, the controller needs to work out which tables in the database require row-column conversion, and determine which columns need to be stored together and which columns need to be stored separately. For example, the columns that are individually accessed with the highest frequency are stored separately as columns.
  • After determining the second storage format required for storing the data in the database, the controller further determines, according to the user configuration information, the conversion time at which the storage format of the data in the database is converted from the first storage format to the second storage format.
  • The controller may perform the storage format conversion during idle time of the load, according to the system performance indicators. Alternatively, after determining the second storage format required for storing the data in the database, the controller may prompt the user that a conversion can be performed and display the second storage format to the user, and then perform the conversion when the user enters the command.
  • the controller converts the storage format of the data in the database from the first storage format to the second storage format.
  • After the controller determines the second storage format required for storing the data in the database, the controller converts the storage format of the data in the database from the first storage format to the second storage format.
  • The controller reorganizes the data in the database in a buffer according to the second storage format; when the amount of data in the buffer reaches the disk write threshold, the controller writes the data in the buffer to disk.
  • the controller performs different processing on the data in the buffer according to the second storage format of the data in the buffer.
  • If the second storage format of the data in the buffer is single-column storage, the controller converts the storage format of the data in the buffer from the first storage format to single-column storage, and compresses and stores the data in the buffer;
  • if the second storage format of the data in the buffer is row-column hybrid storage or row storage, the controller converts the storage format of the data in the buffer from the first storage format to row-column hybrid storage or row storage, and stores the data in the buffer.
  • the controller determines whether the compression ratio of the database after the storage format conversion satisfies the first preset condition, and sorts the data in the database after the storage format conversion, and tests whether the sorting time satisfies the second preset condition.
  • After the controller converts the storage format of the data in the database from the first storage format to the second storage format, the controller needs to check whether the conversion is reasonable and whether it can optimize the database.
  • Specifically, the controller examines the change in size of the tables that have been converted to column storage, and measures the sorting time with a simple sort. The table size directly reflects the compression ratio of the database: the higher the compression ratio, the higher the space utilization. The sorting time reflects the CPU and memory resources consumed, and column storage can improve sorting efficiency. Therefore, to check whether the storage format conversion is reasonable, the controller needs to determine whether the compression ratio of the converted database satisfies the first preset condition, sort the data in the converted database, and test whether the sorting time satisfies the second preset condition.
  • the first preset condition is that the compression ratio of the database after the storage format conversion is less than or equal to the first preset threshold
  • the second preset condition is that the sorting time is less than or equal to the second preset threshold
  • the controller performs external service.
  • The controller needs to further observe the corresponding indicators to determine whether the performance of the database system has been optimized, that is, whether the data needs to be split further into columns, or whether unnecessary column splitting should be avoided.
  • The indicators collected by the controller include data throughput and query statement response time.
  • The data throughput reflects whether redundant data reads have been reduced. If the change in data throughput is not significant, the data may need to be split further into columns.
  • The query response time is a direct indicator of whether column storage is effective.
  • the controller re-determines the second storage format according to the core indicator threshold to be set by the user.
  • The controller needs to perform the next data storage format conversion according to the core indicator threshold to be set by the user, so as to re-determine the second storage format of the data.
  • An embodiment of the present invention provides a data storage format conversion method. If the system performance indicators of a database storing data in a first storage format satisfy a conversion condition set by a user, the controller determines a second storage format required for storing the data in the database.
  • The controller determines whether the compression ratio of the database after the storage format conversion satisfies the first preset condition, sorts the data in the converted database, and tests whether the sorting time satisfies the second preset condition. If the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, the controller provides external service; or, if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, the controller re-determines the second storage format required for storing the data in the database according to the core indicator threshold to be set by the user.
  • The controller continuously determines the optimal storage format for the data in the database system by monitoring the system's actual running data. That is, the controller enables the database system to dynamically determine the storage format of its data according to the load, which solves the current problems that a database administrator must modify the format manually and offline and that system storage space utilization is low. By deciding the data storage format autonomously, the system reduces the throughput of query statements while improving the storage space utilization of the system.
  • S201: The controller collects the system performance indicators of the database in which the data is stored in the first storage format.
  • The row, column, or row-column hybrid storage format of an existing database depends entirely on the setting made when the database is initialized, that is, the underlying storage format that the user specifies when creating the database.
  • The DBA must modify the format manually and offline, and the system lacks an automatic adjustment and optimization function.
  • The present invention provides a data storage format conversion method that enables the database system to dynamically determine the underlying storage format of the database according to the load, thereby realizing automatic adjustment and optimization of the system.
  • The OLTP and OLAP applications of a database have advantages in write operations and read operations, respectively.
  • Various row-column combination storage methods have been produced for OLTP and OLAP.
  • At database initialization, the data is stored in rows.
  • The controller first collects the system performance indicators of the database storing the data in the first storage format, so that it can determine the storage format of the data in the database according to the collected indicators.
  • The system performance indicators include at least the data volume, the average amount of data accessed per query, the ratio of processed rows to read rows, the average proportion of columns accessed per query, and the proportion of query statements. Specifically:
  • The data volume is an important indicator of whether to use column storage. The larger the data volume, the better suited queries are to column storage. The data volume is the size of the entire database;
  • the average amount of data accessed per query is the number of data rows the database uses for each query. Scenarios with a large average amount of accessed data are suited to column storage;
  • the ratio of processed rows to read rows is the ratio of the number of data rows actually used per operation to the total number of rows read. In some cases data is read from disk but the system does not actually perform any analysis on it. Ideally every row read would be processed by the system, so the higher the ratio, the more suitable the workload is for column storage;
  • the average access column ratio is the ratio of the number of columns accessed by an average query statement to the total number of columns. The smaller the ratio, the more suitable the workload is for column storage;
  • the proportion of query statements is the share of query operations among all database operations. The closer the proportion is to 100%, the more suitable the workload is for column storage.
  • the controller determines whether the system performance indicator satisfies the core indicator threshold set by the user.
  • After the controller collects the system performance indicators of the database storing the data in the first storage format, the controller analyzes them. Specifically, the controller determines whether the system performance indicators satisfy the core indicator thresholds set by the user, that is, the controller performs decision analysis on the system performance indicators according to the core indicator thresholds and the decision algorithm set by the user.
  • The user-set decision algorithm can be (average access column ratio < Ta) and/or (proportion of query statements > Tq) and/or (ratio of processed rows to read rows > Tp).
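The threshold test on Ta, Tq, and Tp can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Indicators`, `Thresholds`, and `meets_core_thresholds` are hypothetical, the first comparison is assumed to be "less than" because a smaller column ratio is described as favoring column storage, and the "and/or" combination is modeled with a `require_all` switch.

```python
from dataclasses import dataclass


@dataclass
class Indicators:
    avg_column_ratio: float      # columns accessed per query / total columns
    query_proportion: float      # query statements / all database operations
    processed_read_ratio: float  # rows processed / rows read


@dataclass
class Thresholds:
    ta: float  # upper bound on avg_column_ratio (assumed "<")
    tq: float  # lower bound on query_proportion
    tp: float  # lower bound on processed_read_ratio


def meets_core_thresholds(ind, th, require_all=True):
    """Evaluate the three comparisons and combine them with AND or OR."""
    checks = [
        ind.avg_column_ratio < th.ta,
        ind.query_proportion > th.tq,
        ind.processed_read_ratio > th.tp,
    ]
    return all(checks) if require_all else any(checks)
```

With `require_all=False` a single satisfied comparison is enough, which corresponds to reading every "and/or" as "or".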
  • the controller determines whether the system performance indicator meets the conversion condition set by the user.
  • The controller determines whether the performance indicators satisfy the conversion condition set by the user. Only when they do can the controller determine the storage format required for storing the data in the database.
  • For example, the conversion condition set by the user is that when the access frequency of any column (the number of accesses to the column divided by the number of accesses to the table) reaches 80%, the column can be converted to column storage. When the controller observes that the access frequency of the nth column in the database reaches 80%, the controller can determine that the data of the nth column is to be stored in the column storage format.
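The 80% access-frequency condition can be illustrated with a short sketch. The function name `columns_to_convert` and the dictionary of per-column counters are assumptions made for illustration; the text does not say how the controller stores its access counts.

```python
def columns_to_convert(column_accesses, table_accesses, threshold=0.8):
    """Return the columns whose access frequency reaches the threshold.

    Access frequency = accesses to the column / accesses to the table.
    """
    if table_accesses <= 0:
        return []  # no accesses observed, nothing to decide
    return [
        name
        for name, count in column_accesses.items()
        if count / table_accesses >= threshold
    ]
```

For a table accessed 100 times, a column accessed 80 or more times qualifies for column storage under the default threshold.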
  • the controller determines a second storage format required for storing data in the database.
  • When the conversion condition is satisfied, the storage format of the data in the database can be converted, and the controller can determine the second storage format required for storing the data in the database.
  • Specifically, when determining the second storage format required for storing the data in the database, the controller needs to work out which tables in the database require row-column conversion, and determine which columns need to be stored together and which columns need to be stored separately. For example, the columns that are individually accessed with the highest frequency are stored separately as columns.
  • the controller determines, according to the user configuration information, a conversion time for converting a storage format of the data in the database from the first storage format to the second storage format. After the controller determines the second storage format required for storing the data in the database, the controller further determines, according to the user configuration information, a conversion time for converting the storage format of the data in the database from the first storage format to the second storage format.
  • The controller may perform the storage format conversion during idle time of the load, according to the system performance indicators. Alternatively, after determining the second storage format required for storing the data in the database, the controller may prompt the user that a conversion can be performed and display the second storage format to the user, and then perform the conversion when the user enters the command.
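The two timing options, converting during idle time or waiting for a user command, can be sketched as a small decision function. The mode names `"idle"` and `"prompt"` and the function `should_convert_now` are hypothetical; the text only says that the conversion time follows the user configuration information.

```python
def should_convert_now(mode, system_idle, user_confirmed):
    """Decide whether to start the conversion under the configured mode."""
    if mode == "idle":
        # Convert automatically, but only while the load is idle.
        return system_idle
    if mode == "prompt":
        # The second storage format was shown to the user; wait for the command.
        return user_confirmed
    raise ValueError(f"unknown conversion mode: {mode!r}")
```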
  • the controller converts the storage format of the data in the database from the first storage format to the second storage format.
  • After the controller determines the second storage format required for storing the data in the database, the controller converts the storage format of the data in the database from the first storage format to the second storage format.
  • The controller reorganizes the data in the database in a buffer according to the second storage format; when the amount of data in the buffer reaches the disk write threshold, the controller writes the data in the buffer to disk. If the first storage format is row storage, the controller first reads the data in the database row by row and then reorganizes the data in the buffer according to the second storage format.
  • the controller performs different processing on the data in the buffer according to the second storage format of the data in the buffer.
  • If the second storage format of the data in the buffer is single-column storage, the controller converts the storage format of the data in the buffer from the first storage format to single-column storage, and compresses and stores the data in the buffer;
  • if the second storage format of the data in the buffer is row-column hybrid storage or row storage, the controller converts the storage format of the data in the buffer from the first storage format to row-column hybrid storage or row storage, and stores the data in the buffer.
  • S207: The controller determines whether the compression ratio of the database after the storage format conversion satisfies the first preset condition, sorts the data in the converted database, and tests whether the sorting time satisfies the second preset condition.
  • After the controller converts the storage format of the data in the database from the first storage format to the second storage format, the controller needs to check whether the conversion is reasonable and whether it can optimize the database.
  • Specifically, the controller examines the change in size of the tables that have been converted to column storage, and measures the sorting time with a simple sort. The table size directly reflects the compression ratio of the database: the higher the compression ratio, the higher the space utilization. The sorting time reflects the CPU and memory resources consumed, and column storage can improve sorting efficiency. Therefore, to check whether the storage format conversion is reasonable, the controller needs to determine whether the compression ratio of the converted database satisfies the first preset condition, sort the data in the converted database, and test whether the sorting time satisfies the second preset condition.
  • the first preset condition is that the compression ratio of the database after the storage format conversion is less than or equal to the first preset threshold
  • the second preset condition is that the sorting time is less than or equal to the second preset threshold
  • the controller performs external service.
  • The controller needs to further observe the corresponding indicators to determine whether the performance of the database system has been optimized, that is, whether the data needs to be split further into columns, or whether unnecessary column splitting should be avoided.
  • The indicators collected by the controller include data throughput and query statement response time.
  • The data throughput reflects whether redundant data reads have been reduced. If the change in data throughput is not significant, the data may need to be split further into columns.
  • The query response time is a direct indicator of whether column storage is effective.
  • the controller performs the next data storage format conversion according to the core indicator threshold to be set by the user.
  • Failure to satisfy these conditions indicates that the second storage format determined by the controller cannot effectively improve the database system, so the controller needs to perform the next data storage format conversion according to the core indicator threshold to be set by the user.
  • the second storage format is
  • The controller converts the column into separate column storage. However, if system performance is not improved after the conversion, the controller feeds back that the column access ratio should be raised from 90% to 91%; the user resets the value of the column access ratio according to the feedback information, and the controller re-determines the second storage format according to the column access ratio newly set by the user.
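The feedback loop in this example, where an unhelpful conversion leads to a slightly stricter column access ratio, can be sketched as below. This is only an illustration: `next_threshold` and `tune_threshold` are hypothetical names, the 1% step is taken from the 90% to 91% example, the user is modeled as simply accepting each suggestion, and `conversion_helps` stands in for whatever validation the controller performs.

```python
def next_threshold(current, step=0.01, maximum=1.0):
    """Suggest a slightly stricter column access ratio."""
    return min(round(current + step, 4), maximum)


def tune_threshold(initial, conversion_helps, step=0.01, maximum=1.0):
    """Raise the ratio until a conversion helps; return None if none does."""
    threshold = initial
    while not conversion_helps(threshold):
        if threshold >= maximum:
            return None  # no stricter ratio is left to try
        threshold = next_threshold(threshold, step, maximum)
    return threshold
```

In the example from the text, starting at 0.90 with a conversion that only pays off at 0.91 yields a single adjustment.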
  • the controller keeps the format of the stored data in the database as the first storage format.
  • An embodiment of the present invention provides a data storage format conversion method. If the system performance indicators of a database storing data in a first storage format satisfy a conversion condition set by a user, the controller determines a second storage format required for storing the data in the database.
  • The controller determines whether the compression ratio of the database after the storage format conversion satisfies the first preset condition, sorts the data in the converted database, and tests whether the sorting time satisfies the second preset condition. If the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, the controller provides external service; or, if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, the controller re-determines the second storage format required for storing the data in the database according to the core indicator threshold to be set by the user in the feedback information.
  • The controller continuously determines the optimal storage format for the data in the database system by monitoring the system's actual running data. That is, the controller enables the database system to dynamically determine the storage format of its data according to the load, which solves the current problems that a database administrator must modify the format manually and offline and that system storage space utilization is low. By deciding the data storage format autonomously, the system reduces the throughput of query statements while improving the storage space utilization of the system.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the modules or units is only a logical function division.
  • In actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.

Abstract

Provided in the embodiment are a method and apparatus for the conversion of data storage formats, which relate to the technical field of database systems and can enable a dynamic determination of the low-level storage format by a database system so as to realize the automatic regulation and optimization functions of the system. The method comprises: if the system performance indexes of a database in which data is stored in a first storage format meet a conversion condition set by a user, then determining a second storage format required for data storage in the database; converting the storage format of the data from the first storage format to the second storage format; judging whether the compression ratio of the database undergoing storage format conversion meets a first preset condition, sorting the data in the database, and testing whether the time for sorting meets a second preset condition; if the compression ratio meets the first preset condition and the time for sorting meets the second preset condition, then providing external services; or if the compression ratio fails to meet the first preset condition and/or the time for sorting fails to meet the second preset condition, then re-determining the second storage format.

Description

Method and apparatus for converting a data storage format
Technical field
The present invention relates to the field of database system technologies, and in particular, to a method and an apparatus for converting a data storage format.
Background
As society becomes increasingly information-based, database systems are used more and more widely, and the continuously accumulating mass of data and ever-growing data expansion place new demands on database systems.
Data stored in a database has a particular storage format, and different storage formats can affect the performance of the database system. A row storage structure can complete data loading quickly and adapts well to dynamic loads, but it cannot support fast query processing, and its space utilization is hard to improve substantially. Although a better compression ratio can be obtained through entropy coding and by exploiting column correlation, a complex data storage implementation increases decompression overhead. A column storage structure stores the different fields of the same record separately, so reconstructing records incurs considerable overhead; however, column storage avoids reading unnecessary columns, and compressing the similar data within a column achieves a higher compression ratio.
At present, weighing the advantages and disadvantages of row storage and column storage, various row-column combination storage methods have emerged, such as PAX or row-column hybrid storage (RCFile, Record Columnar File).
However, the row, column, or row-column hybrid storage format of an existing database depends entirely on the setting made at database initialization, that is, the underlying storage format of the database that the user specifies when creating the database. When the user needs to change the database storage format, a database administrator (DBA, Database Administrator) must modify it manually and offline.
Summary of the invention
Embodiments of the present invention provide a method and an apparatus for converting a data storage format, which enable a database system to dynamically determine the underlying storage format of the database according to the load, thereby realizing automatic adjustment and optimization of the system, reducing the throughput of query statements, and improving the storage space utilization of the system.
To achieve the above objective, the embodiments of the present invention adopt the following technical solutions:
In a first aspect, an embodiment of the present invention provides a controller, including:
a decision unit, configured to: if the system performance indicators of a database storing data in a first storage format satisfy a conversion condition set by a user, determine a second storage format required for storing the data in the database;
a storage format conversion unit, configured to convert the storage format of the data in the database from the first storage format to the second storage format;
a feedback unit, configured to: determine whether the compression ratio of the database after the storage format conversion satisfies a first preset condition, sort the data in the converted database, and test whether the sorting time satisfies a second preset condition; if the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, provide external service; or, if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, send feedback information to the decision unit, so that the decision unit re-determines the second storage format required for storing the data in the database according to the core indicator threshold to be set by the user in the feedback information.
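The feedback unit's check can be sketched as a single function. This is a hedged illustration rather than the patent's implementation: `validate_conversion` is a hypothetical name, the compression ratio is assumed to be the converted size divided by the original size (so smaller is better), both preset conditions are assumed to be "less than or equal to a threshold", and Python's built-in `sorted` stands in for the controller's test sort.

```python
import time


def validate_conversion(size_before, size_after, values, max_ratio, max_sort_seconds):
    """Return (ok, compression_ratio, sort_seconds) for a converted table."""
    # Assumed definition: converted size relative to the original size.
    compression_ratio = size_after / size_before
    start = time.perf_counter()
    sorted(values)  # simple test sort; only its duration matters here
    sort_seconds = time.perf_counter() - start
    ok = compression_ratio <= max_ratio and sort_seconds <= max_sort_seconds
    return ok, compression_ratio, sort_seconds
```

When `ok` is false, the caller would send feedback to the decision unit instead of resuming service.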
In a first possible implementation of the first aspect, the decision unit is further configured to: before determining the second storage format required for storing the data in the database, determine whether the system performance indicators satisfy the core indicator threshold set by the user; and if the system performance indicators satisfy the core indicator threshold, determine whether the system performance indicators satisfy the conversion condition.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the decision unit is further configured to: if the system performance indicators do not satisfy the conversion condition, keep the format of the data stored in the database as the first storage format.
With reference to the first aspect or either of the first and second possible implementations of the first aspect, in a third possible implementation of the first aspect, the controller further includes a data collection unit; the data collection unit is configured to collect the system performance indicators.
With reference to the first aspect or any one of the first to third possible implementations of the first aspect, in a fourth possible implementation of the first aspect, the decision unit is further configured to: after determining the second storage format required for storing the data in the database, determine, according to user configuration information, the conversion time at which the storage format of the data in the database is converted from the first storage format to the second storage format.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the storage format conversion unit is specifically configured to: reorganize the data in the database in a buffer according to the second storage format and the conversion time, and if the amount of data in the buffer reaches the disk write threshold, write the data in the buffer to disk.
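The buffer-and-flush behavior can be sketched for the row-to-column case. The sketch makes several assumptions that the text does not state: the disk write threshold is counted in rows, `write_to_disk` is a caller-supplied callback, the function name `convert_rows_to_columns` is hypothetical, and a final flush of any partial batch is added so that no rows are lost.

```python
def convert_rows_to_columns(rows, column_names, write_threshold, write_to_disk):
    """Regroup row tuples into per-column lists, flushing at the threshold."""
    buffer = {name: [] for name in column_names}
    buffered_rows = 0
    for row in rows:
        # Reorganize: append each field to the list for its column.
        for name, value in zip(column_names, row):
            buffer[name].append(value)
        buffered_rows += 1
        if buffered_rows >= write_threshold:
            write_to_disk(buffer)
            buffer = {name: [] for name in column_names}
            buffered_rows = 0
    if buffered_rows:
        write_to_disk(buffer)  # assumption: flush the final partial batch
```

Passing `list.append` of an in-memory list as `write_to_disk` is a convenient way to inspect the batches without touching a real disk.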
With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the storage format conversion unit is specifically configured to: if the second storage format of the data in the buffer is single-column storage, convert the storage format of the data in the buffer from the first storage format to single-column storage, and compress and store the data in the buffer; or, if the second storage format of the data in the buffer is row-column hybrid storage or row storage, convert the storage format of the data in the buffer from the first storage format to row-column hybrid storage or row storage, and store the data in the buffer.
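The format-dependent handling, compressing only single-column blocks, can be sketched as follows. The choice of JSON for serialization and zlib for compression is purely illustrative, and `encode_block` and the format labels are hypothetical; the text does not name a serialization scheme or a compression algorithm.

```python
import json
import zlib


def encode_block(values, target_format):
    """Serialize a buffered block; compress only single-column blocks."""
    payload = json.dumps(values).encode("utf-8")
    if target_format == "single_column":
        # Similar values within one column tend to compress well.
        return zlib.compress(payload)
    if target_format in ("hybrid", "row"):
        return payload  # stored without compression
    raise ValueError(f"unknown storage format: {target_format!r}")
```

A single-column block is read back with `json.loads(zlib.decompress(block))`, while hybrid and row blocks are read with `json.loads(block)`.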
With reference to the fifth possible implementation of the first aspect, in a seventh possible implementation of the first aspect, the controller further includes a reading unit;
the reading unit is configured to, before the storage format conversion unit reorganizes the data in the database in the buffer according to the second storage format and the conversion time, read the data in the database row by row if the first storage format is row storage.
With reference to the first aspect or any one of the first to seventh possible implementations of the first aspect, in an eighth possible implementation of the first aspect, the data collection unit is further configured to, after external service is provided because the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, collect the data throughput and the query statement response time of the database whose storage format has been converted.
With reference to the first aspect or any one of the first to eighth possible implementations of the first aspect, in a ninth possible implementation of the first aspect, the system performance indicator includes at least the data volume, the average amount of data accessed per query, the ratio of processed rows to read rows, the average ratio of columns accessed per query, and the proportion of query statements.
According to a second aspect, an embodiment of the present invention provides a controller, including:
a processor, configured to: if a system performance indicator of a database that stores data in a first storage format satisfies a conversion condition set by a user, determine a second storage format required for storing data in the database; after a format converter converts the storage format of the data in the database from the first storage format to the second storage format, determine whether the compression ratio of the converted database satisfies a first preset condition, sort the data in the converted database, and test whether the sorting time satisfies a second preset condition; and if the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, provide external service, or, if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, re-determine, according to a core indicator threshold to be set by the user, the second storage format required for storing data in the database; and
the format converter, configured to convert the storage format of the data in the database from the first storage format to the second storage format determined by the processor.
In a first possible implementation of the second aspect, the processor is further configured to, before determining the second storage format required for storing data in the database, determine whether the system performance indicator satisfies a core indicator threshold set by the user, and, if the system performance indicator satisfies the core indicator threshold, determine whether the system performance indicator satisfies the conversion condition.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the processor is further configured to keep the first storage format as the format in which data is stored in the database if the system performance indicator does not satisfy the conversion condition.
With reference to the second aspect or either of the first and second possible implementations of the second aspect, in a third possible implementation of the second aspect, the controller further includes a data collector;
the data collector is configured to collect the system performance indicator.
With reference to the second aspect or any one of the first to third possible implementations of the second aspect, in a fourth possible implementation of the second aspect, the processor is further configured to, after determining the second storage format required for storing data in the database, determine, according to user configuration information, a conversion time at which the storage format of the data in the database is to be converted from the first storage format to the second storage format.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation of the second aspect, the format converter is specifically configured to reorganize the data in the database in a buffer according to the second storage format and the conversion time, and, if the amount of data in the buffer reaches a disk write threshold, write the data in the buffer to disk.
With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the format converter is specifically configured to: if the second storage format of the data in the buffer is single-column storage, convert the storage format of the data in the buffer from the first storage format to the single-column storage, and compress and store the data in the buffer; or, if the second storage format of the data in the buffer is hybrid row-column storage or row storage, convert the storage format of the data in the buffer from the first storage format to the hybrid row-column storage or the row storage, and store the data in the buffer.
With reference to the fifth possible implementation of the second aspect, in a seventh possible implementation of the second aspect, the processor is further configured to, before the format converter reorganizes the data in the database in the buffer according to the second storage format and the conversion time, read the data in the database row by row if the first storage format is row storage.
With reference to the second aspect or any one of the first to seventh possible implementations of the second aspect, in an eighth possible implementation of the second aspect, the data collector is further configured to, after external service is provided because the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, collect the data throughput and the query statement response time of the database whose storage format has been converted.
With reference to the second aspect or any one of the first to eighth possible implementations of the second aspect, in a ninth possible implementation of the second aspect, the system performance indicator includes at least the data volume, the average amount of data accessed per query, the ratio of processed rows to read rows, the average ratio of columns accessed per query, and the proportion of query statements.
According to a third aspect, an embodiment of the present invention provides a data storage format conversion method, including:
Step a: if a system performance indicator of a database that stores data in a first storage format satisfies a conversion condition set by a user, determining a second storage format required for storing data in the database;
Step b: converting the storage format of the data in the database from the first storage format to the second storage format;
Step c: determining whether the compression ratio of the converted database satisfies a first preset condition, sorting the data in the converted database, and testing whether the sorting time satisfies a second preset condition;
Step d: if the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, providing external service; or, if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, performing the foregoing steps again according to a core indicator threshold to be set by the user.
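Steps a to d form a decide, convert, verify loop. The following is a minimal Python sketch of that control flow, not the patented implementation: every callable passed in is a hypothetical stand-in for logic the text leaves open, the `attempt` counter stands in for "the core indicator threshold to be set by the user", and the `max_rounds` bound is an added assumption so that the loop always terminates.

```python
from typing import Callable, Optional


def adapt_storage_format(
    metrics: dict,
    meets_conversion_condition: Callable[[dict], bool],
    choose_format: Callable[[dict, int], str],
    convert: Callable[[str], None],
    compression_ratio_ok: Callable[[], bool],
    sort_time_ok: Callable[[], bool],
    max_rounds: int = 3,  # assumption: the source does not bound the retries
) -> Optional[str]:
    """Return the format that passed both checks, or None if none did."""
    # Step a (gate): do nothing unless the user-set conversion condition holds.
    if not meets_conversion_condition(metrics):
        return None
    for attempt in range(max_rounds):
        target = choose_format(metrics, attempt)  # step a: pick the second format
        convert(target)                           # step b: convert the data
        ratio_ok = compression_ratio_ok()         # step c: first preset condition
        sorted_ok = sort_time_ok()                # step c: second preset condition
        if ratio_ok and sorted_ok:                # step d: start serving
            return target
        # Step d (otherwise): fall through and decide again.
    return None
```

The function returns `None` both when the gate in step a is not passed and when no candidate format passes the checks in step c; a caller that needs to distinguish the two cases would have to report them separately.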
In a first possible implementation of the third aspect, before the determining a second storage format required for storing data in the database, the method further includes:
determining whether the system performance indicator satisfies a core indicator threshold set by the user; and
if the system performance indicator satisfies the core indicator threshold, determining whether the system performance indicator satisfies the conversion condition.
With reference to the first possible implementation of the third aspect, in a second possible implementation of the third aspect, if the system performance indicator does not satisfy the conversion condition, the first storage format is kept as the format in which data is stored in the database.
With reference to the third aspect or either of the first and second possible implementations of the third aspect, in a third possible implementation of the third aspect, the method further includes:
collecting the system performance indicator.
With reference to the third aspect or any one of the first to third possible implementations of the third aspect, in a fourth possible implementation of the third aspect, after the determining a second storage format required for storing data in the database, the method further includes:
determining, according to user configuration information, a conversion time at which the storage format of the data in the database is to be converted from the first storage format to the second storage format.
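The text does not say what the user configuration information contains. As one illustration only, assume it names a daily low-load time of day at which conversion may start; the conversion time is then the next occurrence of that time. The function name and the `window_start` parameter are hypothetical.

```python
from datetime import datetime, time, timedelta


def next_conversion_time(now: datetime, window_start: time) -> datetime:
    """Return the next occurrence of the user-configured start time."""
    candidate = datetime.combine(now.date(), window_start)
    if candidate <= now:
        # Today's window has already started or passed; use tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```

For example, with a configured start of 02:00, a decision made at noon schedules the conversion for 02:00 the following day, while a decision made at 01:00 schedules it for 02:00 the same day.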
With reference to the fourth possible implementation of the third aspect, in a fifth possible implementation of the third aspect, the converting the storage format of the data in the database from the first storage format to the second storage format specifically includes:
reorganizing the data in the database in a buffer according to the second storage format and the conversion time; and
if the amount of data in the buffer reaches a disk write threshold, writing the data in the buffer to disk.
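The buffer-and-flush behaviour described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the threshold is counted in records rather than bytes, and `sink` is a hypothetical callable standing in for the actual disk write.

```python
class ConversionBuffer:
    """Accumulate reorganized records and flush once a write threshold is hit."""

    def __init__(self, write_threshold, sink):
        self.write_threshold = write_threshold  # assumption: measured in records
        self.sink = sink                        # hypothetical stand-in for a disk write
        self.pending = []

    def add(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.write_threshold:
            self.flush()

    def flush(self):
        """Hand the buffered records to the sink and start a fresh buffer."""
        if self.pending:
            self.sink(self.pending)
            self.pending = []
```

A caller would invoke `flush()` once more after the last record so that a final, partially filled buffer is also written.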
With reference to the fifth possible implementation of the third aspect, in a sixth possible implementation of the third aspect, the writing the data in the buffer to disk specifically includes:
if the second storage format of the data in the buffer is single-column storage, converting the storage format of the data in the buffer from the first storage format to the single-column storage, and compressing and storing the data in the buffer; or
if the second storage format of the data in the buffer is hybrid row-column storage or row storage, converting the storage format of the data in the buffer from the first storage format to the hybrid row-column storage or the row storage, and storing the data in the buffer.
With reference to the fifth possible implementation of the third aspect, in a seventh possible implementation of the third aspect, before the reorganizing the data in the database in a buffer according to the second storage format and the conversion time, the method further includes:
if the first storage format is row storage, reading the data in the database row by row.
With reference to the third aspect or any one of the first to seventh possible implementations of the third aspect, in an eighth possible implementation of the third aspect, after the providing external service if the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, the method further includes:
collecting the data throughput and the query statement response time of the database whose storage format has been converted.
With reference to the third aspect or any one of the first to eighth possible implementations of the third aspect, in a ninth possible implementation of the third aspect, the system performance indicator includes at least the data volume, the average amount of data accessed per query, the ratio of processed rows to read rows, the average ratio of columns accessed per query, and the proportion of query statements.
Embodiments of the present invention provide a data storage format conversion method and apparatus. If a system performance indicator of a database that stores data in a first storage format satisfies a conversion condition set by a user, a controller determines a second storage format required for storing data in the database and converts the storage format of the data in the database from the first storage format to the second storage format. The controller then determines whether the compression ratio of the converted database satisfies a first preset condition, sorts the data in the converted database, and tests whether the sorting time satisfies a second preset condition. If the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, external service is provided; or, if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, the determination of the second storage format required for storing data in the database is performed again according to a core indicator threshold to be set by the user. With this solution, the controller monitors the actual running data of the system and continually determines the optimal storage format for the data in the database system; that is, the controller enables the database system to determine the storage format of its data dynamically according to load. This addresses the current problems that changing a database storage format requires manual offline modification by a database administrator and that system storage space and utilization are low. By deciding the data storage format autonomously, the solution reduces the throughput of query statements while improving the storage space and utilization of the system.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a first schematic structural diagram of a controller according to an embodiment of the present invention;
FIG. 2 is a second schematic structural diagram of a controller according to an embodiment of the present invention;
FIG. 3 is a third schematic structural diagram of a controller according to an embodiment of the present invention;
FIG. 4 is a fourth schematic structural diagram of a controller according to an embodiment of the present invention;
FIG. 5 is a schematic flowchart of a data storage format conversion method according to an embodiment of the present invention;
FIG. 6 is a schematic flowchart of a data storage format conversion method according to an embodiment of the present invention.

Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The techniques described herein apply to the database field, for example: dynamic optimization of the underlying data storage format of a database, data distribution in a database cluster, database materialization strategies, and database indexing strategies.
Currently, database-based applications fall into two main categories: OLTP (On-Line Transaction Processing) and OLAP (On-Line Analytical Processing). The former handles transactional queries that involve frequent "write" operations, whereas the latter focuses on analytical queries that involve a large number of "read" operations. Column storage has a clear advantage for read operations and suits OLAP queries well, but its support for write operations is not ideal, so it is not suitable for OLTP queries. Row storage supports OLTP queries very well.
The advantages of a row storage structure are fast data loading and high adaptability to dynamic workloads, because row storage guarantees that all fields of the same record are located on the same node. Its disadvantages are equally apparent. For example, it cannot support fast query processing, because when a query touches only a few columns of a table with many columns, it cannot skip reading the unnecessary columns. In addition, because columns holding different kinds of data values are mixed together, row storage cannot easily achieve a very high compression ratio, so space utilization is hard to improve substantially. Although a better compression ratio can be obtained through entropy coding and by exploiting column correlation, the more complex storage implementation increases decompression overhead.
Column storage stores the fields of the same record separately, so reconstructing records incurs considerable overhead. However, column storage avoids reading unnecessary columns, and compressing the similar data within one column can achieve a high compression ratio.
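The trade-off between the two layouts can be made concrete with a toy table. The Python sketch below is illustrative only: the table contents are invented, and run-length encoding is used as a deliberately simple stand-in for whatever compression scheme a real system applies.

```python
from itertools import groupby

# A toy table of (id, country, amount) records; the values are invented.
rows = [(i, "CN" if i < 3 else "US", i * 10) for i in range(6)]

# Column layout: one list per field instead of one tuple per record.
columns = [list(col) for col in zip(*rows)]


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Row layout interleaves fields of different kinds, so no runs form.
row_major = [field for row in rows for field in row]
row_runs = run_length_encode(row_major)

# Within a single column, similar values sit next to each other.
country_runs = run_length_encode(columns[1])

# Aggregating one field touches only that column in the column layout ...
amount_total = sum(columns[2])

# ... but rebuilding one record has to visit every column.
record_3 = tuple(col[3] for col in columns)
```

Here the interleaved row layout yields 18 runs for 18 values, while the country column collapses to two runs, which mirrors the compression argument above; rebuilding `record_3` shows the reconstruction cost that column storage pays.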
At present, various hybrid row-column storage methods, such as PAX and RCFile, have emerged that weigh the advantages and disadvantages of row storage and column storage; by optimizing the underlying storage format, they further improve system performance. However, the row, column, or hybrid row-column storage format of an existing database depends entirely on the setting made when the database is initialized, that is, on the underlying storage format that the user specifies when creating the database. When the user needs to change the database storage format, a database administrator (DBA) must modify it manually and offline; the system has no automatic tuning function.
The present invention provides a data storage format conversion method and apparatus that enable a database system to determine its underlying storage format dynamically according to load, thereby implementing automatic tuning by the system, reducing the throughput of query statements, and improving the storage space and utilization of the system.

Embodiment 1
An embodiment of the present invention provides a controller. As shown in FIG. 1, the controller includes:
a decision unit 10, configured to determine a second storage format required for storing data in a database if a system performance indicator of the database, which stores data in a first storage format, satisfies a conversion condition set by a user;
a storage format conversion unit 11, configured to convert the storage format of the data in the database from the first storage format to the second storage format determined by the decision unit 10; and
a feedback unit 12, configured to: determine whether the compression ratio of the converted database satisfies a first preset condition, sort the data in the converted database, and test whether the sorting time satisfies a second preset condition; and if the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, provide external service, or, if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, send feedback information to the decision unit 10, so that the decision unit re-determines, according to a core indicator threshold to be set by the user that is carried in the feedback information, the second storage format required for storing data in the database.
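The two checks performed by the feedback unit can be sketched as a single measurement function. This is a sketch under stated assumptions, not the unit's actual logic: the compression ratio is taken as original size divided by stored size, the first preset condition is modelled as a minimum ratio, and the second preset condition as a maximum sorting time; the function and parameter names are hypothetical.

```python
import time


def feedback_check(raw, stored, sort_values, min_ratio, max_sort_seconds):
    """Measure compression ratio and sorting time, then decide whether to serve."""
    # Assumption: ratio = uncompressed size / stored size (larger is better).
    ratio = len(raw) / len(stored)

    # Time a sort over the supplied values as the sorting test.
    start = time.perf_counter()
    sorted(sort_values)
    elapsed = time.perf_counter() - start

    return {
        "compression_ratio": ratio,
        "sort_seconds": elapsed,
        # Serve only if both preset conditions hold; otherwise the caller
        # would send feedback to the decision unit.
        "serve": ratio >= min_ratio and elapsed <= max_sort_seconds,
    }
```

When `serve` is false, the measured values are what a caller would pass back to the decision unit as feedback information.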
Further, the decision unit 10 is further configured to, before determining the second storage format required for storing data in the database, determine whether the system performance indicator satisfies a core indicator threshold set by the user, and, if the system performance indicator satisfies the core indicator threshold, determine whether the system performance indicator satisfies the conversion condition.
Further, the decision unit 10 is further configured to keep the first storage format as the format in which data is stored in the database if the system performance indicator does not satisfy the conversion condition.
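The two-stage gate applied by the decision unit can be sketched as below. Several points are assumptions rather than statements from the text: "satisfies the threshold" is modelled as each indicator being at least its threshold, the current format is also kept when the core threshold is not met (the text specifies this only for the conversion condition), and `conversion_condition` and `choose_format` are hypothetical callables.

```python
def decide_format(indicators, core_thresholds, conversion_condition,
                  current_format, choose_format):
    """Return the format to use next; keep the current one if a gate fails."""
    # Stage 1: every core indicator must reach its user-set threshold.
    # Assumption: "reach" means greater than or equal to.
    for name, threshold in core_thresholds.items():
        if indicators.get(name, 0) < threshold:
            return current_format
    # Stage 2: the user-set conversion condition must hold as well.
    if not conversion_condition(indicators):
        return current_format
    # Both gates passed: pick the second storage format.
    return choose_format(indicators)
```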
Further, as shown in FIG. 2, the controller further includes a data collection unit 13;
the data collection unit 13 is configured to collect the system performance indicator.
Further, the decision unit 10 is further configured to, after determining the second storage format required for storing data in the database, determine, according to user configuration information, a conversion time at which the storage format of the data in the database is to be converted from the first storage format to the second storage format.
Further, the storage format conversion unit 11 is specifically configured to reorganize the data in the database in a buffer according to the second storage format and the conversion time determined by the decision unit 10, and, if the amount of data in the buffer reaches a disk write threshold, write the data in the buffer to disk.
Further, the storage format conversion unit 11 is specifically configured to: if the second storage format of the data in the buffer is single-column storage, convert the storage format of the data in the buffer from the first storage format to the single-column storage, and compress and store the data in the buffer; or, if the second storage format of the data in the buffer is hybrid row-column storage or row storage, convert the storage format of the data in the buffer from the first storage format to the hybrid row-column storage or the row storage, and store the data in the buffer.
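The format-dependent write path can be sketched as follows. The sketch follows the distinction drawn in the text (compress only for single-column storage), but the details are assumptions: JSON serialization and zlib compression are arbitrary choices, the hybrid layout is modelled as fixed-size row groups stored column-wise, and the function returns the blobs instead of writing them to disk.

```python
import json
import zlib


def write_buffer(rows, target_format, group_size=2):
    """Serialize buffered rows for the target format and return the blobs."""
    if target_format == "column":
        # Single-column storage: one blob per column, compressed.
        columns = [list(col) for col in zip(*rows)]
        return [zlib.compress(json.dumps(col).encode("utf-8")) for col in columns]
    if target_format == "row":
        # Row storage: records kept whole, stored without compression.
        return [json.dumps([list(row) for row in rows]).encode("utf-8")]
    if target_format == "hybrid":
        # Hybrid storage (assumed layout): split rows into groups, then
        # store each group column-wise, without compression.
        blobs = []
        for start in range(0, len(rows), group_size):
            group = rows[start:start + group_size]
            columns = [list(col) for col in zip(*group)]
            blobs.append(json.dumps(columns).encode("utf-8"))
        return blobs
    raise ValueError(f"unknown storage format: {target_format}")
```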
Further, as shown in FIG. 2, the controller further includes a reading unit 14;
the reading unit 14 is configured to, before the storage format conversion unit 11 reorganizes the data in the database in the buffer according to the second storage format and the conversion time, read the data in the database row by row if the first storage format is row storage.
Further, the data collection unit 13 is further configured to, after external service is provided because the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, collect the data throughput and the query statement response time of the database whose storage format has been converted.
Further, the system performance indicator includes at least the data volume, the average amount of data accessed per query, the ratio of processed rows to read rows, the average ratio of columns accessed per query, and the proportion of query statements.
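The five indicators listed above could be derived from a log of executed statements. The sketch below is one possible reading, with the definitions stated as assumptions: "proportion of query statements" is taken as the share of read queries among all statements, and the `Statement` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    """One logged statement; the fields are illustrative assumptions."""
    is_query: bool            # True for a read query, False for a write
    bytes_accessed: int = 0
    rows_read: int = 0
    rows_processed: int = 0
    columns_accessed: int = 0


def collect_indicators(statements, total_columns, data_volume_bytes):
    """Compute the five system performance indicators from a statement log."""
    queries = [s for s in statements if s.is_query]
    n = len(queries)
    rows_read = sum(s.rows_read for s in queries)
    return {
        "data_volume_bytes": data_volume_bytes,
        "avg_bytes_per_query":
            sum(s.bytes_accessed for s in queries) / n if n else 0.0,
        "processed_to_read_row_ratio":
            sum(s.rows_processed for s in queries) / rows_read if rows_read else 0.0,
        "avg_column_ratio":
            sum(s.columns_accessed / total_columns for s in queries) / n if n else 0.0,
        "query_share": n / len(statements) if statements else 0.0,
    }
```

A workload with a high `query_share` and a low `avg_column_ratio` is the kind of read-heavy, narrow-projection load for which the background section says column storage is well suited.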
The present invention provides a controller that mainly includes a decision unit, a storage format conversion unit, and a feedback unit. If a system performance indicator of a database that stores data in a first storage format satisfies a conversion condition set by a user, the decision unit determines a second storage format required for storing data in the database. The storage format conversion unit then converts the storage format of the data in the database from the first storage format to the second storage format. Finally, the feedback unit determines whether the compression ratio of the converted database satisfies a first preset condition, sorts the data in the converted database, and tests whether the sorting time satisfies a second preset condition. If the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, external service is provided; or, if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, the feedback unit sends feedback information to the decision unit, so that the decision unit re-determines, according to a core indicator threshold to be set by the user that is carried in the feedback information, the second storage format required for storing data in the database. With this solution, the controller monitors the actual running data of the system and continually determines the optimal storage format for the data in the database system; that is, the controller enables the database system to determine the storage format of its data dynamically according to load. This addresses the current problems that changing a database storage format requires manual offline modification by a database administrator and that system storage space and utilization are low. By deciding the data storage format autonomously, the solution reduces the throughput of query statements while improving the storage space and utilization of the system.

Embodiment 2
An embodiment of the present invention provides a controller which, as shown in FIG. 3, includes:
a processor 20, configured to: determine a second storage format required for storing data in a database if the system performance indicators of the database, which stores data in a first storage format, satisfy a conversion condition set by the user; after the format converter 21 converts the storage format of the data in the database from the first storage format to the second storage format, determine whether the compression ratio of the converted database satisfies a first preset condition, sort the data in the converted database, and test whether the sorting time satisfies a second preset condition; provide external service if the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition; or, if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, re-determine the second storage format required for storing data in the database according to the core indicator thresholds to be set by the user;
a format converter 21, configured to convert the storage format of the data in the database from the first storage format to the second storage format determined by the processor 20.
Further, the processor 20 is also configured to, before determining the second storage format required for storing data in the database, determine whether the system performance indicators satisfy the core indicator thresholds set by the user, and, if the system performance indicators satisfy the core indicator thresholds, determine whether the system performance indicators satisfy the conversion condition.
Further, the processor 20 is also configured to keep the first storage format as the format of the data stored in the database if the system performance indicators do not satisfy the conversion condition.
Further, as shown in FIG. 4, the controller also includes a data collector 22. The data collector 22 is configured to collect the system performance indicators.
Further, the processor 20 is also configured to, after determining the second storage format required for storing data in the database, determine according to user configuration information the conversion time at which the storage format of the data in the database is converted from the first storage format to the second storage format.
Further, the format converter 21 is specifically configured to reorganize the data of the database in a buffer according to the second storage format and the conversion time, and to write the data in the buffer to disk once the amount of data in the buffer reaches the disk write threshold.
Further, the format converter 21 is specifically configured to: if the second storage format of the data in the buffer is single-column storage, convert the storage format of the data in the buffer from the first storage format to single-column storage, and compress and store the data in the buffer; or, if the second storage format of the data in the buffer is row-column hybrid storage or row storage, convert the storage format of the data in the buffer from the first storage format to row-column hybrid storage or row storage, and store the data in the buffer.
Further, the processor 20 is also configured to read the data in the database row by row if the first storage format is row storage, before the format converter 21 reorganizes the data of the database in the buffer according to the second storage format and the conversion time.
Further, the data collector 22 is configured to collect the data throughput and the query statement response time of the converted database after external service is provided, that is, after the compression ratio has satisfied the first preset condition and the sorting time has satisfied the second preset condition.
Further, the system performance indicators include at least: the data volume, the average amount of data accessed per query, the ratio of processed rows to read rows, the average ratio of columns accessed per query, and the proportion of query statements.
An embodiment of the present invention provides a controller that mainly includes a processor and a format converter. If the system performance indicators of a database that stores data in a first storage format satisfy a conversion condition set by the user, the controller determines a second storage format required for storing the data in the database and converts the storage format of the data in the database from the first storage format to the second storage format. The controller then determines whether the compression ratio of the converted database satisfies a first preset condition, sorts the data in the converted database, and tests whether the sorting time satisfies a second preset condition. If the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, external service is provided. Otherwise, if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, the controller again determines the second storage format required for storing data in the database, according to the core indicator thresholds to be set by the user. With this solution, the controller monitors the actual running data of the system and continually determines the optimal storage format for the data in the database system; that is, the controller enables the database system to determine the storage format of its data dynamically according to the load. This solves the current problems that changing the database storage format requires manual offline modification by a database administrator and that system storage space utilization is low. By deciding the data storage format automatically, the solution reduces the data throughput of query statements and improves the storage space and utilization of the system.
Embodiment 3
An embodiment of the present invention provides a method for converting a data storage format. As shown in FIG. 5, the method includes:
S101. If the system performance indicators of a database that stores data in a first storage format satisfy a conversion condition set by the user, the controller determines a second storage format required for storing the data in the database.
In existing databases, the row, column, or row-column hybrid storage format depends entirely on the initial settings made when the database is initialized, that is, on the underlying storage format the user specifies when creating the database. When the user needs to change the database storage format, a DBA has to modify it manually offline; there is no automatic tuning by the system.
To solve the problem that the system cannot automatically tune the storage format of the data in the database, the present invention provides a method for converting a data storage format. The method enables the database system to determine the underlying storage format of the database dynamically according to the load, and thereby provides automatic tuning by the system.
In practice, OLTP applications and OLAP applications of a database have their respective strengths in write operations and read operations. To balance the advantages and disadvantages of row storage and column storage, various combined row-column storage schemes have emerged, bringing OLTP and OLAP together. In an application environment oriented to the convergence of OLTP and OLAP, the database is initialized in row format.
To convert the data storage format in the database system, the controller must first determine the storage format required for storing the data in the database, that is, determine in which format the data stored in the database should be stored.
Specifically, if the system performance indicators of the database that stores data in the first storage format satisfy the conversion condition set by the user, the controller determines the second storage format required for storing the data in the database.
The system performance indicators are collected by the controller over a given time period and include at least: the data volume, the average amount of data accessed per query, the ratio of processed rows to read rows, the average ratio of columns accessed per query, and the proportion of query statements.
Further, before determining the second storage format required for storing data in the database, the controller needs to determine whether the system performance indicators of the database satisfy the conversion condition set by the user. If the system performance indicators do not satisfy the conversion condition, the controller takes no action and the data in the database remains stored in the first storage format.
For example, suppose the conversion condition set by the user is that any column whose access frequency (number of accesses to the column divided by number of accesses to the table) reaches 80% may be converted to column storage. When the controller finds from the collected data that the access frequency of the n-th column of the database has reached 80%, the controller determines that the data of the n-th column is to be stored in column storage format.
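The access-frequency condition in this example can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `columns_to_convert`, the counter layout, and the column names are hypothetical, and the 80% threshold is simply the value used in the example.

```python
from collections import Counter


def columns_to_convert(table_accesses, column_accesses, threshold=0.8):
    """Return the columns whose access frequency reaches the threshold.

    Access frequency = (accesses to the column) / (accesses to the table),
    as defined in the example. All names here are illustrative assumptions.
    """
    if table_accesses <= 0:
        return []  # no traffic observed, so nothing qualifies
    return sorted(
        col
        for col, count in column_accesses.items()
        if count / table_accesses >= threshold
    )


# Hypothetical counters gathered over one collection period.
accesses = Counter({"order_id": 95, "amount": 80, "note": 12})
print(columns_to_convert(100, accesses))
```

Columns that qualify would then be candidates for column storage, while the remaining columns stay in their current format.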
Correspondingly, while determining the second storage format required for storing data in the database, the controller needs to work out which tables in the database require row-to-column conversion, and to determine which columns should be stored together in aggregate and which columns should be stored separately. For example, the column that is most frequently accessed on its own is stored separately in column format.
Further, after determining the second storage format required for storing data in the database, the controller also needs to determine, according to user configuration information, the conversion time at which the storage format of the data in the database is converted from the first storage format to the second storage format.
Specifically, the controller may perform the storage format conversion when the load is idle, as judged from the system performance indicators. Alternatively, after determining the second storage format required for storing data in the database, the controller may notify the user that conversion is possible and display the second storage format, and then perform the conversion once the user enters a command.
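The two timing options described above could be modeled as a small policy check. This is only a sketch under stated assumptions: the source does not say how "idle" is measured, so the `load` value, the `idle_load_threshold` default, and the names `ConversionMode` and `should_convert_now` are all hypothetical.

```python
from enum import Enum


class ConversionMode(Enum):
    """User-configured timing policy (names are illustrative)."""
    WHEN_IDLE = "when_idle"              # convert when the load is idle
    ON_USER_COMMAND = "on_user_command"  # convert after the user confirms


def should_convert_now(mode, load, idle_load_threshold=0.2, user_confirmed=False):
    """Decide whether the conversion may start at this moment.

    `load` is assumed to be a utilization figure between 0 and 1; the
    threshold below which the system counts as idle is an assumption.
    """
    if mode is ConversionMode.WHEN_IDLE:
        return load <= idle_load_threshold
    # ON_USER_COMMAND: the proposed format has been shown to the user,
    # and the conversion waits for an explicit command.
    return user_confirmed


print(should_convert_now(ConversionMode.WHEN_IDLE, load=0.1))
print(should_convert_now(ConversionMode.ON_USER_COMMAND, load=0.1))
```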
S102. The controller converts the storage format of the data in the database from the first storage format to the second storage format.
After determining the second storage format required for storing the data in the database, the controller converts the storage format of the data in the database from the first storage format to the second storage format.
Specifically, after determining the second storage format and the conversion time, the controller reorganizes the data of the database in a buffer according to the second storage format. When the amount of data in the buffer reaches the disk write threshold, the controller writes the data in the buffer to disk.
Further, the controller handles the data in the buffer differently depending on its second storage format. If the second storage format of the data in the buffer is single-column storage, the controller converts the storage format of the data in the buffer from the first storage format to single-column storage, and compresses and stores the data in the buffer. Alternatively, if the second storage format of the data in the buffer is row-column hybrid storage or row storage, the controller converts the storage format of the data in the buffer from the first storage format to row-column hybrid storage or row storage, and stores the data in the buffer.
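The buffered reorganization in step S102 can be illustrated with a short sketch. It makes several simplifying assumptions that are not in the source: the disk write threshold is counted in rows rather than bytes, blocks are serialized with JSON, compression uses zlib, hybrid storage is not split into column groups, and `write_block` is a hypothetical callback standing in for the disk write.

```python
import json
import zlib


def _flush(buffer, columns, target_format, write_block):
    """Regroup the buffered rows into the target format and write them out."""
    if target_format == "single_column":
        # Single-column storage: one block per column, compressed before storing.
        for index, name in enumerate(columns):
            values = [row[index] for row in buffer]
            write_block(name, zlib.compress(json.dumps(values).encode("utf-8")))
    else:
        # Row or hybrid storage: stored without the compression step
        # (simplified here to a single uncompressed block).
        write_block(target_format, json.dumps(buffer).encode("utf-8"))


def convert_rows(rows, columns, target_format, write_threshold, write_block):
    """Read row-ordered data, buffer it, and flush whenever the threshold is hit."""
    buffer = []
    for row in rows:  # the row-format source is read row by row
        buffer.append(row)
        if len(buffer) >= write_threshold:
            _flush(buffer, columns, target_format, write_block)
            buffer = []
    if buffer:  # write whatever is left once the source is exhausted
        _flush(buffer, columns, target_format, write_block)


written = []
convert_rows(
    rows=[(1, "a"), (2, "b"), (3, "c")],
    columns=["id", "name"],
    target_format="single_column",
    write_threshold=2,
    write_block=lambda key, payload: written.append((key, payload)),
)
print([key for key, _ in written])
```

With a threshold of two rows, the three input rows are flushed in two passes, each producing one compressed block per column.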
S103. The controller determines whether the compression ratio of the converted database satisfies the first preset condition, sorts the data in the converted database, and tests whether the sorting time satisfies the second preset condition.
After converting the storage format of the data in the database from the first storage format to the second storage format, the controller needs to check whether the conversion is reasonable and whether it actually optimizes the database.
Specifically, the controller assesses the change in size of the tables in the database that have been converted to column format, and measures the sorting time with a simple sort. The size of a table directly reflects the compression ratio of the database: the higher the compression ratio, the higher the space utilization. The sorting time reflects how much CPU and memory is consumed, and column storage can improve sorting efficiency. Therefore, to check whether the storage format conversion is reasonable, the controller determines whether the compression ratio of the converted database satisfies the first preset condition, sorts the data in the converted database, and tests whether the sorting time satisfies the second preset condition.
Here, the first preset condition is that the compression ratio of the converted database is less than or equal to a first preset threshold, and the second preset condition is that the sorting time is less than or equal to a second preset threshold.
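The two checks of step S103 could be expressed as follows. The source does not define how the compression ratio is computed; this sketch assumes converted size divided by original size, so that a smaller value means better compression and fits the "less than or equal to" test. The function names and threshold values are hypothetical.

```python
import time


def compression_ratio(size_after, size_before):
    """Assumed definition: converted size / original size (smaller is better)."""
    return size_after / size_before


def timed_sort(values):
    """Run a simple sort and return the elapsed time in seconds."""
    start = time.perf_counter()
    sorted(values)
    return time.perf_counter() - start


def conversion_accepted(ratio, sort_seconds, max_ratio, max_sort_seconds):
    """First preset condition AND second preset condition must both hold."""
    return ratio <= max_ratio and sort_seconds <= max_sort_seconds


ratio = compression_ratio(size_after=400, size_before=1000)
elapsed = timed_sort([5, 3, 8, 1])
print(conversion_accepted(ratio, elapsed, max_ratio=0.5, max_sort_seconds=1.0))
```

If either condition fails, the conversion is treated as unsuccessful and the flow continues with the re-determination described in step S105.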
S104. If the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, the controller provides external service.
Specifically, if the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, the current storage format conversion improves the performance of the database system, and the controller provides external service.
Further, after external service begins, the controller needs to keep observing and collecting the relevant indicators to judge whether the performance of the database system has been optimized, that is, whether the data in the database system needs to be split further into columns, and to avoid splitting columns that do not need to be split.
The indicators collected by the controller include data throughput and query statement response time. Data throughput reflects whether redundant data reads have been reduced; if the change in data throughput is not significant, further splitting into columns may be needed. Query statement response time is a direct indicator of whether column storage is effective.
S105. If the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, the controller performs the next data storage format conversion according to the core indicator thresholds to be set by the user.
If the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, the second storage format determined by the controller cannot effectively improve the performance of the database system. The controller then needs to perform the next data storage format conversion according to the core indicator thresholds to be set by the user, re-determining the second storage format of the data.
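Steps S101 to S105 together form a feedback loop, which the following sketch outlines. It is a control-flow illustration only: each step is passed in as a callable, all names are hypothetical, and the `max_rounds` bound is an assumption added so the loop terminates (the source does not limit the number of retries).

```python
def run_conversion_cycle(choose_format, convert, evaluate, thresholds,
                         adjust_thresholds, max_rounds=3):
    """Run the determine/convert/evaluate loop and report how it ended.

    Returns a (status, round_number) tuple. All callables are stand-ins
    for the controller's real steps.
    """
    for round_no in range(1, max_rounds + 1):
        target = choose_format(thresholds)          # S101: pick the second format
        if target is None:
            return ("unchanged", round_no)          # conversion condition not met
        convert(target)                             # S102: convert the data
        if evaluate(target):                        # S103: compression and sort checks
            return ("serving", round_no)            # S104: provide external service
        thresholds = adjust_thresholds(thresholds)  # S105: retry with new thresholds
    return ("gave_up", max_rounds)


converted = []
result = run_conversion_cycle(
    choose_format=lambda t: "column",
    convert=converted.append,
    evaluate=lambda target: len(converted) >= 2,  # stub: second attempt passes
    thresholds={"Ta": 0.3},
    adjust_thresholds=lambda t: t,
)
print(result)
```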
An embodiment of the present invention provides a method for converting a data storage format. If the system performance indicators of a database that stores data in a first storage format satisfy a conversion condition set by the user, the controller determines a second storage format required for storing the data in the database and converts the storage format of the data in the database from the first storage format to the second storage format. The controller then determines whether the compression ratio of the converted database satisfies a first preset condition, sorts the data in the converted database, and tests whether the sorting time satisfies a second preset condition. If the compression ratio satisfies the first preset condition and the sorting time satisfies the second preset condition, external service is provided. Otherwise, if the compression ratio does not satisfy the first preset condition and/or the sorting time does not satisfy the second preset condition, the controller again determines the second storage format required for storing data in the database, according to the core indicator thresholds to be set by the user. With this solution, the controller monitors the actual running data of the system and continually determines the optimal storage format for the data in the database system; that is, the controller enables the database system to determine the storage format of its data dynamically according to the load. This solves the current problems that changing the database storage format requires manual offline modification by a database administrator and that system storage space utilization is low. By deciding the data storage format automatically, the solution reduces the data throughput of query statements and improves the storage space and utilization of the system.
Embodiment 4
S201. The controller collects the system performance indicators of a database that stores data in a first storage format.
In existing databases, the row, column, or row-column hybrid storage format depends entirely on the initial settings made when the database is initialized, that is, on the underlying storage format the user specifies when creating the database. When the user needs to change the database storage format, a DBA has to modify it manually offline; there is no automatic tuning by the system.
To solve the problem that the system cannot automatically tune the storage format of the data in the database, the present invention provides a method for converting a data storage format. The method enables the database system to determine the underlying storage format of the database dynamically according to the load, and thereby provides automatic tuning by the system.
In practice, OLTP applications and OLAP applications of a database have their respective strengths in write operations and read operations. To balance the advantages and disadvantages of row storage and column storage, various combined row-column storage schemes have emerged, bringing OLTP and OLAP together. In an application environment oriented to the convergence of OLTP and OLAP, the database is initialized in row format.
Specifically, to be able to determine the underlying storage format of the database dynamically, the controller first collects the system performance indicators of the database that stores data in the first storage format, so that it can determine from the collected indicators the storage format required for storing the data in the database.
The system performance indicators include at least: the data volume, the average amount of data accessed per query, the ratio of processed rows to read rows, the average ratio of columns accessed per query, and the proportion of query statements. Specifically:
The data volume is an important indicator of whether to use column storage. The larger the data volume and the more queries there are, the more suitable column storage is. The data volume here is the size of the data in the entire database;
The average amount of data accessed per query is the average number of data rows used by each query on the database. Scenarios with a large average amount of accessed data are suited to column storage;
The ratio of processed rows to read rows is the proportion of rows actually used per operation, on average, to all rows read. When data in an analytical database is read, some data is read in from disk but is never actually analyzed by the system. Ideally, every row read in would be processed by the system, so the higher this ratio, the more suitable column storage is;
The average ratio of columns accessed per query is the proportion of columns accessed by an average query statement to the total number of columns. The smaller this ratio, the more suitable column storage is;
The proportion of query statements is the share of query operations among all database operations. The closer this ratio is to 100%, the more suitable column storage is.
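The five indicators defined above could be computed from an operation log along the following lines. This is a sketch under assumptions: the `Operation` record, its fields, and the choice to measure "data accessed per query" as rows read are all hypothetical, since the source does not describe how the collector obtains these figures.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    """One logged database operation (hypothetical record layout)."""
    is_query: bool
    rows_read: int
    rows_processed: int
    columns_accessed: int


@dataclass
class Metrics:
    data_volume: int                 # size of the whole database
    avg_rows_accessed: float         # average rows accessed per query
    processed_to_read_ratio: float   # processed rows / read rows
    avg_column_ratio: float          # columns accessed / total columns, per query
    query_share: float               # queries / all operations


def collect_metrics(ops, data_volume, total_columns):
    """Aggregate the five indicators over one collection period."""
    queries = [op for op in ops if op.is_query]
    if not queries or total_columns <= 0:
        raise ValueError("need at least one query and one column")
    rows_read = sum(q.rows_read for q in queries)
    rows_processed = sum(q.rows_processed for q in queries)
    return Metrics(
        data_volume=data_volume,
        avg_rows_accessed=rows_read / len(queries),
        processed_to_read_ratio=rows_processed / rows_read if rows_read else 0.0,
        avg_column_ratio=sum(q.columns_accessed / total_columns for q in queries)
        / len(queries),
        query_share=len(queries) / len(ops),
    )


log = [
    Operation(True, 100, 50, 2),
    Operation(True, 300, 150, 4),
    Operation(False, 0, 0, 0),
    Operation(False, 0, 0, 0),
]
print(collect_metrics(log, data_volume=1000, total_columns=8))
```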
S202. The controller determines whether the system performance indicators satisfy the core indicator thresholds set by the user.
After collecting the system performance indicators of the database that stores data in the first storage format, the controller analyzes these indicators.
Specifically, the controller determines whether the system performance indicators satisfy the core indicator thresholds set by the user; that is, the controller performs decision analysis on the system performance indicators according to the core indicator thresholds and the decision algorithm set by the user.
Optionally, if the core indicator thresholds set by the user include a threshold Ta for the average ratio of columns accessed per query, a threshold Tq for the proportion of query statements, and a threshold Tp for the ratio of processed rows to read rows, the decision algorithm set by the user may be: (average ratio of columns accessed per query < Ta) and/or (proportion of query statements > Tq) and/or (ratio of processed rows to read rows > Tp).
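The optional decision algorithm can be written as three comparisons joined by a user-chosen combinator. In this sketch the "and/or" in the text is modeled by passing Python's built-in `all` or `any`; the function name and the sample threshold values are hypothetical.

```python
def meets_core_thresholds(column_ratio, query_share, processed_ratio,
                          ta, tq, tp, combine=all):
    """Evaluate the three core-indicator checks.

    combine=all requires every check to hold ("and");
    combine=any accepts a single passing check ("or").
    """
    checks = (
        column_ratio < ta,     # average ratio of columns accessed per query < Ta
        query_share > tq,      # proportion of query statements > Tq
        processed_ratio > tp,  # ratio of processed rows to read rows > Tp
    )
    return combine(checks)


# Hypothetical thresholds: Ta = 0.3, Tq = 0.7, Tp = 0.6.
print(meets_core_thresholds(0.2, 0.9, 0.8, ta=0.3, tq=0.7, tp=0.6))
print(meets_core_thresholds(0.5, 0.9, 0.4, ta=0.3, tq=0.7, tp=0.6, combine=any))
```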
S203. If the system performance indicators satisfy the core indicator thresholds, the controller determines whether the system performance indicators satisfy the conversion condition set by the user.
If the controller determines that the system performance indicators satisfy the core indicator thresholds, that is, that they satisfy the decision algorithm set by the user, the controller determines whether the indicators satisfy the conversion condition set by the user. Only when the indicators satisfy the conversion condition can the controller determine the storage format required for storing the data in the database.
For example, suppose the conversion condition set by the user is that any column whose access frequency (number of accesses to the column divided by number of accesses to the table) reaches 80% may be converted to column storage. Only when the controller finds from the collected data that the access frequency of the n-th column of the database has reached 80% can it determine that the data of the n-th column is to be stored in column storage format.
S204. If the system performance indicators satisfy the conversion condition, the controller determines the second storage format required for storing the data in the database.
Specifically, if the system performance indicators of the database that stores data in the first storage format satisfy the conversion condition set by the user, the storage format of the data in the database can be converted, and the controller can determine the second storage format required for storing the data in the database.
Correspondingly, while determining the second storage format required for storing data in the database, the controller needs to work out which tables in the database require row-to-column conversion, and to determine which columns should be stored together in aggregate and which columns should be stored separately. For example, the column that is most frequently accessed on its own is stored separately in column format.
S205. The controller determines, according to user configuration information, the conversion time at which the storage format of the data in the database is converted from the first storage format to the second storage format.
After determining the second storage format required for storing data in the database, the controller also needs to determine, according to user configuration information, the conversion time at which the storage format of the data in the database is converted from the first storage format to the second storage format.
Specifically, the controller may use the system performance indicators to perform the storage format conversion at a time when the load is idle. Alternatively, after determining the second storage format required for storing data in the database, it may notify the user that a conversion is possible and display the second storage format to the user, and then perform the conversion once the user enters a command.
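The two triggers described for S205 can be sketched as a small decision function. Everything named here is an assumption made for illustration: the mode strings `"idle"` and `"manual"`, the normalized `load` value, and the `idle_threshold` default are not specified in the text, which only says that the conversion time is derived from user configuration.

```python
def conversion_due(mode, load, idle_threshold=0.2, user_confirmed=False):
    """Decide whether the storage format conversion may start now.

    mode "idle":   convert when the measured load is at or below the
                   (hypothetical) idle threshold.
    mode "manual": convert only after the user, having been shown the
                   proposed second storage format, enters the command.
    """
    if mode == "idle":
        return load <= idle_threshold
    if mode == "manual":
        return user_confirmed
    raise ValueError(f"unknown conversion mode: {mode!r}")


print(conversion_due("idle", load=0.05))                          # prints True
print(conversion_due("manual", load=0.05, user_confirmed=False))  # prints False
```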
S206. The controller converts the storage format of the data in the database from the first storage format to the second storage format.
After determining the second storage format required for storing data in the database, the controller converts the storage format of the data in the database from the first storage format to the second storage format.
Specifically, after determining the second storage format and the conversion time, the controller reorganizes the data in the database in a buffer according to the second storage format. When the amount of data in the buffer reaches the disk write threshold, the controller writes the data in the buffer to disk. If the first storage format is row storage, the controller first reads the data in the database row by row, and only then reorganizes it in the buffer according to the second storage format.
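The buffered reorganization in S206 can be sketched as below, assuming the first format is row storage and the second is column storage. The function name, the `write_to_disk` callback, and counting the threshold in buffered values are hypothetical choices. The final flush of a partially filled buffer is also an assumption, since the text only describes writing when the threshold is reached.

```python
def convert_rows_to_columns(rows, column_names, write_threshold, write_to_disk):
    """Regroup row-wise records into per-column buffers, flushing in batches."""
    buffer = {name: [] for name in column_names}
    buffered_values = 0
    for row in rows:  # the first format is row storage, so read row by row
        for name, value in zip(column_names, row):
            buffer[name].append(value)
        buffered_values += len(column_names)
        if buffered_values >= write_threshold:
            write_to_disk(buffer)  # disk write threshold reached
            buffer = {name: [] for name in column_names}
            buffered_values = 0
    if buffered_values:
        write_to_disk(buffer)  # assumed: flush the final partial batch


flushed = []
convert_rows_to_columns(
    rows=[(1, "a"), (2, "b"), (3, "c")],
    column_names=["id", "tag"],
    write_threshold=4,
    write_to_disk=flushed.append,
)
print(flushed)
```

With a threshold of four values, the first two rows are written as one column-organized batch and the third row as a second batch.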
Further, the controller processes the data in the buffer differently depending on the second storage format. If the second storage format of the data in the buffer is single-column storage, the controller converts the storage format of the data in the buffer from the first storage format to single-column storage, and compresses and stores the data in the buffer. Alternatively, if the second storage format of the data in the buffer is hybrid row-column storage or row storage, the controller converts the storage format of the data in the buffer from the first storage format to hybrid row-column storage or row storage, and stores the data in the buffer.
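The format-dependent handling can be illustrated as follows. The text does not name a compression algorithm or a serialization; `zlib` and JSON are swapped in here purely so the sketch runs, and the format labels `"single_column"`, `"hybrid"`, and `"row"` are hypothetical.

```python
import json
import zlib


def store_buffer(buffer_values, second_format):
    """Serialize buffered values, compressing only single-column data."""
    payload = json.dumps(buffer_values).encode("utf-8")
    if second_format == "single_column":
        # Single-column storage: compress, then store.
        return zlib.compress(payload)
    if second_format in ("hybrid", "row"):
        # Hybrid row-column or row storage: store without compression.
        return payload
    raise ValueError(f"unknown storage format: {second_format!r}")


column_blob = store_buffer([7] * 1000, "single_column")
row_blob = store_buffer([7] * 1000, "row")
print(len(column_blob) < len(row_blob))  # prints True
```

Values within one column tend to be similar to each other, which is one common reason a design would compress column data but not row data.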
S207. The controller determines whether the compression ratio of the database after the storage format conversion satisfies a first preset condition, sorts the data in the database after the storage format conversion, and tests whether the sort time satisfies a second preset condition.
After converting the storage format of the data in the database from the first storage format to the second storage format, the controller needs to check whether the conversion is reasonable and whether it can optimize the database.
Specifically, the controller checks the change in size of the tables in the database that have been converted to column storage, and measures the sort time with a simple sort. The size of a table directly reflects the compression ratio of the database, and the higher the compression ratio, the higher the space utilization. The sort time reflects how much CPU and memory is consumed, and column storage can improve sorting efficiency. Therefore, to check whether the storage format conversion is reasonable, the controller needs to determine whether the compression ratio of the database after the conversion satisfies the first preset condition, sort the data in the converted database, and test whether the sort time satisfies the second preset condition.
The first preset condition is that the compression ratio of the database after the storage format conversion is less than or equal to a first preset threshold, and the second preset condition is that the sort time is less than or equal to a second preset threshold.
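The two preset conditions can be sketched as a single check. The comparison directions (both "less than or equal to") are taken from the text. The function and parameter names are hypothetical, the compression ratio is passed in rather than computed because the text does not define how it is measured, and Python's built-in sort stands in for the "simple sort".

```python
import time


def conversion_is_acceptable(compression_ratio, sample, max_ratio, max_sort_seconds):
    """Check the first (compression ratio) and second (sort time) conditions."""
    start = time.perf_counter()
    sorted(sample)  # simple sort test on data from the converted database
    sort_seconds = time.perf_counter() - start
    return compression_ratio <= max_ratio and sort_seconds <= max_sort_seconds


sample = [5, 3, 9, 1]
print(conversion_is_acceptable(0.4, sample, max_ratio=0.5, max_sort_seconds=60.0))
print(conversion_is_acceptable(0.9, sample, max_ratio=0.5, max_sort_seconds=60.0))
```

The first call passes both conditions. The second fails the compression ratio condition regardless of how fast the sort runs.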
S208. If the compression ratio satisfies the first preset condition and the sort time satisfies the second preset condition, the controller provides external service.
Specifically, if the compression ratio satisfies the first preset condition and the sort time satisfies the second preset condition, the current storage format conversion can improve the performance of the database system, and the controller provides external service.
Further, after external service begins, the controller needs to keep collecting and observing the relevant indicators to determine whether the performance of the database system has been optimized, that is, whether the data needs to be split further into columns, or so as to avoid splitting columns that do not need to be split.
The indicators collected by the controller include data throughput and query statement response time. Data throughput reflects whether redundant data reads have been reduced; if the change in data throughput is not significant, further splitting into columns may be needed. Query statement response time is a direct indicator of whether column storage is effective.
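The throughput check described here can be sketched as a relative-change test. The text says only that a change that is "not significant" may call for further splitting, so the 5% tolerance and the function name are assumptions made for illustration.

```python
def needs_further_split(throughput_before, throughput_after, tolerance=0.05):
    """Return True when data throughput barely changed after conversion.

    A small relative change suggests redundant reads were not reduced,
    so further splitting into columns may be worth considering.
    """
    if throughput_before <= 0:
        return False  # no baseline to compare against
    change = abs(throughput_before - throughput_after) / throughput_before
    return change < tolerance


print(needs_further_split(1000, 990))  # prints True
print(needs_further_split(1000, 600))  # prints False
```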
S209. If the compression ratio does not satisfy the first preset condition and/or the sort time does not satisfy the second preset condition, the controller performs the next data storage format conversion again according to the core indicator threshold to be set by the user.
If the compression ratio does not satisfy the first preset condition and/or the sort time does not satisfy the second preset condition, the second storage format determined by the controller cannot effectively improve the performance of the database system. The controller then needs to perform the next data storage format conversion again according to the core indicator threshold to be set by the user, re-determining the second storage format of the data.
For example, suppose that at system initialization the controller converts a column to separate column storage when its column access ratio is 90%, but finds after the conversion that system performance has not improved. The controller then feeds back a suggestion to raise the column access ratio from 90% to 91%. The user resets the column access ratio according to this feedback, and the controller re-determines the second storage format according to the newly set ratio.
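The feedback in this example (90% raised to 91% after a conversion that brought no improvement) can be sketched as below. The step of one percentage point mirrors the example, while the function name and the 100% ceiling are assumptions. In the text the user applies the suggested value, so this function only computes the suggestion.

```python
def suggested_threshold_percent(current_percent, performance_improved,
                                step=1, ceiling=100):
    """Suggest the next column access ratio threshold, in whole percent."""
    if performance_improved:
        return current_percent  # conversion helped, keep the threshold
    # Conversion did not help: suggest a stricter (higher) threshold.
    return min(current_percent + step, ceiling)


print(suggested_threshold_percent(90, performance_improved=False))  # prints 91
```

Integer percentages are used so that repeated adjustments do not accumulate floating-point error.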
S210. If the system performance indicators do not satisfy the conversion condition, the controller keeps the format of the data stored in the database as the first storage format.
An embodiment of the present invention provides a method for converting a data storage format. If the system performance indicators of a database that stores data in a first storage format satisfy a conversion condition set by a user, the controller determines a second storage format required for storing data in the database and converts the storage format of the data in the database from the first storage format to the second storage format. The controller then determines whether the compression ratio of the database after the storage format conversion satisfies a first preset condition, sorts the data in the converted database, and tests whether the sort time satisfies a second preset condition. If the compression ratio satisfies the first preset condition and the sort time satisfies the second preset condition, the controller provides external service; otherwise, if the compression ratio does not satisfy the first preset condition and/or the sort time does not satisfy the second preset condition, the controller again determines the second storage format required for storing data in the database, according to the core indicator threshold to be set by the user that is carried in the feedback information. With this solution, the controller monitors the actual running data of the system and continually determines the optimal storage format for the data in the database system; that is, the controller enables the database system to determine the storage format of its data dynamically according to the load. This solves the existing problems that changing the database storage format requires a database administrator to make the change manually and offline, and that system storage space and utilization are low. By deciding the data storage format on its own, the system reduces the data throughput required by query statements and improves its storage space and utilization.
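The overall flow summarized above (check the conversion condition, choose a format, convert, validate, then either serve or try again) can be sketched as a loop over injected callbacks. All names are hypothetical, and the `max_attempts` bound is an assumption added so that the sketch terminates; the text does not state a limit on retries.

```python
def run_conversion_cycle(metrics_meet_condition, choose_format, convert,
                         validate, max_attempts=3):
    """Drive one decide-convert-validate cycle and report the outcome."""
    if not metrics_meet_condition():
        return "kept_first_format"  # S210: condition not met, no conversion
    for attempt in range(max_attempts):
        second_format = choose_format(attempt)  # S204 (re-run after feedback)
        convert(second_format)                  # S206
        if validate():                          # S207
            return "serving:" + second_format   # S208
    return "gave_up"  # assumed outcome once the retry bound is reached


applied = []
result = run_conversion_cycle(
    metrics_meet_condition=lambda: True,
    choose_format=lambda attempt: ["column", "hybrid"][attempt],
    convert=applied.append,
    validate=lambda: applied[-1] == "hybrid",
    max_attempts=2,
)
print(result)  # prints serving:hybrid
```

In this usage the first candidate format fails validation, so the loop chooses a second format, which passes and is put into service.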
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional modules above is given only as an example. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or some of the functions described above. For the specific working processes of the system, apparatus, and units described above, refer to the corresponding processes in the foregoing method embodiments; they are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The foregoing is merely a specific implementation of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims

1. A controller, characterized in that it comprises:
a decision unit, configured to determine, if the system performance indicators of a database that stores data in a first storage format satisfy a conversion condition set by a user, a second storage format required for storing data in the database;
a storage format conversion unit, configured to convert the storage format of the data in the database from the first storage format to the second storage format determined by the decision unit; and
a feedback unit, configured to: determine whether the compression ratio of the database after the storage format conversion satisfies a first preset condition, sort the data in the database after the storage format conversion, and test whether the sort time satisfies a second preset condition; and, if the compression ratio satisfies the first preset condition and the sort time satisfies the second preset condition, provide external service, or, if the compression ratio does not satisfy the first preset condition and/or the sort time does not satisfy the second preset condition, send feedback information to the decision unit, so that the decision unit re-determines the second storage format required for storing data in the database according to a core indicator threshold to be set by the user in the feedback information.
2. The controller according to claim 1, characterized in that
the decision unit is further configured to: before determining the second storage format required for storing data in the database, determine whether the system performance indicators satisfy a core indicator threshold set by the user; and, if the system performance indicators satisfy the core indicator threshold, determine whether the system performance indicators satisfy the conversion condition.
3. The controller according to claim 2, characterized in that
the decision unit is further configured to keep the format of the data stored in the database as the first storage format if the system performance indicators do not satisfy the conversion condition.
4. The controller according to any one of claims 1 to 3, characterized in that the controller further comprises a data collection unit,
the data collection unit being further configured to collect the system performance indicators.
5. The controller according to any one of claims 1 to 4, characterized in that
the decision unit is further configured to determine, after determining the second storage format required for storing data in the database and according to user configuration information, a conversion time at which the storage format of the data in the database is converted from the first storage format to the second storage format.
6. The controller according to claim 5, characterized in that
the storage format conversion unit is specifically configured to: reorganize the data in the database in a buffer according to the second storage format and the conversion time determined by the decision unit; and, if the amount of data in the buffer reaches a disk write threshold, write the data in the buffer to disk.
7. The controller according to claim 6, characterized in that
the storage format conversion unit is specifically configured to: if the second storage format of the data in the buffer is single-column storage, convert the storage format of the data in the buffer from the first storage format to the single-column storage, and compress and store the data in the buffer; or, if the second storage format of the data in the buffer is hybrid row-column storage or row storage, convert the storage format of the data in the buffer from the first storage format to the hybrid row-column storage or the row storage, and store the data in the buffer.
8. The controller according to claim 6, characterized in that the controller further comprises a reading unit;
the reading unit being configured to read the data in the database row by row if the first storage format is row storage, before the storage format conversion unit reorganizes the data in the database in the buffer according to the second storage format and the conversion time.
9. The controller according to any one of claims 1 to 8, characterized in that
the data collection unit is further configured to collect the data throughput and the query statement response time of the database after the storage format conversion, after external service is provided when the compression ratio satisfies the first preset condition and the sort time satisfies the second preset condition.
10. The controller according to any one of claims 1 to 9, characterized in that the system performance indicators comprise at least the data volume, the average data volume accessed per query, the ratio of rows processed to rows read, the average ratio of columns accessed per query, and the proportion of query statements.
11. A controller, characterized in that it comprises:
a processor, configured to: determine, if the system performance indicators of a database that stores data in a first storage format satisfy a conversion condition set by a user, a second storage format required for storing data in the database; after a format converter converts the storage format of the data in the database from the first storage format to the second storage format, determine whether the compression ratio of the database after the storage format conversion satisfies a first preset condition, sort the data in the database after the storage format conversion, and test whether the sort time satisfies a second preset condition; and, if the compression ratio satisfies the first preset condition and the sort time satisfies the second preset condition, provide external service, or, if the compression ratio does not satisfy the first preset condition and/or the sort time does not satisfy the second preset condition, re-determine the second storage format required for storing data in the database according to a core indicator threshold to be set by the user; and
the format converter, configured to convert the storage format of the data in the database from the first storage format to the second storage format determined by the processor.
12. The controller according to claim 11, characterized in that
the processor is further configured to: before determining the second storage format required for storing data in the database, determine whether the system performance indicators satisfy a core indicator threshold set by the user; and, if the system performance indicators satisfy the core indicator threshold, determine whether the system performance indicators satisfy the conversion condition.
13. The controller according to claim 12, characterized in that
the processor is further configured to keep the format of the data stored in the database as the first storage format if the system performance indicators do not satisfy the conversion condition.
14. The controller according to any one of claims 11 to 13, characterized in that the controller further comprises a data collector;
the data collector being configured to collect the system performance indicators.
15. The controller according to any one of claims 11 to 14, characterized in that
the processor is further configured to determine, after determining the second storage format required for storing data in the database and according to user configuration information, a conversion time at which the storage format of the data in the database is converted from the first storage format to the second storage format.
16. The controller according to claim 15, characterized in that
the format converter is specifically configured to: reorganize the data in the database in a buffer according to the second storage format and the conversion time; and, if the amount of data in the buffer reaches a disk write threshold, write the data in the buffer to disk.
17. The controller according to claim 16, characterized in that
the format converter is specifically configured to: if the second storage format of the data in the buffer is single-column storage, convert the storage format of the data in the buffer from the first storage format to the single-column storage, and compress and store the data in the buffer; or, if the second storage format of the data in the buffer is hybrid row-column storage or row storage, convert the storage format of the data in the buffer from the first storage format to the hybrid row-column storage or the row storage, and store the data in the buffer.
18. The controller according to claim 16, characterized in that
the processor is further configured to read the data in the database row by row if the first storage format is row storage, before the storage format converter reorganizes the data in the database in the buffer according to the second storage format and the conversion time.
19. The controller according to any one of claims 11 to 18, characterized in that
the data collector is further configured to collect the data throughput and the query statement response time of the database after the storage format conversion, after external service is provided when the compression ratio satisfies the first preset condition and the sort time satisfies the second preset condition.
20. The controller according to any one of claims 11 to 19, characterized in that the system performance indicators comprise at least the data volume, the average data volume accessed per query, the ratio of rows processed to rows read, the average ratio of columns accessed per query, and the proportion of query statements.
21. A method for converting a data storage format, characterized in that it comprises:
step a: if the system performance indicators of a database that stores data in a first storage format satisfy a conversion condition set by a user, determining a second storage format required for storing data in the database;
step b: converting the storage format of the data in the database from the first storage format to the second storage format;
step c: determining whether the compression ratio of the database after the storage format conversion satisfies a first preset condition, sorting the data in the database after the storage format conversion, and testing whether the sort time satisfies a second preset condition; and
step d: if the compression ratio satisfies the first preset condition and the sort time satisfies the second preset condition, providing external service; or, if the compression ratio does not satisfy the first preset condition and/or the sort time does not satisfy the second preset condition, performing the above steps again according to a core indicator threshold to be set by the user.
22. The method for converting a data storage format according to claim 21, characterized in that, before the determining of the second storage format required for storing data in the database, the method further comprises:
determining whether the system performance indicators satisfy a core indicator threshold set by the user; and
if the system performance indicators satisfy the core indicator threshold, determining whether the system performance indicators satisfy the conversion condition.
23. The method for converting a data storage format according to claim 22, characterized in that
if the system performance indicators do not satisfy the conversion condition, the format of the data stored in the database is kept as the first storage format.
24. The method for converting a data storage format according to any one of claims 20 to 23, characterized in that the method further comprises:
collecting the system performance indicators.
25. The method for converting a data storage format according to any one of claims 20 to 24, characterized in that, after the second storage format required for storing data in the database is determined, the method further comprises:
determining, according to user configuration information, a conversion time at which the storage format of the data in the database is converted from the first storage format to the second storage format.
26. The method for converting a data storage format according to claim 25, characterized in that the converting of the storage format of the data in the database from the first storage format to the second storage format specifically comprises:
reorganizing the data in the database in a buffer according to the second storage format and the conversion time; and
if the amount of data in the buffer reaches a disk write threshold, writing the data in the buffer to disk.
27. The method for converting a data storage format according to claim 26, characterized in that the writing of the data in the buffer to disk specifically comprises:
if the second storage format of the data in the buffer is single-column storage, converting the storage format of the data in the buffer from the first storage format to the single-column storage, and compressing and storing the data in the buffer; or
if the second storage format of the data in the buffer is hybrid row-column storage or row storage, converting the storage format of the data in the buffer from the first storage format to the hybrid row-column storage or the row storage, and storing the data in the buffer.
28. The method for converting a data storage format according to claim 26, characterized in that, before the reorganizing of the data in the database in the buffer according to the second storage format and the conversion time, the method further comprises:
if the first storage format is row storage, reading the data in the database row by row.
29. The method for converting a data storage format according to any one of claims 20 to 28, characterized in that, after external service is provided when the compression ratio satisfies the first preset condition and the sort time satisfies the second preset condition, the method further comprises:
collecting the data throughput and the query statement response time of the database after the storage format conversion.
30. The method for converting a data storage format according to any one of claims 20 to 29, characterized in that the system performance indicators comprise at least the data volume, the average data volume accessed per query, the ratio of rows processed to rows read, the average ratio of columns accessed per query, and the proportion of query statements.
PCT/CN2014/073576 2014-03-18 2014-03-18 Method and apparatus for conversion of data storage formats WO2015139193A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201480000190.5A CN105378716B (en) 2014-03-18 2014-03-18 A kind of conversion method and device of data memory format
PCT/CN2014/073576 WO2015139193A1 (en) 2014-03-18 2014-03-18 Method and apparatus for conversion of data storage formats

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/073576 WO2015139193A1 (en) 2014-03-18 2014-03-18 Method and apparatus for conversion of data storage formats

Publications (1)

Publication Number Publication Date
WO2015139193A1 (en) 2015-09-24

Family

ID=54143626

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/073576 WO2015139193A1 (en) 2014-03-18 2014-03-18 Method and apparatus for conversion of data storage formats

Country Status (2)

Country Link
CN (1) CN105378716B (en)
WO (1) WO2015139193A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111064976B (en) * 2018-10-17 2022-01-04 武汉斗鱼网络科技有限公司 Method for sending live broadcast information and server
CN111198859B (en) * 2018-11-16 2023-11-03 北京微播视界科技有限公司 Data processing method, device, electronic equipment and computer readable storage medium
CN110275677B (en) * 2019-05-22 2022-04-12 华为技术有限公司 Hard disk format conversion method and device and storage equipment
CN110162563B (en) * 2019-05-28 2023-11-17 深圳市网心科技有限公司 Data warehousing method and system, electronic equipment and storage medium
CN112579597B (en) * 2020-12-15 2023-03-21 西安邮电大学 Compression-sensitive database file storage method and system
CN115470235A (en) * 2021-06-11 2022-12-13 华为技术有限公司 Data processing method, device and equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102495905A (en) * 2011-12-23 2012-06-13 天津神舟通用数据技术有限公司 Packing method based on line storage database engine
US20120254252A1 (en) * 2011-03-31 2012-10-04 International Business Machines Corporation Input/output efficiency for online analysis processing in a relational database
CN103345518A (en) * 2013-07-11 2013-10-09 清华大学 Self-adaptive data storage management method and system based on data block

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107092624A (en) * 2016-12-28 2017-08-25 北京小度信息科技有限公司 Data storage method, apparatus and system
US20220300508A1 (en) * 2018-04-19 2022-09-22 Risk Management Solutions, Inc. Data storage system for providing low latency search query responses
WO2020034757A1 (en) * 2018-08-16 2020-02-20 腾讯科技(深圳)有限公司 Data processing method and device, storage medium, and electronic device
US11636083B2 (en) 2018-08-16 2023-04-25 Tencent Technology (Shenzhen) Company Limited Data processing method and apparatus, storage medium and electronic device

Also Published As

Publication number Publication date
CN105378716B (en) 2019-03-26
CN105378716A (en) 2016-03-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14886658

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14886658

Country of ref document: EP

Kind code of ref document: A1