WO2020034757A1 - Data processing method and device, storage medium, and electronic device - Google Patents

Data processing method and device, storage medium, and electronic device

Info

Publication number
WO2020034757A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
page
column
data
query
Prior art date
Application number
PCT/CN2019/092459
Other languages
English (en)
French (fr)
Inventor
李海翔
叶盛
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Priority to EP19850450.8A (EP3757815A4)
Publication of WO2020034757A1
Priority to US17/014,967 (US11636083B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/219 Managing data history or versioning
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/221 Column-oriented storage; Management thereof
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/23 Updating
    • G06F16/2308 Concurrency control
    • G06F16/2315 Optimistic concurrency control
    • G06F16/2379 Updates performed during online database operations; commit processing
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data

Definitions

  • This application relates to the field of computers, and in particular, to data processing technologies.
  • Historical data in the database usually needs to be cleared by related operations.
  • However, the above data processing method causes historical data in the database to be lost, making it difficult to trace the historical data.
  • the embodiments of the present application provide a data processing method and device, a storage medium, and an electronic device, so as to solve the technical problem that it is difficult to trace historical data in related data processing technologies.
  • A data processing method is provided, which is applied to an electronic device and includes: obtaining at least one target row to be cleared at a target time from a data table of a row storage database; storing the target attribute values recorded on the at least one target row to a target page in a column storage database; and clearing the at least one target row after the target time is reached.
  • A data processing device is provided, including: a first obtaining unit, configured to obtain at least one target row to be cleared at a target time from a data table of a row storage database; a storage unit, configured to store the target attribute values recorded on the at least one target row to a target page in a column storage database; and a clearing unit, configured to clear the at least one target row after the target time is reached.
  • A storage medium is provided, which stores a computer program configured to execute the above data processing method when run.
  • An electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the foregoing data processing method by running the computer program.
  • In the embodiments of the present application, the historical data is retained. Specifically, at least one target row to be cleared at a target time is obtained from the data table of the row storage database; the target attribute values recorded on the at least one target row are stored to the target page in the column storage database; and after the target time is reached, the target rows are cleared.
  • In this way, the data to be cleared from the row storage database is dumped to the column storage database, which achieves the purpose of saving historical data in the database and thus the technical effect of keeping the data trajectory complete, and solves the technical problem in related data processing technologies that historical data is difficult to trace.
  • FIG. 1 is a schematic diagram of an application environment of a data processing method according to an embodiment of the present application
  • FIG. 2 is a schematic flowchart of an optional data processing method according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an optional dump transition page according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of another optional dump transition page according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of another optional data processing method according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of another optional data processing method according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of another optional data processing method according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of another optional data processing method according to an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of another optional data processing method according to an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of another optional data processing method according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an optional data processing apparatus according to an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an optional electronic device according to an embodiment of the present application.
  • a data processing method is provided.
  • the data processing method may be applied to, but not limited to, an application environment as shown in FIG. 1.
  • the user equipment 104 used by the user 102 includes: a RAM 106 and a processor 108.
  • the user 102 may use the user equipment 104 to send a query request 110 to the query system 114 through the network 112.
  • the search engine 116 in the query system 114 includes an indexing engine 118 and a ranking engine 120.
  • the query system may query the row storage database 122 and the column storage database 124 according to the query request 110 to obtain a query result 126, and return the query result 126 to the user equipment 104 through the network 112.
  • In addition, each node device in the system can obtain at least one target row to be cleared at the target time from the data table of the row storage database 122; store the target attribute values recorded on the at least one target row to the target page in the column storage database 124; and, after the target time is reached, clear the at least one target row.
  • The user equipment 104 may include, but is not limited to, a mobile phone, a tablet computer, a desktop computer, and the like.
  • The query system 114 may include, but is not limited to, at least one of the following: a distributed database system (where each node device adopts the data processing method of this application), a relational database system based on Multi-Version Concurrency Control (MVCC), a non-relational database system based on MVCC, and the like.
  • the above network may include, but is not limited to, a wireless network and a wired network.
  • the wireless network includes: Bluetooth, WIFI, and other networks that implement wireless communication.
  • Wired networks can include, but are not limited to, local area networks, metropolitan area networks, and wide area networks.
  • the above query system may include, but is not limited to, at least one of the following: a PC and other devices for computing services. The above is only an example, and this embodiment is not limited in any way.
  • the foregoing data processing method is applied to an electronic device.
  • the electronic device may be a terminal device or a server.
  • The data processing method may include the following steps:
  • S202: Obtain at least one target row to be cleared at a target time from a data table of the row storage database.
  • the above data processing method may be applied to, but not limited to, the following application scenarios: recording user behavior, recording account changes, recording stock transaction records, recording weather monitoring data, and other scenarios that require recording of data change history.
  • Taking the scenario of recording account changes as an example, the server can obtain, from the data table of the row storage database (which stores the user's account information), at least one target row of historical account change information to be cleared at the target time; store the target attribute values recorded on the at least one target row to at least one of the target pages in the column storage database; and, after the target time is reached, clear the at least one target row.
  • the data to be dumped in the row storage database may be located in the memory of the node device.
  • the row storage database stores the data in the data table (such as the latest version of the data) in the form of the row storage.
  • the row storage database may include, but is not limited to, PostgreSQL, MySQL, and the like.
  • data is updated by means of periodic triggering or event triggering.
  • The types of the attributes stored in a row may differ, so different column widths (i.e., column sizes) need to be assigned to different attributes. Because the attribute types differ, row alignment needs to be guaranteed.
  • the life cycle trajectory of the data may be identified by the state attributes of the data.
  • the life cycle of the data can be divided into three stages. Each stage describes the different state attributes of the data to identify the state in the life cycle track of the data.
  • the state attributes corresponding to the three stages are:
  • Historical state: the state of the data in its historical stage, called the historical state.
  • the value of historical data is the old value, not the current value.
  • A data item can have multiple historical states, which reflect the process of the data's state changes. Historical data can only be read; it cannot be modified or deleted.
  • Transitional state: the state of the data while it is transitioning from the current state to the historical state, called the transitional state.
  • the data in the transition state (called half-decay data) is neither the latest version of the data item nor the historical version, but is in the process of transitioning from the current state to the historical state.
  • Full-state data: a data item carrying all three states is called full-state data.
  • MVCC: Multi-Version Concurrency Control.
  • Under the MVCC mechanism, all three states of the data exist; under a non-MVCC mechanism, the data only has the historical state and the current state.
  • For example, consider an account table Account (ID int, Name char(8), Balance int, Note).
  • the account table contains 4 attribute columns: account number, name, balance, and remarks.
  • This table is used to record changes in user account balances.
  • One change in balance will generate a record (corresponding to a row in the account table).
  • One user's existing data is (10, James, 1000, Create account).
  • the user had a balance change, and the account balance decreased by 100.
  • “consume 100” was noted.
  • the database needs to perform an UPDATE (update) operation.
  • the latest version of data stored in the row database is (10, James, 900, consume 100), which is the current state data.
  • Under the MVCC mechanism, (10, James, 1000, Create account) is transitional-state data.
  • Under a non-MVCC mechanism, (10, James, 1000, Create account) is historical-state data.
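  • As a minimal illustration of the three data states on the Account example above, the following Python sketch models how an UPDATE produces a current-state version, turns the previous version into transitional-state data under MVCC, and marks it historical once it is dumped. The class and method names here are illustrative assumptions, not structures defined in this application.

```python
from dataclasses import dataclass, field

@dataclass
class Version:
    xid: int          # transaction ID that produced this version
    values: tuple     # (ID, Name, Balance, Note)
    state: str = "current"

@dataclass
class FullStateItem:
    """A data item carrying current, transitional and historical versions."""
    versions: list = field(default_factory=list)

    def update(self, xid, new_values):
        # Under MVCC, the previous current version becomes transitional ...
        if self.versions:
            self.versions[-1].state = "transitional"
        self.versions.append(Version(xid, new_values, "current"))

    def dump_old_versions(self):
        # ... and becomes historical once it is dumped to the column store.
        for v in self.versions[:-1]:
            v.state = "historical"

acct = FullStateItem()
acct.update(xid=1, new_values=(10, "James", 1000, "Create account"))
acct.update(xid=2, new_values=(10, "James", 900, "consume 100"))
acct.dump_old_versions()
for v in acct.versions:
    print(v.xid, v.values, v.state)
```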
  • the data may have bi-temporal attributes: valid time attributes, transaction time attributes.
  • the valid time attribute represents the time attribute of the object represented by the data.
  • The transaction-time attribute of the data indicates when the database system performed which operation on it; in the database system an operation is encapsulated as a transaction, and a transaction is atomic.
  • a transaction ID can be used to identify the transactional temporal attributes of a piece of data. From a formal point of view, the effective time attribute and transaction time attribute are represented by ordinary user-defined fields in the data model, but are described by specific keywords for constraint checking and assignment by the database engine.
  • At least one target row to be cleared at the target time may be obtained from the data table of the row storage database; that is, a single target row to be cleared may be obtained, or multiple target rows to be cleared may be obtained. In general, multiple target rows are obtained.
  • The target rows to be cleared in the data table of the row storage database can be identified by setting a to-be-cleared flag; the target rows to be cleared can also be placed in a specific storage location to identify them; in addition, the target rows to be cleared can be identified by other means.
  • The historical data (that is, the target rows to be cleared) in the row storage database can be cleared in various ways.
  • For example, the clearing operation may be performed periodically, or may be triggered by an event (for example, receiving a clear command).
  • the target time is determined by the way of clearing historical data, which is not limited in this embodiment.
  • For example, PostgreSQL executes the VACUUM operation, in which the expired tuples of each table are cleared according to the VM (visibility map) file.
  • the MySQL Purge thread scans the history list of MVCC and clears outdated data that has no other transaction references and does not need to be rolled back.
  • The VACUUM operation of PostgreSQL and the Purge operation of MySQL are performed periodically by default. However, historical data recording information changes, for example data related to accounting, is as important as the current data. Therefore, the historical data is also expected to be saved rather than cleared.
  • The operation of obtaining the target rows to be cleared is performed before they are cleared; that is, the target rows to be cleared are obtained first, and only then is the clearing operation executed.
  • Obtaining the target row to be cleared can be based on a timing mechanism (periodic acquisition), and the acquisition operation is started periodically. The timing period can be dynamically adjusted as a parameter, which is not limited in this embodiment.
  • PostgreSQL can perform the dump process before the VACUUM operation.
  • the VACUUM operation does not clear the historical version but dumps the historical version.
  • MySQL can perform the dump process before the Purge operation.
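  • The following Python sketch illustrates this ordering only in the abstract: historical rows are dumped to the column store before a VACUUM/Purge-style clearing runs. The RowStore and ColumnStore classes and their methods are hypothetical stand-ins, not PostgreSQL or MySQL interfaces.

```python
class RowStore:
    def __init__(self, rows):
        self.rows = rows                        # list of (xid, values, expired_flag)

    def expired_rows(self):
        return [r for r in self.rows if r[2]]   # rows to be cleared at the target time

    def clear(self, targets):
        self.rows = [r for r in self.rows if r not in targets]

class ColumnStore:
    def __init__(self):
        self.pages = []

    def dump(self, rows):
        self.pages.append(list(rows))           # simplified: one page per dump

def cleanup_with_dump(row_store, column_store):
    targets = row_store.expired_rows()
    column_store.dump(targets)                  # preserve history first
    row_store.clear(targets)                    # then the VACUUM/Purge-like clearing runs

rs = RowStore([(1, (10, "James", 1000, "Create account"), True),
               (2, (10, "James", 900, "consume 100"), False)])
cs = ColumnStore()
cleanup_with_dump(rs, cs)
print(rs.rows, cs.pages)
```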
  • the data to be dumped in the row storage database may be located in a memory of a target device (for example, a network element node).
  • the location of the row storage database is not specifically limited in this embodiment.
  • The target attribute values recorded on the target columns of the target rows may be stored to one or more target pages in the column storage database, where the target attribute values recorded on the same target column of the target rows may be recorded on at least one target page among the plurality of target pages in the column storage database.
  • the target column may be an attribute column in the data table of the row database.
  • The multiple target columns in this application may be all the attribute columns of the data table or a subset of all the attribute columns of the data table.
  • Multiple target columns can be specified by a target parameter.
  • the types of attributes recorded on different target columns can be the same or different.
  • the column widths assigned to target columns of different attribute types may be the same or different.
  • the column widths assigned to target columns of the same attribute type may be the same or different.
  • the setting method of the specific attribute column can be set as required, which is not specifically limited in this embodiment.
  • For example, the multiple target columns can be all the columns in the account table, or some of the columns (such as ID, Balance, Note).
  • the target attribute value recorded on the target column is read from the obtained target row.
  • the target attribute values corresponding to different target rows on the same target column may be the same or different.
  • the persistent data portion of the columnar storage database may be located in the external storage (for example, a disk) of the target device, and other portions of data may be located in the memory of the target device.
  • the location of the column database is not specifically limited in this embodiment.
  • the column storage database stores a large amount of data, and an excellent data storage method is the cornerstone for achieving efficient use of space and improving query speed.
  • the column storage uses a segment-page management structure, which can effectively use concepts such as table spaces and continuously store column data in external storage during the dump. This makes it easy to perform column-specific calculations on column stores.
  • When the multiple target columns of the multiple target rows are stored to the target pages of the column storage database, the target attribute values recorded on the same target column of the multiple target rows may be recorded on at least one of the target pages in the column storage database.
  • the target attribute value recorded on at least one target row may be stored in a target page in a columnar database in multiple ways.
  • target attribute values recorded on different target columns in multiple target rows may be directly stored in one or more pages in a columnar storage database.
  • For example, the target attribute values recorded on the multiple target columns of the multiple target rows can be written to one page among the target pages in sequence, and after that page is full, written to another page. Alternatively, the target attribute values recorded on different target columns of the multiple target rows can be stored to different pages among the target pages.
  • For the target attribute values (first target attribute values) recorded on each target column of the multiple target rows, when part of the first target attribute values fills the first page of the target pages, the remaining first target attribute values that have not been written to the first page are stored on the second page of the target pages.
  • In some cases a page may not be full, which causes a waste of storage space.
  • In that case, the non-full page can be loaded from external storage into memory, and the new data can be saved to the page that is not full.
  • the target attribute values recorded on the target columns of multiple target rows may be stored to a dump transition page, where the dump transition page is used to transfer the attribute values recorded on the target columns.
  • That is, a dump transition page can be set up in memory, and the attribute values recorded on the target columns can transition from the row storage database to the column storage database through the dump transition page.
  • A dump operation using the dump transition page may be performed in multiple ways.
  • For example, the dump can be performed with a dump transition page that records the target attribute values of multiple target columns (for example, a dump transition page in the form of page A shown in FIG. 3), or with dump transition pages that each record the target attribute values of a single target column (for example, a dump transition page in the form of page B shown in FIG. 4; page B can be considered a variant of page A).
  • The dump transition page can be dumped directly to the target page, or multiple dump transition pages can be compressed based on compression estimation and the compressed data dumped to the target page.
  • The target attribute values recorded on the target columns of the multiple target rows can be stored in the dump transition page row by row (in the same way as the data is stored in the data table). Before, during, or after storing these target attribute values row by row in the dump transition page, it can be determined whether predetermined conditions are met, and if they are met, the operation of dumping the attribute values to the target page is executed.
  • For example, it may be determined whether the attribute values meet a first condition, and if so, the attribute values in the dump transition page that meet the first condition are stored on a page among the target pages of the column storage database. The first condition may include, but is not limited to: the data amount of the attribute values of the first K rows recorded in the dump transition page is less than or equal to the target threshold, and the data amount of the attribute values of the first (K+1) rows is greater than the target threshold.
  • The target threshold may be set according to the size of a page. Another form of the first condition is: after the attribute values of the first L rows recorded in the dump transition page are compressed separately by column, the estimated total amount of compressed data is less than or equal to the target threshold, while for the first (L+1) rows the estimated total amount of compressed data is greater than the target threshold, where K and L are both positive integers greater than or equal to 1.
  • For example, the dump transition page records 100 rows of attribute values, and each row has 5 attribute values. If the total data amount of the attribute values of the first 20 rows is less than 2KB (the target threshold, the size of an external memory page) and the total data amount of the attribute values of the first 21 rows is greater than 2KB, the attribute values of the first 20 rows are stored to a page of the column storage database. For another example, if, after the attribute values of the first 80 rows are compressed separately by column (the 5 columns estimated separately), the estimated total amount of compressed data is less than 2KB, while for the first 81 rows the estimated total amount is greater than 2KB, then the attribute values of the first 80 rows are compressed by column and stored to a page of the column storage database.
  • The second condition may include, but is not limited to: after the attribute values of the first M rows recorded in the dump transition page are compressed separately by column, the estimated largest compressed data amount among the per-column compressed data amounts is less than or equal to the target threshold,
  • while after the attribute values of the first (M+1) rows are compressed separately by column, the estimated largest compressed data amount among the per-column compressed data amounts is greater than the target threshold, where M is a positive integer greater than or equal to 1.
  • For example, the dump transition page records 100 rows of attribute values, and each row has 5 attribute values. If, after the attribute values of the first 90 rows are compressed separately by column, the largest estimated per-column compressed data amount is less than 2KB, while for the first 91 rows the largest estimated per-column compressed data amount is greater than 2KB, then the attribute values of the first 90 rows are compressed by column and stored to 5 pages in the column storage database.
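  • A minimal sketch of the flush decision described above is shown below. The estimate_compressed_size function is a hypothetical stand-in for a real compression estimator; the sketch finds the largest row prefix of a dump transition page that still fits the target threshold, by raw size (K), by estimated total per-column compressed size (L), or by the largest per-column compressed size (M).

```python
TARGET_THRESHOLD = 2 * 1024  # bytes, e.g. the size of an external-storage page

def raw_size(rows):
    # total uncompressed size of the attribute values (simplified as string length)
    return sum(len(str(v)) for row in rows for v in row)

def estimate_compressed_size(column_values):
    # Hypothetical stand-in for a per-column compression estimator (e.g. entropy-based).
    return max(1, raw_size([column_values]) // 3)

def rows_to_flush(rows, mode="raw"):
    """Largest prefix length whose size estimate still fits TARGET_THRESHOLD.

    mode="raw": data amount of the first K rows (first condition)
    mode="sum": estimated total compressed amount over all columns (L variant)
    mode="max": largest estimated per-column compressed amount (second condition, M)
    """
    best = 0
    for k in range(1, len(rows) + 1):
        prefix = rows[:k]
        if mode == "raw":
            size = raw_size(prefix)
        else:
            columns = list(zip(*prefix))
            per_col = [estimate_compressed_size(col) for col in columns]
            size = sum(per_col) if mode == "sum" else max(per_col)
        if size <= TARGET_THRESHOLD:
            best = k
        else:
            break
    return best

rows = [(i, "James", 1000 - i, f"consume {i}") for i in range(100)]
print(rows_to_flush(rows, mode="raw"), rows_to_flush(rows, mode="max"))
```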
  • target attribute values recorded on each target column of multiple target rows may be respectively stored in a dump transition page corresponding to the target column.
  • Each dump transition page records the attribute values recorded by a target column, and each target column can correspond to one or more dump transition pages.
  • the size of the dump transition page corresponding to different target columns is the same, but the amount of data that the dump transition page can hold is related to the attribute type corresponding to each target column.
  • For example, the attribute types corresponding to the target columns may be numbers, characters, strings, and so on; in general, the same dump transition page can hold more numbers than strings.
  • the format and size of the dump transition page is the same as the format and size of the target page.
  • The format of the dump transition page may include a page body for recording the attribute values of the target columns, and may further include at least one of the following: a page header and a page footer, where the page header is used to indicate the identification value range of the target identifier, the target identifier being the identifier corresponding to the attribute values recorded in the dump transition page, and the page footer is used to verify the dump transition page.
  • the format of the dump transition page is the same as the format of the list page.
  • the format of the dump transition page is the default design format of the column page: there are multiple columns of information in a page, called page A.
  • the format of the dump transition page is an optional design format for the column page: there is only one column of information in a page, called page B.
  • Page B and page A are not structurally different, except that page A contains multiple columns of information, and page B contains only a single column of information.
  • The design of page A is more convenient for saving, which makes it easier to store the data to be dumped, so its dump efficiency is better.
  • Page B needs to split the dumped data by column first, so its dump efficiency is relatively low; however, it can avoid frequent cross-page reads, so its query efficiency is higher.
  • The user can select the type of dump transition page in advance through a parameter; by default, the format of page A is used.
  • the format of a dump transition page can include three parts: header, body, and footer.
  • Header (Column header): The header is designed as an adaptive header.
  • The system default page header contains XID_min and XID_max (XID, that is, the transaction ID, which can uniquely identify the correspondence between the attribute values of the same target row, that is, identify the column version corresponding to the attribute values).
  • the former represents the smallest XID of all column versions on this page, and the latter represents the largest XID of all column versions on this page.
  • the XID information on the page header can be replaced with corresponding index information, such as ID_Max and ID_Min.
  • a column memory index can be constructed for the column memory page (target page), which is convenient for quickly positioning column information.
  • Page body: contains the column versions (attribute values) of one or more of the multiple target columns.
  • Each column version is represented by a two-tuple <XID, value>, which indicates which transaction manipulated this value. If the user customizes the page header information, the tuple information is replaced accordingly, for example modified to <ID, value>.
  • The page body of page A contains multiple columns, and each column contains multiple column versions. Columns are stored in order from the head of the page toward the foot of the page. Each column begins with a column ID, which identifies a specific column, followed by multiple tuple columns, each representing a unique tuple. Each tuple column begins with a tupleID (tuple ID), which identifies a unique tuple column; y indicates how many versions this tuple column has.
  • the combination of tupleID, y, and column version can represent the historical change process of the attribute value of a column in a tuple in the data table.
  • the page body of the page B includes multiple column versions of one column, and each column version is stored in sequence from the page head to the page footer.
  • Each tuple column, including tupleID, represents a unique tuple column; y indicates that there are several versions of this tuple column.
  • the combination of tupleID, y, and column version can represent the historical change process of the attribute value of a column in a tuple in the data table.
  • The page footer, which is located at the bottom of the page, includes page verification information and column information.
  • For the footer of page A, the column information is the column information of multiple columns; for the footer of page B (as shown in FIG. 4), the column information is the column information of one column.
  • the column information contains the ID of the column and the offset of this column on this page.
  • Column information is stored in order from the page footer toward the page head, so the page is filled from both ends toward the middle (the column versions of the page body are stored from the page head toward the page footer, and the column information is stored from the page footer toward the page head), until the free space in the middle can no longer hold the next column and its column information.
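  • The following Python sketch models this page layout under simplified, assumed data structures: column versions grow from the head of the page body, column-info entries grow from the footer, and the page is treated as full when the two regions would meet. The sizes and field names are illustrative, not the actual on-disk format of this application.

```python
PAGE_SIZE = 2048  # bytes, illustrative

class ColumnPage:
    """Simplified model of page A/B: header, body growing forward, footer growing backward."""
    def __init__(self):
        self.header = {"XID_min": None, "XID_max": None}
        self.body = []          # list of (column_id, tuple_id, [(xid, value), ...])
        self.column_info = []   # (column_id, offset) entries stored from the footer inward
        self.used = 64          # reserve some space for header and verification info

    def _size(self, versions):
        # rough per-column cost: 8 bytes per XID plus the textual value, plus bookkeeping
        return sum(8 + len(str(v)) for _, v in versions) + 16

    def add_column(self, column_id, tuple_id, versions):
        """versions: list of (xid, value) two-tuples for one tuple column."""
        need = self._size(versions) + 8          # column data + one column-info entry
        if self.used + need > PAGE_SIZE:
            return False                         # free space in the middle exhausted
        offset = self.used
        self.body.append((column_id, tuple_id, versions))
        self.column_info.insert(0, (column_id, offset))
        xids = [x for x, _ in versions]
        lo = min(xids) if self.header["XID_min"] is None else min(self.header["XID_min"], *xids)
        hi = max(xids) if self.header["XID_max"] is None else max(self.header["XID_max"], *xids)
        self.header["XID_min"], self.header["XID_max"] = lo, hi
        self.used += need
        return True

page = ColumnPage()
page.add_column("Balance", tuple_id=10, versions=[(1, 1000), (2, 900)])
print(page.header, page.column_info)
```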
  • a dump transition page can be established for each column in the memory data table, a dump transition page corresponds to a target column in the data table, and different attribute values in a target column can be located in different dumps Transition page.
  • When a dump occurs (data is transferred from the row storage database to the column storage database), the dump transition page is written first; if it is not full, the same dump transition page continues to be written when the next dump occurs.
  • a collection consisting of dump transition pages that belong to the same table can be called a dump transition area.
  • When the dump transition page is full, the operation of transferring the attribute values to the target page is performed.
  • That is, the attribute values recorded in the already-full dump transition page are stored to a third page among the target pages, where the attribute values recorded in the dump transition page include the target attribute values of the same target column of the multiple target rows that were written to the dump transition page.
  • In this way, one page of the column storage database can be filled directly with the attribute values of a full dump transition page.
  • Storing the attribute values recorded in the dump transition page to the third page among the target pages includes: determining the page header information of the dump transition page, where the page header information is used to identify the identification value range of the target identifier corresponding to the attribute values recorded in the dump transition page; and storing the page header information and the attribute values recorded in the dump transition page to the third page among the target pages.
  • the page header information can quickly determine the range of the target identifier corresponding to the attribute value of the target column stored in the third page, which facilitates rapid positioning during subsequent queries.
  • The target identifier may have multiple forms, including but not limited to a version identifier and a constraint column, where the version identifier is used to uniquely identify the column versions of the multiple target columns, and the constraint column is a predetermined column among the multiple target columns.
  • When the target identifier is a version identifier, the page header information may include the maximum value and the minimum value of the version identifier corresponding to the attribute values recorded in the dump transition page.
  • the version identifier is used to uniquely identify a column version of a target column of multiple target rows, and the version identifier may include, but is not limited to, a transaction ID and a user-defined index identifier. Dump transition pages in memory and many column memory pages in memory can be managed using HASH because the XID is unique.
  • When the target identifier in the page header information of the dump transition page is constraint column information, the page header information may include one or more key-value pairs, where each key-value pair includes an attribute value of the constraint column (the first target column of the multiple target rows) and the in-page offset corresponding to that attribute value of the constraint column.
  • The attribute value of the constraint column corresponds to the column versions of the attribute values of the target column (the second target column of the multiple target rows) stored in the dump transition page; the attribute values of the target column that correspond to the attribute value of the constraint column are stored contiguously in the dump transition page, and the in-page offset is the offset of the storage location of those attribute values of the target column in the dump transition page, that is, their offset relative to the storage location of the attribute value of the constraint column.
  • the above dump transition page can be applied to data with a high degree of distribution aggregation.
  • That is, the constraint column information (such as the ID) is distributed in batches: once an ID is found, records with the same ID are distributed contiguously after it, and a page may contain only a few distinct IDs.
  • For example, a weather station updates its temperature information every 5 minutes and reports it to the meteorological center, and a typical need is to query the temperature changes monitored by one weather station during a day. This kind of query depends on how the data is distributed.
  • In this case, the data should not be stored purely in chronological order, but in accordance with the data distribution.
  • Item Map: the data structure consisting of the data name (constraint element) and its offset address within the page is called the Item Map. The Item Map is written into the page header of the external storage page.
  • the dump data can be written to the dump transition page for "interval write".
  • Interval write means that after a piece of historical data is written to the dump transition page, the positions of several rows are left vacant so that subsequent historical data with the same constraint value (such as the same primary key) can be inserted there, while records with different constraint values are inserted after those vacant rows.
  • The size of each vacant row is determined by the amount of space occupied by the first value of the interval.
  • The interval is set by a parameter value k, that is, an interval of k rows is reserved.
  • the value of k is set according to the modification frequency of the application. The default value is 10.
  • For example, a city's meteorological bureau has N weather observation stations, and maintains a temperature table Temp (ID, Location char(8), Temperature int) used to record real-time temperature; the recorded attributes include the observation station identifier, the location, and the temperature.
  • ID and Location can be used as constraint columns.
  • FIG. 6 shows the dump transition pages for "ID", "Location", and "Temperature".
  • After the data of the observation station with ID 1, the data of the observation station with ID 2 is written after an interval of n rows, where the value of n is set by a parameter. In this way, the data belonging to the same constraint element is aggregated together and can be read sequentially during a query, which improves query efficiency.
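  • A minimal sketch of interval write follows, with a hypothetical in-memory page represented as a list of slots: k rows are reserved after the first record of each constraint value so that later records with the same constraint (for example, the same station ID) land next to it. The class is illustrative, not a structure defined in this application.

```python
K = 10  # interval parameter: rows reserved per constraint value

class IntervalPage:
    """Dump transition page with interval write keyed on a constraint column."""
    def __init__(self, k=K):
        self.k = k
        self.slots = []     # physical rows of the page
        self.start = {}     # constraint value -> index of its reserved block

    def write(self, constraint_value, row):
        if constraint_value not in self.start:
            # first record for this constraint: reserve k slots for it
            self.start[constraint_value] = len(self.slots)
            self.slots.extend([None] * self.k)
        base = self.start[constraint_value]
        for i in range(base, base + self.k):
            if self.slots[i] is None:
                self.slots[i] = row
                return True
        return False        # reserved block is full (handled elsewhere)

page = IntervalPage(k=3)
page.write(1, (1, "north", 21))     # station ID 1
page.write(2, (2, "south", 25))     # station ID 2 goes after the reserved interval
page.write(1, (1, "north", 22))     # lands next to the other ID-1 records
print(page.slots)
```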
  • the attribute values recorded in the dump transition page can be directly copied to the ordinary column storage page
  • Alternatively, the compression ratio of each target column in the dump transition page may be estimated first, the estimated compressed data amounts of the target columns summed, and whether a dump can be performed determined according to the total compressed data amount.
  • Or, the compression ratio of the dump transition page corresponding to each target column can be estimated, and the dump performed according to each compression ratio.
  • The selection of the column storage page (the format of the dump transition page) can be determined before the dump begins.
  • Specifically, the total amount of compressed data expected after the data of each dump transition page is compressed with a target compression method may be determined, where each dump transition page stores the attribute values corresponding to one target column. When the total compressed data amount meets the target condition, the target compression method is used to compress the multiple dump transition pages to obtain the total compressed data, where the target condition is: the total compressed data amount is less than or equal to the target threshold, and the total compressed data amount plus the compressed data amount of one more dump transition page is greater than the target threshold. The total compressed data is then stored to the third page among the target pages.
  • In other words, the state of the dump transition pages can be monitored based on compression estimation technology. For example, if the calculated estimate cannot fill an external memory page, the dump transition page is extended into an Extend accordingly.
  • For multiple dump transition pages in memory corresponding to the same target column, they can be expanded into an Extend (extension page; for example, one Extend is the size of 8 dump transition pages), that is, there are n consecutive such dump transition pages for the same column.
  • Extend is compressed and persisted (that is, the compressed data is written to pages in external memory) and stored as ordinary column memory pages. Before the Extend is compressed and stored, the header information contained in the Extend is recorded. This can improve compression efficiency and save storage space.
  • The data compression ratio is directly related to the data distribution in the dump transition page, so no single fixed standard can determine when to compress and persist the data in the dump transition page to external storage. A compression estimation therefore needs to be made first, to ensure that the external memory pages are filled as much as possible and to reduce cross-page reads during queries.
  • the information entropy theory can be used to make a more accurate estimation according to the data distribution in the dump transition page.
  • Take the Name column of the Account table as an example: if an Extend that stores Name data contains only two distinct values, James and Alex, then only one binary bit is required per value, 1 for James and 0 for Alex. If the dump transition page that stores the Name data contains three distinct values, James, Alex, and Bob, then two binary bits are required per value.
  • In general, if the probability that a character (or string) appears in the dump transition page is p, then log2(1/p) binary bits are required to encode that character (or string).
  • If the dump transition page is composed of n kinds of data and the probabilities with which they appear are p1, p2, ..., pn respectively, the average number of bits needed per value can be estimated accordingly.
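  • As a sketch of such an estimate, the expected compressed size of a column can be approximated with Shannon entropy (number of values times the average bits per value, sum of p_i * log2(1/p_i)). This is a simplification assumed here for illustration, not necessarily the exact estimator used by the engine.

```python
import math
from collections import Counter

def estimated_compressed_bytes(values):
    """Estimate the compressed size of a column from its value distribution.

    A value with probability p needs about log2(1/p) bits, so n values need
    roughly n * sum(p_i * log2(1/p_i)) bits in total (Shannon entropy).
    """
    n = len(values)
    if n == 0:
        return 0
    counts = Counter(values)
    entropy_bits = sum((c / n) * math.log2(n / c) for c in counts.values())
    return math.ceil(n * entropy_bits / 8)

names = ["James"] * 500 + ["Alex"] * 500      # two equally likely values -> ~1 bit each
print(estimated_compressed_bytes(names))      # about 125 bytes for 1000 values
```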
  • Specifically, an information table can be maintained in memory for the data tables to be dumped, the data distribution of each dump transition page can be monitored in real time, the compression ratio of each dump transition page can be estimated, and the dump transition pages can be expanded into an extended page (Extend).
  • The extended page can be an actual page (the page header, page body, and page footer information of each dump transition page are written to the corresponding positions of the extended page) or a virtual page (the dump transition pages corresponding to the same extended page are identified according to the information table). When the theoretically compressed amount of data in an extended page can fill an external memory page, the extended page is compressed and persisted, and the memory space occupied by the extended page is then released. Extended pages that cannot yet fill an external memory page (an external memory page being a page in the column storage database) continue to reside in memory, waiting for the next dump.
  • A Map structure may also be maintained, which establishes a link between the table (column) currently being dumped and the corresponding page, and records the remaining space of the corresponding dump transition page's page after this dump. For example, an entry <t, 2k> in the Map indicates that after this dump, the page (dump transition page) corresponding to table t has 2k of space left unused. The dump thread looks up this Map before performing the compression estimation. The lookup has two possible results: if there is no information for the table (column) in the Map, it indicates that the table (column) is being dumped for the first time, or that no space was left on the page after the last dump, and the estimation for this dump can be made directly against the default page.
  • The "full" page here is not absolutely full; instead, a threshold (e.g., 99%) is set, and when the ratio of the occupied space of the current page to the total page is greater than or equal to the threshold (e.g., the occupied space is greater than or equal to 99% of the total page), the page is considered full and the table's information is deleted from the Map.
  • The corresponding Map information is the correspondence between the column and the free space of the corresponding page, such as <column1, 2k>.
  • the related operations are similar to those described above, and are not repeated here.
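  • The following sketch models the Map just described under assumed types: a dictionary from table (or column) to the free space left on its current page after the last dump, consulted by the dump thread before compression estimation, with entries dropped once a page counts as full. The names and the 99% threshold handling are illustrative.

```python
FULL_RATIO = 0.99
PAGE_SIZE = 2048

free_space_map = {}   # table (or column) name -> free bytes on its current page

def page_for_next_dump(table, default_free=PAGE_SIZE):
    """Return the free space to estimate against for this dump."""
    # No entry: first dump for this table, or no space was left after the last dump,
    # so the estimation is made directly against a default (empty) page.
    return free_space_map.get(table, default_free)

def record_after_dump(table, free_bytes):
    """Update the Map after a dump, dropping the entry once the page counts as full."""
    if free_bytes <= (1 - FULL_RATIO) * PAGE_SIZE:
        free_space_map.pop(table, None)        # page >= 99% occupied: considered full
    else:
        free_space_map[table] = free_bytes

record_after_dump("t", 1024)                   # e.g. <t, 1k>: 1k left unused
print(page_for_next_dump("t"), page_for_next_dump("Account"))
```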
  • In that case, the target column whose total compressed data amount after compression is the largest among the target columns is used as the reference.
  • The dump transition pages (Extend) corresponding to each target column are compressed and each stored in a page among the target pages.
  • When doing so, the XRange (XID range) of the Extend needs to be determined according to the XID_min/XID_max (version identifier, or custom information such as ID_min/ID_max) provided in the page headers, and stored in the compressed external storage page.
  • the Extend shown in FIG. 7 is an Extend after being compressed and persisted, that is, an externally stored compressed page.
  • the address information of the header of the external storage and the key value of the Item map are loaded into the memory when the column storage system is started, and the column storage index is established to speed up the query process.
  • For each target column of the multiple target rows, operations such as expanding pages, estimating the amount of compressed data, compression, and persistence can be performed separately; each target column has its own dump transition pages, and the columns do not affect each other.
  • the pages in the column store database store data in a manner similar to that of FIG. 7 and FIG. 8, except that the range of the column store index in the page header or the page offset in the key-value pair may be different.
  • After the target time is reached, the target rows may be cleared.
  • Query information for querying data in the data table can also be received.
  • the data in the data table can be stored in the data table of the row storage database and the target page of the column storage database, and can also be stored in the data page (for example, PostgreSQL) or the rollback segment (MySQL).
  • The target pages in the column storage database and the data table (or the data page or rollback segment) of the row storage database can be queried in order to obtain the query result corresponding to the query information.
  • the row storage database and the column storage database can both be located in memory, or the row storage database can be located in memory, and the column storage database can be located in external storage.
  • the data in the data table can be stored in the data table of the row database, the target page of the dump transition page, and the column database. It can also be stored in a data page (for example, PostgreSQL) or in a rollback segment (MySQL).
  • the target page in the column storage database, the data table of the row storage database, and the dump transition page may be sequentially queried.
  • Row storage databases and dump transition pages can be located in memory, and column storage databases can be located in external storage.
  • When the received query information includes a query value (a specific value or a range of values) of the target identifier, the row storage index, the column storage index, and the dump transition pages of the data table are obtained, where the row storage index is an index over the row-store data of the data table stored in the row storage database, and the column storage index is an index over the identification values of the target identifier stored in each page among the target pages, the target identifier corresponding to the attribute values of the target columns.
  • The query value is then used to query the column storage index, the row storage index, and the dump transition pages in turn to determine the target location of the target data corresponding to the query information; the query result corresponding to the query information is obtained from the determined target location; and the obtained query result is output.
  • The row storage index, the column storage index, and the dump transition pages of the data table can be obtained through the following steps: obtain the storage address of the data table, for example from the metadata of the data table in the data dictionary; load the data table (the data table in the row storage database) into the data cache area and obtain the row storage index of the data table; and obtain the dump transition pages and the column storage index (the dump transition pages and the column storage index can be resident in memory).
  • The column storage index can include, but is not limited to, the index of the version identifier and the key-value pairs.
  • The query value can be used to search in the column storage index and the row storage index. If a hit is found in the column storage index, the corresponding column storage page is located according to the column storage index and the data is read from that page. If a hit exists in the row storage index, the row-format pages are traversed according to the position pointed to by the row storage index and the data is read. The dump transition pages are also traversed, and if the data exists there, it is read.
  • the column memory index may be searched first, and then the row memory index may be searched.
  • the SQL statement may be given a Hint indication to determine which index to search first.
  • For example, the column storage index (such as XRange), the row storage index, and the dump transition pages can be queried in sequence until the corresponding query result is found;
  • or the column storage index (such as the Item Map), the row storage index, and the dump transition pages can be queried in order to find all the corresponding query results.
  • For example, the SQL query SELECT Name FROM Account WHERE XID < 20 AND XID > 10 is executed.
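  • A sketch of this lookup order is shown below, with hypothetical index objects standing in for the real column storage index (XRange or Item Map), the row storage index, and the in-memory dump transition pages; the query value is tried against each structure in turn and all hits are collected.

```python
def query(value, column_index, row_index, dump_pages):
    """Search the column storage index, then the row storage index, then the dump
    transition pages, and collect every matching attribute value.

    column_index: {(xid_min, xid_max): list of (xid, value)}  -- like an XRange index
    row_index:    {xid: row}                                  -- simplified row-store index
    dump_pages:   list of {xid: value} dictionaries           -- dump transition pages
    """
    results = []
    for (lo, hi), page in column_index.items():       # 1. column store (persisted pages)
        if lo <= value <= hi:
            results.extend(v for xid, v in page if xid == value)
    if value in row_index:                             # 2. row store (latest versions)
        results.append(row_index[value])
    for page in dump_pages:                            # 3. dump transition pages in memory
        if value in page:
            results.append(page[value])
    return results

col_idx = {(10, 20): [(12, "James"), (15, "Alex")]}
row_idx = {21: (10, "James", 900, "consume 100")}
print(query(15, col_idx, row_idx, dump_pages=[{15: "Alex"}]))
```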
  • Data query based on constraint columns has good support for regularly generated data, such as meteorological information, and regularly collected and updated information from IoT nodes, but poor support for irregular data updates.
  • Step 1 (indicated by the arrow labeled 1): Based on the strategy selected by the user, the data is periodically written to the dump transition page.
  • Step 2 (indicated by the arrow labeled 2): use the compression estimation mechanism to persist the dump transition pages or Extends to external storage, and establish the XRange or Item Map index.
  • Step 3 (indicated by the arrow labeled 3): the query is performed on the row store, or on the dump transition pages and the column store, according to the SQL Hint; by default, the query is performed on the dump transition pages and the column store.
  • the use of the XID (or other index) range of each tuple in the column store to manage the dump transition page can effectively improve the addressing speed.
  • The XRange and Item Map methods based on the compression estimation mechanism ensure that the column-store query process does not decompress unrelated compressed pages, which improves query performance.
  • the upper-layer application system can read the latest data in the row storage database, and the analysis system can perform data analysis based on the column storage to obtain valuable information. Application systems and analysis systems do not affect each other, making full use of the value of data.
  • In the embodiments of the present application, at least one target row to be cleared at a target time is obtained from a data table of the row storage database; the target attribute values recorded on the target columns of the at least one target row are stored to target pages in the column storage database, where the target attribute values recorded on the same target column of the target rows are recorded on at least one of the target pages; and after the target time is reached, the target rows are cleared. This achieves the purpose of preserving the historical data in the database and ensures a complete history of data changes.
  • storing a target attribute value recorded on a target column of at least one target row to a target page in a columnar storage database includes:
  • In this way, the target attribute values recorded on different target columns are stored on different pages among the target pages, and the attribute values recorded on the same target column can also be stored on different pages, so that the storage of the target column attribute values can be reasonably planned and the target pages can be managed easily.
  • storing target attribute values recorded on multiple target columns of at least one target row to a target page in a columnar storage database includes:
  • In this way, the target attribute values recorded on the target columns are recorded using the dump transition page, and the dump is performed only when the dump transition page is full, which ensures that the pages among the target pages are filled and avoids wasting storage space.
  • storing target attribute values recorded on the same target column in at least one target row to a dump transition page includes:
  • The page header information includes: the maximum value and the minimum value of the version identifier corresponding to the attribute values recorded in the dump transition page, where the target identifier is a version identifier and the version identifier is used to uniquely identify the column versions of the target columns of the multiple target rows.
  • In this way, an index over the attribute values stored in the pages of the column storage database is formed, which facilitates management of the target pages.
  • storing the attribute values recorded in the dump transition page to the third page in the target page includes:
  • when the total compressed data amount meets the target condition, the target compression method is used to compress each of the multiple dump transition pages to obtain the total compressed data, where the target condition is: the total compressed data amount is less than or equal to the target threshold, and the total compressed data amount plus the compressed data amount of one more dump transition page is greater than the target threshold;
  • In this way, the total amount of data after compression is estimated using the attribute values in the multiple dump transition pages corresponding to the same target column of the multiple target rows.
  • the above method further includes:
  • the data tables in the row storage database and the target pages in the column storage database are respectively queried to ensure the comprehensiveness of the query results.
  • the above method further includes:
  • the data tables in the row storage database, the target pages in the column storage database, and the dump transition pages are respectively queried to ensure the comprehensiveness of the query results.
  • the above method further includes:
  • where the row storage index is an index over the row-store data of the data table stored in the row storage database, and the column storage index is an index over the identification values of the target identifier corresponding to the attribute values of the multiple target columns stored in each target page;
  • for query information that includes a query value corresponding to the target identifier, the column storage index, the row storage index, and the dump transition pages are searched respectively, thereby ensuring both the efficiency of the query and the comprehensiveness of the query results.
  • the processor of the network element node dumps the historical data of the data table in the row storage database to the dump transition page through step S1002.
  • the data in the dump transition page is stored in the column storage page.
  • the query information is received.
  • the query information is used to query the column storage index, the row storage index, and the dump transition page to obtain the query result.
  • the obtained query result is output through step S1010.
  • a data processing apparatus for implementing a data processing method is also provided. As shown in FIG. 11, the apparatus includes:
  • a first obtaining unit 1102 is configured to obtain at least one target row to be cleared at a target time in a data table of a row-type storage database
  • a storage unit 1104 configured to store a target attribute value recorded on at least one target row to a target page in a columnar storage database
  • a clearing unit 1106 is configured to clear the at least one target row after the target time is reached.
  • in the related art, historical data in a database is cleared by using a clear operation, which causes the historical data in the database to be missing and makes it difficult to trace the historical data.
  • in this application, at least one target row to be cleared at the target time is obtained from the data table of the row-type storage database; the target attribute values recorded on the at least one target row are stored to the target page in the column-type storage database; and after the target time is reached, the at least one target row is cleared. In this way, the historical data in the database is preserved, the integrity of the data change history is ensured, and the technical problem in related data processing technologies that historical data is difficult to trace is solved.
  • the first obtaining unit 1102 may be used to execute step S202
  • the storage unit 1104 may be used to execute the foregoing step S204
  • the clearing unit 1106 may be used to execute step S206.
  • the optional execution methods are not described in detail here.
  • the storage unit 1104 includes:
  • a first storage module configured to store target attribute values recorded on the same target column in multiple target rows to a dump transition page, wherein the dump transition page is used to dump attribute values recorded on the target column to the target page of the columnar storage database;
  • the second storage module is configured to: when all or part of the target attribute values recorded on the same target column of the multiple target rows fill up the dump transition page, store the attribute values recorded in the dump transition page to the third page among the target pages.
  • the target attribute values on the target column are recorded by using the dump transition page, and the dump is performed only when the dump transition page is full, thereby ensuring that the pages among the target pages are filled and avoiding waste of storage space.
  • the second storage module includes:
  • a first determining submodule configured to determine header information of a dump transition page, where the header information is used to identify an identification value range of a target identifier corresponding to an attribute value recorded in the dump transition page;
  • the first storage submodule is configured to store page header information and attribute values recorded in the dump transition page to a third page in the target page.
  • the page header information includes: the maximum value and the minimum value of the version identifiers corresponding to the attribute values recorded in the dump transition page, where the target identifier is a version identifier and the version identifier is used to uniquely identify the column versions of the target columns of the multiple target rows; or, the page header information includes: one or more key-value pairs, where a key-value pair includes the attribute value of a first column among the target columns of the multiple target rows and the in-page offset corresponding to the attribute value of the first column, the attribute value of the first column corresponds to the column version of the attribute value of a second column stored in the dump transition page, the attribute value of the first column and the attribute value of the second column are stored consecutively in the dump transition page, and the in-page offset is the offset, within the dump transition page, of the storage position of the attribute value of the second column.
  • by setting the page header information, an index of the attribute values stored in the pages of the columnar storage database is formed, which facilitates management of the target pages (a sketch of the key-value-pair variant follows below).
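  • A hedged sketch of the key-value-pair variant (an Item-Map-style header), assuming equal constraint-column values are stored contiguously; build_item_map, read_versions and the sample station data are names and data introduced only for this illustration:

```python
# The header maps each constraint-column value (e.g. a station ID) to the
# in-page offset where that value's consecutively stored versions begin.
def build_item_map(page_entries):
    """page_entries: list of (constraint_value, attribute_value) with equal
    constraint values stored contiguously. Returns {constraint_value: offset}."""
    item_map = {}
    for offset, (key, _) in enumerate(page_entries):
        item_map.setdefault(key, offset)      # remember only the first offset
    return item_map

def read_versions(page_entries, item_map, key):
    """Read the contiguous run of versions belonging to one constraint value."""
    start = item_map[key]
    versions = []
    for entry_key, value in page_entries[start:]:
        if entry_key != key:
            break
        versions.append(value)
    return versions

if __name__ == "__main__":
    page = [(1, 21), (1, 22), (1, 20), (2, 18), (2, 19)]   # (station ID, temperature)
    item_map = build_item_map(page)
    print(item_map)                           # {1: 0, 2: 3}
    print(read_versions(page, item_map, 2))   # [18, 19]
```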
  • the second storage module includes:
  • a second determination sub-module is used to determine the total amount of compressed data that is expected to be obtained after compressing the data of each dump transition page in a plurality of dump transition pages by using a target compression method, where each dump transition page among the plurality of dump transition pages stores attribute values corresponding to the same target column of the multiple target rows, and the plurality of dump transition pages include the dump transition page;
  • a compression submodule which is used to compress each dump transition page in a plurality of dump transition pages using a target compression method to obtain total compressed data when the total amount of compressed data meets a target condition, where:
  • the target conditions are: the total amount of compressed data is less than or equal to the target threshold, and the total amount of compressed data plus the amount of compressed data of a dump transition page is greater than the target threshold;
  • a second storage submodule configured to store the total compressed data in a third page of the target page.
  • the total amount of compressed data is estimated in advance for the attribute values in the multiple dump transition pages corresponding to the same target column, and the attribute values are compressed and stored to one page among the target pages only when the target condition is met, which saves storage space.
  • the storage unit 1104 includes:
  • a third storage module configured to store the target attribute values recorded on different target columns of the multiple target rows to different pages among the target pages, where, when some of the target attribute values recorded on the same target column of the multiple target rows fill up the first page among the target pages, the target attribute values recorded on that same target column other than the already-written part are stored on the second page among the target pages.
  • the target attribute values recorded on different target columns are stored on different pages among the target pages, and the attribute values recorded on the same target column can also be stored on different pages, so that the storage of the target column attribute values can be planned reasonably, which makes the target pages easy to manage.
  • the above device further includes:
  • a receiving unit configured to receive query information for performing a data query on a data table after clearing at least one target row
  • a second obtaining unit configured to use the query information to sequentially query the target page in the columnar storage database and the data table of the row storage database to obtain a query result corresponding to the query information
  • the data tables in the row storage database and the target pages in the column storage database are respectively queried to ensure the comprehensiveness of the query results.
  • the above device further includes:
  • a first receiving unit configured to receive query information for querying a data table after clearing at least one target row
  • a first query unit configured to use the query information to sequentially query the target page in the columnar database, the data table in the row database, and the dump transition page to obtain query results corresponding to the query information
  • a first output unit configured to output a query result.
  • the data tables in the row storage database, the target pages in the column storage database, and the dump transition pages are respectively queried to ensure the comprehensiveness of the query results.
  • the foregoing apparatus further includes:
  • a second receiving unit configured to receive query information for performing data query on a data table after at least one target row is cleared, wherein the query information includes a query value corresponding to the target identifier;
  • the third obtaining unit is configured to obtain a row storage index, a column storage index, and the dump transition page, wherein the row storage index is an index of the row store data stored in the data table of the row storage database, and the column storage index is an index of the identification values of the target identifiers corresponding to the attribute values of the multiple target columns stored on each of the target pages;
  • a second query unit which is used to sequentially query the column index, the row index, and the dump transition page using the query value to determine the target location where the query result corresponding to the query information is stored;
  • a fourth obtaining unit configured to obtain a query result corresponding to the query information by using the target position
  • the second output unit is configured to output the query result.
  • according to query information that includes the query value corresponding to the target identifier, the column storage index, the row storage index, and the dump transition page are queried respectively, thereby ensuring both the efficiency of the query and the comprehensiveness of the query results.
  • the storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
  • a storage medium stores a computer program, and the computer program is configured to execute the steps in any one of the foregoing method embodiments when running.
  • the foregoing storage medium may be configured to store a computer program for performing the following steps:
  • a person of ordinary skill in the art may understand that all or part of the steps in the various methods of the foregoing embodiments may be performed by a program instructing hardware related to a terminal device, and the program may be stored in a computer-readable storage medium.
  • the computer-readable storage medium may include: a flash disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.
  • an electronic device for implementing the above data processing method is also provided.
  • the electronic device includes a processor 1202, a memory 1204, a transmission device 1206, and the like.
  • a computer program is stored in the memory, and the processor is configured to execute the steps in any one of the foregoing method embodiments through the computer program.
  • the foregoing electronic device may be located in at least one network device among a plurality of network devices in a computer network.
  • the foregoing processor may be configured to execute the following steps by a computer program:
  • FIG. 12 is merely an illustration, and the electronic device may also be a server providing a query service.
  • FIG. 12 does not limit the structure of the electronic device.
  • the electronic device may further include more or fewer components (such as a network interface, etc.) than those shown in FIG. 12, or have a different configuration from that shown in FIG.
  • the memory 1204 may be used to store software programs and modules, such as program instructions / modules corresponding to the data processing method and device in the embodiments of the present application.
  • the processor 1202 runs the software programs and modules stored in the memory 1204, so as to perform various functional applications and data processing, that is, to implement the above data processing method.
  • the memory 1204 may include a high-speed random access memory, and may further include a non-volatile memory, such as one or more magnetic storage devices, a flash memory, or other non-volatile solid-state memory.
  • the memory 1204 may further include memories located remotely relative to the processor 1202, and these remote memories may be connected to the terminal through a network. Examples of the network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the transmission device 1206 is used to receive or send data via a network.
  • Specific examples of the foregoing network may include a wired network and a wireless network.
  • the transmission device 1206 includes a network adapter (Network Interface Controller, NIC for short), which can be connected to other network devices and routers through a network cable so as to communicate with the Internet or a local area network.
  • the transmission device 1206 is a radio frequency (RF) module, which is used to communicate with the Internet in a wireless manner.
  • when the integrated unit in the above embodiment is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the above computer-readable storage medium.
  • the technical solution of this application essentially, or the part that contributes to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the methods described in the embodiments of this application.
  • the disclosed client can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of units is only a logical function division, and there may be other division manners in actual implementation.
  • for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data processing method and apparatus, a storage medium, and an electronic apparatus. The method includes: an electronic apparatus obtains at least one target row to be cleared at a target time from a data table of a row storage database (S202); stores the target attribute values recorded on the at least one target row to a target page in a column storage database (S204); and clears the at least one target row after the target time is reached (S206). The method solves the technical problem, existing in related data processing technologies, that historical data is difficult to trace.

Description

数据处理方法和装置、存储介质及电子装置
本申请要求于2018年08月16日提交中国专利局、申请号为2018109354781、申请名称为“数据处理方法和装置、存储介质及电子装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机领域,具体涉及数据处理技术。
背景技术
目前,通常采用数据库的方式对数据进行存储。对于数据库中的历史数据,通常需要采用相关操作将其清除。上述数据处理方式会使得数据库中的历史数据缺失,从而造成难以追溯历史数据的问题。
针对上述的问题,目前尚未提出有效的解决方案。
发明内容
本申请实施例提供了一种数据处理方法和装置、存储介质及电子装置,以解决相关数据处理技术中难以追溯历史数据的技术问题。
根据本申请实施例的一个方面,提供了一种数据处理方法,应用于电子装置,包括:获取行式存储数据库的数据表中在目标时间上待被清除的至少一个目标行;将该至少一个目标行上记录的目标属性值存储至列式存储数据库中的目标页面;在达到所述目标时间之后,清除该至少一个目标行。
根据本申请实施例的另一方面,还提供了一种数据处理装置,包括:第一获取单元,用于获取行式存储数据库的数据表中在目标时间上待被清除的至少一个目标行;存储单元,用于将所述至少一个目标行上记录的目标属性值存储至列式存储数据库中的目标页面;清除单元,用于在达到所述目标时间之后,清除所述至少一个目标行。
根据本申请实施例的又一方面,还提供了一种存储介质,该存储介质中存储有计算机程序,其中,该计算机程序被设置为运行时执行上述数据处理方法。
根据本申请实施例的又一方面,还提供了一种电子装置,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,其中,上述处理器通过计算机程序执行上述的数据处理方法。
在本申请实施例中,通过将行式存储数据库中待被清除的目标行转存至列式存储数据库中,实现历史数据的保留。具体的,获取行式存储数据库的数据表中在目标时间上待被清除的至少一个目标行;将至少一个目标行上记录的目标属性值存储至列式存储数据库中的目标页面;在达到目标时间之后,清除多个目标行。如此,基于上述行列转储技术,将行式存储数据库中待被清除的数据转储至列式存储数据库中,达到了保存数据库中历史数据的目的, 从而实现了保证数据变迁轨迹完整的技术效果,解决了相关数据处理技术中存在的难以追溯历史数据的技术问题。
附图说明
此处所说的附图用来提供对本申请的进一步理解,构成本申请的一部分,本申请的示意性实施例及其说明用于解释本申请,并不构成对本申请的不当限定。在附图中:
图1是根据本申请实施例的一种数据处理方法的应用环境的示意图;
图2是根据本申请实施例的一种可选的数据处理方法的流程示意图;
图3是根据本申请实施例的一种可选的转储过渡页的示意图;
图4是根据本申请实施例的另一种可选的转储过渡页的示意图;
图5是根据本申请实施例的另一种可选的数据处理方法的示意图;
图6是根据本申请实施例的又一种可选的数据处理方法的示意图;
图7是根据本申请实施例的又一种可选的数据处理方法的示意图;
图8是根据本申请实施例的又一种可选的数据处理方法的示意图;
图9是根据本申请实施例的另一种可选的数据处理方法的流程示意图;
图10是根据本申请实施例的又一种可选的数据处理方法的流程示意图;
图11是根据本申请实施例的一种可选的数据处理装置的结构示意图;
图12是根据本申请实施例的一种可选的电子装置的结构示意图。
具体实施方式
为了使本技术领域的人员更好地理解本申请方案,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分的实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都应当属于本申请保护的范围。
需要说明的是,本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本申请的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
根据本申请实施例的一个方面,提供了一种数据处理方法。可选地,该数据处理方法可以但不限于应用于如图1所示的应用环境中。如图1所示,用户102使用的用户设备104包括:RAM 106和处理器108。用户102可以使用用户设备104通过网络112向查询系统114发送查询请求110。查询系统 114中的搜索引擎116包括:索引引擎118和排序引擎120。接收到查询请求110后,查询系统可以根据查询请求110对行式存储数据库122和列式存储数据库124进行查询,得到查询结果126,并将查询结果126通过网络112返回给用户设备104。
对于查询系统114上的数据处理,系统中的每个节点设备可以获取行式存储数据库122的数据表中在目标时间上待被清除的至少一个目标行;将至少一个目标行上记录的目标属性值存储至列式存储数据库124中的目标页面;在达到目标时间之后,清除至少一个目标行。
可选地,用户设备104可以包括但不限于:手机、平板电脑、台式电脑等;查询系统114可以包括但不限于以下至少一种:分布式数据库系统(其中的每个节点设备采用本申请中的数据处理方法)、基于多版本并发控制(Multi-Version Concurrency Control,简称为MVCC)的关系型数据库系统、基于MVCC的非关系型数据库系统等。上述网络可以包括但不限于无线网络、有线网络,其中,无线网络包括:蓝牙、WIFI及其他实现无线通信的网络。有线网络可以包括但不限于:局域网、城域网和广域网。上述查询系统可以包括但不限于以下至少之一:PC机及其他用于计算服务的设备。上述只是一种示例,本实施例中对此不做任何限定。
可选地,在本实施例中,作为一种可选的实施方式,如图2所示,上述数据处理方法应用于电子设备,该电子设备可以为终端设备,也可以为服务器,该数据处理方法可以包括:
S202,获取行式存储数据库的数据表中在目标时间上待被清除的至少一个目标行。
S204,将至少一个目标行上记录的目标属性值存储至列式存储数据库中的目标页面。
S206,在达到目标时间之后,清除至少一个目标行。
可选地,上述数据处理方法可以但不限于应用在以下应用场景中:记录用户行为、记录账户的账务变动、记录股票交易记录、记录气象监测数据等需要对数据变迁历史进行记录的场景。
以记录账户的账务变动的场景为例,对于用户的网上账户,用户会基于该网上账户进行充值、消费等活动,而如果账户信息只记录有当前的账户余额,那么,在日后账务出现问题时,很可能因用户无法查询该账户的历史账务变动,而难以获知账务问题出现的原因及时间。如果通过本申请中的数据处理方式将所有账户的变动信息均进行转储,那么该账户从开户到销户期间的所有的交易状态均会被记录下来,一旦出现账务问题,可以在第一时间根据记录的历史账务变动信息进行追溯、定位:具体实现时,服务器可以从行式存储数据库的数据表(存储有用户的账务信息)中,获取在目标时间上待被清除的至少一个目标行(历史账务变动信息);将至少一个目标行的目标列 上记录的目标属性值存储至列式存储数据库中的目标页面,其中,至少一个目标行中同一目标列上记录的目标属性值可以被记录在列式存储数据库中的目标页面中的至少一个页面上;在达到目标时间之后,清除至少一个目标行。
需要说明的是,在相关技术中,对于数据库中的历史数据,通常直接将其清除。然而,数据变迁历史在许多场景中都有重要意义,如银行监控储户的历史账单变化信息,气象部门监控天气变化信息,股市显示历史交易信息等。而采用上述相关技术中的数据处理方法,会使得数据库中的历史数据缺失,难以追溯历史数据。而在本申请中,通过获取行式存储数据库的数据表中在目标时间上待被清除的至少一个目标行;将至少一个目标行上记录的目标属性值存储至列式存储数据库中的目标页面;在达到目标时间之后,清除至少一个目标行,如此实现对数据库中的历史数据进行保存,保证历史数据变迁数据的完整,解决了数据处理技术中存在的难以追溯历史数据的技术问题。
可选地,行式存储数据库(也可被称为行存数据库)中待转储的数据可以位于节点设备的内存中。行存数据库以行存的形式保存数据表中的数据(如,最新版本的数据),行存数据库可以包括但不限于:PostgreSQL、MySQL等。在行存数据库中,通过周期触发或事件触发的方式进行数据更新。行存的每个属性的类型可能是不一致的,需要为不同的属性分配不同的列宽(即,列大小)。由于属性类型不一致的存在,需要保证行对齐。
可选地,可以通过数据的状态属性来标识数据的生命周期轨迹。可以将数据的生命周期分为三个阶段,每个阶段刻画数据的不同状态属性,以标识数据的生命周期轨迹中所处的状态,三个阶段对应的状态属性分别为:
(1)当前态(Current State):处于当前阶段的数据的状态,称为当前态,处于当前态的数据为数据项的最新版本。
(2)历史态(Historical state):处于历史阶段的数据的状态,称为历史态。历史态数据的值是旧值,不是当前值。一个数据项的历史态,可以有多个,反映了数据的状态变迁的过程。处于历史态的数据,只能被读取不能再被修改或删除。
(3)过渡态(Transitional State):处于由当前态向历史态转变的数据的状态,称为过渡态。处于过渡态的数据(称为半衰数据)即不是数据项的最新的版本也不是历史态版本,而是处于从当前态向历史态转变的过程中。
这三个状态涵盖了一个数据项的整个生命周期,合称为数据全态(full-state),具有三个状态的数据项称为全态数据。在多版本并发控制(Multi-Version Concurrency Control,MVCC)机制下,数据的三种状态均存在,而在非MVCC机制下,数据只存在历史态和当前态。
例如,有一张账户表Account(ID int,Name char(8),Balance int,Note text)。该账号表包含4属性列,分别为:账号、姓名、余额、备注。该表用于记录 用户账号余额的变动。一次的余额变动就会产生一条记录(对应于账户表中的一行)。有一用户现有数据为(10,James,1000,Create account)。在某一时刻该用户发生了一次余额变动,账户余额减少100,并且在Note中注明'consume 100'。那么,数据库需要执行一次UPDATE(更新)操作,行存数据库存储的最新版本的数据是(10,James,900,consume 100),为当前态数据。在最新版本的数据进行更新的过程,(10,James,1000,Create account)为过渡态数据。而在更新完成之后,(10,James,1000,Create account)为历史态数据。
可选地,数据可以具有双时态属性:有效时间属性、事务时间属性。有效时间属性表示数据表示的对象在时间属性上的情况。如,Kate上中学起止时间是2000-09-01到2003-07-30,即为有效时间。事务时间属性表示数据的某个状态的发生时刻,数据具有其时态属性,即,在何时数据库系统进行了什么样的操作,某项操作在数据库系统内被封装为事务,而事务具有原子性。可以采用事务标识来标识一个数据的事务时态属性。从形式上看,有效时间属性和事务时间属性,在数据模型中用普通的用户自定义字段进行表示,只是用特定的关键字加以描述,供数据库引擎进行约束检查和赋值。
可选地,在本实施例中,可以获取行式存储数据库的数据表中在目标时间上待被清除的至少一个目标行,即可以获取行式存储数据库的数据表中待被清除的一个目标行,也可以获取行式存储数据库的数据表中待被清除的多个目标行,通常情况下,需要获取多个目标行。
具体的,在行存数据库中可以通过设置待清除标识的方式,来标识该行存数据库的数据表中待清除的目标行;还可以将通过将待清除的目标行放入特定存储位置,来标识该行存数据库的数据表中待清除的目标行;此外,也可以通过其他方式标识行存数据库的数据表中待清除的目标行。
例如,支持MVCC的数据库在进行数据更新操作时,会产生多个版本的旧数据,也就是全时态数据模型中的历史态数据,目前数据库管理系统的做法是定期删除。行存数据库在执行更新/删除(UPDATE/DELETE)操作时并不是直接将现有数据清除,而是做一个待清除标记。PostgreSQL会为每张表设置一个VM文件,用于标识过期元组。MySQL使用MVCC的history list(历史数据列表)来标识过期元组。PostgreSQL的多版本数据存放在数据页面中,MySQL的多版本数据存放在UNDO回滚段中。
可选地,可以采用多种方式清除行存数据库中的历史数据(历史态数据,即待被清除的目标行)。清除操作可以定期(周期)执行,也可以事件触发(接收到清空命令)执行等,目标时间由清除历史数据的方式决定,本实施例对此不作限定。
例如,PostgreSQL在执行VACUUM操作时,会根据VM文件为每张表清除过期元组,MySQL的Purge线程会扫描MVCC的history list,对没有其他事务引用且不需要回滚的过期数据进行清除。PostgreSQL的VACCUM操 作和MySQL的Purge操作默认定期执行。对于信息变迁的历史数据,例如,涉及账务问题的数据,其与当前数据同样重要,因此,历史数据也希望被保存下来,而不是被清除。
可选地,获取待被清除的目标行的操作可以在目标行被清除之前执行。即,可以在清除待被清除的目标行之前,先获取待被清除的目标行,再执行清除待被清除的目标行的操作。获取待被清除的目标行可以基于定时机制(周期获取),定时启动获取操作。定时的周期,可以作为参数进行动态调整,本实施例中对此不作限定。
例如,PostgreSQL可在VACUUM操作前执行转储过程,VACUUM操作执行的不是清除历史态版本而是转储历史态版本。而MySQL可在Purge操作前进行转储过程。
可选地,上述行存数据库待转储的数据可以位于目标设备(如,网元节点)的内存中。对于行存数据库的位置,本实施例不做具体限定。
可选地,在本实施例中,可以将目标行的目标列上记录的目标属性值,存储至列式存储数据库中的一个或多个目标页面,其中,目标行的相同目标列上记录的目标属性值被记录在列式存储数据库中的多个目标页面中的至少一个目标页面上。
可选地,目标列可以是行存数据库的数据表中的一个属性列,本申请中的多个目标列,可以是数据表的全部属性列,也可以是数据表的全部属性列的子集。多个目标列可以由目标参数指定。不同目标列上记录的属性类型可以相同,也可以不同。不同属性类型的目标列被分配的列宽可以相同,也可以不同,相同属性类型的目标列被分配的列宽可以相同,也可以不同。具体的属性列的设置方式,可以根据需要进行设定,本实施例中对此不作具体限定。
例如,对于账户表Account(ID int,Name char(8),Balance int,Note text),多个目标列可以是该账号表中的全部列,也可以是部分列(如,ID、Balance、Note)。
可选地,在获取到待清除的目标行之后,从获取的目标行中读取目标列上记录的目标属性值。相同目标列上不同目标行所对应的目标属性值可以相同,也可以不同。
可选地,上述列式存储数据库(列存数据库)持久化的数据部分可以位于目标设备的外存(例如,磁盘)中,其他部分数据可以位于目标设备内存。对于列存数据库的位置,本实施例不做具体限定。一般列存数据库都会存储超大规模的数据量,优良的数据存储方式是实现空间高效利用和提升查询速度的基石。列存使用段页式管理结构,可有效利用诸如表空间等概念,在转储时把列存数据连续地在外存进行存储。这样便于在列存上执行针对列的计算。
可选地,目标行可以有多个,在将多个目标行中的多个目标列存储至列存数据库的目标页面中时,多个目标行的多个目标列中相同目标列上记录的目标属性值可以被记录在列存数据库中的目标页面中的至少一个页面上。
可选地,可以采用多种方式将至少一个目标行上记录的目标属性值存储至列存数据库中的目标页面。
作为一种可选的实施方式,可以将多个目标行中不同目标列上记录的目标属性值直接存储至列式存储数据库中的一个或多个页面中。
可选地,可以按照数据表中列的顺序,将多个目标行的多个目标列上记录的目标属性值依次写入到目标页面中的一个页面,在该页面写满之后,再写入到另一页面。也可以将多个目标行中不同目标列上记录的目标属性值,分别存储到目标页面的不同页面中。
可选地,对于多个目标行中各目标列上记录的目标属性值(第一目标属性值),在第一目标属性值中的部分目标属性值写满目标页面的第一页面的情况下,将第一目标属性值中除了已被写入第一页面的部分目标属性值以外的其他目标属性值,存储至目标页面的第二页面上。
可选地,在将多个目标行的多个目标列上记录的目标属性值直接存储至列存数据库中的目标页面的一个页面中时,该页面可能会未被写满,这样会造成存储空间浪费。
可选地,在列存数据库位于外存的情况下,为充分利用存储空间(如,磁盘空间),保证列存数据库中目标页面的各页面被写满,可以从外存中加载未写满的页面进入内存,保存新数据至未写满的页面中。
作为另一种可选的实施例方式,可以将多个目标行的目标列上记录的目标属性值存储至转储过渡页,其中,转储过渡页用于将目标列上记录的属性值转储至列存数据库中的目标页面;将转储过渡页中记录的属性值存储至目标页面中。
可选地,在列存数据库位于外存时,为避免频繁地对外存进行读写操作,可以在内存中设置转储过渡页,利用该转储过渡页来进行目标列上记录的属性值由行存数据库至列存数据库的过渡。
可选地,可以采用多种方式通过转储过渡页进行转储操作。具体的,可以通过同一转储过渡页转储目标列上记录的目标属性值的方式进行转储(例如,转储过渡页为如图3所示的页面A的形式),也可以通过不同的转储过渡页转储不同目标列上记录的目标属性值的方式进行转储(例如,转储过渡页为如图4所示的页面B的形式,页面B可认为是页面A的一种变形)。进而,可以将转储过渡页直接转储至目标页面中,或者,也可以通过预估压缩的方式对多个转储过渡页进行压缩,并将压缩后的转储过渡页转储至目标页面。
作为一种可选的实施方式,可以将多个目标行的目标列上记录的目标属 性值,按行存储在转储过渡页中(与数据表中存储数据的方式相同)。在将多个目标行的目标列上记录的目标属性值按行存储在转储过渡页中之前、过程中或者之后,可以判断预定条件是否满足,并在预定条件满足的情况下执行向目标页面中转储属性值的操作。
可选地,可以判断属性值是否满足第一条件,如果满足,则将转储过渡页中满足第一条件的属性值存储至列存数据库的目标页面中的一个页面上,第一条件可以包括但不限于:转储过渡页中记录的前K行属性值的数据量小于或等于目标阈值,前(K+1)行属性值的数据量大于目标阈值,该目标阈值可以根据目标页面中一个页面的大小进行设定;对转储过渡页中记录的前L行属性值按列分别进行压缩后,估计压缩得到的总压缩数据量小于或等于目标阈值,前(L+1)行属性值按列分别进行压缩后,估计压缩得到的总压缩数据量大于目标阈值,其中,K和L均为大于或等于1的正整数。
例如,转储过渡页中记录了100行属性值,每行数据具有5个属性值。如果前20行属性值的总数据量小于2KB(目标阈值,外存页面的大小),前21行属性值的总数据量大于2KB,则将前20行的属性值存储至列存数据库的一个页面中。又例如,如果前80行属性值按列分别进行压缩(5列分别进行估计)后,估计得到的总压缩数据量小于2KB,前81行属性值按列分别进行压缩后,估计得到的总压缩数据量大于2KB,则将前80行的属性值按列进行压缩后存储至列存数据库的一个页面中。
可选地,还可以判断第二条件是否满足,如果满足,则将转储过渡页中满足第一条件的目标列的属性值分别存储至列存数据库的目标页面中的一个页面上。第二条件可以包括但不限于:对转储过渡页中记录的前M行属性值按列分别进行压缩后,估计压缩得到的各压缩数据量中最大的压缩数据量小于或等于目标阈值,前(M+1)行属性值按列分别进行压缩后,估计压缩得到的各压缩数据量中最大的压缩数据量大于目标阈值,其中,M为大于或等于1的正整数。
例如,转储过渡页中记录了100行属性值,每行数据具有5个属性值。如果前90行属性值按列分别进行压缩后,估计压缩得到的各压缩数据量中最大的压缩数据量小于2KB,前91行属性值按列分别进行压缩后,估计压缩得到的各压缩数据量中最大的压缩数据量大于2KB,则将前90行属性值按列分别进行压缩,存储至列存数据库中的5个页面中。
作为另一种可选的实施方式,可以将多个目标行的各目标列上记录的目标属性值分别存储到与该目标列对应的转储过渡页中。每个转储过渡页中记录了一个目标列所记录的属性值,每一目标列可以对应于一个或多个转储过渡页。
可选地,不同目标列对应的转储过渡页的大小相同,但转储过渡页能容纳数据量与各目标列对应的属性类型有关,例如,各目标列对应的属性类型 有数字、字符、字符串等,那么一般情况下,同一转储过渡页能容纳数字的数量多于容纳字符串的数量。转储过渡页的格式和大小与目标页面的格式和大小相同。
可选地,转储过渡页的格式可以包括用于记录目标列的属性值的页体部分,该页体部分具体可以包括以下至少之一:页头和页尾,页头用于表示目标标识对应的标识值范围,该目标标识为转储过渡页中记录的属性值对应的标识;页尾用于对转储过渡页进行校验。
下面结合以下示例对转储过渡页进行说明。转储过渡页的格式与列存页面的格式相同。如图3所示,该转储过渡页的格式为列存页面默认的设计格式:一个页面中有多个列的信息,称为页面A。如图4所示,该转储过渡页的格式为列存页面可选的设计格式:一个页面中只有一个列信息,称为页面B。页面B和页面A在结构上没有差别,只是页面A包括多列的信息,页面B只包含单列的信息。页面A的设计更符合行存的习惯,这样在保存待转储数据时更加简单,转储的效率也更好。但是在针对列的查询中可能需要频繁跨页,会影响查询效率;页面B的设计需要先对待转储的数据进行拆分,转储的效率比较低,但是在针对列的查询中就可以尽可能地避免频繁跨页,查询效率高。在进行转储之前,用户可以先通过参数调整选定转储过渡页的类型,默认可以采用页面A的格式。
一个转储过渡页的格式可以包括三个部分:页头、页体、页尾。
(1)页头(列存页头):页头被设计成自适应的页头。
在用户没有在数据表上自定义索引的情况下,系统默认页头包含:XID_min和XID_max(XID,即事务ID,可以唯一标识同一目标行的各属性值之间的对应关系,即,唯一的标识属性值所对应的列版本)。前者表示本页中所有列版本的最小的XID,后者表示本页中所有列版本的最大的XID。
而在用户在数据表上自定义了索引的情况下,则页头上的XID信息可以被替换成相应的索引信息,如ID_Max和ID_Min。
通过上述方式,可以对列存页面(目标页面)构建列存索引,便于快速定位列信息。
(2)页体:包含多个目标列中的一个或多个目标列的列版本(属性值)。默认情况下,每个列版本用一个二元组{XID,value}表示,表示哪个事务操作了此值。如果用户自定义了页头信息,那么二元组信息就会被相应地替换,如,修改为{ID,value}。
如图3所示,对于页面A的页体,页体中包含多个列,每个列包含多个列版本。列从页头向页尾依次存储。每个列,包括列ID,表示具体的某一列。然后有多个元组列,表示唯一的元组。每个元组列,包括tupleID(元组ID),表示唯一的元组列;y表示此元组列有几个版本。tupleID、y和列版本的组合,可以表征数据表中某一元组中的某一列的属性值的历史变迁过程。
如图4所示,对于页面B,页面B的页体中包含一个列的多个列版本,各个列版本从页头向页尾依次存储。对于该列,可以有多个元组列,表示唯一的元组。每个元组列,包括tupleID,表示唯一的元组列;y表示此元组列有几个版本。tupleID、y和列版本的组合,可以表征数据表中某一元组中的某一列的属性值的历史变迁过程。
3)页尾,位于页面的最底部,包括:页面校验信息和列信息。
对于页面A的页尾,列信息为多个列的列信息,如图4所示,对于页面B的页尾,列信息为一个列的列信息。列信息中包含有列的ID和本页中此列的偏移。列信息从页尾向页头方向依次存放,形成一个从两头靠向中间的过程(页体的列版本由页头向页尾方向依次存放,列信息从页尾向页头方向依次存放),直至中间的空余空间不再能存放下一条列和一个列信息为止。
可选地,可以针对内存数据表中的每一个列建立一个转储过渡页,一个转储过渡页对应于数据表中的一个目标列,一个目标列中的不同属性值可以位于不同的转储过渡页。当转储(数据由行存数据库转存到列存数据库中)发生时,先写此转储过渡页。如果未写满,则再次发生转储时继续写该转储过渡页。可以将由属于同一张表的转储过渡页组成的集合称为转储过渡区。
下面结合具体示例对转储过渡区进行说明,如图5所示,对于历史数据(10,James,1000,Create account),将“James”、“1000”和“Create account”与ID“10”分别写入到不同的转储过渡页,与同一数据表对应的多个转储过渡页组成的集合为一个转储过渡区。
可选地,在将多个目标行的各目标列上记录的目标属性值分别存储到与该目标列对应的转储过渡页中之前、过程中或者之后,可以判断目标条件是否满足,并在目标条件满足的情况下执行向目标页面中转属性值的操作。
可选地,可以在多个目标行中相同目标列上记录的目标属性值的全部或者部分目标属性值写满转储过渡页(目标条件)的情况下,将该已被写满的转储过渡页中记录的属性值,存储至目标页面中的第三页面中,其中,转储过渡页中记录的属性值包括多个目标行中相同目标列上被写入到转储过渡页中的目标属性值。
可选地,由于转储过渡页和列存数据库的页面可以具有相同的格式,因此,可以直接利用已写满的转储过渡页上的属性值写满列存数据库的一个页面。
可选地,将转储过渡页中记录的属性值存储至目标页面中的第三页面中,包括:确定转储过渡页的页头信息,其中,页头信息用于标识与转储过渡页中记录的属性值对应的目标标识的标识值范围;将页头信息和转储过渡页中记录的属性值存储至目标页面中的第三页面。通过页头信息,可以快速确定第三页面中存储的目标列的属性值所对应的目标标识的范围,便于后续查询时的快速定位。
可选地,目标标识可以有多种形式,可以包括但不限于:版本标识、约束列,其中,版本标识用于唯一标识多个目标列的列版本,约束列为多个目标列中的预定列。
作为一种可选的实施方式,在目标标识为版本标识的情况下,如图3或图4所示,页头信息可以包含:与转储过渡页中记录的属性值所对应的版本标识的最大值和最小值。版本标识用于唯一标识多个目标行的目标列的列版本,该版本标识可以包括但不限于:事务ID,用户自定义的索引标识。在内存中的转储过渡页和诸多的位于内存中的列存页,因XID唯一,可以使用HASH进行管理。
可选地,针对时态相关的数据不需要对转储过渡页做出特别的限定,只要保证历史态数据依次写入转储过渡页即可。以银行业务为例,查询某一年某个网点所有交易记录,这种情况是和时态相关的,为了查询方便,要求相近记录按时间连续存放。
作为另一种可选的实施方式,在目标标识为约束列的属性(例如,“位置”、“温度”等)的情况下,转储过渡页的页头信息中的目标标识为约束列信息。页头信息可以包含:一个或多个键值对,键值对包括约束列(多个目标行中的第一目标列)的属性值以及与约束列的属性值对应的页内偏移量,其中,约束列的属性值与转储过渡页中存储的目标列(多个目标行中的第二目标列)的属性值的列版本对应,约束列的属性值对应的转储过渡页中记录目标列的属性值在转储过渡页中连续存储,页内偏移量为目标列的属性值的存储位置在转储过渡页中的偏移量,该转储过渡页为约定列的属性值对应的转储过渡页,该偏移量也是目标列的属性值的存储位置相对于约定列的属性值的存储位置的偏移量。
上述转储过渡页可适用于分布聚集程度高的数据,简单来说就是约束列信息(如,ID)等都是批量分布的,只要找到了一个ID,与其一致的ID均连续分布在其后,而一页中可能只包含少数的几个ID。
以气象监测数据为例,某气象站每5分钟更新一次气温信息,并汇总到气象中心,现需要查询一天中某气象站监测的气温变化。这种情况是和数据分布区域相关的,存储就不应该按照时间顺序,而应该按照数据分布存储。
可选地,可以将一页中不同的约束元素(约束列的属性值)称为一个Item,那么,页头中只存<Item,页内偏移量>这样的键值对。将数据名称(约束元素)及其页内偏移地址组成的数据结构称为Item Map。Item Map将会写在外存的页头中。
可选地,转储数据写入转储过渡页可以进行“间隔写”。所谓间隔写是指:在转储过渡页中写入一条历史态数据后,空出若干行的位置,供同一约束(如主键)的历史态数据(与同一约束值对应的历史态数据)后续插入,而不同约束的记录将会在若干空行之后插入。行空间的值由每个间隔段的第一个值 所占空间的大小决定。间隔设为一个参数值k,即,容忍k行间隔,k值根据应用的修改频度进行设置,默认值为10。
例如,某市气象局有N个气象观测站,其中有一张气温表Temp(ID int,Location char(8),Temperature int)用于记录实时气温,记录的属性包括:观测站标识、位置和温度。此表中,ID和Location都可以作为约束列。那么,其转储过渡页的写入顺序如图6所示(示出了与“ID”、“Location”和“Temperature”的转储过渡页),在ID为1的观测站写入数据后,ID为2的观测站数据在间隔n行之后再写,n的值由参数设定。这样,属于同一约束元素的数据聚合在一起,在查询时可以顺序读,提高查询效率。
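As a hedged, non-authoritative sketch of the interval ("gap") write described above, the following Python fragment reserves k slots after the first record of a constraint value so that its later versions stay contiguous, while a different constraint value starts after the gap; the parameter k, the function name interval_write and the sample records are assumptions introduced only for illustration:

```python
# Interval ("gap") writes into a dump transition page: after a constraint
# value (e.g. a station ID) first appears, k slots are reserved for its later
# versions; a new constraint value starts after the reserved gap.
# Gap overflow handling is omitted in this sketch.
def interval_write(records, k=10):
    """records: iterable of (constraint_value, attribute_value) pairs.
    Returns the page as a list of slots (None marks a still-reserved slot)."""
    page, start_of = [], {}
    for key, value in records:
        if key not in start_of:
            start_of[key] = len(page)
            page.extend([None] * (k + 1))      # reserve the k-row interval
            page[start_of[key]] = (key, value)
        else:
            base = start_of[key]
            slot = next(i for i in range(base, base + k + 1) if page[i] is None)
            page[slot] = (key, value)          # a later version stays in the gap
    return page

if __name__ == "__main__":
    records = [(1, 21), (2, 18), (1, 22), (2, 19)]
    print(interval_write(records, k=2))
    # [(1, 21), (1, 22), None, (2, 18), (2, 19), None] -- same-ID data stays grouped
```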
可选地,从转储过渡页到普通的列存页(列存数据库中的目标页面)的页面进行拷贝操作时,可以将转储过渡页中记录的属性值直接拷贝到普通的列存页的页面,也可以以页面为单位,对与同一列对应的一个或多个转储过渡页中记录的属性值进行压缩后,再拷贝到普通的列存页的页面,以节约存储空间。
可选地,在同一转储过渡页中存储有多个目标列的信息的情况下(如,页面A),可以先预估转储过渡页中每目标列的压缩率,对预估的各目标列的压缩数据量进行求和之后,根据总的压缩数据量确定可以是否进行转储。在一个转储过渡页中存储有一个目标列的信息的情况下(如,页面B),可以分别预估各目标列对应的转储过渡页的压缩率,并分别根据各压缩率进行转储。列存页面(转储过渡页的格式)的选择可以在转储开始前确定。
可选地,可以确定使用目标压缩方式对各转储过渡页的数据进行压缩之后预计得到的总压缩数据量,其中,各转储过渡页存储有与目标列对应的属性值;在总压缩数据量满足目标条件的情况下,使用目标压缩方式对多个转储过渡页进行压缩得到总压缩数据,其中,目标条件为:总压缩数据量小于等于目标阈值,并且总压缩数据量加上一个转储过渡页的压缩数据量大于目标阈值;将总压缩数据存储至目标页面中的第三页面中。
可选地,可以基于压缩预估技术,监控转储过渡页的状态,例如,若计算得到压缩后的预估值不能写满一个外存页面,则相应地将转储过渡页扩展为一个Extend。内存中对于与同一目标列对应的多个转储过渡页,可以扩展为一个Extend(扩展页,如一个Extend为8个转储过渡页面大小),即对同一个列连续有n个这样的转储过渡页面,则写满一个Extend。然后再对Extend进行压缩持久化(即,将压缩的数据写入外存中的页面),存储为普通的列存页面。在Extend被压缩存储之前,记录该Extend中包含的页头信息。如此可以提高压缩效率,节约存储空间。
可选地,数据压缩率的大小跟转储过渡页中的数据分布直接相关,因此不能以一个统一的标准,来确定何时将对转储过渡页中的数据进行压缩并持久化到外存,因此需要先进行压缩预估,以尽量确保可以写满外存页面,减 少查询时的跨页面读。
可选地,可利用信息熵理论,根据转储过渡页中数据分布情况进行较准确的预估。例如,Account表的Name列,如果存储Name数据的某Extend中只含有James、Alex两种数据,那么只需一个二进制位,1表示James,0表示Alex。如果存储Name数据的转储过渡页中含有James、Alex和Bob三种数据,那么就需要两个二进制位来表示。以此类推,在均匀分布的情况下,假定一个字符(或字符串)在转储过渡页中出现的概率是p,则需要log 2(1/p)个二进制位表示替代该字符(或字符串)的替代符号。
对于一般情况，假设转储过渡页由n种数据组成，每种数据出现的概率分别为 $p_1, p_2, \dots, p_n$，替代符号占据的二进制位最少为：

$$\log_2\frac{1}{p_i},\quad i=1,2,\dots,n$$

其中，$p_i$ 是根据频率统计得到的，因此，转储过渡页中每种数据占据二进制位的数学期望如公式(1)所示（根据公式(1)，可以预估数据的压缩率）：

$$E=\sum_{i=1}^{n} p_i\log_2\frac{1}{p_i}\qquad (1)$$
以Account表的Name列为例进行说明。假设某转储过渡页中有James,Alex,Bob三种数据,数据项总数是1024项,大小为4KB,三种数据所占比例为50%,30%,20%,那么每种数据占据的二进制位数为0.5*log 2(1/0.5)+0.3*log 2(1/0.3)+0.2*log 2(1/0.2)=1.49。理论上每种数据要占据1.49个二进制位,那么压缩1024项数据理论上需要1526个二进制位,即,0.19KB。压缩比例大概为20:1。那么,写满1个外存页大约需要对内存的20个转储过渡页进行压缩。
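The arithmetic in the Name-column example above can be checked with a short script (a hedged illustration only; the 1024 items, the 4 KB transition page and the 50%/30%/20% proportions come from the example itself, everything else is an assumption):

```python
# Verify the entropy estimate for the Name column: three values with
# proportions 50% / 30% / 20% over 1024 items in a 4 KB transition page.
import math

proportions = [0.5, 0.3, 0.2]
bits_per_item = sum(p * math.log2(1 / p) for p in proportions)
items = 1024
compressed_kb = items * bits_per_item / 8 / 1024
ratio = 4.0 / compressed_kb                 # 4 KB transition page vs. the estimate

print(round(bits_per_item, 2))   # ~1.49 bits per item
print(round(compressed_kb, 2))   # ~0.19 KB per transition page after compression
print(round(ratio, 1))           # ~21.5, i.e. roughly the 20:1 ratio in the text
```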
可选地,可以在内存中为待转储的数据表维护一张信息表,实时监控各转储过渡页的数据分布情况,预估各转储过渡页的压缩率,扩展成扩展页(Extend),扩展页可以是实际的页面(将各转储过渡页的页头信息、页体信息和页尾信息分别写入扩展页的对应的位置),也可以是虚拟的页面(根据信息表标识同一扩展页对应的转储过渡页),当某扩展页中的理论压缩后数据量可以写满一个外存页时,压缩该扩展页,进行持久化。之后该扩展页所占的内存空间也即被释放。而不能写满一个外存页(外存页面,即,列存数据库中的一个页面)的扩展页继续驻留内存,等待下次转储。
可选地,对于预估可能出现不准确的情况,还可以维护一个Map结构,其在当前正在进行转储的表(列)和与之对应的页面之间建立联系。记录进行此次转储之后,对应的转储过渡页的页面还剩余多少空间,如,Map中某信息<t,2k>,则说明在进行此次转储之后,表t对应的页面(转储过渡页)还剩余2k的空间没有被使用。那么,转储线程在进行压缩预估之前先查此Map。查找可能有两种结果。若Map中没有表(列)的信息,则表明该表(列)是 第一次进行转储,或,上次转储之后页面没有空间剩余,可以直接为该次转储预估默认的页面大小,同时将转储之后的剩余空间信息写入此Map。若Map中有表(列)信息,说明该表(列)上次进行转储之后有页面空余,可以根据读出的页面空闲值进行压缩预估。需要说明的是,这里的写满页面不是绝对写满,而是设定一个阈值(如,99%),当前页面已占用空间与总页面的比例大于等于该阈值(如,已占用空间大于等于总页面的99%)时,即认定该页已满,表信息从Map中删除。
可选地,对于一个转储过渡页中存储有一个目标列的信息的场景(如,页面B),那么,对应的Map信息就是列与对应页面空闲空间的对应关系,如<column1,2k>等,相关操作与前述类似,在此不做赘述。
作为一种可选的实施方式,对于多个目标列中的各列,可以以目标列中压缩后得到的总压缩数据量最大的目标列为准,在最大的总压缩数据量满足目标条件时,分别对各目标列对应的转储过渡页(Extend)进行压缩,分别存储在目标页面中的一个页面中。
例如,如图7所示,在Extend被压缩前,需要根据页头提供的XID_min/XID_max(版本标识,或ID_min/ID_max等自定义信息)确定Extend的XID范围,并存储在压缩后外存中的页头信息中,称之为X Range。X Range可以减少查询过程中不必要的解压缩操作。图7中所示的Extend为被压缩持久化之后的Extend,即外存压缩页。
针对时态相关的查询不需要对内存过渡页做出特别的限定,只要保证历史态数据依次写入内存中的转储过渡页即可。在内存中的转储过渡页和诸多的位于内存中的列存页,因XID唯一,可以使用HASH进行管理。
再例如,如图8所示,在一个Extend被压缩之前,需要根据页头提供的键值对<约束列的属性值,页内偏移地址>,确定每个Extend的键值对,并将每个Extend的键值对存储在压缩后外存页的页头信息中,将数据名称及其页内偏移地址组成的数据结构称为Item Map。
可选地,外存页头的地址信息和Item Map的key值会在列存系统启动时加载到内存中,建立起列存索引,加速查询过程。
作为另一种可选的实施方式,对于多个目标行中的各目标列,可以分别进行扩展成扩展页、预估压缩数据量、压缩以及持久化操作,各目标列对应的转储过渡页之间互不影响。与此对应的列存数据库中的页面存储数据的方式与图7和图8类似,只是每个页面中页头的列存索引的范围或者键值对中的页面偏移量可能不同。
可选地,在本实施例中,在达到目标时间之后,可以清除目标行。在清除目标行之后,还可以接收用于对数据表进行数据查询的查询信息。数据表中的数据可以存储在行存数据库的数据表中和列存数据库的目标页面中,还可以存储在数据页面(例如,PostgreSQL)或回滚段(MySQL)中。
可选地,在接收到查询信息之后,可以根据该查询信息,依次查询列存数据库中的目标页面和行存数据库的数据表(或,以及数据页面或回滚段),获取与查询信息对应的查询结果,并将获取的查询结果进行输出。行存数据库和列存数据库可以均位于内存中,也可以行存数据库位于内存中,而列存数据库位于外存中。
在使用转储过渡页进行数据转储的情况下,数据表中的数据可以存储在行存数据库的数据表、转储过渡页和列存数据库的目标页面中。还可以存储在数据页面(例如,PostgreSQL)或回滚段(MySQL)中。
可选地,在接收到查询信息之后,可以根据该查询信息,依次查询列存储数据库中的目标页面、行存数据库的数据表和转储过渡页(或,以及数据页面或回滚段),获取与查询信息对应的查询结果,并将获取的查询结果进行输出。行存数据库和转储过渡页可以位于内存中,列存数据库可以位于外存中。
可选地,在接收到查询信息之后,若接收到的查询信息包括目标标识的查询值(具体值,或范围值),则获取数据表的行存索引、列存索引和转储过渡页,其中,行存索引为数据表在行存数据库中存储的行存数据的索引,列存索引为目标页面的各页面中存储的目标标识的标示值索引,目标标识对应于目标列的属性值;使用查询值依次对列存索引、行存索引和转储过渡页进行查询,确定与查询信息对应的目标数据所在的目标位置;从所确定的目标位置处,获取与查询信息对应的查询结果;将获取的查询结果进行输出。
可选地,可以通过以下步骤获取数据表的行存索引、列存索引和转储过渡页:可以获取数据表的存储地址,例如,可以从数据字典中该数据表的元数据中获取数据的存储地址;加载该数据表(行存数据库中的数据表)进入数据缓存区,并获取该数据表的行存索引;获取转储过渡页和列存索引(转储过渡页和列存索引可以在内存中常驻)。列存索引可以包括但不限于:版本标识的索引、键值对。
可选地,可以使用查询值在列存索引和行存索引中进行查找,如果在列存索引上找到,则根据列存索引找到对应的列存页面,在该页面中读出数据;如果在行存索引上存在,则根据行存索引指向的位置,遍历行存格式的页面,读出数据;遍历转储过渡页,如果存在,则读出数据。
可选地,可以根据X Range的指示查询相应的压缩页,对相应的压缩页执行解压缩操作,读出数据,也可以根据Item Map的指示查询相应的压缩页,对相应的压缩页执行解压缩操作,读出数据。
可选地,可以优先在列存索引上查找,再在行存索引中进行查找,SQL语句可以给定Hint指示,以确定优先在哪个索引上查找。对于版本标识相关的查询,可以通过依次查询列存索引(如,X Range)、行存索引和转储过渡页,直到查找到对应的查询结果为止。对于约束列的属性值有关的查询,可 以依次查询列存索引(如,Item Map)、行存索引和转储过渡页,查找出所有对应的查询结果。
例如,如图7所示,执行SQL查询语句SELECT Name FROM Account WHERE XID<20AND XID>10,在查询列存的过程中,根据一般的查询过程,首先要解压Name的每一个Extend,找出符合条件的XID之后再求出Name值。解压需要耗费大量的资源,且影响查询速度。而基于X Range来执行上述SQL,可以预先知道只有Extend 1中存在符合条件的Name值,因此只需要解压Extend1即可。同时,可以不必再查询行存索引以及转储过渡页。大大节约了解压和查询的时间。
基于约束列的数据查询对于规律生成的数据,如气象信息,物联网节点定时采集更新的信息有很好的支持度,但是对于无规律的数据更新支持度较差。
例如,执行SQL语句SELECT Temperature FROM Temp WHERE ID=1;按照基于X Range查询,ID=1的Temperature值可能横跨若干个压缩页,查询时进行大量解压。而基于Item Map的查询只需要查询页头信息中的Item Map的key值,得到ID=1的页内偏移值,计算ID=1的数据范围,再据此找到Temperature列的对应范围的数据即可。
为方便管理,外存中属于同一列的所有页面组成一个Segment。需要说明的是,Segment只是逻辑上的划分,至于其物理上的实现不在本申请的讨论范围。采用何种转储策略在转储开始前由用户根据具体的查询分析场景通过设置存储参数决定。默认是基于版本标识的转储策略。
下面结合以下示例对数据处理方法进行说明。一个完整的数据处理方法如图9所示,主要分成3个大步骤:
步骤1(标示为1的箭头所示):基于用户选择的策略,定时将数据写入转储过渡页。
步骤2(标示为2的箭头所示):利用压缩预估机制实现转储过渡页或Extend向外存进行持久化。并建立XRange或Item Map索引。
步骤3(标示为3的箭头所示):当查询请求到来时,根据SQL Hint在行存或转储过渡页及列存进行查询,默认在转储过渡页及列存进行查询。
通过本示例,利用列存中各个元组的XID(或其他索引)范围管理转储过渡页可以有效提高寻址速度,同时针对不同的数据场景,采用基于压缩预估机制的X Range和Item Map的方式,确保查询列存过程不解压无关压缩页,提高了查询性能。同时,上层应用系统可以读取行存数据库中的最新数据,分析系统可以基于列存进行数据分析,得出有价值的信息。应用系统与分析系统互不影响,充分利用了数据的价值。
通过本实施例,获取行式存储数据库的数据表中在目标时间上待被清除的目标行;将至少一个目标行的目标列上记录的目标属性值存储至列式存储 数据库中的目标页面,其中,目标行中相同一列上记录的目标属性值被记录在列式存储数据库中的目标页面中的至少一个页面上;在达到目标时间之后,清除目标行,达到了对数据库中的历史数据进行保存的目的,保证了数据变迁历史完整。
作为一种可选的实施方案,将至少一个目标行的目标列上记录的目标属性值存储至列式存储数据库中的目标页面,包括:
S1,将多个目标行中不同目标列上记录的目标属性值存储至目标页面中的不同页面上,其中,在多个目标行的相同目标列上记录的目标属性值中的部分目标属性值写满目标页面中的第一页面的情况下,将多个目标行中相同目标列上记录的目标属性值中除部分目标属性值以外的其他目标属性值存储至目标页面中的第二页面上。
通过本实施例,通过将目标列中不同列上记录的目标属性值存储至目标页面中的不同页面上,并且同一目标列上记录的属性值可以存储在不同的页面上,从而可以合理规划目标列属性值的存储方式,方便对目标页面进行管理。
作为一种可选的实施方案,将至少一个目标行的多个目标列上记录的目标属性值存储至列式存储数据库中的目标页面,包括:
S1,将多个目标行中相同目标列上记录的目标属性值存储至转储过渡页,其中,转储过渡页用于将目标列上记录的属性值转储至列式存储数据库的目标页面;
S2,在多个目标行中相同目标列上记录的目标属性值的全部或者部分目标属性值写满转储过渡页的情况下,将转储过渡页中记录的属性值存储至目标页面中的第三页面,其中,转储过渡页中记录的属性值包括相同目标列上记录的目标属性值。
通过本实施例,通过转储过渡页记录目标列上的目标属性值,在转储过渡页存满时才进行转储,可以保证目标页面中的页面被写满,避免了存储空间的浪费。
作为一种可选的实施方案,将至少一个目标行中相同目标列上记录的目标属性值存储至转储过渡页,包括:
S1,确定转储过渡页的页头信息,其中,页头信息用于标识与转储过渡页中记录的属性值对应的目标标识的标识值范围;
S2,将页头信息和转储过渡页中记录的属性值存储至目标页面中的第三页面。
可选地,页头信息包括:与转储过渡页中记录的属性值所对应的版本标识的最大值和最小值,其中,目标标识为版本标识,版本标识用于唯一标识多个目标行中多个目标列的列版本;或者,页头信息包括:一个或多个键值对,键值对包括多个目标行中多个目标列中第一列的属性值以及与第一列的 属性值对应的页内偏移量,其中,第一列的属性值与转储过渡页中存储的第二列的属性值的列版本对应,第一列的属性值与第二列的属性值在转储过渡页中连续存储,页内偏移量为第二列的属性值的存储位置在转储过渡页中的偏移量。
通过本实施例,通过设置页头信息,从而形成对列式存储数据库中的页面中存储的属性值的索引,方便对目标页面的管理。
作为一种可选的实施方案,将转储过渡页中记录的属性值存储至目标页面中的第三页面中包括:
S1,确定使用目标压缩方式对多个转储过渡页中的各转储过渡页的数据进行压缩之后预计得到的总压缩数据量,其中,多个转储过渡页中的各转储过渡页存储有与多个目标行中相同目标列对应的属性值,多个转储过渡页包含转储过渡页;
S2,在总压缩数据量满足目标条件的情况下,使用目标压缩方式对多个转储过渡页中的各转储过渡页分别进行压缩,得到总压缩数据,其中,目标条件为:总压缩数据量小于或等于目标阈值,且总压缩数据量加上一个转储过渡页的压缩数据量大于目标阈值;
S3,将总压缩数据存储至目标页面中的第三页面中。
通过本实施例,通过预估与多目标行中的同一列对应的多个转储过渡页中的属性值进行压缩后的总的压缩数据量,在压缩后的总数据量大于目标阈值时,对多个转储过渡页中的属性值进行压缩后存储至目标页面中的一个页面,节省了存储空间。
作为一种可选的实施方案,在清除至少一个目标行之后,上述方法还包括:
S1,接收用于对数据表进行数据查询的查询信息;
S2,使用查询信息,依次查询列式存储数据库中的目标页面和行式存储数据库的数据表,获取与查询信息对应的查询结果;
S3,将查询结果进行输出。
通过本实施例,根据查询信息,分别对行式存储数据库中的数据表和列式存储数据库中的目标页面进行查询,保证了查询结果的全面性。
作为一种可选的实施方案,在清除至少一个目标行之后,上述方法还包括:
S1,接收用于对数据表进行数据查询的查询信息;
S2,使用查询信息,依次查询列式存储数据库中的目标页面、行式存储数据库的数据表以及转储过渡页,获取与查询信息对应的查询结果;
S3,将查询结果进行输出。
通过本实施例,根据查询信息,分别对行式存储数据库中的数据表、列式存储数据库中的目标页面以及转储过渡页进行查询,保证了查询结果的全 面性。
作为一种可选的实施方案,在清除多个目标行之后,上述方法还包括:
S1,接收用于对数据表进行数据查询的查询信息,其中,查询信息包括与目标标识对应的查询值;
S2,获取行存索引、列存索引和转储过渡页,其中,行存索引为行式存储数据库中的数据表中存储的行存数据的索引,列存索引为与目标页面中各目标页面存储的多个目标列的属性值所对应的目标标识的标识值的索引;
S3,使用查询值依次对列存索引、行存索引和转储过渡页进行查询,确定与查询信息对应的查询结果所存储在的目标位置;
S4,使用目标位置,获取与查询信息对应的查询结果;
S5,将查询结果进行输出。
通过本实施例,根据包括有目标标识对应的查询值的查询信息,分别对列存索引、行存索引以及转储过渡页进行查询,保证了查询的效率以及查询结果的全面性。
以下结合图10,对上述数据处理方法进行说明。如图10所示,网元节点的处理器通过步骤S1002,将行存数据库中的数据表的历史数据转储至转储过渡页。通过步骤S1004,将转储过渡页中的数据存储至列存页面。通过步骤S1006,接收查询信息。通过步骤S1008,使用查询信息,对列存索引、行存索引以及转储过渡页进行查询,得到查询结果。通过步骤S1010将得到的查询结果进行输出。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请所必须的。
通过以上实施方式的描述,本领域的技术人员可以清楚地了解到根据上述实施例的方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,或网络设备等)执行本申请各个实施例的方法。
根据本申请实施例的另一个方面,还提供了一种用于实施数据处理方法的数据处理装置,如图11所示,该装置包括:
(1)第一获取单元1102,用于获取行式存储数据库的数据表中在目标时间上待被清除的至少一个目标行;
(2)存储单元1104,用于将至少一个目标行上记录的目标属性值存储至列式存储数据库中的目标页面;
(3)清除单元1106,用于在达到目标时间之后,清除至少一个目标行。
需要说明的是,在相关技术中,对于数据库中的历史数据,通过采用清除操作将其清除。采用前述方法,会使得数据库中的历史数据缺失,从而难以追溯历史数据。而在本申请中,获取行式存储数据库的数据表中在目标时间上待被清除的至少一个目标行;将至少一个目标行上记录的目标属性值存储至列式存储数据库中的目标页面;在达到目标时间之后,清除至少一个目标行,以对数据库中的历史数据进行保存,保证数据变迁历史数据的完整,进而解决了相关数据处理技术中存在的难以追溯历史数据的技术问题。
可选地,第一获取单元1102可用于执行步骤S202,存储单元1104可用于执行前述步骤S204,清除单元1106可用于执行步骤S206。可选的执行方式在此不做赘述。
作为一种可选的实施方案,存储单元1104包括:
(1)第一存储模块,用于将多个目标行中相同目标列上记录的目标属性值存储至转储过渡页,其中,转储过渡页用于将目标列上记录的属性值转储至列式存储数据库的目标页面;
(2)第二存储模块,用于在多个目标行中相同目标列上记录的目标属性值的全部或者部分目标属性值写满转储过渡页的情况下,将转储过渡页中记录的属性值存储至目标页面中的第三页面中。
通过本实施例,利用转储过渡页记录目标列上的目标属性值,可以在转储过渡页存满时才进行转存储,从而可以保证目标页面中的页面被写满,避免了存储空间的浪费。
作为一种可选的实施方案,第二存储模块包括:
(1)第一确定子模块,用于确定转储过渡页的页头信息,其中,页头信息用于标识与转储过渡页中记录的属性值对应的目标标识的标识值范围;
(2)第一存储子模块,用于将页头信息和转储过渡页中记录的属性值存储至目标页面中的第三页面。
可选地,页头信息包括:与转储过渡页中记录的属性值所对应的版本标识的最大值和最小值,其中,目标标识为版本标识,版本标识用于唯一标识多个目标行的目标列的列版本;或者,页头信息包括:一个或多个键值对,键值对包括多个目标行的目标列中第一列的属性值以及与第一列的属性值对应的页内偏移量,其中,第一列的属性值与转储过渡页中存储的第二列的属性值的列版本对应,第一列的属性值与第二列的属性值在转储过渡页中连续存储,页内偏移量为第二列的属性值的存储位置在转储过渡页中的偏移量。
通过本实施例,通过设置页头信息,从而形成对列式存储数据库中的页面中存储的属性值的索引,方便对目标页面的管理。
作为一种可选的实施方案,第二存储模块包括:
(1)第二确定子模块,用于确定使用目标压缩方式对多个转储过渡页中的各转储过渡页的数据进行压缩之后预计得到的总压缩数据量,其中,多个转储过渡页中的各转储过渡页存储有与多个目标行中相同目标列对应的属性值,多个转储过渡页包含转储过渡页;
(2)压缩子模块,用于在总压缩数据量满足目标条件的情况下,使用目标压缩方式对多个转储过渡页中的各转储过渡页分别进行压缩,得到总压缩数据,其中,目标条件为:总压缩数据量小于或等于目标阈值,且总压缩数据量加上一个转储过渡页的压缩数据量大于目标阈值;
(3)第二存储子模块,用于将总压缩数据存储至目标页面中的第三页面中。
通过本实施例,通过预估多与目标列中的同一列对应的多个转储过渡页中的属性值进行压缩后的总的压缩数据量,在压缩后的总数据量大于目标阈值时,对多个转储过渡页中的属性值进行压缩后存储至目标页面中的一个页面,节省了存储空间。
作为一种可选的实施方案,存储单元1104包括:
第三存储模块,用于将多个目标行的多个目标行中不同目标列上记录的目标属性值存储至目标页面中的不同页面上,其中,在多个目标行中相同目标列上记录的目标属性值中的部分目标属性值写满目标页面中的第一页面的情况下,将多个目标行的多个目标列中相同目标列上记录的目标属性值中除部分目标属性值以外的其他目标属性值存储至目标页面中的第二页面上。
通过本实施例,通过将目标列中不同列上记录的目标属性值存储至目标页面中的不同页面上,并且同一目标列上记录的属性值可以存储在不同的页面上,从而可以合理规划目标列属性值的存储方式,方便对目标页面进行管理。
作为一种可选的实施方案,上述装置还包括:
(1)接收单元,用于在清除至少一个目标行之后,接收用于对数据表进行数据查询的查询信息;
(2)第二获取单元,用于使用查询信息,依次查询列式存储数据库中的目标页面和行式存储数据库的数据表,获取与查询信息对应的查询结果;
(3)输出单元,用于将查询结果进行输出。
通过本实施例,根据查询信息,分别对行式存储数据库中的数据表和列式存储数据库中的目标页面进行查询,保证了查询结果的全面性。
作为一种可选的实施方案,上述装置还包括:
(1)第一接收单元,用于在清除至少一个目标行之后,接收用于对数据表进行数据查询的查询信息;
(2)第一查询单元,用于使用查询信息,依次查询列式存储数据库中的目 标页面、行式存储数据库的数据表以及转储过渡页,获取与查询信息对应的查询结果;
(3)第一输出单元,用于将查询结果进行输出。
通过本实施例,根据查询信息,分别对行式存储数据库中的数据表、列式存储数据库中的目标页面以及转储过渡页进行查询,保证了查询结果的全面性。
作为一种可选的实施方案,在清除多个目标行之后,上述装置还包括:
(1)第二接收单元,用于在清除至少一个目标行之后,接收用于对数据表进行数据查询的查询信息,其中,查询信息包括与目标标识对应的查询值;
(2)第三获取单元,用于获取行存索引、列存索引和转储过渡页,其中,行存索引为行式存储数据库中的数据表中存储的行存数据的索引,列存索引为与目标页面中各目标页面存储的多个目标列的属性值所对应的目标标识的标识值的索引;
(3)第二查询单元,用于使用查询值依次对列存索引、行存索引和转储过渡页进行查询,确定与查询信息对应的查询结果所存储在的目标位置;
(4)第四获取单元,用于使用目标位置,获取与查询信息对应的查询结果;
(5)第二输出单元,用于将查询结果进行输出。
通过本实施例,根据包括有目标标识对应的查询值的查询信息,分别对列存索引、行存索引以及转储过渡页进行查询,保证了查询的效率以及查询结果的全面性。
可选地,在本实施例中,本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令终端设备相关的硬件来完成,该程序可以存储于一计算机可读存储介质中,存储介质可以包括:闪存盘、只读存储器(Read-Only Memory,ROM)、随机存取器(Random Access Memory,RAM)、磁盘或光盘等。
根据本申请的实施例的又一方面,还提供了一种存储介质,该存储介质中存储有计算机程序,其中,计算机程序被设置为运行时执行上述任一项方法实施例中的步骤。
可选地,在本实施例中,上述存储介质可以被设置为存储用于执行以下步骤的计算机程序:
S1,获取行式存储数据库的数据表中在目标时间上待被清除的至少一个目标行;
S2,将至少一个目标行上记录的目标属性值存储至列式存储数据库中的目标页面;
S3,在达到目标时间之后,清除至少一个目标行。
可选地,在本实施例中,本领域普通技术人员可以理解上述实施例的各 种方法中的全部或部分步骤是可以通过程序来指令终端设备相关的硬件来完成,该程序可以存储于一计算机可读存储介质中,存储介质可以包括:闪存盘、只读存储器、随机存取器、磁盘或光盘等。
根据本申请实施例的又一个方面,还提供了一种用于实施上述数据处理方法的电子装置,如图12所示,该电子装置包括:处理器1202、存储器1204和传输装置1206等。该存储器中存储有计算机程序,该处理器被设置为通过计算机程序执行上述任一项方法实施例中的步骤。
可选地,在本实施例中,上述电子装置可以位于计算机网络的多个网络设备中的至少一个网络设备。
可选地,在本实施例中,上述处理器可以被设置为通过计算机程序执行以下步骤:
S1,获取行式存储数据库的数据表中在目标时间上待被清除的至少一个目标行;
S2,将至少一个目标行上记录的目标属性值存储至列式存储数据库中的目标页面;
S3,在达到目标时间之后,清除至少一个目标行。
可选地,本领域普通技术人员可以理解,图12所示的结构仅为示意,电子装置也可以是提供查询服务的服务器。图12其并不对上述电子装置的结构造成限定。例如,电子装置还可包括比图12中所示更多或者更少的组件(如网络接口等),或者具有与图12所示不同的配置。
其中,存储器1204可用于存储软件程序以及模块,如本申请实施例中的数据处理方法和装置对应的程序指令/模块,处理器1202通过运行存储在存储器1204内的软件程序以及模块,从而执行各种功能应用以及数据处理,即实现上述数据处理方法。存储器1204可包括高速随机存储器,还可以包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或其他非易失性固态存储器。在一些实例中,存储器1204可进一步包括相对于处理器1202远程设置的存储器,这些远程存储器可以通过网络连接至终端。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
上述的传输装置1206用于经由一个网络接收或者发送数据。上述的网络具体实例可包括有线网络及无线网络。在一个实例中,传输装置1206包括一个网络适配器(Network Interface Controller,简称为NIC),其可通过网线与其他网络设备与路由器相连从而可与互联网或局域网进行通讯。在一个实例中,传输装置1206为射频(Radio Frequency,简称为RF)模块,其用于通过无线方式与互联网进行通讯。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
上述实施例中的集成的单元如果以软件功能单元的形式实现并作为独立 的产品销售或使用时,可以存储在上述计算机可读取的存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在存储介质中,包括若干指令用以使得一台或多台计算机设备(可为个人计算机、服务器或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。
在本申请的上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
在本申请所提供的几个实施例中,应该理解到,所揭露的客户端,可通过其它的方式实现。其中,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,单元或模块的间接耦合或通信连接,可以是电性或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
以上所述仅是本申请的优选实施方式,应当指出,对于本技术领域的普通技术人员来说,在不脱离本申请原理的前提下,还可以做出若干改进和润饰,这些改进和润饰也应视为本申请的保护范围。

Claims (15)

  1. 一种数据处理方法,应用于电子装置,包括:
    获取行式存储数据库的数据表中在目标时间上待被清除的至少一个目标行;
    将所述至少一个目标行上记录的目标属性值存储至列式存储数据库中的目标页面;
    在达到所述目标时间之后,清除所述至少一个目标行。
  2. 根据权利要求1所述的方法,将所述至少一个目标行上记录的目标属性值存储至所述列式存储数据库中的目标页面包括:
    将多个目标行中不同目标列上记录的目标属性值存储至所述目标页面中的不同页面上,其中,在所述多个目标行的相同目标列上记录的目标属性值中的部分目标属性值写满所述目标页面中的第一页面的情况下,将所述多个目标行中相同目标列上记录的所述目标属性值中除所述部分目标属性值以外的其他目标属性值存储至所述目标页面中的第二页面上。
  3. 根据权利要求1所述的方法,将所述至少一个目标行上记录的目标属性值存储至所述列式存储数据库中的目标页面包括:
    将多个目标行的相同目标列上记录的目标属性值存储至转储过渡页,其中,所述转储过渡页用于将所述目标列上记录的属性值转储至所述列式存储数据库的目标页面;
    在所述多个目标行的相同目标列上记录的目标属性值的全部或者部分目标属性值写满所述转储过渡页的情况下,将所述转储过渡页中记录的属性值存储至所述目标页面中的第三页面。
  4. 根据权利要求3所述的方法,将所述转储过渡页中记录的属性值存储至所述目标页面中的第三页面包括:
    确定所述转储过渡页的页头信息,其中,所述页头信息用于标识与所述转储过渡页中记录的属性值对应的目标标识的标识值范围;
    将所述页头信息和所述转储过渡页中记录的属性值存储至所述目标页面中的所述第三页面。
  5. 根据权利要求4所述的方法,所述页头信息包括:与所述转储过渡页中记录的属性值所对应的版本标识的最大值和最小值,其中,所述目标标识为所述版本标识,所述版本标识用于唯一标识所述多个目标行的目标列的列版本;或者,
    所述页头信息包括:一个或多个键值对,所述键值对包括所述多个目标行的目标列中第一列的属性值以及与所述第一列的属性值对应的页内偏移量,其中,所述第一列的属性值与所述转储过渡页中存储的第二列的属性值的列版本对应,所述第一列的属性值与所述第二列的属性值在所述转储过渡页中连续存储,所述页内偏移量为所述第二列的属性值的存储位置在所述转 储过渡页中的偏移量。
  6. 根据权利要求3所述的方法,将所述转储过渡页中记录的属性值存储至所述目标页面中的第三页面包括:
    确定使用目标压缩方式对多个转储过渡页中的各转储过渡页的数据进行压缩之后预计得到的总压缩数据量,其中,所述多个转储过渡页中的各转储过渡页存储有与所述多个目标行中相同目标列对应的属性值,所述多个转储过渡页包含所述转储过渡页;
    在所述总压缩数据量满足目标条件的情况下,使用所述目标压缩方式对所述多个转储过渡页中的各转储过渡页分别进行压缩,得到总压缩数据,其中,所述目标条件为:所述总压缩数据量小于或等于目标阈值,且所述总压缩数据量加上一个转储过渡页的压缩数据量大于所述目标阈值;
    将所述总压缩数据存储至所述目标页面中的所述第三页面中。
  7. 根据权利要求1所述的方法,在清除所述至少一个目标行之后,所述方法还包括:
    接收用于对所述数据表进行数据查询的查询信息;
    使用所述查询信息,依次查询所述列式存储数据库中的所述目标页面和所述行式存储数据库的数据表,获取与所述查询信息对应的查询结果;
    将所述查询结果进行输出。
  8. 根据权利要求3所述的方法,在清除所述至少一个目标行之后,所述方法还包括:
    接收用于对所述数据表进行数据查询的查询信息;
    使用所述查询信息,依次查询所述列式存储数据库中的所述目标页面、所述行式存储数据库的数据表以及所述转储过渡页,获取与所述查询信息对应的查询结果;
    将所述查询结果进行输出。
  9. 根据权利要求4所述的方法,其特征在于,在清除所述至少一个目标行之后,所述方法还包括:
    接收用于对所述数据表进行数据查询的查询信息,其中,所述查询信息包括与所述目标标识对应的查询值;
    获取行存索引、列存索引和所述转储过渡页,其中,所述行存索引为所述行式存储数据库中的数据表中存储的行存数据的索引,所述列存索引为与所述目标页面中各目标页面存储的所述多个目标列的属性值所对应的目标标识的标识值的索引;
    使用所述查询值依次对所述列存索引、所述行存索引和所述转储过渡页进行查询,确定与所述查询信息对应的查询结果所存储在的目标位置;
    使用所述目标位置,获取与所述查询信息对应的所述查询结果;
    将所述查询结果进行输出。
  10. 根据权利要求1至9中任一项所述的方法,所述行式存储数据库的待转储的数据位于目标设备的内存中,所述列式存储数据库的持久化数据位于所述目标设备的外存中,所述待转储的数据包括:所述待被清除的至少一个目标行。
  11. 一种数据处理装置,其特征在于,包括:
    第一获取单元,用于获取行式存储数据库的数据表中在目标时间上待被清除的至少一个目标行;
    存储单元,用于将所述至少一个目标行上记录的目标属性值存储至列式存储数据库中的目标页面;
    清除单元,用于在达到所述目标时间之后,清除所述至少一个目标行。
  12. 根据权利要求11所述的装置,所述存储单元包括:
    第一存储模块,用于将多个目标行的相同目标列上记录的目标属性值存储至转储过渡页,其中,所述转储过渡页用于将所述目标列上记录的属性值转储至所述列式存储数据库的所述目标页面;
    第二存储模块,用于在所述多个目标行的相同目标列上记录的目标属性值的全部或者部分目标属性值写满所述转储过渡页的情况下,将所述转储过渡页中记录的属性值存储至所述目标页面中的第三页面中。
  13. 根据权利要求11或12所述的装置,所述装置还包括:
    接收单元,用于接收用于对所述数据表进行数据查询的查询信息;
    第二获取单元,用于使用所述查询信息,依次查询所述列式存储数据库中的所述目标页面和所述行式存储数据库的数据表,获取与所述查询信息对应的查询结果;
    输出单元,用于将所述查询结果进行输出。
  14. 一种存储介质,所述存储介质中存储有计算机程序,其中,所述计算机程序被设置为运行时执行所述权利要求1至10任一项中所述的方法。
  15. 一种电子装置,包括存储器和处理器,所述存储器中存储有计算机程序,所述处理器被设置为通过所述计算机程序执行所述权利要求1至10任一项中所述的方法。
PCT/CN2019/092459 2018-08-16 2019-06-24 数据处理方法和装置、存储介质及电子装置 WO2020034757A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19850450.8A EP3757815A4 (en) 2018-08-16 2019-06-24 DATA PROCESSING METHOD AND DEVICE, STORAGE MEDIUM AND ELECTRONIC DEVICE
US17/014,967 US11636083B2 (en) 2018-08-16 2020-09-08 Data processing method and apparatus, storage medium and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810935478.1A CN110196847A (zh) 2018-08-16 2018-08-16 数据处理方法和装置、存储介质及电子装置
CN201810935478.1 2018-08-16

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/014,967 Continuation US11636083B2 (en) 2018-08-16 2020-09-08 Data processing method and apparatus, storage medium and electronic device

Publications (1)

Publication Number Publication Date
WO2020034757A1 true WO2020034757A1 (zh) 2020-02-20

Family

ID=67751422

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/092459 WO2020034757A1 (zh) 2018-08-16 2019-06-24 数据处理方法和装置、存储介质及电子装置

Country Status (4)

Country Link
US (1) US11636083B2 (zh)
EP (1) EP3757815A4 (zh)
CN (1) CN110196847A (zh)
WO (1) WO2020034757A1 (zh)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111059B (zh) * 2020-01-13 2023-04-14 杭州海康威视数字技术股份有限公司 数据存储管理的方法和装置
US11386089B2 (en) * 2020-01-13 2022-07-12 The Toronto-Dominion Bank Scan optimization of column oriented storage
CN111309985B (zh) * 2020-03-10 2023-08-25 支付宝(杭州)信息技术有限公司 基于PostgreSQL数据库的高维向量存储方法和装置
CN113296683B (zh) * 2020-04-07 2022-04-29 阿里巴巴集团控股有限公司 数据存储方法、装置、服务器和存储介质
CN113806307A (zh) * 2021-08-09 2021-12-17 阿里巴巴(中国)有限公司 数据处理方法及装置
CN113722623B (zh) * 2021-09-03 2024-07-05 锐掣(杭州)科技有限公司 数据处理方法、装置、电子设备及存储介质
US20230315710A1 (en) * 2022-03-30 2023-10-05 International Business Machines Corporation Database query management using a new column type
CN116594808B (zh) * 2023-04-26 2024-05-28 深圳计算科学研究院 一种数据库回滚资源处理方法、装置、计算机设备及介质
CN116644103B (zh) * 2023-05-17 2023-11-24 本原数据(北京)信息技术有限公司 基于数据库的数据排序方法和装置、设备、存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345518A (zh) * 2013-07-11 2013-10-09 清华大学 基于数据块的自适应数据存储管理方法及系统
CN104424287A (zh) * 2013-08-30 2015-03-18 深圳市腾讯计算机系统有限公司 数据查询方法和装置
WO2015139193A1 (zh) * 2014-03-18 2015-09-24 华为技术有限公司 一种数据存储格式的转换方法及装置
CN107092624A (zh) * 2016-12-28 2017-08-25 北京小度信息科技有限公司 数据存储方法、装置及系统

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5918225A (en) * 1993-04-16 1999-06-29 Sybase, Inc. SQL-based database system with improved indexing methodology
US6240428B1 (en) * 1997-10-31 2001-05-29 Oracle Corporation Import/export and repartitioning of partitioned objects
US8583692B2 (en) * 2009-04-30 2013-11-12 Oracle International Corporation DDL and DML support for hybrid columnar compressed tables
US9262330B2 (en) * 2009-11-04 2016-02-16 Microsoft Technology Licensing, Llc Column oriented in-memory page caching
US8762387B1 (en) * 2013-07-31 2014-06-24 Linkedin Corporation Inverted indexes for accelerating analytics queries
US10838926B2 (en) * 2013-10-01 2020-11-17 Sap Se Transparent access to multi-temperature data
CN103631937B (zh) 2013-12-06 2017-03-15 北京趣拿信息技术有限公司 构建列存储索引的方法、装置及系统
US9697242B2 (en) * 2014-01-30 2017-07-04 International Business Machines Corporation Buffering inserts into a column store database
US10108622B2 (en) * 2014-03-26 2018-10-23 International Business Machines Corporation Autonomic regulation of a volatile database table attribute
US9891831B2 (en) * 2014-11-25 2018-02-13 Sap Se Dual data storage using an in-memory array and an on-disk page structure
EP3271840B1 (en) * 2015-05-07 2019-02-27 Cloudera, Inc. Mutations in a column store
US10664462B2 (en) * 2017-03-01 2020-05-26 Sap Se In-memory row storage architecture
CN107256233B (zh) * 2017-05-16 2021-01-12 北京奇虎科技有限公司 一种数据存储方法和装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345518A (zh) * 2013-07-11 2013-10-09 清华大学 基于数据块的自适应数据存储管理方法及系统
CN104424287A (zh) * 2013-08-30 2015-03-18 深圳市腾讯计算机系统有限公司 数据查询方法和装置
WO2015139193A1 (zh) * 2014-03-18 2015-09-24 华为技术有限公司 一种数据存储格式的转换方法及装置
CN107092624A (zh) * 2016-12-28 2017-08-25 北京小度信息科技有限公司 数据存储方法、装置及系统

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3757815A4 *

Also Published As

Publication number Publication date
US20200409925A1 (en) 2020-12-31
CN110196847A (zh) 2019-09-03
EP3757815A1 (en) 2020-12-30
US11636083B2 (en) 2023-04-25
EP3757815A4 (en) 2021-06-16

Similar Documents

Publication Publication Date Title
WO2020034757A1 (zh) 数据处理方法和装置、存储介质及电子装置
CN111046034B (zh) 管理内存数据及在内存中维护数据的方法和系统
EP3812915B1 (en) Big data statistics at data-block level
US8700674B2 (en) Database storage architecture
US8868512B2 (en) Logging scheme for column-oriented in-memory databases
CN102521269B (zh) 一种基于索引的计算机连续数据保护方法
CN107491523B (zh) 存储数据对象的方法及装置
CN106599199A (zh) 一种数据缓存与同步方法
US8819074B2 (en) Replacement policy for resource container
US8620880B2 (en) Database system, method of managing database, and computer-readable storage medium
WO2019184618A1 (zh) 数据存储的方法、装置、服务器和存储介质
EP3495964B1 (en) Apparatus and program for data processing
CN111309720A (zh) 时序数据的存储、读取方法、装置、电子设备及存储介质
WO2020007288A1 (zh) 管理内存数据及在内存中维护数据的方法和系统
US10936500B1 (en) Conditional cache persistence in database systems
US20230040530A1 (en) Systems and method for processing timeseries data
CN110389967A (zh) 数据存储方法、装置、服务器及存储介质
US9104711B2 (en) Database system, method of managing database, and computer-readable storage medium
WO2016175880A1 (en) Merging incoming data in a database
CN111427920B (zh) 数据采集方法、装置、系统、计算机设备及存储介质
CN112463073A (zh) 一种对象存储分布式配额方法、系统、设备和存储介质
KR101419428B1 (ko) 모바일 환경에 구축된 데이터베이스에 대한 트랜잭션 로깅 및 회복 장치 및 그 방법
CN112527804B (zh) 文件存储方法、文件读取方法和数据存储系统
CN112463837B (zh) 一种关系型数据库数据存储查询方法
CN111913959B (zh) 一种数据查询方法、装置、终端和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19850450

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019850450

Country of ref document: EP

Effective date: 20200922

NENP Non-entry into the national phase

Ref country code: DE