WO2019105420A1 - Data Query - Google Patents

Data Query

Info

Publication number
WO2019105420A1
WO2019105420A1 (PCT/CN2018/118249)
Authority
WO
WIPO (PCT)
Prior art keywords
metadata
phoenix
instruction
hbase
memory
Prior art date
Application number
PCT/CN2018/118249
Other languages
English (en)
French (fr)
Inventor
丁远普
李日光
Original Assignee
新华三大数据技术有限公司
Priority date
Filing date
Publication date
Application filed by 新华三大数据技术有限公司
Priority to JP2020544099A (JP7018516B2)
Priority to EP18884738.8A (EP3683697A4)
Priority to US16/766,231 (US11269881B2)
Publication of WO2019105420A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/245 Query processing
    • G06F16/2452 Query translation
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation

Definitions

  • the HBase database is a highly reliable, high performance, column-oriented, scalable distributed storage system that provides random, real-time read and write access to large data sets.
  • the HBase database stores data in the form of a data table (herein referred to as an HBase table).
  • the HBase table may be composed of rows and column families, as shown in Table 1, which is an example of an HBase table.
  • the row key (RowKey) is an index
  • the column family (ColumnFamily) may be composed of one or more columns.
  • the name, address, age, mobile phone number, mailbox, etc. in Table 1 are metadata, and each metadata corresponds to multiple attribute values. For example, the attribute values corresponding to the name are Zhang San and Li Si.
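  • As a minimal sketch only (not part of the patent), the Phoenix SQL below shows one way a table shaped like Table 1 could be declared; the table name "A", the column family "info", and the column names are assumptions introduced here for illustration.
        -- Hypothetical declaration of a Table-1-like HBase table through Phoenix SQL.
        -- "A", "info" and the column names are illustrative assumptions.
        CREATE TABLE IF NOT EXISTS "A" (
            "ROW"            VARCHAR PRIMARY KEY,  -- RowKey, e.g. '001'
            "info"."name"    VARCHAR,              -- column family "info", column "name"
            "info"."address" VARCHAR,
            "info"."age"     VARCHAR,
            "info"."phone"   VARCHAR
        );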
  • FIG. 1 is a structural diagram of a data query system in an embodiment of the present disclosure
  • FIG. 2 is a structural diagram of a data query system in an embodiment of the present disclosure
  • FIG. 3 is a flowchart of a method for creating an index table in an embodiment of the present disclosure.
  • FIG. 4 is a flowchart of a data query method in an embodiment of the present disclosure.
  • FIG. 5 is a flowchart of a data storage method in an embodiment of the present disclosure.
  • FIG. 6 is a flowchart of a data query method in another embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of a hardware structure of a connector in an embodiment of the present disclosure.
  • Based on the HBase table structure shown in Table 1, data can be quickly retrieved according to the row key. For example, when a query request is received, if the row key included in the query request is 001, the content found in the HBase table is the data of the first row, and therefore the data of the first row is returned. If the query request does not include a row key but instead includes the attribute value of some metadata, such as Beijing, the corresponding data cannot be found quickly through the row key; instead, a full table scan of the entire HBase table is required to find the row that contains "Beijing", so query performance is very low.
  • embodiments of the present disclosure propose a data query method that can be applied to a data query system including a memory, a querier, and a connector.
  • The querier, the memory, and the connector can be deployed on the same server or on different servers. If the querier, the memory, and the connector are deployed on the same server, they are three functional modules of that server.
  • For example, the querier can be an SQL (Structured Query Language) engine that implements the data query function, the memory can be a database that implements the data storage function, and the connector can be middleware that implements the connection function. If the querier, the memory, and the connector are deployed on different servers, they are three separate servers.
  • FIG. 1 is a schematic structural diagram of the above data query system, wherein:
  • the memory can use the Phoenix component to store data.
  • The Phoenix component provides SQL support for the HBase database, so that when an SQL request is received, the memory can operate on the data in the HBase database according to the SQL request.
  • the querier can use the SparkSQL engine (a Spark-based distributed SQL engine) to query data.
  • The SparkSQL engine can expose multiple interfaces for accessing external data sources (DataSource), for example, JDBC (Java Database Connectivity), ODBC (Open Database Connectivity), and API (Application Programming Interface) interfaces.
  • The SparkSQL engine can also support data sources in more formats, such as JSON (JavaScript Object Notation), Parquet (a columnar storage format), Avro (a data serialization system), and CSV (Comma-Separated Values).
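  • For illustration only, the snippet below uses Spark SQL's documented data source syntax to register an external JDBC source as a temporary view; the URL and table name are placeholders and are not taken from the patent.
        -- Registering an external JDBC data source in Spark SQL (placeholders only).
        CREATE TEMPORARY VIEW people
        USING org.apache.spark.sql.jdbc
        OPTIONS (
          url "jdbc:postgresql:dbserver",
          dbtable "schema.people"
        );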
  • The memory uses the Phoenix component to store data, and the querier uses the SparkSQL engine to query data; that is, the querier can process SparkSQL instructions, while the memory can process Phoenix instructions. Therefore, after the querier sends a SparkSQL instruction, the memory cannot process that SparkSQL instruction if it receives it; similarly, after the memory sends a Phoenix instruction, the querier cannot process that Phoenix instruction if it receives it. Based on this, a connector can be deployed between the memory and the querier, and the connector is used to convert between SparkSQL instructions and Phoenix instructions.
  • In the embodiments of the present disclosure, an index table may be created in the memory. The index table is used to record the correspondence between attribute values of metadata in the HBase table and row keys (for example, for Table 1, the correspondence between "Beijing" and "001" is recorded). In this way, a query that carries only an attribute value can first look up the row key in the index table and then retrieve the row from the HBase table by that row key, without a full table scan.
  • Embodiments of the present disclosure may involve one or any combination of the following processes: an index table creation process, a data query process, a data storage process, an index table deletion process, and an index table acquisition process.
  • Referring to FIG. 2, the querier can provide a SparkSQL creation interface. Based on the SparkSQL creation interface, the querier can generate and send a SparkSQL creation instruction, which is used to create an index table.
  • The connector can provide a SparkSQL creation interface and a Phoenix creation interface. Based on the SparkSQL creation interface, the connector can parse the content of the SparkSQL creation instruction; based on the Phoenix creation interface, the connector can generate and send a Phoenix creation instruction according to the content parsed from the SparkSQL creation instruction, and the Phoenix creation instruction is used to create an index table.
  • The memory can provide a Phoenix creation interface. Based on the Phoenix creation interface, the memory can parse the content of the Phoenix creation instruction and create an index table according to that content.
  • The querier can provide a SparkSQL query interface. Based on the SparkSQL query interface, the querier can generate and send SparkSQL query instructions, which are used to query data, and can process SparkSQL response instructions for those SparkSQL query instructions.
  • the connector can provide a SparkSQL query interface and a Phoenix query interface. Based on the SparkSQL query interface, the connector can parse the contents of the SparkSQL query instruction. Based on the Phoenix query interface, the connector can generate and send a Phoenix query instruction based on the content parsed from the SparkSQL query instruction. In addition, based on the Phoenix query interface, the connector can also parse the contents of the Phoenix response command for the Phoenix query instruction. Based on the SparkSQL query interface, the connector can generate and send a SparkSQL response command based on the content parsed from the Phoenix response instruction.
  • the memory can provide a Phoenix query interface. Based on the Phoenix query interface, the memory can parse the contents of the Phoenix query instruction, query it, and generate and send a Phoenix response command according to the query result.
  • The querier can provide a SparkSQL storage interface. Based on the SparkSQL storage interface, the querier can generate and send a SparkSQL storage instruction, which is used to store data.
  • the connector can provide SparkSQL storage interface and Phoenix storage interface. Based on the SparkSQL storage interface, the connector can parse the contents of the SparkSQL storage instruction. Based on the Phoenix storage interface, the connector can generate and send Phoenix storage instructions based on the content parsed from the SparkSQL storage instruction. The Phoenix store instruction is used to store data.
  • The memory can provide a Phoenix storage interface. Based on the Phoenix storage interface, the memory can parse the content of the Phoenix storage instruction and store the relevant data according to that content.
  • The querier can provide a SparkSQL delete interface. Based on the SparkSQL delete interface, the querier can generate and send a SparkSQL delete instruction, which is used to delete an index table.
  • the connector can provide the SparkSQL delete interface and the Phoenix delete interface. Based on the SparkSQL delete interface, the connector can parse the contents of the SparkSQL delete command. Based on the Phoenix delete interface, the connector can generate and send a Phoenix delete command according to the content parsed from the SparkSQL delete command.
  • the Phoenix delete instruction is used to delete the index table.
  • the memory can provide a Phoenix delete interface. Based on the Phoenix delete interface, the memory can parse the contents of the Phoenix delete instruction and delete the index table according to the content.
  • The querier can provide a SparkSQL acquisition interface. Based on the SparkSQL acquisition interface, the querier can generate and send SparkSQL acquisition instructions, which are used to obtain an index table, and can process SparkSQL response instructions for those SparkSQL acquisition instructions.
  • the connector can provide a SparkSQL acquisition interface and a Phoenix acquisition interface. Based on the SparkSQL acquisition interface, the connector can parse the content of the SparkSQL acquisition instruction. Based on the Phoenix acquisition interface, the connector can generate and send a Phoenix acquisition instruction according to the content obtained by the SparkSQL acquisition instruction. In addition, based on the Phoenix acquisition interface, the connector can also parse the contents of the Phoenix response instruction for the Phoenix acquisition instruction. Based on the SparkSQL acquisition interface, the connector can generate and send a SparkSQL response instruction based on the content parsed from the Phoenix response instruction.
  • the memory can provide a Phoenix acquisition interface. Based on the Phoenix acquisition interface, the memory can parse the contents of the Phoenix acquisition instruction, obtain an index table, and generate and send a Phoenix response instruction.
  • step 31 the querier sends a SparkSQL creation instruction (for creating an index table) to the connector.
  • the SparkSQL creation instruction includes the table identifier of the HBase table and the metadata of the HBase table. For example, if an index table is created for the HBase table shown in Table 1, and the metadata "mobile phone number" is used as an index of the index table, the SparkSQL creation instruction may include the table identifier of the HBase table and the metadata "mobile phone number”.
  • Step 32 After receiving the SparkSQL creation instruction, the connector generates a Phoenix creation instruction according to the SparkSQL creation instruction, and sends the Phoenix creation instruction to the memory.
  • Specifically, the connector may parse the table identifier of the HBase table and the metadata of the HBase table from the SparkSQL creation instruction, and generate the Phoenix creation instruction according to the table identifier and the metadata; that is, the Phoenix creation instruction includes the table identifier of the HBase table and the metadata of the HBase table (such as "mobile phone number").
  • Step 33 After receiving the Phoenix creation instruction, the memory obtains, from the HBase table corresponding to the table identifier, the attribute values corresponding to the metadata and the row keys corresponding to those attribute values, creates an index table corresponding to the table identifier and the metadata, and records the correspondence between the attribute values and the row keys in the index table.
  • For example, the memory may obtain from Table 1 the attribute value 18611111111 corresponding to the metadata "mobile phone number" and the row key 001 corresponding to that attribute value, and record the correspondence between the attribute value 18611111111 and the row key 001 in the index table. Proceeding in the same way for the other rows, the index table shown in Table 2 is obtained.
  • In one example, assuming that the table identifier of the HBase table shown in Table 1 is A, after the memory creates the index table of Table 2, it may also send the table identifier A of the HBase table and the metadata "mobile phone number" to the connector. The connector records the correspondence between the table identifier and the metadata in a local mapping table; this correspondence indicates that an index table corresponding to the table identifier and the metadata exists in the memory.
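  • For a flavor of what such a Phoenix creation instruction could translate to, the sketch below uses Phoenix's standard secondary-index DDL; the index name "PHONE_IDX" and the column layout are assumptions, and the patent's custom index table need not be implemented as a built-in Phoenix index.
        -- Hypothetical translation of the Phoenix creation instruction: build an index
        -- on the "phone" column of table "A"; Phoenix maintains it as a separate HBase
        -- table that maps phone values back to row keys, much like Table 2.
        CREATE INDEX "PHONE_IDX" ON "A" ("phone");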
  • the creation of the index table may be completed. Further, the data query may be completed based on the index table.
  • step 41 the querier sends a SparkSQL query instruction (for querying data) to the connector.
  • The SparkSQL query instruction includes the table identifier of the HBase table, metadata, and the attribute value corresponding to the metadata. For example, if the data corresponding to "18611111111" is to be queried from the HBase table shown in Table 1, the SparkSQL query instruction may include the table identifier A of the HBase table, the metadata "mobile phone number", and the attribute value "18611111111" corresponding to "mobile phone number".
  • Step 42 After receiving the SparkSQL query instruction, the connector generates a first Phoenix query instruction and sends it to the memory if an index table corresponding to the table identifier and the metadata exists in the memory; if no such index table exists in the memory, the connector generates a second Phoenix query instruction and sends it to the memory.
  • After receiving the SparkSQL query instruction, the connector parses the table identifier of the HBase table, the metadata, and the attribute value corresponding to the metadata from the SparkSQL query instruction. Then, the connector can determine whether an index table corresponding to the table identifier and the metadata exists in the memory.
  • the first Phoenix query instruction includes an HBase table identifier, metadata, and attribute values corresponding to the metadata.
  • The second Phoenix query instruction includes the HBase table identifier and the attribute value corresponding to the metadata; alternatively, the second Phoenix query instruction includes the HBase table identifier, the metadata, and the attribute value corresponding to the metadata.
  • the process for determining, by the connector, whether the memory has an index table corresponding to the table identifier and the metadata may include:
  • Method 1: If the connector locally maintains the above mapping table, then after parsing the table identifier and the metadata from the SparkSQL query instruction, the connector can check whether the correspondence between the table identifier and the metadata exists in the mapping table. If so, it can determine that an index table corresponding to the table identifier and the metadata exists in the memory; if not, it can determine that no such index table exists in the memory.
  • Method 2: After parsing the table identifier and the metadata from the SparkSQL query instruction, the connector sends a Phoenix management instruction to the memory, where the Phoenix management instruction includes the table identifier and the metadata. After receiving the Phoenix management instruction, the memory checks whether an index table corresponding to the table identifier and the metadata exists, and sends a Phoenix response instruction including the check result to the connector. The connector can determine, according to the check result, whether an index table corresponding to the table identifier and the metadata exists in the memory.
  • Step 43 After receiving the first Phoenix query instruction, the memory obtains, from the index table corresponding to the table identifier and the metadata, the row key corresponding to the attribute value, and obtains, from the HBase table corresponding to the table identifier, the row of data corresponding to that row key (i.e., the row data).
  • the memory may parse the table identifier A, the metadata "mobile phone number”, and the attribute value "18611111111” from the first Phoenix query instruction.
  • the index table corresponding to the table identifier A and the metadata "mobile phone number” is as shown in Table 2.
  • the memory can obtain the row key 001 corresponding to the attribute value "18611111111” from the table 2; the HBase table corresponding to the table identifier A is as shown in Table 1.
  • the memory can obtain the row data corresponding to the row key 001 from Table 1, including "001, Zhang San, Beijing, 28, 18611111111".
  • In another example, after receiving the second Phoenix query instruction, the memory may parse the table identifier A and the attribute value "Zhang San" from the second Phoenix query instruction. The HBase table corresponding to the table identifier A is shown in Table 1, so a full table scan is performed on Table 1 to obtain the row data corresponding to "Zhang San", including "001, Zhang San, Beijing, 28, 18611111111".
  • When the connector generates a Phoenix query instruction, the Phoenix query instruction may further include a specific flag indicating whether the Phoenix query instruction is the first Phoenix query instruction or the second Phoenix query instruction.
  • When receiving the first Phoenix query instruction, the memory may first query the index table and then query the HBase table; when receiving the second Phoenix query instruction, the memory may directly query the HBase table.
  • Step 44 After obtaining the row data, the memory returns the row data to the connector.
  • Step 45 After receiving the row data, the connector returns the row data to the querier.
  • the memory may generate a Phoenix response instruction for the first Phoenix query instruction or the second Phoenix query instruction, and send the Phoenix response instruction to the connector, the Phoenix response instruction including the queried row data.
  • After receiving the Phoenix response instruction, the connector parses the queried row data from the Phoenix response instruction, generates a SparkSQL response instruction for the SparkSQL query instruction, and sends the SparkSQL response instruction to the querier; the querier parses the row data from the SparkSQL response instruction.
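  • For illustration only, the two statements below (using the assumed names from the earlier sketches) contrast the indexed path and the full-scan path that the first and second Phoenix query instructions correspond to.
        -- With an index on "phone", a query by attribute value avoids scanning "A":
        SELECT * FROM "A" WHERE "phone" = '18611111111';
        -- Without an index on "name", the same pattern forces a full table scan:
        SELECT * FROM "A" WHERE "name" = 'Zhang San';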
  • the data query can be completed.
  • data storage can also be completed based on the above index table, and the data storage process will be described below in conjunction with the flowchart shown in FIG. 5.
  • step 51 the querier sends a SparkSQL store instruction (for storing data) to the connector.
  • the SparkSQL storage instruction includes a table identifier of the HBase table, at least one metadata of the HBase table, an attribute value corresponding to each metadata, and a row key.
  • For example, if the following data needs to be stored in the HBase table shown in Table 1: row key "005", name "Han Qi", address "Shenzhen", age "50", and mobile phone number "18655555555", the SparkSQL storage instruction may include the following content: the table identifier A of the HBase table, the correspondence between the metadata "name" and the attribute value "Han Qi", the correspondence between the metadata "address" and the attribute value "Shenzhen", the correspondence between the metadata "age" and the attribute value "50", the correspondence between the metadata "mobile phone number" and the attribute value "18655555555", and the row key "005".
  • Step 52 After receiving the SparkSQL storage instruction, the connector generates a first Phoenix storage instruction and/or a second Phoenix storage instruction according to the SparkSQL storage instruction, and sends the first Phoenix storage instruction and/or the second Phoenix storage instruction to the memory.
  • After receiving the SparkSQL storage instruction, the connector parses the table identifier of the HBase table, at least one metadata of the HBase table, the attribute value corresponding to each metadata, and the row key from the SparkSQL storage instruction.
  • For each metadata, the following operation is performed: if an index table corresponding to the table identifier and the metadata exists in the memory, a first Phoenix storage instruction is generated according to the SparkSQL storage instruction and sent to the memory; otherwise, no first Phoenix storage instruction needs to be generated. The first Phoenix storage instruction includes the table identifier, the metadata, the attribute value corresponding to the metadata, and the row key.
  • For each metadata in the SparkSQL storage instruction, the connector determines whether an index table corresponding to the table identifier and the metadata exists in the memory. For the specific judgment manner, refer to step 42; details are not described herein again.
  • For example, for the metadata "name", "address", and "age", the judgment result is that no corresponding index table exists in the memory, so no first Phoenix storage instruction is generated for them. For the metadata "mobile phone number", the judgment result is that a corresponding index table exists, so the connector may generate a first Phoenix storage instruction that includes the table identifier A, the metadata "mobile phone number", the attribute value "18655555555", and the row key 005.
  • After receiving the SparkSQL storage instruction, the connector may further generate a second Phoenix storage instruction, where the second Phoenix storage instruction includes the table identifier, the at least one metadata, the attribute value corresponding to each metadata, and the row key.
  • the first Phoenix store instruction may instruct the memory to store data in the index table
  • the second Phoenix store instruction may instruct the memory to store data in the HBase table.
  • the Phoenix store instruction may further include a specific flag for indicating that the Phoenix store instruction is the first Phoenix store instruction or the second Phoenix store instruction.
  • When the memory receives the first Phoenix storage instruction, step 54 is performed; when the memory receives the second Phoenix storage instruction, step 53 is performed.
  • Step 53 After receiving the second Phoenix storage instruction, the memory records the row key and the attribute value corresponding to each metadata in the HBase table corresponding to the table identifier.
  • For example, the memory parses from the second Phoenix storage instruction the table identifier A of the HBase table, the correspondence between the metadata "name" and the attribute value "Han Qi", the correspondence between the metadata "address" and the attribute value "Shenzhen", the correspondence between the metadata "age" and the attribute value "50", the correspondence between the metadata "mobile phone number" and the attribute value "18655555555", and the row key "005". Therefore, a row of data is added to the HBase table shown in Table 1, such as the row with row key 005 shown in Table 3.
  • If the row key included in the second Phoenix storage instruction already exists in the HBase table, the corresponding attribute values included in the second Phoenix storage instruction may be added at the positions corresponding to that row key under the corresponding metadata of the HBase table; if the row key included in the second Phoenix storage instruction does not exist in the HBase table, a new data row may be added to the HBase table, and the row key and the attribute value of each metadata included in the second Phoenix storage instruction are recorded in the new data row.
  • Step 54 After receiving the first Phoenix storage instruction, the memory records the correspondence between the attribute value of the metadata and the row key in an index table corresponding to the table identifier and the metadata.
  • For example, the memory parses the table identifier A, the metadata "mobile phone number", the attribute value "18655555555", and the row key 005 from the first Phoenix storage instruction. Therefore, the correspondence between the attribute value "18655555555" and the row key "005" can be recorded in the index table shown in Table 2, resulting in the index table shown in Table 4.
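  • As an illustration of what the storage step could look like in plain Phoenix SQL (again using the assumed table "A"), a single UPSERT writes the new row, and any index defined on "phone" is updated as part of the same statement; the patent's first and second Phoenix storage instructions split this work explicitly between the index table and the HBase table.
        -- Hypothetical storage of the row with row key '005'.
        UPSERT INTO "A" ("ROW", "name", "address", "age", "phone")
        VALUES ('005', 'Han Qi', 'Shenzhen', '50', '18655555555');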
  • Deletion of the index table can also be implemented based on the above index table. The index table deletion process is described below.
  • Case 1 The querier sends a SparkSQL delete command to the connector, and the SparkSQL delete command includes a table identifier of the HBase table and metadata of the HBase table.
  • After receiving the SparkSQL delete instruction, the connector generates a Phoenix delete instruction according to the SparkSQL delete instruction and may send the Phoenix delete instruction to the memory, where the Phoenix delete instruction may include the table identifier and the metadata.
  • After receiving the Phoenix delete instruction, the memory parses the table identifier of the HBase table and the metadata of the HBase table from the Phoenix delete instruction, and deletes the index table corresponding to the table identifier and the metadata.
  • For example, the memory can delete the index table shown in Table 4 according to the Phoenix delete instruction.
  • the querier sends a SparkSQL delete command to the connector.
  • the SparkSQL delete command may include a table identifier of the HBase table, metadata of the HBase table, and an attribute value corresponding to the metadata.
  • After receiving the SparkSQL delete instruction, the connector may generate a Phoenix delete instruction according to the SparkSQL delete instruction and may send the Phoenix delete instruction to the memory, where the Phoenix delete instruction may include the table identifier, the metadata, and the attribute value corresponding to the metadata.
  • After receiving the Phoenix delete instruction, the memory may parse the table identifier of the HBase table, the metadata of the HBase table, and the attribute value corresponding to the metadata from the Phoenix delete instruction, and may delete, from the index table corresponding to the table identifier and the metadata, the row data corresponding to the attribute value.
  • For example, if the Phoenix delete instruction includes a table identifier (such as Table A), metadata (such as "mobile phone number"), and an attribute value (such as 18655555555), and the index table shown in Table 4 exists in the memory, the memory may delete the corresponding data in Table 4 according to the Phoenix delete instruction; after the deletion, the index table is as shown in Table 2.
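  • If the index table were itself kept as an ordinary table, say one named "A_PHONE_IDX" with columns "phone" and "row_key" (assumed names, closer to the custom index table the patent describes than the built-in Phoenix index sketched earlier), the two delete cases might map to statements like the following; this is only a sketch, not the patent's instruction format.
        -- Case 1: delete the whole index table for table "A" / metadata "mobile phone number".
        DROP TABLE IF EXISTS "A_PHONE_IDX";
        -- Case 2: delete only the index entry for one attribute value.
        DELETE FROM "A_PHONE_IDX" WHERE "phone" = '18655555555';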
  • Acquisition of the index table can also be implemented based on the above index table. The index table acquisition process is described below.
  • Case 1 The querier sends a SparkSQL acquisition instruction to the connector, and the SparkSQL acquisition instruction includes a table identifier of the HBase table and metadata of the HBase table.
  • After receiving the SparkSQL acquisition instruction, the connector generates a Phoenix acquisition instruction according to the SparkSQL acquisition instruction and may send the Phoenix acquisition instruction to the memory; the Phoenix acquisition instruction may include the table identifier and the metadata.
  • After receiving the Phoenix acquisition instruction, the memory parses the table identifier of the HBase table and the metadata of the HBase table from the Phoenix acquisition instruction, obtains the index table corresponding to the table identifier and the metadata, and then returns that index table to the querier through the connector.
  • Specifically, the memory may send a Phoenix response instruction for the Phoenix acquisition instruction to the connector, the Phoenix response instruction including the index table; the connector may then send a SparkSQL response instruction for the SparkSQL acquisition instruction to the querier, the SparkSQL response instruction including the index table.
  • For example, the memory may return the index table shown in Table 4 to the querier according to the Phoenix acquisition instruction.
  • Case 2 The querier sends a SparkSQL obtaining instruction to the connector, where the SparkSQL obtaining instruction may include a table identifier of the HBase table, metadata of the HBase table, and an attribute value corresponding to the metadata.
  • After receiving the SparkSQL acquisition instruction, the connector may generate a Phoenix acquisition instruction according to the SparkSQL acquisition instruction and may send the Phoenix acquisition instruction to the memory, where the Phoenix acquisition instruction may include the table identifier, the metadata, and the attribute value corresponding to the metadata.
  • After receiving the Phoenix acquisition instruction, the memory may parse the table identifier of the HBase table, the metadata of the HBase table, and the attribute value corresponding to the metadata from the Phoenix acquisition instruction, and may obtain, from the index table corresponding to the table identifier and the metadata, the row data corresponding to the attribute value.
  • The memory then returns the row data to the querier through the connector.
  • Specifically, the memory may send a Phoenix response instruction for the Phoenix acquisition instruction to the connector, and the Phoenix response instruction may include the row data; the connector may then send a SparkSQL response instruction for the SparkSQL acquisition instruction to the querier, and the SparkSQL response instruction may include the row data.
  • For example, if the Phoenix acquisition instruction includes a table identifier (such as Table A), metadata (such as "mobile phone number"), and an attribute value (such as 18655555555), and the index table shown in Table 4 exists in the memory, the memory can, according to the Phoenix acquisition instruction, return the corresponding row data in Table 4 (including the mobile phone number "18655555555" and the row key "005") to the querier.
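  • Reusing the assumed custom index table "A_PHONE_IDX" from the earlier sketch, the two acquisition cases might resemble the following; again, this is illustrative only.
        -- Case 1: fetch the whole index table.
        SELECT * FROM "A_PHONE_IDX";
        -- Case 2: fetch only the index entry for one attribute value.
        SELECT * FROM "A_PHONE_IDX" WHERE "phone" = '18655555555';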
  • In the above embodiments, the index table can be queried by the attribute value to obtain the row key of the HBase table, and the HBase table can then be queried by that row key to obtain the corresponding row data.
  • In one example, the memory can store multiple HBase tables. Taking two HBase tables as an example, one HBase table is simply called the HBase table, and its table identifier is the first table identifier; for example, Table 1 is the HBase table and the first table identifier is Table A. The other HBase table is called the associated HBase table corresponding to the HBase table, and its table identifier is the second table identifier; for example, Table 5 is an example of the associated HBase table and the second table identifier is Table B.
  • When the HBase table is associated with the associated HBase table, the HBase table and the associated HBase table need to have the same metadata; for example, Table 1 and Table 5 share the same metadata "mobile phone number".
  • In addition, the index of the index table needs to be this shared metadata; that is, when the index table is created by the process shown in FIG. 3, the metadata included in the SparkSQL creation instruction may be "mobile phone number". In this way, the index table records the correspondence between the attribute values of the metadata "mobile phone number" and the row keys.
  • Step 61 The querier sends a SparkSQL association query instruction to the connector, where the SparkSQL association query instruction includes the first table identifier of the HBase table and the second table identifier of the associated HBase table corresponding to the HBase table.
  • the SparkSQL association query instruction may include only the first table identifier (such as the table A) and the second table identifier (such as the table B), but does not include the metadata, the attribute value corresponding to the metadata, and the like.
  • Step 62 After receiving the SparkSQL association query instruction, the connector generates a first Phoenix associated query instruction according to the SparkSQL association query instruction, and sends the first Phoenix associated query instruction to the memory.
  • the connector may parse the first table identifier and the second table identifier from the SparkSQL association query instruction, and then the connector may generate a first Phoenix association query instruction including the second table identifier.
  • In one example, the connector can parse two table identifiers from the SparkSQL association query instruction. To distinguish the two table identifiers, the data amounts of the HBase tables corresponding to the two table identifiers can be obtained; the table identifier with the larger data amount is determined as the first table identifier, and the table identifier with the smaller data amount is determined as the second table identifier. After the second table identifier is determined, the first Phoenix association query instruction including the second table identifier can be generated.
  • In the above example, it is assumed that the table identifier of the HBase table is the first table identifier and the table identifier of the associated HBase table is the second table identifier. If, instead, the table identifier of the associated HBase table is the first table identifier (i.e., the associated HBase table has the larger data amount), the processing flow is the same, and a first Phoenix association query instruction including the second table identifier is still generated; this case will not be described again. In short, the table identifier of whichever of the HBase table and the associated HBase table has the smaller data amount is used as the second table identifier.
  • In one example, the connector may further determine whether the data amount of the table corresponding to the second table identifier is less than a threshold. If yes, the connector generates the first Phoenix association query instruction including the second table identifier and then performs the subsequent steps. If no, the first Phoenix association query instruction is not generated; instead, the HBase table corresponding to the first table identifier and the associated HBase table corresponding to the second table identifier are obtained directly from the memory and sent to the querier. This process will not be described again.
  • In the above examples, the connector can obtain the data amount of the HBase table corresponding to a table identifier from the memory; the manner of obtaining it is not limited.
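  • One simple way a connector could compare the data amounts of the two tables (the patent does not prescribe how) is a pair of count queries against the memory, for example:
        -- Hypothetical size comparison of the HBase table "A" and the associated table "B".
        SELECT COUNT(*) FROM "A";
        SELECT COUNT(*) FROM "B";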
  • Step 63 After receiving the first Phoenix association query instruction, the memory obtains all the first type of row data from the associated HBase table corresponding to the second table identifier, and returns all the first type of row data to the connector; Each first type of row data includes an attribute value corresponding to at least one metadata.
  • For example, the memory may obtain all the first type of row data from the associated HBase table corresponding to the second table identifier "Table B", that is, obtain all the contents of Table 5, and then return all the first type of row data to the connector. The returned content may include first type row data 1 (row key 011, mobile phone number 18611111111, ID number 100000000000000000), first type row data 2 (row key 012, mobile phone number 18622222222, ID number 200000000000000000), and so on.
  • the memory can send a Phoenix response instruction to the first Phoenix associated query instruction to the connector, and the Phoenix response instruction can include all of the first type of row data described above.
  • Step 64 For each metadata in each first type of row data, the following is performed: if an index table corresponding to the first table identifier (the first table identifier included in the SparkSQL association query instruction) and the metadata exists in the memory, the connector generates a second Phoenix association query instruction and sends the second Phoenix association query instruction to the memory, where the second Phoenix association query instruction includes the first table identifier, the metadata, and the attribute value corresponding to the metadata. After receiving the second Phoenix association query instruction, the memory obtains the row key corresponding to the attribute value from the index table corresponding to the first table identifier and the metadata, obtains the second type of row data corresponding to the row key from the HBase table corresponding to the first table identifier, and returns the second type of row data to the connector. The connector associates the first type of row data with the second type of row data and returns the result to the querier.
  • If no index table corresponding to the first table identifier and the metadata exists in the memory, the connector does not generate a second Phoenix association query instruction for that metadata, but continues to analyze the other metadata.
  • For example, for the metadata "mobile phone number" of first type row data 1, since the memory has an index table corresponding to the first table identifier (Table A) and the metadata "mobile phone number", the connector generates a second Phoenix association query instruction and sends it to the memory; the second Phoenix association query instruction includes the first table identifier (Table A), the metadata "mobile phone number", and the attribute value "18611111111" corresponding to the metadata.
  • After receiving the second Phoenix association query instruction, the memory determines that the index table corresponding to the first table identifier (Table A) and the metadata "mobile phone number" is as shown in Table 2, and obtains from Table 2 the row key "001" corresponding to the attribute value "18611111111". The HBase table corresponding to the first table identifier (Table A) is as shown in Table 1, so the memory obtains from Table 1 the second type of row data 1 corresponding to the row key "001", namely "001, Zhang San, Beijing, 28, 18611111111", and returns second type row data 1 to the connector.
  • the connector associates the first type of row data 1 with the second type of row data 1, and returns the associated first type of row data 1 and the second type of row data 1 to the querier.
  • In this way, the connector completes the association between first type row data 1 and second type row data 1. The connector can then perform the same processing for first type row data 2, first type row data 3, first type row data 4, and so on; the association process is similar to that of first type row data 1 and will not be repeated here.
  • In the above process, the connector needs to determine whether an index table corresponding to the first table identifier and the metadata is present in the memory. For the specific determination manner, refer to step 42; details are not described herein again.
  • In an example, the memory sends a Phoenix response instruction for the second Phoenix association query instruction to the connector; the Phoenix response instruction may include the second type of row data, thereby returning the second type of row data to the connector.
  • the connector may send a SparkSQL response instruction to the querier for the SparkSQL associated query instruction.
  • the SparkSQL response instruction may include the first type of row data and the second type of row data, thereby returning the first type of row data and the second type of row data to the querier.
  • For example, the connector may send multiple SparkSQL response instructions to the querier: SparkSQL response instruction 1 includes first type row data 1 and second type row data 1, SparkSQL response instruction 2 includes first type row data 2 and second type row data 2, and so on.
  • Alternatively, the connector may send a single SparkSQL response instruction to the querier, the SparkSQL response instruction including the association between first type row data 1 and second type row data 1, the association between first type row data 2 and second type row data 2, and so on.
  • In this way, the querier can directly obtain the associations between the first type of row data of the associated HBase table and the second type of row data of the HBase table, without performing the association itself, thereby reducing the querier's workload.
  • In the above embodiment, the first type of row data and the second type of row data are associated by the connector, and the associated first type of row data and second type of row data are sent to the querier. This avoids returning the whole HBase table and the whole associated HBase table to the querier and having the querier itself associate the first type of row data with the second type of row data, thereby greatly improving the efficiency of the association query between the HBase table and the associated HBase table.
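  • Logically, the association the connector performs is equivalent to the join below (using the assumed names "A", "B", and "phone"), except that it is executed as per-row index lookups against "A" instead of shipping both tables to the querier; the SQL is shown only to make the semantics concrete.
        -- Hypothetical join expressing the association of Table 5 ("B") with Table 1 ("A")
        -- on the shared metadata "mobile phone number".
        SELECT b.*, a.*
        FROM "B" b
        JOIN "A" a ON a."phone" = b."phone";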
  • In terms of hardware, the connector may include a machine-readable storage medium 601 and a processor 602; the machine-readable storage medium 601 and the processor 602 may communicate via a system bus 603.
  • the machine readable storage medium 601 can store machine executable instructions corresponding to operations performed by the connector in the above process.
  • the processor 602 can load and execute machine executable instructions to implement the operations of the connectors described above.
  • In an example, the operations of the connector may include: receiving a SparkSQL query instruction sent from the querier, where the SparkSQL query instruction includes a table identifier of an HBase table, metadata of the HBase table, and an attribute value corresponding to the metadata; if an index table corresponding to the table identifier and the metadata exists in the memory, generating a first Phoenix query instruction and sending the first Phoenix query instruction to the memory, wherein the index table is used to record the correspondence between attribute values of the metadata of the HBase table and row keys of the HBase table, and the first Phoenix query instruction includes the table identifier, the metadata, and the attribute value and causes the memory to obtain, from the index table corresponding to the table identifier and the metadata, the row key corresponding to the attribute value, to obtain, from the HBase table corresponding to the table identifier, the row data corresponding to the row key, and to return the row data to the connector; and after receiving the row data, returning the row data to the querier.
  • In an example, the machine executable instructions further cause the processor 602 to: receive a SparkSQL creation instruction sent from the querier, wherein the SparkSQL creation instruction includes the table identifier of the HBase table and the metadata of the HBase table; and generate a Phoenix creation instruction and send the Phoenix creation instruction to the memory, wherein the Phoenix creation instruction includes the table identifier and the metadata and causes the memory to obtain, from the HBase table corresponding to the table identifier, the attribute value corresponding to the metadata and the row key corresponding to the attribute value, to create an index table corresponding to the table identifier and the metadata, and to record the correspondence between the attribute value and the row key in the index table.
  • In an example, the machine executable instructions further cause the processor 602 to: receive a SparkSQL storage instruction sent from the querier, wherein the SparkSQL storage instruction includes the table identifier of the HBase table, at least one metadata of the HBase table, an attribute value corresponding to each metadata, and a row key; and perform, for each metadata, the following operation: if an index table corresponding to the table identifier and the metadata exists in the memory, generating a first Phoenix storage instruction and sending the first Phoenix storage instruction to the memory, wherein the first Phoenix storage instruction includes the table identifier, the metadata, the attribute value corresponding to the metadata, and the row key and causes the memory to record the correspondence between the attribute value and the row key in the index table corresponding to the table identifier and the metadata.
  • In an example, the machine executable instructions further cause the processor 602 to: generate a second Phoenix storage instruction and send the second Phoenix storage instruction to the memory, wherein the second Phoenix storage instruction includes the table identifier, the at least one metadata, the attribute value corresponding to each metadata, and the row key, and causes the memory to record, in the HBase table corresponding to the table identifier, the correspondence between the row key and the attribute value corresponding to each metadata.
  • In an example, the machine executable instructions further cause the processor 602 to: receive a SparkSQL delete instruction sent from the querier; and generate a Phoenix delete instruction and send the Phoenix delete instruction to the memory, wherein the Phoenix delete instruction is one of the following instructions: an instruction that includes the table identifier of the HBase table and the metadata of the HBase table and causes the memory to delete the index table corresponding to the table identifier and the metadata; and an instruction that includes the table identifier of the HBase table, the metadata of the HBase table, and the attribute value corresponding to the metadata and causes the memory to delete, from the index table corresponding to the table identifier and the metadata, the row data corresponding to the attribute value.
  • In an example, the machine executable instructions further cause the processor 602 to: receive a SparkSQL acquisition instruction sent from the querier; generate a Phoenix acquisition instruction and send the Phoenix acquisition instruction to the memory, wherein the Phoenix acquisition instruction is one of the following instructions: an instruction that includes the table identifier of the HBase table and the metadata of the HBase table and causes the memory to return the index table corresponding to the table identifier and the metadata to the connector; and an instruction that includes the table identifier of the HBase table, the metadata of the HBase table, and the attribute value corresponding to the metadata and causes the memory to obtain, from the index table corresponding to the table identifier and the metadata, the row data corresponding to the attribute value and to return the row data to the connector; and, after receiving the index table or the row data, return the index table or the row data to the querier.
  • In an example, the machine executable instructions further cause the processor 602 to: receive a SparkSQL association query instruction sent from the querier, wherein the SparkSQL association query instruction includes the first table identifier of the HBase table and the second table identifier of the associated HBase table corresponding to the HBase table; generate a first Phoenix association query instruction and send the first Phoenix association query instruction to the memory, where the first Phoenix association query instruction includes the second table identifier and causes the memory to obtain all the first type of row data from the associated HBase table corresponding to the second table identifier and to return all the first type of row data to the connector, each first type of row data including an attribute value corresponding to at least one metadata; and perform, for each metadata in each first type of row data, the following operation: if an index table corresponding to the first table identifier and the metadata exists in the memory, generating a second Phoenix association query instruction and sending the second Phoenix association query instruction to the memory, where the second Phoenix association query instruction includes the first table identifier, the metadata, and the attribute value corresponding to the metadata and causes the memory to obtain the row key corresponding to the attribute value from the index table corresponding to the first table identifier and the metadata, to obtain the second type of row data corresponding to the row key from the HBase table corresponding to the first table identifier, and to return the second type of row data to the connector; and associating the first type of row data with the second type of row data and returning the result to the querier.
  • machine-readable storage medium 601 can be any electronic, magnetic, optical, or other physical storage device that can contain or store information, such as executable instructions, data, and the like.
  • For example, the machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard disk drive), a solid state drive, any type of storage disk (such as a CD or DVD), a similar storage medium, or a combination thereof.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Operations Research (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data query method, system, and connector. The method includes: a querier sends a SparkSQL query instruction to a connector (41), where the SparkSQL query instruction includes a table identifier of an HBase table, metadata of the HBase table, and an attribute value corresponding to the metadata; after receiving the SparkSQL query instruction, if a memory has an index table corresponding to the table identifier and the metadata, the connector generates a first Phoenix query instruction and sends the first Phoenix query instruction to the memory (42), where the index table is used to record the correspondence between attribute values of the metadata of the HBase table and row keys of the HBase table, and the first Phoenix query instruction includes the table identifier, the metadata, and the attribute value; after receiving the first Phoenix query instruction, the memory obtains, from the index table corresponding to the table identifier and the metadata, the row key corresponding to the attribute value, obtains, from the HBase table corresponding to the table identifier, the row data corresponding to the row key (43), and returns the row data to the connector (44); and after receiving the row data, the connector returns the row data to the querier (45).

Description

Data Query
Cross-Reference to Related Applications
This application is based on and claims priority to Chinese Patent Application No. 201711235855.2, filed on November 30, 2017, the entire contents of which are incorporated herein by reference.
Background
The HBase database is a highly reliable, high-performance, column-oriented, scalable distributed storage system that provides random, real-time read and write access to large data sets. The HBase database stores data in the form of data tables (referred to herein as HBase tables). An HBase table may be composed of rows and column families; Table 1 shows an example of an HBase table. The row key (RowKey) is an index, and a column family (ColumnFamily) may be composed of one or more columns. In Table 1, the name, address, age, mobile phone number, mailbox, and so on are metadata, and each metadata corresponds to multiple attribute values; for example, the attribute values corresponding to the name include Zhang San and Li Si.
Table 1
[Table 1 is provided as an image in the original publication (PCTCN2018118249-appb-000001); it shows rows with row keys 001 to 004 and metadata columns such as name, address, age, mobile phone number, and mailbox.]
Brief Description of the Drawings
FIG. 1 is a structural diagram of a data query system in an embodiment of the present disclosure;
FIG. 2 is a structural diagram of a data query system in an embodiment of the present disclosure;
FIG. 3 is a flowchart of a method for creating an index table in an embodiment of the present disclosure;
FIG. 4 is a flowchart of a data query method in an embodiment of the present disclosure;
FIG. 5 is a flowchart of a data storage method in an embodiment of the present disclosure;
FIG. 6 is a flowchart of a data query method in another embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a hardware structure of a connector in an embodiment of the present disclosure.
Detailed Description
Based on the HBase table structure shown in Table 1, data can be quickly retrieved according to the row key. For example, when a query request is received, if the row key included in the query request is 001, the content found in the HBase table is the data of the first row, and therefore the data of the first row is returned. If the query request does not include a row key but instead includes the attribute value of some metadata, such as Beijing, the corresponding data cannot be found quickly through the row key; instead, a full table scan of the entire HBase table is required to find the row that contains "Beijing", and query performance is very low.
To this end, embodiments of the present disclosure propose a data query method that can be applied to a data query system including a memory, a querier, and a connector. The querier, the memory, and the connector may be deployed on the same server or on different servers. If the querier, the memory, and the connector are deployed on the same server, they are three functional modules of that server; for example, the querier may be an SQL (Structured Query Language) engine that implements the data query function, the memory may be a database that implements the data storage function, and the connector may be middleware that implements the connection function. If the querier, the memory, and the connector are deployed on different servers, they are three separate servers.
Referring to FIG. 1, which is a schematic structural diagram of the above data query system:
When the HBase database is used to store data, the memory may use the Phoenix component to store the data. The Phoenix component provides SQL support for the HBase database, so that when an SQL request is received, the memory can operate on the data in the HBase database according to the SQL request.
The querier may use the SparkSQL engine (a Spark-based distributed SQL engine) to query data. The SparkSQL engine can expose multiple interfaces for accessing external data sources (DataSource), for example, JDBC (Java Database Connectivity), ODBC (Open Database Connectivity), and API (Application Programming Interface) interfaces. The SparkSQL engine can also support data sources in more formats, for example, JSON (JavaScript Object Notation), Parquet (a columnar storage format), Avro (a data serialization system), and CSV (Comma-Separated Values).
The memory uses the Phoenix component to store data, and the querier uses the SparkSQL engine to query data; that is, the querier can process SparkSQL instructions, while the memory can process Phoenix instructions. Therefore, after the querier sends a SparkSQL instruction, the memory cannot process that SparkSQL instruction if it receives it; similarly, after the memory sends a Phoenix instruction, the querier cannot process that Phoenix instruction if it receives it. Based on this, a connector may be deployed between the memory and the querier, and the connector is used to convert between SparkSQL instructions and Phoenix instructions. In the embodiments of the present disclosure, an index table may be created in the memory, and the index table is used to record the correspondence between attribute values of metadata in the HBase table and row keys (for example, for Table 1, the correspondence between "Beijing" and "001" is recorded). In this way, if a Phoenix instruction does not include the row key "001" but includes the attribute value "Beijing", the memory queries the index table by the attribute value "Beijing" to obtain the row key "001", and then queries the HBase table shown in Table 1 by the row key "001" to obtain the data of the first row. This approach does not require a full table scan of the HBase table, thereby improving query performance and query efficiency.
Embodiments of the present disclosure may involve one or any combination of the following processes: an index table creation process, a data query process, a data storage process, an index table deletion process, and an index table acquisition process.
Referring to FIG. 2, the querier can provide a SparkSQL creation interface. Based on the SparkSQL creation interface, the querier can generate and send a SparkSQL creation instruction, which is used to create an index table. The connector can provide a SparkSQL creation interface and a Phoenix creation interface. Based on the SparkSQL creation interface, the connector can parse the content of the SparkSQL creation instruction; based on the Phoenix creation interface, the connector can generate and send a Phoenix creation instruction according to the content parsed from the SparkSQL creation instruction, and the Phoenix creation instruction is used to create an index table. The memory can provide a Phoenix creation interface. Based on the Phoenix creation interface, the memory can parse the content of the Phoenix creation instruction and create an index table according to that content.
Referring to FIG. 2, the querier can provide a SparkSQL query interface. Based on the SparkSQL query interface, the querier can generate and send SparkSQL query instructions, which are used to query data, and can process SparkSQL response instructions for those SparkSQL query instructions.
The connector can provide a SparkSQL query interface and a Phoenix query interface. Based on the SparkSQL query interface, the connector can parse the content of a SparkSQL query instruction; based on the Phoenix query interface, the connector can generate and send a Phoenix query instruction according to the content parsed from the SparkSQL query instruction. In addition, based on the Phoenix query interface, the connector can also parse the content of a Phoenix response instruction for the Phoenix query instruction; based on the SparkSQL query interface, the connector can generate and send a SparkSQL response instruction according to the content parsed from the Phoenix response instruction.
The memory can provide a Phoenix query interface. Based on the Phoenix query interface, the memory can parse the content of the Phoenix query instruction, perform the query, and generate and send a Phoenix response instruction according to the query result.
Referring to FIG. 2, the querier can provide a SparkSQL storage interface. Based on the SparkSQL storage interface, the querier can generate and send a SparkSQL storage instruction, which is used to store data. The connector can provide a SparkSQL storage interface and a Phoenix storage interface. Based on the SparkSQL storage interface, the connector can parse the content of the SparkSQL storage instruction; based on the Phoenix storage interface, the connector can generate and send a Phoenix storage instruction according to the content parsed from the SparkSQL storage instruction, and the Phoenix storage instruction is used to store data. The memory can provide a Phoenix storage interface. Based on the Phoenix storage interface, the memory can parse the content of the Phoenix storage instruction and store the relevant data according to that content.
Referring to FIG. 2, the querier can provide a SparkSQL delete interface. Based on the SparkSQL delete interface, the querier can generate and send a SparkSQL delete instruction, which is used to delete an index table. The connector can provide a SparkSQL delete interface and a Phoenix delete interface. Based on the SparkSQL delete interface, the connector can parse the content of the SparkSQL delete instruction; based on the Phoenix delete interface, the connector can generate and send a Phoenix delete instruction according to the content parsed from the SparkSQL delete instruction, and the Phoenix delete instruction is used to delete an index table. The memory can provide a Phoenix delete interface. Based on the Phoenix delete interface, the memory can parse the content of the Phoenix delete instruction and delete the index table according to that content.
Referring to FIG. 2, the querier can provide a SparkSQL acquisition interface. Based on the SparkSQL acquisition interface, the querier can generate and send a SparkSQL acquisition instruction, which is used to obtain an index table, and can process a SparkSQL response instruction for the SparkSQL acquisition instruction.
The connector can provide a SparkSQL acquisition interface and a Phoenix acquisition interface. Based on the SparkSQL acquisition interface, the connector can parse the content of the SparkSQL acquisition instruction; based on the Phoenix acquisition interface, the connector can generate and send a Phoenix acquisition instruction according to the content parsed from the SparkSQL acquisition instruction. In addition, based on the Phoenix acquisition interface, the connector can also parse the content of a Phoenix response instruction for the Phoenix acquisition instruction; based on the SparkSQL acquisition interface, the connector can generate and send a SparkSQL response instruction according to the content parsed from the Phoenix response instruction.
The memory can provide a Phoenix acquisition interface. Based on the Phoenix acquisition interface, the memory can parse the content of the Phoenix acquisition instruction, obtain the index table, and generate and send a Phoenix response instruction.
In the above application scenario, the index table creation process is described below with reference to the flow shown in FIG. 3.
Step 31: The querier sends a SparkSQL creation instruction (for creating an index table) to the connector.
The SparkSQL creation instruction includes the table identifier of the HBase table and the metadata of the HBase table. For example, if an index table is created for the HBase table shown in Table 1 and the metadata "mobile phone number" is used as the index of the index table, the SparkSQL creation instruction may include the table identifier of the HBase table and the metadata "mobile phone number".
Step 32: After receiving the SparkSQL creation instruction, the connector generates a Phoenix creation instruction according to the SparkSQL creation instruction and sends the Phoenix creation instruction to the memory.
Specifically, the connector may parse the table identifier of the HBase table and the metadata of the HBase table from the SparkSQL creation instruction, and generate the Phoenix creation instruction according to the table identifier and the metadata; that is, the Phoenix creation instruction includes the table identifier of the HBase table and the metadata of the HBase table (such as "mobile phone number").
Step 33: After receiving the Phoenix creation instruction, the memory obtains, from the HBase table corresponding to the table identifier, the attribute values corresponding to the metadata and the row keys corresponding to those attribute values, creates an index table corresponding to the table identifier and the metadata, and records the correspondence between the attribute values and the row keys in the index table.
For example, the memory may obtain from Table 1 the attribute value 18611111111 corresponding to the metadata "mobile phone number" and the row key 001 corresponding to that attribute value, and record the correspondence between the attribute value 18611111111 and the row key 001 in the index table. Proceeding in the same way for the other rows, the index table shown in Table 2 is obtained.
Table 2

Mobile phone number    Row key
18611111111            001
18622222222            002
18633333333            003
18644444444            004
In one example, assuming that the table identifier of the HBase table shown in Table 1 is A, after creating the index table of Table 2 the memory may also send the table identifier A of the HBase table and the metadata "mobile phone number" to the connector. The connector records the correspondence between the table identifier and the metadata in a local mapping table; this correspondence indicates that an index table corresponding to the table identifier and the metadata exists in the memory.
Based on the above processing flow, the creation of the index table can be completed. Further, data queries can also be completed based on the index table.
The data query process is described below with reference to the flow shown in FIG. 4.
步骤41,查询器向连接器发送SparkSQL查询指令(用于查询数据)。
其中,SparkSQL查询指令包括HBase表的表标识、元数据以及与该元数据对应的属性值。例如,若从表1所示的HBase表中查询“18611111111”对应的数据,则SparkSQL查询指令可以包括HBase表的表标识A、元数据“手机号”、与“手机号”对应的属性值“18611111111”。
步骤42,连接器在接收到该SparkSQL查询指令后,若存储器中存在与该表标识和该元数据对应的索引表,则生成第一Phoenix查询指令、并发送给存储器。若该存储器中不存在与该表标识和该元数据对应的索引表,则生成第二Phoenix查询指令、并发送给存储器。
连接器在接收到该SparkSQL查询指令后,从该SparkSQL查询指令中解析出HBase表的表标识、元数据以及与该元数据对应的属性值。然后,连接器可以判断存储器中是否存在与该表标识和该元数据对应的索引表。
第一Phoenix查询指令包括HBase表标识、元数据、以及与该元数据对应的属性值。
第二Phoenix查询指令包括HBase表标识以及与元数据对应的属性值。或者,第二Phoenix查询指令包括HBase表标识、元数据以及与元数据对应的属性值。
其中,针对“连接器判断存储器是否存在与该表标识和该元数据对应的索引表”的过程,可以包括:
方式一、若连接器本地维护有上述映射表,则:连接器在从SparkSQL查询指令中解析出表标识和元数据后,可以查询映射表中是否存在该表标识和该元数据的对应关系,如果是,则可以确定存储器存在与该表标识和该元数据对应的索引表;如果否,则可以确定存储器不存在该索引表。
方式二、连接器在从SparkSQL查询指令中解析出表标识和元数据后,向存储器发送Phoenix管理指令,该Phoenix管理指令包括该表标识和该元数据。存储器在接收到该Phoenix管理指令后,查询本地是否存在与该表标识和该元数据对应的索引表,并向连接器发送包括查询结果的Phoenix响应指令。连接器可以根据该查询结果,确定存储器是否存在与该表标识、该元数据对应的索引表。
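作为一种可能的变体，上述判断也可以借助JDBC的元数据接口直接查询索引表是否存在。下面给出一个假设性的示意，其中索引表命名规则沿用前文假设的“IDX_表标识_元数据”，并非本公开限定的方式二实现：

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class IndexExistenceCheckDemo {
    // 通过JDBC元数据接口查询存储器中是否存在指定的索引表（示意）
    public static boolean indexTableExists(String zkQuorum, String tableId, String metadata) throws Exception {
        String indexTable = "IDX_" + tableId + "_" + metadata;   // 命名规则为假设值
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:" + zkQuorum);
             ResultSet rs = conn.getMetaData().getTables(null, null, indexTable, null)) {
            // 查询结果非空，说明存在同名的索引表
            return rs.next();
        }
    }
}
```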
步骤43,存储器在接收到上述第一Phoenix查询指令后,从与表标识和元数据对应的索引表中,获取与属性值对应的行键,并从该表标识对应的HBase表中,获取该行键对应的一行数据(即行数据)。
例如,存储器可以从第一Phoenix查询指令中解析出表标识A、元数据“手机号”、属性值“18611111111”。与表标识A和元数据“手机号”对应的索引表如表2所示,存储器可以从表2中获取属性值“18611111111”对应的行键001;表标识A对应的HBase表如表1所示,存储器可以从表1中获取行键001对应的行数据,包括“001、张三、北京、28、18611111111”。
在另一个例子中，存储器在接收到第二Phoenix查询指令后，可以从第二Phoenix查询指令中解析出表标识A、属性值“张三”，该表标识A对应的HBase表如表1所示，对表1进行全表扫描，得到“张三”对应的行数据，包括“001、张三、北京、28、18611111111”。
其中,连接器在生成Phoenix查询指令时,该Phoenix查询指令还可以包括特定标记,用于表示Phoenix查询指令为第一Phoenix查询指令或第二Phoenix查询指令。存储器在接收到第一Phoenix查询指令时,可以先查询索引表,然后查询HBase表;存储器在接收到第二Phoenix查询指令时,可以直接查询HBase表。
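步骤42～43中“按特定标记区分两类Phoenix查询指令、存储器按不同路径执行”的处理思路，可以用如下Java草图示意。其中指令对象的字段、SQL拼接方式、索引表命名规则均为假设，仅为示意而非限定实现：

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PhoenixQueryDispatchDemo {
    // 连接器与存储器之间传递的Phoenix查询指令示意：携带特定标记以区分第一/第二指令
    public static class Instruction {
        final boolean firstType;   // 特定标记：true表示第一Phoenix查询指令（存在索引表）
        final String tableId;
        final String metadata;
        final String attrValue;

        public Instruction(boolean firstType, String tableId, String metadata, String attrValue) {
            this.firstType = firstType;
            this.tableId = tableId;
            this.metadata = metadata;
            this.attrValue = attrValue;
        }
    }

    // 存储器侧：按特定标记选择“索引表+行键”两步查询或直接按属性值查询（全表扫描）
    public static ResultSet execute(Connection conn, Instruction ins) throws SQLException {
        if (ins.firstType) {
            String indexTable = "IDX_" + ins.tableId + "_" + ins.metadata;   // 命名规则为假设值
            PreparedStatement ps1 = conn.prepareStatement(
                    "SELECT ROW_KEY FROM " + indexTable + " WHERE ATTR_VALUE = ?");
            ps1.setString(1, ins.attrValue);
            ResultSet rs1 = ps1.executeQuery();
            String rowKey = rs1.next() ? rs1.getString(1) : null;

            PreparedStatement ps2 = conn.prepareStatement(
                    "SELECT * FROM " + ins.tableId + " WHERE ROW_KEY = ?");
            ps2.setString(1, rowKey);
            return ps2.executeQuery();
        }
        // 第二Phoenix查询指令：直接按属性值查询HBase表，对应全表扫描
        PreparedStatement ps = conn.prepareStatement(
                "SELECT * FROM " + ins.tableId + " WHERE " + ins.metadata + " = ?");
        ps.setString(1, ins.attrValue);
        return ps.executeQuery();
    }
}
```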
步骤44,存储器在获取到行数据后,将该行数据返回给连接器。
步骤45,连接器在接收到该行数据后,将该行数据返回给查询器。
存储器可以生成针对第一Phoenix查询指令或第二Phoenix查询指令的Phoenix响应指令,将Phoenix响应指令发送给连接器,该Phoenix响应指令包括查询到的行数据。连接器接收到Phoenix响应指令后,从Phoenix响应指令中解析出该查询到的行数据,生成针对SparkSQL查询指令的SparkSQL响应指令,并将SparkSQL响应指令发送给查询器,查询器从SparkSQL响应指令中解析出该行数据。
基于上述处理流程,可以完成数据的查询。
此外,还可以基于上述索引表完成数据存储,以下结合图5所示的流程图,对数据存储过程进行说明。
步骤51,查询器向连接器发送SparkSQL存储指令(用于存储数据)。
其中,SparkSQL存储指令包括HBase表的表标识、HBase表的至少一个元数据、每个元数据对应的属性值、以及行键。例如,若需要在表1所示的HBase表中存储如下数据:行键“005”、姓名“韩七”、地址“深圳”、年龄“50”、手机号“18655555555”,则SparkSQL存储指令可以包括如下内容:HBase表的表标识A、元数据“姓名”和属性值“韩七”的对应关系、元数据“地址”和属性值“深圳”的对应关系、元数据“年龄”和属性值“50”的对应关系、元数据“手机号”和属性值“18655555555”的对应关系、以及行键“005”。
步骤52,连接器在接收到该SparkSQL存储指令后,根据该SparkSQL存储指令生成第一Phoenix存储指令和/或第二Phoenix存储指令,并将第一Phoenix存储指令和/或第二Phoenix存储指令发送给存储器。
连接器在接收到该SparkSQL存储指令后,从该SparkSQL存储指令中解析出HBase表的表标识、HBase表的至少一个元数据、每个元数据对应的属性值、以及行键。
针对每个元数据执行如下操作:若存储器存在与该表标识和该元数据对应的索引表,则根据该SparkSQL存储指令生成第一Phoenix存储指令,并发送给存储器;否则不需要生成第一Phoenix存储指令。第一Phoenix存储指令包括所述表标识、所述元数据、所述元数据对应的属性值、以及行键。
此外,针对SparkSQL存储指令中的每个元数据,连接器判断存储器是否存在与该表标识和该元数据对应的索引表,具体判断方式参见步骤42,不再赘述。
例如，针对元数据“姓名”、“地址”、“年龄”，判断结果为存储器不存在对应的索引表，则不针对这些元数据生成第一Phoenix存储指令。针对元数据“手机号”，判断结果为存储器存在对应的索引表，则连接器可以生成第一Phoenix存储指令，该第一Phoenix存储指令包括该表标识A、元数据“手机号”、属性值“18655555555”、行键005。
连接器在接收到该SparkSQL存储指令后,还可以生成第二Phoenix存储指令,第二Phoenix存储指令包括所述表标识、所述至少一个元数据、所述每个元数据对应的属性值、所述行键。
其中,第一Phoenix存储指令可以指示存储器在索引表中存储数据,第二Phoenix存储指令可以指示存储器在HBase表中存储数据。
连接器在生成Phoenix存储指令时,该Phoenix存储指令还可以包括特定标记,用于表示Phoenix存储指令为第一Phoenix存储指令或第二Phoenix存储指令。存储器收到第一Phoenix存储指令时,执行步骤54;存储器收到第二Phoenix存储指令时,执行步骤53。
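第一、第二Phoenix存储指令在存储器侧对应的写入操作，可以用Phoenix的UPSERT语句示意如下。其中HBase表“A”的列名、索引表“IDX_A_PHONE”及其列名均沿用前文假设，仅为示意：

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

public class StoreInstructionDemo {
    // 对应第二Phoenix存储指令：在HBase表中记录行键及各元数据对应的属性值（表名、列名为假设值）
    public static void storeToHBaseTable(Connection conn) throws Exception {
        PreparedStatement ps = conn.prepareStatement(
                "UPSERT INTO A (ROW_KEY, NAME, CITY, AGE, PHONE) VALUES (?, ?, ?, ?, ?)");
        ps.setString(1, "005");
        ps.setString(2, "韩七");
        ps.setString(3, "深圳");
        ps.setString(4, "50");
        ps.setString(5, "18655555555");
        ps.executeUpdate();
        conn.commit();   // Phoenix连接默认不自动提交时需显式提交
    }

    // 对应第一Phoenix存储指令：在索引表中记录属性值与行键的对应关系（索引表名为假设值）
    public static void storeToIndexTable(Connection conn) throws Exception {
        PreparedStatement ps = conn.prepareStatement(
                "UPSERT INTO IDX_A_PHONE (ATTR_VALUE, ROW_KEY) VALUES (?, ?)");
        ps.setString(1, "18655555555");
        ps.setString(2, "005");
        ps.executeUpdate();
        conn.commit();
    }
}
```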
步骤53,存储器在接收到第二Phoenix存储指令后,在该表标识对应的HBase表中,记录该行键以及每个元数据对应的属性值。
例如,存储器从第二Phoenix存储指令中解析出HBase表的表标识A、元数据“姓名”和属性值“韩七”的对应关系、元数据“地址”和属性值“深圳”的对应关系、元数据“年龄”和属性值“50”的对应关系、元数据“手机号”和属性值“18655555555”的对应关系、行键“005”,因此,在表1所示的HBase表中,增加一行数据,如表3中所示的行键为005的一行数据。
表3
（表3的原图未能提取。根据上下文，表3为在表1所示的HBase表基础上新增行键“005”对应的一行数据：姓名“韩七”、地址“深圳”、年龄“50”、手机号“18655555555”。）
若该HBase表中已经存在第二Phoenix存储指令包括的行键，则可以在该HBase表中该行键对应的数据行内，为各个元数据填入第二Phoenix存储指令包括的相应的属性值；若该HBase表中不存在第二Phoenix存储指令包括的行键，则可以在HBase表中添加新的数据行，并在该新的数据行中记录该第二Phoenix存储指令包括的行键和各个元数据的属性值。
步骤54,存储器在接收到第一Phoenix存储指令后,在与该表标识和该元数据对应的索引表中,记录该元数据的属性值与该行键的对应关系。
例如,存储器从第一Phoenix存储指令解析出表标识A、元数据“手机号”、属性值“18655555555”、行键005,因此,可以在表2所示的索引表中记录属性值“18655555555”与行键“005”的对应关系,得到表4所示的索引表。
表4
手机号 行键
18611111111 001
18622222222 002
18633333333 003
18644444444 004
18655555555 005
此外，本公开实施例还可以实现索引表的删除，以下对索引表删除过程进行说明。
情况一、查询器向连接器发送SparkSQL删除指令,该SparkSQL删除指令包括HBase表的表标识、该HBase表的元数据。连接器在接收到SparkSQL删除指令后,根据该SparkSQL删除指令生成Phoenix删除指令,并可以将该Phoenix删除指令发送给存储器,该Phoenix删除指令可以包括该表标识和该元数据。存储器在接收到Phoenix删除指令后,从Phoenix删除指令中解析出HBase表的表标识、该HBase表的元数据,并删除与该表标识和该元数据对应的索引表。
例如,Phoenix删除指令中包括表标识A、元数据“手机号”,则存储器中存在表4所示的索引表时,存储器可以根据该Phoenix删除指令将表4删除。
情况二、查询器向连接器发送SparkSQL删除指令,该SparkSQL删除指令可以包括HBase表的表标识、该HBase表的元数据、与该元数据对应的属性值。
连接器在接收到该SparkSQL删除指令后,可以根据该SparkSQL删除指令生成Phoenix删除指令,并可以将该Phoenix删除指令发送给存储器,其中,该Phoenix删除指令可以包括该表标识、该元数据和该元数据对应的属性值。
存储器在接收到该Phoenix删除指令后,可以从Phoenix删除指令中解析出HBase表的表标识、该HBase表的元数据、该元数据对应的属性值,并可以从该表标识、该元数据对应的索引表中,删除该属性值对应的行数据。
例如,Phoenix删除指令中包括表标识(如表A)、元数据(如手机号)、属性值(如18655555555),则存储器中存在表4所示的索引表时,存储器可以根据该Phoenix删除指令将表4中相应的数据删除,删除后的索引表如表2所示。
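与上述两种情况对应的删除操作，可以用如下Phoenix语句草图示意（表名、索引表命名规则及列名为前文假设，仅为示意）：

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class DeleteIndexDemo {
    // 情况一：删除与表标识和元数据对应的整个索引表（索引表名为假设值）
    public static void dropIndexTable(Connection conn) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("DROP TABLE IF EXISTS IDX_A_PHONE");
        }
    }

    // 情况二：从索引表中删除指定属性值对应的行数据
    public static void deleteIndexRow(Connection conn, String attrValue) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "DELETE FROM IDX_A_PHONE WHERE ATTR_VALUE = ?")) {
            ps.setString(1, attrValue);
            ps.executeUpdate();
            conn.commit();
        }
    }
}
```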
此外，本公开实施例还可以实现索引表的获取，以下对索引表获取过程进行说明。
情况一、查询器向连接器发送SparkSQL获取指令,该SparkSQL获取指令包括HBase表的表标识、该HBase表的元数据。连接器在接收到SparkSQL获取指令后,根据该SparkSQL获取指令生成Phoenix获取指令,并可以将该Phoenix获取指令发送给存储器,该Phoenix获取指令可以包括该表标识和该元数据。
存储器在接收到Phoenix获取指令后,从Phoenix获取指令中解析出HBase表的表标识、该HBase表的元数据,并获取该表标识、该元数据对应的索引表,然后,可以通过连接器将该表标识、该元数据对应的索引表返回给查询器。
针对“通过连接器将该表标识、该元数据对应的索引表返回给查询器”的过程,存储器可以向连接器发送针对Phoenix获取指令的Phoenix响应指令,该Phoenix响应指令包括该索引表,然后,连接器可以向查询器发送针对SparkSQL获取指令的SparkSQL响应指令,该SparkSQL响应指令包括该索引表。
例如,Phoenix获取指令中包括表标识(如表A)、元数据(如手机号),则存储器中存在表4所示的索引表时,存储器可以根据该Phoenix获取指令将表4返回给查询器。
情况二、查询器向连接器发送SparkSQL获取指令,该SparkSQL获取指令可以包括HBase表的表标识、该HBase表的元数据、与该元数据对应的属性值。
连接器在接收到该SparkSQL获取指令后,可以根据该SparkSQL获取指令生成Phoenix获取指令,并可以将该Phoenix获取指令发送给存储器,其中,该Phoenix获取指令可以包括该表标识、该元数据和该元数据对应的属性值。
存储器在接收到该Phoenix获取指令后,可以从该Phoenix获取指令中解析出HBase表的表标识、该HBase表的元数据、该元数据对应的属性值,并可以从该表标识、该元数据对应的索引表中,获取该属性值对应的行数据。
然后,存储器通过连接器将该行数据返回给查询器。具体的,存储器可以向连接器发送针对Phoenix获取指令的Phoenix响应指令,该Phoenix响应指令可以包括该行数据,然后,连接器可以向查询器发送针对SparkSQL获取指令的SparkSQL响应指令,该SparkSQL响应指令可以包括该行数据。
例如，Phoenix获取指令中包括表标识（如表A）、元数据（如手机号）、属性值（如18655555555），则存储器中存在表4所示的索引表时，存储器可以根据该Phoenix获取指令将表4中相应的行数据（即“手机号”为“18655555555”、“行键”为“005”的一行）返回给查询器。
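索引表获取过程在存储器侧对应的查询操作可以示意如下，两种情况分别对应读取整个索引表和按属性值读取对应行（表名、列名为前文假设，仅为示意）：

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class GetIndexDemo {
    // 情况一：获取整个索引表的内容
    public static void fetchWholeIndexTable(Connection conn) throws Exception {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT ATTR_VALUE, ROW_KEY FROM IDX_A_PHONE")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getString(2));
            }
        }
    }

    // 情况二：按属性值获取索引表中对应的行数据
    public static void fetchIndexRow(Connection conn, String attrValue) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT ATTR_VALUE, ROW_KEY FROM IDX_A_PHONE WHERE ATTR_VALUE = ?")) {
            ps.setString(1, attrValue);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " -> " + rs.getString(2));
                }
            }
        }
    }
}
```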
基于上述技术方案，本公开实施例中，通过创建索引表，并在索引表中记录HBase表的元数据的属性值与HBase表的行键的对应关系，这样，在接收到Phoenix查询指令后，即使Phoenix查询指令中未包括HBase表的行键，而是包括HBase表的元数据对应的属性值，也可以通过该属性值查询索引表，得到该HBase表的行键，然后使用该行键查询该HBase表，从而得到对应的行数据。上述方式不用对整个HBase表进行全表扫描，可以提升查询性能。
在实际应用中，为了避免一个HBase表的内容太多、导致查询复杂，存储器可以存储多个HBase表。以两个HBase表为例，为区分方便，将一个HBase表称为HBase表，其表标识为第一表标识，如表1为HBase表，第一表标识为表A。另一个HBase表称为该HBase表对应的关联HBase表，关联HBase表的表标识为第二表标识，如表5所示，为关联HBase表的示例，第二表标识为表B。
表5
（表5的原图未能提取。根据上下文，关联HBase表包括元数据“行键”“手机号”“身份证号”，例如：行键“011”对应手机号“18611111111”、身份证号“100000000000000000”；行键“012”对应手机号“18622222222”、身份证号“200000000000000000”；以此类推。）
本应用场景下,在建立HBase表、关联HBase表时,为了将HBase表和关联HBase表中的数据进行关联,则HBase表和关联HBase表需要具有相同的元数据,如表1和表5中,具有相同的元数据“手机号”。而且,在为表1所示的HBase表创建索引表时,该索引表中的索引需要是该元数据,即在采用图3所示的流程创建索引表时,SparkSQL创建指令包括的元数据可以为“手机号”,这样,索引表记录的是元数据“手机号”对应的属性值与行键的对应关系。
在上述应用场景下,结合图6所示的流程图,对数据查询过程进行说明。
步骤61,查询器向连接器发送SparkSQL关联查询指令,该SparkSQL关联查询指令包括HBase表的第一表标识、该HBase表对应的关联HBase表的第二表标识。
其中,在该SparkSQL关联查询指令中,可以只包括第一表标识(如表A)和第二表标识(如表B),而未包括元数据、该元数据对应的属性值等内容。
步骤62,连接器在接收到SparkSQL关联查询指令后,根据SparkSQL关联查询指令生成第一Phoenix关联查询指令,并将第一Phoenix关联查询指令发送给存储器。
其中,连接器可以从SparkSQL关联查询指令中解析出第一表标识和第二表标识,然后,连接器可以生成包括该第二表标识的第一Phoenix关联查询指令。
在一个例子中,连接器可以从SparkSQL关联查询指令中解析出两个表标识,为了区分这两个表标识,则可以获取这两个表标识对应的HBase表的数据量,将数据量大的表标识确定为第一表标识,将数据量小的表标识确定为第二表标识。在确定出第二表标识后,就可以生成包括第二表标识的第一Phoenix关联查询指令。
通常情况下,在建立HBase表、关联HBase表时,HBase表的数据量大于关联HBase表的数据量,因此,HBase表的表标识为第一表标识,关联HBase表的表标识为第二表标识。反之,若HBase表的数据量小于关联HBase表的数据量,则HBase表的表标识为第二表标识,关联HBase表的表标识为第一表标识,其处理流程相同,仍然是生成包括第二表标识的第一Phoenix关联查询指令,不再赘述。
总之,是将HBase表和关联HBase表中的数据量较小的一方的表标识作为第二表标识。
在一个例子中,连接器确定出第二表标识后,还可以判断第二表标识对应的表的数据量是否小于阈值,如果是,才生成包括第二表标识的第一Phoenix关联查询指令,然后执行后续步骤。如果否,则不再生成第一Phoenix关联查询指令,而是直接从存储器获取该第一表标识对应的HBase表、该第二表标识对应的关联HBase表,然后将HBase表和关联HBase表发送给查询器,对此过程不再赘述。
在一个例子中，针对“连接器获取表标识对应的HBase表的数据量”的过程，连接器可以从存储器获取表标识对应的HBase表的数据量，对此不做限制。
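连接器确定第二表标识（数据量较小的一方）并进行阈值判断的逻辑，可以用如下草图示意。其中通过COUNT(*)统计数据量、阈值取值等均为假设的示意做法，并非本公开限定的获取数据量方式：

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class TablePickDemo {
    // 通过COUNT(*)估算表的数据量（统计方式为假设的示意做法）
    public static long rowCount(Connection conn, String tableId) throws Exception {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM " + tableId)) {
            return rs.next() ? rs.getLong(1) : 0L;
        }
    }

    // 将数据量较小的一方作为第二表标识；仅当其数据量小于阈值时继续关联查询流程
    public static String pickSecondTableId(Connection conn, String idA, String idB, long threshold) throws Exception {
        long countA = rowCount(conn, idA);
        long countB = rowCount(conn, idB);
        String secondId = countA <= countB ? idA : idB;
        long secondCount = Math.min(countA, countB);
        // 返回null表示不生成第一Phoenix关联查询指令，改为直接返回两张表
        return secondCount < threshold ? secondId : null;
    }
}
```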
步骤63,存储器在接收到第一Phoenix关联查询指令后,从该第二表标识对应的关联HBase表中,获取所有第一类行数据,并将所有第一类行数据返回给连接器;其中,每个第一类行数据均包括至少一个元数据对应的属性值。
例如,存储器可以从第二表标识“表B”对应的关联HBase表中,获取到所有第一类行数据,即获取表5的所有内容,然后,存储器可以将所有的第一类行数据均返回给连接器。例如,返回的内容可以包括第一类行数据1(行键011、手机号18611111111、身份证号100000000000000000)、第一类行数据2(行键012、手机号18622222222、身份证号200000000000000000),以此类推。
在一个例子中,存储器可以向连接器发送针对所述第一Phoenix关联查询指令的Phoenix响应指令,且该Phoenix响应指令可以包括上述所有第一类行数据。
步骤64,针对每个第一类行数据中的每个元数据执行如下操作:若存储器存在该第一表标识(SparkSQL关联查询指令包括的第一表标识)、该元数据对应的索引表,则连接器生成第二Phoenix关联查询指令,将第二Phoenix关联查询指令发送给存储器,第二Phoenix关联查询指令包括该第一表标识、该元数据、该元数据对应的属性值;存储器接收到第二Phoenix关联查询指令后,从该第一表标识、该元数据对应的索引表中,获取该属性值对应的行键,并从该第一表标识对应的HBase表中,获取该行键对应的第二类行数据,并将该第二类行数据返回给连接器;连接器将该第一类行数据和该第二类行数据进行关联后返回给查询器。
此外,若存储器不存在该第一表标识、该元数据对应的索引表,则连接器可以不再生成第二Phoenix关联查询指令,而是继续进行其它元数据的分析。
例如,针对第一类行数据1的元数据“身份证号”,由于存储器不存在第一表标识(表A)、元数据“身份证号”对应的索引表,因此不生成第二Phoenix关联查询指令。针对第一类行数据1的元数据“手机号”,由于存储器存在第一表标识(表A)、元数据“手机号”对应的索引表,因此,连接器生成第二Phoenix关联查询指令,将第二Phoenix关联查询指令发送给存储器,第二Phoenix关联查询指令包括第一表标识(表A)、元数据“手机号”、该元数据对应的属性值“18611111111”。
存储器接收到第二Phoenix关联查询指令后,第一表标识(表A)、元数据“手机号”对应的索引表如表2所示,即从表2中获取属性值“18611111111”对应的行键“001”,第一表标识(表A)对应的HBase表如表1所示,即从表1中获取行键“001”对应的第二类行数据1“001、张三、北京、28、18611111111”,然后将第二类行数据1返回给连接器。连接器将第一类行数据1和第二类行数据1进行关联,并将关联后的第一类行数据1和第二类行数据1返回给查询器。
经过上述处理，连接器完成第一类行数据1与第二类行数据1的关联，然后，连接器还可以对第一类行数据2、第一类行数据3、第一类行数据4进行关联，其关联过程与第一类行数据1的关联过程类似，在此不再赘述。
在上述实施例中,连接器需要判断存储器是否存在该第一表标识、该元数据对应的索引表,具体判断方式可以参见步骤42所示,在此不再赘述。
在上述实施例中,针对“存储器将第二类行数据返回给连接器”的过程,存储器向连接器发送针对第二Phoenix关联查询指令的Phoenix响应指令,该Phoenix响应指令可以包括所述第二类行数据,从而将第二类行数据返回给连接器。
在上述实施例中,针对“连接器将第一类行数据和第二类行数据进行关联后,返回给查询器”的过程,连接器可以向查询器发送针对SparkSQL关联查询指令的SparkSQL响应指令,该SparkSQL响应指令可以包括该第一类行数据和该第二类行数据,从而将该第一类行数据和该第二类行数据返回给查询器。
例如,连接器可以向查询器发送多个SparkSQL响应指令,SparkSQL响应指令1包括第一类行数据1和第二类行数据1,SparkSQL响应指令2包括第一类行数据2和第二类行数据2,以此类推。或者,连接器可以向查询器发送一个SparkSQL响应指令,该SparkSQL响应指令包括第一类行数据1和第二类行数据1的关联关系、第一类行数据2和第二类行数据2的关联关系,以此类推。
经过上述处理,查询器就可以得到HBase表和关联HBase表中的第一类行数据和第二类行数据的关联关系,不需要查询器关联,节省查询器的工作量。
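步骤63～64中由连接器完成第一类行数据与第二类行数据关联的过程，可以用如下Java草图示意。其中仅以元数据“手机号”（列名假设为PHONE）为例，索引表命名规则、列名NAME/CITY等均为前文假设，并非本公开的限定实现：

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AssociateQueryDemo {
    // 连接器侧：先取关联HBase表（第二表标识，数据量较小）的所有第一类行数据，
    // 再对其中存在索引表的元数据（此处仅以PHONE为例）回查HBase表，完成两类行数据的关联
    public static List<Map<String, String>> associate(Connection conn, Set<String> indexedMetadata,
                                                      String firstTableId, String secondTableId) throws Exception {
        List<Map<String, String>> result = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT ROW_KEY, PHONE FROM " + secondTableId)) {
            while (rs.next()) {
                Map<String, String> joined = new HashMap<>();
                joined.put("B.ROW_KEY", rs.getString("ROW_KEY"));
                String phone = rs.getString("PHONE");
                joined.put("B.PHONE", phone);
                // 仅当存储器中存在与第一表标识和该元数据对应的索引表时，才生成第二Phoenix关联查询指令
                if (indexedMetadata.contains("PHONE")) {
                    String indexTable = "IDX_" + firstTableId + "_PHONE";   // 命名规则为假设值
                    try (PreparedStatement ps = conn.prepareStatement(
                            "SELECT ROW_KEY FROM " + indexTable + " WHERE ATTR_VALUE = ?")) {
                        ps.setString(1, phone);
                        try (ResultSet idx = ps.executeQuery()) {
                            if (idx.next()) {
                                // 用索引表中查到的行键回查HBase表，得到第二类行数据
                                try (PreparedStatement ps2 = conn.prepareStatement(
                                        "SELECT NAME, CITY FROM " + firstTableId + " WHERE ROW_KEY = ?")) {
                                    ps2.setString(1, idx.getString(1));
                                    try (ResultSet row = ps2.executeQuery()) {
                                        if (row.next()) {
                                            joined.put("A.NAME", row.getString("NAME"));
                                            joined.put("A.CITY", row.getString("CITY"));
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
                result.add(joined);
            }
        }
        return result;
    }
}
```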
基于上述技术方案,当关联HBase表的数据量较小时,可以由连接器对第一类行数据和第二类行数据进行关联,并将关联后的第一类行数据和第二类行数据发送给查询器。这样,可以避免将HBase表和关联HBase表返回给查询器,由查询器对HBase表中的第一类行数据与关联HBase表中的第二类行数据进行关联,大幅度提升HBase表和关联HBase表的查询性能,减少IO操作。
基于与上述方法同样的申请构思,本公开实施例中还提出一种连接器,其硬件结构可以参见图7所示。其中,该连接器可以包括机器可读存储介质601、处理器602,该机器可读存储介质601、处理器602可经由系统总线603通信。
其中,机器可读存储介质601可以存储连接器在上述流程中所执行的操作对应的机器可执行指令。处理器602可以加载并执行机器可执行指令,以实现上述连接器的操作。该连接器的操作可以包括:接收从查询器发送的SparkSQL查询指令,其中,所述SparkSQL查询指令包括HBase表的表标识、所述HBase表的元数据以及与所述元数据对应的属性值;若存储器存在与所述表标识和所述元数据对应的索引表,则生成第一Phoenix查询指令,并将所述第一Phoenix查询指令发送给所述存储器,其中,所述索引表用于记录HBase表的元数据的属性值与HBase表的行键的对应关系,所述第一Phoenix查询指令包括所述表标识、所述元数据以及所述属性值,并使得所述存储器从与所述表标识和所述元数据对应的索引表中,获取与所述属性值对应的行键,从与所述表标识对应的HBase表中,获取与所述行键对应的行数据,并将所述行数据返回给所述连接器;以及将接收到的所述行 数据返回给所述查询器。
在一个例子中,机器可执行指令还促使处理器602执行如下操作:接收从所述查询器发送的SparkSQL创建指令,其中,所述SparkSQL创建指令包括HBase表的表标识以及所述HBase表的元数据;以及生成Phoenix创建指令,并将所述Phoenix创建指令发送给所述存储器,其中,所述Phoenix创建指令包括所述表标识以及所述元数据,并使得所述存储器从与所述表标识对应的HBase表中,获取与所述元数据对应的属性值以及与所述属性值对应的行键,创建与所述表标识和所述元数据对应的索引表,并在所述索引表中记录所述属性值与所述行键的对应关系。
在一个例子中,机器可执行指令还促使处理器602执行如下操作:接收从所述查询器发送的SparkSQL存储指令,其中,所述SparkSQL存储指令包括HBase表的表标识、HBase表的至少一个元数据、与每个元数据对应的属性值以及行键;以及针对每个元数据执行如下操作:若所述存储器存在与所述表标识和所述元数据对应的索引表,则生成第一Phoenix存储指令,并将所述第一Phoenix存储指令发送给所述存储器,其中,所述第一Phoenix存储指令包括所述表标识、所述元数据、与所述元数据对应的属性值以及与所述属性值对应的行键,并使得所述存储器在与所述表标识和所述元数据对应的索引表中,记录所述属性值与所述行键的对应关系。
在一个例子中,机器可执行指令还促使处理器602执行如下操作:生成第二Phoenix存储指令,并将所述第二Phoenix存储指令发送给所述存储器,其中,所述第二Phoenix存储指令包括所述表标识、所述至少一个元数据、与所述每个元数据对应的属性值以及所述行键,并使得所述存储器在与所述表标识对应的HBase表中,记录所述行键与所述每个元数据对应的属性值的对应关系。
在一个例子中,机器可执行指令还促使处理器602执行如下操作:接收从所述查询器发送的SparkSQL删除指令;以及生成Phoenix删除指令,并将所述Phoenix删除指令发送给所述存储器,其中,所述Phoenix删除指令为如下指令之一:包括HBase表的表标识以及HBase表的元数据,并使得所述存储器删除与所述表标识和所述元数据对应的索引表的指令;以及包括HBase表的表标识、HBase表的元数据以及与所述元数据对应的属性值,并使得所述存储器从与所述表标识和所述元数据对应的索引表中删除与所述属性值对应的行数据的指令。
在一个例子中,机器可执行指令还促使处理器602执行如下操作:接收从所述查询器发送的SparkSQL获取指令;生成Phoenix获取指令,并将所述Phoenix获取指令发送给所述存储器,其中,所述Phoenix获取指令为如下指令之一:包括HBase表的表标识以及HBase表的元数据,并使得所述存储器将与所述表标识和所述元数据对应的索引表返回给所述连接器的指令;以及包括HBase表的表标识、HBase表的元数据以及与所述元数据对应的属性值,并使得所述存储器从与所述表标识和所述元数据对应的索引表中,获取与所述属性值对应的行数据,并将所述行数据返回给所述连接器的指令;以及在接收到所述索 引表或者所述行数据后,将所述索引表或者所述行数据返回给所述查询器。
在一个例子中,机器可执行指令还促使处理器602执行如下操作:接收从所述查询器发送的SparkSQL关联查询指令,其中,所述SparkSQL关联查询指令包括HBase表的第一表标识以及与所述HBase表对应的关联HBase表的第二表标识;生成第一Phoenix关联查询指令,并将所述第一Phoenix关联查询指令发送给所述存储器,其中,所述第一Phoenix关联查询指令包括所述第二表标识,并使得所述存储器从与所述第二表标识对应的关联HBase表中,获取所有第一类行数据,并将所有第一类行数据返回给所述连接器,每个第一类行数据均包括与至少一个元数据对应的属性值;以及针对每个第一类行数据中的每个元数据执行如下操作:若所述存储器存在与所述第一表标识和所述元数据对应的索引表,则生成第二Phoenix关联查询指令,并将所述第二Phoenix关联查询指令发送给所述存储器,其中,所述第二Phoenix关联查询指令包括所述第一表标识、所述元数据以及与所述元数据对应的属性值,并使得所述存储器从与所述第一表标识和所述元数据对应的索引表中,获取与所述属性值对应的行键,并从与所述第一表标识对应的HBase表中,获取与所述行键对应的第二类行数据,并将所述第二类行数据返回给所述连接器;以及将所述第一类行数据和所述第二类行数据进行关联后,返回给所述查询器。
作为一个实施例，机器可读存储介质601可以是任何电子、磁性、光学或其它物理存储装置，可以包含或存储信息，如可执行指令、数据，等等。例如，机器可读存储介质可以是：RAM（Random Access Memory，随机存取存储器）、易失存储器、非易失性存储器、闪存、存储驱动器（如硬盘驱动器）、固态硬盘、任何类型的存储盘（如光盘、DVD等），或者类似的存储介质，或者它们的组合。
以上所述仅为本公开的实施例而已,并不用于限制本公开。对于本领域技术人员来说,本公开可以有各种更改和变化。凡在本公开的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本公开的权利要求范围之内。

Claims (21)

  1. 一种数据查询方法,包括:
    查询器向连接器发送SparkSQL查询指令,其中,所述SparkSQL查询指令包括HBase表的表标识、所述HBase表的元数据以及与所述元数据对应的属性值;
    所述连接器在接收到所述SparkSQL查询指令后,若存储器存在与所述表标识和所述元数据对应的索引表,则生成第一Phoenix查询指令,并将所述第一Phoenix查询指令发送给所述存储器,其中,所述索引表用于记录HBase表的元数据的属性值与HBase表的行键的对应关系,所述第一Phoenix查询指令包括所述表标识、所述元数据以及所述属性值;
    所述存储器在接收到所述第一Phoenix查询指令后,从与所述表标识和所述元数据对应的所述索引表中,获取与所述属性值对应的行键,从与所述表标识对应的HBase表中,获取与所述行键对应的行数据,并将所述行数据返回给所述连接器;以及
    所述连接器在接收到所述行数据后,将所述行数据返回给所述查询器。
  2. 根据权利要求1所述的方法,其中,所述方法还包括:
    所述查询器向所述连接器发送SparkSQL创建指令,其中,所述SparkSQL创建指令包括HBase表的表标识以及所述HBase表的元数据;
    所述连接器在接收到所述SparkSQL创建指令后,生成Phoenix创建指令,并将所述Phoenix创建指令发送给所述存储器,其中,所述Phoenix创建指令包括所述表标识以及所述元数据;以及
    所述存储器在接收到所述Phoenix创建指令后,从与所述表标识对应的HBase表中,获取与所述元数据对应的属性值以及与所述属性值对应的行键,创建与所述表标识和所述元数据对应的索引表,并在索引表中记录所述属性值与所述行键的对应关系。
  3. 根据权利要求1所述的方法,其中,所述方法还包括:
    所述查询器向所述连接器发送SparkSQL存储指令,其中,所述SparkSQL存储指令包括HBase表的表标识、HBase表的至少一个元数据、与每个元数据对应的属性值以及行键;
    所述连接器在接收到所述SparkSQL存储指令后,针对每个元数据执行如下操作:若所述存储器存在与所述表标识和所述元数据对应的索引表,则生成第一Phoenix存储指令,并将所述第一Phoenix存储指令发送给所述存储器,其中,所述第一Phoenix存储指令包括所述表标识、所述元数据、与所述元数据对应的属性值以及与所述属性值对应的行键;以及
    所述存储器在接收到所述第一Phoenix存储指令后,在与所述表标识和所述元数据对应的索引表中,记录所述属性值与所述行键的对应关系。
  4. 根据权利要求3所述的方法,其中,所述方法还包括:
    所述连接器在接收到所述SparkSQL存储指令后,生成第二Phoenix存储指令,并将所述第二Phoenix存储指令发送给所述存储器,其中,所述第二Phoenix存储指令包括所述表标识、所述至少一个元数据、与所述每个元数据对应的属性值以及所述行键;以及
    所述存储器在接收到所述第二Phoenix存储指令后，在与所述表标识对应的HBase表中，记录所述行键与每个元数据对应的属性值的对应关系。
  5. 根据权利要求1所述的方法,其中,所述方法还包括:
    所述查询器向所述连接器发送SparkSQL删除指令;
    所述连接器在接收到所述SparkSQL删除指令后,生成Phoenix删除指令,并将所述Phoenix删除指令发送给所述存储器,其中,所述Phoenix删除指令包括HBase表的表标识以及HBase表的元数据,或者,所述Phoenix删除指令包括HBase表的表标识、HBase表的元数据以及与所述元数据对应的属性值;以及
    所述存储器在接收到所述Phoenix删除指令后,若所述Phoenix删除指令包括HBase表的表标识以及HBase表的元数据,则删除与所述表标识和所述元数据对应的索引表;若所述Phoenix删除指令包括HBase表的表标识、HBase表的元数据以及与所述元数据对应的属性值,则从与所述表标识和所述元数据对应的索引表中删除与所述属性值对应的行数据。
  6. 根据权利要求1所述的方法,其中,所述方法还包括:
    所述查询器向所述连接器发送SparkSQL获取指令;
    所述连接器在接收到所述SparkSQL获取指令后,生成Phoenix获取指令,并将所述Phoenix获取指令发送给所述存储器,其中,所述Phoenix获取指令包括HBase表的表标识以及HBase表的元数据,或者,所述Phoenix获取指令包括HBase表的表标识、HBase表的元数据以及与所述元数据对应的属性值;
    所述存储器在接收到所述Phoenix获取指令后,若所述Phoenix获取指令包括HBase表的表标识以及HBase表的元数据,则将与所述表标识和所述元数据对应的索引表返回给所述连接器;若所述Phoenix获取指令包括HBase表的表标识、HBase表的元数据以及与所述元数据对应的属性值,则从与所述表标识和所述元数据对应的索引表中,获取与所述属性值对应的行数据,并将所述行数据返回给所述连接器;以及
    所述连接器在接收到所述索引表或者所述行数据后,将所述索引表或者所述行数据返回给所述查询器。
  7. 根据权利要求1所述的方法,其中,所述方法还包括:
    所述查询器向所述连接器发送SparkSQL关联查询指令,其中,所述SparkSQL关联查询指令包括HBase表的第一表标识以及与所述HBase表对应的关联HBase表的第二表标识;
    所述连接器在接收到所述SparkSQL关联查询指令后,生成第一Phoenix关联查询指令,并将所述第一Phoenix关联查询指令发送给所述存储器,其中,所述第一Phoenix关联查询指令包括所述第二表标识;
    所述存储器在接收到所述第一Phoenix关联查询指令后,从与所述第二表标识对应的关联HBase表中,获取所有第一类行数据,并将所有第一类行数据返回给所述连接器,其中,每个第一类行数据均包括与至少一个元数据对应的属性值;以及
    针对每个第一类行数据中的每个元数据执行如下操作:
    若所述存储器存在与第一表标识和所述元数据对应的索引表，则所述连接器生成第二Phoenix关联查询指令，并将所述第二Phoenix关联查询指令发送给所述存储器，其中，所述第二Phoenix关联查询指令包括所述第一表标识、所述元数据以及与所述元数据对应的属性值；
    所述存储器在接收到所述第二Phoenix关联查询指令后,从与所述第一表标识和所述元数据对应的索引表中,获取与所述属性值对应的行键,并从与所述第一表标识对应的HBase表中,获取与所述行键对应的第二类行数据,并将所述第二类行数据返回给所述连接器;以及
    所述连接器将所述第一类行数据和所述第二类行数据进行关联后,返回给所述查询器。
  8. 一种数据查询系统,包括:
    查询器,用于向连接器发送SparkSQL查询指令,其中,所述SparkSQL查询指令包括HBase表的表标识、所述HBase表的元数据以及与所述元数据对应的属性值;
    所述连接器,用于在接收到所述SparkSQL查询指令后,若存储器存在与所述表标识和所述元数据对应的索引表,则生成第一Phoenix查询指令,并将所述第一Phoenix查询指令发送给所述存储器,其中,所述索引表用于记录HBase表的元数据的属性值与HBase表的行键的对应关系,所述第一Phoenix查询指令包括所述表标识、所述元数据以及所述属性值;
    所述存储器,用于在接收到所述第一Phoenix查询指令后,从与所述表标识和所述元数据对应的所述索引表中,获取与所述属性值对应的行键,从与所述表标识对应的HBase表中,获取与所述行键对应的行数据,并将所述行数据返回给所述连接器,
    其中,所述连接器还用于在收到所述行数据后,将所述行数据返回给所述查询器。
  9. 根据权利要求8所述的数据查询系统,其中,
    所述查询器还用于向所述连接器发送SparkSQL创建指令,其中,所述SparkSQL创建指令包括HBase表的表标识以及所述HBase表的元数据;
    所述连接器还用于在接收到所述SparkSQL创建指令后,生成Phoenix创建指令,并将所述Phoenix创建指令发送给所述存储器,其中,所述Phoenix创建指令包括所述表标识以及所述元数据;以及
    所述存储器还用于在接收到所述Phoenix创建指令后,从与所述表标识对应的HBase表中,获取与所述元数据对应的属性值以及与所述属性值对应的行键,创建与所述表标识和所述元数据对应的索引表,并在索引表中记录所述属性值与所述行键的对应关系。
  10. 根据权利要求8所述的数据查询系统,其中,
    所述查询器还用于向所述连接器发送SparkSQL存储指令,其中,所述SparkSQL存储指令包括HBase表的表标识、HBase表的至少一个元数据、与每个元数据对应的属性值以及行键;
    所述连接器还用于在接收到所述SparkSQL存储指令后，针对每个元数据执行如下操作：若所述存储器存在与所述表标识和所述元数据对应的索引表，则生成第一Phoenix存储指令，并将所述第一Phoenix存储指令发送给所述存储器，其中，所述第一Phoenix存储指令包括所述表标识、所述元数据、与所述元数据对应的属性值以及与所述属性值对应的行键；以及
    所述存储器还用于在接收到所述第一Phoenix存储指令后,在与所述表标识和所述元数据对应的索引表中,记录所述属性值与所述行键的对应关系。
  11. 根据权利要求10所述的数据查询系统,其中,
    所述连接器还用于在接收到所述SparkSQL存储指令后,生成第二Phoenix存储指令,并将所述第二Phoenix存储指令发送给所述存储器,其中,所述第二Phoenix存储指令包括所述表标识、所述至少一个元数据、与所述每个元数据对应的属性值以及所述行键;以及
    所述存储器还用于在接收到所述第二Phoenix存储指令后,在与所述表标识对应的HBase表中,记录所述行键与每个元数据对应的属性值的对应关系。
  12. 根据权利要求8所述的数据查询系统,其中,
    所述查询器还用于向所述连接器发送SparkSQL删除指令;
    所述连接器还用于在接收到所述SparkSQL删除指令后,生成Phoenix删除指令,并将所述Phoenix删除指令发送给所述存储器,其中,所述Phoenix删除指令包括HBase表的表标识以及HBase表的元数据,或者,所述Phoenix删除指令包括HBase表的表标识、HBase表的元数据以及与所述元数据对应的属性值;以及
    所述存储器还用于在接收到所述Phoenix删除指令后,若所述Phoenix删除指令包括HBase表的表标识以及HBase表的元数据,则删除与所述表标识和所述元数据对应的索引表;若所述Phoenix删除指令包括HBase表的表标识、HBase表的元数据以及与所述元数据对应的属性值,则从与所述表标识和所述元数据对应的索引表中删除与所述属性值对应的行数据。
  13. 根据权利要求8所述的数据查询系统,其中,
    所述查询器还用于向所述连接器发送SparkSQL获取指令;
    所述连接器还用于在接收到所述SparkSQL获取指令后,生成Phoenix获取指令,并将所述Phoenix获取指令发送给所述存储器,其中,所述Phoenix获取指令包括HBase表的表标识以及HBase表的元数据,或者,所述Phoenix获取指令包括HBase表的表标识、HBase表的元数据以及与所述元数据对应的属性值;
    所述存储器还用于在接收到所述Phoenix获取指令后,若所述Phoenix获取指令包括HBase表的表标识以及HBase表的元数据,则将与所述表标识和所述元数据对应的索引表返回给所述连接器;若所述Phoenix获取指令包括HBase表的表标识、HBase表的元数据以及与所述元数据对应的属性值,则从与所述表标识和所述元数据对应的索引表中,获取与所述属性值对应的行数据,并将所述行数据返回给所述连接器;以及
    所述连接器还用于在接收到所述索引表或者所述行数据后,将所述索引表或者所述行数据返回给所述查询器。
  14. 根据权利要求8所述的数据查询系统,其中,
    所述查询器还用于向所述连接器发送SparkSQL关联查询指令,其中,所述SparkSQL关联查询指令包括HBase表的第一表标识以及与HBase表对应的关联HBase表的第二表标识;
    所述连接器还用于在接收到所述SparkSQL关联查询指令后,生成第一Phoenix关联查询指令,并将所述第一Phoenix关联查询指令发送给所述存储器,其中,所述第一Phoenix关联查询指令包括所述第二表标识;
    所述存储器还用于在接收到所述第一Phoenix关联查询指令后,从与所述第二表标识对应的关联HBase表中,获取所有第一类行数据,并将所有第一类行数据返回给所述连接器,其中,每个第一类行数据均包括与至少一个元数据对应的属性值;以及
    针对每个第一类行数据中的每个元数据,
    若所述存储器存在与所述第一表标识和所述元数据对应的索引表,则所述连接器还用于生成第二Phoenix关联查询指令,并将所述第二Phoenix关联查询指令发送给所述存储器,其中,所述第二Phoenix关联查询指令包括所述第一表标识、所述元数据以及与所述元数据对应的属性值;
    所述存储器还用于在接收到所述第二Phoenix关联查询指令后,从与所述第一表标识和所述元数据对应的索引表中,获取与所述属性值对应的行键,并从与所述第一表标识对应的HBase表中,获取与所述行键对应的第二类行数据,并将所述第二类行数据返回给所述连接器;以及
    所述连接器还用于将所述第一类行数据和所述第二类行数据进行关联后,返回给所述查询器。
  15. 一种连接器,包括:
    处理器;以及
    存储有机器可执行指令的机器可读存储介质,
    其中,通过读取并执行所述机器可执行指令,所述处理器被使得:
    接收从查询器发送的SparkSQL查询指令,其中,所述SparkSQL查询指令包括HBase表的表标识、所述HBase表的元数据以及与所述元数据对应的属性值;
    若存储器存在与所述表标识和所述元数据对应的索引表,则生成第一Phoenix查询指令,并将所述第一Phoenix查询指令发送给所述存储器,其中,所述索引表用于记录HBase表的元数据的属性值与HBase表的行键的对应关系,所述第一Phoenix查询指令包括所述表标识、所述元数据以及所述属性值,并使得所述存储器从与所述表标识和所述元数据对应的索引表中,获取与所述属性值对应的行键,从与所述表标识对应的HBase表中,获取与所述行键对应的行数据,并将所述行数据返回给所述连接器;以及
    将接收到的所述行数据返回给所述查询器。
  16. 根据权利要求15所述的连接器,其中,所述机器可执行指令还促使所述处理器:
    接收从所述查询器发送的SparkSQL创建指令,其中,所述SparkSQL创建指令包括HBase表的表标识以及所述HBase表的元数据;以及
    生成Phoenix创建指令,并将所述Phoenix创建指令发送给所述存储器,其中,所述Phoenix创建指令包括所述表标识以及所述元数据,并使得所述存储器从与所述表标识对应的HBase表中,获取与所述元数据对应的属性值以及与所述属性值对应的行键,创建与所述表标识和所述元数据对应的索引表,并在所述索引表中记录所述属性值与所述行键的对应关系。
  17. 根据权利要求15所述的连接器,其中,所述机器可执行指令还促使所述处理器:
    接收从所述查询器发送的SparkSQL存储指令,其中,所述SparkSQL存储指令包括HBase表的表标识、HBase表的至少一个元数据、与每个元数据对应的属性值以及行键;以及
    针对每个元数据执行如下操作:若所述存储器存在与所述表标识和所述元数据对应的索引表,则生成第一Phoenix存储指令,并将所述第一Phoenix存储指令发送给所述存储器,其中,所述第一Phoenix存储指令包括所述表标识、所述元数据、与所述元数据对应的属性值以及与所述属性值对应的行键,并使得所述存储器在与所述表标识和所述元数据对应的索引表中,记录所述属性值与所述行键的对应关系。
  18. 根据权利要求17所述的连接器,其中,所述机器可执行指令还促使所述处理器:
    生成第二Phoenix存储指令,并将所述第二Phoenix存储指令发送给所述存储器,其中,所述第二Phoenix存储指令包括所述表标识、所述至少一个元数据、与所述每个元数据对应的属性值以及所述行键,并使得所述存储器在与所述表标识对应的HBase表中,记录所述行键与所述每个元数据对应的属性值的对应关系。
  19. 根据权利要求15所述的连接器,其中,所述机器可执行指令还促使所述处理器:
    接收从所述查询器发送的SparkSQL删除指令;以及
    生成Phoenix删除指令,并将所述Phoenix删除指令发送给所述存储器,其中,所述Phoenix删除指令为如下指令之一:
    包括HBase表的表标识以及HBase表的元数据,并使得所述存储器删除与所述表标识和所述元数据对应的索引表的指令;以及
    包括HBase表的表标识、HBase表的元数据以及与所述元数据对应的属性值,并使得所述存储器从与所述表标识和所述元数据对应的索引表中删除与所述属性值对应的行数据的指令。
  20. 根据权利要求15所述的连接器,其中,所述机器可执行指令还促使所述处理器:
    接收从所述查询器发送的SparkSQL获取指令;
    生成Phoenix获取指令，并将所述Phoenix获取指令发送给所述存储器，其中，所述Phoenix获取指令为如下指令之一：
    包括HBase表的表标识以及HBase表的元数据,并使得所述存储器将与所述表标识和所述元数据对应的索引表返回给所述连接器的指令;以及
    包括HBase表的表标识、HBase表的元数据以及与所述元数据对应的属性值,并使得所述存储器从与所述表标识和所述元数据对应的索引表中,获取与所述属性值对应的行数据,并将所述行数据返回给所述连接器的指令;以及
    在接收到所述索引表或者所述行数据后,将所述索引表或者所述行数据返回给所述查询器。
  21. 根据权利要求15所述的连接器,其中,所述机器可执行指令还促使所述处理器:
    接收从所述查询器发送的SparkSQL关联查询指令,其中,所述SparkSQL关联查询指令包括HBase表的第一表标识以及与所述HBase表对应的关联HBase表的第二表标识;
    生成第一Phoenix关联查询指令,并将所述第一Phoenix关联查询指令发送给所述存储器,其中,所述第一Phoenix关联查询指令包括所述第二表标识,并使得所述存储器从与所述第二表标识对应的关联HBase表中,获取所有第一类行数据,并将所有第一类行数据返回给所述连接器,每个第一类行数据均包括与至少一个元数据对应的属性值;以及
    针对每个第一类行数据中的每个元数据执行如下操作:
    若所述存储器存在与所述第一表标识和所述元数据对应的索引表,则生成第二Phoenix关联查询指令,并将所述第二Phoenix关联查询指令发送给所述存储器,其中,所述第二Phoenix关联查询指令包括所述第一表标识、所述元数据以及与所述元数据对应的属性值,并使得所述存储器从与所述第一表标识和所述元数据对应的索引表中,获取与所述属性值对应的行键,并从与所述第一表标识对应的HBase表中,获取与所述行键对应的第二类行数据,并将所述第二类行数据返回给所述连接器;以及
    将所述第一类行数据和所述第二类行数据进行关联后,返回给所述查询器。
PCT/CN2018/118249 2017-11-30 2018-11-29 数据查询 WO2019105420A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2020544099A JP7018516B2 (ja) 2017-11-30 2018-11-29 データクエリ
EP18884738.8A EP3683697A4 (en) 2017-11-30 2018-11-29 DATA INQUIRY
US16/766,231 US11269881B2 (en) 2017-11-30 2018-11-29 Data query

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711235855.2A CN109101516B (zh) 2017-11-30 2017-11-30 一种数据查询方法和服务器
CN201711235855.2 2017-11-30

Publications (1)

Publication Number Publication Date
WO2019105420A1 true WO2019105420A1 (zh) 2019-06-06

Family

ID=64796513

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/118249 WO2019105420A1 (zh) 2017-11-30 2018-11-29 数据查询

Country Status (5)

Country Link
US (1) US11269881B2 (zh)
EP (1) EP3683697A4 (zh)
JP (1) JP7018516B2 (zh)
CN (1) CN109101516B (zh)
WO (1) WO2019105420A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457346A (zh) * 2019-07-05 2019-11-15 中国平安财产保险股份有限公司 数据查询方法、装置及计算机可读存储介质
CN111104426A (zh) * 2019-11-22 2020-05-05 深圳智链物联科技有限公司 一种数据查询方法及系统
CN112800073A (zh) * 2021-01-27 2021-05-14 浪潮云信息技术股份公司 一种基于NiFi更新Delta Lake的方法
CN113434580A (zh) * 2020-03-23 2021-09-24 北京国双科技有限公司 Phoenix数据库访问方法、装置、设备及介质
US11386089B2 (en) 2020-01-13 2022-07-12 The Toronto-Dominion Bank Scan optimization of column oriented storage

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083597A (zh) * 2019-03-16 2019-08-02 平安普惠企业管理有限公司 命令查询方法、装置、计算机设备和存储介质
CN111125090B (zh) * 2019-11-12 2023-05-30 中盈优创资讯科技有限公司 数据存取方法及装置
CN110888929B (zh) * 2019-12-06 2022-03-29 秒针信息技术有限公司 数据处理方法、装置、数据节点及存储介质
CN111125216B (zh) * 2019-12-10 2024-03-12 中盈优创资讯科技有限公司 数据导入Phoenix的方法及装置
CN113656397A (zh) * 2021-07-02 2021-11-16 阿里巴巴新加坡控股有限公司 一种针对时序数据的索引构建及查询的方法、装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354255A (zh) * 2015-10-21 2016-02-24 华为技术有限公司 数据查询方法和装置
CN107122437A (zh) * 2017-04-19 2017-09-01 高新兴科技集团股份有限公司 一种支持多条件检索和实时分析的大数据处理方法
CN107133342A (zh) * 2017-05-16 2017-09-05 广州舜飞信息科技有限公司 一种IndexR实时数据分析库
US9779150B1 (en) * 2014-08-15 2017-10-03 Tableau Software, Inc. Systems and methods for filtering data used in data visualizations that use relationships

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8489773B1 (en) * 2008-06-06 2013-07-16 Amdocs Software Systems Limited System, method, and computer program for sending a response to a client based on a replication message received from a master server
CN101350028A (zh) * 2008-07-10 2009-01-21 西安中电商务信息技术有限公司 一种基于SQL结构化查询语言的XML数据XPath查询方法
JP2010224824A (ja) 2009-03-23 2010-10-07 Toshiba Corp 情報処理装置
CN101788992A (zh) * 2009-05-06 2010-07-28 厦门东南融通系统工程有限公司 一种数据库查询语句的转换方法和转换系统
US9535961B2 (en) * 2011-11-18 2017-01-03 Hewlett Packard Enterprise Development Lp Query summary generation using row-column data storage
CN102737132A (zh) * 2012-06-25 2012-10-17 天津神舟通用数据技术有限公司 基于数据库行列混合存储的多规则复合压缩方法
US9477731B2 (en) 2013-10-01 2016-10-25 Cloudera, Inc. Background format optimization for enhanced SQL-like queries in Hadoop
CN104750727B (zh) * 2013-12-30 2019-03-26 沈阳亿阳计算机技术有限责任公司 一种列式内存存储查询装置及列式内存存储查询方法
US20160104090A1 (en) * 2014-10-09 2016-04-14 Splunk Inc. State determination using per-entity thresholds
CN106682042B (zh) * 2015-11-11 2019-11-22 杭州海康威视数字技术股份有限公司 一种关系数据缓存及查询方法及装置
US10169601B2 (en) 2015-11-18 2019-01-01 American Express Travel Related Services Company, Inc. System and method for reading and writing to big data storage formats
CN105589969A (zh) * 2015-12-23 2016-05-18 浙江大华技术股份有限公司 一种数据处理方法及装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9779150B1 (en) * 2014-08-15 2017-10-03 Tableau Software, Inc. Systems and methods for filtering data used in data visualizations that use relationships
CN105354255A (zh) * 2015-10-21 2016-02-24 华为技术有限公司 数据查询方法和装置
CN107122437A (zh) * 2017-04-19 2017-09-01 高新兴科技集团股份有限公司 一种支持多条件检索和实时分析的大数据处理方法
CN107133342A (zh) * 2017-05-16 2017-09-05 广州舜飞信息科技有限公司 一种IndexR实时数据分析库

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3683697A4 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457346A (zh) * 2019-07-05 2019-11-15 中国平安财产保险股份有限公司 数据查询方法、装置及计算机可读存储介质
CN110457346B (zh) * 2019-07-05 2024-04-30 中国平安财产保险股份有限公司 数据查询方法、装置及计算机可读存储介质
CN111104426A (zh) * 2019-11-22 2020-05-05 深圳智链物联科技有限公司 一种数据查询方法及系统
CN111104426B (zh) * 2019-11-22 2024-04-05 深圳智链物联科技有限公司 一种数据查询方法及系统
CN111104426B8 (zh) * 2019-11-22 2024-04-23 北京傲速科技有限公司 一种数据查询方法及系统
US11386089B2 (en) 2020-01-13 2022-07-12 The Toronto-Dominion Bank Scan optimization of column oriented storage
CN113434580A (zh) * 2020-03-23 2021-09-24 北京国双科技有限公司 Phoenix数据库访问方法、装置、设备及介质
CN112800073A (zh) * 2021-01-27 2021-05-14 浪潮云信息技术股份公司 一种基于NiFi更新Delta Lake的方法
CN112800073B (zh) * 2021-01-27 2023-03-28 浪潮云信息技术股份公司 一种基于NiFi更新Delta Lake的方法

Also Published As

Publication number Publication date
EP3683697A4 (en) 2020-07-29
EP3683697A1 (en) 2020-07-22
JP2021502655A (ja) 2021-01-28
CN109101516A (zh) 2018-12-28
US11269881B2 (en) 2022-03-08
CN109101516B (zh) 2019-09-17
JP7018516B2 (ja) 2022-02-10
US20200372028A1 (en) 2020-11-26

Similar Documents

Publication Publication Date Title
WO2019105420A1 (zh) 数据查询
CN107402995B (zh) 一种分布式newSQL数据库系统及方法
CN106227800B (zh) 一种高度关联大数据的存储方法及管理系统
US8015146B2 (en) Methods and systems for assisting information processing by using storage system
WO2017167171A1 (zh) 一种数据操作方法,服务器及存储系统
US10997124B2 (en) Query integration across databases and file systems
WO2017161540A1 (zh) 数据查询的方法、数据对象的存储方法和数据系统
US8700567B2 (en) Information apparatus
CN103595797B (zh) 一种分布式存储系统中的缓存方法
CN111221791A (zh) 一种多源异构数据导入数据湖的方法
JPWO2011108695A1 (ja) 並列データ処理システム、並列データ処理方法及びプログラム
WO2014089828A1 (zh) 访问存储设备的方法和存储设备
CN103744913A (zh) 一种基于搜索引擎技术的数据库检索方法
WO2018205471A1 (zh) 基于特征分析的数据存取方法、存储设备及存储系统
US10762068B2 (en) Virtual columns to expose row specific details for query execution in column store databases
WO2020125630A1 (zh) 文件读取
CN102622361B (zh) 一种数据库查询方法
CN108280123B (zh) 一种HBase的列聚合方法
WO2019128936A1 (zh) 数据处理
US20220342888A1 (en) Object tagging
WO2016101528A1 (zh) 内存数据库的数据处理方法及装置
WO2024119797A1 (zh) 一种数据处理方法、系统、设备以及存储介质
WO2015015559A1 (ja) 検索システムおよび検索方法
US11487780B2 (en) Processing data between data stores
CN109063061B (zh) 跨分布式系统数据处理方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18884738

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018884738

Country of ref document: EP

Effective date: 20200417

ENP Entry into the national phase

Ref document number: 2020544099

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE