CN114510480A - Method and device for querying data

Method and device for querying data

Info

Publication number: CN114510480A
Application number: CN202210106077.1A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Prior art keywords: data, database, index table, query, querying
Inventor: 刘雪晶
Applicant/Assignee: Industrial and Commercial Bank of China Ltd (ICBC)

Classifications

    • G06F16/2282 Tablespace storage structures; management thereof
    • G06F16/182 Distributed file systems
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management of indexing structures
    • G06F16/2365 Ensuring data consistency and integrity
    • G06F16/24568 Data stream processing; continuous queries
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Abstract

A method and a device for querying data are provided, relating to the field of big data. The method comprises the following steps: acquiring a query condition, where the query condition is used for querying first data in a first database; querying the storage location of the first data in the first database from a pre-constructed target index table based on the query condition, where the target index table comprises a correspondence between at least one query condition and the storage location of at least one piece of data in the first database, and each of the at least one query condition corresponds to one or more pieces of data in the first database; and reading the first data based on the storage location of the first data. By means of a pre-constructed index table that records the correspondence between query conditions and the storage locations of data, once a data query is triggered, the storage location of the data corresponding to the obtained query condition can be found in the index table, so that the data queried by the user can be read quickly from that location and the time delay of querying data is reduced.

Description

Method and device for querying data
Technical Field
The present application relates to the field of big data, and in particular, to a method and an apparatus for querying data.
Background
With the rapid development of information technology, job data grows exponentially, and at present a big data platform is generally used to process such data. Because both job submission and job scheduling incur a large amount of overhead, directly querying data from a database or data warehouse built on a big data platform has a high time delay; such queries generally take on the order of minutes.
How to reduce the time delay of data query becomes an urgent problem to be solved.
Disclosure of Invention
The application provides a method and a device for querying data, which aim to reduce the time delay of querying data.
In a first aspect, the present application provides a method for querying data, the method comprising: acquiring a query condition, where the query condition is used for querying first data in a first database; querying a storage location of the first data in the first database from a pre-constructed target index table based on the query condition, where the target index table comprises a correspondence between at least one query condition and a storage location of at least one piece of data in the first database, and each of the at least one query condition corresponds to one or more pieces of data in the first database; and reading the first data based on the storage location of the first data.
Based on this scheme, an index table is constructed in advance to record the query conditions and the storage locations of data in the database. After a data query is triggered, the storage location of the data corresponding to the obtained query condition can be found in the index table, so that the data queried by the user can be quickly located and read according to that storage location, reducing the time delay of querying data.
Optionally, the querying, based on the query condition, a storage location of the first data in the first database from a pre-constructed target index table includes: determining the target index table from at least one index table constructed in advance, where the target index table is an index table that includes the query condition, and each index table in the at least one index table includes a correspondence between at least one query condition and a storage location of at least one piece of data in the first database; and querying the storage location of the first data from the target index table based on the query condition.
Optionally, the method further comprises: constructing the at least one index table.
Optionally, the constructing the at least one index table includes: acquiring original information of the first database; determining, based on the original information, a table name of each index table in the at least one index table and at least one query condition included in each index table; and constructing the at least one index table based on the table name of each index table, the at least one query condition included in each index table, and the storage location, in the first database, of the data corresponding to each query condition.
Optionally, the at least one index table is built in a second database, the second database being a database with online analytical processing (OLAP) capability.
Optionally, the method further comprises: updating one or more index tables associated with changed data when it is determined that the data in the first database has changed.
Optionally, the first data is batch data, the batch data is data processed in batches on a big data platform, and the first database is hive; the original information includes a table name of each data table in at least one data table included in the first database and at least one field included in each data table, the table name of each data table is used to determine a table name of an index table, and the at least one field in each data table is used to determine a query condition.
Optionally, the updating one or more index tables associated with the changed data when it is determined that the data in the first database has changed includes: updating one or more index tables associated with the changed data when a change to the data in the first database is monitored.
Optionally, the first data is streaming data, the streaming data is data streamed on a big data platform, and the first database is Kafka; the original information includes a name of each topic in at least one topic included in the first database and at least one field included in each topic, the name of each topic is used to determine a table name of an index table, and the at least one field included in each topic is used to determine a query condition.
Optionally, the updating one or more index tables associated with the changed data when it is determined that the data in the first database has changed includes: reading newly added messages in each topic in real time; and updating one or more index tables associated with the changed data when it is determined that at least one newly added message in a topic indicates an addition, deletion or modification of data in the first database.
In a second aspect, the present application provides an apparatus for querying data, the apparatus comprising: an acquisition module, a query module and a reading module. The acquisition module is configured to acquire a query condition, where the query condition is used for querying first data in a first database; the query module is configured to query, based on the query condition, a storage location of the first data in the first database from a pre-constructed target index table, where the target index table comprises a correspondence between at least one query condition and a storage location of at least one piece of data in the first database, and each of the at least one query condition corresponds to one or more pieces of data in the first database; and the reading module is configured to read the first data based on the storage location of the first data.
In a third aspect, the present application provides an apparatus for querying data, the apparatus comprising a processor. The processor is coupled to the memory and is operable to execute the computer program in the memory to implement the method of the first aspect and any possible implementation of the first aspect.
Optionally, the apparatus for querying data may further include a memory for storing computer-readable instructions, and the processor reads the computer-readable instructions to enable the apparatus for querying data to implement the method described in any one of the above-mentioned first aspect and possible implementation manners of the first aspect.
Optionally, the apparatus for querying data may further include a communication interface for the apparatus to communicate with other devices, and the communication interface may be, for example, a transceiver, a circuit, a bus, a module, or another type of communication interface.
In a fourth aspect, the present application provides a chip system, which comprises at least one processor and is configured to support the implementation of the functionality referred to in the first aspect and any one of the possible implementations of the first aspect, for example, to obtain or process requests and/or information referred to in the above methods.
In one possible design, the system-on-chip further includes a memory to hold program instructions and data, the memory being located within the processor or external to the processor.
The chip system may be formed by a chip, and may also include a chip and other discrete devices.
In a fifth aspect, the present application provides a computer-readable storage medium having computer-readable instructions stored thereon, which, when executed by a computer, cause the computer to implement the first aspect and the method in any one of the possible implementations of the first aspect.
In a sixth aspect, the present application provides a computer program product comprising: computer program (also called code, or instructions), which when executed, causes the method of any of the possible implementations of the first aspect and the first aspect described above to be performed.
It should be understood that the second aspect to the sixth aspect of the present application correspond to the technical solutions of the first aspect of the present application, and the beneficial effects achieved by the aspects and the corresponding possible implementations are similar, and are not described again.
Drawings
Fig. 1 is a schematic diagram of an application scenario of a method for querying data according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a method for querying data according to an embodiment of the present application;
Fig. 3 is a schematic block diagram of an apparatus for querying data according to an embodiment of the present application;
Fig. 4 is a schematic block diagram of another apparatus for querying data according to an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatuses and methods consistent with certain aspects of the present application, as detailed in the appended claims.
To facilitate understanding of the embodiments of the present application, some terms or words referred to in the present application will be briefly described below.
1. Metadata: data that describes other data (data about data), also called intermediate data or relay data. Metadata mainly describes the properties of data and supports functions such as indicating storage locations, recording historical data, resource searching, and file recording.
2. Batch data: also known as batch big data or historical big data. Batch data may be understood as data that is processed in batches on a big data platform. Complex batch data processing typically takes from tens of minutes to hours. Interactive queries over batch data typically take from tens of seconds to minutes.
3. Streaming data: may also be referred to as streaming big data or real-time big data. Streaming data can be understood as data that is processed as a stream on a big data platform. Streaming data processing typically takes from hundreds of milliseconds to seconds.
4. hive: a data warehouse tool based on Hadoop (a software framework capable of performing distributed processing on large amounts of data). It can be used for data extraction, transformation and loading, and provides a mechanism for storing, querying and analyzing large-scale data stored in Hadoop. It can simply be understood that hive is a database based on a big data platform.
The underlying files of hive are stored in the Hadoop Distributed File System (HDFS). A table in hive is a purely logical table that stores only metadata such as the table definition; that is, hive itself does not store data and depends entirely on HDFS. The specific values of the fields contained in a hive table are stored in HDFS, so that a structured data file can be mapped onto a database table. Therefore, in the present application, data in hive can be accessed by directly accessing the corresponding HDFS file.
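To make the idea of reading hive data directly from HDFS concrete, the following Java sketch reads a record given a file name, offset and length, the form of storage location used by the index tables described later. It is an illustrative sketch only; the file path and offsets are hypothetical and not taken from the patent.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class HdfsRangeReader {
    // Reads `length` bytes starting at `offset` from the given HDFS file.
    public static String readRecord(String fileName, long offset, int length) throws Exception {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path(fileName))) {
            byte[] buf = new byte[length];
            in.readFully(offset, buf);          // positioned read: seek and read in one call
            return new String(buf, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical storage location taken from an index-table entry.
        System.out.println(readRecord("/user/hive/warehouse/t_student/part-00000", 1024L, 256));
    }
}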
5. Kafka: a high-throughput distributed publish-subscribe messaging system that can query, process and store continuous streaming data. Kafka has database-like functionality, so Kafka can also be regarded as a database.
6. Online analytical processing (OLAP): a software technique that enables an analyst to quickly, consistently and interactively view information from various aspects in order to understand the data in depth. Having online analytical processing capability may be understood as being able to respond to a user's data query or data analysis requirement within seconds.
7. Access hotspot problem: the situation in which all accesses are concentrated on one or a few nodes, overloading those machines and degrading performance.
With the rapid development of information technology, operation data, including structured or semi-structured data such as text, images and videos, grows exponentially. Traditional databases have difficulty storing and analyzing such data, and at present it is generally processed using a big data platform such as Hadoop. Because both job submission and job scheduling incur a large amount of overhead, directly querying data from a database or data warehouse built on a big data platform has a high time delay; such queries generally take on the order of minutes. That is, some databases or data warehouses based on big data platforms do not have online analytical processing capability and cannot return the data queried by the user with low time delay.
In order to solve the above problems, the present application provides a method and an apparatus for querying data, where through a pre-constructed index table including a correspondence between query conditions and storage locations of data in a database, a storage location of data corresponding to a query condition can be found in the index table according to the obtained query condition, and the data are read out and fed back to a user, so as to reduce a time delay of querying the data.
It should be noted that the method and apparatus for querying data provided in the embodiment of the present application may be applied to the field of big data, and may also be applied to any field other than the field of big data, which is not limited in this application.
Fig. 1 is a schematic view of an application scenario applicable to a method for querying data provided in an embodiment of the present application.
The application scenario applicable to the method for querying data provided by the embodiments of the present application may include a device for querying data, on which a big data platform and/or a database may be deployed. Fig. 1 shows a user and a computer 110; the computer 110 may be an example of a device for querying data. The computer 110 may query data in response to a query operation triggered by the user or in response to a preset timed query event: it obtains a query condition, finds the storage location of the data corresponding to that query condition in a pre-established index table that records the correspondence between query conditions and storage locations of data, quickly locates the data queried by the user according to that storage location, and reads the data out and feeds it back to the user. This reduces the time delay of querying data, avoids a long wait for the user, and improves the user experience.
It should be understood that, in an actual application scenario, the device for querying data is not limited to a computer; for example, a server or a server cluster may also serve as the device for querying data. Any apparatus or device that can implement the method for querying data provided in the embodiments of the present application by running a program may serve as the device for querying data, and the present application is not limited in this respect.
Fig. 2 is a schematic flow chart of a method for querying data according to an embodiment of the present application.
As shown in fig. 2, the method 200 of querying data includes steps 210 to 230. The method 200 may be executed by a device for querying data, and the following describes steps 210 to 230 in detail by taking a computer as an example of the device for querying data.
In step 210, the computer obtains a query condition for querying first data in a first database.
For example, a user may input a query condition on the user interface or select a certain query condition from some predefined candidate query conditions, and the computer may obtain a query condition input or selected by the user in response to the operation of querying the data by the user.
The first data may be understood as data that the user wants to query.
By way of example and not limitation, the first data may be batch data or streaming data, which is not limited in this application. The first data being batch data may be understood to mean that the first data is one or more pieces of data in the batch data; the first data being streaming data may be understood to mean that the first data is one or more pieces of data in the streaming data.
In step 220, the computer queries the storage location of the first data in the first database from the pre-constructed target index table based on the query condition.
The target index table comprises a correspondence between at least one query condition and a storage location of at least one piece of data in the first database, and each query condition in the at least one query condition corresponds to one or more pieces of data in the first database.
For example, the computer may find, according to the obtained query condition, a storage location of one or more pieces of first data corresponding to the query condition from the target index table.
In a possible implementation manner, the computer may determine a target index table from at least one index table constructed in advance, where the target index table is an index table that includes the query condition of the first data, and each index table in the at least one index table includes a correspondence between at least one query condition and a storage location of at least one piece of data in the first database; the computer then queries the storage location of the first data from the target index table based on the query condition.
In other words, the computer may determine, according to the query condition, the index table that includes the query condition among the at least one pre-constructed index table as the target index table, and then find the storage location of the data corresponding to the query condition from the target index table; the storage location of the data corresponding to the query condition is the storage location of the first data.
In step 230, the computer reads the first data based on the storage location of the first data.
After the storage location of the first data is found, the computer reads the first data from the storage location of the first data.
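The three steps above can be summarized in the following Java sketch. IndexTable, DataReader and the storage-location format are hypothetical abstractions used for illustration only; they are not interfaces defined by the patent.

import java.util.ArrayList;
import java.util.List;

// Step 220: maps a query condition to the storage locations of matching data.
interface IndexTable {
    List<String> lookup(String queryCondition);
}

// Step 230: reads one piece of data given its storage location.
interface DataReader {
    String read(String storageLocation) throws Exception;
}

class DataQueryService {
    private final IndexTable targetIndexTable;
    private final DataReader reader;

    DataQueryService(IndexTable targetIndexTable, DataReader reader) {
        this.targetIndexTable = targetIndexTable;
        this.reader = reader;
    }

    // Steps 210-230: obtain the query condition, look up storage locations, read the data.
    List<String> query(String queryCondition) throws Exception {
        List<String> firstData = new ArrayList<>();
        for (String location : targetIndexTable.lookup(queryCondition)) {
            firstData.add(reader.read(location));
        }
        return firstData;
    }
}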
In one possible implementation, the method 200 may further include: the computer constructs at least one index table.
The computer may construct at least one index table in advance, so that, when a data query is triggered, the computer may obtain the query condition, find the storage location of the first data from a target index table among the at least one pre-constructed index table according to the query condition, and then read the first data according to the storage location of the first data.
Optionally, the computer constructing at least one index table may include: the computer acquires original information of the first database; the computer determines, based on the original information, a table name of each index table in the at least one index table and at least one query condition included in each index table; and the computer constructs the at least one index table based on the table name of each index table, the at least one query condition included in each index table, and the storage location, in the first database, of the data corresponding to each query condition.
For example, the computer may first obtain the original information of the first database, where the original information may include, but is not limited to, a table name of each data table in at least one data table included in the first database and at least one field included in each data table. What the original information of the first database specifically contains may depend on which database the first database is, and this is not limited in the present application.
After obtaining the original information of the first database, the computer may determine, based on the original information, the table name of each index table in the at least one index table and the at least one query condition included in each index table; on this basis, the computer may construct the at least one index table based on the table name of each index table, the at least one query condition included in each index table, and the storage location, in the first database, of the data corresponding to each query condition.
For example, when the first data is batch data and the first database is hive, the original information may include, but is not limited to, the library name of hive, the table name of each data table in at least one data table included in the first database, and at least one field included in each data table. The library name of hive can be used to determine the namespace of the index tables, the table name of each data table can be used to determine the table name of an index table, and at least one field in each data table can be used to determine a query condition.
For another example, when the first data is streaming data and the first database is Kafka, the original information may include, but is not limited to, cluster information of Kafka, the name of each topic in at least one topic included in Kafka, and at least one field included in each message in at least one message included in each topic. The cluster information of Kafka may be used to determine the namespace of the index tables, the name of each topic may be used to determine the table name of an index table, and the at least one field included in each message may be used to determine a query condition.
It should be noted that a query condition may include one or more fields, which is not limited in the present application. For example, if a query condition is used to query the data of a student whose identification number is 130127 ×, the query condition includes 1 field, corresponding to "identification number". For another example, if a query condition is used to query the data of students whose height is 160 cm and whose gender is female, the query condition includes 2 fields, corresponding to "height" and "gender". For another example, if a query condition is used to query the data of students whose height is 160 cm, whose gender is female, and whose math score is above 80, the query condition includes 3 fields, corresponding to "height", "gender" and "math score".
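As an illustration of how the original information of hive mentioned above could be collected, the sketch below lists the data tables of a hive library and their fields through the Hive metastore client. The library name, the "idx_" naming convention and the namespace format are assumptions made for this example, and the exact metastore client API may vary between Hive versions.

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import java.util.List;

public class HiveIndexMetadata {
    public static void main(String[] args) throws Exception {
        HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
        String dbName = "ods";                         // hypothetical hive library name
        for (String tableName : client.getAllTables(dbName)) {
            // Library name -> namespace of the index tables; table name -> index table name.
            String indexTableName = dbName + ":idx_" + tableName;
            // Each field of the data table becomes a candidate query condition.
            List<FieldSchema> fields = client.getSchema(dbName, tableName);
            for (FieldSchema field : fields) {
                System.out.printf("index table %s, query condition field: %s%n",
                        indexTableName, field.getName());
            }
        }
        client.close();
    }
}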
Optionally, the at least one index table is built in a second database, the second database being a database with online analytical processing (OLAP) capability.
The at least one index table can be established in a database with online analytical processing capability. By the nature of such a database, it can respond to a user's data query within seconds, so the time delay of querying data can be reduced.
By way of example and not limitation, the second database may be HBase. HBase is an important component of the Hadoop ecosystem; its main characteristics are support for real-time storage and querying of massive data, along with high reliability, high performance, column orientation and scalability. If HBase is used to store the index table, the position of the data to be read can be located quickly, either by querying with a specified query condition (the query condition can serve as the row key) or by scanning over a specified range of row keys.
In some possible implementations, the second database may be a remote dictionary service (Redis), which is not limited in this application.
In a possible implementation manner, the computer may generate, in response to a query operation triggered by the user or in response to a preset timed query event, a query request for querying the first data in the first database; for example, the query request may be expressed in a database definition language (DDL), and the query request may include the query condition. In this case, if the index table resides in the second database, the query request may be converted into a query statement that the second database can recognize or execute. For example, when the second database is HBase, the query statement may be a get statement or a scan statement; the query statement also includes the query condition, and the storage location of the first data is then determined from the index table in the second database according to the query condition. This is not limited in the present application.
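A sketch of such a lookup against an HBase index table is shown below. The namespace, table name, column family "f", qualifier "loc" and the row-key format are assumptions for illustration and are not prescribed by the patent.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IndexTableLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table indexTable = conn.getTable(TableName.valueOf("hivedb:idx_t_student"))) {
            // Row key built from the query condition, e.g. "hash+columnName+value" (illustrative).
            String rowKey = "7+height+160";
            Result result = indexTable.get(new Get(Bytes.toBytes(rowKey)));
            // The storage locations (e.g. "fileName+offset+length") are read from a column.
            byte[] locations = result.getValue(Bytes.toBytes("f"), Bytes.toBytes("loc"));
            if (locations != null) {
                System.out.println(Bytes.toString(locations));
            }
        }
    }
}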
Taking HBase as an example of the second database, the index table corresponding to the data in hive established on HBase may be as shown in table 1.
In Table 1, in the column corresponding to the query condition, columnName is a field (i.e., a column name) and value is the value corresponding to that field. columnName1 and columnName2 represent different fields; value1 is a specific value of columnName1, and value2 is a specific value of columnName2. In addition, in order to avoid the access hotspot problem, a hash value may be calculated from each field and its value so as to spread the data across different partitions; for example, Hash1 in Table 1 may be calculated based on columnName1 and value1, and Hash2 may be calculated based on columnName2 and value2.
In Table 1, in the column corresponding to the storage location of the data, fileName can be understood as a file name; fileName1, fileName2, fileName3, fileName4 and fileName5 are different file names, and these files may be different files stored on HDFS that contain the specific values of the data in hive. offset is an offset, which can also be understood as the starting point of the storage location of the data within a file; offset1 through offset5 represent different offsets. length is the length of the data; length1 through length5 represent different lengths.
TABLE 1
[Table 1, shown as a figure in the original document: each row maps a query-condition row key of the form Hash + columnName + value to one or more data storage locations of the form fileName + offset + length, with a separator column between multiple locations.]
Taking "fileName 1+ offset1+ length 1" as an example, and "fileName 1+ offset1+ length 1" as an example, a piece of data satisfying the query condition "Hash 1+ columnName1+ value 1" is stored in a file with a file name of fileName1, the starting position of the piece of data in the file is offset1, and the length of the piece of data is length 1. By analogy, "fileName 2+ offset2+ length h 2", "fileName 3+ offset3+ length 3", "fileName 4+ offset4+ length h 4", "fileName 5+ offset5+ length h 5" can also be understood in the same way, and for the sake of brevity, will not be described in detail here.
It should be noted that the storage location of the data is not limited to the above form; in some possible implementations, the start position and end position of the data within its file may be indicated instead of the offset and length described above.
In the column corresponding to the separator, the separator only serves to separate the storage locations of multiple pieces of data that correspond to the same query condition. In some implementations, this column may not be present in the index table.
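A possible way to build the salted row key of Table 1 and write an index entry is sketched below. The number of salt buckets, the "+" separator, the column family/qualifier names and the table name are assumptions made for illustration only.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IndexTableWriter {
    private static final int SALT_BUCKETS = 100;   // assumed number of partitions

    // Row key of the form hash + columnName + value, as in Table 1.
    static String rowKey(String columnName, String value) {
        int hash = Math.floorMod((columnName + value).hashCode(), SALT_BUCKETS);
        return hash + "+" + columnName + "+" + value;
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table indexTable = conn.getTable(TableName.valueOf("hivedb:idx_t_student"))) {
            // Storage location of one piece of data: fileName + offset + length (illustrative).
            String location = "/user/hive/warehouse/t_student/part-00000+1024+256";
            Put put = new Put(Bytes.toBytes(rowKey("height", "160")));
            put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("loc"), Bytes.toBytes(location));
            indexTable.put(put);
        }
    }
}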
Taking HBase as an example of the second database, the index table corresponding to the data in Kafka established on HBase can be shown in table 2.
TABLE 2
[Table 2, shown as a figure in the original document: each row maps a query-condition row key of the form Hash + columnName + value to one or more data storage locations of the form partition + offset, with a separator column between multiple locations.]
For the contents of Table 2 that are the same as those of Table 1, refer to the description above; the contents of Table 2 that differ from Table 1 are described below.
Unlike Table 1, in the column corresponding to the storage location of the data, partition is a partition of a topic in Kafka, and partition1, partition2, partition3, partition4 and partition5 represent different partitions. offset is an offset, which can also be understood as the starting point of the storage location of the data within the partition; offset1 through offset5 can be different offsets. Since the data in Kafka is consumed piece by piece, an entire piece of data can be read out simply by knowing the partition in which it is stored and its starting position within that partition, so the length of each piece of data does not need to be recorded.
Taking "partition1+offset1" as an example, it means that a piece of data satisfying the query condition "Hash1+columnName1+value1" is stored in the partition named partition1 and starts at offset1. By analogy, "partition2+offset2" through "partition5+offset5" can be understood in the same way and, for brevity, are not described in detail here.
In addition, after data in the first database is reprocessed, its storage location in the first database may change, so that it no longer matches the storage location recorded in the pre-established index table. To avoid this inconsistency, in some possible implementations, the method 200 may further include: updating one or more index tables associated with the changed data when it is determined that the data in the first database has changed.
For example, a change to the data in the first database may include data in the first database being deleted or modified, or one or more new pieces of data being added to the first database. For instance, if the data corresponding to the location "fileName1+offset1+length1" is deleted, the entry "fileName1+offset1+length1" needs to be deleted from the column for the storage location of the data in Table 1.
Optionally, the first data is batch data and the first database is hive, and the updating one or more index tables associated with the changed data when it is determined that the data in the first database has changed may include: updating one or more index tables associated with the changed data when a change to the data in the first database is monitored.
Illustratively, a metadata monitoring process, such as a MetaStoreEventListener, may be loaded into the hive metadata management service (metastore). The metadata monitoring process can enable functions that monitor metadata information of the hive data tables, such as onAlterTable for monitoring modifications to a data table's structure, onInsert for monitoring data inserted into a data table, and onAddPartition for monitoring added partitions. In this way, the metadata information of the hive data tables can be monitored in real time; when it is monitored that data in the first database is deleted or modified, or that one or more pieces of data are newly added to the first database, it can be determined that the data in the first database has changed, and the one or more index tables associated with the changed data can then be updated to ensure data consistency.
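A minimal sketch of such a listener is shown below, assuming it is registered through the hive.metastore.event.listeners configuration. The index-update helpers are hypothetical, and the exact event accessors vary somewhat across Hive versions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.MetaStoreEventListener;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.events.AddPartitionEvent;
import org.apache.hadoop.hive.metastore.events.AlterTableEvent;
import org.apache.hadoop.hive.metastore.events.InsertEvent;

// Listens for metadata changes in the hive metastore and triggers an update of
// the index tables associated with the changed data.
public class IndexSyncListener extends MetaStoreEventListener {

    public IndexSyncListener(Configuration config) {
        super(config);
    }

    @Override
    public void onAlterTable(AlterTableEvent event) throws MetaException {
        // The structure of a data table was modified.
        updateIndexTablesFor(event.getNewTable().getDbName(),
                             event.getNewTable().getTableName());
    }

    @Override
    public void onInsert(InsertEvent event) throws MetaException {
        // Data was written into a data table; the affected database/table can be
        // read from the event (accessor names differ between Hive versions).
        refreshIndexTables();
    }

    @Override
    public void onAddPartition(AddPartitionEvent event) throws MetaException {
        // A partition was added to a data table.
        updateIndexTablesFor(event.getTable().getDbName(),
                             event.getTable().getTableName());
    }

    // Hypothetical helpers: rebuild or patch the index-table entries in the
    // second database (e.g. HBase) for the changed hive data.
    private void updateIndexTablesFor(String db, String table) { }

    private void refreshIndexTables() { }
}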
Optionally, the first data is streaming data and the first database is Kafka, and the updating one or more index tables associated with the changed data when it is determined that the data in the first database has changed may include: reading newly added messages in each topic in real time; and updating one or more index tables associated with the changed data when it is determined that at least one newly added message in a topic indicates an addition, deletion or modification of data in the first database.
For example, the new messages in each topic may be consumed (that is, read) in real time. When a new message in a topic is read, if the message deletes or modifies data in the first database, or adds one or more pieces of data to the first database, it can be determined that the data in the first database has changed, and the one or more index tables associated with the changed data can then be updated to ensure data consistency.
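A minimal sketch of such a consumer loop is given below. The topic name, group id and the upsertIndexEntry helper (which would write "partition + offset" entries into the second database) are assumptions for illustration.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class KafkaIndexUpdater {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "index-maintainer");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("t_student"));   // assumed topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Each new message carries the data change; its partition and offset
                    // become the storage location recorded in the index table.
                    upsertIndexEntry(record.value(), record.partition(), record.offset());
                }
            }
        }
    }

    // Hypothetical helper: parses the message fields into query conditions and
    // writes or updates "partition + offset" entries in the index table.
    static void upsertIndexEntry(String message, int partition, long offset) {
        // Implementation omitted in this sketch.
    }
}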
Based on the above scheme, an index table recording the correspondence between query conditions and storage locations of data is constructed in advance. After a data query is triggered, the storage location of the data corresponding to the obtained query condition can be found in the index table, so that the data queried by the user can be quickly located according to that storage location, read out and fed back to the user, reducing the time delay of querying data. In addition, when it is determined that the data in the first database has changed, the one or more index tables associated with the changed data are updated, which avoids the storage location of data in the first database becoming inconsistent with the storage location recorded in the pre-established index table.
Fig. 3 is a schematic block diagram of an apparatus for querying data according to an embodiment of the present application.
As shown in fig. 3, the apparatus 300 for querying data may include: an acquisition module 310, a query module 320, and a reading module 330. The apparatus 300 for querying data may be configured to implement the functions of the computer in the method 200. The acquisition module 310 may be configured to acquire a query condition, where the query condition is used to query the first data in the first database; the query module 320 may be configured to query, based on the query condition, a storage location of the first data in the first database from a pre-constructed target index table, where the target index table includes a correspondence between at least one query condition and a storage location of at least one piece of data in the first database, and each query condition in the at least one query condition corresponds to one or more pieces of data in the first database; and the reading module 330 may be configured to read the first data based on the storage location of the first data.
Optionally, the query module 320 may be specifically configured to determine a target index table from at least one index table constructed in advance, where the target index table is an index table that includes the query condition, and each index table in the at least one index table includes a correspondence between at least one query condition and a storage location of at least one piece of data in the first database; and to query the storage location of the first data from the target index table based on the query condition.
Optionally, the apparatus 300 for querying data may further include a building module 340, and the building module 340 may be configured to build at least one index table.
Optionally, the building module 340 may be specifically configured to acquire original information of the first database; determine, based on the original information, a table name of each index table in the at least one index table and at least one query condition included in each index table; and construct the at least one index table based on the table name of each index table, the at least one query condition included in each index table, and the storage location, in the first database, of the data corresponding to each query condition.
Optionally, the at least one index table is built in a second database, the second database being a database with online analysis processing capability.
Optionally, the building module 340 may be further configured to update one or more index tables associated with changed data in the case that it is determined that the data in the first database has changed.
Optionally, the first data is batch data, the batch data is data processed in batches on a big data platform, and the first database is hive; the original information includes a table name of each data table in at least one data table included in the first database and at least one field included in each data table, the table name of each data table is used for determining a table name of an index table, and the at least one field in each data table is used for determining a query condition.
Optionally, the building module 340 may be further specifically configured to, when it is monitored that the data in the first database changes, update one or more index tables associated with the changed data.
Optionally, the first data is streaming data, the streaming data is data streamed on a big data platform, the first database is Kafka, the original information includes a name of each topic in at least one topic included in the first database and at least one field included in each topic, the name of each topic is used for determining a table name of an index table, and at least one field included in each topic is used for determining a query condition.
Optionally, the building module 340 may be further specifically configured to read a new message in each topic in real time; in an instance in which it is determined that at least one newly added message in a topic indicates an addition, deletion or modification to data in the first database, one or more index tables associated with the changed data are updated.
It should be understood that the division of the modules of the apparatus for querying data in fig. 3 is only an example, in practical applications, different functional modules may be divided according to different functional requirements, the present application does not limit the division form and number of the functional modules in practical applications, and fig. 3 does not limit the present application in any way.
Fig. 4 is a schematic block diagram of another apparatus for querying data provided in an embodiment of the present application.
The apparatus 400 for querying data can be used to implement the functions of the computer in the method 200. The apparatus 400 for querying data may be a system on a chip. In the embodiment of the present application, the chip system may be composed of a chip, and may also include a chip and other discrete devices.
As shown in fig. 4, the apparatus 400 for querying data may include at least one processor 410 for implementing the functions of a computer in the method 200 provided by the embodiment of the present application.
Illustratively, when the apparatus 400 for querying data is used to implement the method 200 provided by the embodiment of the present application, the processor 410 may be configured to obtain a query condition, where the query condition is used to query the first data in the first database; query, based on the query condition, the storage location of the first data in the first database from a pre-constructed target index table, where the target index table includes a correspondence between at least one query condition and a storage location of at least one piece of data in the first database, and each query condition in the at least one query condition corresponds to one or more pieces of data in the first database; and read the first data based on the storage location of the first data. For details, reference is made to the detailed description in the method example, which is not repeated here.
The apparatus 400 for querying data may also include at least one memory 420 that may be used to store program instructions and data, and the like. The memory 420 is coupled to the processor 410. The coupling in the embodiments of the present application is an indirect coupling or a communication connection between devices, units or modules, and may be an electrical, mechanical or other form for information interaction between the devices, units or modules. The processor 410 may operate in conjunction with the memory 420. Processor 410 may execute program instructions stored in memory 420. At least one of the at least one memory may be included in the processor.
The apparatus for querying data 400 may also include a communication interface 430 for communicating with other devices over a transmission medium such that the apparatus for querying data 400 may communicate with other devices. The communication interface 430 may be, for example, a transceiver, an interface, a bus, a circuit, or a device capable of performing a transceiving function. The processor 410 may utilize the communication interface 430 to send and receive data and/or information and to implement the computer-implemented method 200 described in the corresponding embodiment of fig. 2.
The specific connection medium between the processor 410, the memory 420 and the communication interface 430 is not limited in the embodiments of the present application. In fig. 4, the processor 410, the memory 420, and the communication interface 430 are connected by a bus 440. The bus 440 is shown in fig. 4 by a thick line, and the connection manner between other components is merely illustrative and not limited thereto. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
The present application further provides a chip system, which includes at least one processor, and is configured to implement the functions involved in the computer-implemented method in the embodiment shown in fig. 2.
In one possible design, the system-on-chip further includes a memory to hold program instructions and data, the memory being located within the processor or external to the processor.
The chip system may be formed by a chip, and may also include a chip and other discrete devices.
The present application further provides a computer program product, the computer program product comprising: a computer program (also referred to as code, or instructions), which when executed, causes a computer to perform the method of the embodiment shown in fig. 2.
The present application also provides a computer-readable storage medium having stored thereon a computer program (also referred to as code, or instructions). When executed, the computer program causes a computer to perform the method of the embodiment shown in fig. 2.
It should be understood that the processor in the embodiments of the present application may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor.
It will also be appreciated that the memory in the embodiments of the present application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
As used in this specification, the terms "unit," "module," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution.
Those of ordinary skill in the art will appreciate that the various illustrative logical blocks and steps (step) described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application. In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more units are integrated into one module.
In the above embodiments, the functions of the functional modules may be wholly or partially implemented by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions (programs). The procedures or functions described in accordance with the embodiments of the present application are generated in whole or in part when the computer program instructions (programs) are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Versatile Disk (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A method of querying data, the method comprising:
acquiring a query condition, wherein the query condition is used for querying first data in a first database;
querying the storage location of the first data in the first database from a pre-constructed target index table based on the query condition, wherein the target index table comprises a correspondence between at least one query condition and a storage location of at least one piece of data in the first database, and each query condition in the at least one query condition corresponds to one or more pieces of data in the first database;
reading the first data based on the storage location of the first data.
2. The method of claim 1, wherein the querying the storage location of the first data in the first database from the pre-constructed target index table based on the query condition comprises:
determining the target index table from at least one index table constructed in advance, wherein the target index table is an index table comprising the query condition, and each index table of the at least one index table comprises a correspondence between at least one query condition and a storage location of at least one piece of data in the first database;
and querying the storage location of the first data from the target index table based on the query condition.
3. The method of claim 2, wherein the method further comprises:
constructing the at least one index table.
4. The method of claim 3, wherein the constructing the at least one index table comprises:
acquiring raw information of the first database;
determining a table name of each index table of the at least one index table and at least one query condition included in each index table based on the raw information;
and constructing the at least one index table based on the table name of each index table, the at least one query condition included in each index table, and the storage location, in the first database, of the data corresponding to each query condition.
5. The method of claim 4, wherein the at least one index table is built in a second database, the second database being a database with online analytical processing (OLAP) capability.
6. The method of claim 4 or 5, wherein the method further comprises:
in the event that it is determined that data in the first database has changed, updating one or more index tables associated with the changed data.
7. The method of claim 6, wherein the first data is batch data, the batch data is data processed in batches on a big data platform, the first database is Hive, the raw information comprises a table name of each data table of at least one data table included in the first database and at least one field included in each data table, the table name of each data table is used for determining the table name of an index table, and the at least one field in each data table is used for determining a query condition.
8. The method of claim 7, wherein, in the event that it is determined that data in the first database has changed, the updating one or more index tables associated with the changed data comprises:
updating the one or more index tables associated with the changed data when a change to the data in the first database is detected through monitoring.
9. The method of claim 6, wherein the first data is streaming data, the streaming data is data processed as a stream on a big data platform, the first database is Kafka, the raw information comprises a name of each topic of at least one topic included in the first database and at least one field included in each message of at least one message included in each topic, the name of each topic is used for determining the table name of an index table, and the at least one field included in each message is used for determining a query condition.
10. The method of claim 9, wherein, in the event that it is determined that data in the first database has changed, the updating one or more index tables associated with the changed data comprises:
reading newly added messages in each topic in real time;
and updating the one or more index tables associated with the changed data in the event that it is determined that at least one newly added message in a topic indicates an addition, deletion, or modification of data in the first database.
11. An apparatus for querying data, the apparatus comprising:
the system comprises an acquisition module, a query module and a query module, wherein the acquisition module is used for acquiring query conditions, and the query conditions are used for querying first data in a first database;
the query module is used for querying the storage position of the first data in the first database from a pre-constructed target index table based on the query condition, the target index table comprises a corresponding relation between at least one query condition and the storage position of at least one piece of data in the first database, and each query condition in the at least one query condition corresponds to one or more pieces of data in the first database;
and the reading module is used for reading the first data based on the storage position of the first data.
12. An apparatus for querying data, comprising a processor configured to perform the method of any of claims 1 to 10.
13. A computer-readable storage medium, comprising a computer program which, when run on a computer, causes the computer to perform the method of any one of claims 1 to 10.
14. A computer program product, comprising a computer program which, when executed, causes a computer to perform the method of any one of claims 1 to 10.
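
A minimal Python sketch of the query flow recited in claim 1, assuming an in-memory index table and local text files standing in for the first database; the names StorageLocation and query_first_data, the file paths, and the example conditions are illustrative assumptions, not part of the disclosed embodiments.

```python
# Illustrative sketch of the claim 1 flow (not the patented implementation):
# an index table maps a query condition to the storage location(s) of matching
# data in the first database; the data is then read directly from those locations.

from dataclasses import dataclass


@dataclass(frozen=True)
class StorageLocation:
    """Hypothetical storage location: a file (or partition) plus a row offset."""
    file_path: str
    offset: int


# Pre-constructed target index table: query condition -> storage locations.
# The "first database" is simulated here by plain text files on local disk.
target_index_table = {
    ("account_id", "6222***001"): [StorageLocation("data/part-00000.txt", 2)],
    ("account_id", "6222***002"): [StorageLocation("data/part-00001.txt", 0)],
}


def query_first_data(condition: tuple[str, str]) -> list[str]:
    """Look up the storage locations for the condition, then read the records."""
    locations = target_index_table.get(condition, [])
    records = []
    for loc in locations:
        with open(loc.file_path, encoding="utf-8") as f:
            lines = f.readlines()
        records.append(lines[loc.offset].rstrip("\n"))
    return records
```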
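For the index-table construction of claims 3 to 5, the sketch below derives an index-table name from each data-table name and one query condition from each field, and stores the entries in SQLite purely as a stand-in for the second database (the claims only require OLAP capability); all table, field, and path names are illustrative assumptions.

```python
# Minimal sketch of index-table construction: raw information of the first
# database (data-table names and their fields) drives the creation of index
# tables in a second database, here simulated with SQLite.

import sqlite3

# Raw information of the first database: data-table names and their fields.
raw_information = {
    "t_transaction": ["account_id", "trade_date"],
    "t_customer": ["customer_id"],
}

second_db = sqlite3.connect(":memory:")

for data_table, fields in raw_information.items():
    index_table = f"idx_{data_table}"          # table name derived from the data table
    for field in fields:                       # each field yields a query condition
        second_db.execute(
            f"CREATE TABLE IF NOT EXISTS {index_table}_{field} ("
            "condition_value TEXT, storage_location TEXT)"
        )

# Populating one entry: condition value -> storage location in the first database.
second_db.execute(
    "INSERT INTO idx_t_transaction_account_id VALUES (?, ?)",
    ("6222***001", "hdfs://warehouse/t_transaction/part-00000:2"),
)
second_db.commit()
```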
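For the batch-data update path of claims 6 to 8, the following sketch simulates Hive with an in-memory dictionary of partitions and polls for newly added partitions; the helpers rebuild_index_entries and monitor_once, and the partition naming, are hypothetical and not taken from the disclosure.

```python
# Sketch of the batch update path: when the monitored data table is found to
# have changed (e.g., a new partition was loaded by a batch job), the index
# entries associated with that table are refreshed.

# Simulated "first database": data table -> {partition -> rows}.
simulated_hive = {
    "t_transaction": {"dt=2022-01-27": ["row-a", "row-b"]},
}

# Index entries built so far, keyed by (data table, partition).
index_entries: dict[tuple[str, str], list[str]] = {}
known_partitions: dict[str, set[str]] = {"t_transaction": {"dt=2022-01-27"}}


def rebuild_index_entries(data_table: str, partition: str) -> None:
    """Scan one partition and record storage locations for its rows."""
    rows = simulated_hive[data_table][partition]
    index_entries[(data_table, partition)] = [
        f"{data_table}/{partition}:{offset}" for offset, _ in enumerate(rows)
    ]


def monitor_once(data_table: str) -> None:
    """One polling cycle: detect newly added partitions and update the indexes."""
    current = set(simulated_hive[data_table])
    for new_partition in current - known_partitions[data_table]:
        rebuild_index_entries(data_table, new_partition)
    known_partitions[data_table] = current


# A batch job adds a new partition; the next monitoring cycle indexes it.
simulated_hive["t_transaction"]["dt=2022-01-28"] = ["row-c"]
monitor_once("t_transaction")
```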
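For the streaming-data update path of claims 9 and 10, the sketch below assumes the kafka-python client; the broker address, topic name, message schema ("op", "key", "location"), and the update_index helper are illustrative assumptions rather than details of the disclosed embodiments.

```python
# Sketch of the streaming update path: newly added messages of a topic are read
# in real time, and messages that indicate an addition, deletion, or
# modification of data are applied to the associated index table.

import json
from kafka import KafkaConsumer

index_table: dict[str, str] = {}   # query-condition value -> storage location


def update_index(message: dict) -> None:
    """Apply one change message to the index table associated with the topic."""
    op = message.get("op")
    if op in ("insert", "update"):
        index_table[message["key"]] = message["location"]
    elif op == "delete":
        index_table.pop(message["key"], None)


consumer = KafkaConsumer(
    "t_transaction_changes",                       # one topic per data table
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

# Read newly added messages in real time and keep the index table in sync.
for record in consumer:
    update_index(record.value)
```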
CN202210106077.1A 2022-01-28 2022-01-28 Method and device for querying data Pending CN114510480A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210106077.1A CN114510480A (en) 2022-01-28 2022-01-28 Method and device for querying data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210106077.1A CN114510480A (en) 2022-01-28 2022-01-28 Method and device for querying data

Publications (1)

Publication Number Publication Date
CN114510480A true CN114510480A (en) 2022-05-17

Family

ID=81551534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210106077.1A Pending CN114510480A (en) 2022-01-28 2022-01-28 Method and device for querying data

Country Status (1)

Country Link
CN (1) CN114510480A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115203276A (en) * 2022-09-15 2022-10-18 江苏银承网络科技股份有限公司 Acceptance draft data processing method, system and device

Similar Documents

Publication Publication Date Title
US11816083B2 (en) Method and system for indexing of time-series data
US8868595B2 (en) Enhanced control to users to populate a cache in a database system
US20140046928A1 (en) Query plans with parameter markers in place of object identifiers
CN111046034A (en) Method and system for managing memory data and maintaining data in memory
US8812489B2 (en) Swapping expected and candidate affinities in a query plan cache
US10678784B2 (en) Dynamic column synopsis for analytical databases
CN106471501B (en) Data query method, data object storage method and data system
CN106294695A (en) A kind of implementation method towards the biggest data search engine
CN109815240B (en) Method, apparatus, device and storage medium for managing index
KR20180077839A (en) Method for providing REST API service to process massive unstructured data
US9483523B2 (en) Information processing apparatus, distributed processing system, and distributed processing method
CN113051268A (en) Data query method, data query device, electronic equipment and storage medium
US11514697B2 (en) Probabilistic text index for semi-structured data in columnar analytics storage formats
CN114510480A (en) Method and device for querying data
CN114297204A (en) Data storage and retrieval method and device for heterogeneous data source
US10019483B2 (en) Search system and search method
CN115516432A (en) Method and system for identifying, managing and monitoring data dependencies
CN111198917A (en) Data processing method, device, equipment and storage medium
US11841855B2 (en) System and method for efficient processing and managing of reports data and metrics
CN113849524B (en) Data processing method and device
CN115795187A (en) Resource access method, device and equipment
US10769214B2 (en) Encoding and decoding files for a document store
Singh NoSQL: A new horizon in big data
CN116860700A (en) Method, device, equipment and medium for processing metadata in distributed file system
US20220365905A1 (en) Metadata processing method and apparatus, and a computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination