CN111488377A - Data query method and device, electronic equipment and storage medium - Google Patents


Publication number
CN111488377A
Authority
CN
China
Prior art keywords
data
target user
druid
queried
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010269966.0A
Other languages
Chinese (zh)
Inventor
康林
段效晨
秦占明
赵艳杰
罗廷方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202010269966.0A
Publication of CN111488377A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/24556 Aggregation; Duplicate elimination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries

Abstract

The embodiments of the application provide a data query method, a data query device, electronic equipment, and a storage medium, and relate to the technical field of computers. According to a query request instruction, the method judges whether the quantity of data uploaded by a target user is greater than a preset threshold. If it is, the query content of the data to be queried is obtained from a ClickHouse engine; if it is not, the query content is obtained from a Druid engine. By using the quantity of data uploaded by the target user to choose between the two query engines, the method improves query performance and shortens query time.

Description

Data query method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data query method, an apparatus, an electronic device, and a storage medium.
Background
In the big data era, data is continuously and iteratively updated, and users typically query data that the system stores in table form according to their differing requirements; querying massive data helps users make better-informed business decisions. At present, such queries are served by Druid (a highly fault-tolerant, high-performance, open-source distributed system for real-time querying and analysis of big data). However, although Druid is a distributed data storage system that supports real-time analysis, its query performance degrades when the queried data volume is large, because many partitions are scanned, many records are hit, and memory usage becomes very high.
Disclosure of Invention
An object of the embodiments of the present application is to provide a data query method, an apparatus, an electronic device, and a storage medium, so as to improve performance of querying data and shorten time for querying data.
The specific technical scheme is as follows:
In a first aspect, an embodiment of the present application provides a data query method, including:
acquiring a query request instruction of data to be queried, wherein the data to be queried is uploaded by a target user and comprises a user identifier of the target user;
judging whether the quantity of the data uploaded by the target user is greater than a preset threshold value or not according to the query request instruction;
if the quantity of the data uploaded by the target user is larger than the preset threshold value, acquiring query content of the data to be queried from a ClickHouse engine;
and if the quantity of the data uploaded by the target user is not greater than the preset threshold value, acquiring query content of the data to be queried from the Druid engine.
Optionally, before the step of determining whether the quantity of the data uploaded by the target user is greater than a preset threshold according to the query request instruction, the method further includes:
acquiring the quantity of the data uploaded by a target user, and storing the user identification of the target user into a preset database if the quantity of the data uploaded by the target user is greater than the preset threshold;
the judging whether the quantity of the data uploaded by the target user is greater than a preset threshold value includes:
and inquiring whether the user identification of the target user exists in the preset database or not according to the inquiry request instruction, wherein if the user identification of the target user exists in the preset database, the number of the data uploaded by the target user is larger than the preset threshold, and otherwise, the number of the data uploaded by the target user is not larger than the preset threshold.
Optionally, the Druid engine includes a Druid main cluster and a Druid secondary cluster, where the Druid main cluster and the Druid secondary cluster are both used to store data uploaded by a target user, and the data stored in the Druid main cluster and the data stored in the Druid secondary cluster are the same, and if the number of the data uploaded by the target user is not greater than the preset threshold, obtaining query content of the data to be queried from the Druid engine includes:
if the user identification of the target user does not exist in the preset database, querying the Druid main cluster;
and after querying the Druid main cluster, querying a Druid secondary cluster when a first preset condition is reached.
Optionally, the first preset condition includes:
the time for querying the Druid main cluster is greater than a first preset time threshold; and/or
the failure probability of querying the Druid main cluster is not less than a first preset proportion threshold.
Optionally, the data to be queried includes an upload time of the data to be queried; in the Druid engine, the storage manner of the uploaded data includes monthly aggregation and daily aggregation; and if the number of the data uploaded by the target user is not greater than the preset threshold, acquiring query content of the data to be queried from the Druid engine includes:
if the number of the data uploaded by the target user is not larger than the preset threshold, judging whether the uploading time of the data to be inquired is larger than a preset time range threshold;
if the uploading time of the data to be queried is greater than the preset time range threshold, querying, in the Druid engine, the data aggregated according to a first preset period to obtain a first query result; and querying the data aggregated according to a second preset period according to the first query result and the uploading time of the data to be queried, so as to obtain query content of the data to be queried;
and if the uploading time of the data to be queried is not greater than the preset time range threshold, querying the aggregated data according to the second preset period, so as to obtain the query content of the data to be queried.
In a second aspect, an embodiment of the present application provides a data query apparatus, including:
the acquisition module is used for acquiring a query request instruction of data to be queried, wherein the data to be queried is uploaded by a target user and comprises a user identifier of the target user;
the judging module is used for judging whether the quantity of the data uploaded by the target user is greater than a preset threshold value or not according to the query request instruction;
the first processing module is used for acquiring query content of the data to be queried from a ClickHouse engine if the quantity of the data uploaded by the target user is greater than the preset threshold value;
and the second processing module is used for acquiring the query content of the data to be queried from the Druid engine if the quantity of the data uploaded by the target user is not greater than the preset threshold value.
Optionally, the apparatus further comprises:
the storage module is used for acquiring the quantity of the data uploaded by the target user, and storing the user identifier of the target user into a preset database if the quantity of the data uploaded by the target user is greater than the preset threshold;
the judgment module is specifically configured to:
and inquiring whether the user identification of the target user exists in the preset database or not according to the inquiry request instruction, wherein if the user identification of the target user exists in the preset database, the number of the data uploaded by the target user is larger than the preset threshold, and otherwise, the number of the data uploaded by the target user is not larger than the preset threshold.
Optionally, the Druid engine includes a Druid main cluster and a Druid secondary cluster, where the Druid main cluster and the Druid secondary cluster are both used to store data that has been uploaded by a target user, and the data stored in the Druid main cluster and the data stored in the Druid secondary cluster are the same, and the second processing module is specifically configured to:
if the user identification of the target user does not exist in the preset database, querying the Druid main cluster;
and after querying the Druid main cluster, querying a Druid secondary cluster when a first preset condition is reached.
Optionally, the first preset condition includes:
the time for querying the Druid main cluster is greater than a first preset time threshold; and/or
the failure probability of querying the Druid main cluster is not less than a first preset proportion threshold.
Optionally, the data to be queried includes an upload time of the data to be queried, in the Druid engine, storage manners of the uploaded data include month aggregation and day aggregation, and the second processing module includes:
the judging submodule is used for judging whether the uploading time of the data to be inquired is greater than a preset time range threshold value or not if the quantity of the data uploaded by the target user is not greater than the preset threshold value;
the first processing sub-module is used for querying, in the Druid engine, the data aggregated according to a first preset period to obtain a first query result if the uploading time of the data to be queried is greater than the preset time range threshold; and querying the data aggregated according to a second preset period according to the first query result and the uploading time of the data to be queried, so as to obtain query content of the data to be queried;
and the second processing sub-module is used for querying the aggregated data according to the second preset period if the uploading time of the data to be queried is not greater than the preset time range threshold value, so as to obtain the query content of the data to be queried.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a communication interface, a memory, and a communication bus, wherein:
the processor, the communication interface and the memory complete mutual communication through a communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the data query method according to any one of the first aspect described above when executing a program stored in the memory.
In a fourth aspect, an embodiment of the present application provides a storage medium, where instructions are stored in the storage medium, and when the instructions are executed on a computer, the instructions cause the computer to execute the data query method according to any one of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to perform the data query method of any one of the above first aspects.
According to the data query method, the data query device, the electronic equipment, the storage medium, and the computer program product containing instructions provided by the embodiments of the present application, whether the quantity of data uploaded by a target user is greater than a preset threshold is judged according to the query request instruction. If it is, the query content of the data to be queried is obtained from a ClickHouse engine; if it is not, the query content is obtained from a Druid engine. Using the quantity of data uploaded by the target user to choose between the two query engines improves query performance and shortens query time. Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1a is a first schematic diagram of a data query method according to an embodiment of the present application;
FIG. 1b is a second schematic diagram of a data query method according to an embodiment of the present application;
FIG. 1c is an interactive schematic diagram of a data query method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a data query device according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to improve the performance of data query and shorten the time for querying data, the application discloses a data query method, which comprises the following steps:
acquiring a query request instruction of data to be queried, wherein the data to be queried is uploaded by a target user and comprises a user identifier of the target user;
judging whether the quantity of the data uploaded by the target user is greater than a preset threshold value or not according to the query request instruction;
if the quantity of the data uploaded by the target user is larger than the preset threshold value, acquiring query content of the data to be queried from a ClickHouse engine;
and if the quantity of the data uploaded by the target user is not greater than the preset threshold value, acquiring query content of the data to be queried from the Druid engine.
In this method, whether the quantity of data uploaded by the target user is greater than a preset threshold is judged according to the query request instruction. If it is, the query content of the data to be queried is obtained from a ClickHouse engine; if it is not, the query content is obtained from a Druid engine. Using the quantity of data uploaded by the target user to choose between the two query engines improves query performance and shortens query time.
An embodiment of the present application provides a data query method, and referring to fig. 1a, fig. 1a is a first schematic diagram of the data query method according to the embodiment of the present application, including the following steps:
step 110, obtaining a query request instruction of data to be queried, where the data to be queried is uploaded by a target user, and the data to be queried includes a user identifier of the target user.
The data query method of the embodiment of the application can be implemented by an electronic device, and specifically, the electronic device can be a server or the like.
A user may upload data to a preset database. For example, on a video website, a user uploads video data to the website's database every day. Each piece of uploaded data includes the user identifier of the uploading user (for example, the user name), and user identifiers correspond one-to-one with users, so that one user identifier corresponds to only one user. Each user can upload multiple pieces of data and can later query the data uploaded in the past. When such a query is made, a query request instruction of the data to be queried is obtained, where the data to be queried was uploaded by a target user and includes the user identifier of the target user.
And step 120, judging whether the quantity of the data uploaded by the target user is greater than a preset threshold value according to the query request instruction.
Because the data to be queried includes the user identifier of the target user, the quantity of data uploaded by the target user can be obtained from that identifier and then compared with the preset threshold. The preset threshold may be set according to actual needs, for example by testing the performance of the ClickHouse engine and the Druid engine. If the Druid engine outperforms the ClickHouse engine when the user queries within fewer than 8000 pieces of data, and the ClickHouse engine outperforms the Druid engine when the user queries within more than 8000 pieces of data, the preset threshold may be set to 8000. The specific value is not limited herein.
Step 130, if the number of the data uploaded by the target user is greater than a preset threshold, acquiring query content of the data to be queried from a ClickHouse engine.
ClickHouse is an open-source, column-oriented database management system for online analytical processing (OLAP). When the quantity of data uploaded by the target user is greater than the preset threshold, the query content of the data to be queried is obtained from the ClickHouse engine. When providing a data query service, the ClickHouse engine is less affected by data volume and performs well on large-volume queries, so the query time for large data volumes can be shortened.
Step 140, if the number of the data uploaded by the target user is not greater than a preset threshold, obtaining query content of the data to be queried from the Druid engine.
Druid is a highly fault-tolerant, high-performance, open-source distributed system for real-time querying and analysis of big data. When the quantity of data uploaded by the target user is not greater than the preset threshold, the query content of the data to be queried is obtained from the Druid engine. When providing a data query service, the Druid engine is less affected by the time range and performs well on queries over a small data volume and a wide time range, so the query time can be shortened.
In summary, steps 110 to 140 judge, according to the query request instruction, whether the quantity of data uploaded by the target user is greater than the preset threshold, obtain the query content from the ClickHouse engine if it is, and obtain it from the Druid engine if it is not. Combining the two query engines in this way, based on the quantity of data uploaded by the target user, improves query performance and shortens query time.
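The routing decision of steps 110 to 140 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the `user_id` request key, the `upload_counts` mapping, and the callable engine stubs are all assumptions introduced here, and the value 8000 is only the example threshold mentioned in the description.

```python
from typing import Callable, Dict

PRESET_THRESHOLD = 8000  # example value from the description; set per deployment


def choose_engine(uploaded_count: int, threshold: int = PRESET_THRESHOLD) -> str:
    """Pick the engine name for a user who has uploaded `uploaded_count` records."""
    # Strictly greater than the threshold goes to ClickHouse; otherwise Druid.
    return "clickhouse" if uploaded_count > threshold else "druid"


def handle_query(request: dict,
                 upload_counts: Dict[str, int],
                 engines: Dict[str, Callable[[dict], list]]) -> list:
    """Route one query request (hypothetical shape: {"user_id": ...})."""
    user_id = request["user_id"]             # step 110: identifier from the request
    count = upload_counts.get(user_id, 0)    # step 120: uploaded quantity
    engine_name = choose_engine(count)       # steps 130/140: pick the engine
    return engines[engine_name](request)
```

Here `engines` maps an engine name to any callable that executes the query, so real ClickHouse or Druid clients could be plugged in without changing the routing logic.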
Referring to fig. 1b, fig. 1b is a second schematic view of a data query method according to an embodiment of the present application, in a possible implementation manner, before the step of determining, according to the query request instruction, whether the number of data uploaded by the target user is greater than a preset threshold, the method further includes:
step 100, acquiring the quantity of data uploaded by a target user, and storing a user identifier of the target user into a preset database if the quantity of the data uploaded by the target user is greater than the preset threshold;
the determining whether the quantity of the data uploaded by the target user is greater than a preset threshold includes:
step 121, querying whether the user identifier of the target user exists in the preset database according to the query request instruction, wherein if the user identifier of the target user exists in the preset database, it indicates that the quantity of the data uploaded by the target user is greater than the preset threshold, and otherwise, it indicates that the quantity of the data uploaded by the target user is not greater than the preset threshold.
If the number of the data uploaded by the target user is greater than a preset threshold, storing the user identifier of the target user into a preset database, then detecting whether the user identifier of the target user exists in the preset database according to the query request instruction, if the user identifier of the target user exists in the preset database, indicating that the number of the data uploaded by the target user is greater than the preset threshold, otherwise indicating that the number of the data uploaded by the target user is not greater than the preset threshold.
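Step 100 and step 121 can be illustrated with a small sketch in which an in-memory set stands in for the preset database. The names `build_heavy_user_set` and `exceeds_threshold`, and the representation of the upload log as one user identifier per uploaded record, are assumptions made only for this example.

```python
from collections import Counter
from typing import Iterable, Set


def build_heavy_user_set(upload_log: Iterable[str], threshold: int) -> Set[str]:
    """Step 100: keep the identifiers of users whose upload count exceeds the threshold.

    `upload_log` is assumed to yield one user identifier per uploaded record.
    """
    counts = Counter(upload_log)
    return {user_id for user_id, n in counts.items() if n > threshold}


def exceeds_threshold(user_id: str, heavy_users: Set[str]) -> bool:
    """Step 121: presence in the preset store means "greater than the threshold"."""
    return user_id in heavy_users
```

The point of the design is that the count comparison happens once, ahead of time, so that each query only needs a membership test.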
In a possible implementation manner, the obtaining the number of the data uploaded by the target user, and storing the user identifier of the target user in a preset database if the number of the data uploaded by the target user is greater than the preset threshold includes:
and acquiring the quantity of the data uploaded by the target user, and storing the user identification of the target user into a bit array of a preset bloom filter if the quantity of the data uploaded by the target user is greater than a preset threshold value.
The preset database may be a bloom filter, which consists of a bit array and a set of hash functions. For example, if the quantity of data uploaded by user A is greater than the preset threshold of 8000, the user identifier of user A (for example, the user account) is mapped onto the bit array through the hash functions of the bloom filter. When data is later queried, it is determined whether the user identifier of the user to which the data to be queried belongs exists in the bit array. Assume that there are 3 hash functions. First, the bit array is initialized by setting every bit to 0. Then, for each user identifier to be stored, each hash function maps the identifier once; each mapping produces a hash value that corresponds to one position in the bit array, and that position is set to 1. To check whether the user identifier of the target user exists in the set, the identifier is mapped to 3 positions in the bit array in the same way. If any of the 3 positions is not 1, the user identifier is determined to be absent from the set; if all 3 positions are 1, the user identifier is considered to exist in the set (a bloom filter can report a false positive but never a false negative). For details of how user identifiers are stored in the bit array of a preset bloom filter, reference may be made to methods in the related art, which are not described herein again. Using a bloom filter enables efficient storage and lookup, which improves query performance and further shortens query time.
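The bloom filter procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the method of the related art the text refers to: the class name, the array size of 1024, and the use of SHA-256 salted with the hash index to obtain 3 hash functions are all choices made here for the example.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Toy bloom filter: a bit array plus `num_hashes` hash functions."""

    def __init__(self, size: int = 1024, num_hashes: int = 3) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)  # one byte per position, all initialised to 0

    def _positions(self, item: str) -> Iterator[int]:
        # Assumption: simulate independent hash functions by salting SHA-256
        # with the index of the hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        # Set every mapped position to 1.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # Any 0 means definitely absent; all 1s means probably present.
        return all(self.bits[pos] for pos in self._positions(item))
```

A caller would `add` the identifier of every user above the threshold and then route a query to ClickHouse when `might_contain` returns true. A false positive only sends a light user to ClickHouse, which affects performance but not correctness of the result.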
In one possible embodiment, the method for acquiring query content of data to be queried from a Druid engine includes:
if the user identification of the target user does not exist in the preset database, querying the Druid main cluster;
after the Druid main cluster is queried, when a first preset condition is reached, a Druid secondary cluster is queried.
The Druid engine comprises a Druid main cluster and a Druid secondary cluster. If the user identification of the target user does not exist in the preset database, the Druid main cluster is queried first. After the Druid main cluster is queried, when a first preset condition is reached, for example when the time for querying the Druid main cluster is greater than a preset time threshold or the success probability of querying the Druid main cluster is less than a preset success rate threshold, the Druid secondary cluster is queried. Because both clusters store the same data, the secondary cluster serves as a backup, which improves the reliability and accuracy of data queries.
In a possible implementation, the first preset condition includes:
the time for querying the Druid main cluster is greater than a first preset time threshold; and/or
the failure probability of querying the Druid main cluster is not less than a first preset proportion threshold.
And preferentially inquiring the Druid main cluster, and inquiring the Druid secondary cluster when the time for inquiring the Druid main cluster is greater than a first preset time threshold, or the failure probability for inquiring the Druid main cluster is not less than a first preset proportion threshold, or the time for inquiring the Druid main cluster is greater than the first preset time threshold and the failure probability for inquiring the Druid main cluster is not less than the first preset proportion threshold, so that the backup of data is realized, and the reliability and the accuracy of data inquiry are improved.
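The main/secondary fallback can be sketched as below. This is a hedged illustration only: `FailoverRouter`, the sliding `window` of recent outcomes used to estimate the failure probability, the injectable `clock`, and the decision to fall back whenever the main cluster raises an exception are assumptions introduced here, not details given in the application.

```python
import time
from collections import deque
from typing import Any, Callable, Deque


class FailoverRouter:
    """Query the main cluster first; use the secondary when a preset condition is met."""

    def __init__(self, primary: Callable[[dict], Any], secondary: Callable[[dict], Any],
                 max_seconds: float, max_failure_ratio: float,
                 window: int = 100, clock: Callable[[], float] = time.monotonic) -> None:
        self.primary = primary
        self.secondary = secondary
        self.max_seconds = max_seconds              # first preset time threshold
        self.max_failure_ratio = max_failure_ratio  # first preset proportion threshold
        self.clock = clock
        self.outcomes: Deque[bool] = deque(maxlen=window)  # True means failure

    def failure_ratio(self) -> float:
        # Assumption: failure probability is estimated over the recent window.
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def query(self, request: dict) -> Any:
        start = self.clock()
        try:
            result = self.primary(request)
            failed = False
        except Exception:
            result, failed = None, True
        elapsed = self.clock() - start
        self.outcomes.append(failed)

        too_slow = elapsed > self.max_seconds
        too_flaky = self.failure_ratio() >= self.max_failure_ratio
        # Assumption: a failed main query also falls back, since it has no result.
        if failed or too_slow or too_flaky:
            return self.secondary(request)
        return result
```

Note that this sketch waits for the main cluster to return before measuring the elapsed time; a production version would more likely enforce the time threshold with a client-side timeout. The same structure would apply to the ClickHouse main and secondary clusters with the second preset thresholds.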
In a possible implementation manner, the data to be queried includes an upload time of the data to be queried, the uploaded data is stored in the Druid engine in a manner of monthly aggregation and daily aggregation, and if the number of the uploaded data of the target user is not greater than the preset threshold, the obtaining, from the Druid engine, query content of the data to be queried includes:
if the uploading time of the data to be queried is greater than the preset time range threshold, querying, from the Druid engine, the data aggregated according to a first preset period to obtain a first query result; and querying the data aggregated according to a second preset period according to the first query result and the uploading time of the data to be queried, so as to obtain the query content of the data to be queried;
and if the uploading time of the data to be inquired is not more than the preset time range threshold, inquiring the aggregated data according to the second preset period so as to acquire the inquiry content of the data to be inquired.
The data to be queried also includes its upload time, so a user can query data by upload time. Because the performance of the Druid engine is little affected by the time range, uploaded data is stored in the Druid engine aggregated by upload time over preset time periods, for example by month and by day, which makes querying convenient. When data is queried through the Druid engine, it may first be determined whether the upload time of the data to be queried exceeds a preset time range threshold, for example whether the data was uploaded more than 6 months ago. If so, the data aggregated over the larger period is queried first, for example the monthly aggregated data, to obtain a monthly query result; the daily aggregated data is then queried according to that monthly result and the upload time of the data to be queried. If the data was uploaded no more than 6 months ago, the daily aggregated data is queried directly. This shortens the time for querying data.
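One possible reading of the two-stage lookup is sketched below. The data model is entirely hypothetical: `monthly` maps a (year, month) pair to a record count, `daily` maps a date to a list of record identifiers, and 180 days approximates the 6-month example. The application does not specify how the monthly result narrows the daily query, so the early return on an empty month is an assumption.

```python
from datetime import date
from typing import Dict, List, Tuple

RANGE_THRESHOLD_DAYS = 180  # roughly the 6 months used as the example in the text


def query_by_upload_time(monthly: Dict[Tuple[int, int], int],
                         daily: Dict[date, List[str]],
                         uploaded: date,
                         today: date) -> List[str]:
    """Return the record identifiers uploaded on `uploaded`."""
    age_days = (today - uploaded).days
    if age_days > RANGE_THRESHOLD_DAYS:
        # Stage 1 (first preset period): check the coarse monthly aggregate.
        if monthly.get((uploaded.year, uploaded.month), 0) == 0:
            return []  # the month holds nothing, so the daily data is not touched
    # Stage 2 (second preset period), or the direct path for recent data:
    # look up the daily aggregate for the upload date.
    return list(daily.get(uploaded, []))
```

For old data, the cheap monthly check can rule out a query before the finer-grained daily data is read; recent data skips that step and goes straight to the daily aggregate.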
In a possible implementation manner, the ClickHouse engine includes a ClickHouse main cluster and a ClickHouse sub cluster, where the ClickHouse main cluster and the ClickHouse sub cluster are both used to store data that has been uploaded by a user, and the data stored in the ClickHouse main cluster and the data stored in the ClickHouse sub cluster are the same, and if the number of data that has been uploaded by the user to which the data to be queried belongs is greater than a preset threshold, the obtaining query content of the data to be queried from the ClickHouse engine includes:
and after the ClickHouse main cluster is queried, querying a ClickHouse secondary cluster when a second preset condition is reached.
The ClickHouse engine includes a ClickHouse main cluster and a ClickHouse sub-cluster. If the user identifier exists in the bit array of the preset bloom filter, the ClickHouse main cluster is queried preferentially; when a second preset condition is reached, for example, when the time for querying the ClickHouse main cluster is greater than a preset time threshold or the success probability of querying the ClickHouse main cluster is less than a preset success rate threshold, the ClickHouse sub-cluster is queried. Data backup is thereby realized, and the reliability and accuracy of the data query are improved.
In a possible implementation, the second preset condition includes:
the time for querying the ClickHouse main cluster is greater than a second preset time threshold; and/or
the failure probability of querying the ClickHouse main cluster is not less than a second preset proportion threshold.
The ClickHouse main cluster is queried preferentially. The ClickHouse sub-cluster is queried when the time for querying the ClickHouse main cluster is greater than the second preset time threshold, or when the failure probability of querying the ClickHouse main cluster is not less than the second preset proportion threshold, or when both hold. Data backup is thereby realized, and the reliability and accuracy of the data query are improved.
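The main-then-secondary behaviour under the second preset condition might look like the sketch below. The function names, the default thresholds, and the choice to treat an exception from the main cluster as a reason to fall back are assumptions made for illustration; the two cluster queries are passed in as plain callables, so no ClickHouse client API is implied.

```python
import time


def second_condition_met(elapsed_s, failure_rate, time_threshold_s, rate_threshold):
    """Time above its threshold and/or failure rate not below its threshold."""
    return elapsed_s > time_threshold_s or failure_rate >= rate_threshold


def query_with_failover(query_primary, query_secondary, failure_rate,
                        time_threshold_s=2.0, rate_threshold=0.5,
                        clock=time.monotonic):
    """Query the main cluster first; use the secondary when the condition holds.

    query_primary / query_secondary are hypothetical zero-argument callables.
    failure_rate is the currently observed failure probability of the main
    cluster, supplied by the caller.
    """
    start = clock()
    try:
        result = query_primary()
    except Exception:
        # Assumption: a failed main-cluster query also triggers the fallback.
        return query_secondary()
    elapsed = clock() - start
    if second_condition_met(elapsed, failure_rate, time_threshold_s, rate_threshold):
        return query_secondary()
    return result
```

This version checks the elapsed time only after the main cluster has answered, which follows the wording "after the main cluster is queried"; a production system would more likely enforce the time threshold as a request timeout.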
The method and the apparatus are suited to querying the data to be queried over a free time range; when data is queried over a fixed time range, the data to be queried may instead be queried through the ElasticSearch distributed full-text retrieval engine.
Referring to fig. 1c, fig. 1c is an interactive schematic diagram of a data query method according to an embodiment of the present application. A query request instruction of data to be queried is obtained, and it is determined whether the query range of the data to be queried is a fixed time range. If the query range is a fixed time range, the data to be queried is queried through the ElasticSearch distributed full-text retrieval engine. If the query range is not a fixed time range, that is, the query is a free time range query, it is determined according to the query request instruction whether the amount of data uploaded by the target user is greater than a preset threshold. If the amount of data uploaded by the target user is greater than the preset threshold, the query content of the data to be queried is obtained from the ClickHouse engine; specifically, the ClickHouse engine includes a ClickHouse main cluster and a ClickHouse sub-cluster, the ClickHouse main cluster is queried preferentially, and the ClickHouse sub-cluster is queried when a second preset condition is reached. If the amount of data uploaded by the target user is not greater than the preset threshold, the query content of the data to be queried is obtained from the Druid engine: the Druid main cluster is queried, and the Druid sub-cluster is queried when a first preset condition is reached.
Specifically, when the Druid main cluster and the Druid sub-cluster are queried, the query may be performed according to the uploading time of the data to be queried. If the uploading time of the data to be queried is greater than the preset time range threshold, the monthly aggregated data is queried from the Druid engine to obtain a monthly aggregated query result, and the daily aggregated data is then queried according to the monthly aggregated query result and the uploading time of the data to be queried, thereby obtaining the query content of the data to be queried. If the uploading time of the data to be queried is not greater than the preset time range threshold, the daily aggregated data is queried, thereby obtaining the query content of the data to be queried. In summary, whether the amount of data uploaded by the target user is greater than a preset threshold is determined according to the query request instruction; if so, the query content of the data to be queried is obtained from the ClickHouse engine, and otherwise it is obtained from the Druid engine. The two query engines are thus used in combination, selected by the amount of data uploaded by the target user, which improves query performance and shortens the time for querying data.
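The routing decision of fig. 1c can be summarized in a few lines. The sketch below is a hedged illustration only: `choose_engine`, its parameters, and the use of an in-memory set for the users whose uploaded data exceeds the preset threshold are assumptions introduced here.

```python
def choose_engine(user_id, fixed_time_range, heavy_users):
    """Pick the query engine for one request.

    heavy_users is a hypothetical collection of user identifiers whose
    uploaded data amount exceeds the preset threshold.
    """
    if fixed_time_range:
        # Fixed time range queries go to the full-text retrieval engine.
        return "elasticsearch"
    if user_id in heavy_users:
        # Free time range, large uploader.
        return "clickhouse"
    # Free time range, small uploader.
    return "druid"


if __name__ == "__main__":
    heavy = {"user-42"}
    print(choose_engine("user-42", False, heavy))
    print(choose_engine("user-7", False, heavy))
    print(choose_engine("user-42", True, heavy))
```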
An apparatus is further provided in the embodiment of the present application, referring to fig. 2, where fig. 2 is a schematic diagram of a data query apparatus in the embodiment of the present application, where the apparatus includes:
an acquisition module 210, configured to acquire a query request instruction of data to be queried, where the data to be queried is uploaded by a target user, and the data to be queried includes a user identifier of the target user;
a determining module 220, configured to determine, according to the query request instruction, whether the quantity of the data uploaded by the target user is greater than a preset threshold;
a first processing module 230, configured to obtain query content of the data to be queried from a ClickHouse engine if the number of the data that has been uploaded by the target user is greater than the preset threshold;
the second processing module 240 is configured to obtain query content of the data to be queried from the Druid engine if the number of the data uploaded by the target user is not greater than the preset threshold.
In a possible embodiment, the above apparatus further comprises:
the storage module is used for acquiring the quantity of the data uploaded by the target user, and storing the user identifier of the target user into a preset database if the quantity of the data uploaded by the target user is greater than the preset threshold;
the determining module 220 is specifically configured to:
and inquiring whether the user identification of the target user exists in the preset database or not according to the inquiry request instruction, wherein if the user identification of the target user exists in the preset database, the number of the data uploaded by the target user is larger than the preset threshold, and otherwise, the number of the data uploaded by the target user is not larger than the preset threshold.
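The description elsewhere refers to the bit array of a preset bloom filter, so one possible realization of the preset database is a Bloom filter keyed by user identifier. The following is a minimal sketch under that assumption; the class, its sizes, the SHA-256 based hashing, and the helper `register_if_heavy` are illustrative choices, not details taken from the application. Note that a Bloom filter can report false positives, in which case a small uploader would be routed to ClickHouse.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over string keys (illustrative sizes)."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def register_if_heavy(bloom, user_id, uploaded_count, threshold):
    """Storage step: remember users whose upload count exceeds the threshold."""
    if uploaded_count > threshold:
        bloom.add(user_id)
```

At query time the determining step then reduces to a membership test such as `user_id in bloom`.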
In a possible implementation manner, the Druid engine includes a Druid main cluster and a Druid sub cluster, the Druid main cluster and the Druid sub cluster are both used to store data uploaded by a target user, and the Druid main cluster and the Druid sub cluster store the same data, and the second processing module is specifically configured to:
if the user identification of the target user does not exist in the preset database, querying the Druid main cluster;
after the Druid main cluster is queried, when a first preset condition is reached, a Druid secondary cluster is queried.
In a possible implementation, the first preset condition includes:
the time for querying the Druid main cluster is greater than a first preset time threshold; and/or
the failure probability of querying the Druid main cluster is not less than a first preset proportion threshold.
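The text does not say how the failure probability of querying the main cluster is measured. One simple possibility, shown here purely as an assumption, is to estimate it from a sliding window of recent query outcomes; `FailureRateTracker`, the window size, and `first_condition_met` are hypothetical names.

```python
from collections import deque


class FailureRateTracker:
    """Estimate the failure probability from the most recent outcomes."""

    def __init__(self, window=100):
        # Only the last `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)

    def record(self, succeeded):
        self.outcomes.append(bool(succeeded))

    def failure_rate(self):
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)


def first_condition_met(elapsed_s, tracker, time_threshold_s, rate_threshold):
    """Query time above its threshold and/or failure rate not below its threshold."""
    return elapsed_s > time_threshold_s or tracker.failure_rate() >= rate_threshold
```

Each main-cluster query would call `record` with its outcome, and the result of `first_condition_met` would decide whether the Druid sub-cluster is queried.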
In a possible implementation manner, the data to be queried includes an upload time of the data to be queried, in the Druid engine, the storage manner of the uploaded data includes a monthly aggregation and a daily aggregation, and the second processing module 240 includes:
the judging submodule is used for judging whether the uploading time of the data to be inquired is greater than a preset time range threshold value or not if the quantity of the data uploaded by the target user is not greater than the preset threshold value;
the first processing sub-module is used for querying, from the Druid engine, the data aggregated according to a first preset period to obtain a first query result if the uploading time of the data to be queried is greater than the preset time range threshold; and querying the data aggregated according to a second preset period according to the first query result and the uploading time of the data to be queried, so as to obtain the query content of the data to be queried;
and the second processing sub-module is used for querying the aggregated data according to the second preset period if the uploading time of the data to be queried is not greater than the preset time range threshold, so as to obtain the query content of the data to be queried.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
An embodiment of the present application further provides an electronic device, referring to fig. 3, where fig. 3 is a schematic diagram of the electronic device according to the embodiment of the present application, and the electronic device includes: a processor 310, a communication interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other through the communication bus 340, and the memory 330 is used for storing a computer program;
the processor 310 is configured to implement the following steps when executing the computer program stored in the memory 330:
acquiring a query request instruction of data to be queried, wherein the data to be queried is uploaded by a target user and comprises a user identifier of the target user;
judging whether the quantity of the data uploaded by the target user is greater than a preset threshold value or not according to the query request instruction;
if the quantity of the data uploaded by the target user is larger than the preset threshold value, acquiring query content of the data to be queried from a ClickHouse engine;
and if the quantity of the data uploaded by the target user is not greater than the preset threshold value, acquiring query content of the data to be queried from the Druid engine.
Optionally, the processor 310, when being configured to execute the program stored in the memory 330, may also implement any of the above data query methods.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
In an embodiment of the present application, there is also provided a storage medium having instructions stored therein, which when run on a computer, cause the computer to execute any of the above-mentioned data query methods in the above-mentioned embodiments.
In an embodiment of the present application, there is also provided a computer program product containing instructions, which when run on a computer, cause the computer to execute any of the above-mentioned data query methods in the above-mentioned embodiments.
The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another, e.g., from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any computer-accessible medium or integrated Solid State or multi-media storage medium (e.g., optical Disk.
It should be noted that, in this document, the technical features in the various alternatives can be combined to form the scheme as long as the technical features are not contradictory, and the scheme is within the scope of the disclosure of the present application. Relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the same element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiments of the apparatus, the electronic device, and the storage medium, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (10)

1. A method for data query, the method comprising:
acquiring a query request instruction of data to be queried, wherein the data to be queried is uploaded by a target user and comprises a user identifier of the target user;
judging whether the quantity of the data uploaded by the target user is greater than a preset threshold value or not according to the query request instruction;
if the quantity of the data uploaded by the target user is larger than the preset threshold value, acquiring query content of the data to be queried from a ClickHouse engine;
and if the quantity of the data uploaded by the target user is not greater than the preset threshold value, acquiring query content of the data to be queried from the Druid engine.
2. The method according to claim 1, wherein before the step of determining whether the amount of data uploaded by the target user is greater than a preset threshold according to the query request instruction, the method further comprises:
acquiring the quantity of the data uploaded by a target user, and storing the user identification of the target user into a preset database if the quantity of the data uploaded by the target user is greater than the preset threshold;
the judging whether the quantity of the data uploaded by the target user is greater than a preset threshold value includes:
and inquiring whether the user identification of the target user exists in the preset database or not according to the inquiry request instruction, wherein if the user identification of the target user exists in the preset database, the number of the data uploaded by the target user is larger than the preset threshold, and otherwise, the number of the data uploaded by the target user is not larger than the preset threshold.
3. The method according to claim 2, wherein the Druid engine includes a Druid main cluster and a Druid sub-cluster, the Druid main cluster and the Druid sub-cluster are both used to store the data uploaded by the target user, and the data stored in the Druid main cluster and the Druid sub-cluster are the same, and if the amount of the data uploaded by the target user is not greater than the preset threshold, the obtaining the query content of the data to be queried from the Druid engine includes:
if the user identification of the target user does not exist in the preset database, querying the Druid main cluster;
and after querying the Druid main cluster, querying a Druid secondary cluster when a first preset condition is reached.
4. The method according to claim 3, wherein the first preset condition comprises:
the time for querying the Druid main cluster is greater than a first preset time threshold; and/or
the failure probability of querying the Druid main cluster is not less than a first preset proportion threshold.
5. The method according to claim 1, wherein the data to be queried includes an upload time of the data to be queried, the storage manner of the uploaded data in the Druid engine includes a monthly aggregation and a daily aggregation, and if the number of the uploaded data of the target user is not greater than the preset threshold, the obtaining the query content of the data to be queried from the Druid engine includes:
if the number of the data uploaded by the target user is not larger than the preset threshold, judging whether the uploading time of the data to be inquired is larger than a preset time range threshold;
if the uploading time of the data to be queried is greater than the preset time range threshold, querying, from the Druid engine, the data aggregated according to a first preset period to obtain a first query result; and querying the data aggregated according to a second preset period according to the first query result and the uploading time of the data to be queried, so as to obtain query content of the data to be queried;
and if the uploading time of the data to be queried is not greater than the preset time range threshold, querying the aggregated data according to the second preset period, so as to obtain the query content of the data to be queried.
6. A data query apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a query request instruction of data to be queried, wherein the data to be queried is uploaded by a target user, and the data to be queried comprises a user identifier of the target user;
the judging module is used for judging whether the quantity of the data uploaded by the target user is greater than a preset threshold value or not according to the query request instruction;
the first processing module is used for acquiring query content of the data to be queried from a ClickHouse engine if the quantity of the data uploaded by the target user is greater than the preset threshold value;
and the second processing module is used for acquiring the query content of the data to be queried from the Druid engine if the quantity of the data uploaded by the target user is not greater than the preset threshold value.
7. The apparatus of claim 6, further comprising:
the storage module is used for acquiring the quantity of the data uploaded by the target user, and storing the user identifier of the target user into a preset database if the quantity of the data uploaded by the target user is greater than the preset threshold;
the judgment module is specifically configured to:
and inquiring whether the user identification of the target user exists in the preset database or not according to the inquiry request instruction, wherein if the user identification of the target user exists in the preset database, the number of the data uploaded by the target user is larger than the preset threshold, and otherwise, the number of the data uploaded by the target user is not larger than the preset threshold.
8. The apparatus of claim 7, wherein the Druid engine comprises a Druid main cluster and a Druid sub cluster, the Druid main cluster and the Druid sub cluster are both configured to store data uploaded by a target user, and the Druid main cluster and the Druid sub cluster store the same data, and the second processing module is specifically configured to:
if the user identification of the target user does not exist in the preset database, querying the Druid main cluster;
and after querying the Druid main cluster, querying a Druid secondary cluster when a first preset condition is reached.
9. An electronic device, comprising: a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the data query method of any one of claims 1 to 5 when executing a program stored in the memory.
10. A storage medium, characterized in that the storage medium has stored therein a computer program which, when executed by a processor, implements the data query method of any one of claims 1-5.
CN202010269966.0A 2020-04-08 2020-04-08 Data query method and device, electronic equipment and storage medium Pending CN111488377A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010269966.0A CN111488377A (en) 2020-04-08 2020-04-08 Data query method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010269966.0A CN111488377A (en) 2020-04-08 2020-04-08 Data query method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111488377A true CN111488377A (en) 2020-08-04

Family

ID=71794674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010269966.0A Pending CN111488377A (en) 2020-04-08 2020-04-08 Data query method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111488377A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150199351A1 (en) * 2014-01-15 2015-07-16 Microsoft Corporation Automated Multimedia Content Recognition
CN109033123A (en) * 2018-05-31 2018-12-18 康键信息技术(深圳)有限公司 Querying method, device, computer equipment and storage medium based on big data
CN109977308A (en) * 2019-03-20 2019-07-05 北京字节跳动网络技术有限公司 Construction method, device, storage medium and the electronic equipment of user group's portrait
CN110008228A (en) * 2019-03-26 2019-07-12 北京字节跳动网络技术有限公司 Acquisition methods and device, the storage medium and electronic equipment of user group's data

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112084216A (en) * 2020-09-16 2020-12-15 上海宏路数据技术股份有限公司 Data query system based on bloom filter
CN112084216B (en) * 2020-09-16 2021-05-11 上海嗨普智能信息科技股份有限公司 Data query system based on bloom filter
CN112650759A (en) * 2020-12-30 2021-04-13 中国平安人寿保险股份有限公司 Data query method and device, computer equipment and storage medium
CN112650759B (en) * 2020-12-30 2023-10-27 中国平安人寿保险股份有限公司 Data query method, device, computer equipment and storage medium
CN112667607A (en) * 2021-01-18 2021-04-16 中国民航信息网络股份有限公司 Historical data management method and related equipment
CN112667607B (en) * 2021-01-18 2024-02-27 中国民航信息网络股份有限公司 Historical data management method and related equipment
CN113743975A (en) * 2021-01-29 2021-12-03 北京沃东天骏信息技术有限公司 Advertisement effect processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination