CN115145953A - Data query method - Google Patents

Data query method Download PDF

Info

Publication number
CN115145953A
CN115145953A CN202210751435.4A CN202210751435A CN115145953A CN 115145953 A CN115145953 A CN 115145953A CN 202210751435 A CN202210751435 A CN 202210751435A CN 115145953 A CN115145953 A CN 115145953A
Authority
CN
China
Prior art keywords
query
data
module
inquiry
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210751435.4A
Other languages
Chinese (zh)
Inventor
叶杨
陈伟
王维军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhuochen Info Tech Co ltd
Original Assignee
Shanghai Zhuochen Info Tech Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhuochen Info Tech Co ltd filed Critical Shanghai Zhuochen Info Tech Co ltd
Publication of CN115145953A publication Critical patent/CN115145953A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data query method, which is applied to a self-adaptive optimization retrieval performance database for big data storage and mainly comprises the following steps: step 1: inputting a query request, and receiving and analyzing the query request by a query module to obtain a query condition; step (ii) of 2: judging whether the same query conditions exist in the cache module, if so, directly obtaining a query result from the cache module, and if not, entering the step 3; and step 3: adjusting the query resources distributed to each block of data in the storage module by the query module according to the reward and punishment function of the optimization module, and querying to obtain a query result; and 4, step 4: recording the information of each inquired block data in the inquiry process, wherein the information comprises inquiry conditions, inquiry time and inquiry results, and combining the inquiry conditions, the inquiry time and the inquiry results into an inquiry result set; and 5: and caching the query conditions and the query results into a cache module. The invention solves the problem that the prior art can not improve the query efficiency according to the real-time query condition of mass data.

Description

Data query method
The application is a divisional application of Chinese patent application with the application number of 202111291885.1 and the name of "adaptive optimization retrieval performance database and data query method" filed on 3.11 months in 2021.
Technical Field
The invention relates to a data query method, and belongs to the field of big data storage.
Background
Data processing can be broadly divided into two broad categories: online transaction Processing (OLTP) and online Analytical Processing (OLAP). OLTP is the primary application of traditional relational databases, primarily for basic, everyday transactions such as banking transactions. OLAP is a major application of data warehouse systems, supports complex analytical operations, focuses on decision support (and is therefore also referred to as DSS decision support system), and provides intuitive and straightforward query results.
In the OLAP scenario, the most basic and effective optimization of data storage is to store rows instead of columns. Data compression is a common optimization means in the storage field, and the storage space of data on a disk is greatly reduced by controllable CPU overhead, so that the cost can be saved, and the overhead of cross-thread and cross-node network transmission of I/O and data in a memory can be reduced. Compression algorithms are not as high as possible, and algorithms with high compression ratios tend to have slower compression and decompression speeds, requiring tradeoffs between CPU and I/O depending on hardware configuration and usage scenarios. Data encoding may be understood as lightweight compression, including RLE and data dictionary encoding, among others. In the column storage mode, the efficiency of data compression and encoding is much higher than in the row storage mode.
When the OLAP system accesses a large amount of data, the OLAP system is limited by a data storage mode, common query data and non-common data cannot be distinguished, different resources occupied by different query operations are not optimized uniformly, and query efficiency cannot be improved according to the real-time query state of mass data.
In view of the above, it is necessary to provide a new data query method to solve the above problems.
Disclosure of Invention
The invention aims to provide a data query method to solve the problem that the existing data storage system cannot improve the query efficiency according to the real-time query condition of mass data.
In order to achieve the above object, the present invention provides a data query method, which is applied to a self-adaptive optimization retrieval performance database for big data storage, wherein the self-adaptive optimization retrieval performance database comprises a query module, a cache module, an optimization module and a storage module, and the data query method mainly comprises the following steps:
step 1: inputting a query request, and receiving and analyzing the query request by a query module to obtain a query condition;
step 2: judging whether the same query conditions exist in the cache module, if so, directly obtaining a query result from the cache module, and if not, entering the step 3;
and step 3: adjusting the query resources distributed to each block of data in the storage module by the query module according to the reward and punishment function of the optimization module, and querying to obtain a query result;
and 4, step 4: recording the information of each inquired block data in the inquiry process, wherein the information comprises inquiry conditions, inquiry time and inquiry results, and combining the inquiry conditions, the inquiry time and the inquiry results into an inquiry result set;
and 5: caching the query conditions and the query results into a cache module;
wherein, the optimization module evaluates through a reward and punishment function in the step 3The weight alpha of the partition data i in the query process is calculated, and the information entropy of the query instruction is calculated
Figure BDA0003721247630000021
Wherein p is i For inquiry instruction a m Entropy of information in class i, j denotes a m There are j categories in total, and then the conditional information entropy of each query resource is calculated
Figure BDA0003721247630000022
In which resources r are queried n The total of k different attribute values, query resource r n ={r n1 ,r n2 ,…r nk },E(a m |r n ) To query a resource r n Lower a m The conditional information entropy of the query instruction is calculated, and then the information gain G of the n query resources for the m query instructions is calculated m (r n )=I(a m )-E(a m |r n ) Finally, the mth query instruction is obtained by normalization to query the resource r n Weight of (2)
Figure BDA0003721247630000023
Adjusting the query resources distributed to each block data in real time when the query module queries according to the weight alpha of the block data i, wherein the reward and punishment function is specifically as follows:
Figure BDA0003721247630000031
wherein n represents a total of n block data, E (d) represents a time complexity average value at the time of block data query, d i And the time complexity of inquiring the block data i is represented, lambda is a penalty coefficient, and alpha is the weight of the block data i.
As a further improvement of the present invention, the adaptive optimization search performance database further includes an index module that records blocking information of each block data, and step 3 specifically includes:
step 31: concurrently executing filtering of the feature information of the blocks in the query condition to the index module, and summarizing and filtering the obtained feature block data to be queried;
step 32: the feature block data to be inquired are screened in a multithread concurrent execution mode to the storage module, and row indexes of screened screening blocks are obtained;
step 33: and returning a query result.
As a further improvement of the present invention, in step 3, when the weight α of the block data in the reward and punishment function is greater than 1, the forward allocation weight formula of the query resource is as follows:
Figure BDA0003721247630000032
wherein, w mn Querying resource r for mth query instruction n The weight of (c).
As a further improvement of the present invention, in step 3, when the weight α =1 of the block data in the reward and punishment function, the query resource allocated to the block data is not changed.
As a further improvement of the present invention, in step 3, when the weight α of the block data in the reward and punishment function is less than 1, the inverse distribution weight formula of the query resource is:
Figure BDA0003721247630000033
where wmn is the weight of the mth query instruction in the query resource rn.
As a further improvement of the present invention, the adaptive optimization retrieval performance database further includes a data blocking module, and the block data is obtained by the data blocking module through multi-threading or multi-process blocking processing of data to be stored and stored in the storage module.
As a further improvement of the present invention, the data blocking module scans data to be stored and determines a data type of the data to be stored, and then performs blocking processing according to the data type.
As the invention proceedsIn one improvement, the resource allocation optimization of the query resource allocated by the query module to each block of data in the storage module is mainly based on calculating the gain of each query resource to the query instruction, wherein the query resource set is R = { R = 1 ,r 2 ,…,r n Represents that there are n query resources, and the query instruction set is A = { a = 1 ,a 2 ,…,a m Represents that there are m query instructions.
As a further improvement of the present invention, the query resources include but are not limited to thread number, CPU core number, memory and hard disk cache.
As a further improvement of the present invention, the query instruction includes, but is not limited to, the number of scan lines, the execution time, and the number of returned results.
The invention has the beneficial effects that: according to the method, the query module is optimized and updated through the optimization module by using the reward and punishment function, the query resources distributed to each block when the query module queries are adjusted in real time, the query time complexity of each block is changed, the query efficiency is improved, the query process is optimized and retrieved in a self-adaptive manner, and the problem that the query efficiency cannot be improved according to the real-time query condition of mass data in the existing data storage system is solved.
Drawings
FIG. 1 is a block diagram of the architecture of the adaptive optimized search performance database of the present invention.
FIG. 2 is a flow chart of a data query method of the present invention.
FIG. 3 is a detailed flow chart of the query module of the present invention when executing a query.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, the present invention discloses a self-adaptive optimized search performance database 100, which is applied to big data storage and specifically includes the following modules:
the data blocking module 1 is used for blocking data to be stored in a multithreading or multiprocessing mode to obtain blocked data;
the storage module 2 is used for storing the block data in the data block module 1;
the index module 3 is used for recording the blocking information of each block of blocking data when the data to be stored is blocked and forming a data index table;
the query module 4 is used for querying the stored block data;
the cache module 5 is used for caching the query conditions and the query results within preset time;
the optimization module 6 evaluates the query process and the query result through the reward and punishment function, optimizes and updates the query module 4, adjusts the query resources distributed to each block data when the query module 4 is queried in real time, and the reward and punishment function is specifically as follows:
Figure BDA0003721247630000051
wherein n represents a total of n block data, E (d) represents a time complexity average value at the time of block data query, d i And the time complexity of inquiring the block data i is shown, lambda is a penalty coefficient, and alpha is the weight of the block data i.
For one copy of data to be stored, the data blocking module 1 is configured to scan data in the data to be stored in a multi-thread or multi-process manner, determine a data type of the data to be stored, select a corresponding blocking method according to the data type, and block the data to be stored.
The data types of the data to be stored specifically include: structured data and unstructured data.
When the data type of the data to be stored is structured data, namely table data, the data to be stored is logically partitioned, firstly, field contents in the data to be stored are identified, and then, the identified field contents are partitioned according to numerical characteristics or coding formats.
Numerical characteristics include, but are not limited to: and the time, the place, the certificate number, the transaction account number, the amount, the contact way, the ip and other preset basic data attributes. Encoding formats include, but are not limited to: numerical type, character string type, time type (date type), acsii code, utf-8, etc.
And when the block processing is carried out according to the numerical characteristics, the data block is divided according to the main data attributes corresponding to the numerical characteristics. The main data attribute refers to the data attribute which accounts for the largest proportion of the data to be stored. If the main data attribute is a time value, the data field can be processed in blocks according to the day; if the primary data attribute is geographic coordinates, the data fields may be chunked by geographic partition. When the data is blocked, the fine granularity of the block selected is changed according to the self characteristics of the data attribute. If the data volume in the block data after being blocked according to the preset fine granularity is still larger, the blocking fine granularity can be further reduced, and the block data with the larger data volume is further divided into a plurality of block data with smaller data volume.
For example, in an enterprise employee data database, all employees are processed as employee data by rows according to specific attributes such as department, gender, year of employment, identity information and the like, and structured data can be stored in blocks after being blocked according to employee identity information (numerical characteristics such as identity card numbers) or department codes (coding formats).
When the data type of the data to be stored is unstructured data, namely text information, dimension blocking is conducted on the data to be stored, data cube cutting is conducted according to different dimensions of the data to be stored, a plurality of block data are obtained, each block data comprises at least one unstructured data with preset dimensions, and the preset dimensions are at least one-dimensional.
And storing the block data blocked by the data blocking module 1 into a storage module 2, wherein the storage module 2 comprises a plurality of distributed storage nodes, and at least one block data is stored in each distributed storage node.
The index module 3 is used for recording the blocking information of each block of blocking data when the data to be stored is blocked, and forming a data index table.
Specifically, when the data to be stored is blocked, blocking information of each block data is recorded, the blocking information includes but is not limited to a block name, a block number, and a block characteristic, the blocking information is recorded in a block index table associated with the block data, and each recorded data is simultaneously added with an index record.
If the data type of the block data is structured data, tree indexes are built when the indexes are built.
If the data type of the block data is unstructured data, establishing an inverted index when establishing the index, wherein the process of establishing the index is as follows: index module 3-cache module 5-storage module 2.
And summarizing the established block index tables to obtain a current total index set, namely a data index table.
The query module 4 is used for querying the stored data.
The cache module 5 is configured to cache the query condition and the query result within a preset time, where the cache module 5 stores the query condition and the query result of at least one query, and the preset cache time is determined by a client, which is not limited herein. Specifically, in this embodiment, the caching preset time is preferably seven days, and the caching module 5 caches the query condition and the query result for querying within seven days. When the query module 4 queries, the actual query conditions obtained by analysis are compared with the query conditions stored in the cache module 5, and when the query conditions are the same, the stored corresponding query results can be directly obtained from the cache module 5 without scanning and querying the storage module 2, so that the query speed and efficiency can be effectively improved.
The cache module 5 plays a role of storing when the size of the data to be stored is 8-256 GB, and directly stores the data to be stored in the cache module 5, and when the size of the data to be stored is greater than 256GB, only caches the query condition and the query result within a preset time. Of course, the range of "8 to 256GB" is merely illustrated as a preferred embodiment, and in other embodiments, the range may be adjusted according to actual situations, and is not limited herein.
Referring to fig. 2, the present invention further provides a data query method, which is applied to the foregoing adaptive optimization search performance database 100, and mainly includes the following steps:
step 1: inputting a query request, and receiving and analyzing the query request by the query module 4 to obtain a query condition;
step 2: judging whether the same query conditions exist in the cache module 5, if so, directly obtaining a query result from the cache module 5, and if not, entering the step 3;
and step 3: adjusting the query resource distributed by the query module 4 to each block of data in the storage module 2 according to the reward and punishment function of the optimization module 6, and querying to obtain a query result;
and 4, step 4: recording the information of each inquired block data in the inquiry process, wherein the information comprises inquiry conditions, inquiry time and inquiry results, and combining the inquiry conditions, the inquiry time and the inquiry results into an inquiry result set;
and 5: and caching the query conditions and the query results into a cache module 5.
Referring to fig. 3, the specific steps of performing the query in step 3 include:
step 31: concurrently executing filtering of the block feature information in the query condition to the index module 3, and summarizing and filtering the obtained feature block data to be queried;
step 32: the data of the characteristic blocks to be inquired are screened in a multithread concurrent execution mode to the storage module 2, and row indexes of screened blocks are obtained;
step 33: and returning a query result.
The optimization module 6 evaluates the query process and the query result through the reward and punishment function, optimizes and updates the query module 4, and adjusts the query resources distributed to each block of data in real time when the query module 4 queries so as to improve the query efficiency.
The optimization module 6 further establishes a query reward and punishment function corresponding to the block data by acquiring a query result set comprising the query conditions, the query time and the query result in the step 4; and optimizing the resources distributed when each block data is subjected to query operation according to the reward and punishment function, wherein the reward and punishment function aims to enable the query time complexity of each block data to be close to each other, so that the optimal solution of the total query efficiency is obtained.
The cost function of query optimization is as follows, and the smaller the function value is, the optimal query efficiency is represented:
Figure BDA0003721247630000081
wherein n represents a total of n block data, E (d) represents a time complexity average value at the time of block data query, d i And the time complexity of inquiring the block data i is represented, lambda is a penalty coefficient, and alpha is the weight of the block data i.
The optimization goal of the reward and penalty function is to minimize the cost function. The reward and punishment function formula is as follows:
Figure BDA0003721247630000082
the weighted value alpha of each block data is obtained through calculation of a reward and punishment function, whether resource distribution optimization is carried out or not is determined, if alpha is larger than 1, forward resource optimization is carried out, the time complexity of block data query is reduced, if alpha =1, resource optimization operation is not carried out, if alpha is smaller than 1, reverse resource optimization is carried out, and the time complexity of block data query is improved.
The resource allocation optimization of the query resources allocated to each block of data in the storage module 2 by the query module 4 is mainly based on calculating the gain of each query resource to the query instruction, and the query resource set R = { R = { R = 1 ,r 2 ,…,r n Denotes that there are n query resources, the query resources include but are not limited to thread number, CPU core number, memory and/or hard disk cache, and the query instruction set a = { a = 1 ,a 2 ,…,a m And m query instructions are represented, and include but are not limited to scanning line number, execution time, returned result number and other instructions.
Firstly, calculating the information entropy of a query instruction:
Figure BDA0003721247630000091
wherein p is i For inquiry instruction a m Entropy of information in class i, j denotes a m There are j categories, in this embodiment, taking the scanning line number in the query instruction as an example, the scanning line number is classified by less than 5000 lines, 5000-1000 lines, and more than 10000 lines, and then j =3 in this embodiment.
Then, calculating the condition information entropy of each query resource:
Figure BDA0003721247630000092
wherein, for a query resource r n There are a total of k different attribute values, so the resource r is queried n ={r n1 ,r n2 ,…r nk },E(a m |r n ) To query a resource r n Lower a m The condition information entropy of (1).
Query resource r n The corresponding information gain can be expressed as:
G m (r n )=I(a m )-E(a m |r n )
by calculating the information gain G of n query resources for m query instructions m (r n ) The influence degree of each query resource on m query instructions can be obtained.
Obtaining the m-th query instruction by normalization and querying the resource r n The weight of (c):
Figure BDA0003721247630000093
when the weight alpha of block data in the reward and punishment function is more than 1, the forward distribution weight formula of the query resource is as follows:
Figure BDA0003721247630000094
the forward distribution weight after the optimization of the query resource distribution can reduce the complexity d of the query time of the blocks i Punishment if awardingIf the block weight alpha in the function is greater than 1, forward resource allocation optimization is carried out on the query resources, namely the query resources allocated to the block data are increased, and the improvement of the allocation quantity of the query resources can lead to lower time consumption in the query process, reduce the complexity of the query time of the blocks and improve the query speed of the block data.
When the weight α =1 of the block data in the reward and punishment function, the query resource allocated to each block data is not changed.
When the weight alpha of the block data in the reward and punishment function is less than 1, the reverse distribution weight formula of the query resource is as follows:
Figure BDA0003721247630000101
the query resource allocation optimized inverse distribution weight can improve the complexity d of the query time of the block i If the weight alpha of the block data in the reward and punishment function is smaller than 1, performing reverse resource allocation optimization on the query resources, namely reducing the query resources allocated to the block data, wherein the reduction of the allocation quantity of the query resources can lead to more time spent in the query process, improve the query time complexity of the block data and reduce the query speed of the block data.
By changing the weight of the distribution of the query resources to the block data, the query time of the block data is increased or reduced, so that the query time among the block data is dynamically balanced, a small time difference is always kept, and the query efficiency is improved.
The adaptive optimized retrieval performance database 100 of the present invention is an OLAP type database, and when retrieving data in the database, since a processing method for data blocking is adopted, a retrieval task can be executed simultaneously by multiple threads or multiple processes, each thread can execute a query instruction, and a result set is recorded respectively. The larger the number of threads, the more query tasks the system can allocate. For example, in order to obtain data of each day, if the threads are enough, the data of each thread corresponding to an hour can be obtained, and finally, the query results are pieced together and returned.
The query speed of a single query instruction is high, the total return time is not the fastest, the query process needs to be optimally planned, different threads execute different query instructions, and different CPU (central processing unit) core numbers, internal memories and/or hard disk caches and other query resources are distributed to the different threads, and the query resources distributed when each block executes the query are dynamically optimized, so that the query of a plurality of block data can be dynamically distributed according to the system load, the time spent by each thread for executing the query instruction required to be executed is changed, the time spent by each thread for finishing the last completion is close to each other, the optimization of the total query efficiency is achieved, the query resources are fully utilized, and the overall time spent is reduced.
In summary, the adaptive optimization retrieval performance database 100 of the present invention optimizes and updates the query module 4 through the optimization module 6 by using a reward and punishment function, adjusts the query resource allocated to each block data when the query module 4 queries in real time, and changes the query time complexity of each block data, so as to improve the query efficiency, and the adaptive optimization retrieval query process solves the problem that the query efficiency cannot be improved according to the real-time query condition of mass data in the existing data storage system; the data blocking module 1 blocks the data and blocks the large data, so that the data blocking can be processed and inquired in a multi-thread or multi-process mode; the index module 3 is used for establishing indexes for each block data and summarizing the indexes to form a data index table, so that the query process can be simplified during query, the query speed is increased, the query can be executed in parallel aiming at the index information of a plurality of block data, and the query efficiency is improved.
Although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the spirit and scope of the present invention.

Claims (10)

1. A data query method is applied to a self-adaptive optimization retrieval performance database for large data storage, and the self-adaptive optimization retrieval performance database comprises a query module, a cache module, an optimization module and a storage module, and is characterized by mainly comprising the following steps:
step 1: inputting a query request, and receiving and analyzing the query request by a query module to obtain a query condition;
step 2: judging whether the same query conditions exist in the cache module, if so, directly obtaining a query result from the cache module, and if not, entering the step 3;
and step 3: adjusting the query resources distributed to each block of data in the storage module by the query module according to the reward and punishment function of the optimization module, and querying to obtain a query result;
and 4, step 4: recording the information of each inquired block data in the inquiry process, wherein the information comprises inquiry conditions, inquiry time and inquiry results, and combining the inquiry conditions, the inquiry time and the inquiry results into an inquiry result set;
and 5: caching the query conditions and the query results into a cache module;
in the step 3, the optimization module evaluates the weight alpha of the block data i in the query process through a reward and punishment function, and calculates the information entropy of the query instruction
Figure FDA0003721247620000011
Wherein p is i For a query instruction a m Entropy of information in class i, j represents a m There are j categories in total, and then the conditional information entropy of each query resource is calculated
Figure FDA0003721247620000012
In which resources r are queried n A total of k different attribute values, query resource r n ={r n1 ,r n2 ,…r nk },E(a m |r n ) To query a resource r n Lower a m The conditional information entropy of the query instruction is calculated, and then the information gain G of the n query resources for the m query instructions is calculated m (r n )=I(a m )-E(a m |r n ) Finally, the mth query is obtained by normalizationInstructing a query on a resource r n Weight of (2)
Figure FDA0003721247620000013
Adjusting query resources distributed to each block data in real time when the query module queries according to the weight alpha of the block data i, wherein the reward and punishment function is specifically as follows:
Figure FDA0003721247620000014
wherein n represents a total of n block data, E (d) represents a time complexity average value at the time of block data query, d i And the time complexity of inquiring the block data i is shown, lambda is a penalty coefficient, and alpha is the weight of the block data i.
2. The data query method according to claim 1, wherein the adaptive optimization retrieval performance database further includes an index module that records blocking information of each block data, and at this time, step 3 specifically includes:
step 31: concurrently executing filtering of the feature information of the blocks in the query condition to the index module, and summarizing and filtering the obtained feature block data to be queried;
step 32: the feature block data to be inquired are screened in a multithread concurrent execution mode to the storage module, and row indexes of screened screening blocks are obtained;
step 33: and returning a query result.
3. The data query method of claim 1, wherein: in step 3, when the weight α of the block data in the reward and punishment function is greater than 1, the forward distribution weight formula of the query resource is as follows:
Figure FDA0003721247620000021
wherein, w mn Querying resource r for mth query instruction n The weight of (c).
4. The data query method of claim 1, wherein: in step 3, when the weight α =1 of the block data in the reward and punishment function, the query resource allocated to the block data is not changed.
5. The data query method of claim 1, wherein: in step 3, when the weight α of the block data in the reward and punishment function is less than 1, the inverse distribution weight formula of the query resource is as follows:
Figure FDA0003721247620000022
where wmn is the weight of the mth query instruction in the query resource rn.
6. The data query method of claim 1, wherein: the self-adaptive optimization retrieval performance database also comprises a data blocking module, wherein the data blocking module blocks data to be stored through multiple threads or multiple processes, and the data blocking module stores the data in the storage module.
7. The data query method of claim 6, wherein: the data blocking module scans data to be stored and judges the data type of the data to be stored, and then blocks the data according to the data type.
8. The data query method of claim 1, wherein: the resource allocation optimization of the query resource allocated by the query module to each block of data in the storage module is mainly based on calculating the gain of each query resource to a query instruction, wherein the query resource set is R = { R = 1 ,r 2 ,…,r n Indicates that there are n query resourcesThe query instruction set is A = { a = 1 ,a 2 ,…,a m Represents that there are m query instructions.
9. The data query method of claim 8, wherein: the query resources include, but are not limited to, thread count, CPU core count, memory, and hard disk cache.
10. The data query method of claim 8, wherein: the query instructions include, but are not limited to, the number of scan lines, the execution time, and the number of results returned.
CN202210751435.4A 2021-10-22 2021-11-03 Data query method Pending CN115145953A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN2021112350951 2021-10-22
CN202111235095 2021-10-22
CN202111291885.1A CN114020779B (en) 2021-10-22 2021-11-03 Self-adaptive optimization retrieval performance database and data query method

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202111291885.1A Division CN114020779B (en) 2021-10-22 2021-11-03 Self-adaptive optimization retrieval performance database and data query method

Publications (1)

Publication Number Publication Date
CN115145953A true CN115145953A (en) 2022-10-04

Family

ID=80060181

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210751435.4A Pending CN115145953A (en) 2021-10-22 2021-11-03 Data query method
CN202111291885.1A Active CN114020779B (en) 2021-10-22 2021-11-03 Self-adaptive optimization retrieval performance database and data query method

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202111291885.1A Active CN114020779B (en) 2021-10-22 2021-11-03 Self-adaptive optimization retrieval performance database and data query method

Country Status (1)

Country Link
CN (2) CN115145953A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688106A (en) * 2024-02-04 2024-03-12 广东东华发思特软件有限公司 Efficient distributed data storage and retrieval system, method and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117076466B (en) * 2023-10-18 2023-12-29 河北因朵科技有限公司 Rapid data indexing method for large archive database

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020194251A1 (en) * 2000-03-03 2002-12-19 Richter Roger K. Systems and methods for resource usage accounting in information management environments
US8423534B2 (en) * 2008-11-18 2013-04-16 Teradata Us, Inc. Actively managing resource bottlenecks in a database system
US8995996B2 (en) * 2009-08-12 2015-03-31 Harry V. Bims Methods and apparatus for performance optimization of heterogeneous wireless system communities
CN102999563A (en) * 2012-11-01 2013-03-27 无锡成电科大科技发展有限公司 Network resource semantic retrieval method and system based on resource description framework
US20170109340A1 (en) * 2015-10-19 2017-04-20 International Business Machines Corporation Personalizing text based upon a target audience
CN106372114B (en) * 2016-08-23 2019-09-10 电子科技大学 A kind of on-line analysing processing system and method based on big data
CN106503084A (en) * 2016-10-10 2017-03-15 中国科学院软件研究所 A kind of storage and management method of the unstructured data of facing cloud database
CN106897375A (en) * 2017-01-19 2017-06-27 浙江大学 A kind of probabilistic query quality optimization method towards uncertain data
CN107918676B (en) * 2017-12-15 2022-01-18 联想(北京)有限公司 Resource optimization method for structured query and database query system
US20210272664A1 (en) * 2018-02-20 2021-09-02 Calvin S. Carter Closed-loop ai-optimized emf treatment and digital delivery of data
CN108804592A (en) * 2018-05-28 2018-11-13 山东浪潮商用系统有限公司 Knowledge library searching implementation method
CN110166282B (en) * 2019-04-16 2020-12-01 苏宁云计算有限公司 Resource allocation method, device, computer equipment and storage medium
CN111552788B (en) * 2020-04-24 2021-08-20 上海卓辰信息科技有限公司 Database retrieval method, system and equipment based on entity attribute relationship
CN112256904A (en) * 2020-09-21 2021-01-22 天津大学 Image retrieval method based on visual description sentences

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688106A (en) * 2024-02-04 2024-03-12 广东东华发思特软件有限公司 Efficient distributed data storage and retrieval system, method and storage medium

Also Published As

Publication number Publication date
CN114020779A (en) 2022-02-08
CN114020779B (en) 2022-07-22

Similar Documents

Publication Publication Date Title
US11163746B2 (en) Reclustering of database tables based on peaks and widths
US9805077B2 (en) Method and system for optimizing data access in a database using multi-class objects
US7680784B2 (en) Query processing system of a database using multi-operation processing utilizing a synthetic relational operation in consideration of improvement in a processing capability of a join operation
US8266147B2 (en) Methods and systems for database organization
CN114020779B (en) Self-adaptive optimization retrieval performance database and data query method
US5987453A (en) Method and apparatus for performing a join query in a database system
EP3014488B1 (en) Incremental maintenance of range-partitioned statistics for query optimization
US8336051B2 (en) Systems and methods for grouped request execution
US8108355B2 (en) Providing a partially sorted index
Polyzotis et al. Meshing streaming updates with persistent data in an active data warehouse
US8055666B2 (en) Method and system for optimizing database performance
US9235590B1 (en) Selective data compression in a database system
US11003649B2 (en) Index establishment method and device
US5845113A (en) Method for external sorting in shared-nothing parallel architectures
CN102521405A (en) Massive structured data storage and query methods and systems supporting high-speed loading
CN102521406A (en) Distributed query method and system for complex task of querying massive structured data
Schaffner et al. A hybrid row-column OLTP database architecture for operational reporting
US20170116242A1 (en) Evaluating sql expressions on dictionary encoded vectors
Lin et al. Dealing with query contention issue in real-time data warehouses by dynamic multi-level caches
Luo et al. Answering linear optimization queries with an approximate stream index
CN117931859A (en) Cache management method and related equipment
CN117331976A (en) SQL sentence execution method and device
Mittra Query Tuning and Optimization Under Oracle 8i
Nunes et al. Self-Tuning Database Management Systems
Simitsis et al. Meshing Streaming Updates with Persistent Data in an Active Data Warehouse

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination