CN115221200A - Data query method and device, electronic equipment and storage medium
- Publication number
- CN115221200A (application CN202110414547.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- cached
- cache database
- preset
- level cache
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/24553—Query execution of query operations
- G06F16/24552—Database cache management
- G06F16/24578—Query processing with adaptation to user needs using ranking
Abstract
The application discloses a data query method and apparatus, an electronic device, and a storage medium. The method comprises: based on a data query request sent by a client, querying the data to be queried in a first-level cache database, a second-level cache database, and a preset cache database in sequence; and feeding the data to be queried back to the client when it is found in any one of the first-level cache database, the second-level cache database, and the preset cache database. The first data to be cached in the first-level cache database is data whose access count within a first preset time period is greater than or equal to a preset access count; the second data to be cached in the second-level cache database is data whose access count within a second preset time period is greater than or equal to the preset access count, the second preset time period being longer than the first preset time period; and the cached data in the preset cache database is the third data to be cached, namely cached data other than that in the first-level and second-level cache databases. With this data query method, the hit rate of the data to be queried can be improved.
Description
Technical Field
The present application relates to data processing technologies, and in particular, to a data query method and apparatus, an electronic device, and a storage medium.
Background
With the rapid growth of the data era, operators generate massive amounts of data every day, and in an operator's software system, hot spot data that is repeatedly read and written, and that can even produce very high concurrency within a short time, frequently occurs. Hot spot data is usually stored in memory and read from memory when it needs to be accessed. Because memory is expensive, no system can expand its memory without limit the way it can a hard disk, so memory use is constrained; screening hot spot data and an efficient memory eviction method are therefore important to improving system performance.
Currently, eviction of data from memory is mainly achieved with the Least Recently Used (LRU) and Least Frequently Used (LFU) page replacement algorithms. LRU is efficient when hot spot data is accessed frequently, but its hit rate drops if cold data is occasionally accessed in batches. LFU evicts data by counting how frequently hot spot data has been accessed in a recent period and deleting the low-frequency entries; if newly added hot spot data has a low access count, it is easily deleted, so its hit rate is low.
Disclosure of Invention
The embodiments of the application aim to provide a data query method and apparatus, an electronic device, and a storage medium, so that recent high-frequency data is queried through a first-level cache database and high-frequency data from a past period is queried through a second-level cache database; this caching scheme effectively improves the cache query hit rate.
The technical scheme of the application is as follows:
in a first aspect, a data query method is provided, and the method includes:
receiving a data query request sent by a client, wherein the data query request comprises data to be queried;
based on the data query request, sequentially querying the data to be queried in a first-level cache database, a second-level cache database and a preset cache database;
feeding the data to be queried back to the client when the data to be queried is found in any one of the first-level cache database, the second-level cache database, and the preset cache database;
the first data to be cached in the first-level cache database is data whose access count within a first preset time period is greater than or equal to a preset access count;
the second data to be cached in the second-level cache database is data whose access count within a second preset time period is greater than or equal to the preset access count, the second preset time period being longer than the first preset time period;
and the cached data in the preset cache database is the third data to be cached, namely cached data other than the data cached in the first-level cache database and the second-level cache database.
In a second aspect, there is provided a data query apparatus, including:
the data query request receiving module is used for receiving a data query request sent by a client, wherein the data query request comprises data to be queried;
the to-be-queried data query module is used for querying, based on the data query request, the data to be queried in the first-level cache database, the second-level cache database, and the preset cache database in sequence;
the to-be-queried data feedback module is used for feeding the data to be queried back to the client when it is found in any one of the first-level cache database, the second-level cache database, and the preset cache database; the first data to be cached in the first-level cache database is data whose access count within a first preset time period is greater than or equal to a preset access count; the second data to be cached in the second-level cache database is data whose access count within a second preset time period is greater than or equal to the preset access count, the second preset time period being longer than the first; and the cached data in the preset cache database is the third data to be cached, namely cached data other than the data cached in the first-level and second-level cache databases.
In a third aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a processor, a memory, and a program or instructions stored in the memory and executable on the processor; when executed by the processor, the program or instructions implement the steps of the data query method according to any one of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a readable storage medium, on which a program or instructions are stored, and when the program or instructions are executed by a processor, the program or instructions implement the steps of the data query method according to any one of the embodiments of the present application.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
according to the data query method provided by the embodiment of the application, the cache database is divided into a first-level cache database and a second-level cache database, wherein first to-be-cached data in the first-level cache database is data with access times larger than or equal to preset access times within a first preset time period; the second data to be cached in the second-level cache database is data with the access times larger than or equal to the preset access times within a second preset time period; the cache data in the cache database is preset as the third to-be-cached data which is the cache data except the cache data in the first-level cache database and the cache data in the second-level cache database, so that when data needs to be queried subsequently, the near-term high-frequency data is queried through the first-level cache database, the high-frequency data in a past period is queried through the second-level cache database, and the cache query hit rate is effectively improved through the cache mode.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application and are not to be construed as limiting the application.
FIG. 1 is a schematic diagram illustrating the memory elimination principle of LRU algorithm in the prior art;
FIG. 2 is a schematic diagram illustrating the LFU algorithm memory elimination principle flow in the prior art;
FIG. 3 is a first flowchart illustrating a data query method provided in an embodiment of the present application;
fig. 4 is a second schematic flowchart of a data query method provided in an embodiment of the present application;
fig. 5 is a schematic flow chart of caching a first to-be-cached data in a first-level cache database according to an embodiment of the present application;
fig. 6 is a schematic diagram illustrating a first-level cache database eviction process according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a data query device according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of, and not restrictive on, the present application. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the accompanying drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be implemented in sequences other than those illustrated or described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples consistent with certain aspects of the application, as detailed in the appended claims.
To better understand the technical scheme of the application, the background is introduced first. With the rapid growth of the data era, operators generate massive amounts of data every day, and in an operator's software system, data that is repeatedly read and written, and that can even produce very high concurrency within a short time, frequently occurs. Such hot spot data is usually stored in memory and read from memory when it needs to be accessed. Because memory is expensive, no system can expand its memory without limit the way it can a hard disk, so memory use is constrained; screening hot spot data and an efficient memory eviction method are therefore important to improving system performance.
Currently, eviction of data from memory is mainly achieved with the Least Recently Used (LRU) and Least Frequently Used (LFU) page replacement algorithms. Both approaches can result in low data hit rates.
Both LRU and LFU data eviction methods are described in detail below.
1. LRU algorithm.
The design principle of the LRU algorithm is as follows: if a piece of data has not been accessed in the recent past, it is unlikely to be accessed in the near future. That is, when the defined memory space is full, the data that has not been accessed for the longest time should be evicted. The specific steps of the LRU algorithm are as shown in fig. 1:
(1) When a new key and value are added, a Node (storage node) is first added at the head of the linked list; if the threshold set for the LRU is exceeded, the node at the tail of the queue is evicted.
As shown in step1-step7 in fig. 1, step1 is the initial state of the storage space. When data A, B, C, D, and E are added one by one from step1, a Node is added at the head of the linked list each time, so A, B, C, D, and E enter the table in sequence and the storage space becomes full, i.e., the state of step6 in fig. 1. If data F is added in step6, data A at the tail of the queue is evicted and F enters the head of the linked list, i.e., the state of step7 in fig. 1.
For another example, as in step8-step9 of fig. 1, when data G needs to be added in the state of step8, data B at the tail of the queue is evicted and G enters the head of the linked list, i.e., the state of step9 in fig. 1.
(2) When the value corresponding to a key is modified, the value in the corresponding Node is modified and the Node is then moved to the head of the queue.
As in step7-step8 of fig. 1, when the value corresponding to data C in step7 needs to be modified, the value is modified and the Node of data C is moved to the head of the queue, forming the state of step8 in fig. 1.
(3) When the value corresponding to a key is accessed, the accessed Node is moved to the head of the queue.
The LRU method is efficient when hot spot data is accessed frequently, but the hit rate is low if cold data is occasionally accessed in batches. For example, suppose data A is accessed frequently, and then cold data is accessed in batches until A is evicted; when data A is accessed again, it has to be reloaded into the in-memory Cache table.
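For readers who want to trace the steps above in code, the following is a minimal Python sketch of the classic LRU policy described here (prior art, not the method claimed by this application); the class and variable names are illustrative:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: when full, the entry that has not been
    accessed for the longest time (the queue tail) is evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        if key not in self._data:
            return None                  # miss: the caller must reload the data
        self._data.move_to_end(key)      # accessed -> move the Node to the head
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)      # modified -> move to the head
        elif len(self._data) >= self.capacity:
            self._data.popitem(last=False)   # evict the queue tail
        self._data[key] = value

cache = LRUCache(5)
for k in "ABCDE":
    cache.put(k, k)       # fills the space, as in step6 of fig. 1
cache.put("F", "F")       # evicts A, as in step7
print(cache.get("A"))     # None: A is gone despite its earlier popularity
```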
2. An LFU algorithm.
The principle of the LFU algorithm is to evict data according to its historical access frequency; the core idea is that data accessed many times in the past is more likely to be accessed in the future. The specific steps of the LFU algorithm are as shown in fig. 2:
(1) Newly added data is inserted at the end of the queue (because the reference count for this data is 1).
As in step1-step4 of fig. 2, step1 is the initial state of the storage space; the number following each data item in step1 is its reference count, and step1 shows the initial reference count of each data item.
When data is added to the storage space, as in step4 of fig. 2, data F is added and inserted at the tail of the queue with a reference count of 1.
(2) After data in the queue is accessed, its reference count is increased and the queue is reordered;
As shown in step2-step3 of fig. 2, when data D and data B are accessed, their reference counts are each increased by 1 and the data is reordered based on reference count; specifically, data with a small reference count is placed at the tail of the queue.
(3) When data needs to be evicted, the last data block of the sorted list is deleted.
As shown in step4 of fig. 2, when data F is added to the storage space, existing data must be deleted; the deleted data is the one with the smallest reference count, i.e., the last data block in the list, so data E is deleted.
The LFU algorithm counts how frequently hot spot data has been accessed in a recent period and deletes the low-frequency data, which effectively solves the LRU problem of hot spot data being evicted by batch accesses to cold data. However, simply incrementing a counter is not perfect. First, data access patterns change frequently; some keys, for example, are only accessed at high frequency periodically, a trend that a bare counter cannot reflect, so such data is likely to be deleted, causing an avalanche phenomenon. Second, the most recently added data is often kicked out because its initial reference count is small, while data that was accessed at high frequency in the past can never be evicted because of its large access count, so the hot spot data becomes inaccurate. For example, with a statistical window of one day, if data A is accessed 10,000 times between 0 o'clock and 1 o'clock and never again for the rest of the day, newly added hot spot data can hardly exceed data A's count in a short time, so data A stays cached in memory while the newly added hot spot data is easily deleted.
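The weakness just described is easy to reproduce in a counter-based LFU sketch like the one below (again prior art, with illustrative names): a stale early burst keeps crowding newer entries out.

```python
from collections import defaultdict

class LFUCache:
    """Minimal LFU cache: when full, the entry with the smallest
    reference count is evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = {}                # key -> value
        self._refs = defaultdict(int)  # key -> reference count

    def get(self, key):
        if key not in self._data:
            return None
        self._refs[key] += 1           # accessed -> bump the reference count
        return self._data[key]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            victim = min(self._data, key=self._refs.__getitem__)
            del self._data[victim]     # delete the smallest-count entry
            del self._refs[victim]
        self._data[key] = value
        self._refs[key] += 1           # a new entry enters with count 1

cache = LFUCache(2)
cache.put("A", 1)
for _ in range(10_000):
    cache.get("A")        # an early burst inflates A's count
cache.put("B", 1)
cache.put("C", 1)         # B (count 1), not stale A, is evicted
print("A" in cache._data, "B" in cache._data)  # True False
```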
To solve the low data hit rate caused by the above LRU and LFU eviction methods, the present application provides a data query method; the specific method is described in the following embodiments.
Referring to fig. 3, an implementation manner of the data query method provided by the present application may specifically include the following steps:
s110, receiving a data query request sent by a client, wherein the data query request comprises data to be queried.
The data query request may be a request sent by a client to query for data.
The data to be queried can be data to be queried by the client.
Before querying data to be queried, a data query request sent by a client needs to be obtained first, where the data query request may include data to be queried.
In an example, before the data query request sent by the client is received, the data query request needs to be generated; specifically, it may be generated in response to a data query trigger operation at the client.
In one example, when a user needs to query data, the data to be queried can be submitted at the client; when the service in the client detects the user's trigger operation of submitting the query data, a data query request can be generated in response to that operation.
And S120, inquiring the data to be inquired in the first-level cache database, the second-level cache database and the preset cache database in sequence based on the data inquiry request.
The first level cache database may be a database for caching data. The first to-be-cached data stored in the database is data with access times larger than or equal to first preset access times in a first preset time period.
The first data to be cached may be data cached in the first level cache database.
The first preset time period may be a preset time period, for example, a day.
The first preset number of accesses may be a preset threshold value of the number of accesses of the first to-be-cached data in a first preset time period.
The secondary cache database may be a database for caching data. The database and the first level cache database are two different databases. And the second data to be cached stored in the database is the data with the access times larger than or equal to the second preset access times in the second preset time period.
The second data to be cached may be data cached in the second level cache database.
The second preset time period may be a time period set in advance. The second preset time period here is greater than the first preset time period. For example, the first preset period of time may be one day and the second preset period of time may be 3 months.
The second preset access frequency may be a threshold of the access frequency of the preset second data to be cached in a second preset time period.
The predetermined cache database may be a database for caching data. The database is completely different from the primary cache database and the secondary cache database. And the cache data in the database is the third data to be cached. The third data to be cached is cached data other than the first data to be cached and the second data to be cached.
After the data query request is obtained, the data to be queried in the data query request can be sequentially queried in the first-level cache database, the second-level cache database and the preset cache database.
In one example, querying the data to be queried in the first-level cache database, the second-level cache database, and the preset cache database in sequence may proceed as follows: the data to be queried is first looked up in the first-level cache database; if it is not found there, the lookup continues in the second-level cache database; and if it is not found there either, the lookup continues in the preset cache database.
S130, feeding back the data to be queried to the client under the condition that the data to be queried is queried in any one of the first-level cache database, the second-level cache database and the preset cache database.
When the data to be queried is looked up in the first-level cache database, the second-level cache database, and the preset cache database in sequence, it is fed back to the client as soon as it is found in any one of the three databases.
In one example, after the data to be queried is received, it is first looked up in the first-level cache database; if found there, it is returned to the client, otherwise the lookup continues in the second-level cache database; if found there, it is returned to the client, otherwise the lookup continues in the preset cache database; if found there, it is returned to the client, and if not, a message that the data to be queried was not found is returned, as the sketch below shows.
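A minimal Python sketch of this lookup order, assuming each tier exposes a dict-like get() that returns None on a miss (the function and store names are illustrative, not from the application):

```python
def query_data(key, l1_cache, l2_cache, preset_cache):
    """Query flow of S120/S130: try the first-level, second-level,
    and preset cache databases in order and feed back the first hit."""
    for tier in (l1_cache, l2_cache, preset_cache):
        value = tier.get(key)
        if value is not None:
            return value   # hit in this tier: feed it back to the client
    return None            # miss everywhere: report data not found

# Example with plain dicts standing in for the three databases.
l1 = {"today_hot": 1}
l2 = {"periodic_hot": 2}
preset = {"everything_else": 3}
print(query_data("periodic_hot", l1, l2, preset))  # 2, served by the L2 tier
```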
Therefore, the cache is divided into a first-level cache database and a second-level cache database that hold different kinds of data; when data is subsequently queried, recent high-frequency data is found in the first-level cache database and high-frequency data from a past period in the second-level cache database, and this caching scheme effectively improves the cache query hit rate.
According to the data query method provided by the embodiments of the application, the cache is divided into a first-level cache database and a second-level cache database, where the first data to be cached in the first-level cache database is data whose access count within a first preset time period is greater than or equal to a preset access count; the second data to be cached in the second-level cache database is data whose access count within a second preset time period is greater than or equal to the preset access count; and the preset cache database stores the third data to be cached, namely cached data other than that in the first-level and second-level cache databases. When data is subsequently queried, recent high-frequency data is found in the first-level cache database and high-frequency data from a past period in the second-level cache database, effectively improving the cache query hit rate.
In the above embodiment, a manner of querying data to be queried is introduced, and before querying the data to be queried, the data to be cached needs to be cached in the first-level cache database, the second-level cache database, and the preset cache database. As another implementation manner of the data query method of the present application, data to be cached is cached in the first-level cache database, the second-level cache database, and the preset cache database, which may be specifically referred to in the following embodiments.
Referring to fig. 4, another implementation manner of the data query method provided by the present application may specifically include the following steps:
s210, obtaining target data to be cached, wherein the target data to be cached comprises: the data processing method comprises the steps of first data to be cached, second data to be cached and third data to be cached.
In one example, the target data to be cached may be data that needs to be cached.
In one example, the target data to be cached may include first data to be cached, second data to be cached, and third data to be cached.
S220, caching the first data to be cached into the first-level cache database based on the access time sequence index of the first data to be cached in unit time.
The unit time may be a preset unit time, and may be, for example, every day.
The access timing index may be the weighted access score of the first data to be cached within the unit time; specifically, it may reflect how the first data to be cached was accessed over the current day.
In one example, each first data to be cached may be cached in the first-level cache database according to an access timing indicator of each first data to be cached in a unit time.
In one example, the hot spot data is cached in the first-level cache database every day, and the specific caching mode is described in detail later.
In one example, the level one cache database may account for 80% of memory.
And S230, caching the second data to be cached into the secondary cache database based on the predicted accessed frequency of the second data to be cached in unit time.
And caching the second data to be cached into the second-level cache database according to the predicted accessed frequency of the second data to be cached in unit time.
In one example, some hot spot data is accessed at high frequency only periodically, and an avalanche phenomenon can arise when such a periodic batch of queries comes again; the current LRU algorithm and LFU algorithm do not cache this part of the data. This scheme therefore designs a second-level cache database to store it; the specific caching mode is described in detail later.
In one example, the second-level cache database may occupy 20% of the memory.
And S240, storing the third data to be cached into a preset cache database.
After the first data to be cached is cached in the first-level cache database and the second data to be cached is cached in the second-level cache database, the remaining third data to be cached can be cached in the preset cache database.
The novel hierarchical cache eviction strategy provided by the embodiments of the application adopts an eighty-twenty allocation principle for memory, dividing it between a first-level cache database and a second-level cache database. Recent hot spot data is stored in the first-level cache database, which occupies 80% of the memory. Data that was accessed at high frequency in the recent past is stored in the second-level cache database, which occupies 20% of the memory, to prevent the avalanche phenomenon of such data being accessed at high frequency periodically in the near future. Recent high-frequency data can then be queried through the first-level cache database and high-frequency data from a past period through the second-level cache database, which can effectively improve the cache query hit rate.
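As a minimal sketch of this eighty-twenty allocation (the total budget below is an assumed figure, not a value given in the application):

```python
# Illustrative eighty-twenty split of a fixed cache budget.
TOTAL_ENTRIES = 100_000                       # assumed overall cache capacity
L1_CAPACITY = int(TOTAL_ENTRIES * 0.8)        # recent hot spot data (80%)
L2_CAPACITY = TOTAL_ENTRIES - L1_CAPACITY     # periodic high-frequency data (20%)
```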
S250, receiving a data query request sent by the client, wherein the data query request comprises data to be queried.
And S260, sequentially inquiring data to be inquired in the first-level cache database, the second-level cache database and the preset cache database based on the data inquiry request.
S270, when the data to be queried is found in any one of the first-level cache database, the second-level cache database, and the preset cache database, feeding the data to be queried back to the client.
Steps S250-S270 are the same as S110-S130 in the above embodiment, and for brevity, are not described in detail here.
According to the technical scheme of this embodiment, recent hot spot data among the obtained target data to be cached is stored in the first-level cache database, while data that was accessed at high frequency in the recent past is stored in the second-level cache database to prevent the avalanche phenomenon of periodic high-frequency access recurring in the near future. When data is subsequently queried, recent high-frequency data is therefore found in the first-level cache database and high-frequency data from a past period in the second-level cache database.
In order to implement caching of daily hot spot data, the present application provides another implementation manner of data query, which may specifically be a caching method of a first-level cache database, and specifically refer to the following embodiments.
In order to implement the caching of the hot spot data every day, step S220 may specifically include the following steps:
s2201, determining an access time sequence index of each first data to be cached based on the divided time intervals with the preset number of unit time, the preset weight corresponding to each time interval and the number of times that each first data to be cached is accessed in each time interval.
In one example, the first-level cache database caches data using the idea of the LFU algorithm, which holds that recently queried data has a high probability of continuing to be queried in the future.
The preset number of time intervals may be a preset number of time intervals.
In one example, taking the unit time as one day as an example, the unit time may be divided into a preset number of time intervals, for example, 24 hours a day may be divided into 24 time intervals, that is, each hour is one time interval.
The preset weight may be a weight previously configured to each of the divided time intervals.
In one example, time intervals later in the day may be configured with greater weights, so that data accessed late in the day is not placed at the tail of the list merely because it has accumulated few accesses, and is not evicted for that reason when cache data needs to be added later.
In the embodiments of the application, the way the first-level cache database caches the first data to be cached is an improvement of the existing LFU algorithm, and it proceeds according to the following principle:
Based on the improved LFU algorithm, the current day's first data to be cached is divided into 24 time zones by access time, with the preset weight of each zone increasing linearly over the day. Each use of a first data to be cached is weighted accordingly, and all of its weighted use scores are added to obtain its access timing index. The access timing indexes can then be sorted from high to low, and when eviction is needed later, the data with low scores is evicted first.
In one example, the first-level cache database caches only the current day's hot spot data, and the algorithm logic is as follows. First, the current day's new data must replace all old data in memory. Assuming the initial storage state on the next day is as shown in (2) in fig. 5, when the first hot spot datum of the new day arrives, it directly replaces the last position in the in-memory queue: data H replaces E and the storage structure becomes (3). When new data I arrives, old data from the previous day is evicted and H shifts one position in the queue, as shown in (4); following this logic, eventually all data in the first-level cache database has been replaced by the current day's data, as shown in (7).
In one example, part of the current day's data may be accessed very frequently in one period and never afterwards, while newly added data has a low access count; the day's historical high-frequency data then can never be deleted, and new data is easily evicted. To prevent this, the prior-art LFU algorithm is improved, and the first data to be cached is cached in the first-level cache database with the improved algorithm as follows:
According to the time division principle, the 24 hours of each day are divided into 24 intervals in units of one hour, each with a different weight; the closer an interval is to the current time, the higher its preset weight. For example, given one query at 1 o'clock and one at 2 o'clock, this embodiment considers the 2 o'clock query closer to the current time and gives it a larger score weight. The preset weights of the 24 daily time zones are assigned according to this principle, as shown in table 1:
TABLE 1 Preset weights corresponding to the time intervals

| Time interval | 0~1 | 1~2 | 2~3 | 3~4 | ...... | 21~22 | 22~23 | 23~24 |
| Preset weight | 1/24 | 2/24 | 3/24 | 4/24 | ...... | 22/24 | 23/24 | 1 |
Suppose hot spot data (first data to be cached) $X_i$ is accessed in the 24 hourly intervals of a day with counts $C_i = \{N_1, N_2, N_3, \ldots, N_{24}\}$. After the access score of each first data to be cached is weighted, its access timing index is obtained as:

$$f_i = \sum_{k=1}^{24} \frac{k}{24} N_k$$

According to this formula, the access timing index $f_i$ of each first data to be cached is calculated.
S2202, sorting the access timing indexes in descending order and caching the first data to be cached into the first-level cache database in that order.
After the access timing index of each first data to be cached is obtained, the indexes $f_i$ are sorted from high to low and the first data to be cached is cached into the first-level cache database in that order. When cached data must be evicted, the first data to be cached with a low $f_i$ is evicted first, and the first-level cache database is continuously updated.
The current day's first data to be cached is divided by access time into a preset number of time intervals whose preset weights increase linearly over the day; each use of a first data to be cached is weighted, and its scores are summed to obtain its access timing index. The access timing indexes are sorted from high to low, and when eviction is later required, the first data to be cached with low scores is evicted first. Because newly added first data to be cached falls in time intervals with high preset weights while historically accessed first data to be cached carries low weights, the existing LFU problem that newly added data is frequently evicted while historical data can never be evicted is solved.
According to the technical scheme of this embodiment, the current day's first data to be cached is divided by access time into a preset number of time intervals whose preset weights increase linearly; the score of each use of the first data to be cached is weighted to obtain each access timing index, the indexes are sorted from high to low, and the low-scoring first data to be cached is evicted first when eviction is later needed. Because newly added first data to be cached falls in time intervals with high preset weights while historically accessed data carries low weights, the existing LFU problem that newly added data is frequently evicted while historical data can never be evicted is solved.
After the first data to be cached is cached in the first-level cache database, new first data to be cached may be added to the first-level cache database in the following, and at this time, old first data to be cached in the first-level cache database needs to be eliminated. As another implementation manner of the data query method of the present application, when storing the newly added first to-be-cached data, a manner of eliminating the old first to-be-cached data in the first-level cache database may specifically refer to the following embodiment.
After the step S2202, the embodiment of the present application may further include the steps of:
s2203, receiving new first to-be-cached data.
When storing the newly added first data to be cached and eliminating the old first data to be cached in the first-level cache database, the new first data to be cached is acquired first.
S2204, removing the first to-be-cached data with the lowest access time sequence index from the first-level cache database, and caching new first to-be-cached data to the first-level cache database.
When new first data to be cached is obtained, the first data to be cached with the lowest access time sequence index in the first-level cache database is eliminated, and the new first data to be cached is cached in the first-level cache database.
In an example, the access timing index of the new first data to be cached may be calculated with the method of S2201 in the foregoing embodiment; after the first data to be cached with the lowest access timing index is deleted, the new first data to be cached and the remaining old first data to be cached are reordered by access timing index and cached in the first-level cache database.
In one example, fig. 6 is a schematic diagram of an update to the first-level cache database. In fig. 6, (1) shows the initial state of the first-level cache database; (2) shows that when L in (1) is accessed, its access count is increased by 1; (3) shows that when data M (new first data to be cached) is added on the basis of (2), the first data to be cached K with the smallest access timing index is deleted and M is cached in the first-level cache database; and (4) shows that when M is accessed on the basis of (3), its access count is increased by 1.
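The eviction step itself reduces to a minimum lookup over the stored scores; a minimal sketch with illustrative names and invented $f_i$ values:

```python
def add_to_l1(l1_scores, new_key, new_index, capacity):
    """S2203-S2204 sketch: l1_scores maps key -> access timing index f_i.
    When the first-level table is full, the lowest-f_i entry is evicted
    before the new first data to be cached is inserted."""
    if new_key not in l1_scores and len(l1_scores) >= capacity:
        victim = min(l1_scores, key=l1_scores.__getitem__)
        del l1_scores[victim]        # evict the lowest score (K in fig. 6)
    l1_scores[new_key] = new_index

l1 = {"J": 30.0, "K": 2.5, "L": 18.0}
add_to_l1(l1, "M", 5.0, capacity=3)  # K has the smallest f_i and is evicted
print(l1)                            # {'J': 30.0, 'L': 18.0, 'M': 5.0}
```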
In this way, the first data to be cached with low scores is evicted first when eviction occurs. Because newly added first data to be cached falls in time intervals with high preset weights while historically accessed data carries low weights, the existing LFU problem that newly added data is frequently evicted while historical data can never be evicted is solved. When new first data to be cached is subsequently accessed, it can be found directly in the first-level cache database rather than missed merely because it was only just added to memory, improving the data query hit rate.
According to the technical scheme of this embodiment, the first data to be cached with low scores is evicted first when data is evicted. Because newly added first data to be cached falls in time intervals with high preset weights while historically accessed data carries low weights, the existing LFU problem that newly added data is frequently evicted while historical data can never be evicted is solved; when new first data to be cached is subsequently accessed, it can be found directly in the first-level cache database, improving the data query hit rate.
Some hot spot data is accessed at high frequency only periodically, and an avalanche phenomenon can arise when such a periodic batch of queries comes again; the current LRU and LFU algorithms do not cache this data, so this scheme designs a second-level cache database to store it.
To prevent this avalanche phenomenon for periodically high-frequency hot spot data (i.e., the second data to be cached), the application provides another implementation of data query, specifically a caching method for the second-level cache database; see the following embodiments.
To prevent the avalanche phenomenon that arises when a periodic batch of queries for such hot spot data comes again, step S230 may specifically include the following steps:
s2301, obtaining a feature vector corresponding to each second data to be cached.
The feature vector may be a vector characterizing a feature attribute of the second data to be cached.
In one example, the feature vector may be: the access amount and the cache hit rate of the second data to be cached in the third preset time period, the access amount and the cache hit rate of the second data to be cached in the fourth preset time period, and the access amount and the cache hit rate of the second data to be cached in the fifth preset time period.
The third preset time period, the fourth preset time period and the fifth preset time period may all be one preset time period.
In one example, the fourth preset time period is longer than the third preset time period and shorter than the fifth preset time period.
In one example, taking the fourth preset time period as one month as an example, the third preset time period may be one week, and the fifth preset time period may be 3 months.
In one example, the access amount may be a number of times the second data to be cached is used within a preset time period.
In one example, the cache hit rate may be the number of times the second data to be cached is hit in memory within a preset time period.
S2302, inputting the feature vectors into the trained linear regression model to obtain the current query times of each second data to be cached.
After the feature vector of each second data to be cached is obtained, it is input into a trained linear regression model, which yields the current query count of each second data to be cached, that is, its query count for the current day.
In one example, assuming a linear relationship between the current day's query count and the feature vector, it can be expressed as:

$$\hat{y}_i = \theta^T X_i$$

where $\hat{y}_i$ is the predicted query count of the second data to be cached on the current day, $\theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$ is a vector of constants (the calculation of $\theta$ is described below), and $X_i$ is the feature vector of the second data to be cached.

The linear regression calculation is thus converted into minimizing

$$J(\theta) = \sum_i (\hat{y}_i - y_i)^2$$

where $y_i$ is the actual query count of the second data to be cached on the current day.
In one example, the linear regression model is trained as follows:
To find the $\theta$ that minimizes the loss, write it in matrix form as $J(\theta) = (y - X\theta)^T (y - X\theta)$; differentiating and setting $\frac{\partial J(\theta)}{\partial \theta} = 0$ gives:

$$\theta = (X^T X)^{-1} X^T y$$

$\theta$ can be calculated through these steps, and the linear regression equation is thereby obtained.
After the linear regression equation is obtained, each feature vector is input into it to obtain the query count of the second data to be cached for the current day.
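A numpy sketch of this training and prediction step; only the normal-equation formula comes from the text, while the toy feature matrix (two columns per look-back window: access amount, cache hit rate) and query counts are invented for illustration:

```python
import numpy as np

def fit_theta(X, y):
    """Ordinary least squares: theta = (X^T X)^-1 X^T y, computed with
    np.linalg.solve rather than an explicit inverse for stability."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Rows: one feature vector per cached item; y: observed same-day query counts.
X = np.array([[120.0, 0.8, 40.0, 0.7, 10.0, 0.6],
              [ 10.0, 0.2,  5.0, 0.3,  2.0, 0.2],
              [300.0, 0.9, 90.0, 0.8, 25.0, 0.7],
              [ 60.0, 0.5, 20.0, 0.5,  8.0, 0.4],
              [200.0, 0.7, 70.0, 0.6, 20.0, 0.6],
              [ 30.0, 0.3, 12.0, 0.4,  5.0, 0.3],
              [ 90.0, 0.6, 35.0, 0.5, 12.0, 0.5]])
y = np.array([55.0, 4.0, 130.0, 25.0, 95.0, 12.0, 40.0])

theta = fit_theta(X, y)
predicted = X @ theta      # y_hat_i = theta . X_i for every cached item
```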
And S2303, sequencing the query times in a descending order, and caching the second data to be cached into the second-level cache database in sequence.
After the query times of the second data to be cached on the same day are obtained, the query times can be sequenced from large to small, and then the second data to be cached is cached in the second-level cache database in sequence.
Therefore, the second data to be cached is cached into the second-level cache database according to its predicted query count for the current day, which solves the problem that historically periodic high-frequency data could not be cached and improves the hit rate of historical hot spot data.
In the embodiments of the application, the principle for identifying periodic high-frequency data (the second data to be cached) is as follows: the larger the base of data a system stores, the more linearly predictable its data accesses tend to be. The access amount and cache hit rate of the second data to be cached over the past three months, the past month, and the past week are selected as feature vectors, the current day's access frequency is predicted with a linear regression algorithm, the predicted frequencies are sorted from high to low, and the data with high predicted values is preferentially cached in memory.
According to this technical scheme, a linear regression model is applied to the obtained feature vectors of each second data to be cached to predict its query count for the current day, and the second data to be cached is cached into the second-level cache database accordingly, solving the problem that historically periodic high-frequency data could not be cached and improving the hit rate of historical hot spot data.
After the second data to be cached is cached in the second-level cache database, new second data to be cached may subsequently be added, and old second data to be cached then needs to be evicted. As another implementation of the data query method of the present application, the way old second data to be cached in the second-level cache database is evicted when newly added second data to be cached is stored is described in the following embodiment.
After the step S2303, the embodiment of the present application may further include the following steps:
s2304, receiving new second data to be cached.
When the newly added second data to be cached is stored and the old second data to be cached in the second-level cache database is eliminated, the new second data to be cached is firstly acquired.
S2305, removing the second data to be cached with the lowest query frequency from the second-level cache database, and caching new second data to be cached to the second-level cache database.
When new second data to be cached is obtained, the second data to be cached with the lowest query count in the second-level cache database is evicted, and the new second data to be cached is cached in the second-level cache database. Specifically, the current-day query count of the new second data to be cached may be predicted with the method of S2302 in the foregoing embodiment; after the second data to be cached with the lowest current-day query count is deleted, the new second data to be cached and the remaining old second data to be cached are reordered by query count and cached in the second-level cache database.
In this way, the second data to be cached with low scores can be evicted, preventing the avalanche phenomenon that arises when a periodic batch of queries for periodically high-frequency hot spot data comes again.
According to this technical scheme, the second data to be cached with low scores can be evicted, preventing the avalanche phenomenon that arises when a periodic batch of queries for periodically high-frequency hot spot data comes again. When new second data to be cached is subsequently accessed, it can be found directly in the second-level cache database; the situation in which data that was frequently accessed in the past but rarely accessed recently cannot be found is avoided, improving the data query hit rate.
In an example, after the feature vectors of the second data to be cached are obtained, two feature vectors may characterize the same feature attribute of the second data to be cached; the feature vectors then need to be screened to remove those with identical feature attributes, reducing the amount of computation, improving computational efficiency, and saving hardware computing resources.
In order to screen the feature vectors and filter out those with identical feature attributes, thereby reducing the amount of computation, improving computational efficiency, and saving hardware computing resources, the embodiments of the application further provide another implementation of the data query method, specifically a method for screening the feature vectors; see the following embodiments.
After the feature vectors corresponding to the second data to be cached are obtained, the method further comprises the following steps:
s2307, for any current second data to be cached, calculating the correlation between the access amount and the cache hit rate of the current second data to be cached in the third preset time period, the access amount and the cache hit rate in the fourth preset time period, and the access amount and the cache hit rate in the fifth preset time period.
The current second data to be cached may be the second data to be cached that is currently to be processed.
After the feature vectors of the second data to be cached are obtained, the correlation between any two of the following features of the current second data to be cached is calculated: the access amount and cache hit rate in the third preset time period, the access amount and cache hit rate in the fourth preset time period, and the access amount and cache hit rate in the fifth preset time period.
In one example, the similarity between feature vectors may be calculated based on the correlation coefficient between two feature vectors, formulated as follows:

ρ_XY = Cov(X, Y) / (√D(X) · √D(Y))

where ρ_XY represents the similarity of the two feature vectors; X and Y represent the two feature vectors; D(X) represents the variance of the values of the feature vector X, D(Y) represents the variance of the values of the feature vector Y, and Cov(X, Y) represents the covariance of the values of the two feature vectors X and Y.

In the above formula, the larger ρ_XY is, the more similar the two feature vectors X and Y are.
According to the above formula, the correlation between any two feature vectors is calculated among the access amount and cache hit rate of each second data to be cached in the third preset time period, the access amount and cache hit rate in the fourth preset time period, and the access amount and cache hit rate in the fifth preset time period.
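As a sketch, the correlation coefficient above can be computed as follows in Python with NumPy; the sample values are invented for illustration.

```python
import numpy as np

def correlation(x, y):
    """Pearson correlation: rho_XY = Cov(X, Y) / (sqrt(D(X)) * sqrt(D(Y)))."""
    cov = np.cov(x, y, bias=True)[0, 1]         # Cov(X, Y), population form
    return cov / (np.sqrt(x.var()) * np.sqrt(y.var()))

# Two feature columns observed over the same set of samples.
x = np.array([10.0, 25.0, 40.0, 55.0])
y = np.array([12.0, 24.0, 41.0, 56.0])
print(correlation(x, y))                        # close to 1.0: highly similar
```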
S2308, retaining only one of any two feature vectors whose correlation is greater than or equal to the preset correlation threshold, so as to obtain the target feature vectors corresponding to the second data to be cached.
The preset correlation threshold may be a similarity threshold between the feature vectors set in advance.
The target feature vector may be a feature vector obtained by screening the feature vector.
After the correlation between any two feature vectors is obtained, only one of any two feature vectors whose correlation is greater than or equal to the preset correlation threshold is retained, so as to obtain the target feature vectors corresponding to the second data to be cached.
In one example, continuing the above example, for feature vector pairs whose correlation coefficient ρ_XY is greater than or equal to a threshold (e.g., the threshold may be 0.9), only one of the two feature vectors is selected; the larger ρ_XY is, the stronger the correlation between the feature vectors. The selection principle is to retain vectors with little mutual correlation, forming the final sample set.
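A minimal sketch of this screening step, assuming the per-entry feature values are collected as columns of a sample matrix; the 0.9 threshold mirrors the example value above, and all names are assumptions.

```python
import numpy as np

def screen_features(samples, threshold=0.9):
    """Return indices of feature columns to keep: of any two columns whose
    pairwise correlation is >= threshold, only the first is retained."""
    corr = np.corrcoef(samples, rowvar=False)   # feature-by-feature matrix
    kept = []
    for j in range(samples.shape[1]):
        # Keep column j only if it is weakly correlated with all kept ones.
        if all(abs(corr[j, k]) < threshold for k in kept):
            kept.append(j)
    return kept
```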
In this way, the feature vectors are screened so that those with the same feature attributes are removed, which reduces the amount of calculation, improves calculation efficiency, and saves hardware calculation resources.
After the feature vectors are screened, correspondingly, inputting the feature vectors into the trained linear regression model specifically becomes: inputting the target feature vectors into the trained linear regression model.
According to the technical scheme of the embodiment of the application, the feature vectors with the same feature attributes are screened out by screening the feature vectors, so that the calculated amount is reduced, the calculation efficiency is improved, and hardware calculation resources are saved.
Based on the data query method provided by the foregoing embodiments, correspondingly, the present application further provides a specific implementation of the data query apparatus; refer to the following embodiments.
Referring first to fig. 7, a data query apparatus provided in an embodiment of the present application includes the following modules:
a data query request receiving module 410, configured to receive a data query request sent by a client, where the data query request includes data to be queried;
a to-be-queried data query module 420, configured to query, based on the data query request, the to-be-queried data in a first-level cache database, a second-level cache database, and a preset cache database in sequence;
a to-be-queried data feedback module 430, configured to feed back the to-be-queried data to the client when the to-be-queried data is queried in any one of the primary cache database, the secondary cache database, and the preset cache database; the first to-be-cached data in the first-level cache database is data with access times larger than or equal to preset access times within a first preset time period; the second data to be cached in the second-level cache database is data with the access times being greater than or equal to the preset access times within a second preset time period, and the second preset time period is greater than the first preset time period; the cache data in the preset cache database is third data to be cached: the third data to be cached is cached data except the cached data in the first-level cache database and the cached data in the second-level cache database.
According to the technical scheme of the embodiment of the application, the cache database is divided into a first-level cache database and a second-level cache database. The first data to be cached in the first-level cache database is data whose access times within the first preset time period are greater than or equal to the preset access times; the second data to be cached in the second-level cache database is data whose access times within the second preset time period are greater than or equal to the preset access times; the cache data in the preset cache database is the third data to be cached, i.e., cached data other than that in the first-level and second-level cache databases. When data is queried subsequently, recent high-frequency data is found through the first-level cache database and high-frequency data from a longer past period is found through the second-level cache database, so this caching scheme effectively improves the cache query hit rate.
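For illustration, a minimal Python sketch of this three-tier query flow follows; plain dictionaries stand in for the three cache databases, and none of the identifiers come from the patent.

```python
def query(key, l1_cache, l2_cache, preset_cache):
    """Query the first-level, second-level and preset cache databases in
    turn and return the data on the first hit, or None on a full miss."""
    for tier in (l1_cache, l2_cache, preset_cache):
        if key in tier:
            return tier[key]    # hit in this tier: feed back to the client
    return None                 # the data to be queried is in no tier
```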
As an implementation manner of the present application, before data is queried, the data to be cached needs to be cached in the first-level cache database, the second-level cache database, and the preset cache database. To describe this caching process in detail, the apparatus may further include:
the target data to be cached acquiring module is used for acquiring target data to be cached; wherein, the target data to be cached comprises: the first data to be cached, the second data to be cached and the third data to be cached;
the first-level cache database determining module is used for caching the first data to be cached into the first-level cache database based on the access time sequence index of the first data to be cached in unit time;
the second-level cache database determining module is used for caching the second data to be cached into the second-level cache database based on the predicted accessed frequency of the second data to be cached in unit time;
and the preset cache database determining module is used for storing the third data to be cached into the preset cache database.
As an implementation manner of the present application, in order to describe in detail a caching manner of first data to be cached in the first-level cache database, the first-level cache database determining module may specifically include:
an access time sequence index determining unit, configured to determine an access time sequence index of each piece of first data to be cached, based on a preset number of time intervals per unit time that are divided, a preset weight corresponding to each time interval, and the number of times that each piece of first data to be cached is accessed in each time interval;
and the primary cache database determining unit is used for sequencing the access time sequence indexes from large to small and sequentially caching the first data to be cached into the primary cache database.
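A minimal sketch of the access time sequence index described above, assuming the unit time is divided into four intervals; the weights and counts are invented values, not the patent's.

```python
def access_timing_index(interval_counts, weights):
    """Weighted sum of per-interval access counts within the unit time;
    more recent intervals would typically carry larger preset weights."""
    return sum(c * w for c, w in zip(interval_counts, weights))

weights = [0.1, 0.2, 0.3, 0.4]   # preset weights per interval (assumed)
counts = [5, 2, 14, 30]          # accesses observed in each interval
index = access_timing_index(counts, weights)   # 0.5 + 0.4 + 4.2 + 12.0 = 17.1
# Entries are then sorted by this index from large to small and cached in order.
```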
As an implementation manner of the present application, in order to cache the new first data to be cached in the first-level cache database, the first-level cache database determining module may further specifically include:
the new first data to be cached receiving unit is used for receiving new first data to be cached;
and the first-level cache database updating unit is used for removing the first to-be-cached data with the lowest access time sequence index from the first-level cache database and caching the new first to-be-cached data to the first-level cache database.
As an implementation manner of the present application, in order to describe in detail a caching manner of second data to be cached in the second-level cache database, the second-level cache database determining module may specifically include:
a feature vector obtaining unit, configured to obtain a feature vector corresponding to each piece of second data to be cached;
the query frequency prediction unit is used for inputting the feature vector into a trained linear regression model to obtain the current query frequency of each second data to be cached;
and the second-level cache database determining unit is used for sequencing the query times from large to small and sequentially caching the second data to be cached into the second-level cache database.
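As a sketch of this prediction step using scikit-learn's LinearRegression; the training data, feature values, and item names are all invented for illustration, and the real model would be trained on historical access logs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# One 6-dimensional feature vector per entry: access amount and cache hit
# rate over the third, fourth and fifth preset time periods (assumed values),
# labelled with the observed daily query count.
X_train = np.array([[900, 0.75, 300, 0.60, 120, 0.55],
                    [150, 0.40,  80, 0.35,  30, 0.30],
                    [500, 0.65, 220, 0.50,  90, 0.45]])
y_train = np.array([820.0, 90.0, 410.0])
model = LinearRegression().fit(X_train, y_train)

# Predict today's query count for each candidate, then cache the entries
# in descending order of the prediction.
candidates = {"item_a": [700, 0.70, 260, 0.55, 100, 0.50],
              "item_b": [200, 0.45, 100, 0.40,  40, 0.35]}
ranked = sorted(candidates,
                key=lambda k: model.predict(np.array([candidates[k]]))[0],
                reverse=True)
```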
As an implementation manner of the present application, in order to cache the new second data to be cached in the second level cache database, the second level cache database determining module may further specifically include:
the new second data receiving unit to be cached is used for receiving new second data to be cached;
and the second-level cache database updating unit is used for removing the second data to be cached with the lowest query frequency from the second-level cache database and caching new second data to be cached to the second-level cache database.
In one example, the feature vector includes: the access amount and the cache hit rate of the second data to be cached in a third preset time period, the access amount and the cache hit rate in a fourth preset time period, and the access amount and the cache hit rate in a fifth preset time period, wherein the fourth preset time period is greater than the fifth preset time period and smaller than the third preset time period.
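For concreteness, the six components of such a feature vector might be grouped as follows; the field names are assumptions for illustration, not the patent's identifiers.

```python
from dataclasses import dataclass

@dataclass
class CacheFeatures:
    """Access amount and cache hit rate over the third, fourth and fifth
    preset time periods (period lengths: fifth < fourth < third)."""
    visits_p3: int
    hit_rate_p3: float
    visits_p4: int
    hit_rate_p4: float
    visits_p5: int
    hit_rate_p5: float

    def as_vector(self):
        return [self.visits_p3, self.hit_rate_p3, self.visits_p4,
                self.hit_rate_p4, self.visits_p5, self.hit_rate_p5]
```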
As an implementation manner of the present application, in order to filter feature vectors with the same feature attributes, reduce the amount of computation, improve the computation efficiency, and save hardware computation resources, the secondary cache database determination module may further include:
the correlation determination unit is used for calculating the correlation between the access amount and the cache hit rate of any current second data to be cached in a third preset time period, the access amount and the cache hit rate in a fourth preset time period and the access amount and the cache hit rate in a fifth preset time period;
and the target characteristic vector determining unit is used for reserving any one of the two characteristic vectors with the correlation greater than or equal to a preset correlation threshold value to obtain a target characteristic vector corresponding to the second data to be cached.
Correspondingly, the query number prediction unit may be specifically configured to: and inputting the target characteristic vector into a trained linear regression model to obtain the current query times of each second data to be cached.
The data query device provided in the embodiment of the present application may be configured to execute the data query method provided in the foregoing method embodiments, and the implementation principle and technical effects are similar, which are not described herein again for the sake of brevity.
Based on the same inventive concept, the embodiment of the application also provides the electronic equipment.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic device may include a processor 501 and a memory 502 storing computer programs or instructions.
Specifically, the processor 501 may include a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing embodiments of the present invention.
The processor 501 reads and executes the computer program instructions stored in the memory 502 to implement any one of the data query methods in the above embodiments.
In one example, the electronic device can also include a communication interface 503 and a bus 510. As shown in fig. 8, the processor 501, the memory 502, and the communication interface 503 are connected via a bus 510 to complete communication therebetween.
The communication interface 503 is mainly used for implementing communication between modules, devices, units and/or devices in the embodiments of the present invention.
The electronic device may execute the data query method in the embodiment of the present invention, thereby implementing the data query method described in any one of fig. 3 to fig. 6.
In addition, in combination with the data query method in the foregoing embodiments, the embodiments of the present invention may provide a readable storage medium to implement. The readable storage medium having stored thereon program instructions; the program instructions, when executed by a processor, implement any of the data query methods in the above embodiments.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include an electronic circuit, a semiconductor memory device, a ROM, a flash memory, an Erasable ROM (EROM), a floppy disk, a CD-ROM, an optical disk, a hard disk, an optical fiber medium, a Radio Frequency (RF) link, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
As described above, only the specific embodiments of the present invention are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present invention is not limited thereto, and any equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present invention.
Claims (10)
1. A method of data query, the method comprising:
receiving a data query request sent by a client, wherein the data query request comprises data to be queried;
based on the data query request, sequentially querying the data to be queried in a primary cache database, a secondary cache database and a preset cache database;
under the condition that any one of the first-level cache database, the second-level cache database and the preset cache database queries the data to be queried, feeding the data to be queried back to the client;
the first data to be cached in the first-level cache database is data with access times larger than or equal to first preset access times within a first preset time period;
the second data to be cached in the secondary cache database is data with access times larger than or equal to second preset access times within a second preset time period, and the second preset time period is larger than the first preset time period;
the cache data in the preset cache database is third data to be cached: the third data to be cached is cached data except the cached data in the first-level cache database and the cached data in the second-level cache database.
2. The method of claim 1, wherein before the receiving the data query request sent by the client, the method further comprises:
acquiring target data to be cached; wherein, the target data to be cached comprises: the first data to be cached, the second data to be cached and the third data to be cached;
caching the first data to be cached into the first-level cache database based on the access time sequence index of the first data to be cached in unit time;
caching each second data to be cached into the secondary cache database based on the predicted accessed frequency of each second data to be cached in unit time;
and storing the third data to be cached into the preset cache database.
3. The method of claim 1, wherein the caching the first data to be cached in the first-level cache database based on an access timing indicator of the first data to be cached per unit time comprises:
determining an access time sequence index of each first data to be cached based on a preset number of divided time intervals in unit time, a preset weight corresponding to each time interval and the number of times that each first data to be cached is accessed in each time interval;
sequencing the access time sequence indexes from large to small, and caching the first data to be cached into the first-level cache database in sequence.
4. The method of claim 3, wherein after the caching each of the first data to be cached in the level one cache database, the method further comprises:
receiving new first data to be cached;
and removing the first data to be cached with the lowest access time sequence index from the first-level cache database, and caching new first data to be cached to the first-level cache database.
5. The method of claim 1, wherein the caching the second data to be cached in the second-level cache database based on the predicted accessed frequency per unit time of the second data to be cached comprises:
acquiring a characteristic vector corresponding to each second data to be cached;
inputting the feature vectors into a trained linear regression model to obtain the current query times of each second data to be cached;
and sequencing the query times from large to small, and caching the second data to be cached into the second-level cache database in sequence.
6. The method according to claim 5, wherein after the caching each of the second data to be cached in the second-level cache database, the method further comprises:
receiving new second data to be cached;
and removing the second data to be cached with the lowest query times from the second-level cache database, and caching new second data to be cached to the second-level cache database.
7. The method of claim 5, wherein the feature vector comprises: the access amount and the cache hit rate of the second data to be cached in a third preset time period, the access amount and the cache hit rate of the second data to be cached in a fourth preset time period, and the access amount and the cache hit rate of the second data to be cached in a fifth preset time period, wherein the fourth preset time period is greater than the fifth preset time period and is less than the third preset time period;
after the feature vector corresponding to each of the second data to be cached is obtained, the method further includes:
for any current second data to be cached, calculating the correlation between the access amount and the cache hit rate of the current second data to be cached in a third preset time period, the access amount and the cache hit rate in a fourth preset time period, and the access amount and the cache hit rate in a fifth preset time period;
reserving any one of the two eigenvectors with the correlation greater than or equal to a preset correlation threshold value to obtain a target eigenvector corresponding to the second data to be cached;
correspondingly, the inputting the feature vector into the trained linear regression model includes:
and inputting the target feature vector into a trained linear regression model.
8. A data query apparatus, characterized in that the apparatus comprises:
the data query request receiving module is used for receiving a data query request sent by a client, wherein the data query request comprises data to be queried;
a to-be-queried data query module, configured to query, based on the data query request, the to-be-queried data in the primary cache database, the secondary cache database, and a preset cache database in sequence;
the data to be queried feedback module is used for feeding back the data to be queried to the client under the condition that any one of the first-level cache database, the second-level cache database and the preset cache database queries the data to be queried; the first to-be-cached data in the first-level cache database is data with access times larger than or equal to preset access times within a first preset time period; the second data to be cached in the second-level cache database is data with the access times being greater than or equal to the preset access times within a second preset time period, and the second preset time period is greater than the first preset time period; the cache data in the preset cache database is third data to be cached: the third data to be cached is cached data except the cached data in the first-level cache database and the cached data in the second-level cache database.
9. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the data query method of any one of claims 1-7.
10. A readable storage medium, on which a program or instructions are stored, which when executed by a processor, implement the steps of the data query method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110414547.6A CN115221200A (en) | 2021-04-16 | 2021-04-16 | Data query method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110414547.6A CN115221200A (en) | 2021-04-16 | 2021-04-16 | Data query method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115221200A true CN115221200A (en) | 2022-10-21 |
Family
ID=83605578
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110414547.6A Pending CN115221200A (en) | 2021-04-16 | 2021-04-16 | Data query method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115221200A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117539915A (en) * | 2024-01-09 | 2024-02-09 | 腾讯科技(深圳)有限公司 | Data processing method and related device |
CN117539915B (en) * | 2024-01-09 | 2024-04-23 | 腾讯科技(深圳)有限公司 | Data processing method and related device |
CN117785949A (en) * | 2024-02-28 | 2024-03-29 | 云南省地矿测绘院有限公司 | Data caching method, electronic equipment, storage medium and device |
CN117785949B (en) * | 2024-02-28 | 2024-05-10 | 云南省地矿测绘院有限公司 | Data caching method, electronic equipment, storage medium and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115221200A (en) | Data query method and device, electronic equipment and storage medium | |
CN102402605B (en) | Mixed distribution model for search engine indexing | |
CN111666305B (en) | Method and system for realizing correlation between redis cache and database | |
CN106909642B (en) | Database indexing method and system | |
CN102955812B (en) | A kind of method of index building storehouse, device and querying method and device | |
CN102819586A (en) | Uniform Resource Locator (URL) classifying method and equipment based on cache | |
CN112540986A (en) | Dynamic indexing method and system for quick combined query of big electric power data | |
CN115757203B (en) | Access policy management method and device, processor and computing equipment | |
US11636112B2 (en) | Updating cache data | |
CN106874332B (en) | Database access method and device | |
CN114138840A (en) | Data query method, device, equipment and storage medium | |
CN110909266B (en) | Deep paging method and device and server | |
CN114880329A (en) | Data query method and device, storage medium and computer equipment | |
CN112199304A (en) | Data prefetching method and device | |
CN111125158B (en) | Data table processing method, device, medium and electronic equipment | |
CN109361714B (en) | User login authentication method, device, equipment and computer storage medium | |
EP3550446A1 (en) | Updating cache data | |
CN112363986B (en) | Time optimization method for file caching | |
CN112035498B (en) | Data block scheduling method and device, scheduling layer node and storage layer node | |
CN115221155A (en) | Data slicing method and device, electronic equipment and storage medium | |
CN116069777A (en) | Index creation method, apparatus, and computer-readable storage medium | |
CN118796883A (en) | Data processing method, device, equipment and storage medium | |
KR101780041B1 (en) | Method and apparatus for improving throughput of database | |
CN107679093B (en) | Data query method and device | |
CN111291040B (en) | Data processing method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |