CN110765138B - Data query method, device, server and storage medium - Google Patents

Data query method, device, server and storage medium

Info

Publication number
CN110765138B
Authority
CN
China
Prior art keywords
data
array
value
null
target
Prior art date
Legal status
Active
Application number
CN201911052778.6A
Other languages
Chinese (zh)
Other versions
CN110765138A (en)
Inventor
章碧云
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201911052778.6A
Publication of CN110765138A
Application granted
Publication of CN110765138B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/22 - Indexing; Data structures therefor; Storage structures
    • G06F 16/2228 - Indexing structures
    • G06F 16/24 - Querying
    • G06F 16/245 - Query processing
    • G06F 16/2455 - Query execution
    • G06F 16/24552 - Database cache management

Abstract

The embodiments of the disclosure provide a new data structure that stores and queries data in the form of a compressed data set. The compressed data set uses two arrays to store, respectively, the indexes of non-null data and the values of that non-null data. When the value of target data is queried, if the index of the target data is found in the first array, the target data is determined to be non-null data and its value is then read from the second array; if the first array does not include the index of the target data, the target data is determined to be null data. This scheme mitigates the cache-penetration problem and saves the cache space that null data would otherwise occupy.

Description

Data query method, device, server and storage medium
Technical Field
The present disclosure relates to the field of storage technologies, and in particular, to a data query method, apparatus, server, and storage medium.
Background
To speed up data queries, data in a database can also be stored in a cache. When a client requests a piece of data, the cache is accessed first: if the data is already stored in the cache, its value is read from the cache and returned to the client; if not, the value is read from the database. Because the cache can be accessed faster than the database, this accelerates the query.
Currently, all data to be cached is stored in the cache, usually in the form of key-value pairs. Specifically, if the data is non-null data, the server stores a key-value pair for it in the cache, using the identification number (ID) of the data as the key and the value of the data as the value; if the data is null data, the ID is used as the key and 0 as the value. When the terminal triggers a query request for the value of target data, the server determines the key of the target data and queries the cache with that key. If the cache contains the key, the value corresponding to the key is returned as the value of the target data. If the cache does not contain the key, the server determines that the data is not cached, returns to the source, and reads the value of the data from the database.
Take caching the number of likes of comments as an example. Suppose the like counts of 3 comments are to be cached: the comment with ID 001 has 5 likes, the comment with ID 002 has 0 likes, and the comment with ID 003 has 0 likes. The server stores 3 key-value pairs in the cache: (key: 001, value: 5), (key: 002, value: 0), and (key: 003, value: 0). When querying the number of likes of the comment with ID 002, the server queries the cache with 002 as the key and obtains the value 0. When querying the number of likes of the comment with ID 006, the server queries the cache with 006 as the key, finds that the cache does not contain the key 006, determines that the comment with ID 006 is not cached, returns to the source, and queries the database with 006 as the key to read the number of likes of that comment from the database.
With this approach, the keys and values of null data are also stored in the cache, and whether the target data to be queried is null can only be determined after the cache has been queried. Null data therefore occupies a large amount of storage space in the cache, wasting cache storage.
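As an illustration of the conventional approach just described, the following Python sketch models it; the dict-based cache, the function names, and the reuse of the example IDs are assumptions for illustration, not part of the related art's implementation. It shows how caching null data as key-value pairs consumes one entry per comment:

cache = {}

def cache_like_counts(counts):
    # counts: mapping of comment ID -> like count read from the database
    for comment_id, likes in counts.items():
        cache[comment_id] = likes  # null data is cached as 0 and still occupies an entry

def query_like_count(comment_id, database):
    if comment_id in cache:                 # cache hit, including the cached zeros
        return cache[comment_id]
    value = database.get(comment_id, 0)     # cache miss: return to the source and read the database
    cache[comment_id] = value
    return value

cache_like_counts({"001": 5, "002": 0, "003": 0})
print(query_like_count("002", {}))          # 0, served from the cache without touching the database
print(query_like_count("006", {}))          # not cached, so the database (here an empty dict) is read

Every comment with 0 likes still costs a full cache entry, which is the waste the present disclosure targets.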
Disclosure of Invention
The present disclosure provides a data query method, apparatus, server and storage medium, to at least solve the technical problem of wasting storage space in cache in related art. The technical scheme of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a data query method, including:
receiving a query request, wherein the query request is used for querying the value of target data;
querying a compressed data set corresponding to the target data in a cache, wherein the compressed data set comprises a first array and a second array, the first array is used for storing indexes of non-null value data, and the second array is used for storing values of the non-null value data;
querying the value of the target data from the second array if the first array includes an index of the target data;
and if the first array does not comprise the index of the target data, determining that the target data is null data.
Optionally, said querying the value of the target data from the second array comprises:
determining the position of the index of the target data in a first array;
and inquiring the value stored in the position in the second array according to the position of the index in the first array as the value of the target data.
Optionally, the indexes of the non-null data in the first array are sequentially arranged from small to large; or the indexes of the non-null value data in the first array are sequentially arranged from large to small;
the determining the position of the index of the target data in the first array comprises:
and inquiring the index of the target data from the first array based on a binary search method to obtain the position of the index in the first array.
Optionally, before determining the position of the index of the target data in the first array, the method further includes:
and reading a value on a first digit in the identification of the target data as an index of the target data.
Optionally, the querying a compressed data set corresponding to the target data in the cache includes:
reading a value on a second digit in the identification of the target data;
and inquiring the compressed data set from the cache according to the value on the second digit.
Optionally, after receiving the query request, the method further includes:
if the cache does not comprise the compressed data set, normalizing the identifier of the target data into a preset identifier corresponding to the compressed data set;
and reading the target data from a database according to the preset identification.
Optionally, the reading the target data from a database according to the preset identifier includes:
determining a target table in which the target data are located in the database according to the identification of the target data;
and reading the target data and other data except the target data in the compressed data set from the target table.
Optionally, the determining, according to the identifier of the target data, a target table in which the target data is located in the database includes:
taking the modulo of the tail number of the identifier of the target data to obtain a modulus value;
and determining a table identified as the modulus value in the database as the target table.
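A minimal sketch of this table-selection rule, assuming a fixed number of tables and a three-digit tail number (both illustrative assumptions; the patent does not fix these values):

def select_target_table(data_id: int, num_tables: int = 10, tail_digits: int = 3) -> int:
    # Take the tail number of the identifier and reduce it modulo the table count;
    # the resulting modulus value identifies the target table.
    tail = data_id % (10 ** tail_digits)    # e.g. 12345600001 -> tail number 001
    return tail % num_tables

# All identifiers that share the tail number 001 map to the same table.
print(select_target_table(12345600001))     # 1
print(select_target_table(12345699001))     # 1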
Optionally, after querying the compressed data set corresponding to the target data in the cache, the method further includes:
and if the first array is empty, inquiring the value stored in the position corresponding to the index in the second array as the value of the target data.
Optionally, before receiving the query request, the method further includes:
acquiring at least one datum to be cached;
generating the compressed data set according to non-null data in the at least one data;
and storing the compressed data set into a cache.
Optionally, the generating the compressed data set according to non-null data in the at least one data includes:
if the number of the non-null data is less than or equal to a number threshold, storing the index of the non-null data to the first array, and storing the value of the non-null data to the second array;
and if the number of the non-null data is larger than the number threshold, configuring the first array to be null, and storing the value of each data in the at least one data to the second array.
According to a second aspect of an embodiment of the present disclosure, there is provided a data query apparatus including:
a receiving unit configured to perform receiving a query request for querying a value of target data;
the query unit is configured to execute query on a compressed data set corresponding to the target data in the cache, wherein the compressed data set comprises a first array and a second array, the first array is used for storing indexes of non-null value data, and the second array is used for storing values of the non-null value data;
the querying unit is further configured to perform querying the value of the target data from the second array if the first array includes the index of the target data;
a determining unit configured to perform determining that the target data is null data if the first array does not include the index of the target data.
Optionally, the querying unit is configured to perform: determining the position of the index of the target data in a first array; and inquiring the value stored in the position in the second array according to the position of the index in the first array as the value of the target data.
Optionally, the indexes of the non-null data in the first array are sequentially arranged from small to large; or the indexes of the non-null value data in the first array are sequentially arranged from large to small;
the determination unit configured to perform: and inquiring the index of the target data from the first array based on a binary search method to obtain the position of the index in the first array.
Optionally, the apparatus further comprises:
and the reading unit is configured to execute reading of a value on a first digit in the identification of the target data as an index of the target data.
Optionally, the query unit includes:
a reading subunit configured to perform reading of a value on a second digit in the identification of the target data;
a query subunit configured to perform a query of the compressed data set from the cache according to the value on the second digit.
Optionally, the apparatus further comprises:
the normalization unit is configured to normalize the identifier of the target data to a preset identifier corresponding to the compressed data set if the cache does not comprise the compressed data set;
and the reading unit is configured to read the target data from a database according to the preset identification.
Optionally, the reading unit includes:
the determining subunit is configured to determine a target table in which the target data is located in the database according to the identifier of the target data;
a reading subunit configured to perform reading, from the target table, the target data and data other than the target data in the compressed data set.
Optionally, the tail number of the identifier of each data stored in the compressed data set is the same, and the determining subunit is configured to perform: taking the modulo of the tail number of the identifier of the target data to obtain a modulus value; and determining a table identified as the modulus value in the database as the target table.
Optionally, the querying unit is further configured to perform: and if the first array is empty, inquiring the value stored in the position corresponding to the index in the second array as the value of the target data.
Optionally, the apparatus further comprises:
an acquisition unit configured to perform acquisition of at least one data to be cached;
a generating unit configured to generate the compressed data set according to non-null data in the at least one data;
a storage unit configured to perform storing the compressed data set into a cache.
Optionally, the generating unit is configured to perform: if the number of the non-null data is less than or equal to a number threshold, storing the index of the non-null data to the first array, and storing the value of the non-null data to the second array; and if the number of the non-null data is larger than the number threshold, configuring the first array to be null, and storing the value of each data in the at least one data to the second array.
According to a third aspect of the embodiments of the present disclosure, there is provided a server, including:
one or more processors;
one or more memories for storing the one or more processor-executable instructions;
wherein the one or more processors are configured to execute the instructions to implement the data query method described above.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a storage medium, wherein when instructions of the storage medium are executed by a processor of a server, the server is enabled to execute the above data query method.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising one or more instructions that, when executed by a processor of a server, enable the server to perform the above-mentioned data query method.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
the disclosed embodiments provide a new data structure, by storing and querying data in the form of a compressed data set using two arrays to store indices of non-null data and values of the non-null data, respectively, when a value of target data is to be queried, if an index of the target data is found from a first array, it can be determined that the target data is non-null data, and then the value of the target data is found from a second array, and if the first array is found not to include the index of the target data, it can be determined that the target data is null data. On one hand, whether target data to be queried is null data or not can be determined after the cache is queried, and a value of non-null data is found, so that a process of returning to a source database is omitted, and the problem of cache penetration is solved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a block diagram illustrating the structure of a data query system in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating a method of data querying in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a method of data querying in accordance with an exemplary embodiment;
FIG. 4 is a block diagram illustrating a data query device in accordance with an exemplary embodiment;
fig. 5 is a block diagram illustrating a server in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The data to which the present disclosure relates may be data that is authorized by a user or sufficiently authorized by parties.
Some terms related to the present disclosure are explained below:
source return reading: when a terminal requests to query a certain data, it usually accesses the cache first, queries the value of the data from the cache, if the data exists in the cache, reads the value of the data from the cache, and returns the value of the data to the terminal, and if the data does not exist in the cache, it returns to the source database, sends a query request to the database, and reads the value of the data from the database, which is called as source return reading. Generally speaking, after data is read from a database, the value of the data can be stored in the cache again, so that the next time the terminal requests to query the data, the value of the data can be obtained from the cache, and the process of returning to the source database is omitted.
Cache penetration: if the value of a piece of data is not stored in the database, that is, the data is null data, then every time the terminal requests that value a cache miss occurs and the query falls through to the source database. This is called cache penetration. For example, in a comment-like scenario, when the terminal requests a comment list, the server queries the number of likes of each comment in the list. Typically, when the database stores like counts, it records nothing for a comment whose like count is 0, so the like count of such a comment is a null value. When the server queries the like count of that comment, a cache miss occurs and the server returns to the source database; only after reading the database can it determine that the like count is null and return 0 to the terminal. Because like counts are usually sparse, that is, liked comments account for only a small portion while most comments have 0 likes, a large amount of null data exists in the database, and cache penetration occurs frequently.
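To make the cache-penetration problem concrete, the following hedged sketch (the dict-based cache and database and the function name are illustrative assumptions) shows why a comment with no like record in the database triggers a database read on every query:

cache = {}
database = {"001": 5}      # the comment with ID "002" has never been liked, so it has no row

def query_like_count(comment_id):
    if comment_id in cache:
        return cache[comment_id]
    value = database.get(comment_id)        # None models a null value
    if value is None:
        return 0        # nothing is written to the cache, so the next query misses again
    cache[comment_id] = value
    return value

print(query_like_count("002"))   # reads the database
print(query_like_count("002"))   # reads the database again: cache penetration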
Hereinafter, effects of the present disclosure will be described in conjunction with exemplary application scenarios.
In the related art, cache penetration is usually addressed by storing null data in the cache as key-value pairs, with the ID of the null data as the key and 0 as the value. For example, if the database does not store the number of likes of a certain comment, 0 is used as its like count and the key-value pair consisting of the comment ID and 0 is stored in the cache. With this approach, null data is cached as key-value pairs, so the null data occupies storage space in the cache and that space is wasted. This is especially costly when the amount of data to be cached is huge and the data is sparse: in a comment-like scenario, the comments under a single piece of information can be extremely numerous while only a small fraction of them have ever been liked, so the un-liked comments, which are all null data, are themselves extremely numerous. Caching such obviously sparse data in this way clearly consumes a huge amount of cache space.
In the embodiments of the present disclosure, a new compressed data structure, the compressed data set, is used to store data: the indexes and the values of the non-null data are stored in two separate arrays. On the one hand, the storage space occupied by null data is saved, which saves cache space. For example, if the like counts of a large batch of comments are cached, only the like counts of the comments that have actually been liked are stored in the compressed data set, saving the space that the un-liked comments would otherwise occupy. On the other hand, when the terminal requests a piece of data, if the index of that data is not found in the compressed data set, the data can immediately be determined to be null data, so the return-to-source read is avoided and the cache-penetration problem is solved. For example, if a comment has never been liked and the terminal requests its like count, then after the server locates the compressed data set for that comment and finds that the set does not store the comment's index, it can determine that the comment is null data and return a like count of 0 to the terminal, without returning to the source database to query further. In addition, a new format is designed for the data identifier: the tail number of the identifier is reserved and associated with the table in which the data resides in the database, so that all data of the same compressed data set is stored in the same table of the database, reducing the performance overhead of returning to the source. Furthermore, the identifiers of the data in the same compressed data set are normalized when returning to the source, which avoids the repeated reads that could occur when multiple pieces of data return to the source simultaneously and further reduces the overhead on the database.
Hereinafter, a system architecture of the embodiment of the present disclosure is described.
FIG. 1 is a block diagram illustrating the structure of a data query system in accordance with an exemplary embodiment. The data query system comprises: terminal 101, server 102, cache 103, and database 104. The terminal 101, the server 102, the cache 103 and the database 104 are connected through a wireless network or a wired network, and the terminal 101, the server 102, the cache 103 and the database 104 can communicate with each other through the network.
The terminal 101 may be at least one of a smart phone, a game console, a desktop computer, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, and a laptop computer. The terminal 101 installs and runs an application program supporting data storage or data query, which may be referred to as a storage client; the application may be a live-streaming application, a multimedia application, a short-video application, a content-sharing application, and the like. Illustratively, the terminal 101 is a terminal used by a user, and a user account is logged into the application running on the terminal 101.
The server 102 includes at least one of a server, a plurality of servers, a cloud computing platform, and a virtualization center. The server 102 is used for providing background services for the application program installed in the terminal.
The cache 103 is a high-speed storage device whose access speed is higher than that of the database 104.
The terminal 101, the server 102, the cache 103, and the database 104 interact when storing and querying data. As an example, the terminal 101 may provide data and send it to the server 102, and the server 102 may store the data in the cache 103 and the database 104. When the terminal 101 wants to query data, it may generate a query instruction and send it to the server 102. In response, the server accesses the cache 103 and queries the data there; if the value of the data is read from the cache 103, it is returned to the terminal 101; if not, the server returns to the source database 104, reads the value of the data from the database 104, and returns it to the terminal 101.
Those skilled in the art will appreciate that the number of terminals 101, servers 102, caches 103, and databases 104 may be greater or fewer. For example, the number of the terminal 101, the server 102, the cache 103, and the database 104 may be only one, or the number of the terminal 101, the server 102, the cache 103, and the database 104 may be tens or hundreds, or more, in which case the data query system further includes other terminals, other servers, other caches, or other databases. The number and the device type of the terminal, the server, the cache and the database are not limited in the embodiment of the present disclosure.
The method flow of the embodiments of the present disclosure is described below.
Fig. 2 is a flowchart illustrating a data query method according to an exemplary embodiment, which is used in a server, as shown in fig. 2, and includes the following steps.
In step S21, a query request for querying a value of target data is received.
In step S22, a compressed data set corresponding to the target data in the cache is queried, where the compressed data set includes a first array and a second array, the first array is used to store an index of non-null-value data, and the second array is used to store a value of the non-null-value data.
In step S23, if the first array includes the index of the target data, the value of the target data is queried from the second array.
In step S24, if the first array does not include the index of the target data, the target data is determined to be null data.
Fig. 3 is a flowchart illustrating a data query method according to an exemplary embodiment, which is used in a server, as shown in fig. 3, and includes the following steps.
In step S31, the server acquires at least one data to be cached.
The data to be cached may be non-null data or null data. The value of non-null data is a value other than a null value; the value of null data is a null value, that is, a value that is unknown because no record is stored for it in the database. Optionally, the data may be of a long integer type; it may also be a structured complex type or another type, and the type of the data is not limited in this embodiment. Taking caching the like counts of comments as an example, the at least one piece of data to be cached may be the like counts of 100 comments, for example: the comment with ID "12345600001" has 10 likes, the comment with ID "12345600002" has 0 likes, and so on. If the like count of a comment is 0, the database normally stores no record for it, so in this example the null data is the comments whose like count is 0.
Each piece of data may have a corresponding identifier, which is used to identify that data and may be, for example, the ID of the data. Optionally, among the at least one piece of data to be cached, the values on the first digits of the identifiers differ from one another, while the values on the second digits of the identifiers are all the same. The first digits and the second digits are different digit positions, and each may comprise one or more digits. The value on the first digits can be used as the index of the data, and the value on the second digits can indicate the table in which the data resides in the database. For example, the first digits may be the thousands and ten-thousands digits and the second digits may be all other digits, in which case the identifier satisfies the format XXXX??XXX, where each X is a preset value and ?? is the index of the data. Taking 100 pieces of data stored at a time as an example, the identifiers of the 100 pieces of data may be (XXXX00XXX, XXXX01XXX, XXXX02XXX, ..., XXXX99XXX): the values on the thousands and ten-thousands digits of the 100 identifiers run from 00 to 99, while the values on all other digits are identical. Taking the other digits as 123456__001 as an example, the identifiers of the 100 pieces of data are (12345600001, 12345601001, 12345602001, ..., 12345699001).
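A small sketch of how the first digits and the second digits could be read from such an identifier, assuming the 11-digit ID format above with the thousands and ten-thousands digits carrying the index (the helper name is an assumption):

def split_identifier(data_id: int):
    # First digits: the thousands and ten-thousands digits, used as the index of the data.
    # Second digits: every other digit, shared by all members of one compressed data set.
    index = (data_id // 1000) % 100
    prefix = data_id // 100000
    suffix = data_id % 1000
    return index, (prefix, suffix)

index, second_digits = split_identifier(12345601001)
print(index)            # 1
print(second_digits)    # (123456, 1), i.e. the digits 123456...001 common to the whole set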
In step S32, the server generates a compressed data set according to non-null data in the at least one data.
For each data in at least one data, the server may determine whether a value of the data is a null value, determine that the data is null data if the value of the data is a null value, and determine that the data is non-null data if the value of the data is not a null value. The server may generate a compressed data set according to non-null data in the data to be cached, so as to cache the at least one data by compressing the data set.
The compressed data set is a data structure provided in this embodiment for storing data in the cache, and the compressed data set may be regarded as one data bucket. Alternatively, the compressed data set may be used to store a preset amount of data, which may be set based on experimentation, experience, or need. For example, the preset number may be 100, and in the process of buffering data, the data may be organized in a manner of one data bucket per 100 data. The compressed data set may be denoted as CompressCacheValue (meaning compressed cache value) in the program. The compressed data set may include two arrays, referred to herein as a first array and a second array, respectively, for the purpose of distinguishing between descriptions.
The first array is used to store the indexes of the non-null data. Specifically, the first array may comprise at least one position, each position storing the index of one piece of non-null data. Optionally, the first array may be recorded as bucket_index in the program, and each position of the first array may store one 32-bit unsigned integer.
In some embodiments, the first array may store a plurality of indexes of non-null data, and the indexes of the non-null data in the first array may be arranged in order from small to large, that is, the indexes stored in the first array satisfy the rule of ascending order. In other embodiments, the indexes of the non-null data in the first array may be arranged in descending order, that is, the indexes stored in the first array satisfy the rule of descending order.
The index is used to look up the value of the corresponding data in the second array, and may be a position index of the value of the data in the second array. In some embodiments, the index of the data may be associated with the identifier of the data and determined from it. As a possible implementation, the index of the data may be the value on the first digits of the identifier. For example, if the first digits are the thousands and ten-thousands digits and the ID of the data is 12345601001, the values on the thousands and ten-thousands digits of the ID are 01, so the index of the data is 01.
The second array is used to store the values of the non-null data. Specifically, the second array may comprise at least one position, each position storing the value of one piece of non-null data. For example, the second array may store the like counts of a batch of comments, with each position storing the like count of one comment. Optionally, the second array may be recorded as value in the program, and each position of the second array may store one 64-bit unsigned integer. Each position of the second array may correspond one-to-one to a position of the first array, and the index stored at any position in the first array points to the value stored at the corresponding position in the second array. For example, if the index of a piece of data is stored at the i-th position of the first array, the value of that data is stored at the i-th position of the second array, where i is 0 or a positive integer and the starting position of each array is the 0th position.
Alternatively, in some embodiments, the compressed data set may be defined using protobuf (a tool developed by Google for efficiently storing and reading structured data). Illustratively, the compressed data set may be defined by the following program code:
message CompressCacheValue {
    repeated uint32 bucket_index = 1;
    repeated uint64 value = 2;
}
The meaning of the program code is: a data structure named CompressCacheValue is defined, containing two fields. One field, named bucket_index, is a repeated field (an array) storing 32-bit unsigned integers; the other field, named value, is a repeated field (an array) storing 64-bit unsigned integers.
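For readers who prefer a non-protobuf view, an equivalent in-memory structure could be sketched as follows; this Python stand-in is an assumption for illustration and is not part of the patent's definition:

from dataclasses import dataclass, field
from typing import List

@dataclass
class CompressCacheValue:
    # Mirrors the protobuf message above: indexes of non-null data and their values.
    bucket_index: List[int] = field(default_factory=list)   # repeated uint32
    value: List[int] = field(default_factory=list)          # repeated uint64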
In one exemplary scenario, suppose the like counts of 100 comments are to be cached through the compressed data set, the IDs of the 100 comments are (12345600001, 12345601001, 12345602001, ..., 12345699001), and 6 of the 100 comments are non-null data: the comment with ID 123456_00_001 has 10 likes; the comment with ID 123456_01_001 has 20 likes; the comment with ID 123456_02_001 has 15 likes; the comment with ID 123456_16_001 has 89 likes; the comment with ID 123456_52_001 has 198 likes; and the comment with ID 123456_99_001 has 2500 likes. The remaining 94 comments are null data, each with 0 likes. In this example, the values on the thousands and ten-thousands digits of the ID are taken as the index of the comment's like count: the index of the comment with ID 123456_00_001 is 0, the index of the comment with ID 123456_01_001 is 1, the index of the comment with ID 123456_02_001 is 2, the index of the comment with ID 123456_16_001 is 16, the index of the comment with ID 123456_52_001 is 52, and the index of the comment with ID 123456_99_001 is 99. If the indexes in the first array are arranged from small to large, then the indexes of the 6 pieces of non-null data are 0, 1, 2, 16, 52, 99 in that order, the corresponding values are 10, 20, 15, 89, 198, 2500, and so the first array is [0, 1, 2, 16, 52, 99] and the second array is [10, 20, 15, 89, 198, 2500]. If instead the indexes in the first array are arranged from large to small, the indexes are 99, 52, 16, 2, 1, 0, the corresponding values are 2500, 198, 89, 15, 20, 10, and so the first array is [99, 52, 16, 2, 1, 0] and the second array is [2500, 198, 89, 15, 20, 10].
In the related art, the like counts of the 100 comments would be stored in the cache as 100 key-value pairs; the key and the value of each pair each occupy one storage location, so each pair occupies 2 locations and the 100 pairs occupy 200 locations. In this embodiment, as the example above shows, the compressed data set expresses the like count of every one of the 100 comments with the 6 elements of the first array and the 6 elements of the second array: the like counts of the 6 non-null comments are cached, and the storage space that the like counts of the 94 null comments would have occupied is eliminated. The like counts of the 100 comments occupy 12 storage locations in the cache, saving (200 - 12) = 188 locations, which greatly saves cache storage space and improves cache utilization.
In another exemplary scenario, suppose 2 of the 100 comments are non-null data: the comment with ID 123456_00_001 has 10 likes and the comment with ID 123456_99_001 has 20 likes, while the remaining 98 comments are null data with 0 likes each. Taking the values on the thousands and ten-thousands digits of the ID as the index, the index of the comment with ID 123456_00_001 is 0 and the index of the comment with ID 123456_99_001 is 99. If the indexes in the first array are arranged from small to large, the indexes of the 2 pieces of non-null data are 0, 99 and the corresponding values are 10, 20, so the first array may be [0, 99] and the second array may be [10, 20]. As this example shows, the compressed data set expresses the like count of every one of the 100 comments with 2 elements of the first array and 2 elements of the second array, caching the like counts of the 2 non-null comments and eliminating the space that the like counts of the 98 null comments would have occupied. The like counts of the 100 comments occupy 4 storage locations in the cache, saving (200 - 4) = 196 locations, which greatly saves cache storage space and improves cache utilization.
In some embodiments, the server may cache data using the compressed data set in different ways in conjunction with the degree of sparsity of the data currently to be cached. As an example, the generation process of the compressed data set may include the following steps one to four:
the method comprises the steps that firstly, a server determines the number of non-null data in at least one piece of data to be cached.
And step two, the server compares the number of the non-null data with a number threshold.
The quantity threshold may be set based on experimentation, experience, or business requirements. Through experimental analysis, when the quantity threshold is about half of the data quantity, the space occupied by the data in the cache is the minimum, so that half of the total quantity of the data to be cached can be used as the quantity threshold, and the storage space is saved to the maximum extent. As an example, if each set of compressed data is used to cache 100 pieces of data, the quantity threshold may be 48.
The server can determine whether to perform the following third step to cache the data or perform the following fourth step to cache the data according to the magnitude relation between the number of the non-null data and the number threshold. Taking 100 pieces of data cached in each compressed data set, the data being the number of praise of the comment, and the threshold of the number being 48 as an example, after the server obtains 100 pieces of comments, the number of praise in the 100 pieces of comments can be determined, if the number of praise is less than or equal to 48, the following third step is performed to cache the number of praise of the 100 pieces of comments, and if the number of praise is greater than 48, the following fourth step is performed to cache the number of praise of the 100 pieces of comments.
Step three: if the number of pieces of non-null data is less than or equal to the number threshold, the server stores the indexes of the non-null data in the first array and the values of the non-null data in the second array.
In some embodiments, the server may sort the indexes of the non-null data in ascending order, or in descending order, and store the sorted indexes in the first array; correspondingly, the server may order the values of the non-null data according to the sorting result of the indexes and store the ordered values in the second array. For example, if the index of a piece of non-null data is ranked at the i-th position among all the indexes, the value of that non-null data is placed at the i-th position among all the values.
In addition, for the null data remaining in the data to be cached, the server may discard the null data, thereby avoiding the cache space occupied by the null data. In addition, if each data to be cached is null data, the server may configure the first array as null and the second array as null.
By caching data using the compressed data set in step three, cache space can be saved from both the key and value dimensions at the same time. Specifically, the first array stores the indexes of the non-null value data, and the indexes of the null value data do not need to be stored, so that the storage space occupied by the keys of the null value data is saved, and the storage space requirement of the keys is saved. Similarly, the second array only needs to store the value of the non-null value data, and does not need to store 0 for the null value data, so that the storage space occupied by the value of the null value data is saved, and the storage space requirement of the value is saved. As an example, if M data in the N data to be cached are non-null data, the length of the first array may be M, and the first array stores M indexes, so that the storage space occupied by the keys of the (N-M) null data may be saved. The length of the second array may be M, and the second array stores M values, so that the storage space occupied by the values of (N-M) null data may be saved. Wherein N and M are positive integers, and M is less than or equal to N.
Step four: if the number of pieces of non-null data is greater than the number threshold, the server configures the first array to be empty and stores the value of each piece of the at least one piece of data in the second array.
The difference from the usage of the compressed data set shown in the above step three is that if step four is performed to cache data, the second array stores a value of null data on the basis of a value of non-null data, for example, if the data is null data, 0 may be used as the value of the data, and a position corresponding to the data in the second array stores 0. In this usage, if N data are to be cached, the length of the second array may be N, where the second array stores N values, and N is a positive integer.
In some embodiments, each position of the second array may correspond one-to-one to an index, the i-th position in the second array being used to store the data whose index is i. Taking cached comment like counts with the index being the values on the thousands and ten-thousands digits of the comment ID as an example: if the ID of a comment is 12345600001, its index is 00 and its like count is stored at the 0th position of the second array; if the ID is 12345601001, the index is 01 and the like count is stored at the 1st position; and so on, so that if the ID is 12345699001, the index is 99 and the like count is stored at the 99th position. If the like count of a comment is 0 and its index is i, 0 is stored at the i-th position of the second array.
For example, if each compressed data set caches the like counts of 100 comments, the second array may be an array of fixed size 100, whose 0th to 99th positions store, in order, the like counts of the comments whose thousands and ten-thousands digits run from 00 to 99. As an example, if the IDs of the 100 comments are (12345600001, 12345601001, 12345602001, ..., 12345699001), 2 of the comments are non-null data, namely 12345600001 with 10 likes and 12345699001 with 20 likes, and the remaining 98 comments are null data with 0 likes, then the second array is [10, 0, ..., 0, 20]: the 0th and 99th positions hold the two values and every other position stores 0.
By using the compressed data set to cache data in step four, cache space can be saved from the key dimension. Specifically, the data set is compressed to store the data values, and the indexes of the data are not required to be stored, so that the storage space occupied by the keys of the non-null data and the keys of the null data is saved, and the storage space requirement of the keys is saved. As an example, if N data are to be cached, when the data are cached by the compressed data set, N values may be stored by the second array, thereby saving the storage space occupied by N keys.
According to the number of the non-null data in the data to be cached at present, a proper using mode is selected from the two using modes to cache the data, so that the flexibility of data storage is improved, the data caching mode is matched with the sparse condition of the data, the consumed memory is reduced as much as possible, and the utilization rate of the storage space is improved.
In some embodiments, the usage provided in step three and the usage provided in step four can be distinguished by whether the first array is empty. Specifically, if the first array of the compressed data set in which a piece of data resides is not empty, it can be determined that the data is stored in the compressed data set in the way provided in step three; if the first array is empty, the data is stored in the way provided in step four. Taking the first array recorded as bucket_index as an example: if the bucket_index array of a compressed data set is not empty, the compressed data set uses the mode provided in step three, and if the bucket_index array is empty, it uses the mode provided in step four.
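Putting steps one to four together, a hedged sketch of the generation logic (the function name, the dict-shaped input, and the fixed bucket size of 100 with threshold 48 follow the examples above but are otherwise assumptions):

def build_compressed_set(values_by_index, bucket_size=100, threshold=48):
    # values_by_index maps index (0..bucket_size-1) -> value; a missing entry or 0 means null data.
    non_null = {i: v for i, v in values_by_index.items() if v}
    if len(non_null) <= threshold:
        # Step three (sparse usage): parallel arrays of ascending indexes and their values.
        bucket_index = sorted(non_null)
        value = [non_null[i] for i in bucket_index]
    else:
        # Step four (dense usage): empty first array, one value slot per possible index.
        bucket_index = []
        value = [values_by_index.get(i, 0) for i in range(bucket_size)]
    return bucket_index, value

# The 6-liked-comments example from above.
first, second = build_compressed_set({0: 10, 1: 20, 2: 15, 16: 89, 52: 198, 99: 2500})
print(first)    # [0, 1, 2, 16, 52, 99]
print(second)   # [10, 20, 15, 89, 198, 2500]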
In step S33, the server stores the compressed data set in the cache.
In some embodiments, the server may be connected to the cache via a network, and the server may store the compressed data set in the cache via the network. Optionally, the server may determine a corresponding key for the compressed data set, and store the compressed data set and the key of the compressed data set in association, so that when data is queried subsequently, the compressed data set may be found from the cache through the key of the compressed data set.
The key of the compressed data set may be determined based on the identifiers of the data in the compressed data set. As an example, the key may be obtained from the value on the second digits of the identifiers of the data stored in the set. Taking the second digits as all digits other than the thousands and ten-thousands digits, the thousands and ten-thousands digits of the data ID may be replaced with an underscore "_", and the replacement result used as the key of the compressed data set.
For the same compressed data set, the values on the second digits of the identifier of every piece of data in the set are the same, so the key of the set can be determined from any one of its members. Illustratively, taking a compressed data set that stores 100 pieces of data, with the second digits being all digits other than the thousands and ten-thousands digits, the IDs of the 100 pieces of data may be (12345600001, 12345601001, 12345602001, ..., 12345699001). As this example shows, the values on the second digits are identical for all 100 pieces of data, namely 123456 and 001, so the key of the compressed data set, obtained from any one of the 100 pieces of data, is 123456_001. For another example, if the IDs of the data stored by a compressed data set are (0, 1000, 2000, ..., 99000), the key of the set may be 0_0; if the IDs are (1, 1001, 2001, ..., 99001), the key may be 0_1; and if the IDs are (999, 1999, 2999, ..., 99999), the key may be 0_999.
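One way this key derivation could be sketched, assuming numeric IDs long enough to contain thousands and ten-thousands digits, as in the 11-digit comment IDs above (the helper name is an assumption):

def compressed_set_key(data_id: int) -> str:
    # Replace the thousands and ten-thousands digits of the ID with "_" to form the key.
    digits = str(data_id)
    return digits[:-5] + "_" + digits[-3:]

print(compressed_set_key(12345600001))   # 123456_001
print(compressed_set_key(12345699001))   # 123456_001 -> every member of the set yields the same key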
In step S34, the server receives a query request for querying a value of the target data.
The target data is the data to be queried. The query request may include the identifier of the target data, and may be generated by the terminal and sent to the server. In an exemplary scenario, if a user wants to query the number of likes of a certain comment, the user may trigger an operation on the terminal, and the terminal may generate a query request according to the identifier of the comment and send it to the server.
In step S35, the server queries the compressed data set corresponding to the target data in the cache.
The server may read the value on the second digits of the identifier of the target data and query the compressed data set from the cache according to that value. Specifically, the server may derive the key of the compressed data set from the value on the second digits and query the cache for the compressed data set corresponding to that key. Taking the second digits as all digits other than the thousands and ten-thousands digits as an example: if the ID of the target data is 12345600001, the thousands and ten-thousands digits of the ID are 00; replacing them with "_" yields 123456_001, so the key of the compressed data set storing the target data is determined to be 123456_001, and the compressed data set can be found in the cache using that key.
In step S36, the server determines whether the first array of the compressed data set is empty, and if the first array of the compressed data set is not empty, performs step S37 or step S38, and if the first array of the compressed data set is empty, performs step S39.
Because the compressed data set has two usage modes, the server determines, by checking whether the first array is empty, which of the two modes the compressed data set corresponding to the target data uses, and then executes the query procedure for that mode. Specifically, if the first array is not empty, the compressed data set uses the mode provided in step three of step S32, so the server checks whether the first array includes the index of the target data: if it does, the target data is non-null data and step S37 is performed; if it does not, the target data is null data and step S38 is performed. If the first array is empty, the compressed data set uses the mode provided in step four of step S32, and step S39 is performed.
In step S37, if the first array includes the index of the target data, the server queries the value of the target data from the second array.
In some embodiments, step S37 may include the following steps one to three.
Step one, a server obtains an index of target data.
The server may read the value on the first digits of the identifier of the target data and use that value as the index of the target data. Taking the first digits as the thousands and ten-thousands digits as an example: if the ID of the target data is 12345600001, the thousands and ten-thousands digits of the ID are 00, so the server determines that the index of the target data is 00.
And step two, the server determines the position of the index of the target data in the first array.
In some embodiments, if the indexes of the non-null data in the first array are sequentially arranged from small to large, or sequentially arranged from large to small, the server may query the index of the target data from the first array based on a binary search method to obtain the position of the index in the first array.
As a specific flow of the binary search, the server determines the middle element of the first array and uses it as an interval endpoint, taking the elements from the first element to the middle element as the left half interval and the elements from the middle element to the last element as the right half interval. The server then compares the index of the target data with the middle element: if the index is smaller than the middle element, the index falls in the left half interval, so the server determines the middle element of the left half interval, compares the index with it, and so on until the index of the target data is found; similarly, if the index is larger than the middle element, the index falls in the right half interval, so the server determines the middle element of the right half interval, compares the index with it, and so on until the index of the target data is found. A code sketch of this lookup is given after the example below.
And step three, the server queries, according to the position of the index in the first array, the value stored at the same position in the second array, and uses the value as the value of the target data.
For example, if the index of the target data is located at the ith position in the first array, the server may query the value stored at the ith position in the second array as the value of the target data.
Illustratively, if the compressed data set stores the like numbers of 100 comments whose IDs are (12345600001, 12345601001, 12345602001, ..., 12345699001), and the like numbers of the comments with IDs 123456_00_001, 123456_01_001, 123456_02_001, 123456_16_001, 123456_52_001 and 123456_99_001 are not null and are 10, 20, 15, 89, 198 and 2500, respectively, then the first array is [0,1,2,16,52,99] and the second array is [10,20,15,89,198,2500]. If the like number of the comment with ID 123456_52_001 is to be queried, it may be determined that the index of the comment is 52. Comparing 52 with the middle element 16 of the first array, 52 is greater than 16, so the index of the comment falls in the right half interval of the first array; the middle element of the right half interval is 52, so the index 52 of the comment is compared with this middle element, the two are equal, and the index of the comment is found. Because the index of the comment is located at the 4th position of the first array (the starting position of the first array is recorded as the 0th position), the 4th position of the second array (the starting position of the second array is recorded as the 0th position) is queried to obtain 198, it is determined that the like number of the comment is 198, and the query result 198 is returned to the terminal.
It can be seen from this example that, based on the binary search method, the index of the data can be found in the first array with only two comparisons, which saves the computing resources of the server processor and improves the efficiency of querying the index. In other embodiments, the index of the data may also be queried from the first array based on a sequential lookup method: starting from the first element of the first array, the index of the data is compared with each element of the first array in turn; if the current element of the first array is the index of the data, the lookup stops, and otherwise the lookup continues with the next element of the first array.
In step S38, if the first array does not include the index of the target data, the server determines that the target data is null data.
For example, if the like number of the comment with ID 123456_30_001 is to be queried, the index of the comment is the value on the thousands and ten-thousands digits of the ID, that is, 30. If the first array is [0,1,2,16,52,99], since the first array does not include 30, it can be determined that the like number of the comment is null-value data, that is, the comment has not yet been liked; the server takes 0 as the like number of the comment and returns 0 to the terminal.
In step S39, if the first array is empty, the server queries the value stored in the position corresponding to the index in the second array as the value of the target data.
Illustratively, if 100 comments are stored in a compressed data set, with IDs (12345600001, 12345601001, 12345602001, ..., 12345699001), and the like numbers of the comments with IDs 123456_00_001 and 123456_99_001 are not null and are 10 and 20, respectively, then the first array is empty and the second array is [10, 0, 0, ..., 0, 20]. If the like number of the comment with ID 123456_00_001 is to be queried, it can be determined that the index of the comment is 0 and that the value of the comment is stored at the 0th position of the second array; the server may query the 0th position of the second array to obtain 10, and determine that the like number of the comment with ID 123456_00_001 is 10.
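For ease of understanding, the query flow of steps S37 to S39 may be combined as in the following sketch; find_position refers to the binary search sketch above, and returning 0 for null-value data follows the like-count example (an illustrative choice, not a limitation).

    # Sketch: query the value of target data from a compressed data set
    # represented as a pair of lists (first_array, second_array).
    def query_value(first_array, second_array, index):
        if first_array:                      # usage mode with a non-empty first array
            pos = find_position(first_array, index)
            if pos == -1:
                return 0                     # index absent: target data is null-value data
            return second_array[pos]         # value stored at the same position
        return second_array[index]           # first array empty: position equals index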
In step S40, if the cache does not include the compressed data set, the server returns to the source database.
The above steps S36 to S39 describe the process of querying data from the cache when the cache includes the compressed data set, and if the compressed data set is not in the cache, a query request for the target data may be generated and sent to the database, so as to read the target data from the database. The server may determine a key of the compressed data set in which the target data is located, and if the compressed data set corresponding to the key is not queried from the cache, determine that the cache does not include the compressed data set in which the target data is located.
In some embodiments, when the server reads the target data back to the source, the server may normalize the identifier of the target data, specifically refer to the following steps one to two:
step one, if the cache does not comprise the compressed data set, the identification of the target data is normalized to a preset identification corresponding to the compressed data set.
The preset identifier may be the identifier of any piece of data in the compressed data set, which is not limited in this embodiment; for example, the preset identifier may be the identifier of the first piece of data in the compressed data set. For example, if the IDs of the data in the compressed data set are (12345600001, 12345601001, 12345602001, ..., 12345699001), respectively, the preset identifier may be the ID of the first piece of data, that is, 12345600001.
By normalizing the identifier of the target data, the following effect can be achieved: the server may receive, at the same time, query requests for multiple pieces of target data located in the same compressed data set, and if the server returns to the source for each piece of target data separately, the same data is read repeatedly. For example, if a query request for the comment with ID 12345632001 and a query request for the comment with ID 12345699001 are received at the same time, the like numbers of the two comments are both located in the compressed data set 123456_001. If the identifiers of the comments are not normalized, the server reads the compressed data set 123456_001 from the source database once for the comment with ID 12345632001 and again for the comment with ID 12345699001, so the compressed data set 123456_001 is read from the source database twice, which has a high performance cost and consumes considerable processing resources. If instead the identifiers of the two comments are both normalized to the preset identifier 12345600001, the server reads the compressed data set 123456_001 from the source database only once through the preset identifier 12345600001, which reduces the number of times of returning to the source database and saves the corresponding performance overhead.
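For ease of understanding, a minimal sketch of the normalization is given below; it assumes the identifier layout used in the examples (the index on the thousands and ten-thousands digits and the tail number on the last three digits), and the function name is illustrative only.

    # Sketch: normalize any identifier in the set to the preset identifier
    # (the identifier of the first piece of data in the compressed data set).
    def normalize_identifier(identifier: int) -> int:
        # zero out the index digits: 12345632001 -> 12345600001,
        # 12345699001 -> 12345600001
        return (identifier // 100000) * 100000 + identifier % 1000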
And step two, the server reads the target data from the database according to the preset identification.
In some embodiments, step two may include steps (2.1) to (2.2) described below.
And (2.1) the server determines a target table where the target data is located in the database according to the identification of the target data.
In some embodiments, the tail number of the identifier of the data may be reserved to indicate the table in the database in which the data resides. When querying the target data, the server may find, according to the tail number of the identifier of the target data, which table in the database stores the target data. Specifically, the server may take the modulus of the tail number of the identifier of the target data to obtain a modulus value, and determine the table in the database identified by the modulus value as the target table. The tail number of the identifier may be the value on the digits after the first digit of the identifier; for example, if the first digit is the thousands and ten-thousands digits, the tail number may be the ones, tens and hundreds digits. When taking the modulus, the number of tables in the database may be used: if the database comprises P tables, the modulus of the tail number with respect to P is taken to obtain the modulus value, where P is a positive integer.
Illustratively, taking the ID of the target data being 12345600001 and the tail number being the ones, tens and hundreds digits as an example, the tail number of the ID of the target data is 001. If the database is divided into 1000 tables, the modulus of 001 with respect to 1000 is 1, so it can be determined that the target table in which the target data is located is table 1.
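For ease of understanding, a minimal sketch of determining the target table is given below; the table count of 1000 and the table name format XXX_&lt;modulus&gt; follow the examples in this description and are illustrative assumptions.

    # Sketch: determine the target table from the tail number of the identifier,
    # assuming the database is divided into 1000 tables.
    def target_table(identifier: int, table_count: int = 1000) -> str:
        tail = identifier % 1000                     # ones, tens and hundreds digits
        return "XXX_{}".format(tail % table_count)   # 12345600001 -> "XXX_1"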
And (2.2) the server reads the target data and other data except the target data in the compressed data set from the target table.
In some embodiments, the server may store all data of the same compressed data set in the same table in the database. Specifically, in the process of storing any piece of data into the database, the server may read the identifier of the data, take the modulus of the tail number of the identifier to obtain a modulus value, and store the data into the table corresponding to the modulus value in the database. Since the tail numbers of the identifiers of the data stored in the same compressed data set can be the same, the modulus values of these tail numbers are also the same, and the modulus value determines the table in which the data resides; therefore, when the data are stored, all data of the same compressed data set can be stored in the same table.
Because each piece of data in the compressed data set is stored in the same table, the target table includes not only the target data but also the other data in the compressed data set, and therefore the server may read every piece of data in the compressed data set from the target table. The server may generate a query request for querying the value of each piece of data in the compressed data set and send the query request to the database; the database may respond to the query request by returning the value of each piece of data in the compressed data set to the server; and the server receives the values returned by the database, thereby reading the target data and the other data except the target data.
The query request may include the identifier of each piece of data in the compressed data set, and the query request may be a Structured Query Language (SQL) request using the in syntax. Through such a query request, all data in the compressed data set can be read from the database in batch by one request, with a performance cost similar to that of reading a single piece of data from the database, so the cost of returning to the source database is saved. For example, if the identifiers of the 100 pieces of data in the compressed data set are (12345600001, 12345601001, 12345602001, ..., 12345699001), the query request may be an SQL request whose program code is: select * from XXX_1 where id in (12345600001, 12345601001, 12345602001, ..., 12345699001); through this query request, the 100 pieces of data can be read from the database in one batch.
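For ease of understanding, assembling such a batched request may be sketched as follows; the column name id and the helper name build_batch_query are illustrative assumptions rather than limitations.

    # Sketch: build one in-syntax request covering every identifier in the
    # compressed data set, so the whole set is read in a single request.
    def build_batch_query(table: str, identifiers) -> str:
        id_list = ", ".join(str(i) for i in identifiers)
        return "select * from {} where id in ({})".format(table, id_list)

    # e.g. build_batch_query("XXX_1", [12345600001, 12345601001, 12345602001])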
If, instead, the data of the compressed data set were scattered across different tables in the database, multiple tables would have to be read when returning to the source database. For example, if each compressed data set stores 100 pieces of data and the 100 pieces of data are scattered across 100 tables in the database, then when the data of the compressed data set is read back from the source database, the 100 pieces of data can only be obtained by reading the 100 tables, that is, by sending 100 SQL requests to the database. If the 100 pieces of data of the compressed data set are stored in the same table in the database, every piece of data in the compressed data set can be read from a single table, which saves the overhead of accessing the database.
In other words, by storing the data of the compressed data set in the same table, all data in the compressed data set can be read from the database in batch through one query request, with a performance cost similar to that of reading a single piece of data, so the cost of returning to the source database is saved. In this way, when a queried piece of data in the compressed data set is not in the cache, the values of the remaining 99 pieces of data in the compressed data set can be stored into the cache together through a single return to the source database; when a query request for any piece of data in the compressed data set is subsequently received, the values of all 100 pieces of data can be queried from the cache, which increases the cache hit rate and reduces the number of times of returning to the source database.
In some embodiments, after the server reads each piece of data in the compressed data set from the target table, a compressed data set may be generated from the data and stored in the cache. In this way, if a query request for any piece of data in the compressed data set is subsequently received, the data can be queried from the cache without returning to the source database; that is, every piece of data in the compressed data set is stored in the cache again during a single return to the source database, which reduces the number of back-to-source reads and improves the efficiency of returning to the source database. For example, if a compressed data set caches 100 pieces of data and a terminal requests to query any target data among the 100 pieces, then if the compressed data set is not found in the cache, none of the 100 pieces of data is cached. If only the target data were read when returning to the source database, the 100 pieces of data could be fully restored to the cache only after the query requests for all 100 pieces of data had been received and the source database had been accessed 100 times, which causes a large performance overhead on the server. By reading every piece of data in the compressed data set from the database when returning to the source, the 100 pieces of data in the compressed data set can be obtained from the database through one query and then all stored in the cache. For example, if the IDs of the 100 pieces of data are (12345600001, 12345601001, 12345602001, ..., 12345699001), then when any one of the 100 pieces of data is queried, the identifier of that data may be normalized to the ID 12345600001 of the 1st piece of data; the tail number of this identifier is 001, the modulus of 001 with respect to 1000 is 1, and the 100 pieces of data are therefore read from table 1.
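For ease of understanding, rebuilding and caching the compressed data set after one return to the source database may be sketched as follows; cache stands for a hypothetical key-value client, the threshold of 50 is an assumed value for the number threshold mentioned in this description, and a value of 0 is treated as null following the like-count example.

    # Sketch: rebuild the two arrays from all rows read back from the target
    # table and store them in the cache under the key of the compressed data set.
    def rebuild_and_cache(cache, key, rows, threshold=50):
        # rows: list of (index, value) pairs covering every identifier in the set
        non_null = sorted((i, v) for i, v in rows if v != 0)
        if len(non_null) <= threshold:
            first_array = [i for i, v in non_null]       # indexes of non-null data
            second_array = [v for i, v in non_null]      # values of non-null data
        else:
            first_array = []                             # empty: dense usage mode
            second_array = [v for i, v in sorted(rows)]  # value at position = index
        cache.set(key, (first_array, second_array))      # hypothetical cache client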
The method provided by this embodiment provides a new data structure and stores and queries data in the form of a compressed data set, where the compressed data set uses two arrays to store the indexes of non-null-value data and the values of the non-null-value data, respectively. When the value of target data is to be queried, if the index of the target data is found in the first array, it can be determined that the target data is non-null-value data, and the value of the target data is then found in the second array; if the first array does not include the index of the target data, it can be determined that the target data is null-value data. In this way, whether the target data to be queried is null-value data can be determined, and the value of non-null-value data can be found, by querying the cache alone, so that the process of returning to the source database is omitted and the problem of cache penetration is solved.
FIG. 4 is a block diagram illustrating a data query device in accordance with an exemplary embodiment. Referring to fig. 4, the apparatus includes a receiving unit 401, an inquiring unit 402, and a determining unit 403.
A receiving unit 401 configured to perform receiving a query request for querying a value of target data;
a querying unit 402, configured to perform query on a compressed data set corresponding to the target data in the cache, where the compressed data set includes a first array and a second array, the first array is used to store an index of non-null-value data, and the second array is used to store a value of the non-null-value data;
the querying unit 402 is further configured to perform querying the value of the target data from the second array if the first array includes the index of the target data;
a determining unit 403 configured to determine that the target data is null data if the first array does not include the index of the target data.
Optionally, the querying unit 402 is configured to perform: determining the position of the index of the target data in the first array; and inquiring the value stored in the position in the second array as the value of the target data according to the position of the index in the first array.
Optionally, the indexes of the non-null-value data in the first array are sequentially arranged from small to large; or the indexes of the non-null-value data in the first array are sequentially arranged from large to small;
the determining unit 403 is configured to perform: and inquiring the index of the target data from the first array based on a binary search method to obtain the position of the index in the first array.
Optionally, the apparatus further comprises:
and the reading unit is configured to execute reading of a value on a first digit in the identification of the target data as an index of the target data.
Optionally, the querying unit 402 includes:
a reading subunit configured to perform reading of a value on a second digit in the identification of the target data;
and the query subunit is configured to perform query on the compressed data set from the cache according to the value on the second digit.
Optionally, the apparatus further comprises:
the normalization unit is configured to normalize the identifier of the target data to a preset identifier corresponding to the compressed data set if the cache does not comprise the compressed data set;
and the reading unit is configured to read the target data from the database according to the preset identification.
Optionally, the reading unit includes:
the determining subunit is configured to determine a target table in the database where the target data is located according to the identifier of the target data;
and the reading subunit is configured to read the target data and other data except the target data in the compressed data set from the target table.
Optionally, the tail number of the identifier of each piece of data stored in the compressed data set is the same, and the determining subunit is configured to perform: taking the modulus of the tail number of the identifier of the target data to obtain a modulus value; and determining a table identified as the modulus value in the database as the target table.
Optionally, the querying unit 402 is further configured to perform: and if the first array is empty, inquiring the value stored in the position corresponding to the index in the second array as the value of the target data.
Optionally, the apparatus further comprises:
an acquisition unit configured to perform acquisition of at least one data to be cached;
a generating unit configured to generate the compressed data set according to non-null data in the at least one data;
a storage unit configured to perform storing the compressed data set into a cache.
Optionally, the generating unit is configured to perform: if the number of the non-null data is less than or equal to a number threshold, storing the index of the non-null data to the first array, and storing the value of the non-null data to the second array; if the number of the non-null data is greater than the number threshold, the first array is configured to be null, and the value of each data in the at least one data is stored to the second array.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 5 is a block diagram illustrating a server 500 according to an exemplary embodiment, where the server 500 may have a relatively large difference due to different configurations or performances, and may include one or more processors (CPUs) 501 and one or more memories 502, where the memory 502 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 501 to implement the data query method provided by the above method embodiments. Of course, the server may also have a wired or wireless network interface, an input/output interface, and other components to facilitate input and output, and the server may also include other components for implementing the functions of the device, which are not described herein again.
In an exemplary embodiment, there is also provided a storage medium comprising instructions, such as a memory comprising instructions, executable by a processor of a server to perform the data query method described above. Alternatively, the storage medium may be a non-transitory computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, which includes one or more instructions that, when executed by a processor of a server, enable the server to perform the above-described data query method.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (20)

1. A method for querying data, comprising:
receiving a query request, wherein the query request is used for querying the value of target data;
querying a compressed data set corresponding to the target data in a cache, wherein the compressed data set comprises a first array and a second array, the first array is used for storing indexes of non-null value data but not storing indexes of null value data, and the second array is used for storing values of the non-null value data; wherein the value of the null data is a null value;
querying a value of the target data from the second array if the first array includes an index of the target data;
if the first array does not comprise the index of the target data, determining that the target data is null data;
the method further comprises the following steps:
if the cache does not comprise a compressed data set corresponding to the target data, normalizing the identifier of the target data into a preset identifier corresponding to the compressed data set, wherein the preset identifier is the identifier of any data in the compressed data set;
determining a target table where the target data are located in a database according to the preset identification, wherein each data in the same compressed data set is stored in the same table in the database;
and reading the target data and other data except the target data in the compressed data set from the target table.
2. The data query method of claim 1, wherein the querying the value of the target data from the second array comprises:
determining the position of the index of the target data in the first array;
and querying the value stored in the position in the second array according to the position of the index in the first array as the value of the target data.
3. The data query method according to claim 2, wherein the indexes of the non-null data in the first array are sequentially arranged from small to large; or the indexes of the non-null value data in the first array are sequentially arranged from large to small;
the determining the position of the index of the target data in the first array comprises:
and inquiring the index of the target data from the first array based on a binary search method to obtain the position of the index in the first array.
4. The data query method of claim 2, wherein before determining the location of the index of the target data in the first array, the method further comprises:
and reading a value on a first digit in the identification of the target data as an index of the target data.
5. The data query method of claim 1, wherein querying the compressed data set corresponding to the target data in the cache comprises:
reading a value on a second digit in the identification of the target data;
and inquiring the compressed data set from the cache according to the value on the second digit.
6. The data query method according to claim 1, wherein the tail numbers of the identifiers of each piece of data stored in the compressed data set are the same, and the determining the target table in which the target data is located in the database according to the preset identifier includes:
taking a modulus of the tail number of the preset identifier to obtain a modulus value;
and determining a table identified as the modulus value in the database as the target table.
7. The method of claim 1, wherein after querying the compressed data set corresponding to the target data in the cache, the method further comprises:
if the first array is empty, inquiring the value stored in the position corresponding to the index in the second array as the value of the target data;
the first array is empty, which means that the first array does not store content, and when the first array is empty, each position of the second array corresponds to an index of each data one to one, wherein the ith position in the second array is used for storing a value of data with an index i, and i is a positive integer.
8. The data query method of claim 1, wherein before receiving the query request, the method further comprises:
acquiring at least one datum to be cached;
generating the compressed data set according to non-null data in the at least one data;
storing the compressed data set in the cache.
9. The method according to claim 8, wherein the generating the compressed data set according to the non-null data in the at least one data comprises:
if the number of the non-null data is less than or equal to a number threshold, storing an index of the non-null data to the first array, and storing a value of the non-null data to the second array;
and if the number of the non-null data is larger than the number threshold, configuring the first array to be null, and storing the value of each data in the at least one data to the second array.
10. A data query apparatus, comprising:
a receiving unit configured to perform receiving a query request for querying a value of target data;
the query unit is configured to execute query on a compressed data set corresponding to the target data in a cache, wherein the compressed data set comprises a first array and a second array, the first array is used for storing indexes of non-null-value data but not storing indexes of null-value data, and the second array is used for storing values of the non-null-value data; wherein the value of the null data is a null value;
the querying unit is further configured to perform querying the value of the target data from the second array if the first array includes the index of the target data;
a determining unit configured to perform determining that the target data is null data if the first array does not include an index of the target data;
a normalization unit configured to normalize the identifier of the target data to a preset identifier corresponding to the compressed data set if the cache does not include the compressed data set, where the preset identifier is an identifier of any piece of data in the compressed data set;
the reading unit is configured to read the target data from a database according to the preset identification;
the reading unit includes:
the determining subunit is configured to determine a target table in which the target data is located in the database according to the preset identifier, wherein each data in the same compressed data set is stored in the same table in the database;
a reading subunit configured to perform reading, from the target table, the target data and data other than the target data in the compressed data set.
11. The data query device of claim 10, wherein the query unit is configured to perform: determining a position of an index of the target data in the first array; and inquiring the value stored in the position in the second array according to the position of the index in the first array as the value of the target data.
12. The data query device of claim 11, wherein the indexes of the non-null data in the first array are sequentially arranged from small to large; or the indexes of the non-null value data in the first array are sequentially arranged from large to small;
the determination unit configured to perform: and inquiring the index of the target data from the first array based on a binary search method to obtain the position of the index in the first array.
13. The data query apparatus of claim 11, wherein the apparatus further comprises:
and the reading unit is configured to execute reading of a value on a first digit in the identification of the target data as an index of the target data.
14. The data query apparatus of claim 10, wherein the query unit comprises:
a reading subunit configured to perform reading of a value on a second digit in the identification of the target data;
a query subunit configured to perform a query of the compressed data set from the cache according to the value on the second digit.
15. The apparatus according to claim 14, wherein the tail number of the identifier of each data stored in the compressed data set is the same, and the determining subunit is configured to perform: taking a modulus of the tail number of the preset identifier to obtain a modulus value; and determining a table identified as the modulus value in the database as the target table.
16. The data query apparatus of claim 10, wherein the query unit is further configured to perform: if the first array is empty, inquiring a value stored in a position corresponding to the index in the second array as the value of the target data;
the first array is empty, which means that the first array does not store content, and when the first array is empty, each position of the second array corresponds to an index of each data one to one, wherein the ith position in the second array is used for storing a value of data with an index i, and i is a positive integer.
17. The data query apparatus of claim 10, wherein the apparatus further comprises:
an acquisition unit configured to perform acquisition of at least one data to be cached;
a generating unit configured to perform generating the compressed data set according to non-null data in the at least one data;
a storage unit configured to perform storing the compressed data set into the cache.
18. The data query device of claim 17, wherein the generating unit is configured to perform: if the number of the non-null data is less than or equal to a number threshold, storing an index of the non-null data to the first array, and storing a value of the non-null data to the second array; if the number of the non-null data is greater than the number threshold, the first array is configured to be null, and the value of each data in the at least one data is stored in the second array.
19. A server, comprising:
one or more processors;
one or more memories for storing the one or more processor-executable instructions;
wherein the one or more processors are configured to execute the instructions to implement the data query method of any one of claims 1 to 9.
20. A storage medium, wherein instructions in the storage medium, when executed by a processor of a server, enable the server to perform the data query method of any one of claims 1 to 9.
CN201911052778.6A 2019-10-31 2019-10-31 Data query method, device, server and storage medium Active CN110765138B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911052778.6A CN110765138B (en) 2019-10-31 2019-10-31 Data query method, device, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911052778.6A CN110765138B (en) 2019-10-31 2019-10-31 Data query method, device, server and storage medium

Publications (2)

Publication Number Publication Date
CN110765138A CN110765138A (en) 2020-02-07
CN110765138B true CN110765138B (en) 2023-01-20

Family

ID=69335132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911052778.6A Active CN110765138B (en) 2019-10-31 2019-10-31 Data query method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN110765138B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114546297B (en) * 2020-11-25 2024-02-27 顺丰科技有限公司 Order data processing method, device, system, computer equipment and storage medium
CN112883307B (en) * 2021-02-03 2023-10-20 深圳市大成天下信息技术有限公司 Cache updating method and device and social network system
CN112988795A (en) * 2021-02-26 2021-06-18 开放智能机器(上海)有限公司 Number retrieval method, system, equipment and storage medium
CN113792050A (en) * 2021-09-15 2021-12-14 福建天晴数码有限公司 Method and system for preventing cache from penetrating through user-defined hash
CN117056363B (en) * 2023-07-19 2024-03-19 广州三七极耀网络科技有限公司 Data caching method, system, equipment and storage medium
CN117573703B (en) * 2024-01-16 2024-04-09 科来网络技术股份有限公司 Universal retrieval method, system, equipment and storage medium for time sequence data


Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7016914B2 (en) * 2002-06-05 2006-03-21 Microsoft Corporation Performant and scalable merge strategy for text indexing
US7698285B2 (en) * 2006-11-09 2010-04-13 International Business Machines Corporation Compression of multidimensional datasets
US10558705B2 (en) * 2010-10-20 2020-02-11 Microsoft Technology Licensing, Llc Low RAM space, high-throughput persistent key-value store using secondary memory
WO2012060995A2 (en) * 2010-11-01 2012-05-10 Michael Luna Distributed caching in a wireless network of content delivered for a mobile application over a long-held request
US9372877B2 (en) * 2014-05-05 2016-06-21 Microsoft Technology Licensing, Llc Sparse datatable data structure
US10013574B2 (en) * 2014-06-11 2018-07-03 Bijit Hore Method and apparatus for secure storage and retrieval of encrypted files in public cloud-computing platforms
CN105718455B (en) * 2014-12-01 2019-06-14 阿里巴巴集团控股有限公司 A kind of data query method and device
CN104778212B (en) * 2014-12-19 2018-08-07 北京搜狗科技发展有限公司 Map datum generation method and device, map datum read method and device
US9986060B2 (en) * 2015-03-30 2018-05-29 General Electric Company Persistent caching of map imagery and data
CN106445944A (en) * 2015-08-06 2017-02-22 阿里巴巴集团控股有限公司 Data query request processing method and apparatus, and electronic device
US11010300B2 (en) * 2017-05-04 2021-05-18 Hewlett Packard Enterprise Development Lp Optimized record lookups
US20180349499A1 (en) * 2017-06-01 2018-12-06 Facebook Inc. Real-time Counters for Search Results on Online Social Networks
US20190004998A1 (en) * 2017-06-30 2019-01-03 Seagate Technology Llc Sparse matrix representation
CN107491487B (en) * 2017-07-17 2020-12-04 中国科学院信息工程研究所 Full-text database architecture and bitmap index creation and data query method, server and medium
US10579633B2 (en) * 2017-08-31 2020-03-03 Micron Technology, Inc. Reducing probabilistic filter query latency
CN109710639A (en) * 2018-11-26 2019-05-03 厦门市美亚柏科信息股份有限公司 A kind of search method based on pair buffers, device and storage medium
CN109710644A (en) * 2018-12-26 2019-05-03 苏州思必驰信息科技有限公司 The method and apparatus for preventing caching from penetrating

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0877983A1 (en) * 1996-02-02 1998-11-18 Sony Electronics Inc. Application programming interface for data transfer and bus management over a bus structure
US6073129A (en) * 1997-12-29 2000-06-06 Bull Hn Information Systems Inc. Method and apparatus for improving the performance of a database management system through a central cache mechanism
US7254580B1 (en) * 2003-07-31 2007-08-07 Google Inc. System and method for selectively searching partitions of a database
US7386674B1 (en) * 2005-04-25 2008-06-10 Netapp, Inc. Method and apparatus to provide a unified readahead scheme for multiple sources
CN1980392A (en) * 2005-11-29 2007-06-13 同济大学 Decoding method using video code flow to judge picture poundary and reading in advance picture data, and apparatus therefor
CA2560159A1 (en) * 2006-06-06 2007-12-06 University Of Regina Method and apparatus for concept-based visual presentation of search results
CN102713909A (en) * 2010-01-24 2012-10-03 微软公司 Dynamic community-based cache for mobile search
EP2572289A1 (en) * 2010-05-18 2013-03-27 Google, Inc. Data storage and processing service
CN105930528A (en) * 2016-06-03 2016-09-07 腾讯科技(深圳)有限公司 Webpage cache method and server
CN109241175A (en) * 2018-06-28 2019-01-18 东软集团股份有限公司 Method of data synchronization, device, storage medium and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A data reuse strategy in column-store data warehouses; Wang Mei et al.; Chinese Journal of Computers; 2013-08-15 (No. 08); pp. 80-89 *
Research on storage management of massive remote sensing data based on multi-level information grids; Li Shuang et al.; Acta Geodaetica et Cartographica Sinica; 2016-12-15; pp. 114-122 *

Also Published As

Publication number Publication date
CN110765138A (en) 2020-02-07

Similar Documents

Publication Publication Date Title
CN110765138B (en) Data query method, device, server and storage medium
CN110109953B (en) Data query method, device and equipment
US11347787B2 (en) Image retrieval method and apparatus, system, server, and storage medium
CN108491450B (en) Data caching method, device, server and storage medium
CN107092439B (en) Data storage method and equipment
CN111611225A (en) Data storage management method, query method, device, electronic equipment and medium
CN108647266A (en) A kind of isomeric data is quickly distributed storage, exchange method
CN115023697A (en) Data query method and device and server
CN111817722A (en) Data compression method and device and computer equipment
CN111949681A (en) Data aggregation processing device and method and storage medium
CN105912696A (en) DNS (Domain Name System) index creating method and query method based on logarithm merging
CN111400301B (en) Data query method, device and equipment
CN110222046B (en) List data processing method, device, server and storage medium
CN111414527A (en) Similar item query method and device and storage medium
CN113157777B (en) Distributed real-time data query method, cluster, system and storage medium
CN115934583A (en) Hierarchical caching method, device and system
CN108804502A (en) Big data inquiry system, method, computer equipment and storage medium
US8572231B2 (en) Variable-length nonce generation
KR102375511B1 (en) Document storage management server for performing storage processing of document files received from a client terminal in conjunction with a plurality of document storage and operating method thereof
CN114116827A (en) Query system and method for user portrait data
CN112231398A (en) Data storage method, device, equipment and storage medium
CN117539915B (en) Data processing method and related device
CN113297211B (en) Crowd portrait storage and orientation system and method under high concurrency of big data
CN116166671B (en) Memory database table pre-association processing method, system and medium
CN115658728B (en) Query method, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant