CN113239303A - Data storage method and device - Google Patents

Info

Publication number
CN113239303A
Authority
CN
China
Prior art keywords
data
stored
frequency
user
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110453160.1A
Other languages
Chinese (zh)
Inventor
牛旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Tuoxian Technology Co Ltd
Original Assignee
Beijing Jingdong Tuoxian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Tuoxian Technology Co Ltd
Priority to CN202110453160.1A
Publication of CN113239303A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9574 Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/972 Access to data in other repository systems, e.g. legacy data or dynamic Web page generation

Abstract

The invention discloses a data storage method and device, and relates to the field of computer technology. One embodiment of the method comprises: receiving data to be stored; determining, according to the user identifier indicated by the data to be stored, whether valid cached data corresponding to the user identifier is stored in a high-frequency cache; if so, storing the data to be stored in the high-frequency cache by using Bitmaps; and if not, storing the data to be stored in the database by using an int field. The embodiment reduces the amount of stored data and improves data access efficiency.

Description

Data storage method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for storing data.
Background
With the development of computer technology, more and more users shop through the e-commerce platform, and therefore, the data volume (activity check-in data, shopping data, click data, and the like of the users) of the e-commerce platform is increasing day by day.
For the storage of data volume of e-commerce platforms, the prior art generally adopts a way of record-by-record storage. For example, for the activity check-in data of the users, one piece of data is generated every time each user checks in, and the database stores the check-in data generated by a large number of users one by one.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
each user generates one record per day, that is, 10 records over 10 days, so a large number of users generate a huge volume of check-in data, and storing these records one by one requires a large amount of storage space.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for storing data, which are capable of storing data to be cached in a high-frequency cache by using Bitmaps, so as to reduce the amount of data storage in the high-frequency cache, and the cached data stored in the high-frequency cache can be easily accessed, thereby improving the access efficiency of the data. In addition, the data to be stored is stored in the database by utilizing the int field, so that the data storage capacity in the database can be reduced, and the storage space occupied by the stored data is reduced.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a data storage method, including:
receiving data to be stored;
determining whether effective cached data corresponding to the user identification is stored in a high-frequency cache or not according to the user identification indicated by the data to be stored;
if so, storing the data to be stored in the high-frequency cache by using Bitmaps;
and if not, storing the data to be stored in the database by utilizing the int field.
Optionally, before the determining whether valid cached data corresponding to the user identifier is stored in the high-frequency cache, the method further includes:
and judging whether the user identifier corresponds to a high-frequency user; if so, determining, according to the user identifier, whether the cached data exists in the high-frequency cache, and if it exists, determining that the cached data is valid.
Optionally, when it is determined that the user identifier corresponds to a high-frequency user and the cached data corresponding to the user identifier does not exist in the high-frequency cache,
and determining stored data corresponding to the user identification in a database, and storing the stored data in the high-frequency cache by using Bitmaps.
Optionally, the determining whether the user identifier corresponds to a high-frequency user includes:
determining a frequency corresponding to the user identifier according to any one or more of the following elements:
the stored data corresponding to the user identifier within a preset duration; the total duration corresponding to the stored data; a first duration, from the storage time of the first stored data among the stored data to the current time; and a second duration, from a preset time to the current time, wherein the preset time is earlier than the storage time of the first stored data;
and when the frequency is greater than a preset frequency threshold value, determining that the user identification corresponds to a high-frequency user.
Optionally, when it is determined that the user identifier does not correspond to the high-frequency user, the method further includes:
and determining whether the cached data exists in the high-frequency cache according to the user identification, and if so, determining whether the cached data is valid according to a preset validity period.
Optionally, after the storing the data to be stored in the high-frequency cache by using Bitmaps, the method further includes:
and updating the cache duration of the stored data corresponding to the user identification in the high-frequency cache.
Optionally, when the cached data is invalid, the method further includes:
and updating the database by using the int field according to the stored data stored by using Bitmaps in the high-frequency cache, and deleting the stored data from the high-frequency cache after the update.
Optionally, storing the data to be stored in the database by using an int field includes:
and storing each data to be stored by utilizing each bit of the int field, and setting a storage identifier of the int field according to the user identifier.
Optionally, the int field is stored in a first data table; when the number of int fields stored in the first data table is greater than a preset number threshold, the method further comprises:
grouping the plurality of int fields in the first data table, and storing the grouped int fields by using a plurality of second data tables respectively.
Optionally, a hash algorithm is used to calculate a digest value of the storage identifier, the digest value is taken modulo the number of second data tables, and the int fields are grouped according to the modulo result.
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for data storage, including: a receiving module, a determining module, a first storage module and a second storage module; wherein:
the receiving module is used for receiving data to be stored;
the determining module is used for determining whether the high-frequency cache stores effective cached data corresponding to the user identification according to the user identification indicated by the data to be stored; if yes, triggering the first storage module; if not, triggering the second storage module;
the first storage module is used for storing the data to be stored in the high-frequency cache by using Bitmaps;
and the second storage module is used for storing the data to be stored in the database by utilizing the int field.
According to a third aspect of embodiments of the present invention, there is provided an electronic device for data storage, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any one of the data storage methods provided by the first aspect above.
According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements any one of the data storage methods provided by the first aspect above.
One embodiment of the above invention has the following advantages or benefits: when cached data corresponding to the data to be stored is stored in the high-frequency cache, storing the data to be cached in the high-frequency cache by using Bitmaps; therefore, each bit of Bitmaps can be used for storing one data to be cached, so that the data storage capacity in the high-frequency cache is reduced; further, the cached data stored in the high-frequency cache may facilitate access relative to the stored data stored in the database, thereby improving the efficiency of access to the data. In addition, when the stored data corresponding to the data to be stored is not stored in the high-frequency cache, the data to be stored is stored in the database by utilizing the int field, wherein each bit of the int field can also store one data to be stored, so that the data storage capacity in the database is reduced, and the storage space occupied by the stored data is reduced. In conclusion, the data is cached in a storage mode combining the high-frequency cache and the database, so that the data storage capacity is reduced, the storage space is saved, and the data access efficiency is improved.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of a main flow of a data storage method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating dynamic storage of data using an int field according to an embodiment of the invention;
FIG. 3 is a diagram illustrating a data storage structure for an int field according to an embodiment of the invention;
FIG. 4 is a diagram illustrating a table partitioning according to a table partitioning policy, according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a table division using a hash algorithm according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the main modules of a data storage device according to an embodiment of the present invention;
FIG. 7 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 shows a data storage method according to an embodiment of the present invention. As shown in FIG. 1, the method may include the following steps S101 to S104:
Step S101: receiving data to be stored.
When the data storage method provided by the embodiment of the invention is applied to an e-commerce scene, the data to be stored can be sign-in data, purchase data, order data and the like. In the embodiment of the present invention, the data storage method provided in the embodiment of the present invention is described in detail by taking the check-in data as an example. When the data to be stored is check-in data, the data to be stored may include a user identifier (e.g., UUID) and user check-in information.
Step S102: judging whether effective cached data corresponding to the user identification is stored in the high-frequency cache or not according to the user identification indicated by the data to be stored; if yes, step S103 is triggered, otherwise step S104 is executed.
Before step S102 is executed, it may be determined whether a user identifier indicated by data to be stored is a high-frequency user, if so, it is determined whether the cached data exists in the high-frequency cache according to the user identifier, and if so, it is determined that the cached data is valid.
In the embodiment of the present invention, whether the user identifier corresponds to a high-frequency user may be determined as follows: a frequency corresponding to the user identifier is determined according to any one or more of the following elements: the stored data corresponding to the user identifier within a preset duration; the total duration corresponding to the stored data; a first duration, from the storage time of the first stored data among the stored data to the current time; and a second duration, from a preset time to the current time, wherein the preset time is earlier than the storage time of the first stored data. When the frequency is greater than a preset frequency threshold, the user identifier is determined to correspond to a high-frequency user.
Specifically, when the data to be stored is check-in data, a high-frequency user is a user who checks in frequently. Among the elements used to identify a high-frequency user, the stored data corresponding to the user identifier within the preset duration can be the number of check-in days indicated by the user's check-in data over a certain past period; the total duration corresponding to the stored data represents the user's total number of check-in days since the check-in activity started; the first duration represents the total number of days elapsed since the user's first check-in date; and the second duration represents the total number of days elapsed since the start date of the check-in activity. To make the high-frequency determination more accurate, weight values can be assigned to the elements according to their importance, and the user's check-in frequency can be calculated from those weight values.
In one embodiment of the present invention, the check-in frequency of the user may be calculated by the following formula:
[Formula image BDA0003039555730000071: the check-in frequency expressed in terms of a, A, b, B, C, M1, M2 and M3; the image is not reproduced in this text.]
where a represents the number of check-in days indicated by the stored data corresponding to the user identifier within the preset duration; A is the preset duration; b represents the total duration corresponding to the stored data; B represents the first duration, from the storage time of the first stored data among the stored data to the current time; C represents the second duration, from the preset time to the current time; and M1, M2 and M3 are weight values.
Continuing the check-in example, when one month of data is used to judge whether the user is a high-frequency user, a is the number of days the user checked in during the last 30 days, A is 30 days, b is the user's total number of check-in days since the check-in activity started, B is the total number of days elapsed from the user's first check-in day to the present, and C is the total number of days from the start of the check-in activity to the present. M1, M2 and M3 can be set according to actual requirements, for example M1 = 50%, M2 = 30% and M3 = 20%.
After the check-in frequency of the user is calculated according to the above formula, whether the user is a high-frequency user or not can be judged according to a preset frequency threshold, wherein the preset frequency threshold can be a number between 0 and 100%, for example, the preset frequency threshold is 60%, if the calculated frequency value is greater than 60%, the user identifier corresponds to the high-frequency user, otherwise, the user identifier is a low-frequency user.
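The formula itself survives only as an image reference, so the following Python sketch is a guess at its shape, not the patent's formula: it assumes a weighted sum of the three ratios a/A, b/B and b/C with weights M1, M2 and M3. The function names and the default weights (taken from the 50% / 30% / 20% example) are illustrative.

```python
def check_in_frequency(a, A, b, B, C, m1=0.5, m2=0.3, m3=0.2):
    """Weighted check-in frequency (ASSUMED formula, not taken from the source).

    a: check-in days within the preset window; A: window length in days
    b: total check-in days since the activity started
    B: days elapsed since the user's first check-in
    C: days elapsed since the activity started
    """
    if min(A, B, C) <= 0:
        raise ValueError("A, B and C must be positive")
    return m1 * a / A + m2 * b / B + m3 * b / C


def is_high_frequency(frequency, threshold=0.6):
    # The text treats a frequency strictly greater than the threshold as high.
    return frequency > threshold


# A user who checked in on 24 of the last 30 days and on 40 days in total,
# first checked in 50 days ago, in an activity that started 100 days ago.
freq = check_in_frequency(a=24, A=30, b=40, B=50, C=100)
print(round(freq, 2), is_high_frequency(freq))
```

Under this assumed form the result stays between 0 and 100% whenever the weights sum to 100%, which is consistent with the threshold range described above.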
In addition, the embodiment of the present invention may also determine whether the user is the high-frequency user by other methods, for example, determining whether the user is the high-frequency user according to the member level of the user, the total amount of orders of the user, or the purchase frequency of the user.
When the user identifier corresponds to a high-frequency user, whether the cached data exists in the high-frequency cache is determined according to the user identifier. If cached data corresponding to the user identifier exists in the high-frequency cache, the user's previous check-in data is already stored there, so the cached data can be directly determined to be valid, and step S103 below is then executed, that is, the data to be stored (the current check-in data) is stored in the high-frequency cache by using Bitmaps.
When the user identifier is judged to correspond to a high-frequency user but no cached data corresponding to the user identifier is stored in the high-frequency cache, the user was previously a low-frequency user and has become a high-frequency user as of the most recent check-in. In this case the check-in data corresponding to the user identifier needs to be stored in the high-frequency cache, so the stored data corresponding to the user identifier can be determined in the database and stored in the high-frequency cache by using Bitmaps. The stored data in the database is stored in int fields.
In the embodiment of the present invention, under the condition that the user identifier does not correspond to the high-frequency user, that is, under the condition that the user identifier corresponds to the low-frequency user, it may be determined whether the cached data exists in the high-frequency cache according to the user identifier, and if so, it may be determined whether the cached data is valid according to a preset validity period.
Here, when the cached data in the high-frequency cache is still valid, step S103 of storing the data to be stored (the check-in data this time) in the high-frequency cache by using Bitmaps may be performed. When the cached data corresponding to the user identifier does not exist in the high-frequency cache or the cached data in the high-frequency cache is invalid, the following step S104 may be executed, that is, the data to be stored is stored in the database by using the int field.
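The branching described for step S102 can be summarized in a small decision function. This is a minimal Python sketch: the names `Target` and `route` are hypothetical, and it assumes that after a new high-frequency user's history is loaded from the database into the cache, the current record is also written to the cache.

```python
from enum import Enum


class Target(Enum):
    CACHE = "S103: store in the high-frequency cache using Bitmaps"
    DATABASE = "S104: store in the database using the int field"


def route(is_high_freq, in_cache, cache_valid):
    """Return (target, needs_backfill) for one incoming record.

    needs_backfill is True when a high-frequency user has no cache entry yet,
    so the stored int fields must first be loaded from the database.
    """
    if is_high_freq:
        # Cached data of a high-frequency user is treated as valid directly.
        return Target.CACHE, not in_cache
    if in_cache and cache_valid:
        # Low-frequency user whose cache entry is still within its validity period.
        return Target.CACHE, False
    return Target.DATABASE, False


print(route(is_high_freq=False, in_cache=True, cache_valid=False))
```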
To improve data access efficiency, the embodiment of the present invention controls how long data stays cached after a high-frequency user becomes a low-frequency user by updating the cache duration of the data in the high-frequency cache. Specifically, in an embodiment of the present invention, after the data to be stored is stored in the high-frequency cache by using Bitmaps, the cache duration of the stored data corresponding to the user identifier is updated in the high-frequency cache.
For example, suppose the preset data validity period is 7 days and 4 days have elapsed between the user's last check-in and the receipt of the current data to be stored. Before the data to be stored is written to the high-frequency cache, the remaining validity period of the user's cached data is therefore 3 days. After the data to be stored (the current check-in data) is stored in the high-frequency cache, the validity period of the stored data corresponding to the user identifier can be reset to 7 days.
Therefore, when a user changes from a high-frequency user to a low-frequency user for a short time, the user's stored data is not deleted from the high-frequency cache at the moment of the change but is retained there for the validity period. If new data to be stored (that is, new check-in data) corresponding to the user identifier is received within the validity period, the cache duration of the stored data corresponding to the user identifier is updated again, so the check-in data of a user who has become a low-frequency user can remain in the high-frequency cache for a certain time, which further improves data access efficiency. Of course, for a high-frequency user, the cache duration is likewise updated after the data to be stored is written to the high-frequency cache.
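The validity-period refresh can be illustrated with a small in-memory stand-in for the cache. This Python sketch is illustrative only: the class `ExpiringCache` and its methods are hypothetical, and time is modelled as an integer day number rather than a real clock or a cache server's key expiry.

```python
class ExpiringCache:
    """Tracks, per user, the day on which the cached entry stops being valid."""

    def __init__(self, validity_days=7):
        self.validity_days = validity_days
        self._expires_on = {}  # user_id -> day number on which the entry expires

    def touch(self, user_id, today):
        # Called after each write: the validity period restarts from today.
        self._expires_on[user_id] = today + self.validity_days

    def remaining(self, user_id, today):
        expires = self._expires_on.get(user_id)
        return 0 if expires is None else max(0, expires - today)

    def is_valid(self, user_id, today):
        return self.remaining(user_id, today) > 0


cache = ExpiringCache(validity_days=7)
cache.touch("user-1", today=0)        # last check-in on day 0
print(cache.remaining("user-1", 4))   # 4 days later: 3 days of validity remain
cache.touch("user-1", today=4)        # new check-in stored on day 4
print(cache.remaining("user-1", 4))   # validity period is back to 7 days
```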
In addition, when it is determined that the user identifier does not correspond to a high-frequency user and the cached data for the user identifier in the high-frequency cache is invalid, the database is updated by using the int field according to the stored data stored by using Bitmaps in the high-frequency cache, and the stored data is deleted from the high-frequency cache after the update. The reason is that, while the user identifier was judged to be a high-frequency user, the corresponding check-in data was stored only in the high-frequency cache by using Bitmaps; the database was not accessed each time check-in data was received, which avoids frequent database access. Consequently, when the user changes from a high-frequency user to a low-frequency user and the corresponding cached data becomes invalid, the user's check-in data has to be stored by the database, and the data stored in the int fields of the database must first be updated according to the cached data in the high-frequency cache, so that the database is synchronized to the latest state. Further, to save storage space in the high-frequency cache, the cached data corresponding to the user identifier may be deleted from the high-frequency cache after synchronization.
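The write-back on invalidation can be sketched as follows. This Python example uses plain dictionaries as stand-ins for the cache and the database; the function names are hypothetical, and it assumes (the source does not specify this) that bit d-1 of a monthly int marks day d of that month.

```python
from datetime import date, timedelta


def to_monthly_ints(checked_offsets, start):
    """Convert day offsets from the activity start into {"YYYYMM": int}."""
    monthly = {}
    for offset in checked_offsets:
        day = start + timedelta(days=offset)
        key = day.strftime("%Y%m")
        # Assumed layout: bit (day of month - 1) marks a check-in on that day.
        monthly[key] = monthly.get(key, 0) | (1 << (day.day - 1))
    return monthly


def write_back_and_evict(cache, database, user_id, start):
    """Synchronize a user's cached check-ins to the database, then drop the cache entry."""
    offsets = cache.pop(user_id, set())
    for month, bits in to_monthly_ints(offsets, start).items():
        key = (user_id, month)
        database[key] = database.get(key, 0) | bits


cache = {"user-1": {0, 1, 31}}   # checked in on Jan 1, Jan 2 and Feb 1
database = {}
write_back_and_evict(cache, database, "user-1", start=date(2021, 1, 1))
print(database, cache)
```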
Step S103: and storing the data to be stored in the high-frequency cache by using Bitmaps.
In the embodiment of the invention, Redis can be used to build a dedicated high-frequency cache for high-frequency users; the full check-in data over a high-frequency user's life cycle is stored through Redis Bitmaps, which also provide high-performance queries. The maximum number of bits supported by Redis Bitmaps, 2^32 (about 4.29 billion), is more than enough to hold all check-in data for each user's entire life cycle (100 years). By storing the check-in data of high-frequency users with this high-frequency storage scheme, Redis storage resources can be saved even when data for a very large number of users must be stored.
For example, in a check-in activity that starts on January 1, 2021, the first day of the activity is used as the reference. Before the activity starts, all bits of the bitmap may be set to 0, with each bit corresponding to one check-in day. Once the user starts checking in, the bit for the date of each check-in is updated to 1. For example, when the user checks in on 20210101, bit 0 of the bitmap is set to 1; when the user checks in on 20210102, bit 1 is set to 1; and when the user checks in n days after 20210101, bit n is set to 1. The overall strategy is that the offset between the check-in date and the activity start date gives the bit position for that date; the bit is 1 if the user checked in on that day and 0 otherwise. Each user has exactly one entry in the cache, so the user's unique id (UUID) can be used as the Redis key.
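The offset-based bit layout can be tried out without a Redis server. The Python sketch below uses an arbitrary-precision integer per user as an in-memory stand-in for a Redis bitmap; the class and method names are hypothetical. In an actual Redis deployment the corresponding commands would be SETBIT, GETBIT and BITCOUNT, keyed by the user's UUID.

```python
from datetime import date


class CheckInBitmap:
    """One bitmap per user; bit n marks a check-in n days after the activity start."""

    def __init__(self, start):
        self.start = start
        self._bits = {}  # user_id -> int used as an unbounded bitmap

    def check_in(self, user_id, day):
        offset = (day - self.start).days
        if offset < 0:
            raise ValueError("check-in date is before the activity start")
        self._bits[user_id] = self._bits.get(user_id, 0) | (1 << offset)

    def has_checked_in(self, user_id, day):
        offset = (day - self.start).days
        if offset < 0:
            return False
        return ((self._bits.get(user_id, 0) >> offset) & 1) == 1

    def total_days(self, user_id):
        return bin(self._bits.get(user_id, 0)).count("1")


bitmap = CheckInBitmap(start=date(2021, 1, 1))
bitmap.check_in("user-1", date(2021, 1, 1))   # sets bit 0
bitmap.check_in("user-1", date(2021, 1, 2))   # sets bit 1
print(bitmap.has_checked_in("user-1", date(2021, 1, 3)), bitmap.total_days("user-1"))
```

At one bit per day, 100 years of check-ins need roughly 36,500 bits per user, far below the 2^32-bit limit mentioned above.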
Step S104: and storing the data to be stored in the database by utilizing the int field.
In step S104, check-in data can be stored by month using the int field: because an int field occupies 32 bits, it can hold the check-in records of a whole month, and the records are stored in the same way as in Redis. This accommodates both the check-in activity cycle and the storage space of the database. In an embodiment of the present invention, each bit of the int field is used to store one piece of data to be stored, and a storage identifier of the int field is set according to the user identifier.
When the int field is used to store data, the storage scheme is essentially the same as that of Bitmaps. Specifically, still taking a check-in activity as an example, with the first day of the activity as the reference and assuming the activity starts on January 1, 2021, all bits of the int field may be set to 0 before the activity starts, with each bit corresponding to one check-in day. Once the user starts checking in, the bit for the date of each check-in is updated to 1. For example, when the user checks in on 20210101, bit 0 of the int field is set to 1; when the user checks in on 20210102, bit 1 is set to 1; and when the user checks in n days after 20210101, bit n is set to 1. The overall strategy is that the offset between the check-in date and the activity start date gives the bit position for that date; the bit is 1 if the user checked in on that day and 0 otherwise. When check-in data is stored by month in int fields, the storage identifier of the int field can be set to user id (UUID) + check-in month. A schematic diagram of dynamically storing data by using the int field is shown in FIG. 2, and a storage result of the int field is shown in FIG. 3.
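A monthly int field can be sketched in Python with a dictionary standing in for the data table. The helper names are hypothetical, and two details are assumptions rather than statements from the source: the storage identifier is formatted as "uuid:YYYYMM", and the bit position is the day of the month minus one, so that every month fits in 31 of the 32 bits.

```python
from datetime import date


def storage_id(user_id, day):
    # Assumed identifier format: user id followed by the check-in month.
    return f"{user_id}:{day:%Y%m}"


def set_check_in(table, user_id, day):
    """Set the bit for `day` in the user's int field for that month."""
    key = storage_id(user_id, day)
    table[key] = table.get(key, 0) | (1 << (day.day - 1))


def checked_days(value):
    """Return the days of the month (1..31) whose bit is set."""
    return [d for d in range(1, 32) if (value >> (d - 1)) & 1]


table = {}
set_check_in(table, "user-1", date(2021, 1, 1))
set_check_in(table, "user-1", date(2021, 1, 31))
set_check_in(table, "user-1", date(2021, 2, 5))
print(table)
print(checked_days(table["user-1:202101"]))
```

The highest bit used is bit 30, so the value also fits in a signed 32-bit integer column.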
The data table is adopted to store the int fields in the embodiment of the invention, and it can be understood that each user's monthly check-in data corresponds to one int field, so that when the number of users participating in the check-in activity is large and the continuous check-in duration of the users is long, the int fields stored in the data table are very large, thereby possibly influencing the access efficiency of the data table. Thus, in one embodiment of the invention, the int field is stored in a first data table; when the number of int fields stored in the first data table is greater than a preset number threshold, further comprising: grouping the plurality of int fields in the first data table, and storing the grouped int fields by using a plurality of second data tables respectively.
Here, the preset number threshold may be set according to the storage capacity of the first data table, for example, the first data table may store up to 100 int fields, and then the preset number threshold may be set up to the maximum storable amount × 95% of the first data table, that is, 95, and of course, the preset number threshold may also be set up to other values according to actual requirements. When the amount of the int fields stored in the first data table is greater than the preset amount threshold, the stored multiple int fields can be stored in a sub-table manner, that is, the grouped int fields are respectively stored by using the multiple second data tables after the sub-table. A schematic diagram of performing table splitting storage on the int field in the first data table according to the preset table splitting policy may be as shown in fig. 4.
In an embodiment of the present invention, during table splitting, a hash algorithm may be used to calculate a digest value of the storage identifier, modulo the digest value with the number of the second data table, and the int fields are grouped according to a modulo result.
Specifically, a hash operation is performed on the user's unique id (UUID), and the result is taken modulo the number of tables after splitting (the number of second data tables), that is, hash(uuid) % n, where n is the number of second data tables. The hash operation is performed first so that the int fields are distributed as evenly as possible among the second data tables. When a second data table is about to reach its bottleneck, it can be split further by month: among the int fields stored in the same second data table, those of the same month, as indicated by the storage identifier, are stored in one third data table, and int fields of different months are stored in different third data tables. The splitting process may be as shown in FIG. 5, where table-0-201901 stores the check-in data for January 2019 split from the second data table identified as table0, table-0-201902 stores the check-in data for February 2019 split from the second data table identified as table0, table-63-201901 stores the check-in data for January 2019 split from the second data table identified as table63, and table-63-201902 stores the check-in data for February 2019 split from the second data table identified as table63.
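The two-level table split can be sketched as a pair of naming functions. This Python example is illustrative: the source only says "a hash algorithm", so the use of MD5 here is an assumption, as are the function names and the default of 64 second data tables (inferred from the table-0 to table-63 names in the example).

```python
import hashlib


def second_table_index(user_id, table_count):
    """hash(uuid) % n, using MD5 as the (assumed) digest algorithm."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % table_count


def third_table_name(user_id, month, table_count=64):
    """Name of the per-month table, following the table-<index>-<YYYYMM> pattern."""
    return f"table-{second_table_index(user_id, table_count)}-{month}"


for uid in ("user-1", "user-2", "user-3"):
    print(uid, third_table_name(uid, "201901"))
```

A stable digest is used rather than Python's built-in `hash`, whose value for strings changes between interpreter runs and would therefore route the same user to different tables.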
It is worth mentioning that when a low-frequency user becomes a high-frequency user, that is, when the check-in data corresponding to the user identifier is stored in the high-frequency cache for the first time, the plurality of int fields corresponding to the user identifier in the database may be assembled in order of their check-in months, and the assembled stored data is stored in the high-frequency cache.
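Assembling the monthly int fields in month order might look like the sketch below. The bit layout is an assumption made for illustration: bit d-1 of a monthly int field marks day d, and the assembled bitmap places the months one after another so that the bit offset equals the zero-based day of the year. The function name is hypothetical.

```python
import calendar
from typing import Dict


def assemble_year_bitmap(monthly_fields: Dict[int, int], year: int) -> int:
    """Concatenate monthly int fields (month -> field) into one integer bitmap."""
    bitmap = 0
    offset = 0
    for month in range(1, 13):
        days = calendar.monthrange(year, month)[1]
        # Keep only the bits that correspond to real days of this month.
        field = monthly_fields.get(month, 0) & ((1 << days) - 1)
        bitmap |= field << offset
        offset += days
    return bitmap
```

With this layout, a check-in on 1 January and another on 1 February 2019 produce a bitmap with bits 0 and 31 set, since January has 31 days.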
According to the data storage method provided by the embodiment of the invention, when cached data corresponding to the data to be stored is stored in the high-frequency cache, the data to be stored is stored in the high-frequency cache by using Bitmaps; each bit of the Bitmaps can store one piece of data to be stored, which reduces the amount of data stored in the high-frequency cache. Further, the cached data in the high-frequency cache is easier to access than the stored data in the database, which improves data access efficiency. In addition, when no cached data corresponding to the data to be stored is stored in the high-frequency cache, the data to be stored is stored in the database by using the int field, where each bit of the int field can also store one piece of data to be stored, which reduces the amount of data stored in the database and the storage space it occupies. In conclusion, storing data with a combination of the high-frequency cache and the database reduces the amount of stored data, saves storage space, and improves data access efficiency.
As shown in fig. 6, an embodiment of the present invention provides a data storage device, including: a receiving module 601, a determining module 602, a first storage module 603 and a second storage module 604; wherein:
the receiving module 601 is configured to receive data to be stored;
the determining module 602 is configured to determine, according to the user identifier indicated by the data to be stored, whether valid cached data corresponding to the user identifier is stored in the high-frequency cache; if yes, triggering the first storage module 603; if not, triggering the second storage module 604;
the first storage module 603 is configured to store the data to be stored in the high-frequency cache by using Bitmaps;
the second storage module 604 is configured to store the data to be stored in the database by using the int field.
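The cooperation of the four modules can be sketched with in-memory stand-ins. Everything named here is hypothetical: the class, the dictionaries that stand in for the high-frequency cache and the database, and the simplification that any cached entry counts as valid.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CheckInStore:
    # user id -> bitmap; stand-in for the high-frequency cache
    cache: Dict[str, int] = field(default_factory=dict)
    # (user id, yyyymm) -> int field; stand-in for the database
    database: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def has_valid_cache(self, user_id: str) -> bool:
        """Determining module: here presence alone is treated as validity."""
        return user_id in self.cache

    def store(self, user_id: str, month: str, day: int, day_of_year: int) -> str:
        """Receive one check-in and route it to the cache or the database."""
        if self.has_valid_cache(user_id):
            # First storage module: set one bit of the cached bitmap.
            self.cache[user_id] |= 1 << (day_of_year - 1)
            return "cache"
        # Second storage module: set one bit of the monthly int field.
        key = (user_id, month)
        self.database[key] = self.database.get(key, 0) | (1 << (day - 1))
        return "database"
```

A user without a cached entry is written to the database, while a user whose bitmap is already cached is written to the cache only.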
In an embodiment of the present invention, the determining module 602 is further configured to determine whether the user identifier corresponds to a high-frequency user, if so, determine whether the cached data exists in the high-frequency cache according to the user identifier, and if so, determine that the cached data is valid.
In an embodiment of the present invention, the determining module 602 is configured to determine, when it is determined that the user identifier corresponds to a high-frequency user and the cached data corresponding to the user identifier does not exist in the high-frequency cache, stored data corresponding to the user identifier in a database, and store the stored data in the high-frequency cache by using Bitmaps.
In an embodiment of the present invention, the determining module 602 is configured to determine a frequency corresponding to the user identifier according to any one or more of the following elements: the stored data corresponding to the user identifier within a preset duration, the total duration corresponding to the stored data, a first duration from the storage time of the first stored data in the stored data to the current time, and a second duration from a preset time to the current time, wherein the preset time is before the storage time of the first stored data; and, when the frequency is greater than a preset frequency threshold, determine that the user identifier corresponds to a high-frequency user.
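One possible way to turn these elements into a frequency is sketched below. The formula (number of stored check-ins within the preset duration divided by the length of that duration in days), the function names, and the threshold of 0.5 are assumptions for illustration; this passage does not fix a formula.

```python
def check_in_frequency(check_in_count: int, window_days: int) -> float:
    """Assumed frequency: check-ins per day over a preset window."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return check_in_count / window_days


def is_high_frequency(check_in_count: int, window_days: int,
                      threshold: float = 0.5) -> bool:
    """A user is high-frequency when the frequency exceeds the threshold."""
    return check_in_frequency(check_in_count, window_days) > threshold
```

The comparison is strict, matching the wording "greater than a preset frequency threshold", so a user exactly at the threshold is not treated as high-frequency.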
In an embodiment of the present invention, the determining module 602 is configured to determine whether the cached data exists in the high-frequency cache according to the user identifier, and if so, determine whether the cached data is valid according to a preset validity period.
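The validity check against a preset validity period can be sketched as a timestamp comparison. The seven-day period, the function name, and the idea of tracking the time of the last write are assumptions; a cache with built-in key expiry could achieve the same effect.

```python
from datetime import datetime, timedelta

VALIDITY_PERIOD = timedelta(days=7)  # assumed preset validity period


def is_cache_valid(last_written: datetime, now: datetime,
                   validity: timedelta = VALIDITY_PERIOD) -> bool:
    """Cached data is valid while its age does not exceed the validity period."""
    return now - last_written <= validity
```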
In an embodiment of the present invention, the first storage module 603 is configured to update a cache duration of the stored data corresponding to the user identifier in the high-frequency cache.
In an embodiment of the present invention, the first storage module 603 is configured to update the int field in the database according to the stored data stored in the high-frequency cache by using Bitmaps, and delete the stored data in the high-frequency cache after the update.
In an embodiment of the present invention, the second storage module 604 is configured to update the int field in the database according to the stored data stored in the high-frequency cache by using Bitmaps, and delete the stored data in the high-frequency cache after the update.
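Writing the cached bitmap back to the monthly int fields and then deleting the cached entry might be sketched as follows. The layout is assumed for illustration (months packed one after another in a year-long bitmap, bit d-1 of a monthly field marking day d), and the dictionaries and function name are hypothetical stand-ins for the cache and the database.

```python
import calendar
from typing import Dict, Tuple


def flush_to_database(user_id: str, year: int, cache: Dict[str, int],
                      database: Dict[Tuple[str, str], int]) -> None:
    """Update the monthly int fields from the cached bitmap, then delete it."""
    bitmap = cache[user_id]
    offset = 0
    for month in range(1, 13):
        days = calendar.monthrange(year, month)[1]
        month_bits = (bitmap >> offset) & ((1 << days) - 1)
        if month_bits:
            key = (user_id, f"{year:04d}{month:02d}")
            # OR with the existing field so earlier database records survive.
            database[key] = database.get(key, 0) | month_bits
        offset += days
    # Delete only after the database has been updated.
    del cache[user_id]
```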
In an embodiment of the present invention, the second storage module 604 is configured to store each piece of the to-be-stored data by using each bit of the int field, and set a storage identifier of the int field according to the user identifier.
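Storing one check-in per bit of an int field can be sketched as below. The helper names and the storage identifier format (user identifier plus year and month) are hypothetical, and the mapping of day d to bit d-1 is an assumption.

```python
def storage_identifier(user_id: str, year: int, month: int) -> str:
    """Hypothetical storage identifier derived from the user identifier."""
    return f"{user_id}:{year:04d}{month:02d}"


def mark_day(field: int, day: int) -> int:
    """Return the int field with the bit for the given day of month set."""
    if not 1 <= day <= 31:
        raise ValueError("day must be between 1 and 31")
    return field | (1 << (day - 1))


def is_marked(field: int, day: int) -> bool:
    """Check whether the bit for the given day of month is set."""
    return bool((field >> (day - 1)) & 1)


def count_marked(field: int) -> int:
    """Count the days recorded in the int field."""
    return bin(field).count("1")
```

Because a month has at most 31 days, the highest bit used is bit 30, so one month of records fits in a signed 32-bit integer column.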
In an embodiment of the present invention, the second storage module 604 is configured to group the int fields in the first data table, and store the grouped int fields by using a plurality of second data tables, respectively.
In an embodiment of the present invention, the second storage module 604 is configured to calculate a digest value of the storage identifier by using a hash algorithm, take the digest value modulo the number of second data tables, and group the int fields according to the modulo result.
According to the data storage device provided by the embodiment of the invention, when cached data corresponding to the data to be stored is stored in the high-frequency cache, the data to be stored is stored in the high-frequency cache by using Bitmaps; each bit of the Bitmaps can store one piece of data to be stored, which reduces the amount of data stored in the high-frequency cache. Further, the cached data in the high-frequency cache is easier to access than the stored data in the database, which improves data access efficiency. In addition, when no cached data corresponding to the data to be stored is stored in the high-frequency cache, the data to be stored is stored in the database by using the int field, where each bit of the int field can also store one piece of data to be stored, which reduces the amount of data stored in the database and the storage space it occupies. In conclusion, storing data with a combination of the high-frequency cache and the database reduces the amount of stored data, saves storage space, and improves data access efficiency.
An embodiment of the present invention further provides a server, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the data storage method as provided in any of the embodiments above.
The embodiment of the present invention further provides a computer readable medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the data storage method provided in any of the above embodiments.
FIG. 7 illustrates an exemplary system architecture 700 of a data storage method or data storage device to which embodiments of the present invention may be applied.
As shown in fig. 7, the system architecture 700 may include terminal devices 701, 702, 703, a network 704, and a server 705. The network 704 serves to provide a medium for communication links between the terminal devices 701, 702, 703 and the server 705. Network 704 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 701, 702, 703 to interact with a server 705 over a network 704, to receive or send messages or the like. Various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, and the like, may be installed on the terminal devices 701, 702, and 703.
The terminal devices 701, 702, 703 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 705 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 701, 702, 703. The background management server may analyze and otherwise process received data such as a product information query request, and feed back a processing result (for example, target push information or product information, as an example only) to the terminal device.
It should be noted that the data storage method provided by the embodiment of the present invention is generally executed by the server 705, and accordingly, the data storage device is generally disposed in the server 705.
It should be understood that the number of terminal devices, networks, and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU)801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program executes the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 801.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a receiving module, a determining module, a first storage module, and a second storage module. The names of these modules do not in some cases constitute a limitation on the module itself, and for example, a receiving module may also be described as a "module that receives data to be stored".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the device described in the above embodiments, or may exist separately without being incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the following: receiving data to be stored; determining, according to the user identifier indicated by the data to be stored, whether valid cached data corresponding to the user identifier is stored in a high-frequency cache; if so, storing the data to be stored in the high-frequency cache by using Bitmaps; and if not, storing the data to be stored in the database by using the int field.
According to the technical scheme of the embodiment of the invention, when cached data corresponding to the data to be stored is stored in the high-frequency cache, the data to be stored is stored in the high-frequency cache by using Bitmaps; each bit of the Bitmaps can store one piece of data to be stored, which reduces the amount of data stored in the high-frequency cache. Further, the cached data in the high-frequency cache is easier to access than the stored data in the database, which improves data access efficiency. In addition, when no cached data corresponding to the data to be stored is stored in the high-frequency cache, the data to be stored is stored in the database by using the int field, where each bit of the int field can also store one piece of data to be stored, which reduces the amount of data stored in the database and the storage space it occupies. In conclusion, storing data with a combination of the high-frequency cache and the database reduces the amount of stored data, saves storage space, and improves data access efficiency.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. A method of data storage, comprising:
receiving data to be stored;
determining whether valid cached data corresponding to the user identification is stored in a high-frequency cache according to the user identification indicated by the data to be stored;
if so, storing the data to be stored in the high-frequency cache by using Bitmaps;
and if not, storing the data to be stored in the database by utilizing the int field.
2. The method of claim 1, further comprising, prior to said determining whether valid cached data corresponding to the user identifier is stored in the high frequency cache:
and judging whether the user identification corresponds to a high-frequency user, if so, determining whether the cached data exists in the high-frequency cache according to the user identification, and if so, determining that the cached data is valid.
3. The method of claim 2, wherein upon determining that the user identifier corresponds to a high frequency user and that the cached data corresponding to the user identifier does not exist in the high frequency cache,
and determining stored data corresponding to the user identification in a database, and storing the stored data in the high-frequency cache by using Bitmaps.
4. The method of claim 2, wherein the determining whether the user identifier corresponds to a high frequency user comprises:
determining a frequency corresponding to the user identification according to any one or more of the following factors; the elements include:
the stored data corresponding to the user identification within a preset duration, the total duration corresponding to the stored data, a first duration from the storage time of the first stored data in the stored data to the current time, and a second duration from a preset time to the current time, wherein the preset time is before the storage time of the first stored data;
and when the frequency is greater than a preset frequency threshold value, determining that the user identification corresponds to a high-frequency user.
5. The method according to claim 2, when it is determined that the user identifier does not correspond to a high frequency user, further comprising:
and determining whether the cached data exists in the high-frequency cache according to the user identification, and if so, determining whether the cached data is valid according to a preset validity period.
6. The method of claim 1, further comprising, after storing the data to be stored in the high-frequency cache using Bitmaps:
and updating the cache duration of the stored data corresponding to the user identification in the high-frequency cache.
7. The method of claim 5, wherein when the cached data is invalid, further comprising:
and updating the int field in the database according to the stored data stored by using Bitmaps in the high-frequency cache, and deleting the stored data in the high-frequency cache after updating.
8. The method according to claim 1, wherein storing the data to be stored in the database by using an int field comprises:
and storing each data to be stored by utilizing each bit of the int field, and setting a storage identifier of the int field according to the user identifier.
9. The method of claim 8, wherein the int field is stored in a first data table; when the number of int fields stored in the first data table is greater than a preset number threshold, further comprising:
grouping the plurality of int fields in the first data table, and storing the grouped int fields by using a plurality of second data tables respectively.
10. The method of claim 9,
and calculating the digest value of the storage identifier by using a hash algorithm, taking the digest value modulo the number of second data tables, and grouping the plurality of int fields according to the modulo result.
11. An apparatus for data storage, comprising:
a receiving module, a determining module, a first storage module and a second storage module; wherein:
the receiving module is used for receiving data to be stored;
the determining module is used for determining whether the high-frequency cache stores valid cached data corresponding to the user identification according to the user identification indicated by the data to be stored; if yes, triggering the first storage module; if not, triggering the second storage module;
the first storage module is used for storing the data to be stored in the high-frequency cache by using Bitmaps;
and the second storage module is used for storing the data to be stored in the database by utilizing the int field.
12. An electronic device for data storage, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.
13. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-10.
CN202110453160.1A 2021-04-26 2021-04-26 Data storage method and device Pending CN113239303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110453160.1A CN113239303A (en) 2021-04-26 2021-04-26 Data storage method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110453160.1A CN113239303A (en) 2021-04-26 2021-04-26 Data storage method and device

Publications (1)

Publication Number Publication Date
CN113239303A true CN113239303A (en) 2021-08-10

Family

ID=77129349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110453160.1A Pending CN113239303A (en) 2021-04-26 2021-04-26 Data storage method and device

Country Status (1)

Country Link
CN (1) CN113239303A (en)

Similar Documents

Publication Publication Date Title
CN109657174B (en) Method and device for updating data
CN111858586B (en) Data processing method and device
CN110909022A (en) Data query method and device
CN113761565B (en) Data desensitization method and device
CN112948498A (en) Method and device for generating global identification of distributed system
CN113641706A (en) Data query method and device
CN113452733A (en) File downloading method and device
CN111177109A (en) Method and device for deleting overdue key
CN109144991B (en) Method and device for dynamic sub-metering, electronic equipment and computer-storable medium
CN113239303A (en) Data storage method and device
CN113347052B (en) Method and device for counting user access data through access log
CN113824675B (en) Method and device for managing login state
CN114064693A (en) Method, device, electronic equipment and computer readable medium for processing account data
CN113760861A (en) Data migration method and device
CN110019671B (en) Method and system for processing real-time message
CN109213815B (en) Method, device, server terminal and readable medium for controlling execution times
CN113220981A (en) Method and device for optimizing cache
CN113742376A (en) Data synchronization method, first server and data synchronization system
CN111737218A (en) File sharing method and device
CN112699116A (en) Data processing method and system
CN113535768A (en) Production monitoring method and device
CN113127416A (en) Data query method and device
CN113138943A (en) Method and device for processing request
CN112131287A (en) Method and device for reading data
CN113778909B (en) Method and device for caching data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination