CN107357794B - Method and device for optimizing data storage structure of key value database - Google Patents


Info

Publication number
CN107357794B
Authority
CN
China
Prior art keywords
keywords
prefix
prefixes
database
prefix list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610305828.7A
Other languages
Chinese (zh)
Other versions
CN107357794A (en
Inventor
黄肖明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201610305828.7A priority Critical patent/CN107357794B/en
Publication of CN107357794A publication Critical patent/CN107357794A/en
Application granted granted Critical
Publication of CN107357794B publication Critical patent/CN107357794B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures

Abstract

Methods and apparatus for optimizing the data storage structure of a key-value database are disclosed. One embodiment of the method comprises: reading a preset number of keywords from a database and performing an analysis step on each read keyword, until all keywords in the database have been analyzed, to obtain a prefix list; acquiring the number and storage space of the keywords that hit prefixes in the prefix list; and optimizing the data storage structure of the database according to that number and storage space. The embodiment improves the storage performance of the database and speeds up reads from the database.

Description

Method and device for optimizing data storage structure of key value database
Technical Field
The present application relates to the field of computer technologies, specifically to the field of internet technologies, and in particular to a method and an apparatus for optimizing a data storage structure of a key-value database.
Background
As distributed data storage services grow more complex, load balancing is often applied to shared storage devices through algorithms such as hashing in order to avoid excessive read pressure on any single shared memory. This in turn causes a sharp increase in the number of key-value tables.
Currently, a key-value table in a distributed data store is generally monitored in one of three ways: all keywords are acquired and the storage space and data structure of each keyword are displayed one by one; the storage space and data structure are acquired only for keywords specified by a user; or an online real-time monitoring application observes reads and writes of keywords and analyzes usage, such as read performance, in real time.
However, the data acquired by such monitoring is limited to the storage space occupied by a particular row of the key-value table, or to information such as how frequently it is read and written, and data storage cannot be optimized on the basis of this data.
Disclosure of Invention
It is an object of the present application to provide an improved method and apparatus for optimizing a data storage structure of a key-value store, so as to solve the technical problems mentioned in the above background section.
In a first aspect, the present application provides a method for optimizing a data storage structure of a key-value store, the method comprising: reading a preset number of keywords from a database and performing an analysis step on each of the read keywords until all keywords in the database have been analyzed, the analysis step comprising: analyzing the read keywords to obtain participles, and executing a detection step on the participles, wherein the detection step comprises the following steps: deleting the last participle to obtain a predicted prefix, identifying whether the predicted prefix hits the prefix in a prefix list, if so, adding 1 to the hit times of the prefixes in the prefix list, if not, adding the predicted prefix to the prefix list, identifying whether the predicted prefix is a single participle, if so, executing an analysis step on the read next keyword, and if not, executing the detection step on the participle in the predicted prefix; acquiring the number and storage space of keywords hitting the prefixes in the prefix list; and optimizing the data storage structure of the database according to the number and the storage space of the keywords hitting the prefixes in the prefix list.
In some embodiments, the obtaining the number and storage space of keywords hitting the prefixes in the prefix list includes: re-executing the analysis step on the keywords corresponding to prefixes in the prefix list whose hit count is less than 2 to obtain an updated prefix list; and acquiring the number and storage space of keywords hitting the prefixes in the updated prefix list; and the optimizing the data storage structure of the database according to the number and storage space of the keywords hitting the prefixes in the prefix list includes: optimizing the data storage structure of the database according to the number and the storage space of the keywords hitting the prefixes in the updated prefix list.
In some embodiments, the reading a preset number of keywords from the database and performing the analysis step on each of the read keywords until all keywords in the database have been analyzed includes: reading a preset number of keywords from the database and performing the analysis step on each read keyword, and, each time the analysis step on the preset number of keywords is finished, taking prefixes in the prefix list whose hit count is less than 2 as independent prefixes and moving them to an independent hash table, until all keywords in the database have been analyzed; the obtaining the number and storage space of the keywords hitting the prefixes in the prefix list includes: after all keywords in the database have been analyzed, re-executing the analysis step on the keys corresponding to the independent prefixes to obtain an updated prefix list, and acquiring the number and storage space of the keywords hitting the prefixes in the updated prefix list; and the optimizing the data storage structure of the database according to the number and storage space of the keywords hitting the prefixes in the prefix list includes: optimizing the data storage structure of the database according to the number and the storage space of the keywords hitting the prefixes in the updated prefix list.
In some embodiments, the database includes instance and/or in-memory snapshot files.
In some embodiments, the analyzing the read keywords to obtain participles comprises: analyzing the read keywords according to one or more of the following to obtain participles: separators, case changes, and numeric letter changes.
In some embodiments, the optimizing the data storage structure of the database according to the number and storage space of the keywords hitting the prefixes in the prefix list includes: presenting options for optimizing the data storage structure to a user according to the number and storage space of keywords hitting the prefixes in the prefix list; and optimizing the data storage structure of the database in response to receiving a user selection of the option.
In a second aspect, the present application provides an apparatus for optimizing a data storage structure of a key-value store, the apparatus comprising: an analysis unit, configured to read a preset number of keywords from a database and perform an analysis step on each read keyword until all keywords in the database have been analyzed, where the analysis step includes: analyzing the read keywords to obtain participles, and executing a detection step on the participles, wherein the detection step comprises the following steps: deleting the last participle to obtain a predicted prefix, identifying whether the predicted prefix hits the prefix in a prefix list, if so, adding 1 to the hit times of the prefixes in the prefix list, if not, adding the predicted prefix to the prefix list, identifying whether the predicted prefix is a single participle, if so, executing an analysis step on the read next keyword, and if not, executing the detection step on the participle in the predicted prefix; an obtaining unit, configured to obtain the number of keywords and a storage space of prefixes in the prefix list; and the optimization unit is used for optimizing a data storage structure of the database according to the number and the storage space of the keywords hitting the prefixes in the prefix list.
In some embodiments, the obtaining unit is further configured to: re-executing the analysis step on the keywords corresponding to the prefixes with the hit times less than 2 times in the prefix list to obtain an updated prefix list; acquiring the number and storage space of keywords hitting the prefixes in the updated prefix list; and the optimization unit is further configured to: and optimizing the data storage structure of the database according to the number and the storage space of the keywords hitting the prefixes in the updated prefix list.
In some embodiments, the analysis unit is further configured to: read a preset number of keywords from a database and perform the analysis step on each read keyword, and, each time the analysis step on the preset number of keywords is finished, take prefixes in the prefix list whose hit count is less than 2 as independent prefixes and move them to an independent hash table, until all keywords in the database have been analyzed; the obtaining unit is further configured to: re-execute the analysis step on the keys corresponding to the independent prefixes to obtain an updated prefix list, and acquire the number and storage space of keywords hitting the prefixes in the updated prefix list; and the optimization unit is further configured to: optimize the data storage structure of the database according to the number and the storage space of the keywords hitting the prefixes in the updated prefix list.
In some embodiments, the analysis unit being configured to read a preset number of keywords from the database includes: the analysis unit being configured to read a preset number of keywords from the instance and/or the memory snapshot file.
In some embodiments, the analysis unit being configured to analyze the read keywords to obtain participles includes: the analysis unit being configured to analyze the read keywords according to one or more of the following to obtain participles: separators, case changes, and numeric letter changes.
In some embodiments, the optimization unit is further configured to: present options for optimizing the data storage structure to a user according to the number and storage space of keywords hitting the prefixes in the prefix list; and optimize the data storage structure of the database in response to receiving a user selection of the option.
The method and the device for optimizing the data storage structure of the key value database provided by the application read a preset number of keywords from the database and perform an analysis step on each read keyword until all keywords in the database are analyzed, wherein the analysis step comprises the following steps: analyzing the read keywords to obtain participles, and executing a detection step on the participles, wherein the detection step comprises the following steps: deleting the last participle to obtain a predicted prefix, identifying whether the predicted prefix hits the prefix in a prefix list, if so, adding 1 to the hit times of the prefixes in the prefix list, if not, adding the predicted prefix to the prefix list, identifying whether the predicted prefix is a single participle, if so, executing an analysis step on the read next keyword, and if not, executing the detection step on the participle in the predicted prefix; then obtaining the number and storage space of keywords hitting the prefixes in the prefix list; and finally, optimizing the data storage structure of the key value database according to the number and the storage space of the keywords hitting the prefixes in the prefix list, thereby improving the storage performance of the database and accelerating the reading efficiency of the database.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of optimizing a data storage structure of a key-value store according to the present application;
FIG. 3 is a flow diagram of yet another embodiment of a method of optimizing a data storage structure of a key-value store according to the present application;
FIG. 4 is a flow chart of a third embodiment of a method of optimizing a data storage structure of a key-value store according to the present application;
FIG. 5 is a block diagram illustrating one embodiment of an apparatus for optimizing data storage structures of a key-value store according to the present application;
FIG. 6 is a schematic structural diagram of a computer system suitable for implementing the terminal device or the server according to the embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the method of optimizing a data storage structure of a key-value store or the apparatus for optimizing a data storage structure of a key-value store of the present application may be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and servers 105, 106. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the servers 105, 106. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
The user 110 may use the terminal devices 101, 102, 103 to interact with the servers 105, 106 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as database management applications, shopping applications, search applications, instant messaging tools, mailbox clients, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like.
The servers 105, 106 may be servers providing various database services, such as background servers providing support for the terminal devices 101, 102, 103. The background server can analyze and process the received data such as the request and feed back the processing result to the terminal equipment.
It should be noted that the method for optimizing the data storage structure of the key-value store provided in the embodiment of the present application is generally executed by the terminal devices 101, 102, 103 or the servers 105, 106, and accordingly, the apparatus for optimizing the data storage structure of the key-value store is generally disposed in the terminal devices 101, 102, 103 or the servers 105, 106.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be understood that the key-value database in this application refers to a database that stores data as Key-Value pairs, and may include Key-Value databases in the prior art as well as those developed in the future. The method and the device for optimizing the data storage structure of the database are described below taking Redis, one such Key-Value database, as an example.
Referring to fig. 2, fig. 2 illustrates a flow 200 of one embodiment of a method of optimizing a data storage structure of a key-value store according to the present application. The process of the method for optimizing the data storage structure of the key value database takes a Redis database as an example, and specifically comprises the following steps:
In step 210, a preset number of keywords (keys) are read from the Redis database and an analysis step is performed on each of the read keywords until all keywords in the Redis database have been analyzed.
In this embodiment, the analyzing step includes: analyzing the read keywords to obtain participles, and executing a detection step on the participles, wherein the detection step comprises the following steps: deleting the last word segmentation to obtain a predicted prefix, identifying whether the predicted prefix hits the prefix in the prefix list, if so, adding 1 to the hit times of the prefixes in the prefix list, if not, adding the predicted prefix to the prefix list, identifying whether the predicted prefix is a single word segmentation, if so, executing an analysis step on the read next keyword, and if not, executing a detection step on the word segmentation in the predicted prefix.
When a preset number of keywords are read from the Redis database, in order to prevent the memory of the application tool from being exhausted due to the fact that the read Redis data amount is too large, the preset number of keywords can be read from the Redis instance and/or the Redis memory snapshot file each time.
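The batched read described above can be sketched as follows. This is a minimal illustration over an in-memory iterable, not the patent's implementation: the function name `read_in_batches` and the parameter `batch_size` are hypothetical, and the key source is assumed to be any Python iterable. Against a live Redis instance, the cursor-based SCAN command is one way to obtain keys incrementally, but that mapping is an assumption and is not stated in the source.

```python
from itertools import islice


def read_in_batches(keys, batch_size):
    """Yield lists of at most `batch_size` keys (hypothetical helper).

    Only one batch is held in memory at a time, which is the point of
    reading a preset number of keywords per pass.
    """
    iterator = iter(keys)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Example: five keys read two at a time.
batches = list(read_in_batches(["a", "b", "c", "d", "e"], 2))
```

Because the helper is a generator, the caller can run the analysis step on each batch before the next batch is fetched.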
When analyzing the read keywords to obtain participles, the keywords may be split according to one or more of the following: separators, case changes, and numeric letter changes. For example, the keyword Customer-username can be split at the separator into the participles "Customer" and "username"; the keyword CustomerSaveKeyUsername can be split at case changes into the participles "Customer", "Save", "Key" and "Username"; and the keyword "abc 1231" can be split at changes between letters and digits into the participles "abc", "123", and "1". Here, if a read keyword is of the global User Identification (UID) type, keywords of that type may be recorded separately as their own class of keyword.
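One possible way to split a keyword into participles (word segments) using the three cues named above is a single regular expression. The sketch below is an assumption about how such a splitter could look; the name `segment` and the exact pattern are hypothetical, and it handles only ASCII letters and digits, treating every other character as a separator.

```python
import re

# Hypothetical pattern, tried left to right at each position:
#   1. a run of capitals not followed by a lowercase letter (e.g. "HTTP")
#   2. an optional capital followed by lowercase letters (e.g. "Save", "key")
#   3. a run of digits
# Any other character (such as "-", ":", "_" or a space) acts as a separator.
_PARTICIPLE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def segment(key):
    """Split a key into participles at separators, case changes and
    letter/digit changes."""
    return _PARTICIPLE.findall(key)


by_separator = segment("Customer-username")
by_case = segment("CustomerSaveKeyUsername")
by_digits = segment("user123abc")
```

With this pattern, `by_separator` is `["Customer", "username"]` and `by_case` is `["Customer", "Save", "Key", "Username"]`, matching the first two examples in the text; `by_digits` is `["user", "123", "abc"]`.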
It should be understood that the prefix list is empty when the detection step is executed for the first time, so the first predicted prefix misses and is added to the prefix list. In subsequent executions of the detection step, each predicted prefix can then be checked against the prefixes already in the prefix list.
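The analysis and detection steps of step 210 can be sketched as a loop that repeatedly drops the last participle. The sketch rests on several assumptions that the source does not settle: the prefix list is modeled as a dict from participle tuples to hit counts, a newly added prefix starts with a count of 1, keys with a single participle contribute no prefix, and the splitting pattern is a hypothetical one. The names `analyze_key` and `prefix_hits` are illustrative only.

```python
import re

# Hypothetical splitter: separators, case changes, letter/digit changes.
_PARTICIPLE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def analyze_key(key, prefix_hits):
    """Run the analysis step on one key, updating `prefix_hits` in place."""
    participles = tuple(_PARTICIPLE.findall(key))
    # Detection step: repeat until the predicted prefix is a single participle.
    while len(participles) > 1:
        participles = participles[:-1]  # delete the last participle
        if participles in prefix_hits:
            prefix_hits[participles] += 1  # hit: add 1 to the hit count
        else:
            prefix_hits[participles] = 1  # miss: add it (initial count assumed)
    return prefix_hits


prefix_hits = {}
for key in ["Customer-Save-1", "Customer-Save-2", "Customer-Load-1"]:
    analyze_key(key, prefix_hits)
```

After these three keys, `("Customer",)` has been seen three times, `("Customer", "Save")` twice and `("Customer", "Load")` once, so only the last would fall under the "fewer than 2 hits" treatment described in the later embodiments.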
In step 220, the number and storage space of the keywords that hit prefixes in the prefix list are obtained.
In this embodiment, the prefix list obtained by analyzing all keywords in the Redis database in step 210 is traversed together with all keys in Redis: each Key that hits a prefix in the prefix list is obtained along with its Value, and the memory space occupied by each Key and Value is recorded.
When obtaining the number and storage space of the keywords that hit prefixes in the prefix list, these figures may be obtained for every prefix in the prefix list, which makes the acquired data more comprehensive. Alternatively, they may be obtained only for prefixes whose hit count exceeds a predetermined number (for example, 2); for prefixes whose hit count does not exceed the predetermined number, the data may be acquired separately or discarded, which improves acquisition efficiency.
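The per-prefix accounting of step 220 can be sketched as below. This is an illustration under stated assumptions rather than the patent's implementation: the store is modeled as a plain dict from string keys to `bytes` values, a prefix is a tuple of participles, and "storage space" is approximated as the encoded key length plus the value length, which will differ from the memory Redis actually allocates. The names `prefix_stats` and `store` are hypothetical.

```python
import re

# Hypothetical splitter: separators, case changes, letter/digit changes.
_PARTICIPLE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def prefix_stats(store, prefixes):
    """Count the keys hitting each prefix and sum their approximate size."""
    stats = {prefix: {"keys": 0, "bytes": 0} for prefix in prefixes}
    for key, value in store.items():
        participles = tuple(_PARTICIPLE.findall(key))
        size = len(key.encode("utf-8")) + len(value)  # rough size estimate
        # A key hits every listed prefix that is a proper leading run
        # of its participles.
        for n in range(1, len(participles)):
            candidate = participles[:n]
            if candidate in stats:
                stats[candidate]["keys"] += 1
                stats[candidate]["bytes"] += size
    return stats


store = {"Customer-Save-1": b"aaaa", "Customer-Save-2": b"bb", "Order-1": b"c"}
stats = prefix_stats(store, [("Customer", "Save"), ("Order",)])
```

Restricting `prefixes` to those above a hit threshold corresponds to the second acquisition option described above.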
In step 230, the data storage structure of the Redis database is optimized according to the number and storage space of the keywords that hit prefixes in the prefix list.
In the present embodiment, the data storage structure of the Redis database is optimized based on the number and storage space of the keywords hitting prefixes in the prefix list, as acquired in step 220. For example, the load pressure on the cache system may be reported on the basis of these figures so that load balancing can be performed; data storage may also be optimized on the basis of these figures, for example by deleting garbage data or removing duplicate data.
The optimization may be carried out in either of two ways. When the number and storage space of the keywords hitting prefixes in the prefix list satisfy the preset condition of a preset rule, the operation defined in that rule may be performed directly, thereby optimizing the data storage structure of the Redis database. Alternatively, when the preset condition is satisfied, options for optimizing the data storage structure may be presented to a user, and the data storage structure of the Redis database is optimized in response to receiving the user's selection of an option.
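The preset rules mentioned above could be represented as condition/action pairs evaluated against the per-prefix figures. The sketch below is purely illustrative: the `Rule` class, the thresholds, and the action labels `"rebalance"` and `"review-for-cleanup"` are all hypothetical, and the function only produces suggestions (which could be applied directly or shown to a user as options) rather than modifying any database.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[int, int], bool]  # (key count, bytes) -> matches?
    action: str  # hypothetical label for the operation to perform


def suggest(stats, rules):
    """Return (prefix, action) pairs for every rule whose condition holds."""
    suggestions = []
    for prefix, figures in stats.items():
        for rule in rules:
            if rule.condition(figures["keys"], figures["bytes"]):
                suggestions.append((prefix, rule.action))
    return suggestions


# Hypothetical thresholds chosen only for the example.
rules = [
    Rule("large-group", lambda keys, size: size >= 1024, "rebalance"),
    Rule("singleton", lambda keys, size: keys == 1, "review-for-cleanup"),
]
stats = {
    ("Customer",): {"keys": 3, "bytes": 5000},
    ("Order",): {"keys": 1, "bytes": 10},
}
suggestions = suggest(stats, rules)
```

Keeping the rule evaluation separate from the action itself makes it straightforward to support both modes described in the text: automatic execution, or presentation of the matching actions to a user for confirmation.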
According to the embodiment of the application, the data storage structure of the Redis database is optimized, the storage performance of the Redis database is improved, and the reading efficiency of the Redis database is accelerated.
With further reference to FIG. 3, a flow diagram 300 of yet another embodiment of a method of optimizing data storage structures of a key-value store is shown, in accordance with the present application. Still taking the Redis database as an example, the process 300 of the method for optimizing the data storage structure of the key-value database specifically includes the following steps:
In step 310, a preset number of keywords are read from the Redis database and an analysis step is performed on each of the read keywords until all keywords in the Redis database have been analyzed.
In this embodiment, the analyzing step includes: analyzing the read keywords to obtain participles, and executing a detection step on the participles, wherein the detection step comprises the following steps: deleting the last word segmentation to obtain a predicted prefix, identifying whether the predicted prefix hits the prefix in the prefix list, if so, adding 1 to the hit times of the prefixes in the prefix list, if not, adding the predicted prefix to the prefix list, identifying whether the predicted prefix is a single word segmentation, if so, executing an analysis step on the read next keyword, and if not, executing a detection step on the word segmentation in the predicted prefix.
It should be understood that step 310 in this embodiment is the same as step 210 in fig. 2, and therefore, the operations and features described in step 210 are also applicable to step 310, and are not described herein again.
In step 320, the analysis step is performed again on the keywords corresponding to prefixes in the prefix list whose hit count is less than 2, so as to obtain an updated prefix list.
In this embodiment, in order to improve the accuracy with which the keywords in the Redis database are identified, the analysis step may be performed again on the keywords corresponding to prefixes in the prefix list whose hit count is less than 2, so as to obtain an updated prefix list.
In step 330, the number and storage space of the keywords that hit prefixes in the updated prefix list are obtained.
In this embodiment, based on the updated prefix list obtained in step 320 by re-executing the analysis step on the keywords corresponding to prefixes whose hit count is less than 2, all Keys in Redis are traversed, each Key that hits a prefix in the updated prefix list is obtained along with its Value, and the memory space occupied by each Key and Value is recorded.
In step 340, the data storage structure of the Redis database is optimized according to the number and storage space of the keywords that hit prefixes in the updated prefix list.
In this embodiment, the data storage structure of the Redis database may be optimized based on the number and storage space of the keywords hitting prefixes in the updated prefix list, as acquired in step 330. For example, the load pressure on the cache system may be reported on the basis of these figures so that load balancing can be performed; data storage may also be optimized on the basis of these figures, for example by deleting garbage data or removing duplicate data.
As can be seen from FIG. 3, compared with the embodiment corresponding to FIG. 2, the flow 300 of the method for optimizing the data storage structure of the Redis database in this embodiment highlights the step of re-performing the analysis step on the keywords corresponding to prefixes whose hit count is less than 2 to obtain the updated prefix list. The scheme described in this embodiment can therefore improve the accuracy of the number and storage space obtained for the keywords hitting prefixes in the updated prefix list, and provide more accurate data for optimizing the data storage structure of the Redis database.
With further reference to FIG. 4, a flowchart 400 of a third embodiment of a method of optimizing a data storage structure of a key-value store in accordance with the present application is illustrated. The process 400 of the method for optimizing a data storage structure of a key-value store, continuing with the Redis database as an example, includes the steps of:
In step 410, a preset number of keywords are read from the Redis database and an analysis step is performed on each of the read keywords.
In this embodiment, the analyzing step includes: analyzing the read keywords to obtain participles, and executing a detection step on the participles, wherein the detection step comprises the following steps: deleting the last word segmentation to obtain a predicted prefix, identifying whether the predicted prefix hits the prefix in the prefix list, if so, adding 1 to the hit times of the prefixes in the prefix list, if not, adding the predicted prefix to the prefix list, identifying whether the predicted prefix is a single word segmentation, if so, executing an analysis step on the read next keyword, and if not, executing a detection step on the word segmentation in the predicted prefix.
When a preset number of keywords are read from the Redis database, in order to prevent the memory of the application tool from being exhausted due to the fact that the read Redis data amount is too large, the preset number of keywords can be read from the Redis instance and/or the Redis memory snapshot file each time.
When analyzing the read keywords to obtain participles, the keywords may be split according to one or more of the following: separators, case changes, and numeric letter changes. For example, the keyword Customer-username can be split at the separator into "Customer" and "username"; the keyword CustomerSaveKeyUsername can be split at case changes into "Customer", "Save", "Key" and "Username"; and the keyword "abc 1231" can be split at changes between letters and digits into "abc", "123", and "1". Here, keys of the global User Identification (UID) type may be recorded separately as their own class of Key.
It should be understood that the prefix list is empty when the detecting step is executed for the first time, the predicted prefix misses the prefix in the prefix list, then the predicted prefix is added to the prefix list, and then the subsequent participle can identify whether the predicted prefix hits the prefix in the prefix list when the detecting step is executed.
In step 420, each time the analysis step has been performed on a preset number of keywords, prefixes in the prefix list whose hit count is less than 2 are moved into an independent hash table as independent prefixes; this is repeated until all keywords in the Redis database have been analyzed.
In this embodiment, when the analysis of each group of the preset number of keywords is finished, prefixes whose hit count is less than 2 may be moved to the independent hash table as independent prefixes, so that step 430 can be performed to obtain an updated prefix list once all keywords in the Redis database have been analyzed. This prevents random character strings that some programs generate algorithmically from being added to the prefix list, where they would make the list overly long and slow down the detection step.
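The per-batch move into the independent hash table can be sketched as follows. The sketch assumes the prefix list is a dict from prefixes (tuples of participles) to hit counts and that the independent hash table is a second dict keyed the same way; the source does not specify what is stored alongside each independent prefix, so keeping only the count is an assumption. The names `sweep_independent` and `min_hits` are hypothetical.

```python
def sweep_independent(prefix_hits, independent, min_hits=2):
    """Move prefixes with fewer than `min_hits` hits out of the prefix list.

    Intended to be called after each batch of keys has been analyzed, so
    that one-off prefixes (for example from random strings) do not keep
    lengthening the list that the detection step has to consult.
    """
    # Collect first, then move, to avoid mutating the dict while iterating.
    rare = [prefix for prefix, hits in prefix_hits.items() if hits < min_hits]
    for prefix in rare:
        independent[prefix] = prefix_hits.pop(prefix)
    return independent


prefix_hits = {("Customer",): 3, ("Customer", "Load"): 1, ("x7f3",): 1}
independent = {}
sweep_independent(prefix_hits, independent)
```

After the call, `prefix_hits` keeps only `("Customer",)`, and the two single-hit prefixes sit in `independent` until the re-analysis of step 430.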
In step 430, the analysis step is re-performed on the keys corresponding to the independent prefixes to obtain an updated prefix list.
In this embodiment, after all keywords in the Redis database have been processed, the keys in the independent hash table may be taken out one by one and the analysis step performed on them to obtain an updated prefix list. Here, to improve recognition efficiency, the keys in the independent hash table may be taken out in order of length.
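A possible shape for this second pass is sketched below. The text only says keys are taken "according to the length"; processing shortest keys first is an assumption, as are the function name `reanalyze`, the ":" separator, and the dictionary layout (independent prefix mapped to a list of keys, prefix list mapped to hit counts).

```python
def reanalyze(independent, prefix_list, sep=":"):
    """Re-run the analysis step on keys held in the independent hash table."""
    # Collect each key once, then order by length (shortest first, assumed).
    keys = sorted({k for ks in independent.values() for k in ks},
                  key=lambda k: (len(k), k))
    for key in keys:
        parts = key.split(sep)
        while len(parts) > 1:
            parts.pop()                      # delete the last participle
            prefix = sep.join(parts)
            if prefix in prefix_list:
                prefix_list[prefix] += 1     # hit
            else:
                prefix_list[prefix] = 0      # miss: add with assumed count 0
    return prefix_list                       # the updated prefix list

updated = reanalyze({"tmp": ["tmp:b:2", "tmp:a"], "tmp:b": ["tmp:b:2"]},
                    {"user": 5})
print(updated)
```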
In step 440, the number of keys that hit prefixes in the updated prefix list, and the storage space they occupy, are obtained.
In this embodiment, based on the updated prefix list obtained in step 430, all keys in Redis are traversed; each key that hits a prefix in the prefix list is obtained together with its value, and the memory space occupied by each key and value is recorded.
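The traversal can be sketched with an in-memory dictionary standing in for the Redis keyspace. The name `prefix_stats`, the ":" separator, the size estimate (key bytes plus value bytes, ignoring Redis's own per-key overhead), and the choice to credit each key to its longest matching prefix are all assumptions made for illustration. Against a live instance, the Redis `SCAN` and `MEMORY USAGE` commands could plausibly supply the same inputs.

```python
def prefix_stats(store, prefixes, sep=":"):
    """Count keys and approximate bytes per prefix.

    store: dict mapping key (str) -> value (bytes), a stand-in for Redis.
    """
    stats = {p: {"keys": 0, "bytes": 0} for p in prefixes}
    # Longest prefixes first, so a key is credited to its most specific prefix.
    ordered = sorted(prefixes, key=len, reverse=True)
    for key, value in store.items():
        for p in ordered:
            if key.startswith(p + sep):
                stats[p]["keys"] += 1
                stats[p]["bytes"] += len(key.encode("utf-8")) + len(value)
                break
    return stats

store = {
    "user:profile:1": b"alice",
    "user:profile:2": b"bob",
    "user:cart:9": b"xx",
    "misc": b"z",
}
print(prefix_stats(store, ["user", "user:profile"]))
```

Crediting each key to every matching prefix instead would be an equally defensible reading; the text does not say which is intended.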
In step 450, the data storage structure of the Redis database is optimized according to the number and storage space of the keywords that hit prefixes in the updated prefix list.
In this embodiment, the data storage structure of the Redis database is optimized based on the number and storage space, acquired in step 440, of the keywords that hit prefixes in the updated prefix list. For example, these figures may be used to indicate the load pressure on the cache system so that load balancing can be performed; they may also be used to optimize data storage, for example by deleting garbage data or removing duplicate data.
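One simple way to turn the per-prefix figures into something actionable is to rank prefixes by their share of storage, as in the sketch below. The function name `rank_prefixes`, the input layout, and the `top` parameter are hypothetical; deciding which optimization to apply to a heavy prefix is left to the operator.

```python
def rank_prefixes(stats, top=3):
    """Return (prefix, key count, bytes, percent of total bytes), largest first.

    stats: dict mapping prefix -> {"keys": int, "bytes": int}.
    """
    total = sum(s["bytes"] for s in stats.values()) or 1   # avoid dividing by 0
    rows = sorted(stats.items(), key=lambda kv: kv[1]["bytes"], reverse=True)
    return [
        (prefix, s["keys"], s["bytes"], round(100 * s["bytes"] / total, 1))
        for prefix, s in rows[:top]
    ]

stats = {
    "user:profile": {"keys": 2, "bytes": 36},
    "user": {"keys": 1, "bytes": 13},
    "tmp": {"keys": 1, "bytes": 1},
}
for row in rank_prefixes(stats, top=2):
    print(row)
```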
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for optimizing the data storage structure of Redis in the present embodiment highlights the step of re-performing the analysis step on the keys corresponding to the independent prefixes to obtain the updated prefix list. The scheme described in this embodiment can therefore improve the efficiency of identifying keywords that hit prefixes in the updated prefix list, and provide more accurate data for optimizing the data storage structure of the Redis database.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for optimizing a data storage structure of a key-value store. Still taking Redis database as an example, the embodiment of the apparatus corresponds to the embodiment of the method shown in FIG. 2, and the apparatus can be applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for optimizing a data storage structure of Redis of the present embodiment includes: an analyzing unit 510, an obtaining unit 520 and an optimizing unit 530.
The analysis unit 510 is configured to read a preset number of keywords from the Redis database at a time and perform an analysis step on each read keyword until all keywords in the Redis database have been analyzed. The analysis step includes analyzing the read keyword to obtain participles and performing a detection step on the participles. The detection step includes: deleting the last participle to obtain a predicted prefix; identifying whether the predicted prefix hits a prefix in the prefix list; if so, adding 1 to the hit count of that prefix in the prefix list; if not, adding the predicted prefix to the prefix list; and identifying whether the predicted prefix is a single participle; if so, performing the analysis step on the next read keyword; if not, performing the detection step on the participles of the predicted prefix.
The obtaining unit 520 is configured to obtain the number and storage space of the keywords that hit prefixes in the prefix list.
The optimizing unit 530 is configured to optimize the data storage structure of the Redis database according to the number and storage space of the keywords that hit prefixes in the prefix list.
In some optional implementations of this embodiment, the obtaining unit is further configured to: re-execute the analysis step on the keywords corresponding to prefixes in the prefix list whose hit count is less than 2, to obtain an updated prefix list; and obtain the number and storage space of the keywords that hit prefixes in the updated prefix list. The optimization unit is further configured to optimize the data storage structure of the Redis database according to the number and storage space of the keywords that hit prefixes in the updated prefix list.
In some optional implementations of this embodiment, the analysis unit is further configured to: read a preset number of keywords from the Redis database and perform the analysis step on each read keyword, and, when the analysis step on the preset number of keywords is finished, move prefixes in the prefix list whose hit count is less than 2 into an independent hash table as independent prefixes, until all keywords in the Redis database have been analyzed. The obtaining unit is further configured to: re-execute the analysis step on the keys corresponding to the independent prefixes to obtain an updated prefix list, and obtain the number and storage space of the keywords that hit prefixes in the updated prefix list. The optimization unit is further configured to optimize the data storage structure of the Redis database according to the number and storage space of the keywords that hit prefixes in the updated prefix list.
In some optional implementations of this embodiment, the analysis unit reading a preset number of keywords from the Redis database includes: reading the preset number of keywords from a Redis instance and/or a Redis memory snapshot file.
In some optional implementations of this embodiment, the analysis unit analyzing the read keywords to obtain participles includes: analyzing the read keywords according to one or more of the following to obtain participles: separators, case changes, and transitions between letters and digits.
In some optional implementations of this embodiment, the optimization unit is further configured to: present options for optimizing the data storage structure to a user according to the number and storage space of the keywords that hit prefixes in the prefix list; and, in response to receiving the user's selection of an option, optimize the data storage structure of the Redis database.
Those skilled in the art will appreciate that the above-described apparatus 500 for optimizing data storage structures for key-value stores also includes some other well-known structures, such as processors, memories, etc., which are not shown in fig. 5 in order to not unnecessarily obscure embodiments of the present disclosure.
It should be understood that the elements recited in apparatus 500 correspond to various steps in the methods described with reference to fig. 2, 3, and 4. Thus, the operations and features described above with respect to the method for optimizing a data storage structure of a key-value store database are equally applicable to the apparatus 500 and the units included therein, and will not be described again here. Corresponding elements in the apparatus 500 may cooperate with elements in the server to implement aspects of embodiments of the present application.
Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing a terminal device or server of an embodiment of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a unit, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described as: a processor including an analysis unit, an obtaining unit, and an optimization unit. The names of these units do not, in some cases, limit the units themselves; for example, the analysis unit may also be described as a "unit that reads a preset number of keywords from a database and performs an analysis step on each of the read keywords until all keywords in the database have been analyzed".
As another aspect, the present application also provides a non-volatile computer storage medium, which may be the non-volatile computer storage medium included in the apparatus in the above-described embodiments, or may be a non-volatile computer storage medium that exists separately and is not incorporated into the terminal. The non-volatile computer storage medium stores one or more programs that, when executed by a device, cause the device to: read a preset number of keywords from a database at a time and perform an analysis step on each of the read keywords until all keywords in the database have been analyzed, where the analysis step includes analyzing the read keyword to obtain participles and performing a detection step on the participles, and the detection step includes: deleting the last participle to obtain a predicted prefix; identifying whether the predicted prefix hits a prefix in the prefix list; if so, adding 1 to the hit count of that prefix in the prefix list; if not, adding the predicted prefix to the prefix list; and identifying whether the predicted prefix is a single participle; if so, performing the analysis step on the next read keyword; if not, performing the detection step on the participles of the predicted prefix; obtain the number and storage space of the keywords that hit prefixes in the prefix list; and optimize the data storage structure of the database according to the number and storage space of the keywords that hit prefixes in the prefix list.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (14)

1. A method of optimizing a data storage structure of a key-value store, the method comprising:
reading a preset number of keywords from a database each time and performing an analysis step on each read keyword until all keywords in the database are read for analysis, wherein the analysis step comprises the following steps: analyzing the read keywords to obtain participles, and executing a detection step on the participles, wherein the detection step comprises the following steps: deleting the last participle to obtain a predicted prefix, identifying whether the predicted prefix hits the prefix in a prefix list, if so, adding 1 to the hit times of the prefixes in the prefix list, if not, adding the predicted prefix to the prefix list, identifying whether the predicted prefix is a single participle, if so, executing an analysis step on the read next keyword, and if not, executing the detection step on the participle in the predicted prefix;
acquiring the number and storage space of keywords hitting the prefixes in the prefix list;
and optimizing the data storage structure of the database according to the number and the storage space of the keywords hitting the prefixes in the prefix list.
2. The method of claim 1, wherein the obtaining the number of keys and storage space that hit a prefix in the prefix list comprises: re-executing the analysis step on the keywords corresponding to the prefixes with the hit times less than 2 times in the prefix list to obtain an updated prefix list; acquiring the number and storage space of keywords hitting the prefixes in the updated prefix list; and
the optimizing a data storage structure of a database according to the number of keywords and the storage space of the prefixes in the prefix list comprises: and optimizing the data storage structure of the database according to the number and the storage space of the keywords hitting the prefixes in the updated prefix list.
3. The method of claim 1, wherein reading a preset number of keywords from the database and performing the analysis step on each of the read keywords until all keywords in the database have been analyzed comprises: reading a preset number of keywords from the database and performing the analysis step on each read keyword, and, when the analysis step on the preset number of keywords is finished, taking prefixes in the prefix list whose hit count is less than 2 as independent prefixes and moving them to an independent hash table, until all keywords in the database have been analyzed;
the obtaining the number and the storage space of the keywords hitting the prefixes in the prefix list comprises: re-executing the analysis step on the key corresponding to the independent prefix to obtain an updated prefix list, and acquiring the number and storage space of keywords hitting the prefixes in the updated prefix list; and
the optimizing a data storage structure of a database according to the number of keywords and the storage space of the prefixes in the prefix list comprises: and optimizing the data storage structure of the database according to the number and the storage space of the keywords hitting the prefixes in the updated prefix list.
4. A method according to any of claims 1-3, wherein the database comprises instance and/or in-memory snapshot files.
5. The method according to any one of claims 1-3, wherein the analyzing the read keywords for word segmentation comprises:
analyzing the read keywords to obtain word segmentation according to one or more of the following items: separators, case changes, and numeric letter changes.
6. The method according to any one of claims 1 to 3, wherein optimizing a data storage structure of a database according to the number of keywords and the storage space of the keywords hitting the prefixes in the prefix list comprises:
presenting options for optimizing a data storage structure to a user according to the number and storage space of keywords hitting the prefixes in the prefix list;
optimizing a data storage structure of the database in response to receiving a user selection of the option.
7. An apparatus for optimizing a data storage structure of a key-value store, the apparatus comprising:
an analysis unit, configured to read a preset number of keywords from a database each time and perform an analysis step on each read keyword until all keywords in the database are read for analysis, where the analysis step includes: analyzing the read keywords to obtain participles, and executing a detection step on the participles, wherein the detection step comprises the following steps: deleting the last participle to obtain a predicted prefix, identifying whether the predicted prefix hits the prefix in a prefix list, if so, adding 1 to the hit times of the prefixes in the prefix list, if not, adding the predicted prefix to the prefix list, identifying whether the predicted prefix is a single participle, if so, executing an analysis step on the read next keyword, and if not, executing the detection step on the participle in the predicted prefix;
an obtaining unit, configured to obtain the number and storage space of keywords hitting the prefixes in the prefix list;
and the optimization unit is used for optimizing a data storage structure of the database according to the number and the storage space of the keywords hitting the prefixes in the prefix list.
8. The apparatus of claim 7, wherein the obtaining unit is further configured to: re-executing the analysis step on the keywords corresponding to the prefixes with the hit times less than 2 times in the prefix list to obtain an updated prefix list; acquiring the number and storage space of keywords hitting the prefixes in the updated prefix list; and
the optimization unit is further configured to: and optimizing the data storage structure of the database according to the number and the storage space of the keywords hitting the prefixes in the updated prefix list.
9. The apparatus of claim 7, wherein the analysis unit is further configured to: read a preset number of keywords from the database and perform the analysis step on each read keyword, and, when the analysis step on the preset number of keywords is finished, take prefixes in the prefix list whose hit count is less than 2 as independent prefixes and move them to an independent hash table, until all keywords in the database have been analyzed;
the acquisition unit is further configured to: re-executing the analysis step on the key corresponding to the independent prefix to obtain an updated prefix list, and acquiring the number and storage space of keywords hitting the prefixes in the updated prefix list; and
the optimization unit is further configured to: and optimizing the data storage structure of the database according to the number and the storage space of the keywords hitting the prefixes in the updated prefix list.
10. The apparatus according to any one of claims 7-9, wherein the analyzing unit is configured to read a preset number of keywords from the database, and comprises: the analysis unit is used for reading a preset number of keywords from the instance and/or the memory snapshot file.
11. The apparatus according to any one of claims 7-9, wherein the analyzing unit is configured to analyze the read keywords to obtain word segments including:
the analysis unit is used for analyzing the read keywords according to one or more of the following items to obtain word segmentation: separators, case changes, and numeric letter changes.
12. The apparatus of any of claims 7-9, wherein the optimization unit is further configured to:
presenting options for optimizing a data storage structure to a user according to the number and storage space of keywords hitting the prefixes in the prefix list;
optimizing a data storage structure of the database in response to receiving a user selection of the option.
13. An apparatus for optimizing a data storage structure of a key-value store, comprising:
a memory; and a processor coupled to the memory, the processor configured to perform the method of optimizing a data storage structure of a key-value store of any of claims 1-6 based on instructions stored in the memory.
14. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement a method of optimizing a data storage structure of a key-value store according to any one of claims 1 to 6.
CN201610305828.7A 2016-05-10 2016-05-10 Method and device for optimizing data storage structure of key value database Active CN107357794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610305828.7A CN107357794B (en) 2016-05-10 2016-05-10 Method and device for optimizing data storage structure of key value database

Publications (2)

Publication Number Publication Date
CN107357794A CN107357794A (en) 2017-11-17
CN107357794B true CN107357794B (en) 2020-06-05

Family

ID=60271375

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610305828.7A Active CN107357794B (en) 2016-05-10 2016-05-10 Method and device for optimizing data storage structure of key value database

Country Status (1)

Country Link
CN (1) CN107357794B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086002A (en) * 2018-06-28 2018-12-25 平安科技(深圳)有限公司 Space management, device, computer installation and the storage medium of storage object
CN110413546B (en) * 2019-06-19 2024-03-12 平安科技(深圳)有限公司 Redis-based data storage method, device and computer readable storage medium
CN111125095B (en) * 2019-11-26 2023-11-10 北京文渊佳科技有限公司 Method, device, electronic equipment and medium for adding data prefix
CN111339736B (en) * 2020-02-18 2023-09-01 江苏满运软件科技有限公司 Method for adding prefix name, configuration acquisition method, device and electronic equipment
CN112148554A (en) * 2020-09-14 2020-12-29 北京金和网络股份有限公司 Method and device for calculating occupation size of redis service data in real time

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101282313A (en) * 2008-05-22 2008-10-08 北京航空航天大学 Electronic mail system for electric conference accessory system
CN105373541A (en) * 2014-08-22 2016-03-02 博雅网络游戏开发(深圳)有限公司 Processing method and system for data operation request of database
CN105491116A (en) * 2015-11-26 2016-04-13 广州华多网络科技有限公司 Cross-window data submitting method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9317536B2 (en) * 2010-04-27 2016-04-19 Cornell University System and methods for mapping and searching objects in multidimensional space
US9846642B2 (en) * 2014-10-21 2017-12-19 Samsung Electronics Co., Ltd. Efficient key collision handling

Also Published As

Publication number Publication date
CN107357794A (en) 2017-11-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant