CN113342813A - Key value data processing method and device, computer equipment and readable storage medium - Google Patents

Publication number
CN113342813A
Authority
CN
China
Prior art keywords
data
key
value data
key value
processed
Legal status
Granted
Application number
CN202110639861.4A
Other languages
Chinese (zh)
Other versions
CN113342813B (en)
Inventor
顾凌云
郭志攀
王伟
张爱平
Current Assignee
Nanjing Bingjian Information Technology Co ltd
Original Assignee
Nanjing Bingjian Information Technology Co ltd
Application filed by Nanjing Bingjian Information Technology Co ltd
Priority to CN202110639861.4A (patent CN113342813B)
Publication of CN113342813A
Application granted
Publication of CN113342813B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables

Abstract

The application discloses a key value data processing method, a key value data processing device, computer equipment and a readable storage medium. The method comprises the following steps: acquiring a key value data set to be processed, and calculating absolute values of a plurality of hash values of the key value data set to be processed, wherein the key value data set to be processed comprises a plurality of key value data to be processed, each corresponding to a hash value; aggregating the key value data set to be processed according to the absolute values, and merging the key value data to be processed that have the same hash value; compressing the aggregated key value data set to be processed to obtain a compressed key value data set; and writing the compressed key value data set into a file, with the hash value used as an index item of the file. Compared with prior-art schemes that store big data using key value pairs alone, this scheme uses the hash value as the basis for both storage and query, which makes big data convenient to store and improves the efficiency of subsequent queries.

Description

Key value data processing method and device, computer equipment and readable storage medium
Technical Field
The present application relates to the field of big data processing, and in particular, to a key value data processing method and apparatus, a computer device, and a readable storage medium.
Background
The development of big data has improved the processing efficiency of various services. In the related art, key value pair search refers to the process of querying the corresponding value for a given key. By processing the data with professional knowledge and skill, the value corresponding to a key can be found more quickly. In the industry, it is common to store key value pairs in a database and, when looking up a value, to use SQL (Structured Query Language) to find the value corresponding to a given key. When the data volume is small, this scheme performs well, but as the data volume grows, more and more problems appear, such as slow queries, database downtime and long program waiting times.
Disclosure of Invention
The application aims to provide a key value data processing method and device, computer equipment and a readable storage medium.
In a first aspect, an embodiment of the present application provides a key value data processing method, including:
acquiring a key value data set to be processed, and calculating absolute values of a plurality of hash values of the key value data set to be processed, wherein the key value data set to be processed comprises a plurality of key value data to be processed, and the key value data to be processed corresponds to the hash values;
aggregating the key value data set to be processed according to the absolute values, and merging the key value data to be processed that have the same hash value;
compressing the aggregated key value data set to be processed to obtain a compressed key value data set;
and writing the compressed key value data set into a file, and taking the hash value as an index entry of the file.
In one possible embodiment, the file includes an index file and a data file, the writing of the compressed key value data set into the file and the taking of the hash value as an index entry of the file includes:
writing data included in the compressed key value data set into a data file;
writing the hash value into an index file, and constructing an index item by taking the first hash value of data included in the compressed key value data set as the file name of the file;
and persisting the files to the disk, wherein the number of files stored in the disk is fixed.
In one possible embodiment, writing data included in a compressed key-value data set to a data file includes:
and writing the data included in the compressed key value data set into the data file in append mode, in the order of the hash values of the data, and acquiring the starting offset of the data included in the compressed key value data set and the length of the compressed data.
In one possible embodiment, writing the hash value to the index file includes:
adding the starting offset of the data included in the compressed key value data set and the length of the compressed data into an index file to obtain a plurality of index items, wherein the length of each index item is fixed.
In one possible embodiment, the method further comprises:
acquiring key data to be queried, and calculating a hash value corresponding to the key data;
determining a corresponding index item according to the hash value;
locating the position of the index item in the file, and reading data information of a preset number of bytes from that position onwards;
and locating the starting offset corresponding to the key data according to the data information, and reading a preset byte length from it onwards to obtain the value data corresponding to the key data.
In a possible implementation manner, locating the starting offset corresponding to the key data according to the data information, and reading a preset byte length from it onwards to obtain the value data corresponding to the key data includes:
decompressing data corresponding to the data information;
acquiring decompressed data corresponding to the key data according to the starting offset and the preset byte length corresponding to the key data;
encrypting the decompressed data according to a preset key encryption rule to obtain an encryption result;
if the encryption result is matched with the key data, taking the decompressed data as the value data corresponding to the key data;
and if the encryption result is not matched with the key data, returning to execute the step of obtaining the decompressed data corresponding to the key data according to the starting offset and the preset byte length corresponding to the key data until traversing the data corresponding to the data information.
In one possible implementation, before locating the position of the index entry in the file, the method includes:
acquiring index item information of an index item;
judging whether a preset ending byte of the index item information is 0 or not;
if so, ending the query process, and displaying that the value data corresponding to the key data does not exist;
if not, executing the step of positioning the position of the index item in the file.
In a second aspect, an embodiment of the present application provides a key-value data processing apparatus, including:
an acquisition module, configured to acquire a key value data set to be processed and calculate absolute values of a plurality of hash values of the key value data set to be processed, wherein the key value data set to be processed comprises a plurality of key value data to be processed, and the key value data to be processed corresponds to the hash values;
an aggregation module, configured to aggregate the key value data set to be processed according to the absolute values and merge the key value data to be processed that have the same hash value in the key value data set to be processed;
a compression module, configured to compress the aggregated key value data set to be processed to obtain a compressed key value data set;
and a storage module, configured to write the compressed key value data set into a file and take the hash value as an index item of the file.
In a third aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a nonvolatile memory storing computer instructions, and when the computer instructions are executed by the processor, the computer device executes the key-value data processing method in at least one possible implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present application provides a readable storage medium, where the readable storage medium includes a computer program, and the computer program, when running, controls a computer device in which the readable storage medium is located to execute the key value data processing method in any one of the at least one possible implementation manners of the first aspect.
Compared with the prior art, the beneficial effects provided by the application include the following. The application discloses a key value data processing method, a key value data processing device, computer equipment and a readable storage medium. The method comprises: acquiring a key value data set to be processed, and calculating absolute values of a plurality of hash values of the key value data set to be processed, wherein the key value data set to be processed comprises a plurality of key value data to be processed, each corresponding to a hash value; aggregating the key value data set to be processed according to the absolute values, and merging the key value data to be processed that have the same hash value; compressing the aggregated key value data set to be processed to obtain a compressed key value data set; and writing the compressed key value data set into a file, with the hash value used as an index item of the file. Compared with prior-art schemes that store big data using key value pairs alone, this scheme uses the hash value as the basis for both storage and query, which makes big data convenient to store and improves the efficiency of subsequent queries.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments will be briefly described below. It is appreciated that the following drawings depict only certain embodiments of the application and are therefore not to be considered limiting of its scope. For a person skilled in the art, it is possible to derive other relevant figures from these figures without inventive effort.
Fig. 1 is a schematic flowchart illustrating steps of a key value data processing method according to an embodiment of the present application;
Fig. 2 is a schematic block diagram of a key value data processing apparatus according to an embodiment of the present application;
Fig. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Furthermore, the terms "first," "second," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
In the description of the present application, it is also to be noted that, unless otherwise explicitly stated or limited, the terms "disposed" and "connected" are to be interpreted broadly, for example, "connected" may be a fixed connection, a detachable connection, or an integral connection; can be mechanically or electrically connected; the connection may be direct or indirect via an intermediate medium, and may be a communication between the two elements. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
The following detailed description of embodiments of the present application will be made with reference to the accompanying drawings.
In order to solve the technical problem in the foregoing background art, fig. 1 is a schematic flow chart of a key value data processing method provided in an embodiment of the present application, and the key value data processing method is described in detail below.
Step S201, a to-be-processed key value data set is obtained, and absolute values of a plurality of hash values of the to-be-processed key value data set are calculated.
The key value data set to be processed comprises a plurality of key value data to be processed, and the key value data to be processed corresponds to the hash value.
Step S202, the key value data set to be processed is aggregated according to the absolute values, and the key value data to be processed that have the same hash value are merged.
Step S203, the aggregated key value data set to be processed is compressed to obtain a compressed key value data set.
Step S204, the compressed key value data set is written into a file, and the hash value is used as an index item of the file.
It should be understood that, in the field of big data, data is stored in a database, and the amount of data stored may reach the trillions. The data is generally stored as key value pairs, that is, a user takes the key of the required data and determines the value corresponding to that key. Accordingly, the key value data set to be processed is not taken directly from the database but is generated according to a preset key value rule. For example, the preset key value rule may be an identity card number rule, according to which all identity card numbers can be generated; it may also be a mobile phone number rule, according to which all mobile phone numbers can be generated. No limitation is imposed here.
In the embodiment of the application, the hash value of each piece of data included in the key value data set to be processed can be calculated, and the absolute value of each hash value can then be taken. Each piece of data in the key value data set to be processed is aggregated based on the absolute value of its hash value, data with the same hash value are merged, and the aggregated set is then compressed to obtain the compressed key value data set.
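The hash, aggregate and compress steps can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patent's implementation: `signed_hash32` (CRC32 reinterpreted as a signed 32-bit integer), `key_rule` (SHA-256 of the value), folding the absolute hash into `bucket_count` slots, and zlib as the compressor are all hypothetical choices that the text does not specify.

```python
import hashlib
import zlib
from collections import defaultdict


def key_rule(value: bytes) -> str:
    """Hypothetical key rule: the key is the SHA-256 hex digest of the value."""
    return hashlib.sha256(value).hexdigest()


def signed_hash32(key: str) -> int:
    """Stand-in hash (assumption): CRC32 reinterpreted as a signed 32-bit integer."""
    h = zlib.crc32(key.encode("utf-8"))
    return h - (1 << 32) if h >= (1 << 31) else h


def aggregate_and_compress(pairs, bucket_count):
    """Group values by |hash(key)| folded into bucket_count slots, then compress."""
    groups = defaultdict(list)
    for key, value in pairs:
        # The absolute value keeps the slot non-negative even for negative hashes.
        slot = abs(signed_hash32(key)) % bucket_count
        groups[slot].append(value)
    # Values that share a hash slot are merged into one payload and compressed.
    return {slot: zlib.compress(b"".join(values))
            for slot, values in sorted(groups.items())}


# Fixed-length values (here, 11-digit numbers) generated by a hypothetical rule.
values = [b"13800000001", b"13800000002", b"13800000003"]
blocks = aggregate_and_compress([(key_rule(v), v) for v in values], bucket_count=4)
```

Concatenating fixed-length values without separators is what later allows a decompressed block to be scanned in fixed-size steps.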
In a possible implementation, the files include an index file and a data file, and the foregoing step S204 can be implemented by performing the following detailed steps.
Substep S204-1, the data included in the compressed key value data set is written into the data file.
Substep S204-2, the hash value is written into the index file, and an index item is constructed by taking the first hash value of the data included in the compressed key value data set as the file name of the file.
Substep S204-3, the files are persisted to the disk, and the number of files stored in the disk is fixed.
The compressed key value data set can be written into an index file and a data file. Each file is named after the hash value of the data in the data file that corresponds to the first index item of the current file; the index file is marked with an index suffix and the data file with a data suffix. The files are persisted to the disk, and the number of files is fixed, which bounds the maximum number of keys stored in each file. A data file and its index file share the same file name and are distinguished only by their suffixes.
In a possible implementation, the aforementioned sub-step S204-1 may be implemented by the following steps.
(1) The data included in the compressed key value data set is written into the data file in append mode, in the order of the hash values of the data, and the starting offset of the data included in the compressed key value data set and the length of the compressed data are acquired.
In the embodiment of the present application, the start offset (data position) of the data included in the compressed key-value data set may be the aforementioned hash value, the byte length of which may be 8 bytes, and the data length (datalength) of the data after data compression may be 4 bytes.
In a possible embodiment, the aforementioned sub-step S204-2 may be implemented by the following embodiments.
(1) Adding the starting offset of the data included in the compressed key value data set and the length of the compressed data into an index file to obtain a plurality of index items, wherein the length of each index item is fixed.
On this basis, each index item consists of a start offset and the length of the compressed data, so the length of each index item may be 12 bytes. It should be appreciated that both the index file and the data file are written by appending, without line breaks, which makes the data more compact.
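A minimal sketch of writing one index/data file pair with 12-byte index items (an 8-byte start offset followed by a 4-byte compressed length) is given below. The function name `write_file_pair`, the big-endian byte order, the `.index`/`.data` suffix spelling, and the choice to emit a zero-length item for hash values that have no data are assumptions made for illustration; the text only fixes the 8 + 4 byte layout and the shared file name.

```python
import struct
from pathlib import Path

# 8-byte start offset + 4-byte compressed length = 12 bytes, no padding.
ENTRY = struct.Struct(">qi")


def write_file_pair(directory, first_hash, file_max_count, blocks):
    """Write <first_hash>.data and <first_hash>.index into directory.

    blocks maps a hash value in [first_hash, first_hash + file_max_count)
    to its compressed payload; hash values without a payload get an index
    item whose length field is 0.
    """
    directory = Path(directory)
    data_path = directory / f"{first_hash}.data"
    index_path = directory / f"{first_hash}.index"
    offset = 0
    with open(data_path, "wb") as data_f, open(index_path, "wb") as index_f:
        for h in range(first_hash, first_hash + file_max_count):
            payload = blocks.get(h, b"")
            data_f.write(payload)  # payloads are appended back to back, no line breaks
            index_f.write(ENTRY.pack(offset, len(payload)))
            offset += len(payload)
    return index_path, data_path
```

Because every index item has the same size, the item for a given hash value sits at a position that can be computed directly, without scanning the index file.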
In addition to the foregoing scheme of storing key value data, the embodiments of the present application also provide the following corresponding scheme of searching for key value data.
Step S205, obtaining key data to be queried, and calculating a hash value corresponding to the key data.
Step S206, a corresponding index item is determined according to the hash value.
Step S207, the position of the index item in the file is located, and data information of a preset number of bytes is read from that position onwards.
Step S208, the starting offset corresponding to the key data is located according to the data information, and a preset byte length is read from it onwards to obtain the value data corresponding to the key data.
In the embodiment of the present application, the key data to be queried may be obtained from a text file, a database, an HTTP interface, and the like, and its hash value may then be calculated. The hash value is used to find which index file on the disk stores the index item corresponding to the key data. The index file can be located with code such as the following example: "(hashCode / fileMaxCount) * fileMaxCount", where the division is understood to be integer division (otherwise the expression would simply return hashCode), hashCode is the hash value of the key to be searched, and fileMaxCount is the maximum number of index items stored in each index file. After the index file is found, the position of the index item recorded for the current hash value within that file is determined, and 12 bytes of data (the preset number of bytes) are read from that position onwards; this content (the data information) describes where the data corresponding to the key to be searched is stored in the data file. The position of this information within the index file can be computed with code such as: "(hashCode - fileName) * 12", where hashCode is the hash value of the key to be looked up and fileName is the index file name found in the previous step.
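The two expressions above can be written out as a small function. This is a sketch that assumes a non-negative integer hash and integer division; the function name `locate_index_entry` and the snake_case parameter names are illustrative only.

```python
INDEX_ENTRY_SIZE = 12  # bytes per index item: 8-byte offset + 4-byte length


def locate_index_entry(hash_code: int, file_max_count: int):
    """Return (file_name, position) for the index item of hash_code.

    file_name is the first hash value covered by the index file, and
    position is the byte offset of the index item inside that file.
    """
    file_name = (hash_code // file_max_count) * file_max_count
    position = (hash_code - file_name) * INDEX_ENTRY_SIZE
    return file_name, position
```

For example, with a hash value of 2503 and 1000 index items per file, the index item lives in the file named 2000 at byte position 503 × 12 = 6036.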
In a possible embodiment, the foregoing step S208 can be implemented by the following detailed embodiment.
Substep S208-1, the data corresponding to the data information is decompressed.
Substep S208-2, the decompressed data corresponding to the key data is obtained according to the starting offset and the preset byte length corresponding to the key data.
Substep S208-3, the decompressed data is encrypted according to a preset key encryption rule to obtain an encryption result.
Substep S208-4, if the encryption result matches the key data, the decompressed data is used as the value data corresponding to the key data.
Substep S208-5, if the encryption result does not match the key data, the step of obtaining the decompressed data corresponding to the key data according to the starting offset and the preset byte length is executed again, until the data corresponding to the data information has been traversed.
In the embodiment of the present application, the obtained data may be decompressed, and the decompressed data is read in pieces of a fixed length. Each piece is encrypted according to the encryption method of the key, and the encryption result is matched against the key. If they match, the value has been found and the remaining data is not iterated. Otherwise, substeps S208-2 to S208-4 are repeated on the following piece; if no match is found after all of the data has been iterated, the value data corresponding to the key does not exist.
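The scan over a decompressed block might look like the sketch below. The key rule is not specified in the text, so `key_rule` (SHA-256 hex digest of the value) is a hypothetical stand-in, as are zlib as the compressor and the `record_len` parameter for the fixed piece length.

```python
import hashlib
import zlib


def key_rule(value: bytes) -> str:
    """Hypothetical key rule: the key is the SHA-256 hex digest of the value."""
    return hashlib.sha256(value).hexdigest()


def find_value(compressed_block: bytes, key: str, record_len: int):
    """Return the fixed-length piece whose derived key equals key, else None."""
    data = zlib.decompress(compressed_block)
    for start in range(0, len(data), record_len):
        candidate = data[start:start + record_len]
        if key_rule(candidate) == key:
            return candidate  # match found: stop iterating over the rest
    return None  # whole block traversed without a match
```

The scan is needed because several keys can share one hash value, so the block located through the index may hold more than one candidate value.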
In a possible embodiment, before step S207 is executed, the following steps may also be executed.
Step S209, index item information of the index item is acquired.
Step S210, it is determined whether the preset ending bytes of the index item information are 0.
Step S211, if yes, the query process is ended and it is indicated that the value data corresponding to the key data does not exist.
Step S212, if not, the step of locating the position of the index item in the file is executed.
In order to improve the efficiency of data processing, after the position information of the data in the data file is found, the following choice is made according to the 4 bytes at the end of the index item information (the 9th to 12th bytes), which hold the data length: if the data length is 0, no data exists for the hash value, the search ends, and a response indicating that no value corresponds to the key is returned directly. If the data length is not 0, data exists for the hash value, and the next operation is executed.
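The early exit on a zero data length can be sketched as follows, assuming a big-endian 12-byte index item laid out as an 8-byte offset followed by a 4-byte length; the byte order and the function name `parse_index_entry` are assumptions.

```python
import struct


def parse_index_entry(entry: bytes):
    """Return (offset, length), or None when the trailing 4-byte length is 0."""
    if len(entry) != 12:
        raise ValueError("an index item is expected to be exactly 12 bytes")
    # Bytes 9 to 12 (1-based) hold the compressed data length.
    (length,) = struct.unpack(">i", entry[8:12])
    if length == 0:
        return None  # no data stored for this hash value: end the lookup here
    (offset,) = struct.unpack(">q", entry[0:8])
    return offset, length
```

Checking the length first means the data file is never opened for keys that cannot exist.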
Through the above steps, the data is divided in advance into index files and data files by means of pre-calculation, which reduces both the range and the time of data access. The method maintains a high query speed and a short response time while reducing the consumption of computer resources. Because the hash of the data is used during a search to locate the file and the position of the data quickly, the whole process involves only 2 file I/O (Input/Output) operations, with no redundant operations and no need to traverse a large amount of data, so the search is faster and more efficient overall. In addition, the search process reduces invalid data access, simplifies the lookup, lowers the data storage cost, and improves the response speed of the program. This addresses the following problems in the related art. First, when a database program is used to query values, the service scenario requires a fast response, so the database must quickly find the value corresponding to a key; however, the data volume is usually large and places high demands on the database's storage space, so queries consume a large amount of time and computing resources. Second, a database query cannot be positioned precisely at the location of the data (indexes are built at certain intervals), and if the queried data is not in the database table, a large amount of data may be traversed in full.
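To make the "2 file I/O operations" concrete, the read path can be sketched as one seek-and-read on the index file followed by one seek-and-read on the data file. The file naming (`<name>.index`, `<name>.data`), the big-endian 8 + 4 byte index item, and the function name `read_block` are assumptions for illustration rather than details confirmed by the text.

```python
import struct
from pathlib import Path


def read_block(directory, hash_code, file_max_count):
    """Fetch the compressed block for hash_code using two file reads."""
    file_name = (hash_code // file_max_count) * file_max_count
    # File I/O 1: read the 12-byte index item at its computed position.
    with open(Path(directory) / f"{file_name}.index", "rb") as index_f:
        index_f.seek((hash_code - file_name) * 12)
        entry = index_f.read(12)
    if len(entry) < 12:
        return None  # position lies beyond the end of the index file
    offset, length = struct.unpack(">qi", entry)
    if length == 0:
        return None  # zero length: no data stored for this hash value
    # File I/O 2: read exactly `length` bytes from the data file.
    with open(Path(directory) / f"{file_name}.data", "rb") as data_f:
        data_f.seek(offset)
        return data_f.read(length)
```

The returned block would still need to be decompressed and scanned for the matching value; neither step requires further file access.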
Please refer to fig. 2 in conjunction, the key-value data processing apparatus 110 includes:
the obtaining module 1101 is configured to obtain a to-be-processed key value data set, and calculate absolute values of a plurality of hash values of the to-be-processed key value data set, where the to-be-processed key value data set includes a plurality of to-be-processed key value data, and the to-be-processed key value data corresponds to the hash values.
The aggregating module 1102 is configured to aggregate the to-be-processed key value data sets according to the absolute values, and merge the to-be-processed key value data having the same hash value in the to-be-processed key value data sets.
A compressing module 1103, configured to compress the aggregated to-be-processed key value data set to obtain a compressed key value data set.
And the storage module 1104 is configured to write the compressed key value data set into a file, and use the hash value as an index entry of the file.
In a possible implementation, the file includes an index file and a data file, and the storage module 1104 is specifically configured to:
writing data included in the compressed key value data set into a data file; writing the hash value into an index file, and constructing an index item by taking the first hash value of data included in the compressed key value data set as the file name of the file; and persisting the files to the disk, wherein the number of files stored in the disk is fixed.
In a possible implementation, the storage module 1104 is further specifically configured to:
writing the data included in the compressed key value data set into the data file in append mode, in the order of the hash values of the data, and acquiring the starting offset of the data included in the compressed key value data set and the length of the compressed data.
In a possible implementation, the storage module 1104 is further specifically configured to:
adding the starting offset of the data included in the compressed key value data set and the length of the compressed data into an index file to obtain a plurality of index items, wherein the length of each index item is fixed.
In one possible embodiment, the key value data processing apparatus 110 further comprises a lookup module 1105, the lookup module 1105 being configured to:
acquiring key data to be queried, and calculating a hash value corresponding to the key data; determining a corresponding index item according to the hash value; locating the position of the index item in the file, and reading data information of a preset number of bytes from that position onwards; and locating the starting offset corresponding to the key data according to the data information, and reading a preset byte length from it onwards to obtain the value data corresponding to the key data.
In a possible implementation, the lookup module 1105 is specifically configured to:
decompressing data corresponding to the data information; acquiring decompressed data corresponding to the key data according to the starting offset and the preset byte length corresponding to the key data; encrypting the decompressed data according to a preset key encryption rule to obtain an encryption result; if the encryption result is matched with the key data, taking the decompressed data as the value data corresponding to the key data; and if the encryption result is not matched with the key data, returning to execute the step of obtaining the decompressed data corresponding to the key data according to the starting offset and the preset byte length corresponding to the key data until traversing the data corresponding to the data information.
In one possible implementation, the lookup module 1105 is further configured to:
acquiring index item information of an index item; judging whether a preset ending byte of the index item information is 0 or not; if so, ending the query process, and displaying that the value data corresponding to the key data does not exist; if not, executing the step of positioning the position of the index item in the file.
It should be noted that, for the implementation principle of the key value data processing apparatus 110, reference may be made to the implementation principle of the key value data processing method, which is not repeated here. It should be understood that the division of the above apparatus into modules is only a logical division; in an actual implementation the modules may be wholly or partially integrated into one physical entity, or may be physically separate. The modules may all be implemented as software called by a processing element, or all be implemented in hardware; alternatively, some modules may be implemented as software called by a processing element and others in hardware. For example, the obtaining module 1101 may be a separately provided processing element, may be integrated into a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code that a processing element of the apparatus calls to execute the functions of the obtaining module 1101. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method, or each of the above modules, may be carried out by an integrated logic circuit of hardware in a processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). As another example, when some of the above modules are implemented as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor that can call program code. As a further example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
The embodiment of the present application provides a computer device 100, where the computer device 100 includes a processor and a non-volatile memory storing computer instructions, and when the computer instructions are executed by the processor, the computer device 100 executes the key-value data processing apparatus 110. As shown in fig. 3, fig. 3 is a block diagram of a computer device 100 according to an embodiment of the present disclosure. The computer device 100 includes a key-value data processing apparatus 110, a memory 111, a processor 112, and a communication unit 113.
To facilitate the transfer or interaction of data, the memory 111, the processor 112 and the communication unit 113 are electrically connected to each other, directly or indirectly. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The key-value data processing apparatus 110 includes at least one software functional module that can be stored in the memory 111 in the form of software or firmware, or solidified in an Operating System (OS) of the computer device 100. The processor 112 is used for executing the key-value data processing apparatus 110 stored in the memory 111, such as the software functional modules and computer programs included in the key-value data processing apparatus 110.
An embodiment of the present application provides a readable storage medium, where the readable storage medium includes a computer program, and the computer program controls, when running, a computer device where the readable storage medium is located to execute the foregoing key value data processing method.
In summary, the present application discloses a key value data processing method and device, a computer device and a readable storage medium. The method includes: acquiring a key value data set to be processed, and calculating absolute values of a plurality of hash values of the key value data set to be processed, wherein the key value data set to be processed comprises a plurality of key value data to be processed, and the key value data to be processed corresponds to the hash values; aggregating the key value data set to be processed according to the absolute values, and merging the key value data to be processed that have the same hash value; compressing the aggregated key value data set to be processed to obtain a compressed key value data set; and writing the compressed key value data set into a file, with the hash value used as the index item of the file. Compared with the prior art, in which big data is stored using key value pairs alone, this scheme uses the hash value as the basis for both storage and query, which makes big data convenient to store and improves the efficiency of subsequent queries.
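The write path summarized above can be sketched in Python as follows. This is only an illustration under stated assumptions: the hash is a CRC32 reinterpreted as a signed 32-bit integer (so that taking the absolute value is meaningful), aggregation groups records by `abs_hash(key) % NUM_BUCKETS`, only the values are stored, and zlib is the compressor. None of these choices, nor the names `abs_hash`, `build_files` and `NUM_BUCKETS`, come from the application.

```python
import struct
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

NUM_BUCKETS = 8  # assumed fixed number of aggregation groups


def abs_hash(key: bytes) -> int:
    # CRC32 reinterpreted as a signed 32-bit value, then made non-negative.
    signed = struct.unpack(">i", struct.pack(">I", zlib.crc32(key)))[0]
    return abs(signed)


def build_files(
    pairs: Iterable[Tuple[bytes, bytes]],
) -> Tuple[bytes, Dict[int, Tuple[int, int]]]:
    """Return (data file bytes, index mapping bucket -> (offset, length))."""
    buckets: Dict[int, List[bytes]] = defaultdict(list)
    for key, value in pairs:
        # Aggregate: records with the same bucket number are merged.
        buckets[abs_hash(key) % NUM_BUCKETS].append(value)

    data = bytearray()
    index: Dict[int, Tuple[int, int]] = {}
    for bucket in sorted(buckets):  # append in hash order
        blob = zlib.compress(b"".join(buckets[bucket]))
        index[bucket] = (len(data), len(blob))  # starting offset, length
        data += blob
    return bytes(data), index
```

A query then recomputes `abs_hash(key) % NUM_BUCKETS`, looks up the offset and length in the index, and decompresses only that slice of the data file.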
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the application to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and its practical application, to thereby enable others skilled in the art to best utilize the application and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A key-value data processing method, comprising:
acquiring a key value data set to be processed, and calculating absolute values of a plurality of hash values of the key value data set to be processed, wherein the key value data set to be processed comprises a plurality of key value data to be processed, and the key value data to be processed corresponds to the hash values;
aggregating the key value data set to be processed according to the absolute values, and merging the key value data to be processed that have the same hash value;
compressing the aggregated key value data set to be processed to obtain a compressed key value data set;
and writing the compressed key value data set into a file, and taking the hash value as an index item of the file.
2. The method of claim 1, wherein the files include an index file and a data file, and wherein writing the compressed key value data set to a file and taking the hash value as an index entry of the file comprises:
writing data included in the compressed key value data set into a data file;
writing the hash value into the index file, and constructing the index item by taking the first hash value of the data included in the compressed key value data set as the file name of the file;
and persisting the files to a disk, wherein the number of files stored on the disk is fixed.
3. The method of claim 2, wherein writing data included in the compressed key-value data set to a data file comprises:
and writing the data included in the compressed key value data set into the data file in append mode, in the order of the hash values of the data included in the compressed key value data set, and acquiring the starting offset of the data included in the compressed key value data set and the length of the compressed data.
4. The method of claim 3, wherein writing the hash value to the index file comprises:
and adding a starting offset of data included in the compressed key value data set and the length of the compressed data into the index file to obtain a plurality of index items, wherein the length of each index item is fixed.
5. The method of claim 1, further comprising:
acquiring key data to be queried, and calculating the hash value corresponding to the key data;
determining the corresponding index item according to the hash value;
locating the position of the index item in the file, and reading the data information in the preset number of bytes that follow;
and locating the starting offset corresponding to the key data according to the data information, and reading the preset byte length that follows to obtain the value data corresponding to the key data.
6. The method according to claim 5, wherein locating the starting offset corresponding to the key data according to the data information and reading the preset byte length that follows to obtain the value data corresponding to the key data comprises:
decompressing the data corresponding to the data information;
acquiring decompressed data corresponding to the key data according to the starting offset corresponding to the key data and the preset byte length;
encrypting the decompressed data according to a preset key encryption rule to obtain an encryption result;
if the encryption result matches the key data, taking the decompressed data as the value data corresponding to the key data;
and if the encryption result does not match the key data, returning to the step of acquiring the decompressed data corresponding to the key data according to the starting offset corresponding to the key data and the preset byte length, until all the data corresponding to the data information has been traversed.
7. The method of claim 5, wherein, before locating the position of the index item in the file, the method further comprises:
acquiring index item information of the index item;
judging whether a preset ending byte of the index item information is 0 or not;
if so, ending the query process, and displaying that the value data corresponding to the key data does not exist;
if not, the step of positioning the position of the index item in the file is executed.
8. A key-value data processing apparatus characterized by comprising:
the acquisition module is used for acquiring a key value data set to be processed and calculating absolute values of a plurality of hash values of the key value data set to be processed, wherein the key value data set to be processed comprises a plurality of key value data to be processed, and the key value data to be processed corresponds to the hash values;
the aggregation module is used for aggregating the key value data sets to be processed according to the absolute values and merging the key value data sets to be processed with the same hash value in the key value data sets to be processed;
the compression module is used for compressing the aggregated key value data set to be processed to obtain a compressed key value data set;
and the storage module is used for writing the compressed key value data set into a file and taking the hash value as an index item of the file.
9. A computer device comprising a processor and a non-volatile memory storing computer instructions that, when executed by the processor, perform the key-value data processing method of any one of claims 1-7.
10. A readable storage medium, characterized in that the readable storage medium comprises a computer program which, when running, controls a computer device on which the readable storage medium is located to perform the key-value data processing method of any one of claims 1-7.
CN202110639861.4A 2021-06-09 2021-06-09 Key value data processing method, device, computer equipment and readable storage medium Active CN113342813B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110639861.4A CN113342813B (en) 2021-06-09 2021-06-09 Key value data processing method, device, computer equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110639861.4A CN113342813B (en) 2021-06-09 2021-06-09 Key value data processing method, device, computer equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN113342813A true CN113342813A (en) 2021-09-03
CN113342813B CN113342813B (en) 2024-01-26

Family

ID=77475445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110639861.4A Active CN113342813B (en) 2021-06-09 2021-06-09 Key value data processing method, device, computer equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113342813B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115883545A (en) * 2023-02-15 2023-03-31 江西飞尚科技有限公司 High-frequency data transmission method and system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101673307A (en) * 2009-10-21 2010-03-17 中国农业大学 Space data index method and system
CN103593436A (en) * 2013-11-12 2014-02-19 华为技术有限公司 File merging method and device
CN106547755A (en) * 2015-09-17 2017-03-29 北京国双科技有限公司 A kind of data processing method and device based on piece key
CN107870970A (en) * 2017-09-06 2018-04-03 北京理工大学 A kind of data store query method and system
CN109885535A (en) * 2019-01-04 2019-06-14 平安科技(深圳)有限公司 A kind of method and relevant apparatus of file storage
CN110532284A (en) * 2019-09-06 2019-12-03 深圳前海环融联易信息科技服务有限公司 Mass data storage and search method, device, computer equipment and storage medium
CN111177476A (en) * 2019-12-05 2020-05-19 北京百度网讯科技有限公司 Data query method and device, electronic equipment and readable storage medium
CN111400308A (en) * 2020-02-21 2020-07-10 中国平安财产保险股份有限公司 Processing method of cache data, electronic device and readable storage medium
CN112486994A (en) * 2020-11-30 2021-03-12 武汉大学 Method for quickly reading data of key value storage based on log structure merging tree

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PANDIAN RAJU et al.: "PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees", SOSP '17, pages 497 - 514 *
SUN Yong et al.: "Research on key-value distributed storage systems for cloud computing", Acta Electronica Sinica, vol. 41, no. 7, pages 1 - 6 *

Also Published As

Publication number Publication date
CN113342813B (en) 2024-01-26

Similar Documents

Publication Publication Date Title
CN108319654B (en) Computing system, cold and hot data separation method and device, and computer readable storage medium
WO2021091489A1 (en) Method and apparatus for storing time series data, and server and storage medium thereof
CN106874348B (en) File storage and index method and device and file reading method
CN107704202B (en) Method and device for quickly reading and writing data
US20240126817A1 (en) Graph data query
CN111858520A (en) Method and device for separately storing block link point data
CN108182221B (en) Data processing method and related equipment
CN110716965A (en) Query method, device and equipment in block chain type account book
CN113297269A (en) Data query method and device
US9213759B2 (en) System, apparatus, and method for executing a query including boolean and conditional expressions
CN116049109A (en) File verification method, system, equipment and medium based on filter
CN112084291A (en) Information recommendation method and device
CN113342813B (en) Key value data processing method, device, computer equipment and readable storage medium
CN111857574A (en) Write request data compression method, system, terminal and storage medium
CN111625600B (en) Data storage processing method, system, computer equipment and storage medium
CN115129791A (en) Data compression storage method, device and equipment
CN115203148A (en) Method and device for modifying file
CN109213972B (en) Method, device, equipment and computer storage medium for determining document similarity
CN107609038B (en) Data cleaning method and device
CN113849524A (en) Data processing method and device
CN114896276A (en) Data storage method and device, electronic equipment and distributed storage system
CN114816219A (en) Data writing and reading method and device and data reading and writing system
CN113297267A (en) Data caching and task processing method, device, equipment and storage medium
CN112667682A (en) Data processing method, data processing device, computer equipment and storage medium
CN112612865A (en) Document storage method and device based on elastic search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant