CN110727685A - Data compression method, equipment and storage medium based on Cassandra database - Google Patents

Data compression method, equipment and storage medium based on Cassandra database

Info

Publication number
CN110727685A
CN110727685A (application number CN201910955257.5A)
Authority
CN
China
Prior art keywords
merging
merged
files
mergeable
time period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910955257.5A
Other languages
Chinese (zh)
Other versions
CN110727685B (en)
Inventor
王文庆 (Wang Wenqing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Wave Intelligent Technology Co Ltd
Original Assignee
Suzhou Wave Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Wave Intelligent Technology Co Ltd filed Critical Suzhou Wave Intelligent Technology Co Ltd
Priority to CN201910955257.5A
Publication of CN110727685A
Application granted
Publication of CN110727685B
Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 - Indexing; Data structures therefor; Storage structures
    • G06F16/28 - Databases characterised by their database models, e.g. relational or object models

Abstract

The invention discloses a data merging method based on a Cassandra database, which comprises the following steps: obtaining a predicted mergeable time period; merging a plurality of SSTable files of the Cassandra database into a plurality of sets according to the sizes of the SSTable files and preset rules; determining, according to the mergeable time period, which of the sets can be merged; determining, according to the mergeable time period, the SSTable files to be merged in each of those sets and reading them into the memory of a server; and merging the SSTable files of each set in the memory respectively. The invention also discloses a computer device and a readable storage medium. By predicting a reasonable merging time, the method minimizes the impact on user request access, makes maximum use of the system's idle time and idle resources, and keeps the influence of the merging process on user requests to a minimum, so that the final merging result yields the best possible improvement in read performance.

Description

Data compression method, equipment and storage medium based on Cassandra database
Technical Field
The invention relates to the field of databases, in particular to a data merging method, equipment and a storage medium based on a Cassandra database.
Background
Cassandra is a hybrid non-relational database well suited to social-networking and cloud-computing workloads. For a write operation, Cassandra first records the data and the operation submitted by the client into the commit log; this improves reliability and enables data recovery. Cassandra then writes the data into the in-memory table (Memtable), where the data is kept sorted by key. Only when the amount of data in the Memtable reaches a certain limit does Cassandra flush it in bulk to an SSTable on disk.
Once an SSTable has been written, it is immutable and can only be read. Cassandra therefore performs only sequential writes and never random writes. Because SSTables are read-only, data belonging to the same Column Family may be spread across several SSTables; when the amount of data in one Column Family is large, Cassandra must read and combine multiple SSTables and the Memtable, which severely degrades query efficiency.
To avoid the performance impact of a large number of SSTables, Cassandra handles the growing set of SSTables through a mechanism called compaction: it periodically merges multiple SSTables into one.
To address this shortcoming, the Cassandra database itself provides two merging mechanisms.
1. Major compaction: a manually triggered mechanism intended mainly for administrative operations. It merges all SSTables. Because it is triggered by an administrator's command and merges every SSTable, it removes all redundant data and brings the system to its best performance. Its disadvantage is that, with so many files involved, the merge takes a long time and occupies a large amount of system resources throughout.
2. Minor compaction: a merging mechanism triggered automatically by the system, which can be configured to fire when file redundancy reaches a certain level. It does not merge all on-disk files but selects only a portion of them. Compared with a manual merge, this process is lightweight and can improve system performance to some extent. Its disadvantage is that, being automatic with no manual intervention, it still consumes considerable system resources, so running it while requests are dense seriously affects the system's read and write response times.
In addition, major and minor compaction share a common drawback: the amount of merged data is too large, often exceeding the total capacity of memory, so a large amount of disk-to-memory data replacement occurs during the merge and the merge takes longer. At the same time, the choice of files to merge is not optimal: an equal number of SSTable files is simply merged each time, regardless of which files would benefit most, so the improvement in system performance is not the best achievable. The existing merging process is as follows:
1. The SSTable structures are first read from disk into memory; if memory is insufficient, they are read in with replacement (swapping).
2. Merging is performed while reading, and the merged result is written into a temporary SSTable file on disk.
3. When all the SSTable files have been merged, the temporary file is turned into a permanent file and all files that took part in the merge are deleted.
4. If the merge is interrupted, for example by a power failure or human intervention, the system automatically rolls back: it deletes the temporary file and keeps the original files unchanged.
Reading the data into memory and merging it consume a large amount of system resources. Write operations are barely affected, because a write only appends data to memory sequentially and needs little system-resource support. Read operations, however, suffer the most: during a merge the system performs a great deal of disk-to-memory data exchange and processor-intensive in-memory merging, so the demand on system resources is high and the impact on reads is the largest.
In short, the SSTable merging mechanism is not well designed: the merge occupies a large amount of system resources, its start time is chosen poorly, and the merge cycle is long, so system performance, and read performance in particular, is seriously affected while it runs.
Therefore, a data merging method based on the Cassandra database is urgently needed.
Disclosure of Invention
In view of the above, in order to overcome at least one aspect of the above problems, an embodiment of the present invention provides a data merging method based on a Cassandra database, including the steps of:
obtaining a predicted mergeable time period;
merging a plurality of SSTable files of a Cassandra database into a plurality of sets according to the sizes of the SSTable files and preset rules;
determining a plurality of sets which can be merged in the plurality of sets according to the mergeable time period;
determining a plurality of SSTable files to be merged in each set of the plurality of sets according to the mergeable time period and reading the SSTable files into a memory of a server;
and respectively merging the SSTable files of each set in the memory.
In some embodiments, further comprising:
and in response to the detection that the performance of the server cannot meet the read-write requirements of the user, terminating the merging, performing data rollback, and deleting the temporary merged file.
In some embodiments, obtaining the predicted mergeable time period further comprises:
and predicting a combinable time period according to the historical access rule of the database.
In some embodiments, obtaining the predicted mergeable time period further comprises:
cyclically acquiring environment parameters of the server;
judging, according to the environment parameters of the server, whether the server has remained in an idle state throughout a preset time period;
predicting a mergeable time period in response to the server having been in the idle state for the preset time.
In some embodiments, merging the plurality of SSTable files into a plurality of sets further comprises:
and the SSTable files in each set are arranged in a reverse order according to the creation time.
In some embodiments, determining the plurality of SSTable files to be merged in each of the plurality of sets according to the mergeable time period and reading them into the memory of the server further comprises:
establishing a correlation function of the size of the merged file and the predicted merging time;
predicting the size of the merged file through the correlation function and the mergeable time period;
and determining a plurality of SSTable files to be merged in each set according to the size of the predicted merged file.
In some embodiments, determining a number of SSTable files to be merged in each set according to the predicted size of merged files further comprises:
the top SSTable files in each set are preferably selected.
In some embodiments, further comprising:
and controlling the total size of the SSTable files to be merged in each set to be less than a threshold value.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer apparatus, including:
at least one processor; and
a memory storing a computer program operable on the processor, wherein the processor, when executing the program, performs the steps of any of the above data merging methods based on the Cassandra database.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, which when executed by a processor performs the steps of any of the above-mentioned data merging methods based on the Cassandra database.
The invention has the following beneficial technical effect: by predicting a reasonable merging time, the method minimizes the impact on user request access, makes maximum use of the system's idle time and idle resources, and keeps the influence of the merging process on user requests to a minimum, so that the final merging result yields the best possible improvement in read performance.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a data merging method based on a Cassandra database according to an embodiment of the present invention;
fig. 2 is a flowchart of a data merging method based on the Cassandra database according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a computer device provided in an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that have the same name but are not identical. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
According to an aspect of the present invention, an embodiment of the present invention provides a data merging method based on a Cassandra database, which, as shown in fig. 1, may include the steps of: S1, obtaining a predicted mergeable time period; S2, merging a plurality of SSTable files of the Cassandra database into a plurality of sets according to the sizes of the SSTable files and preset rules; S3, determining, according to the mergeable time period, which of the sets can be merged; S4, determining, according to the mergeable time period, the SSTable files to be merged in each of those sets and reading them into the memory of a server; S5, merging the SSTable files of each set in the memory respectively.
By predicting a reasonable merging time, the method minimizes the impact on user request access, makes maximum use of the system's idle time and idle resources, and keeps the influence of the merging process on user requests to a minimum, so that the final merging result yields the best possible improvement in read performance.
The following is a detailed description with reference to the flow chart of the data merging method based on the Cassandra database shown in fig. 2.
First, a predicted mergeable time period is obtained.
In some embodiments, obtaining the predicted mergeable time period may further include: cyclically acquiring environment parameters of the server; judging, according to the environment parameters, whether the server has remained in an idle state throughout a preset time period; and predicting a mergeable time period in response to the server having been idle for that time.
Specifically, the server state is monitored in real time and the system parameters are obtained to determine whether a large number of user read and write operations are currently in progress and whether a subsequent merge can be carried out. If the requirement is met, the merging operation is started; if not, detection continues in a loop. If the server's processor and data bandwidth remain at a low level for a certain time, the mergeable time period is predicted and the SSTable merge is started.
It should be noted that the system environment parameters acquired here are the system's CPU utilization and memory-bandwidth utilization. CPU utilization reflects how busy the whole system is with user requests and with the merging process. Memory bandwidth largely reflects how busy data exchange between disk and memory is, and thus how much bandwidth the merging operation, and reads in particular, consumes; bandwidth can also severely constrain the data-reading step of the merge.
Therefore, these two parameters are used as the main indicators, and the data merging process is started when specific conditions are reached. The "specific conditions" are mainly two:
First, the system resources occupied by current user requests are at a low level, so the remaining resources are enough to support the merging operation without materially affecting the response rate of user requests.
Second, the state described in the first point has lasted for a certain time, so it can be predicted with reasonable confidence that user requests will not fluctuate greatly in the near future. The first point satisfies the resource requirement of the merging operation, the second point satisfies its time requirement, and when both conditions hold the merge can be started.
For a specific Cassandra application server, the quantitative standard for these conditions is obtained from actual tests; it depends mainly on the system's hardware and software configuration and on the user population it serves. Through such tests one can determine the minimal trigger conditions under which merging does not affect users' read requests.
In some embodiments, the duration for which the minimal trigger condition must hold can also be set according to the daily fluctuation of the user population's request volume. This is a learnable mechanism: from the historical access pattern and the currently observed accesses, the system infers how long the merging condition is likely to remain satisfied in the near future and decides whether to start the merge. In other words, the mergeable time period is predicted from the database's historical access pattern.
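By way of illustration only, the idle-window detection described above might be sketched as follows in Java. The class and method names, threshold values, sampling interval, and the SystemMetrics interface are assumptions made for this sketch and are not prescribed by the embodiment; in practice the quantitative standards would come from the tests described above.

```java
import java.time.Duration;
import java.time.Instant;

public class MergeWindowDetector {
    // Assumed quantitative standards; the patent derives these from actual tests
    // of the concrete hardware/software configuration and user workload.
    private static final double CPU_IDLE_THRESHOLD = 0.30;       // 30% CPU utilization
    private static final double BANDWIDTH_IDLE_THRESHOLD = 0.30; // 30% memory-bandwidth utilization
    private static final Duration REQUIRED_IDLE_SPAN = Duration.ofMinutes(5);
    private static final Duration SAMPLE_INTERVAL = Duration.ofSeconds(10);

    /** Blocks until the server has stayed idle for REQUIRED_IDLE_SPAN, then
     *  returns the predicted length of the mergeable time period. */
    public Duration waitForMergeableWindow(SystemMetrics metrics) throws InterruptedException {
        Instant idleSince = null;
        while (true) {
            boolean idleNow = metrics.cpuUtilization() < CPU_IDLE_THRESHOLD
                    && metrics.memoryBandwidthUtilization() < BANDWIDTH_IDLE_THRESHOLD;
            if (!idleNow) {
                idleSince = null;                       // busy again: restart the idle timer
            } else if (idleSince == null) {
                idleSince = Instant.now();              // idle state just started
            } else if (Duration.between(idleSince, Instant.now())
                               .compareTo(REQUIRED_IDLE_SPAN) >= 0) {
                // Idle long enough; predict how long the window will last,
                // e.g. from the historical access pattern (here: a fixed placeholder).
                return predictWindowFromHistory();
            }
            Thread.sleep(SAMPLE_INTERVAL.toMillis());   // cyclic acquisition of environment parameters
        }
    }

    private Duration predictWindowFromHistory() {
        return Duration.ofMinutes(30);  // placeholder for the learned access-pattern model
    }

    /** Hypothetical metric source; a real system would read OS counters. */
    public interface SystemMetrics {
        double cpuUtilization();              // 0.0 - 1.0
        double memoryBandwidthUtilization();  // 0.0 - 1.0
    }
}
```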
It should be noted that, in the embodiment of the present invention, the main thread only monitors changes in the system environment in real time, while a branch (worker) thread is responsible for the main merging process; a prefetch policy and a multithreaded merging policy are mainly used to accelerate the merge.
Then, the plurality of SSTable files of the Cassandra database are merged into a plurality of sets according to their sizes and preset rules.
In some embodiments, merging the plurality of SSTable files into a plurality of sets further includes: arranging the SSTable files in each set in reverse order of creation time.
It should be noted that grouping and sorting the files makes it easy to determine the best files to merge and to limit the total size of the merged file, so the merge has the best possible effect. In addition, various mechanisms make effective use of system resources and accelerate the merging process.
When grouping and sorting, the creation time and size of each file are read first; the files are then placed into different lists (sets) according to their size range, and the files within each list are arranged in reverse order of creation time.
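A minimal sketch of the grouping and reverse-chronological ordering just described is given below; the SSTableInfo record, the size-bucket boundaries, and the class name are illustrative assumptions, not part of the patent text.

```java
import java.util.*;

public class SSTableGrouper {
    /** Minimal stand-in for SSTable metadata; field names are illustrative. */
    public record SSTableInfo(String path, long sizeBytes, long creationTimeMillis) {}

    /** Size-range boundaries (bytes) for the buckets; assumed values. */
    private static final long[] SIZE_BOUNDS = {16L << 20, 64L << 20, 256L << 20}; // 16 MB, 64 MB, 256 MB

    /** Groups files into lists by size range; within each list, newest files come first. */
    public static List<List<SSTableInfo>> groupBySize(Collection<SSTableInfo> files) {
        List<List<SSTableInfo>> buckets = new ArrayList<>();
        for (int i = 0; i <= SIZE_BOUNDS.length; i++) buckets.add(new ArrayList<>());
        for (SSTableInfo f : files) {
            buckets.get(bucketIndex(f.sizeBytes())).add(f);
        }
        // Reverse chronological order: most recently created files first.
        for (List<SSTableInfo> bucket : buckets) {
            bucket.sort(Comparator.comparingLong(SSTableInfo::creationTimeMillis).reversed());
        }
        return buckets;
    }

    private static int bucketIndex(long size) {
        for (int i = 0; i < SIZE_BOUNDS.length; i++) {
            if (size < SIZE_BOUNDS[i]) return i;
        }
        return SIZE_BOUNDS.length;
    }
}
```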
Then, a plurality of SSTable files to be merged in each of the plurality of sets are determined according to the mergeable time period and read into a memory of a server.
In some embodiments, determining the plurality of SSTable files to be merged in each of the plurality of sets according to the mergeable time period and reading them into the memory of the server may further include:
establishing a correlation function of the size of the merged file and the predicted merging time;
predicting the size of the merged file through the correlation function and the mergeable time period;
and determining a plurality of SSTable files to be merged in each set according to the size of the predicted merged file.
Specifically, after the files are classified by size, the files in the different lists can be selected reasonably, according to the predicted mergeable time period, for merging. The more files there are, and the larger each file is, the longer a list takes to merge, so a correlation function between the size of the merged file and the predicted merging time can be established in advance. A suitable list can then be chosen more reasonably from the predicted mergeable time period and the predicted merging time, for example by selecting the first list or by selecting the list whose predicted merging time fits. Within the merge, a certain number of files, or files up to a certain total size, are selected from the list, so that the merging time can be kept within a controllable range.
It should be noted that the final result is that the multiple SSTable files in each list are merged into one file; however, the merge may not be completed in one pass and may instead be split into several merges according to the predicted mergeable time periods.
For example, suppose a list contains three SSTable files A, B, and C that need to be merged into one file. The merge may first combine A and B, store the resulting file back into the list once that merge completes, and then combine C with that file during the next merge.
In some embodiments, the top-ranked SSTable files in each set are preferentially selected.
Because the SSTable files in each set are arranged in reverse chronological order, the most recently written files, which sit at the front of the list, are merged first. These files are the most likely to be accessed again in the near term, so merging in reverse chronological order gives priority to recently written files and yields the best performance improvement from the merge.
Then, reading a plurality of SSTable files to be merged in each set of the plurality of sets into a memory of a server;
specifically, for the prefetching policy, the file participating in the merging can be read into the memory at one time, the memory reading process of the file can be promoted, and the effect is more remarkable particularly for large-scale files.
In some embodiments, the total size of the SSTable files to be merged in each set is controlled to be less than a threshold.
It should be noted that the cache space is used sensibly by adjusting the total size of the SSTable files to be merged across all sets. As long as the total size of the files being merged stays below half of the cache space, heavy data exchange between disk and memory is avoided and the merge is greatly accelerated.
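The following sketch combines the size-time correlation function, the preference for the most recently written files, and the half-of-page-cache limit described above into one file-selection step. The linear form of the correlation function and its coefficients are assumptions, and the sketch reuses the hypothetical SSTableInfo record from the earlier grouping sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeSelector {
    // Assumed linear correlation: predictedTime ~ secondsPerByte * totalBytes + fixedOverheadSeconds.
    // The coefficients would come from measurements on the concrete system.
    private final double secondsPerByte;
    private final double fixedOverheadSeconds;
    private final long pageCacheBytes;

    public MergeSelector(double secondsPerByte, double fixedOverheadSeconds, long pageCacheBytes) {
        this.secondsPerByte = secondsPerByte;
        this.fixedOverheadSeconds = fixedOverheadSeconds;
        this.pageCacheBytes = pageCacheBytes;
    }

    /** Inverse of the correlation function: the largest merged-file size that still
     *  fits into the predicted mergeable window. */
    public long maxMergeableBytes(double windowSeconds) {
        return (long) Math.max(0, (windowSeconds - fixedOverheadSeconds) / secondsPerByte);
    }

    /** Picks the leading (most recently written) files from a reverse-sorted list,
     *  stopping once either the time-derived budget or half the page cache is reached. */
    public List<SSTableGrouper.SSTableInfo> selectFiles(List<SSTableGrouper.SSTableInfo> bucket,
                                                        double windowSeconds) {
        long budget = Math.min(maxMergeableBytes(windowSeconds), pageCacheBytes / 2);
        List<SSTableGrouper.SSTableInfo> selected = new ArrayList<>();
        long total = 0;
        for (SSTableGrouper.SSTableInfo f : bucket) {       // bucket is newest-first
            if (total + f.sizeBytes() > budget) break;
            selected.add(f);
            total += f.sizeBytes();
        }
        return selected;
    }
}
```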
Through this mechanism, idle time can therefore be used fully and effectively, the performance of the merging process is brought to its best, and the file-selection logic of the Cassandra merge process is greatly optimized.
Finally, the SSTable files of each set are merged in the memory respectively.
As for the multithreaded merging mechanism: because a single-threaded merge cannot fully exploit the hardware resources of a modern multi-core processor, the files already prefetched into memory can be merged by multiple threads in parallel, which accelerates the merge. When the final merge finishes, the main thread is notified and the branch thread's merging process ends. This greatly reduces the time taken by the Cassandra merge.
In addition, for large-scale file merges, a pipelined approach provides further optimization. The prefetch part of the merge consumes almost no CPU but occupies a large amount of memory bandwidth because of the many disk reads; the multithreaded merge part consumes a lot of CPU because it runs in parallel, but since the files it merges are already cached in memory by the prefetch, it does not occupy much additional memory bandwidth. Prefetching and multithreaded merging therefore do not compete for resources and can be executed in parallel as a pipeline. For prefetching and merging large files, this can in theory greatly accelerate the merge, shorten the overall merging time, further reduce the merge's occupation of system resources, and increase the probability that the merge completes successfully.
It should be noted that executing the merge operation with multiple threads in parallel means merging a plurality of files in each set concurrently.
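A rough sketch of the prefetch plus multithreaded merge described above is shown below. The class and method names are assumptions, and the actual row-level merge logic is Cassandra-internal, so it is left as a placeholder.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

public class ParallelMerger {
    private final ExecutorService pool = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors());

    /** Prefetch: read every selected file fully into memory in one pass, so the
     *  CPU-bound merge phase no longer waits on the disk. */
    public Map<String, byte[]> prefetch(List<SSTableGrouper.SSTableInfo> files) throws IOException {
        Map<String, byte[]> cache = new LinkedHashMap<>();
        for (SSTableGrouper.SSTableInfo f : files) {
            cache.put(f.path(), Files.readAllBytes(Path.of(f.path())));
        }
        return cache;
    }

    /** Merge each set's prefetched files on its own worker thread. */
    public List<Future<byte[]>> mergeSetsInParallel(List<Map<String, byte[]>> prefetchedSets) {
        List<Future<byte[]>> results = new ArrayList<>();
        for (Map<String, byte[]> set : prefetchedSets) {
            results.add(pool.submit(() -> mergeOneSet(set)));
        }
        return results;
    }

    private byte[] mergeOneSet(Map<String, byte[]> set) {
        // Placeholder: a real implementation would merge the SSTable rows by key
        // and write the merged result into a temporary SSTable file before promoting it.
        return new byte[0];
    }

    public void shutdown() { pool.shutdown(); }
}
```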
In some embodiments, as shown in fig. 2, the method further includes: in response to detecting that the performance of the server cannot meet the users' read and write requirements, terminating the merge, performing a data rollback, and deleting the temporary merged file.
Specifically, while the branch threads are merging, the main thread monitors the system environment, mainly checking whether a large number of user read operations are occurring. If there is no heavy user read load, the main thread waits for the branch thread to report that the merging process has finished, the whole merge operation ends, and the flow returns to the step of obtaining a predicted mergeable time period. If a large number of user read requests appear in the meantime, then, to serve user requests first, the main thread tells the branch threads to end the merging process immediately, a data rollback is performed, the temporary merged files are deleted, the user requests are answered first, and the flow returns to the step of obtaining a predicted mergeable time period.
For a real-time service system, user-request response performance must take priority; in that case the main thread notifies the branch threads to end the merging process immediately so that the bulk of system resources is used to serve users. This safeguard effectively protects the system's resource allocation, forcibly reclaims the resources occupied by the merging mechanism, and guarantees the system's long-term, real-time, efficient response.
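The supervision-and-rollback behaviour described above might be sketched as follows; the abort thresholds, the class name, and the reuse of the hypothetical SystemMetrics interface from the earlier sketch are assumptions.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.concurrent.*;

public class MergeGuard {
    private static final double ABORT_CPU_THRESHOLD = 0.70;        // assumed trigger levels
    private static final double ABORT_BANDWIDTH_THRESHOLD = 0.70;

    /** Main-thread loop: while the worker merges, keep sampling the load and
     *  abort plus roll back as soon as user requests need the resources. */
    public boolean superviseMerge(Future<?> mergeTask,
                                  MergeWindowDetector.SystemMetrics metrics,
                                  List<Path> temporaryFiles) throws InterruptedException, IOException {
        while (!mergeTask.isDone()) {
            boolean overloaded = metrics.cpuUtilization() > ABORT_CPU_THRESHOLD
                    || metrics.memoryBandwidthUtilization() > ABORT_BANDWIDTH_THRESHOLD;
            if (overloaded) {
                mergeTask.cancel(true);          // tell the worker thread to stop merging
                rollback(temporaryFiles);        // delete temp files; originals stay untouched
                return false;                    // caller goes back to window prediction
            }
            Thread.sleep(1_000);
        }
        return true;                             // merge finished undisturbed
    }

    private void rollback(List<Path> temporaryFiles) throws IOException {
        for (Path p : temporaryFiles) {
            Files.deleteIfExists(p);             // original SSTables were never modified
        }
    }
}
```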
The method provided by the invention mainly monitors the server state in real time to determine whether a large number of user read and write operations are in progress. If the server's processor and data bandwidth remain at a low level for a certain period, the SSTable merge is started. The merging process mainly divides all SSTables into several levels according to the server's available hardware resources, with smaller SSTable structures stored at lower levels, and then performs the data merge level by level from bottom to top. Within the SSTable set of a given level, files are ordered by write time, the most recently written SSTable structures are merged first, and their total data volume is kept within a certain range, in particular no more than half of the total page-cache capacity.
In addition, to avoid long merges delaying user requests, user requests are monitored in real time during the merge; when too many user requests drive CPU and bandwidth utilization too high, the merge is terminated automatically and the data is returned to its pre-merge state. The SSTable merge time is reduced by the prefetch and multithreading mechanisms. To reduce the merge's impact on users, the data to be merged is first loaded into the page cache ahead of time by a prefetch operation; because the data is loaded as a whole, this is more efficient than merging directly from disk. Second, since the data merge by itself does not occupy too many CPU resources, a multithreading mechanism can be adopted to merge the data already loaded into memory concurrently, accelerating the merging process.
For the overall operation of the Cassandra database the merging process is indispensable, but a database serving big-data applications is oriented toward its users, so satisfying users' read and write requests must come first. On that basis, the system's idle time and idle resources are used to the greatest extent, the influence of the merging process on user requests is reduced to a minimum, and the final merging result yields the best possible improvement in read performance.
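Purely to show how the pieces fit together, the sketch below wires the hypothetical classes from the earlier sketches into one pass of the overall flow of fig. 2; every concrete value in it is an assumption.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Future;

public class CompactionDriver {
    /** One pass of the flow: wait for a window, group, select, prefetch, merge, supervise. */
    public void runOnce(MergeWindowDetector detector,
                        MergeWindowDetector.SystemMetrics metrics,
                        List<SSTableGrouper.SSTableInfo> allSSTables) throws Exception {
        // S1: wait for a predicted mergeable time period.
        Duration window = detector.waitForMergeableWindow(metrics);

        // S2: group the SSTable files into sets by size, newest first within each set.
        List<List<SSTableGrouper.SSTableInfo>> sets = SSTableGrouper.groupBySize(allSSTables);

        // S3/S4: pick the files that fit into the window and half of the page cache,
        // then prefetch them into memory (coefficients and 4 GB cache size are assumed).
        MergeSelector selector = new MergeSelector(1e-8, 5.0, 4L << 30);
        ParallelMerger merger = new ParallelMerger();
        List<Map<String, byte[]>> prefetched = new ArrayList<>();
        for (List<SSTableGrouper.SSTableInfo> set : sets) {
            List<SSTableGrouper.SSTableInfo> chosen = selector.selectFiles(set, window.toSeconds());
            if (!chosen.isEmpty()) prefetched.add(merger.prefetch(chosen));
        }

        // S5: merge each set's prefetched files on its own thread, supervised by the guard.
        List<Future<byte[]>> results = merger.mergeSetsInParallel(prefetched);
        MergeGuard guard = new MergeGuard();
        for (Future<byte[]> f : results) {
            guard.superviseMerge(f, metrics, List.of()); // temp-file paths omitted in this sketch
        }
        merger.shutdown();
    }
}
```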
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 3, an embodiment of the present invention further provides a computer apparatus 501, comprising:
at least one processor 520; and
a memory 510, the memory 510 storing a computer program 511 executable on the processor, wherein the processor 520, when executing the program, performs the steps of any of the above data merging methods based on the Cassandra database.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 4, an embodiment of the present invention further provides a computer-readable storage medium 601, where the computer-readable storage medium 601 stores computer program instructions 610, and the computer program instructions 610, when executed by a processor, perform the steps of any of the above data merging methods based on the Cassandra database.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program to instruct related hardware to implement the methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
In addition, the apparatuses, devices, and the like disclosed in the embodiments of the present invention may be various electronic terminal devices, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television, and the like, or may be a large terminal device, such as a server, and the like, and therefore the scope of protection disclosed in the embodiments of the present invention should not be limited to a specific type of apparatus, device. The client disclosed by the embodiment of the invention can be applied to any one of the electronic terminal devices in the form of electronic hardware, computer software or a combination of the electronic hardware and the computer software.
Furthermore, the method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method disclosed in the embodiments of the present invention.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps of implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to suggest that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples; within the idea of the embodiments of the invention, technical features of the above embodiments or of different embodiments may also be combined, and many other variations of the different aspects of the embodiments of the invention exist that are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like made within the spirit and principles of the embodiments of the present invention are intended to be included within their scope of protection.

Claims (10)

1. A data merging method based on a Cassandra database comprises the following steps:
obtaining a predicted mergeable time period;
merging a plurality of SSTable files of a Cassandra database into a plurality of sets according to the sizes of the SSTable files and preset rules;
determining a plurality of sets which can be merged in the plurality of sets according to the mergeable time period;
determining a plurality of SSTable files to be merged in each set of the plurality of sets according to the mergeable time period and reading the SSTable files into a memory of a server;
and respectively merging the SSTable files of each set in the memory.
2. The method of claim 1, further comprising:
and in response to the detection that the performance of the server cannot meet the read-write requirements of the user, terminating the merging, performing data rollback, and deleting the temporary merged file.
3. The method of claim 1, wherein obtaining a predicted mergeable time period further comprises:
and predicting a combinable time period according to the historical access rule of the database.
4. The method of claim 1, wherein obtaining a predicted mergeable time period further comprises:
cyclically acquiring environment parameters of the server;
judging, according to the environment parameters of the server, whether the server has remained in an idle state throughout a preset time period;
predicting a mergeable time period in response to the server having been in the idle state for the preset time.
5. The method of claim 1, wherein merging the plurality of SSTable files into a plurality of sets, further comprises:
and the SSTable files in each set are arranged in a reverse order according to the creation time.
6. The method of claim 1, wherein determining the plurality of SSTable files to be merged in each of the plurality of sets according to the mergeable time period and reading them into the memory of the server further comprises:
establishing a correlation function of the size of the merged file and the predicted merging time;
predicting the size of the merged file through the correlation function and the mergeable time period;
and determining a plurality of SSTable files to be merged in each set according to the size of the predicted merged file.
7. The method of claim 6, wherein determining a number of SSTable files to be merged in each set based on the predicted size of merged files further comprises:
the top SSTable files in each set are preferably selected.
8. The method of claim 7, further comprising:
and controlling the total size of the SSTable files to be merged in each set to be smaller than a threshold value.
9. A computer device, comprising:
at least one processor; and
memory storing a computer program operable on the processor, wherein the processor executes the program to perform the steps of the method according to any of claims 1-8.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method of any one of claims 1 to 8.
CN201910955257.5A 2019-10-09 2019-10-09 Data compression method, equipment and storage medium based on Cassandra database Active CN110727685B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910955257.5A CN110727685B (en) 2019-10-09 2019-10-09 Data compression method, equipment and storage medium based on Cassandra database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910955257.5A CN110727685B (en) 2019-10-09 2019-10-09 Data compression method, equipment and storage medium based on Cassandra database

Publications (2)

Publication Number Publication Date
CN110727685A true CN110727685A (en) 2020-01-24
CN110727685B CN110727685B (en) 2022-04-22

Family

ID=69219799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910955257.5A Active CN110727685B (en) 2019-10-09 2019-10-09 Data compression method, equipment and storage medium based on Cassandra database

Country Status (1)

Country Link
CN (1) CN110727685B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761153A (en) * 2012-05-02 2014-04-30 北京奇虎科技有限公司 Resource calling method and device of compress software
CN103473239A (en) * 2012-06-08 2013-12-25 腾讯科技(深圳)有限公司 Method and device for updating data of non relational database
US20170212680A1 (en) * 2016-01-22 2017-07-27 Suraj Prabhakar WAGHULDE Adaptive prefix tree based order partitioned data storage system
CN110059090A (en) * 2019-04-19 2019-07-26 阿里巴巴集团控股有限公司 A kind of write-in/dump/merging/the querying method and device of bitmap index

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111309269A (en) * 2020-02-28 2020-06-19 苏州浪潮智能科技有限公司 Method, system and equipment for dropping compressed data and readable storage medium
WO2021169388A1 (en) * 2020-02-28 2021-09-02 苏州浪潮智能科技有限公司 Method, system and device for writing compressed data to disk, and readable storage medium
CN111881092A (en) * 2020-06-22 2020-11-03 武汉绿色网络信息服务有限责任公司 Method and device for merging files based on cassandra database
CN112286917A (en) * 2020-10-22 2021-01-29 北京锐安科技有限公司 Data processing method and device, electronic equipment and storage medium
CN112732197A (en) * 2021-01-14 2021-04-30 苏州浪潮智能科技有限公司 Data IO processing method and device, storage medium and equipment
US11537613B1 (en) 2021-10-29 2022-12-27 Snowflake Inc. Merge small file consolidation
US11593306B1 (en) * 2021-10-29 2023-02-28 Snowflake Inc. File defragmentation service

Also Published As

Publication number Publication date
CN110727685B (en) 2022-04-22

Similar Documents

Publication Publication Date Title
CN110727685B (en) Data compression method, equipment and storage medium based on Cassandra database
US9355112B1 (en) Optimizing compression based on data activity
RU2616545C2 (en) Working set swap, using sequentially ordered swap file
CN108920387B (en) Method and device for reducing read delay, computer equipment and storage medium
US10417137B2 (en) Flushing pages from solid-state storage device
US9635123B2 (en) Computer system, and arrangement of data control method
US8825959B1 (en) Method and apparatus for using data access time prediction for improving data buffering policies
CN105630834B (en) Method and device for deleting repeated data
CN111881135A (en) Data aggregation method, device, equipment and computer readable storage medium
US8032708B2 (en) Method and system for caching data in a storgae system
US20150302903A1 (en) System and method for deep coalescing memory management in a portable computing device
CN110147331B (en) Cache data processing method and system and readable storage medium
CN110727404A (en) Data deduplication method and device based on storage end and storage medium
US20240086332A1 (en) Data processing method and system, device, and medium
US20170123975A1 (en) Centralized distributed systems and methods for managing operations
CN110287152A (en) A kind of method and relevant apparatus of data management
CN111061690B (en) RAC-based database log file reading method and device
CN111881092A (en) Method and device for merging files based on cassandra database
KR102195896B1 (en) Device and method of managing disk cache
CN110750211A (en) Storage space management method and device
CN112711564B (en) Merging processing method and related equipment
CN111625203A (en) Method, system, device and medium for hierarchical storage
US9501414B2 (en) Storage control device and storage control method for cache processing according to time zones
US20220261354A1 (en) Data access method and apparatus and storage medium
CN106681939B (en) Reading method and device for disk page

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant