CN113761295B - Index segment merging method and device - Google Patents

Publication number
CN113761295B
Authority
CN
China
Prior art keywords
index
period
merging
preset
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111106356.XA
Other languages
Chinese (zh)
Other versions
CN113761295A (en)
Inventor
杨梦龙
范渊
刘博�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DBAPPSecurity Co Ltd
Original Assignee
DBAPPSecurity Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DBAPPSecurity Co Ltd
Priority to CN202111106356.XA
Publication of CN113761295A
Application granted
Publication of CN113761295B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an index segment merging method and device. Storage parameters of historical data are analyzed to predict the storage parameters of the data in the time period in which index segment merging is to be performed, and data storage valley periods and/or data storage peak periods are then detected from the predicted storage parameters. The valley and/or peak periods can therefore be determined accurately even when data storage is irregular, and index segments are subsequently merged in those periods according to a preset index segment merging strategy. This improves the reliability and efficiency of index segment merging without affecting data storage, reduces memory consumption, and improves query efficiency.

Description

Index segment merging method and device
Technical Field
The present invention relates to the field of information retrieval, and in particular, to a method and apparatus for merging index segments.
Background
Elasticsearch is an open-source, Lucene-based search server that can store and retrieve data in real time. Each Elasticsearch index is divided into one or more shards; each shard is essentially a Lucene index, which in turn consists of one or more index segments. New index segments are created whenever Elasticsearch stores data to disk. The more index segments there are, the more memory is consumed and the lower the retrieval performance, so it is important to reduce the number of index segments by merging them. In the prior art, index segment merging is usually performed at a fixed period. However, index segment merging consumes a great deal of disk IO, and the amount of data being stored to disk varies. Two problems therefore arise easily: merging is performed during a data storage peak period, leaving the disk IO without enough remaining resources to handle the current data storage task; or the spare disk IO resources of a data storage valley period are not fully used for merging, so that index segment merging is inefficient.
Disclosure of Invention
The invention aims to provide an index segment merging method and device that can accurately determine data storage valley periods and/or data storage peak periods even when data storage is irregular, and merge index segments in those periods according to a preset index segment merging strategy, thereby improving the reliability and efficiency of index segment merging, reducing memory consumption, and improving query efficiency without affecting data storage.
In order to solve the above technical problems, the present invention provides an index segment merging method, including:
predicting storage parameters of predicted data for the time period in which index segment merging is to be performed, according to storage parameters of historical data of the current cluster within a preset time period;
obtaining a data storage valley period and/or a data storage peak period of the time period in which index segment merging is to be performed, according to the storage parameters of the predicted data for that time period;
and merging the index segments in each shard in the target index list during the data storage valley period and/or the data storage peak period according to a preset index segment merging strategy.
Preferably, before predicting the storage parameters of the predicted data for the time period in which index segment merging is to be performed according to the storage parameters of the historical data of the current cluster within the preset time period, the method further includes:
judging whether the duration of the preset time period is greater than a duration threshold;
if yes, proceeding to the step of predicting the storage parameters of the predicted data for the time period in which index segment merging is to be performed according to the storage parameters of the historical data of the current cluster within the preset time period;
if not, not enabling automatic index segment merging for the current cluster.
Preferably, predicting the storage parameters of the predicted data for the time period in which index segment merging is to be performed according to the storage parameters of the historical data of the current cluster within the preset time period includes:
obtaining the storage speed of the predicted data at each preset time point in the time period in which index segment merging is to be performed, according to the storage speed of the historical data at each preset time point within the preset time period.
Preferably, before obtaining the data storage valley period and/or the data storage peak period of the time period in which index segment merging is to be performed according to the storage parameters of the predicted data for that time period, the method further includes:
obtaining the average storage speed of the historical data within the preset time period according to the storage speed of the historical data at each preset time point within the preset time period;
and obtaining the data storage valley period and/or the data storage peak period of the time period in which index segment merging is to be performed according to the storage parameters of the predicted data for that time period includes:
obtaining, according to the average storage speed and the storage speed of each piece of predicted data, the first time points among the preset time points, a first time point being one at which the storage speed of the predicted data is lower than a first preset percentage of the average storage speed;
determining N data storage valley periods from the first time points, wherein the duration of each data storage valley period is greater than a preset segment merging duration, the ratio of the number of first time points contained in each data storage valley period to the number of preset time points contained in that data storage valley period is greater than a second preset percentage, and N is a positive integer;
and/or obtaining, according to the average storage speed and the storage speed of each piece of predicted data, the second time points among the preset time points, a second time point being one at which the storage speed of the predicted data is not lower than a third preset percentage of the average storage speed;
and determining M data storage peak periods from the second time points, wherein the duration of each data storage peak period is greater than the preset segment merging duration, the ratio of the number of second time points contained in each data storage peak period to the number of preset time points contained in that data storage peak period is greater than a fourth preset percentage, and M is a positive integer.
Preferably, after obtaining the data storage valley period of the time period in which index segment merging is to be performed according to the storage parameters of the predicted data for that time period, the method further includes:
when any data storage valley period starts, detecting the average utilization of the disk IO within a first preset duration;
judging whether the average utilization is lower than a first preset utilization and whether the remaining duration of the current data storage valley period, once the detection of the average utilization is finished, is greater than the preset segment merging duration;
if yes, proceeding to the step of merging the index segments in each shard in the target index list during the data storage valley period according to the preset index segment merging strategy;
if not, detecting the average utilization again and returning to the step of judging whether the average utilization is lower than the first preset utilization and whether the remaining duration of the current data storage valley period, once the detection is finished, is greater than the preset segment merging duration.
Preferably, merging the index segments in each shard in the target index list during the data storage valley period according to the preset index segment merging strategy includes:
performing the following index segment merging operation on the index segments in each shard in the target index list, one shard at a time in a preset order:
detecting the storage capacity occupied by each index segment in the current shard in the target index list;
and merging all index segments in the current shard whose storage capacity is smaller than a first preset storage capacity threshold.
Preferably, after merging all index segments in the current shard whose storage capacity is smaller than the first preset storage capacity threshold, the method further includes:
detecting whether the remaining duration of the current data storage valley period is greater than the preset segment merging duration;
if yes, performing the index segment merging operation on the index segments in the next shard in the target index list according to the preset order;
if not, performing the index segment merging operation on the index segments in the next shard in the target index list, according to the preset order, in the next data storage valley period.
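The shard-by-shard procedure with its remaining-time check can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `merge_shards_in_valley`, the `merge_one` callback, and the `remaining_seconds` callback are hypothetical, and the 7200-second default simply mirrors the 2-hour example given later in the description.

```python
from typing import Callable, Sequence


def merge_shards_in_valley(
    shards: Sequence[str],
    start_index: int,
    remaining_seconds: Callable[[], float],
    merge_one: Callable[[str], None],
    merge_duration: float = 7200.0,  # assumed preset segment merging duration
) -> int:
    """Merge shards in their preset order, starting at start_index.

    Returns the index of the next shard to process, so the caller can
    resume from there in the next data storage valley period.
    """
    i = start_index
    while i < len(shards):
        merge_one(shards[i])  # merge the small segments of the current shard
        i += 1
        # Continue only if the valley still has more time left than one
        # merge is expected to need; otherwise defer to the next valley.
        if remaining_seconds() <= merge_duration:
            break
    return i
```

The returned index is the only state that has to be carried over between valley periods, which keeps the resume step simple.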
Preferably, while the index segments in each shard in the target index list are being merged during the data storage valley period according to the preset index segment merging strategy, the method further includes:
judging whether the average utilization of the disk IO within a second preset duration is greater than a second preset utilization;
if yes, merging, among the index segments in the current shard that have not yet been merged, those smaller than a second preset storage capacity threshold, and, after merging in the current shard is completed, not merging the index segments in the other shards in the target index list, wherein the second preset storage capacity threshold is smaller than the first preset storage capacity threshold;
if not, returning to the step of judging whether the average utilization of the disk IO within the second preset duration is greater than the second preset utilization, until merging of the index segments in the current shard is completed.
Preferably, merging the index segments in each shard in the target index list during the data storage valley period and/or the data storage peak period according to the preset index segment merging strategy includes:
merging the index segments in each shard in the target index list during the data storage valley period;
and not merging the index segments in each shard in the target index list during the data storage peak period.
The invention also provides an index segment merging device, comprising:
a memory for storing a computer program;
and a processor for implementing the steps of the above index segment merging method when executing the computer program.
The invention provides an index segment merging method and device. Storage parameters of historical data are analyzed to predict the storage parameters of the data in the time period in which index segment merging is to be performed, and data storage valley periods and/or data storage peak periods are then detected from the predicted storage parameters. The valley and/or peak periods can therefore be determined accurately even when data storage is irregular, and index segments are subsequently merged in those periods according to a preset index segment merging strategy. This improves the reliability and efficiency of index segment merging without affecting data storage, reduces memory consumption, and improves query efficiency.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the prior art and the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of an index segment merging method provided by the present invention;
FIG. 2 is a schematic structural diagram of an index segment merging device provided by the present invention.
Detailed Description
The core of the invention is to provide an index segment merging method and device that can accurately determine data storage valley periods and/or data storage peak periods even when data storage is irregular, and merge index segments in those periods according to a preset index segment merging strategy, thereby improving the reliability and efficiency of index segment merging, reducing memory consumption, and improving query efficiency without affecting data storage.
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Referring to FIG. 1, FIG. 1 is a flowchart of an index segment merging method provided by the present invention. The method includes:
S11: predicting storage parameters of predicted data for the time period in which index segment merging is to be performed, according to storage parameters of historical data of the current cluster within a preset time period;
S12: obtaining a data storage valley period and/or a data storage peak period of the time period in which index segment merging is to be performed, according to the storage parameters of the predicted data for that time period;
S13: merging the index segments in each shard in the target index list during the data storage valley period and/or the data storage peak period according to a preset index segment merging strategy.
In the prior art, index segment merging is performed at a fixed period. As a result, merging may still be performed during a data storage peak period, leaving the disk IO without enough remaining resources for the data storage task; or, during a data storage valley period, merging may still follow the strategy intended for a peak period, so that merging opportunities are missed and index segment merging is inefficient.
To solve this technical problem, the method predicts the storage parameters of the predicted data for the time period in which index segment merging is to be performed from the storage parameters of the historical data of the current cluster within a preset time period. Although the storage parameters of the historical data vary between time periods, once enough historical data has been collected they reflect, to a certain extent, the storage parameters of the data in the time period in which merging is to be performed.
Based on this, the invention uses the storage parameters of the predicted data to obtain the data storage valley period and/or the data storage peak period of the time period in which index segment merging is to be performed. The present application does not limit whether only the data storage valley period, only the data storage peak period, or both are obtained from the storage parameters of the predicted data. After the data storage valley period and/or the data storage peak period are obtained, the index segments in each shard in the target index list are merged according to the preset index segment merging strategy.
In summary, the method predicts the storage parameters of the predicted data for the time period in which index segment merging is to be performed by analyzing the storage parameters of the historical data, and then detects the data storage valley period and/or the data storage peak period from the predicted storage parameters. These periods are thus determined accurately even when data storage is irregular, and index segments are subsequently merged in them according to the preset index segment merging strategy. This improves the reliability and efficiency of index segment merging without affecting data storage, reduces memory consumption, and improves query efficiency.
Based on the above embodiment:
As a preferred embodiment, before predicting the storage parameters of the predicted data for the time period in which index segment merging is to be performed according to the storage parameters of the historical data of the current cluster within the preset time period, the method further includes:
judging whether the duration of the preset time period is greater than a duration threshold;
if yes, proceeding to the step of predicting the storage parameters of the predicted data for the time period in which index segment merging is to be performed according to the storage parameters of the historical data of the current cluster within the preset time period;
if not, not enabling automatic index segment merging for the current cluster.
The storage parameters of the predicted data can be predicted relatively accurately from the storage parameters of the historical data only when the sample size of the historical data is large enough. To address this, a duration threshold is set and the duration of the preset time period is compared with it. When the duration of the preset time period is greater than the duration threshold, the method proceeds to the step of predicting the storage parameters of the predicted data for the time period in which index segment merging is to be performed; when it is not greater than the duration threshold, automatic index segment merging is not enabled.
In summary, this embodiment ensures the accuracy of the predicted storage parameters for the time period in which index segment merging is to be performed.
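As a minimal sketch of this gate, assuming the history window and the threshold are both expressed as durations (the function name `auto_merge_enabled` and the 7-day threshold in the usage note are hypothetical, not values from the patent):

```python
from datetime import timedelta


def auto_merge_enabled(history_span: timedelta, duration_threshold: timedelta) -> bool:
    """Enable automatic merging only if the history window is strictly
    longer than the duration threshold, as the embodiment requires."""
    return history_span > duration_threshold
```

For instance, a cluster with 30 days of collected history would pass a 7-day threshold, while a cluster whose history is exactly as long as the threshold would not, because the comparison is strict.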
As a preferred embodiment, predicting the storage parameters of the predicted data for the time period in which index segment merging is to be performed according to the storage parameters of the historical data of the current cluster within the preset time period includes:
obtaining the storage speed of the predicted data at each preset time point in the time period in which index segment merging is to be performed, according to the storage speed of the historical data at each preset time point within the preset time period.
In this embodiment, the storage speed is used as the storage parameter and preset time points are defined. The storage speed of the predicted data at each preset time point in the time period in which index segment merging is to be performed is predicted from the storage speed of the historical data at each preset time point within the preset time period.
For example, an ARIMA model predicts the storage speed of data at each time point of the current day from the storage speed of data at the corresponding time points on each day of the past month of the historical index. Too long an interval between preset time points may reduce the accuracy of the predicted storage speed at each preset time point; the preset time points may, for example, be set at one per minute, which is not particularly limited in this application.
Besides the storage speed, the data storage amount may also serve as the storage parameter: the data storage amount of the predicted data at each preset time point in the time period in which index segment merging is to be performed may be obtained from the data storage amount of the historical data at each preset time point within the preset time period.
In addition, other time series analysis methods may be used to predict the storage speed of the predicted data at each preset time point, which is not particularly limited in the present application.
In summary, this embodiment can predict the storage speed of the predicted data from the storage speed of the historical data even when the data storage amount is irregular, so that the data storage peak period and/or the data storage valley period can be determined subsequently.
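The shape of this prediction step can be illustrated with a deliberately simple stand-in. The sketch below does not implement ARIMA; it uses a per-time-point mean over past days, which is one of the "other time series analysis methods" the text allows. The function name `forecast_speeds` and the layout of `history` (one list of speeds per past day, one entry per preset time point) are assumptions made for the example.

```python
from statistics import mean


def forecast_speeds(history: list[list[float]]) -> list[float]:
    """Predict one storage speed per preset time point for the coming day.

    history[d][t] is the observed storage speed on past day d at preset
    time point t. The prediction for time point t is the mean over days.
    """
    if not history:
        raise ValueError("history must contain at least one day")
    points = len(history[0])
    if any(len(day) != points for day in history):
        raise ValueError("every day must have the same number of time points")
    return [mean(day[t] for day in history) for t in range(points)]
```

With one preset time point per minute, each inner list would hold 1440 values; a model such as ARIMA could replace the mean without changing the input or output shape.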
As a preferred embodiment, before obtaining the data storage valley period and/or the data storage peak period of the time period in which index segment merging is to be performed according to the storage parameters of the predicted data for that time period, the method further includes:
obtaining the average storage speed of the historical data within the preset time period according to the storage speed of the historical data at each preset time point within the preset time period;
and obtaining the data storage valley period and/or the data storage peak period of the time period in which index segment merging is to be performed according to the storage parameters of the predicted data for that time period includes:
obtaining, according to the average storage speed and the storage speed of each piece of predicted data, the first time points among the preset time points, a first time point being one at which the storage speed of the predicted data is lower than a first preset percentage of the average storage speed;
determining N data storage valley periods from the first time points, wherein the duration of each data storage valley period is greater than the preset segment merging duration, the ratio of the number of first time points contained in each data storage valley period to the number of preset time points contained in that data storage valley period is greater than a second preset percentage, and N is a positive integer;
and/or obtaining, according to the average storage speed and the storage speed of each piece of predicted data, the second time points among the preset time points, a second time point being one at which the storage speed of the predicted data is not lower than a third preset percentage of the average storage speed;
and determining M data storage peak periods from the second time points, wherein the duration of each data storage peak period is greater than the preset segment merging duration, the ratio of the number of second time points contained in each data storage peak period to the number of preset time points contained in that data storage peak period is greater than a fourth preset percentage, and M is a positive integer.
In this embodiment, the average storage speed of the historical data within the preset time period is obtained first, and the storage speed of the predicted data is then compared with the average storage speed to obtain the data storage valley period and/or the data storage peak period of the time period in which index segment merging is to be performed.
First, the first time points, at which the storage speed of the predicted data is lower than the first preset percentage of the average storage speed, are found among the preset time points. One or more data storage valley periods are then found from the first time points. Because index segment merging takes a certain amount of time, each data storage valley period must be longer than the time required for merging, namely the preset segment merging duration. A data storage valley period is allowed to contain a certain number of preset time points at which the storage speed of the predicted data is not lower than the first preset percentage of the average storage speed, provided that the ratio of the number of first time points contained in the data storage valley period to the number of preset time points contained in that period is greater than the second preset percentage.
For example, the first time points, at which the storage speed of the predicted data is lower than 50% of the average storage speed, are determined among the preset time points. Then N data storage valley periods, each longer than 2 hours, are determined from the first time points, such that in each data storage valley period the ratio of the number of first time points to the number of preset time points contained in that period is greater than 90%. The 2 hours is an experimental result for the time required to merge index segments. This time differs for data of different sizes and for hosts of different performance, so the preset segment merging duration can be adjusted to the actual application scenario, which is not particularly limited in this application.
The data storage peak periods are determined in the same way as the data storage valley periods described above. The specific values of the first, second, third and fourth preset percentages are not particularly limited in this application.
In summary, this embodiment can predict the data storage valley period and/or the data storage peak period within the time period in which index segment merging is to be performed. It is simple and reliable, and it facilitates subsequent index segment merging in the data storage valley period and/or the data storage peak period.
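The text does not specify how the valley periods are carved out of the first time points, so the following is only one possible sketch: a greedy scan that starts at a first time point and extends the window to the furthest first time point for which the required ratio still holds. The function name `find_valleys`, the parameter names, and the convention that a window of k time points has a duration of k sampling intervals are all assumptions; the defaults mirror the 50%, 90% and 2-hour (120 one-minute points) example.

```python
def find_valleys(
    predicted: list[float],
    avg_speed: float,
    first_pct: float = 0.5,   # first preset percentage
    second_pct: float = 0.9,  # second preset percentage
    min_points: int = 120,    # preset segment merging duration, in time points
) -> list[tuple[int, int]]:
    """Return inclusive (start, end) index pairs of data storage valley periods."""
    # Mark the "first time points": predicted speed below first_pct of the average.
    low = [v < avg_speed * first_pct for v in predicted]
    valleys: list[tuple[int, int]] = []
    i, n = 0, len(predicted)
    while i < n:
        if not low[i]:
            i += 1
            continue
        best_end = i
        low_count = 0
        for j in range(i, n):
            low_count += low[j]
            # Extend only to a first time point, and only while the share of
            # first time points in the window [i, j] exceeds second_pct.
            if low[j] and low_count / (j - i + 1) > second_pct:
                best_end = j
        if best_end - i + 1 > min_points:
            valleys.append((i, best_end))
            i = best_end + 1
        else:
            i += 1
    return valleys
```

Peak periods could be found with the same scan by marking time points whose predicted speed is not lower than the third preset percentage of the average and comparing the ratio against the fourth preset percentage.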
As a preferred embodiment, after obtaining the data storage valley period of the to-be-indexed segment merging period according to the storage parameter of the prediction data of the to-be-indexed segment merging period, the method further includes:
when any one data storage valley period starts, detecting the average utilization rate of the disk IO in a first preset duration;
judging whether the average utilization rate is lower than a first preset utilization rate or not and whether the residual duration of the current data storage valley period is longer than the preset segment merging duration after the detection of the average utilization rate is finished or not;
if yes, entering a step of merging index segments in each fragment in a target index list in a data storage valley period according to a preset index segment merging strategy;
if not, the method enters a step of judging whether the average utilization rate is lower than a first preset utilization rate and whether the residual duration of the current data storage valley period is longer than the preset segment merging duration after the detection of the average utilization rate is completed.
In consideration of the fact that although the storage parameters of the prediction data to be subjected to the index segment merging period are predicted based on the storage parameters of the history data within the preset period, that is, in the case where the sample amount of the storage parameters of the history data is sufficiently large, there may be cases where the storage parameters of the prediction data are not accurately predicted in a small number of cases.
In order to solve the technical problems, when any one data storage valley period starts, the average utilization rate of the disk IO in a first preset duration is detected, whether the average utilization rate of the disk IO in the first preset duration is lower than the first preset utilization rate is judged, and because the detection of the utilization rate of the disk IO also occupies a part of time, whether the remaining time of the current data storage valley period is longer than the preset segment merging duration is also required to be detected, and index segment merging is carried out on index segments in each segment in a target index list according to a preset index segment merging strategy under the condition that the two conditions are met; if the average utilization rate of the disk IO in the first preset duration is not lower than the first preset utilization rate, the utilization rate of the disk IO is continuously detected, and the judging process is executed.
For example, when any data storage valley period starts, the average utilization rate of the disk IO over a 5-minute window is detected first. If this average is lower than 50% and the remaining duration of the current data storage valley period is longer than the preset segment merging duration (2 hours in this example), the method proceeds to the step of merging the index segments in each fragment in the target index list during the data storage valley period according to the preset index segment merging strategy. If the average utilization rate of the disk IO over the 5 minutes is not lower than 50%, the average utilization rate is detected again over the next 5 minutes starting from the current time. If the average utilization rate is lower than 50% but the remaining duration of the current data storage valley period is less than 2 hours, that is, less than the preset segment merging duration, index segment merging is not performed in the current data storage valley period.
In summary, detecting the average utilization rate of the disk IO within the first preset duration before index segment merging further ensures that merging is performed only when the data storage volume is low, which improves the reliability and efficiency of index segment merging without affecting data storage.
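The check performed at the start of a valley period can be illustrated with a minimal Python sketch. The function name, the two callables, and the default values (50%, 5 minutes, 2 hours, taken from the example above) are illustrative assumptions and not part of the patent text. The sketch gives up as soon as the remaining valley time is no longer sufficient, because the remaining time can only shrink.

```python
from typing import Callable


def wait_for_merge_start(
    sample_io_utilization: Callable[[float], float],
    remaining_valley_seconds: Callable[[], float],
    first_preset_utilization: float = 0.50,
    first_preset_duration: float = 5 * 60,
    preset_merge_duration: float = 2 * 60 * 60,
) -> bool:
    """Return True if merging may start in the current valley period.

    sample_io_utilization(d) is a hypothetical callable that observes the
    disk for about d seconds and returns the average utilization (0.0-1.0).
    remaining_valley_seconds() is a hypothetical callable that returns the
    time left in the current valley period.
    """
    while True:
        average = sample_io_utilization(first_preset_duration)
        remaining = remaining_valley_seconds()
        if remaining <= preset_merge_duration:
            # Not enough time left for a merge: skip this valley period.
            return False
        if average < first_preset_utilization:
            # Both conditions hold: merging may begin.
            return True
        # Disk is still busy: sample again over the next window.
```

A caller would run this once per predicted valley period and start merging only when it returns True.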
As a preferred embodiment, merging the index segments in each fragment in the target index list during the data storage valley period according to the preset index segment merging strategy includes:
sequentially performing the following index segment merging operation on the index segments in each fragment in the target index list in a preset order:
detecting the storage capacity occupied by each index segment in the current fragment in the target index list;
merging all index segments in the current fragment whose storage capacity is smaller than a first preset storage capacity threshold.
In this embodiment, the storage capacity occupied by each index segment in the current fragment is detected first, and then all index segments in the current fragment whose storage capacity is smaller than the first preset storage capacity threshold are merged together. For example, all index segments in the current fragment whose storage capacity is smaller than 100M are merged. The first preset storage capacity threshold may be set according to the actual situation and is not particularly limited in the present application.
In summary, merging all index segments in the current fragment whose storage capacity is smaller than the first preset storage capacity threshold reduces memory consumption and improves query efficiency.
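The per-fragment merging operation can be sketched as follows. This is a simplified model under stated assumptions: `IndexSegment` and `merge_small_segments` are hypothetical names, a fragment is represented as a plain list of segments, and the size of the merged segment is approximated as the sum of its inputs, which a real search engine would not guarantee.

```python
from dataclasses import dataclass
from typing import List

MB = 1024 * 1024


@dataclass
class IndexSegment:
    name: str
    size_bytes: int


def merge_small_segments(
    segments: List[IndexSegment],
    threshold_bytes: int = 100 * MB,  # the "100M" example value
) -> List[IndexSegment]:
    """Merge every segment smaller than the threshold into one segment."""
    small = [s for s in segments if s.size_bytes < threshold_bytes]
    if len(small) < 2:
        # Fewer than two candidates: there is nothing to merge.
        return list(segments)
    kept = [s for s in segments if s.size_bytes >= threshold_bytes]
    merged = IndexSegment(
        name="+".join(s.name for s in small),
        # Assumption: merged size is the sum of the inputs.
        size_bytes=sum(s.size_bytes for s in small),
    )
    return kept + [merged]
```

Segments at or above the threshold are left untouched, so large segments are never rewritten by this step.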
As a preferred embodiment, after merging all index segments in the current fragment whose storage capacity is smaller than the first preset storage capacity threshold, the method further includes:
detecting whether the remaining duration of the current data storage valley period is longer than the preset segment merging duration;
if yes, performing the index segment merging operation on the index segments in the next fragment in the target index list in the preset order;
if not, performing the index segment merging operation on the index segments in the next fragment in the target index list in the preset order during the next data storage valley period.
Merging the index segments in each fragment takes a certain amount of time. Therefore, after the index segments in the current fragment have been merged, it is necessary to detect whether the remaining duration of the current data storage valley period is longer than the preset segment merging duration. If it is, the index segments in the next fragment are merged. If it is not, the index segments in the next fragment in the target index list are merged in the preset order during the next data storage valley period.
In summary, by checking the remaining duration of the valley period before moving on to the next fragment, the present application ensures the accuracy and reliability of index segment merging.
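The fragment-by-fragment loop with the remaining-time check can be sketched as below. The function and parameter names are hypothetical, and `merge_fragment` stands in for whatever operation actually merges one fragment. The function returns the position of the next fragment to process, so that a later valley period can resume from there.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def merge_fragments_in_valley(
    fragments: Sequence[T],
    merge_fragment: Callable[[T], None],
    remaining_valley_seconds: Callable[[], float],
    preset_merge_duration: float = 2 * 60 * 60,
    start: int = 0,
) -> int:
    """Merge fragments in order, stopping when the valley is nearly over.

    Returns the index of the first fragment that was not processed; it
    equals len(fragments) when every fragment has been handled.
    """
    position = start
    while position < len(fragments):
        merge_fragment(fragments[position])
        position += 1
        # Check the remaining time only if another fragment is waiting.
        if (
            position < len(fragments)
            and remaining_valley_seconds() <= preset_merge_duration
        ):
            break  # Defer the rest to the next valley period.
    return position
```

Passing the returned value back as `start` in the next valley period continues the preset order without repeating work.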
As a preferred embodiment, while the index segments in each fragment in the target index list are being merged during the data storage valley period according to the preset index segment merging strategy, the method further includes:
judging whether the average utilization rate of the disk IO within a second preset duration is greater than a second preset utilization rate;
if yes, merging, among the index segments in the current fragment that have not yet been merged, those smaller than a second preset storage capacity threshold, and, after index segment merging in the current fragment is completed, not merging the index segments in the other fragments in the target index list, wherein the second preset storage capacity threshold is smaller than the first preset storage capacity threshold;
if not, returning to the step of judging whether the average utilization rate of the disk IO within the second preset duration is greater than the second preset utilization rate, until the index segments in the current fragment have been merged.
In this embodiment, while index segments are being merged, the average utilization rate of the disk IO within a second preset duration is detected, and it is judged whether this average utilization rate is greater than a second preset utilization rate. If so, the current data storage volume has changed; for example, the data storage volume has increased because of abnormal host disk performance or another emergency. Continuing to merge index segments according to the original preset index segment merging strategy might then affect the data storage task. Therefore, among the index segments in the current fragment that have not yet been merged, only those smaller than a second preset storage capacity threshold are merged, where the second preset storage capacity threshold is smaller than the first preset storage capacity threshold. For example, with the second preset storage capacity threshold set to 2M, the index segments in the current fragment whose storage capacity is smaller than 2M are merged. After index segment merging in the current fragment is completed, the index segments in the other fragments in the target index list are not merged.
In summary, by detecting the utilization rate of the disk IO during index segment merging, the present application can determine whether an emergency has occurred. Lowering the storage capacity threshold reduces the amount of index segment merging, so that the disk IO retains enough spare resources to handle the emergency.
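The throttling behaviour can be modelled with a short Python sketch. All names are hypothetical. The `io_utilization` callable is assumed to return the average disk IO utilization over the second preset duration, and the 80% value for the second preset utilization rate is an illustrative assumption, since the text gives no figure. The sketch only selects which segments of the current fragment would be merged and reports whether the remaining fragments should be skipped.

```python
from typing import Callable, List, Tuple

MB = 1024 * 1024


def select_segments_under_io_watch(
    segment_sizes: List[int],
    io_utilization: Callable[[], float],
    first_threshold: int = 100 * MB,
    second_threshold: int = 2 * MB,
    second_preset_utilization: float = 0.80,  # assumed value
) -> Tuple[List[int], bool]:
    """Pick segments of the current fragment to merge while watching disk IO.

    Returns (indices of selected segments, skip_other_fragments).
    """
    threshold = first_threshold
    throttled = False
    selected: List[int] = []
    for index, size in enumerate(segment_sizes):
        # Keep judging the IO utilization until a spike is seen.
        if not throttled and io_utilization() > second_preset_utilization:
            # Spike detected: only very small segments are merged from now on.
            threshold = second_threshold
            throttled = True
        if size < threshold:
            selected.append(index)
    return selected, throttled
```

When the second element of the result is True, the caller finishes the current fragment and leaves the other fragments in the target index list unmerged.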
As a preferred embodiment, merging the index segments in each fragment in the target index list during the data storage valley period and/or the data storage peak period according to the preset index segment merging strategy includes:
merging the index segments in each fragment in the target index list during the data storage valley period;
not merging the index segments in each fragment in the target index list during the data storage peak period.
In this embodiment, the index segments in each fragment in the target index list are merged only during the data storage valley period, which ensures the efficiency of index segment merging, reduces memory consumption, and improves query efficiency.
Fig. 2 is a schematic structural diagram of an index segment merging device provided by the present invention. As shown in fig. 2, the device includes:
a memory 1 for storing a computer program;
and a processor 2 for implementing the above-mentioned index segment merging method when executing the computer program.
For a description of the index segment merging device provided by the present invention, refer to the embodiments of the index segment merging method above; details are not repeated here.
In the present specification, the embodiments are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.
It should also be noted that in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. An index segment merging method, comprising:
predicting storage parameters of predicted data for a time period in which index segment merging is to be performed, according to storage parameters of historical data of a current cluster within a preset time period;
obtaining a data storage valley period and/or a data storage peak period of the time period in which index segment merging is to be performed, according to the storage parameters of the predicted data for that time period;
merging, according to a preset index segment merging strategy, the index segments in each fragment in a target index list during the data storage valley period and/or the data storage peak period;
wherein predicting the storage parameters of the predicted data for the time period in which index segment merging is to be performed, according to the storage parameters of the historical data of the current cluster within the preset time period, comprises:
obtaining the storage speed of the predicted data at each preset time point in the time period in which index segment merging is to be performed, according to the storage speed of the historical data at each preset time point within the preset time period;
before obtaining the data storage valley period and/or the data storage peak period of the time period in which index segment merging is to be performed according to the storage parameters of the predicted data for that time period, the method further comprises:
obtaining the average storage speed of the historical data within the preset time period, according to the storage speed of the historical data at each preset time point within the preset time period;
wherein obtaining the data storage valley period and/or the data storage peak period of the time period in which index segment merging is to be performed according to the storage parameters of the predicted data for that time period comprises:
obtaining, according to the average storage speed and the storage speed of each piece of predicted data, each first time point among the preset time points, a first time point being one at which the storage speed of the predicted data is lower than the average storage speed by a first preset percentage;
determining N data storage valley periods from the first time points, wherein the duration of each data storage valley period is longer than the preset segment merging duration, the ratio of the number of first time points contained in each data storage valley period to the number of preset time points contained in that data storage valley period is greater than a second preset percentage, and N is a positive integer;
and/or, obtaining, according to the average storage speed and the storage speed of each piece of predicted data, each second time point among the preset time points, a second time point being one at which the storage speed of the predicted data is not lower than a third preset percentage of the average storage speed;
and determining M data storage peak periods from the second time points, wherein the duration of each data storage peak period is longer than the preset segment merging duration, the ratio of the number of second time points contained in each data storage peak period to the number of preset time points contained in that data storage peak period is greater than a fourth preset percentage, and M is a positive integer.
2. The index segment merging method according to claim 1, wherein before predicting the storage parameters of the predicted data for the time period in which index segment merging is to be performed according to the storage parameters of the historical data of the current cluster within the preset time period, the method further comprises:
judging whether the duration of the preset time period is greater than a duration threshold;
if yes, proceeding to the step of predicting the storage parameters of the predicted data for the time period in which index segment merging is to be performed according to the storage parameters of the historical data of the current cluster within the preset time period;
if not, not enabling automatic index segment merging for the current cluster.
3. The index segment merging method according to claim 1, further comprising, after obtaining the data storage valley period of the time period in which index segment merging is to be performed according to the storage parameters of the predicted data for that time period:
when any data storage valley period starts, detecting the average utilization rate of the disk IO within a first preset duration;
judging, after the detection of the average utilization rate is completed, whether the average utilization rate is lower than a first preset utilization rate and whether the remaining duration of the current data storage valley period is longer than the preset segment merging duration;
if yes, proceeding to the step of merging, according to the preset index segment merging strategy, the index segments in each fragment in the target index list during the data storage valley period;
if not, returning to the step of judging, after the detection of the average utilization rate is completed, whether the average utilization rate is lower than the first preset utilization rate and whether the remaining duration of the current data storage valley period is longer than the preset segment merging duration.
4. The index segment merging method according to claim 3, wherein merging, according to the preset index segment merging strategy, the index segments in each fragment in the target index list during the data storage valley period comprises:
sequentially performing the following index segment merging operation on the index segments in each fragment in the target index list in a preset order:
detecting the storage capacity occupied by each index segment in the current fragment in the target index list;
and merging all index segments in the current fragment whose storage capacity is smaller than a first preset storage capacity threshold.
5. The index segment merging method according to claim 4, further comprising, after merging all index segments in the current fragment whose storage capacity is smaller than the first preset storage capacity threshold:
detecting whether the remaining duration of the current data storage valley period is longer than the preset segment merging duration;
if yes, performing the index segment merging operation on the index segments in the next fragment in the target index list in the preset order;
if not, performing the index segment merging operation on the index segments in the next fragment in the target index list in the preset order during the next data storage valley period.
6. The index segment merging method according to claim 5, further comprising, while the index segments in each fragment in the target index list are being merged during the data storage valley period according to the preset index segment merging strategy:
judging whether the average utilization rate of the disk IO within a second preset duration is greater than a second preset utilization rate;
if yes, merging, among the index segments in the current fragment that have not yet been merged, those smaller than a second preset storage capacity threshold, and, after index segment merging in the current fragment is completed, not merging the index segments in the other fragments in the target index list, wherein the second preset storage capacity threshold is smaller than the first preset storage capacity threshold;
if not, returning to the step of judging whether the average utilization rate of the disk IO within the second preset duration is greater than the second preset utilization rate, until the index segments in the current fragment have been merged.
7. The index segment merging method according to any one of claims 1 to 6, wherein merging, according to the preset index segment merging strategy, the index segments in each fragment in the target index list during the data storage valley period and/or the data storage peak period comprises:
merging the index segments in each fragment in the target index list during the data storage valley period;
and not merging the index segments in each fragment in the target index list during the data storage peak period.
8. An index segment merging apparatus, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the index segment merging method according to any one of claims 1 to 7 when executing said computer program.
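The determination of data storage valley periods recited in claim 1 can be illustrated with a rough Python sketch. The claims do not fix a search procedure, so the greedy window scan below is only one possible reading. The function name, the default percentages, and the interpretation of a "first time point" as one whose predicted speed is below a given fraction of the average speed are all assumptions made for illustration.

```python
from typing import List, Tuple


def find_valley_periods(
    predicted_speeds: List[float],
    average_speed: float,
    interval_seconds: float,
    preset_merge_duration: float,
    first_pct: float = 0.5,   # assumed "first preset percentage"
    second_pct: float = 0.9,  # assumed "second preset percentage"
) -> List[Tuple[int, int]]:
    """Return (start_index, end_index) pairs of candidate valley periods."""
    # Mark the "first time points": predicted speed well below the average.
    is_low = [v < average_speed * first_pct for v in predicted_speeds]
    # Smallest number of points whose span exceeds the merge duration.
    window = int(preset_merge_duration // interval_seconds) + 2
    periods: List[Tuple[int, int]] = []
    start = 0
    while start + window <= len(predicted_speeds):
        low_count = sum(is_low[start:start + window])
        if is_low[start] and low_count / window > second_pct:
            end = start + window
            # Extending with further low points never lowers the ratio.
            while end < len(predicted_speeds) and is_low[end]:
                end += 1
            periods.append((start, end - 1))
            start = end
        else:
            start += 1
    return periods
```

Peak periods could be found in the same way by marking the time points whose predicted speed is not lower than the third preset percentage of the average and applying the fourth preset percentage as the ratio.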

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111106356.XA CN113761295B (en) 2021-09-22 2021-09-22 Index segment merging method and device

Publications (2)

Publication Number Publication Date
CN113761295A CN113761295A (en) 2021-12-07
CN113761295B true CN113761295B (en) 2024-02-27

Family

ID=78796747

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111106356.XA Active CN113761295B (en) 2021-09-22 2021-09-22 Index segment merging method and device

Country Status (1)

Country Link
CN (1) CN113761295B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110042576A (en) * 2009-10-19 2011-04-27 한국과학기술정보연구원 Dynamic index information maintenance system adapted solid state disk and method thereof and recording medium having program source thereof
CN102087646A (en) * 2009-12-07 2011-06-08 北大方正集团有限公司 Method and device for establishing index
CN110968272A (en) * 2019-12-16 2020-04-07 华中科技大学 Time sequence prediction-based method and system for optimizing storage performance of mass small files
CN112732723A (en) * 2021-01-20 2021-04-30 浪潮卓数大数据产业发展有限公司 Method for improving Elasticsearch concurrent retrieval efficiency
CN112732189A (en) * 2021-01-07 2021-04-30 Oppo广东移动通信有限公司 Data storage method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN113761295A (en) 2021-12-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant