CN117453149B - Data balancing method, device, terminal and storage medium of distributed storage system - Google Patents
- Publication number
- CN117453149B; CN202311777759.6A
- Authority
- CN
- China
- Prior art keywords
- data
- group
- amount
- candidate
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0647—Migration mechanisms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data balancing method, device, terminal and storage medium for a distributed storage system. The method comprises: acquiring data usage information of each placement group in the distributed storage system, and determining a data usage valley period based on the data usage information, wherein the data usage valley period reflects the period in which the data usage amount of the distributed storage system is lowest; acquiring the data storage amount of each placement group, and determining a data receiving placement group and a data release placement group based on the data storage amounts; and, based on the data usage valley period, migrating a preset data amount from the data release placement group to the data receiving placement group. The invention analyzes the data usage of each placement group and migrates data from placement groups that store more data to placement groups that store less, so that the data storage amounts of the placement groups do not differ greatly and data balance among the placement groups is achieved.
Description
Technical Field
The present invention relates to the field of data balancing technologies, and in particular, to a data balancing method and apparatus for a distributed storage system, a terminal, and a storage medium.
Background
Ceph is an open-source distributed storage system that provides object storage, block devices, and file systems. Such a distributed storage system contains multiple placement groups (PGs), which are distributed across OSDs (Object Storage Daemons), the logical units that manage the disks. Because data is stored in a distributed manner, the data storage amounts of the PGs differ. Whether an OSD is available is determined according to the PG with the least data storage amount, so when the PG with the least data storage amount is not used, the OSD hosting that PG is unavailable, which wastes resources.
Accordingly, there is a need for improvement and advancement in the art.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the above defects of the prior art, a data balancing method, device, terminal and storage medium for a distributed storage system, so as to solve the prior-art problem that differing data storage amounts among placement groups easily leave some disks unavailable and thereby waste resources.
To solve this technical problem, the invention adopts the following technical solution:
In a first aspect, the present invention provides a data balancing method for a distributed storage system, the method comprising:
acquiring data usage information of each placement group in the distributed storage system, and determining a data usage valley period based on the data usage information, wherein the data usage valley period reflects the period in which the data usage amount of the distributed storage system is lowest;
acquiring the data storage amount of each placement group, and determining a data receiving placement group and a data release placement group based on the data storage amounts;
and, based on the data usage valley period, migrating a preset data amount from the data release placement group to the data receiving placement group.
In one implementation, acquiring the data usage information of each placement group in the distributed storage system and determining the data usage valley period based on the data usage information comprises:
taking one day as a cycle, recording the data usage information of all placement groups within the cycle, and determining, based on the data usage information, the data usage amount of all placement groups at each moment;
and comparing the data usage amount at each moment with a preset usage threshold to determine the data usage valley period.
In one implementation, comparing the data usage amount at each moment with the preset usage threshold and determining the data usage valley period comprises:
comparing the data usage amount at each moment with the usage threshold, and screening out the target moments whose data usage amount is smaller than the usage threshold;
splicing adjacent target moments in chronological order to obtain a plurality of candidate periods;
and acquiring the data usage amount of each candidate period, and taking the candidate period with the smallest data usage amount as the data usage valley period.
In one implementation, acquiring the data storage amount of each placement group and determining the data receiving placement group and the data release placement group based on the data storage amounts comprises:
acquiring a preset first storage threshold and a preset second storage threshold, wherein the first storage threshold is larger than the second storage threshold;
comparing the data storage amount of each placement group with the first storage threshold and the second storage threshold;
determining the placement groups whose data storage amount is larger than the first storage threshold to obtain first candidate placement groups;
determining the placement groups whose data storage amount is smaller than the second storage threshold to obtain second candidate placement groups;
and determining the data receiving placement group and the data release placement group based on the first candidate placement groups and the second candidate placement groups.
In one implementation, determining the data receiving placement group and the data release placement group based on the first candidate placement groups and the second candidate placement groups comprises:
sorting the first candidate placement groups by data storage amount from high to low to obtain the placement group with the highest data storage amount, and taking it as the data release placement group;
and sorting the second candidate placement groups by data storage amount from low to high to obtain the placement group with the lowest data storage amount, and taking it as the data receiving placement group.
In one implementation, migrating the preset data amount from the data release placement group to the data receiving placement group based on the data usage valley period comprises:
taking the intersection of the data usage valley period and a preset configuration period to obtain a data migration period;
and, within the data migration period, migrating the preset data amount from the data release placement group to the data receiving placement group.
In one implementation, migrating the preset data amount from the data release placement group to the data receiving placement group within the data migration period comprises:
determining a data amount difference between the data release placement group and the data receiving placement group;
taking half of the data amount difference as the preset data amount;
and, within the data migration period, migrating the preset data amount from the data release placement group to the data receiving placement group.
In a second aspect, an embodiment of the present invention further provides a data balancing device for a distributed storage system, the device comprising:
a data usage analysis module, configured to acquire data usage information of each placement group in the distributed storage system and determine a data usage valley period based on the data usage information, wherein the data usage valley period reflects the period in which the data usage amount of the distributed storage system is lowest;
a data storage amount determining module, configured to acquire the data storage amount of each placement group and determine a data receiving placement group and a data release placement group based on the data storage amounts;
and a data migration module, configured to migrate a preset data amount from the data release placement group to the data receiving placement group based on the data usage valley period.
In a third aspect, an embodiment of the present invention further provides a terminal, where the terminal includes a memory, a processor, and a data balancing program of a distributed storage system stored in the memory and capable of running on the processor, and when the processor executes the data balancing program of the distributed storage system, the processor implements the steps of the data balancing method of the distributed storage system in any one of the foregoing schemes.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where a data balancing program of a distributed storage system is stored on the computer readable storage medium, where the data balancing program of the distributed storage system is executed by a processor, to implement the steps of the data balancing method of the distributed storage system according to any one of the foregoing solutions.
Beneficial effects: compared with the prior art, the invention provides a data balancing method for a distributed storage system. The method first acquires data usage information of each placement group in the distributed storage system and determines a data usage valley period based on the data usage information, wherein the data usage valley period reflects the period in which the data usage amount of the distributed storage system is lowest. It then acquires the data storage amount of each placement group and determines a data receiving placement group and a data release placement group based on the data storage amounts. Finally, based on the data usage valley period, it migrates a preset data amount from the data release placement group to the data receiving placement group. The invention analyzes the data usage of each placement group and migrates data from placement groups that store more data to placement groups that store less, so that the data storage amounts of the placement groups do not differ greatly and data balance among the placement groups is achieved.
Drawings
Fig. 1 is a flowchart of a specific implementation manner of a data balancing method of a distributed storage system according to an embodiment of the present invention.
Fig. 2 is a functional schematic diagram of a data balancing device of a distributed storage system according to an embodiment of the present invention.
Fig. 3 is a schematic block diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and more specific, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
This embodiment provides a data balancing method for a distributed storage system. The method adjusts the data storage amount of each placement group so that the data storage amounts of the placement groups do not differ greatly, thereby achieving data balance. In a specific application, the embodiment first acquires data usage information of each placement group in the distributed storage system and determines a data usage valley period based on the data usage information, where the data usage valley period reflects the period in which the data usage amount of the distributed storage system is lowest. It then acquires the data storage amount of each placement group and determines a data receiving placement group and a data release placement group based on the data storage amounts. Finally, based on the data usage valley period, it migrates a preset data amount from the data release placement group to the data receiving placement group. The embodiment thus analyzes the data usage of each placement group and migrates data from placement groups that store more data to placement groups that store less, so that the data storage amounts of the placement groups do not differ greatly and data balance among the placement groups is achieved.
The data balancing method of this embodiment can be applied to a terminal, where terminals include smart devices such as computers, mobile phones and smart TVs. Specifically, as shown in fig. 1, the data balancing method of the distributed storage system in this embodiment includes the following steps:
Step S100, acquiring data usage information of each placement group in the distributed storage system, and determining a data usage valley period based on the data usage information, wherein the data usage valley period reflects the period in which the data usage amount of the distributed storage system is lowest.
The terminal first acquires the data usage information of each placement group in the distributed storage system. The data usage information reflects how each placement group uses data, for example the data usage amount of each placement group at each moment. From this information the embodiment determines the period in which the data usage amount of the distributed storage system is lowest, namely the data usage valley period. Analyzing the data usage valley period identifies the idle periods of the placement groups, and these idle periods can be used for data migration, so that the migration does not affect the client or its read-write speed.
In one implementation, determining the data usage valley period in this embodiment includes the following steps:
Step S101, taking one day as a cycle, recording the data usage information of all placement groups within the cycle, and determining, based on the data usage information, the data usage amount of all placement groups at each moment;
Step S102, comparing the data usage amount at each moment with a preset usage threshold, and determining the data usage valley period.
In a specific application, the embodiment records the data usage information of each placement group at each moment of one day. Because the data usage information reflects the data usage amount and the data storage amount at the corresponding moment, the data usage valley period can be determined by comparing the data usage amount at each moment with the preset usage threshold. The usage threshold in this embodiment measures whether the data usage amount of the placement groups meets the requirement: if the data usage amount at a moment is greater than the usage threshold, usage at that moment is high and the moment does not belong to a valley period. Conversely, if the data usage amount at a moment is smaller than the usage threshold, usage at that moment is low and the moment belongs to a valley period.
Specifically, the embodiment records the data usage amount at each moment, compares each of them with the usage threshold, and screens out the target moments whose data usage amount is smaller than the usage threshold. The moments are points in time and may be set every 5 minutes, so the target moments are, for example, 14:20, 14:25, 14:30, and so on. Next, the embodiment splices adjacent target moments in chronological order to obtain a plurality of candidate periods. Because the data usage amount is analyzed moment by moment, the data usage amount may be greater than the usage threshold at some moments and smaller at others, so the resulting candidate periods may have the same or different durations. In practice, when a target moment is isolated and has no adjacent target moment, it forms a candidate period on its own, and the duration of that candidate period is the sampling interval between moments, such as 5 minutes. Next, the embodiment obtains the data usage amount of each candidate period by summing the data usage amounts at the moments within that period. Finally, the embodiment takes the candidate period with the smallest data usage amount as the data usage valley period. The data usage valley period has the lowest data usage amount and is therefore the idle period of the placement groups.
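The screening, splicing and selection described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `find_valley_period`, the dictionary of samples and the constant `STEP` are hypothetical, and the end of a period is assumed to be the last target moment plus one sampling interval, so that an isolated moment lasts 5 minutes.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=5)  # assumed sampling interval (the text suggests 5 minutes)


def find_valley_period(usage, threshold, step=STEP):
    """Return (start, end, total_usage) of the candidate period with the least usage.

    usage maps each sample time to the data usage amount of all placement
    groups at that moment. Returns None if no moment is below the threshold.
    """
    # Screen out the target moments whose usage is below the threshold.
    targets = sorted(t for t, amount in usage.items() if amount < threshold)
    if not targets:
        return None

    # Splice target moments that are exactly one step apart into candidate periods.
    periods = [[targets[0]]]
    for t in targets[1:]:
        if t - periods[-1][-1] == step:
            periods[-1].append(t)
        else:
            periods.append([t])

    # Sum the usage over each candidate period and keep the smallest total.
    best = min(periods, key=lambda p: sum(usage[t] for t in p))
    return best[0], best[-1] + step, sum(usage[t] for t in best)
```

With samples at 14:20 to 14:40 and a threshold of 10, a busy moment at 14:30 splits the day into two candidate periods, and the one with the smaller summed usage is returned.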
Step S200, acquiring the data storage amount of each placement group, and determining a data receiving placement group and a data release placement group based on the data storage amounts.
Next, the embodiment acquires the data storage amount of each placement group, where the data storage amount may be the amount of data remaining after the data has been in use for some time, or the total amount of data before any of it has been used. Based on the data storage amounts, the terminal determines which placement groups store more data and which store less. Because the goal of this embodiment is to balance data among the placement groups so that their data storage amounts do not differ greatly, a placement group that stores more data is used as the data release placement group, which releases data, and a placement group that stores less data is used as the data receiving placement group, which receives data, thereby achieving data balance.
In one implementation, determining the data receiving placement group and the data release placement group in this embodiment includes the following steps:
Step S201, acquiring a preset first storage threshold and a preset second storage threshold, wherein the first storage threshold is larger than the second storage threshold;
Step S202, comparing the data storage amount of each placement group with the first storage threshold and the second storage threshold;
Step S203, determining the placement groups whose data storage amount is larger than the first storage threshold to obtain first candidate placement groups;
Step S204, determining the placement groups whose data storage amount is smaller than the second storage threshold to obtain second candidate placement groups;
Step S205, determining the data receiving placement group and the data release placement group based on the first candidate placement groups and the second candidate placement groups.
In a specific application, the embodiment acquires a preset first storage threshold and a preset second storage threshold, where the first storage threshold is larger than the second storage threshold. The first storage threshold screens out the placement groups that store more data, and the second storage threshold screens out the placement groups that store less data. The embodiment compares the data storage amount of each placement group with both thresholds. The placement groups whose data storage amount is larger than the first storage threshold form the first candidate placement groups, and the placement groups whose data storage amount is smaller than the second storage threshold form the second candidate placement groups. The first candidates are therefore the placement groups with large data storage amounts, and the second candidates are those with small data storage amounts. The embodiment then sorts the first candidate placement groups by data storage amount from high to low and takes the placement group with the highest data storage amount as the data release placement group. Likewise, it sorts the second candidate placement groups by data storage amount from low to high and takes the placement group with the lowest data storage amount as the data receiving placement group.
The embodiment thus obtains the placement group with the largest data storage amount and the placement group with the smallest data storage amount. Because the data storage amounts of these two placement groups differ considerably, the former is used as the data release placement group and the latter as the data receiving placement group, so that data is released from the group that stores the most to the group that stores the least. Placement groups whose data storage amount lies between the second storage threshold and the first storage threshold are already relatively balanced and do not differ greatly, so data balance is achieved.
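Steps S201 to S205 can be sketched as below. This is an illustrative sketch under stated assumptions: the function name, the dictionary of storage amounts keyed by a placement group identifier, and the choice to return `None` when either candidate set is empty are all hypothetical. Taking the maximum and minimum directly is used here in place of the full sort described in the text, since only the first element of each sorted list is needed.

```python
def select_release_and_receive(storage, first_threshold, second_threshold):
    """Pick the data release and data receiving placement groups.

    storage maps a placement group id to its data storage amount.
    Returns (release_pg, receive_pg), or None if either candidate set is empty.
    """
    if first_threshold <= second_threshold:
        raise ValueError("the first threshold must be larger than the second")

    # First candidates: placement groups storing more than the first threshold.
    first = {pg: amount for pg, amount in storage.items() if amount > first_threshold}
    # Second candidates: placement groups storing less than the second threshold.
    second = {pg: amount for pg, amount in storage.items() if amount < second_threshold}
    if not first or not second:
        return None  # assumption: nothing to balance in this round

    release_pg = max(first, key=first.get)    # highest storage amount releases data
    receive_pg = min(second, key=second.get)  # lowest storage amount receives data
    return release_pg, receive_pg
```

Placement groups that fall between the two thresholds are in neither candidate set and are left untouched.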
In another implementation, after the first candidate placement groups and the second candidate placement groups are determined, every first candidate may be used as a data release placement group and every second candidate as a data receiving placement group. When data is migrated in the subsequent step, the first candidate with the highest data storage amount and the second candidate with the lowest data storage amount form one data migration group, and data is migrated between them; the first candidate with the second-highest data storage amount and the second candidate with the second-lowest data storage amount form another data migration group, and data is migrated between them; and so on. Multiple data migration groups are thus formed and migrated to achieve data balance.
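The rank-by-rank pairing in this alternative can be sketched as follows. The function name and data layout are hypothetical, and the text does not say what happens when the two candidate sets differ in size; this sketch simply assumes that unmatched candidates are left for a later round.

```python
def pair_candidates(storage, first_threshold, second_threshold):
    """Pair release and receive candidates by rank into data migration groups.

    storage maps a placement group id to its data storage amount.
    Returns a list of (release_pg, receive_pg) tuples.
    """
    # First candidates, sorted from the highest storage amount to the lowest.
    first = sorted(
        (pg for pg, amount in storage.items() if amount > first_threshold),
        key=storage.get,
        reverse=True,
    )
    # Second candidates, sorted from the lowest storage amount to the highest.
    second = sorted(
        (pg for pg, amount in storage.items() if amount < second_threshold),
        key=storage.get,
    )
    # Highest pairs with lowest, second-highest with second-lowest, and so on.
    # zip stops at the shorter list (assumption: leftovers wait for a later round).
    return list(zip(first, second))
```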
Step S300, migrating a preset data amount from the data release placement group to the data receiving placement group based on the data usage valley period.
After the data release placement group and the data receiving placement group are determined, the embodiment migrates the preset data amount from the data release placement group to the data receiving placement group during the data usage valley period, so as to achieve data balance.
In one implementation, the data migration in this embodiment includes the following steps:
Step S301, taking the intersection of the data usage valley period and a preset configuration period to obtain a data migration period;
Step S302, within the data migration period, migrating the preset data amount from the data release placement group to the data receiving placement group.
Specifically, the embodiment first acquires a configuration period, which is preset based on the historical usage times of the disk (OSD) and represents an idle period of the disk. The embodiment then takes the intersection of the data usage valley period and the preset configuration period to obtain the data migration period. The data migration period is a period in which data usage is low and the disk is idle, so migrating data within it affects neither the client nor the client's read-write speed. When migrating data, the embodiment determines the data amount difference between the data release placement group and the data receiving placement group, and takes half of that difference as the preset data amount. Finally, within the data migration period, it migrates the preset data amount from the data release placement group to the data receiving placement group.
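The intersection of the two periods and the half-difference rule can be sketched as below. The function names are hypothetical, and the sketch assumes that both periods lie within the same day and do not wrap past midnight, which the text does not address.

```python
from datetime import time


def intersect_periods(valley, configured):
    """Intersect two (start, end) periods given as datetime.time pairs.

    Assumes start < end for both periods (no wrap past midnight).
    Returns the overlapping (start, end), or None if they do not overlap.
    """
    start = max(valley[0], configured[0])
    end = min(valley[1], configured[1])
    return (start, end) if start < end else None


def migration_amount(release_stored, receive_stored):
    """Half of the storage difference between the release and receive groups."""
    return (release_stored - receive_stored) / 2
```

Migrating half of the difference leaves the two placement groups with equal data storage amounts: with 90 units in the release group and 10 in the receiving group, 40 units are migrated and both end at 50.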
In summary, the embodiment first acquires data usage information of each placement group in the distributed storage system and determines a data usage valley period based on the data usage information, where the data usage valley period reflects the period in which the data usage amount of the distributed storage system is lowest. It then acquires the data storage amount of each placement group and determines a data receiving placement group and a data release placement group based on the data storage amounts. Finally, based on the data usage valley period, it migrates a preset data amount from the data release placement group to the data receiving placement group. The embodiment thus analyzes the data usage of each placement group and migrates data from placement groups that store more data to placement groups that store less, so that the data storage amounts of the placement groups do not differ greatly and data balance among the placement groups is achieved.
Based on the above embodiment, the present invention further provides a data balancing device for a distributed storage system. As shown in fig. 2, the device includes a data usage analysis module 10, a placement group determination module 20, and a data migration module 30. Specifically, the data usage analysis module 10 is configured to acquire data usage information of each placement group in the distributed storage system and determine a data usage valley period based on the data usage information, where the data usage valley period reflects the period in which the data usage amount of the distributed storage system is lowest. The placement group determination module 20 is configured to acquire the data storage amount of each placement group and determine a data receiving placement group and a data release placement group based on the data storage amounts. The data migration module 30 is configured to migrate a preset data amount from the data release placement group to the data receiving placement group based on the data usage valley period.
In one implementation, the data usage analysis module 10 includes:
a usage amount acquisition unit, configured to record data usage information of all the placement groups over one period, with one day as the period, and to determine the data usage amount of all the placement groups at each moment based on the data usage information;
and a usage amount analysis unit, configured to compare the data usage amount at each moment with a preset usage threshold amount and to determine the data usage low-valley period.
In one implementation, the usage analysis unit includes:
a target moment determining subunit, configured to compare the data usage amount at each moment with the usage threshold amount and to screen out a plurality of target moments whose data usage amount is smaller than the usage threshold amount;
a candidate time period determining subunit, configured to splice adjacent moments among the plurality of target moments in chronological order to obtain a plurality of candidate time periods;
and a low-valley period determining subunit, configured to obtain the data usage amount corresponding to each candidate time period and to take the candidate time period with the smallest data usage amount as the data usage low-valley period.
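The three subunits above can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the hourly sampling, the function name `find_low_valley_period`, and the choice to rank candidate periods by their total usage are assumptions made for the example, and periods that wrap around midnight are not handled.

```python
from typing import Dict, List, Tuple


def find_low_valley_period(usage_by_hour: Dict[int, float],
                           usage_threshold: float) -> Tuple[int, int]:
    """Return (first_hour, last_hour), inclusive, of the low-valley period.

    usage_by_hour maps an hour of the day (assumed sampling granularity)
    to the data usage amount of all groups at that hour.
    """
    # Step 1: screen out the target moments whose usage is below the threshold.
    targets = sorted(h for h, u in usage_by_hour.items() if u < usage_threshold)
    if not targets:
        raise ValueError("no moment falls below the usage threshold")

    # Step 2: splice adjacent target moments into candidate periods.
    # An isolated moment simply forms a one-element period.
    periods: List[List[int]] = [[targets[0]]]
    for hour in targets[1:]:
        if hour == periods[-1][-1] + 1:
            periods[-1].append(hour)
        else:
            periods.append([hour])

    # Step 3: take the candidate period with the smallest usage.
    # Assumption: "usage of a period" is the sum over its moments.
    best = min(periods, key=lambda p: sum(usage_by_hour[h] for h in p))
    return best[0], best[-1]
```

For example, with a threshold of 10 and hourly usage `{0: 5, 1: 3, 2: 4, 3: 50, 4: 60, 5: 9, 6: 8, 7: 70}`, the candidate periods are hours 0 to 2 (total 12) and hours 5 to 6 (total 17), so the sketch selects hours 0 to 2.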
In one implementation, the placement group determination module 20 includes:
a storage amount threshold acquisition unit, configured to obtain a preset first storage amount threshold and a preset second storage amount threshold, wherein the first storage amount threshold is larger than the second storage amount threshold;
a storage amount comparison unit, configured to compare the data storage amount corresponding to each placement group with the first storage amount threshold and the second storage amount threshold;
a first candidate placement group determining unit, configured to determine a plurality of placement groups whose data storage amount is larger than the first storage amount threshold, to obtain a first candidate placement group;
a second candidate placement group determining unit, configured to determine a plurality of placement groups whose data storage amount is smaller than the second storage amount threshold, to obtain a second candidate placement group;
and a placement group screening unit, configured to determine the data receiving placement group and the data release placement group based on the first candidate placement group and the second candidate placement group.
In one implementation, the placement group screening unit includes:
a data release placement group determining subunit, configured to sort the plurality of placement groups in the first candidate placement group from highest to lowest data storage amount, obtain the placement group with the highest data storage amount, and take it as the data release placement group;
and a data receiving placement group determining subunit, configured to sort the plurality of placement groups in the second candidate placement group from lowest to highest data storage amount, obtain the placement group with the lowest data storage amount, and take it as the data receiving placement group.
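The threshold comparison and screening described above can be illustrated with a short Python sketch. The function name `select_groups`, the group identifiers, and the decision to return `None` when either candidate set is empty are assumptions for illustration only. The text describes sorting each candidate set and taking its first element; the sketch uses `max` and `min`, which select the same groups.

```python
from typing import Dict, Optional, Tuple


def select_groups(storage: Dict[str, float],
                  first_threshold: float,
                  second_threshold: float) -> Optional[Tuple[str, str]]:
    """Return (release_group, receive_group), or None if no pair exists.

    storage maps a group identifier to its data storage amount.
    """
    if first_threshold <= second_threshold:
        raise ValueError("first threshold must exceed second threshold")

    # First candidate set: groups storing more than the first threshold.
    first_candidates = {g: s for g, s in storage.items() if s > first_threshold}
    # Second candidate set: groups storing less than the second threshold.
    second_candidates = {g: s for g, s in storage.items() if s < second_threshold}
    if not first_candidates or not second_candidates:
        return None  # assumed behavior: nothing to balance

    # Fullest over-threshold group releases data,
    # emptiest under-threshold group receives it.
    release = max(first_candidates, key=lambda g: first_candidates[g])
    receive = min(second_candidates, key=lambda g: second_candidates[g])
    return release, receive
```

Groups whose storage amount lies between the two thresholds belong to neither candidate set and are left untouched, which is why the first threshold must be larger than the second.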
In one implementation, the data migration module 30 includes:
a time period analysis unit, configured to obtain the intersection of the data usage low-valley period and a preset configuration time period, to obtain a data migration time period;
and a data migration execution unit, configured to migrate the preset data amount from the data release placement group to the data receiving placement group within the data migration time period.
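A minimal Python sketch of the intersection step follows. The representation of a time period as a `(start, end)` pair of `datetime.time` values and the function name `migration_window` are assumptions for illustration, and the sketch assumes that neither period wraps around midnight.

```python
from datetime import time
from typing import Optional, Tuple

Window = Tuple[time, time]  # (start, end) within a single day


def migration_window(low_valley: Window, configured: Window) -> Optional[Window]:
    """Intersect the low-valley period with the preset configured period."""
    # The overlap starts at the later start and ends at the earlier end.
    start = max(low_valley[0], configured[0])
    end = min(low_valley[1], configured[1])
    # An empty overlap means migration should not run in this cycle.
    return (start, end) if start < end else None
```

For instance, a low-valley period of 01:00 to 05:00 and a configured period of 02:00 to 06:00 yield a data migration time period of 02:00 to 05:00.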
In one implementation, the data migration execution unit includes:
a difference determining subunit, configured to determine a data amount difference between the data release placement group and the data receiving placement group;
a preset data amount determining subunit, configured to take half of the data amount difference as the preset data amount;
and a data migration subunit, configured to migrate the preset data amount from the data release placement group to the data receiving placement group within the data migration time period.
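The arithmetic of these subunits is simple enough to show directly. In the Python sketch below, the function names and the use of plain floats for data amounts are assumptions for illustration. Moving half of the difference leaves the two groups holding equal amounts.

```python
from typing import Tuple


def preset_migration_amount(release_amount: float, receive_amount: float) -> float:
    """Half of the storage difference between the two groups."""
    difference = release_amount - receive_amount
    if difference <= 0:
        raise ValueError("release group must hold more data than receive group")
    return difference / 2


def migrate(release_amount: float, receive_amount: float) -> Tuple[float, float]:
    """Return the storage amounts of both groups after the migration."""
    amount = preset_migration_amount(release_amount, receive_amount)
    return release_amount - amount, receive_amount + amount
```

With a release group holding 900 units and a receive group holding 100 units, the difference is 800, the preset data amount is 400, and both groups hold 500 units after migration.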
The working principle of each module in the data balancing device of the distributed storage system in this embodiment is the same as that of each step in the above method embodiment, and will not be described here again.
Based on the above embodiment, the present invention also provides a terminal, and a schematic block diagram of the terminal may be shown in fig. 3. The terminal may include one or more processors 100 (only one shown in fig. 3), a memory 101, and a computer program 102 stored in the memory 101 and executable on the one or more processors 100, such as a data balancing program of a distributed storage system. The one or more processors 100, when executing computer program 102, may implement the various steps in a data balancing method embodiment of a distributed storage system. Alternatively, the functions of the modules/units in the data balancing apparatus embodiment of the distributed storage system may be implemented by one or more processors 100 when executing computer program 102, which is not limited herein.
In one embodiment, the processor 100 may be a central processing unit (CPU), but may also be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In one embodiment, the memory 101 may be an internal storage unit of the electronic device, such as a hard disk or an internal memory of the electronic device. The memory 101 may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device. Further, the memory 101 may include both an internal storage unit and an external storage device of the electronic device. The memory 101 is used to store the computer program and other programs and data required by the terminal. The memory 101 may also be used to temporarily store data that has been output or is to be output.
It will be appreciated by those skilled in the art that the functional block diagram shown in fig. 3 is merely a block diagram of some of the structures associated with the solution of the present invention and does not limit the terminal to which the solution may be applied; a specific terminal may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program, which may be stored on a non-transitory computer readable storage medium and which, when executed, may carry out the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (4)
1. A data balancing method for a distributed storage system, the method comprising:
acquiring data usage information of each placement group in a distributed storage system, and determining a data usage low-valley period based on the data usage information, wherein the data usage low-valley period reflects the period in which the data usage of the distributed storage system is lowest;
acquiring the data storage amount corresponding to each placement group, and determining a data receiving placement group and a data release placement group based on the data storage amounts;
migrating a preset data amount from the data release placement group to the data receiving placement group based on the data usage low-valley period;
wherein the acquiring data usage information of each placement group in the distributed storage system and determining a data usage low-valley period based on the data usage information comprises:
recording data usage information of all the placement groups over one period, with one day as the period, and determining the data usage amount of all the placement groups at each moment based on the data usage information;
comparing the data usage amount at each moment with a preset usage threshold amount, and determining the data usage low-valley period;
wherein the comparing the data usage amount at each moment with a preset usage threshold amount and determining the data usage low-valley period comprises:
comparing the data usage amount at each moment with the usage threshold amount, and screening out a plurality of target moments whose data usage amount is smaller than the usage threshold amount;
splicing adjacent moments among the plurality of target moments in chronological order to obtain a plurality of candidate time periods, wherein a target moment that is isolated, having no adjacent target moment, forms a candidate time period by itself;
acquiring the data usage amount corresponding to each candidate time period, and taking the candidate time period with the smallest data usage amount as the data usage low-valley period;
wherein the acquiring the data storage amount corresponding to each placement group and determining the data receiving placement group and the data release placement group based on the data storage amounts comprises:
acquiring a preset first storage amount threshold and a preset second storage amount threshold, wherein the first storage amount threshold is larger than the second storage amount threshold;
comparing the data storage amount corresponding to each placement group with the first storage amount threshold and the second storage amount threshold;
determining a plurality of placement groups whose data storage amount is larger than the first storage amount threshold, to obtain a first candidate placement group;
determining a plurality of placement groups whose data storage amount is smaller than the second storage amount threshold, to obtain a second candidate placement group;
determining the data receiving placement group and the data release placement group based on the first candidate placement group and the second candidate placement group;
wherein the determining the data receiving placement group and the data release placement group based on the first candidate placement group and the second candidate placement group comprises:
sorting the plurality of placement groups in the first candidate placement group from highest to lowest data storage amount to obtain the placement group with the highest data storage amount, and taking it as the data release placement group;
sorting the plurality of placement groups in the second candidate placement group from lowest to highest data storage amount to obtain the placement group with the lowest data storage amount, and taking it as the data receiving placement group;
or,
wherein the determining the data receiving placement group and the data release placement group based on the first candidate placement group and the second candidate placement group comprises:
taking each placement group in the first candidate placement group as a data release placement group;
taking each placement group in the second candidate placement group as a data receiving placement group;
when data migration is carried out, forming a data migration group from the placement group with the highest data storage amount in the first candidate placement group and the placement group with the lowest data storage amount in the second candidate placement group, and carrying out data migration between the two;
forming a further data migration group from the placement group with the second-highest data storage amount in the first candidate placement group and the placement group with the second-lowest data storage amount in the second candidate placement group, carrying out data migration between the two, and so on, so as to form a plurality of data migration groups and carry out data migration;
wherein the migrating a preset data amount from the data release placement group to the data receiving placement group based on the data usage low-valley period comprises:
acquiring the intersection of the data usage low-valley period and a preset configuration time period to obtain a data migration time period;
migrating the preset data amount from the data release placement group to the data receiving placement group within the data migration time period;
wherein the migrating the preset data amount from the data release placement group to the data receiving placement group within the data migration time period comprises:
determining a data amount difference between the data release placement group and the data receiving placement group;
taking half of the data amount difference as the preset data amount;
and, within the data migration time period, migrating the preset data amount from the data release placement group to the data receiving placement group.
2. A data balancing apparatus for a distributed storage system, the apparatus comprising:
a data usage analysis module, configured to acquire data usage information of each placement group in the distributed storage system and to determine a data usage low-valley period based on the data usage information, wherein the data usage low-valley period reflects the period in which the data usage of the distributed storage system is lowest;
a placement group determination module, configured to acquire the data storage amount corresponding to each placement group and to determine a data receiving placement group and a data release placement group based on the data storage amounts;
a data migration module, configured to migrate a preset data amount from the data release placement group to the data receiving placement group based on the data usage low-valley period;
the data usage analysis module includes:
a usage amount acquisition unit, configured to record data usage information of all the placement groups over one period, with one day as the period, and to determine the data usage amount of all the placement groups at each moment based on the data usage information;
a usage amount analysis unit, configured to compare the data usage amount at each moment with a preset usage threshold amount and to determine the data usage low-valley period;
the usage amount analysis unit includes:
a target moment determining subunit, configured to compare the data usage amount at each moment with the usage threshold amount and to screen out a plurality of target moments whose data usage amount is smaller than the usage threshold amount;
a candidate time period determining subunit, configured to splice adjacent moments among the plurality of target moments in chronological order to obtain a plurality of candidate time periods, wherein a target moment that is isolated, having no adjacent target moment, forms a candidate time period by itself;
a low-valley period determining subunit, configured to acquire the data usage amount corresponding to each candidate time period and to take the candidate time period with the smallest data usage amount as the data usage low-valley period;
the placement group determination module comprises:
a storage amount threshold acquisition unit, configured to acquire a preset first storage amount threshold and a preset second storage amount threshold, wherein the first storage amount threshold is larger than the second storage amount threshold;
a storage amount comparison unit, configured to compare the data storage amount corresponding to each placement group with the first storage amount threshold and the second storage amount threshold;
a first candidate placement group determining unit, configured to determine a plurality of placement groups whose data storage amount is larger than the first storage amount threshold, to obtain a first candidate placement group;
a second candidate placement group determining unit, configured to determine a plurality of placement groups whose data storage amount is smaller than the second storage amount threshold, to obtain a second candidate placement group;
a placement group screening unit, configured to determine the data receiving placement group and the data release placement group based on the first candidate placement group and the second candidate placement group;
the placement group screening unit comprises:
a data release placement group determining subunit, configured to sort the plurality of placement groups in the first candidate placement group from highest to lowest data storage amount, obtain the placement group with the highest data storage amount, and take it as the data release placement group;
a data receiving placement group determining subunit, configured to sort the plurality of placement groups in the second candidate placement group from lowest to highest data storage amount, obtain the placement group with the lowest data storage amount, and take it as the data receiving placement group;
or,
the placement group screening unit is configured for:
taking each placement group in the first candidate placement group as a data release placement group;
taking each placement group in the second candidate placement group as a data receiving placement group;
when data migration is carried out, forming a data migration group from the placement group with the highest data storage amount in the first candidate placement group and the placement group with the lowest data storage amount in the second candidate placement group, and carrying out data migration between the two;
forming a further data migration group from the placement group with the second-highest data storage amount in the first candidate placement group and the placement group with the second-lowest data storage amount in the second candidate placement group, carrying out data migration between the two, and so on, so as to form a plurality of data migration groups and carry out data migration;
the data migration module comprises:
a time period analysis unit, configured to acquire the intersection of the data usage low-valley period and a preset configuration time period to obtain a data migration time period;
a data migration execution unit, configured to migrate the preset data amount from the data release placement group to the data receiving placement group within the data migration time period;
the data migration execution unit includes:
a difference determining subunit, configured to determine a data amount difference between the data release placement group and the data receiving placement group;
a preset data amount determining subunit, configured to take half of the data amount difference as the preset data amount;
and a data migration subunit, configured to migrate the preset data amount from the data release placement group to the data receiving placement group within the data migration time period.
3. A terminal comprising a memory, a processor and a data balancing program of a distributed storage system stored in the memory and operable on the processor, the processor implementing the steps of the data balancing method of the distributed storage system as claimed in claim 1 when executing the data balancing program of the distributed storage system.
4. A computer readable storage medium, wherein a data balancing program of a distributed storage system is stored on the computer readable storage medium, and when the data balancing program of the distributed storage system is executed by a processor, the steps of the data balancing method of the distributed storage system according to claim 1 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311777759.6A CN117453149B (en) | 2023-12-22 | 2023-12-22 | Data balancing method, device, terminal and storage medium of distributed storage system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311777759.6A CN117453149B (en) | 2023-12-22 | 2023-12-22 | Data balancing method, device, terminal and storage medium of distributed storage system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117453149A (en) | 2024-01-26 |
CN117453149B (en) | 2024-04-09 |
Family
ID=89580224
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202311777759.6A | Active | CN117453149B (en) | 2023-12-22 | 2023-12-22 | Data balancing method, device, terminal and storage medium of distributed storage system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117453149B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108023967A (en) * | 2017-12-20 | 2018-05-11 | 联想(北京)有限公司 | A kind of management equipment in data balancing method, apparatus and distributed memory system |
CN108197229A (en) * | 2017-12-29 | 2018-06-22 | 北京搜狐新媒体信息技术有限公司 | The balance method and system of a kind of data in magnetic disk |
CN111290699A (en) * | 2018-12-07 | 2020-06-16 | 杭州海康威视系统技术有限公司 | Data migration method, device and system |
CN111611055A (en) * | 2020-05-27 | 2020-09-01 | 上海有孚智数云创数字科技有限公司 | Virtual equipment optimal idle time migration method and device and readable storage medium |
CN115203177A (en) * | 2022-09-16 | 2022-10-18 | 北京智阅网络科技有限公司 | Distributed data storage system and storage method |
Also Published As
Publication number | Publication date |
---|---|
CN117453149A (en) | 2024-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108846749B (en) | Partitioned transaction execution system and method based on block chain technology | |
CN106407207B (en) | Real-time newly-added data updating method and device | |
US20200210815A1 (en) | Output method and apparatus for multiple neural network, server and computer readable storage medium | |
US10613992B2 (en) | Systems and methods for remote procedure call | |
CN111596927B (en) | Service deployment method and device and electronic equipment | |
CN112133357B (en) | eMMC test method and device | |
CN110543279B (en) | Data storage and processing method, device and system | |
US20240272998A1 (en) | Method and apparatus for improving quality of service of SSD, and computer device and storage medium | |
CN109002348B (en) | Load balancing method and device in virtualization system | |
CN110764930A (en) | Request or response processing method and device based on message mode | |
US20210263867A1 (en) | Memory protocol with command priority | |
CN117453149B (en) | Data balancing method, device, terminal and storage medium of distributed storage system | |
CN110795215B (en) | Data processing method, computer equipment and storage medium | |
CN117762892B (en) | Data distribution control method, device, terminal and medium of distributed storage system | |
CN109298974B (en) | System control method, device, computer and computer readable storage medium | |
CN111459937A (en) | Data table association method, device, server and storage medium | |
CN116594734A (en) | Container migration method and device, storage medium and electronic equipment | |
US20180373746A1 (en) | Table partition configuration method, apparatus and system for database system | |
CN117453148B (en) | Data balancing method, device, terminal and storage medium based on neural network | |
CN112216333B (en) | Chip testing method and device | |
CN112286704B (en) | Processing method and device of delay task, computer equipment and storage medium | |
CN115358331A (en) | Device type identification method and device, computer readable storage medium and terminal | |
CN114595063A (en) | Method and device for determining access frequency of data | |
US11561934B2 (en) | Data storage method and method for executing an application with reduced access time to the stored data | |
CN110727701A (en) | Application automatic allocation method, device, terminal and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||