CN111475108B - Distributed storage method, computer equipment and computer readable storage medium - Google Patents

Info

Publication number
CN111475108B
Authority
CN
China
Prior art keywords
file
fragmented
files
copy
copies
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010199725.3A
Other languages
Chinese (zh)
Other versions
CN111475108A (en)
Inventor
郑映锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Saiante Technology Service Co Ltd
Original Assignee
Shenzhen Saiante Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Saiante Technology Service Co Ltd
Priority to CN202010199725.3A
Publication of CN111475108A
Application granted
Publication of CN111475108B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application belongs to the technical field of computer applications and provides a distributed storage method, a computer device, and a computer readable storage medium. The method comprises: obtaining the access amount, within a preset period, of the fragmented files belonging to the same original file, where the fragmented files are obtained by fragmenting the original file; calculating the file heat of the original file according to the access amount, and predicting the copy demand of each fragmented file according to the file heat; and adjusting the number of copies of each fragmented file according to the copy demand. Because the file heat of the original file is calculated from the access amounts of all of its fragmented files, and the number of copies of each fragmented file is adjusted according to its predicted copy number and its real-time copy number, the load on the storage nodes is reduced while data reliability is guaranteed, and load balancing is achieved.

Description

Distributed storage method, computer equipment and computer readable storage medium
Technical Field
The present application relates to the field of computer application technologies, and in particular, to a distributed storage method, a computer device, and a computer readable storage medium.
Background
Because it can be deployed flexibly, P2P-based network storage technology is developing rapidly. Popular projects abroad, such as the InterPlanetary File System (IPFS), are distributed storage services built on P2P technology that are approaching commercial use. Because P2P nodes are unreliable and behave unpredictably, data integrity can be ensured only by storing redundant copies of the data, so that a data processing terminal can still obtain the complete data given a certain level of redundancy. In the prior art, data integrity is ensured by repeatedly processing and storing a large number of data copies, but such highly redundant data also creates heavy storage pressure.
Disclosure of Invention
The embodiment of the application provides a distributed storage method, computer equipment and a computer readable storage medium, which can solve the problems of high redundancy and high storage pressure in the process of processing and storing data copies in the prior art.
In a first aspect, an embodiment of the present application provides a distributed storage method, including:
obtaining the access quantity of the fragmented files belonging to the same original file in a preset period; the fragmented files are obtained by performing fragmentation processing on the original files;
calculating the file heat of the original file according to the access quantity, and predicting the copy demand quantity of each fragmented file according to the file heat;
and adjusting the number of copies of the fragmented files according to the copy demand.
It can be understood that the file heat of the original file is calculated from the access amounts of all of its fragmented files, the copy number of each fragmented file is predicted from the file heat, and the number of copies of each fragmented file is then adjusted according to its predicted copy number and its real-time copy number. This reduces the load on the storage nodes while ensuring data reliability, and thereby achieves load balancing.
In a second aspect, embodiments of the present application provide a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
obtaining the access quantity of the fragmented files belonging to the same original file in a preset period; the fragmented files are obtained by performing fragmentation processing on the original files;
calculating the file heat of the original file according to the access quantity, and predicting the copy demand quantity of each fragmented file according to the file heat;
and adjusting the number of copies of the fragmented files according to the copy demand.
In a third aspect, an embodiment of the present application provides a computer apparatus, including:
the acquisition unit is used for acquiring the access quantity of the fragmented files belonging to the same original file in a preset period; the fragmented files are obtained by performing fragmentation processing on the original files;
the prediction unit is used for calculating the file heat of the original file according to the access amount and predicting the copy demand of each fragmented file according to the file heat;
and the adjusting unit is used for adjusting the number of copies of the fragmented files according to the copy demand.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect described above.
In a fifth aspect, an embodiment of the present application provides a computer program product, which, when run on a terminal device, causes the terminal device to perform the distributed storage method according to any of the first aspects above.
It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not described here again.
Compared with the prior art, the embodiments of the application have the following beneficial effects. The access amount, within a preset period, of the fragmented files belonging to the same original file is obtained, where the fragmented files are obtained by fragmenting the original file; the file heat of the original file is calculated according to the access amount, and the copy demand of each fragmented file is predicted according to the file heat; and the number of copies of each fragmented file is adjusted according to the copy demand. Because the number of copies of each fragmented file is adjusted according to its predicted copy number and its real-time copy number, the load on the storage nodes is reduced while data reliability is guaranteed, and load balancing is achieved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a distributed storage method according to a first embodiment of the present application;
FIG. 2 is a flowchart of a distributed storage method according to a second embodiment of the present application;
FIG. 3 is a schematic diagram of a computer device according to a third embodiment of the present application;
FIG. 4 is a schematic diagram of a computer device according to a fourth embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted as "when", "once", "in response to a determination", or "in response to detection", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determination", "in response to determination", "upon detection of [the described condition or event]", or "in response to detection of [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Referring to fig. 1, fig. 1 is a flowchart of a distributed storage method according to a first embodiment of the present application. The execution body of the distributed storage method in this embodiment is a device with a distributed storage function, including but not limited to a computer, a server, a tablet computer, a terminal, or the like. The distributed storage method as shown in the figure may include the steps of:
S101: obtaining the access quantity of the fragmented files belonging to the same original file in a preset period; the fragmented files are obtained by performing fragmentation processing on the original files.
A distributed storage system stores data across multiple independent devices. A traditional network storage system uses a centralized storage server to hold all data; that server becomes the bottleneck of system performance and the focal point of reliability and security risk, and it cannot meet the needs of large-scale storage applications. A distributed network storage system uses a scalable architecture in which multiple storage servers share the storage load and location servers locate the stored information, which improves reliability, availability, and access efficiency and makes the system easy to expand. In a distributed system, storage nodes are prone to faults and failures, and a failed node can lose the data it stores, which directly affects the reliability of the system and of the data. Distributed systems therefore often use a full-copy redundancy strategy, that is, they trade storage space for higher fault tolerance. The full-copy data redundancy strategy is straightforward to implement: a file is fully copied into several copy files, and the copies are deployed on servers for storage, which backs up the file. When the file is damaged and must be restored, the original file data can be recovered from any one of the copies. The strategy is simple to implement, effectively reduces access latency, and improves the availability of the stored data. The more copies a file has, the more reliable its data, but the more physical space the servers consume and the higher the storage cost. A reasonable copy number and copy placement strategy make the full-copy data redundancy strategy more efficient; otherwise the storage cost grows and the overall performance of the storage system suffers.
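The trade-off between reliability and storage cost under full-copy redundancy can be illustrated with a small calculation. The sketch below is illustrative only and is not part of the described method: it assumes that storage nodes fail independently with the same probability, which the description does not state, and the function names are hypothetical.

```python
def loss_probability(node_failure_prob: float, copies: int) -> float:
    """Probability that every copy is lost.

    Assumption (not from the description): each copy sits on a different
    node and nodes fail independently with the same probability.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return node_failure_prob ** copies


def storage_cost(file_size_bytes: int, copies: int) -> int:
    """Total bytes consumed when a file is fully copied `copies` times."""
    return file_size_bytes * copies


# Each extra copy multiplies the loss probability by the node failure
# probability, while the storage cost grows linearly with the copy number.
for n in (1, 2, 3):
    print(n, loss_probability(0.1, n), storage_cost(100, n))
```

Under these assumptions, reliability improves geometrically with the copy number while cost grows only linearly, which is why a moderate copy number is usually sufficient for rarely accessed data.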
Setting the copy number is the key to a copy strategy. A higher copy number improves the reliability and availability of the data file and the storage system and helps balance the system load, but the overhead of storage, transmission, and copy maintenance increases with it. A lower copy number reduces the reliability and availability of the data file and the storage system, and when a node fails the data is easily lost and cannot be recovered. Placing file copies properly in the storage system reduces access load by spreading requests to idle storage servers; improper placement can cause excessive local load. Optionally, to simplify copy management, the number and storage locations of the copies may be set in advance and kept unchanged even when the storage state changes. This makes later copy management easier, but because file access changes over time and the storage nodes holding a data file also change, a fixed configuration may cause file access failures and wasted storage space.
In this embodiment, the fragmented files are obtained by fragmenting an original file. Each fragmented file is copied to obtain at least two fragmented file copies, and the copies are stored on different storage nodes so that the corresponding fragmented files can be retrieved when the original file is requested. The access amount, within the preset period, of the fragmented files belonging to the same original file is obtained first. It should be noted that the fragmented files here belong to the same original file, and the access amount obtained may be for one fragmented file or for at least two fragmented files. The access amount in this embodiment may be the number of times all fragmented file copies of one fragmented file are called: the calls to the fragmented file copies on all storage nodes are counted, and the total is used as the access amount of that fragmented file within the preset period.
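The counting described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment; the log layout (one fragment identifier recorded per call on each node) and all names are assumptions.

```python
from collections import Counter


def access_amounts(node_logs):
    """Sum the calls to every copy of each fragmented file across all nodes.

    `node_logs` is a hypothetical mapping of node id -> list of fragment ids,
    with one entry for each call served by that node during the preset period.
    """
    totals = Counter()
    for calls in node_logs.values():
        totals.update(calls)
    return totals


# Fragment "a" has copies on node-1 and node-2; both contribute to its total.
logs = {"node-1": ["a", "a", "b"], "node-2": ["a"], "node-3": ["b", "c"]}
per_fragment = access_amounts(logs)
total_for_original = sum(per_fragment.values())
```

Here `per_fragment` holds the access amount of each fragmented file, and `total_for_original` is the access amount of the original file to which they all belong.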
S102: and calculating the file heat of the original file according to the access quantity, and predicting the copy demand quantity of each fragmented file according to the file heat.
Because the number of copies and the storage locations of data files can be adjusted as the storage state of the system changes, this embodiment calculates the file heat from the access amount and uses it to determine the copy demand of each fragmented file. For a data file with low heat, setting the copy number too high wastes a great deal of storage space, while setting it too low significantly reduces the reliability of the file. To address this, the number of copies of low-heat data files is reduced, which ensures the reliability of the data files on the one hand and saves storage space on the other. The access amounts of all fragmented files of the original file are obtained, the file heat of the original file is calculated from the access condition of each of its fragmented files, and the copy demand of each fragmented file is predicted from the file heat.
S103: and adjusting the number of copies of the fragmented files according to the copy demand.
In this embodiment, the copy demand is proportional to the file heat of the fragmented file: the higher the file heat of a fragmented file, the higher its copy demand, so that heavier call demand for that fragmented file can be met by a larger number of fragmented file copies. After the copy demand is determined, the number of copies of the fragmented file is adjusted according to the copy demand.
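One way to turn a heat value into a copy demand that grows with heat is sketched below. The scaling factor, the upper bound, and the function name are hypothetical and are not taken from the description; the lower bound of two reflects the statement that each fragmented file has at least two copies.

```python
import math


def copy_demand(heat: float,
                copies_per_unit_heat: float = 10.0,  # hypothetical scaling factor
                min_copies: int = 2,                  # "at least two" copies per fragment
                max_copies: int = 8) -> int:          # hypothetical upper bound
    """Map file heat to a copy demand that grows in proportion to heat.

    The result is rounded up and clamped to [min_copies, max_copies] so that
    cold fragments keep a minimum level of redundancy.
    """
    raw = math.ceil(heat * copies_per_unit_heat)
    return max(min_copies, min(max_copies, raw))
```

The clamping is a design choice of this sketch: without a floor, a fragmented file that is never accessed would lose all redundancy, which would conflict with the stated goal of preserving data reliability.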
Further, step S103 includes S1031 to S1033:
S1031: and counting the number of copies of the fragmented file stored in all preset storage nodes corresponding to the fragmented file.
In distributed storage, each storage node both stores fragmented file copies and serves calls from user terminals, sending the requested fragmented file copy to the calling terminal. A storage node may also delete or modify a stored fragmented file copy for reasons such as storage space or data reading. Furthermore, a storage node may be attacked, and the data stored on it may be deleted or modified. In these cases, the total number of fragmented file copies stored across all storage nodes in the system may differ from the number of copies originally generated, so the number of fragmented file copies on the storage nodes needs to be counted in real time.
It should be noted that, in this embodiment, at least one storage node is preset for each fragmented file to store the fragmented file copies of that fragmented file. This fixed storage mode makes the fragmented file copies traceable and improves the reliability of file storage. One storage node may also store fragmented file copies of multiple fragmented files, which keeps the storage utilization of the node high.
It should be noted that the above operations are executed cyclically, once per preset period, and the number of copies is adjusted in each period. If the monitoring program detects no file access operations, the policy program is not executed. Load balancing means managing copies according to the load of the system and storing copies on the more lightly loaded storage servers. If a file has only one copy, every read and write request for it goes to the server holding that copy, which causes serious congestion and a load the server can hardly bear. When the file is copied into multiple copies and the copies are stored on multiple storage nodes, the request load is spread across those nodes, which reduces the system load and improves the availability of the system. This addresses both the unbalanced node load caused by heavily accessed data and the wasted storage space caused by rarely accessed data.
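The per-period behaviour, including skipping the policy when no access is detected, might be sketched as follows. The function and parameter names are hypothetical, and the `adjust` callback stands in for whatever copy adjustment policy is in use.

```python
def run_period(access_counts, adjust):
    """Run one preset period of the copy policy.

    `access_counts` maps fragment id -> number of accesses in this period.
    `adjust` is a hypothetical callback that performs the copy adjustment.
    Returns True if the policy ran, False if it was skipped.
    """
    if sum(access_counts.values()) == 0:
        # No file access detected in this period: the policy is not executed.
        return False
    adjust(access_counts)
    return True
```

A scheduler would call `run_period` once per preset period with the access counts collected for that period.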
Further, step S1031 includes: detecting whether a fragmented file copy stored on a preset storage node corresponding to the fragmented file has been lost; and, if a fragmented file copy stored on the storage node has been lost, counting the number of copies of the fragmented file on all the storage nodes.
Specifically, in this embodiment, the node identifiers of all storage nodes that store each fragmented file may be recorded, and periodic inspections may be performed using those identifiers to verify that the fragmented file copies are stored safely and completely on the storage nodes. When a fragmented file copy stored on a storage node is found to be lost, the number of fragmented file copies on all the storage nodes is counted.
Furthermore, the integrity of the fragmented file copies on the storage nodes may be checked, and fragmented file copies that fail the integrity check are not counted in the number of copies, which keeps the counted number of copies accurate.
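A possible form of this count is sketched below. The description does not say how integrity is checked; comparing a SHA-256 digest against a recorded value is an assumption made for illustration, as are the data layout and names.

```python
import hashlib
from typing import Dict, Optional


def count_valid_copies(expected_sha256: str,
                       copies_by_node: Dict[str, Optional[bytes]]) -> int:
    """Count the copies that are present and pass the integrity check.

    `copies_by_node` is a hypothetical mapping of node id -> stored bytes,
    with None standing for a lost copy. The digest comparison is an assumed
    integrity check, not one specified by the description.
    """
    valid = 0
    for data in copies_by_node.values():
        if data is None:
            continue  # lost copy: not counted
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            valid += 1  # present and intact
    return valid


fragment = b"fragment-bytes"
digest = hashlib.sha256(fragment).hexdigest()
stored = {"node-1": fragment, "node-2": None, "node-3": b"corrupted"}
valid_count = count_valid_copies(digest, stored)
```

In this example only the copy on `node-1` is counted, because the copy on `node-2` is lost and the copy on `node-3` does not match the recorded digest.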
S1032: and if the number of copies is smaller than the copy demand, copying the fragmented file to generate fragmented file copies.
If the number of copies is smaller than the copy demand, the number of copies of the fragmented file is increased; that is, the fragmented file is copied to generate additional fragmented file copies. In this way, more copies of fragmented files in high demand are stored on the storage nodes, which keeps calls to those fragmented files efficient, while fewer copies of fragmented files in low demand are stored, which reduces redundant data on the storage nodes and improves both the storage efficiency of fragmented files and the space utilization of the storage nodes.
S1033: and identifying a storage node which does not store the copy of the fragmented file as a target storage node in all storage nodes corresponding to the fragmented file and preset, and sending the copy of the fragmented file to the target storage node.
In this embodiment, at least one storage node is preset for each fragmented file. When the number of copies is smaller than the copy demand, the number of copies of the fragmented file is increased: each preset storage node corresponding to the fragmented file is checked for a stored fragmented file copy, any node that does not hold one is identified as a target storage node, and a fragmented file copy is sent to that target storage node.
Alternatively, the real-time data storage amount of every current storage node may be obtained in real time; storage nodes whose real-time data storage amount is smaller than a preset storage threshold are determined to be target nodes, and the fragmented file copies are sent to those target nodes for storage.
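The selection of target storage nodes among the preset nodes can be sketched as follows. The function and parameter names are hypothetical, and limiting the result to the number of missing copies is an assumption of this sketch.

```python
def pick_targets(preset_nodes, holders, missing):
    """Choose target storage nodes for new fragmented file copies.

    `preset_nodes`: ordered node ids preset for this fragmented file.
    `holders`: set of node ids that already store a copy.
    `missing`: number of copies still needed (copy demand minus copy number).
    Returns at most `missing` preset nodes that do not yet hold a copy.
    """
    candidates = [node for node in preset_nodes if node not in holders]
    return candidates[:missing]
```

For example, with preset nodes `["n1", "n2", "n3", "n4"]`, holders `{"n1", "n3"}`, and one missing copy, the sketch selects `["n2"]`.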
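The threshold-based alternative might look like the sketch below. The names are hypothetical, and returning the least-loaded nodes first is an added assumption; the description only requires that the storage amount be below the threshold.

```python
def nodes_below_threshold(used_bytes_by_node, threshold_bytes):
    """Return the node ids whose real-time storage amount is below the threshold.

    `used_bytes_by_node` is a hypothetical mapping of node id -> bytes in use.
    Nodes are returned least-loaded first (an assumption of this sketch).
    """
    candidates = (node for node, used in used_bytes_by_node.items()
                  if used < threshold_bytes)
    return sorted(candidates, key=used_bytes_by_node.get)
```

With `{"n1": 900, "n2": 100, "n3": 500}` and a threshold of 600, the sketch returns `["n2", "n3"]`.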
Further, step S103 further includes: and adjusting the number of copies of the fragmented files according to the copy demand and a preset adjustment period.
In this embodiment, an adjustment period is preset, and the number of fragmented file copies in the system is regulated once per adjustment period. This keeps the number of copies in the system balanced: it is neither so small that retrieval of fragmented files is affected nor so large that it places excessive load on the system, so retrieval efficiency and the utilization of file copies improve on the basis of load balancing.
Specifically, after the file copies are adjusted, the result is maintained for one preset period. When the next preset period arrives, the file heat of the original file is calculated from the access condition of each fragmented file in the current period, and the predicted copy number is recalculated. The specific calculation is described in detail in step S2022 of the following embodiment and is not repeated here.
After the predicted copy number is calculated, the copy strategy is adjusted according to it.
Specifically, when the predicted copy number is greater than the number of fragmented file copies currently stored on the storage nodes, the current copy number is increased adaptively, that is, storage nodes are added to hold more file copies, until the current copy number equals the predicted copy number. When the predicted copy number is smaller than the number of fragmented file copies currently stored, the current copy number is reduced adaptively, that is, some of the file copies stored on the storage nodes are deleted, until the current copy number equals the predicted copy number. When the current copy number equals the predicted copy number, the existing storage nodes and copy number are kept.
According to the above scheme, the access amount, within a preset period, of the fragmented files belonging to the same original file is obtained, where the fragmented files are obtained by fragmenting the original file; the file heat of the original file is calculated according to the access amount, and the copy demand of each fragmented file is predicted according to the file heat; and the number of copies of each fragmented file is adjusted according to the copy demand. Because the number of copies of each fragmented file is adjusted according to its predicted copy number and its real-time copy number, the load on the storage nodes is reduced while data reliability is guaranteed, and load balancing is achieved.
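The comparison between the predicted and current copy numbers can be sketched as a small decision function. The sketch assumes that copies are added when the prediction exceeds the current count and removed when it falls below it, which is consistent with step S1032; the function name and the returned action labels are hypothetical.

```python
def plan_adjustment(current: int, predicted: int):
    """Decide how to bring the current copy number to the predicted one.

    Returns a hypothetical (action, count) pair:
      ("add", k)    -> create k more copies on additional storage nodes
      ("remove", k) -> delete k stored copies
      ("keep", 0)   -> leave the storage nodes and copy number unchanged
    """
    if predicted > current:
        return ("add", predicted - current)
    if predicted < current:
        return ("remove", current - predicted)
    return ("keep", 0)
```

After the returned action is carried out, the current copy number equals the predicted copy number in all three cases.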
Referring to fig. 2, fig. 2 is a flowchart of a distributed storage method according to a second embodiment of the present application. The execution body of the distributed storage method in this embodiment is a device with a distributed storage function, including but not limited to a computer, a server, a tablet computer, a terminal, or the like. The distributed storage method as shown in the figure may include the steps of:
S201: obtaining the access quantity of the fragmented files belonging to the same original file in a preset period; the fragmented files are obtained by performing fragmentation processing on the original files.
In this embodiment, the fragmented files are obtained by fragmenting an original file. Each fragmented file is copied to obtain at least two fragmented file copies, and the copies are stored on different storage nodes so that the corresponding fragmented files can be retrieved when the original file is requested. The access amount, within the preset period, of the fragmented files belonging to the same original file is obtained first. It should be noted that the fragmented files here belong to the same original file, and the access amount obtained may be for one fragmented file or for at least two fragmented files. The access amount in this embodiment may be the number of times all fragmented file copies of one fragmented file are called: the calls to the fragmented file copies on all storage nodes are counted, and the total is used as the access amount of that fragmented file within the preset period.
S202: calculating the file heat of the original file according to the access amount, and predicting the copy demand of each fragmented file according to the file heat.
Because the number of copies and the storage locations of data files can be adjusted as the storage state of the system changes, this embodiment calculates the file heat from the access amount and uses it to determine the copy demand of each fragmented file. For a data file with low heat, setting the number of copies too high wastes a large amount of storage space, while setting it too low significantly reduces the reliability of the file. To address this, the number of copies of low-heat data files is reduced to a level that still ensures their reliability while saving storage space. The access amounts of all fragmented files of the original file are obtained, the file heat of the original file is calculated from the access amount of each of its fragmented files, and the copy demand of each fragmented file is predicted from the file heat.
Further, step S202 may include S2021 to S2023:
S2021: calculating the heat of each fragmented file according to the access amount.
In this embodiment, the heat of each fragmented file is calculated according to the following formula:
where $H_{(t,a)}$ denotes the heat of fragmented file a within the preset time t; $H_{(t-1,a)}$ denotes the heat of fragmented file a in the period preceding the preset time t; $m_{(t,a)}$ denotes the access amount of fragmented file a within the preset time t; $n_{(t,A)}$ denotes the total access amount of the original file A within the preset time t; and the values of the historical access coefficients i and j are set according to how much the historical access amount fluctuates;
Specifically, when a file access operation occurs, a timer is started and the access amount of each fragmented file is counted over the preset period. When the preset period ends, the heat of each fragmented file in the system is calculated, where $H_{(t,a)}$ denotes the heat of fragmented file a within the preset time t; $H_{(t-1,a)}$ denotes the heat of fragmented file a in the period preceding the preset time t; $m_{(t,a)}$ denotes the total access amount of fragmented file a within the preset time t; $n_{(t,A)}$ denotes the total access amount of the original file A within the preset time t; and the values of the historical access coefficients i and j are set according to the fluctuation of the historical access amount and satisfy $i > 0$, $j > 0$ and $i + j = 1$. The coefficients i and j can be chosen to suit the user's circumstances. When the value of i is closer to 1, the historical access amount of the file fluctuates strongly, the access amount of the current preset period does not greatly change the heat, and the system weighs the overall access history. When the value of j is closer to 1, the historical access amount of the file fluctuates little, the access frequency of the current preset period can represent that of several preset periods, and the system gives more weight to the most recent accesses.
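A minimal sketch of the heat update in S2021. The weighted form used here, H(t,a) = i·H(t-1,a) + j·m(t,a)/n(t,A), is an assumption inferred from the symbol definitions and the constraint i + j = 1; it is not taken from the formula itself. The function name `fragment_heat` and its parameter names are hypothetical.

```python
def fragment_heat(prev_heat, accesses, total_accesses, i=0.5, j=0.5):
    """Blend the previous heat with the fragment's current access share.

    ASSUMED form: H(t,a) = i * H(t-1,a) + j * m(t,a) / n(t,A).
    prev_heat       H(t-1,a), heat in the preceding period
    accesses        m(t,a), accesses to fragment a in the current period
    total_accesses  n(t,A), accesses to the whole original file A
    """
    if not (i > 0 and j > 0 and abs(i + j - 1.0) < 1e-9):
        raise ValueError("coefficients must satisfy i > 0, j > 0, i + j = 1")
    # Guard against an idle period in which the file was never accessed.
    share = accesses / total_accesses if total_accesses else 0.0
    return i * prev_heat + j * share


# j closer to 1 puts more weight on the most recent period.
recent_biased = fragment_heat(0.2, 30, 100, i=0.4, j=0.6)
history_biased = fragment_heat(0.2, 30, 100, i=0.9, j=0.1)
```

Under this assumed form, a larger i keeps the heat close to its previous value, which matches the description of i as weighting the overall access history.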
S2022: calculating the file heat of the original file according to the heat of each fragmented file.
In this embodiment, the file heat of the original file is calculated by the following formula:
where $H_{(t,avr)}$ denotes the file heat of the original file, and n denotes the number of fragmented files of the original file.
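A minimal sketch of S2022, assuming the file heat is the arithmetic mean of the per-fragment heats. That reading is suggested by the subscript "avr" and by n being the number of fragmented files, but it is an assumption rather than a statement of the formula; the function name `file_heat` is hypothetical.

```python
def file_heat(fragment_heats):
    """Return the ASSUMED file heat: the mean of the per-fragment heats."""
    if not fragment_heats:
        raise ValueError("an original file has at least one fragmented file")
    return sum(fragment_heats) / len(fragment_heats)


# Three fragments of one original file, with their heats for the period.
heat = file_heat([0.2, 0.4, 0.6])
```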
S2023: calculating the copy demand of each fragmented file according to the file heat and the total number of fragmented files of the original file.
In this embodiment, the copy demand of each fragmented file is calculated according to the following formula:
$NUM = \log_2 n \cdot H_{(t,avr)}$
where $H_{(t,avr)}$ denotes the file heat of the original file, and n denotes the number of fragmented files of the original file.
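A minimal sketch of S2023. Two assumptions are made here and are not stated in the text: the formula is read as (log2 n) multiplied by the file heat, and the result is rounded up to an integer with a floor of two copies (the floor follows the earlier statement that each fragmented file has at least two copies). The function name `copy_demand` and the parameter `min_copies` are hypothetical.

```python
import math


def copy_demand(n_fragments, heat, min_copies=2):
    """Return an integer copy demand for each fragmented file.

    ASSUMED reading: NUM = log2(n) * H(t,avr), rounded up, never below
    min_copies.
    """
    if n_fragments < 1:
        raise ValueError("n_fragments must be at least 1")
    raw = math.log2(n_fragments) * heat
    return max(min_copies, math.ceil(raw))


hot = copy_demand(8, 1.5)   # log2(8) * 1.5 = 4.5, rounded up
cold = copy_demand(8, 0.1)  # falls back to the minimum copy count
```

The floor keeps a low-heat file from dropping below the replication level needed for reliability, which is the trade-off the description raises for low-heat data.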
S203: obtaining the real-time copy number of the fragmented file.
In distributed storage, each storage node both stores fragmented file copies and serves calls from using terminals, sending the requested fragmented file copy to the terminal. A storage node may also delete or modify a stored fragmented file copy, for example because of storage space limits or data reads. In addition, a storage node may be attacked, and the data stored on it may be deleted or modified. In these cases the total number of fragmented file copies stored on all storage nodes in the system may differ from the number of copies originally generated, so the number of fragmented file copies on the storage nodes needs to be counted in real time.
S204: identifying the fragmented files whose real-time copy number is larger than the real-time copy demand as fragmented files to be deleted, and determining the storage nodes corresponding to the fragmented files to be deleted.
For fragmented files whose real-time copy number is larger than the real-time copy demand, the number of copies needs to be reduced. Specifically, in this embodiment, such fragmented files are identified as fragmented files to be deleted, and the storage nodes corresponding to them are determined.
Specifically, in this embodiment, the node identifiers of all storage nodes holding each fragmented file copy may be recorded in advance; once a fragmented file is identified as to be deleted, the node identifiers of the storage nodes corresponding to it can be determined from the recorded node identifiers and the real-time copy demand.
S205: deleting the fragmented file stored in the storage node corresponding to the fragmented file to be deleted.
After the node identifiers of all storage nodes corresponding to the fragmented file to be deleted are determined, the fragmented file copies stored on those storage nodes are deleted. In this way, fewer copies are kept for fragmented files with lower demand, redundant data on the storage nodes is reduced, and both the storage efficiency of fragmented files and the space utilization of the storage nodes are improved.
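Steps S203 to S205 can be sketched as a planning function that compares the recorded copy locations with the copy demand. The data shapes and the name `plan_deletions` are hypothetical, and the choice of which nodes lose their copy (here, the most recently recorded ones) is an assumption; the text does not specify a selection rule.

```python
def plan_deletions(replica_nodes, demand):
    """Pick storage nodes whose copy of an over-replicated fragment is removed.

    replica_nodes: dict fragment_id -> list of node ids currently holding a copy
    demand:        dict fragment_id -> copy demand for that fragment
    Returns dict fragment_id -> list of node ids to delete the copy from.
    """
    plan = {}
    for fragment_id, nodes in replica_nodes.items():
        # Fragments without a recorded demand are left untouched.
        excess = len(nodes) - demand.get(fragment_id, len(nodes))
        if excess > 0:
            # ASSUMPTION: drop the most recently recorded copies first.
            plan[fragment_id] = nodes[-excess:]
    return plan


replicas = {"a": ["n1", "n2", "n3", "n4"], "b": ["n1", "n2"]}
plan = plan_deletions(replicas, {"a": 2, "b": 3})
```

Fragment "b" is under-replicated in this example, so it does not appear in the deletion plan.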
In this scheme, the access amount of the fragmented files belonging to the same original file within a preset period is obtained, where the fragmented files are produced by fragmenting the original file; the file heat of the original file is calculated from the access amount, and the copy demand of each fragmented file is predicted from the file heat; the real-time copy number of the fragmented file is obtained; the fragmented files whose real-time copy number is larger than the real-time copy demand are identified as fragmented files to be deleted, and the storage nodes corresponding to them are determined; and the fragmented files stored on those storage nodes are deleted. Because the file heat is calculated from the access amounts of all fragmented files of the original file, and copies are deleted only for fragmented files whose real-time copy number exceeds the predicted copy demand, the load on the storage nodes is reduced while data reliability is preserved, and both the storage efficiency of fragmented files and the space utilization of the storage nodes are improved.
Referring to fig. 3, fig. 3 is a schematic diagram of a computer device according to a third embodiment of the present application. The computer device 300 may be a mobile terminal such as a smart phone or a tablet computer. The computer device 300 of this embodiment includes units for performing the steps of the embodiment corresponding to fig. 1; for details, refer to fig. 1 and the related description of that embodiment, which are not repeated here. The computer device 300 of this embodiment includes:
an obtaining unit 301, configured to obtain an access amount of a fragmented file belonging to the same original file within a preset period; the fragmented files are obtained by performing fragmentation processing on the original files;
a prediction unit 302, configured to calculate a file heat of the original file according to the access amount, and predict a copy demand of each of the fragmented files according to the file heat;
and the adjusting unit 303 is configured to adjust the number of copies of the fragmented file according to the copy demand.
Further, the adjusting unit 303 includes:
the statistics unit is used for counting the number of copies of the fragmented files stored in all preset storage nodes corresponding to the fragmented files;
the generation unit is used for copying the fragmented file to generate fragmented file copies if the number of the copies is smaller than the copy demand;
And the sending unit is used for identifying the storage node which does not store the copy of the fragmented file as a target storage node in all storage nodes corresponding to the fragmented file and sending the copy of the fragmented file to the target storage node.
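The statistics, generation and sending units together amount to topping up an under-replicated fragmented file. A minimal sketch under stated assumptions: the name `plan_replication`, the set/list data shapes, and taking target nodes in candidate order are all hypothetical choices, not details given in the text.

```python
def plan_replication(holders, candidate_nodes, demand):
    """Choose target storage nodes for new copies of one fragmented file.

    holders:         set of node ids that already store a copy
    candidate_nodes: preset storage nodes for this fragment, in preference order
    demand:          copy demand for this fragment
    Returns the node ids that should each receive one new copy.
    """
    missing = demand - len(holders)
    if missing <= 0:
        return []  # already at or above the copy demand
    # Only nodes that do not yet store a copy qualify as targets.
    targets = [node for node in candidate_nodes if node not in holders]
    return targets[:missing]


targets = plan_replication({"n1"}, ["n1", "n2", "n3", "n4"], demand=3)
```

If fewer free nodes exist than copies are missing, the function returns as many targets as are available.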
Further, the statistics unit includes:
the detection unit is used for detecting whether the copy of the fragmented file stored by the storage node corresponding to the preset fragmented file is lost or not;
and the copy statistics unit is used for counting the number of copies of the fragmented file copies in all the storage nodes if the fragmented file copies stored in the storage nodes are lost.
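The detection and copy statistics units can be sketched as a single check over the preset storage nodes. The name `count_live_replicas` and the in-memory `node_contents` mapping are hypothetical stand-ins for querying real storage nodes.

```python
def count_live_replicas(expected_nodes, node_contents, fragment_id):
    """Detect lost copies and count the copies that still exist.

    expected_nodes: preset node ids that should each hold a copy
    node_contents:  dict node_id -> set of fragment ids stored there
                    (hypothetical stand-in for a node query)
    Returns (number of live copies, list of nodes whose copy is lost).
    """
    live, lost = [], []
    for node in expected_nodes:
        # A node that is missing from node_contents holds no copy at all.
        if fragment_id in node_contents.get(node, set()):
            live.append(node)
        else:
            lost.append(node)
    return len(live), lost


contents = {"n1": {"a", "b"}, "n2": {"b"}}
live_count, lost_nodes = count_live_replicas(["n1", "n2", "n3"], contents, "a")
```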
Further, the prediction unit 302 includes:
a first calculation unit, configured to calculate a heat degree of each of the fragmented files according to the access amount;
the second calculation unit is used for calculating the file heat of the original file according to the heat of each fragmented file;
and the third calculation unit is used for calculating the copy demand of each fragmented file according to the file heat and the total number of the fragmented files corresponding to the original file.
Further, the calculating the heat of each fragmented file according to the access amount includes:
calculating the heat of each fragmented file by the following formula:
where $H_{(t,a)}$ denotes the heat of fragmented file a within the preset time t; $H_{(t-1,a)}$ denotes the heat of fragmented file a in the period preceding the preset time t; $m_{(t,a)}$ denotes the access amount of fragmented file a within the preset time t; $n_{(t,A)}$ denotes the total access amount of the original file A within the preset time t; and the values of the historical access coefficients i and j are set according to the fluctuation of the historical file access amount;
Further, the calculating the file heat of the original file according to the heat of each fragmented file includes:
calculating the file heat of the original file by the following formula:
where $H_{(t,avr)}$ denotes the file heat of the original file, and n denotes the number of fragmented files of the original file;
Further, the calculating the copy demand of each fragmented file according to the file heat and the total number of fragmented files of the original file includes:
calculating the copy demand of each fragmented file by the following formula:
$NUM = \log_2 n \cdot H_{(t,avr)}$
further, the adjusting unit 303 includes:
and the period adjusting unit is used for adjusting the number of the copies of the fragmented files according to the copy demand and a preset adjusting period.
Further, the adjusting unit 303 includes:
the real-time acquisition unit is used for acquiring the number of real-time copies of the fragmented files;
the identifying unit is used for identifying the fragmented files with the real-time copy number larger than the real-time copy demand as fragmented files to be deleted and determining storage nodes corresponding to the fragmented files to be deleted;
and the deleting unit is used for deleting the fragment file stored in the storage node corresponding to the fragment file to be deleted.
In this scheme, the access amount of the fragmented files belonging to the same original file within a preset period is obtained, where the fragmented files are produced by fragmenting the original file; the file heat of the original file is calculated from the access amount, and the copy demand of each fragmented file is predicted from the file heat; and the number of copies of each fragmented file is adjusted according to the copy demand. Because the file heat is calculated from the access amounts of all fragmented files of the original file, and the number of copies is adjusted by comparing the predicted copy demand with the real-time copy number of each fragmented file, the load on the storage nodes is reduced while data reliability is preserved, which achieves load balancing.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present application in any way.
Fig. 4 is a schematic diagram of a computer device according to a fourth embodiment of the present application. As shown in fig. 4, the computer device 4 of this embodiment includes: a processor 40, a memory 41 and a computer program 42 stored in the memory 41 and executable on the processor 40. The processor 40, when executing the computer program 42, implements the steps of the various distributed storage method embodiments described above, such as the steps shown in fig. 1. Alternatively, the processor 40 may perform the functions of the modules/units of the apparatus embodiments described above, such as the functions of the units shown in fig. 3, when executing the computer program 42.
Illustratively, the computer program 42 may be partitioned into one or more modules/units that are stored in the memory 41 and executed by the processor 40 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing the specified functions, which instruction segments are used to describe the execution of the computer program 42 in the computer device 4.
The computer device 4 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or a similar computing device. The computer device may include, but is not limited to, the processor 40 and the memory 41. Those skilled in the art will appreciate that fig. 4 is merely an example of the computer device 4 and does not limit it; the computer device 4 may include more or fewer components than shown, may combine certain components, or may use different components. For example, the computer device may also include input/output devices, a network access device, a bus, and the like.
The processor 40 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. The memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (FC), or the like, which are provided on the computer device 4. Further, the memory 41 may also include both an internal storage unit and an external storage device of the computer device 4. The memory 41 is used for storing the computer program as well as other programs and data required by the terminal device. The memory 41 may also be used for temporarily storing data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis. For parts that are not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
If the integrated modules/units are implemented as software functional units and sold or used as stand-alone products, they may be stored in a computer readable storage medium. On this understanding, the present application may implement all or part of the flow of the methods of the above embodiments by means of a computer program that instructs the related hardware, and the computer program may be stored in a computer readable storage medium.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (8)

1. A distributed storage method, comprising:
obtaining the access quantity of the fragmented files belonging to the same original file in a preset period; the fragmented files are obtained by performing fragmentation processing on the original files; the access amount obtained is the access amount for one fragmented file or at least two fragmented files;
calculating the heat of each fragmented file from the access amount according to the following formula:
where $H_{(t,a)}$ denotes the heat of fragmented file a within the preset time t; $H_{(t-1,a)}$ denotes the heat of fragmented file a in the period preceding the preset time t; $m_{(t,a)}$ denotes the access amount of fragmented file a within the preset time t; $n_{(t,A)}$ denotes the total access amount of the original file A within the preset time t; the values of the historical access coefficients i and j are set according to the fluctuation of the historical file access amount and satisfy $i > 0$, $j > 0$ and $i + j = 1$; when the value of i is closer to 1, the historical file access amount fluctuates strongly and the system considers the overall access situation; when the value of j is closer to 1, the historical file access amount fluctuates little and the system considers the most recent access situation;
calculating the file heat of the original file from the heat of each fragmented file according to the following formula:
where $H_{(t,avr)}$ denotes the file heat of the original file, and n denotes the number of fragmented files of the original file;
calculating the copy demand of each fragmented file from the file heat and the total number of fragmented files of the original file according to the following formula:
$NUM = \log_2 n \cdot H_{(t,avr)}$
and adjusting the number of copies of the fragmented files according to the copy demand.
2. The distributed storage method of claim 1, wherein said adjusting the number of copies of the fragmented file according to the copy demand comprises:
counting the number of copies of the fragmented files stored in all preset storage nodes corresponding to the fragmented files;
if the number of the copies is smaller than the copy demand, copying the fragmented file to generate fragmented file copies;
and identifying a storage node which does not store the copy of the fragmented file as a target storage node in all storage nodes corresponding to the fragmented file and preset, and sending the copy of the fragmented file to the target storage node.
3. The distributed storage method according to claim 2, wherein the counting the number of copies of the fragmented file stored in all preset storage nodes corresponding to the fragmented file includes:
Detecting whether the copy of the fragmented file stored by the storage node corresponding to the preset fragmented file is lost or not;
and if the copy of the fragmented file stored by the storage node is lost, counting the copy number of the copy of the fragmented file in all the storage nodes.
4. The distributed storage method of claim 1, wherein said adjusting the number of copies of the fragmented file according to the copy demand comprises:
and adjusting the number of copies of the fragmented files according to the copy demand and a preset adjustment period.
5. The distributed storage method of any of claims 1-4, wherein said adjusting the number of copies of the fragmented file according to the copy demand comprises:
acquiring the number of real-time copies of the fragmented file;
identifying the fragmented files with the real-time copy number larger than the real-time copy demand as fragmented files to be deleted, and determining storage nodes corresponding to the fragmented files to be deleted;
and deleting the fragment file stored in the storage node corresponding to the fragment file to be deleted.
6. A computer device for implementing the distributed storage method of any of claims 1-5, the computer device comprising:
The acquisition unit is used for acquiring the access quantity of the fragmented files belonging to the same original file in a preset period; the fragmented files are obtained by performing fragmentation processing on the original files;
the prediction unit is used for calculating the file heat of the original file according to the access amount and predicting the copy demand of each fragmented file according to the file heat;
and the adjusting unit is used for adjusting the number of copies of the fragmented files according to the copy demand.
7. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 5 when the computer program is executed.
8. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 5.
CN202010199725.3A 2020-03-20 2020-03-20 Distributed storage method, computer equipment and computer readable storage medium Active CN111475108B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010199725.3A CN111475108B (en) 2020-03-20 2020-03-20 Distributed storage method, computer equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010199725.3A CN111475108B (en) 2020-03-20 2020-03-20 Distributed storage method, computer equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111475108A CN111475108A (en) 2020-07-31
CN111475108B true CN111475108B (en) 2023-11-28

Family

ID=71747766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010199725.3A Active CN111475108B (en) 2020-03-20 2020-03-20 Distributed storage method, computer equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111475108B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022041205A1 (en) * 2020-08-31 2022-03-03 华为技术有限公司 Communication method and multi-access edge computing server
CN112527751B (en) * 2020-12-16 2023-10-31 中国联合网络通信集团有限公司 Data processing method, device, electronic equipment and storage medium
CN113722393A (en) * 2021-06-03 2021-11-30 京东城市(北京)数字科技有限公司 Control method and device of distributed platform and electronic equipment
CN115033187B (en) * 2022-08-10 2022-11-08 蓝深远望科技股份有限公司 Big data based analysis management method
CN116600015B (en) * 2023-07-18 2023-10-10 湖南快乐阳光互动娱乐传媒有限公司 Resource node adjustment method, system, electronic equipment and readable storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103997512A (en) * 2014-04-14 2014-08-20 南京邮电大学 Data duplicate quantity determination method for cloud storage system
CN104978362A (en) * 2014-04-11 2015-10-14 中兴通讯股份有限公司 Data migration method of distributive file system, data migration device of distributive file system and metadata server
CN106648456A (en) * 2016-09-18 2017-05-10 重庆邮电大学 Dynamic save file access method based on use page view and prediction mechanism
CN107315547A (en) * 2017-07-18 2017-11-03 郑州云海信息技术有限公司 A kind of method and device for reading distributed meta data file
CN107770259A (en) * 2017-09-30 2018-03-06 武汉理工大学 Copy amount dynamic adjusting method based on file temperature and node load
CN108228106A (en) * 2017-12-30 2018-06-29 广东技术师范学院 A kind of self-adaptation control method of cost driving copy
CN108363643A (en) * 2018-03-27 2018-08-03 东北大学 A kind of HDFS copy management methods based on file access temperature
CN108920282A (en) * 2018-08-03 2018-11-30 北京科技大学 A kind of copy of content generation, placement and the update method of holding load equilibrium
CN109522151A (en) * 2017-09-15 2019-03-26 北京京东尚科信息技术有限公司 Method and device for data redundancy storage
CN109697018A (en) * 2017-10-20 2019-04-30 北京京东尚科信息技术有限公司 The method and apparatus for adjusting memory node copy amount
CN110019082A (en) * 2017-07-31 2019-07-16 普天信息技术有限公司 The more copy storage methods of distribution of file data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11113312B2 (en) * 2017-06-29 2021-09-07 Microsoft Technology Licensing, Llc Reliable hierarchical storage management with data synchronization

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20210125

Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant after: Shenzhen saiante Technology Service Co.,Ltd.

Address before: 1-34 / F, Qianhai free trade building, 3048 Xinghai Avenue, Mawan, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong 518000

Applicant before: Ping An International Smart City Technology Co.,Ltd.

SE01 Entry into force of request for substantive examination
GR01 Patent grant