CN114138711A - File migration method and device, storage medium and electronic equipment - Google Patents

File migration method and device, storage medium and electronic equipment

Info

Publication number
CN114138711A
Authority
CN
China
Prior art keywords
storage node
data
storage
migrated
ledger
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111460857.8A
Other languages
Chinese (zh)
Inventor
王诗鈞
何光宇
徐石成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp filed Critical Neusoft Corp
Priority to CN202111460857.8A priority Critical patent/CN114138711A/en
Publication of CN114138711A publication Critical patent/CN114138711A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/119 Details of migration of file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a file migration method, a file migration device, a storage medium, and an electronic device. The file migration method is applied to a distributed storage system comprising a plurality of storage nodes, each of which stores blockchain ledger data. The method comprises: for any storage node, when the total ledger data volume stored on the storage node is greater than the mean total ledger data volume of the plurality of storage nodes, determining whether to migrate at least part of the ledger data stored on the storage node according to at least one of the storage node's remaining disk space, disk occupancy rate, total ledger data volume, and hotspot data ratio; and, when it is determined that at least part of the ledger data stored on the storage node is to be migrated, performing file migration processing on the storage node. With this method, the load can be balanced between the storage node and the other storage nodes in the distributed storage system.

Description

File migration method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of blockchain technologies, and in particular to a file migration method and apparatus, a storage medium, and an electronic device.
Background
A blockchain is decentralized, collectively maintained, highly trustworthy, traceable, and tamper-resistant, and it provides consensus mechanisms and smart contracts, so blockchains are being applied in more and more service fields. A blockchain stores ledger data in a chain structure and uses consensus mechanisms, cryptography, and similar means to guarantee that the ledger data cannot be tampered with; any tampering renders the entire ledger unusable. In short, a blockchain ledger supports only query and write operations on ledger data, not modification or deletion. Because blockchain ledger data cannot be deleted, it keeps accumulating over time, which creates the risk of ledger data "expansion".
In the related art, to avoid this risk of ledger data "expansion", the ledger data of the blockchain may be stored in a distributed storage system. Over time, however, the load may become unbalanced among the storage nodes of the distributed storage system.
Disclosure of Invention
The present disclosure aims to provide a file migration method and apparatus, a storage medium, and an electronic device that solve the problems in the related art.
To achieve the above object, a first aspect of the embodiments of the present disclosure provides a file migration method applied to a distributed storage system, where the distributed storage system includes a plurality of storage nodes and each storage node is used to store blockchain ledger data. The method includes:
for any storage node, when the total ledger data volume stored on the storage node is greater than the mean total ledger data volume of the plurality of storage nodes, determining whether to migrate at least part of the ledger data stored on the storage node according to at least one of the storage node's remaining disk space, disk occupancy rate, total ledger data volume, and hotspot data ratio;
and, when it is determined that at least part of the ledger data stored on the storage node is to be migrated, performing file migration processing on the storage node.
Optionally, the performing file migration processing on the storage node includes:
determining the storage node as a storage node to be migrated, and calculating the total data amount of the ledger data to be migrated from the storage node to be migrated according to the total ledger data volume stored on it, the mean total ledger data volume of the plurality of storage nodes, and a preset influence factor;
determining the number of pieces of ledger data to be migrated according to that total data amount and the data amount of a single piece of ledger data;
determining that number of pieces of to-be-migrated ledger data from the storage node to be migrated;
and, for each piece of to-be-migrated ledger data, selecting a target migration storage node from the storage nodes other than the storage node to be migrated, and storing the to-be-migrated ledger data on the target migration storage node.
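The sizing steps above can be sketched as follows. The text names the inputs (the node's total ledger data volume, the cluster mean, a preset influence factor, and the size of a single piece of ledger data) but does not give the formula at this point, so the expression below is an assumption: a fraction of the surplus above the mean, rounded up to whole pieces. The function name `ledger_count_to_migrate` is hypothetical.

```python
import math


def ledger_count_to_migrate(node_total: float, cluster_mean: float,
                            influence_factor: float,
                            single_ledger_size: float) -> int:
    """Return how many pieces of ledger data to move off an overloaded node."""
    if single_ledger_size <= 0:
        raise ValueError("single_ledger_size must be positive")
    # Assumed formula: migrate a fraction (the influence factor) of the
    # surplus that the node holds above the cluster mean.
    surplus = max(node_total - cluster_mean, 0.0)
    amount_to_migrate = surplus * influence_factor
    # Assumed rounding: round up so that at least the computed amount moves.
    return math.ceil(amount_to_migrate / single_ledger_size)
```

Under these assumptions, a node holding 1000 units against a mean of 700, with an influence factor of 0.5 and pieces of 40 units each, would migrate 4 pieces.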
Optionally, the determining that number of pieces of to-be-migrated ledger data from the storage node to be migrated includes:
acquiring the number of times each piece of ledger data on the storage node to be migrated has been queried;
calculating the average query count from the query counts of all pieces of ledger data on the storage node to be migrated;
for each piece of ledger data on the storage node to be migrated, determining the piece of ledger data as candidate to-be-migrated ledger data when its query count is less than the average query count, so as to obtain a candidate to-be-migrated ledger data pool;
and randomly selecting that number of pieces of to-be-migrated ledger data from the candidate to-be-migrated ledger data pool.
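A minimal sketch of this selection step, assuming query counts are available as a mapping from ledger data identifier to count. The name `pick_ledgers_to_migrate` is hypothetical, and capping the sample at the pool size is an assumption, since the text does not say what happens when the pool holds fewer pieces than requested.

```python
import random
from statistics import mean
from typing import Dict, List, Optional


def pick_ledgers_to_migrate(query_counts: Dict[str, int], number: int,
                            rng: Optional[random.Random] = None) -> List[str]:
    """Randomly pick `number` ledger IDs queried less often than average."""
    if not query_counts:
        return []
    rng = rng or random.Random()
    average = mean(query_counts.values())
    # Candidate pool: pieces whose query count is below the node average.
    candidates = sorted(lid for lid, count in query_counts.items()
                        if count < average)
    # Assumption: if the pool is smaller than requested, take the whole pool.
    return rng.sample(candidates, min(number, len(candidates)))
```

Restricting the pool to below-average query counts keeps frequently queried (hot) data in place, so migration moves volume without redirecting most of the read traffic.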
Optionally, the selecting, for each to-be-migrated ledger data, a target migration storage node from storage nodes other than the to-be-migrated storage node includes:
determining a first candidate storage node pool consisting of the storage nodes other than the storage node to be migrated;
determining the conflicting storage nodes, namely the storage nodes that store the primary file or a replica file of the to-be-migrated ledger data;
excluding the conflicting storage nodes from the first candidate storage node pool to obtain a second candidate storage node pool;
and selecting, from the second candidate storage node pool, a target migration storage node that meets a preset condition.
Optionally, the selecting, for each to-be-migrated ledger data, a target migration storage node from storage nodes other than the to-be-migrated storage node includes:
determining a first candidate storage node pool consisting of the storage nodes other than the storage node to be migrated;
determining the conflicting storage nodes, namely the storage nodes that store the primary file or a replica file of the to-be-migrated ledger data;
excluding the conflicting storage nodes from the first candidate storage node pool to obtain a second candidate storage node pool;
excluding the other storage nodes to be migrated from the second candidate storage node pool to obtain a third candidate storage node pool;
and selecting, from the third candidate storage node pool, a target migration storage node that meets a preset condition.
Optionally, the preset condition includes at least one of the following: the remaining disk space is greater than a preset space, the disk occupancy rate is less than a preset occupancy rate, the hotspot data ratio is less than a preset ratio, and the response duration is less than a preset duration.
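The three-pool filtering and the preset condition can be sketched together as below. All names (`NodeStats`, `Limits`, `choose_target`) and all threshold values are hypothetical. The sketch checks all four conditions, although the text only requires at least one of them, and it breaks ties by preferring the node with the most free disk space, which is an assumption the text does not state.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class NodeStats:
    name: str
    free_disk_gb: float     # remaining disk space
    disk_occupancy: float   # fraction of disk in use, 0.0 to 1.0
    hotspot_ratio: float    # fraction of stored ledger data that is hot
    response_ms: float      # measured response duration


@dataclass(frozen=True)
class Limits:
    # Hypothetical preset thresholds.
    min_free_disk_gb: float = 50.0
    max_disk_occupancy: float = 0.8
    max_hotspot_ratio: float = 0.5
    max_response_ms: float = 200.0


def meets_preset_condition(node: NodeStats, limits: Limits) -> bool:
    return (node.free_disk_gb > limits.min_free_disk_gb
            and node.disk_occupancy < limits.max_disk_occupancy
            and node.hotspot_ratio < limits.max_hotspot_ratio
            and node.response_ms < limits.max_response_ms)


def choose_target(nodes: Iterable[NodeStats], source: str,
                  replica_holders: Set[str], other_sources: Set[str],
                  limits: Limits = Limits()) -> Optional[NodeStats]:
    # First pool: every node except the node being migrated from.
    first_pool = [n for n in nodes if n.name != source]
    # Second pool: drop nodes already holding the primary or a replica file.
    second_pool = [n for n in first_pool if n.name not in replica_holders]
    # Third pool: drop nodes that are themselves waiting to be migrated.
    third_pool = [n for n in second_pool if n.name not in other_sources]
    eligible = [n for n in third_pool if meets_preset_condition(n, limits)]
    # Assumed tie-break: prefer the eligible node with the most free space.
    return max(eligible, key=lambda n: n.free_disk_gb, default=None)
```

Excluding replica holders keeps two copies of the same ledger data from landing on one node, and excluding other overloaded nodes avoids shifting the imbalance from one node to another.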
Optionally, the determining, according to at least one of the storage node's remaining disk space, disk occupancy rate, total ledger data volume, and hotspot data ratio, whether to migrate at least part of the ledger data stored on the storage node includes:
inputting the remaining disk space, the disk occupancy rate, the total ledger data volume, and the hotspot data ratio into a trained support vector machine, and obtaining from the support vector machine an output indicating whether to perform file migration processing.
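The text says only that a trained support vector machine produces the decision; it does not specify the kernel, the training data, or the feature scaling. As an illustration of the inference step alone, the sketch below evaluates the decision function of a linear SVM, sign(w·x + b). The weights, the bias, and the assumption that the four features are pre-scaled to comparable ranges are all made up for the example; in practice the parameters would come from training.

```python
from typing import Sequence

# Hypothetical parameters standing in for a trained linear SVM.
# Feature order: free-disk fraction, disk occupancy,
# total ledger volume relative to the cluster mean, hotspot data ratio.
WEIGHTS = (-2.0, 1.5, 1.0, 2.0)
BIAS = -1.5


def should_migrate(features: Sequence[float]) -> bool:
    """Linear SVM decision: migrate when w.x + b is positive."""
    if len(features) != len(WEIGHTS):
        raise ValueError("expected %d features" % len(WEIGHTS))
    score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return score > 0
```

With these illustrative parameters, a nearly full node with a large share of hot data is classified as needing migration, while a lightly used node just above the mean is not.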
A second aspect of the embodiments of the present disclosure provides a file migration apparatus applied to a distributed storage system, where the distributed storage system includes a plurality of storage nodes and each storage node is used to store blockchain ledger data. The apparatus includes:
a judging module, configured to, for any storage node, when the total ledger data volume stored on the storage node is greater than the mean total ledger data volume of the plurality of storage nodes, determine whether to migrate at least part of the ledger data stored on the storage node according to at least one of the storage node's remaining disk space, disk occupancy rate, total ledger data volume, and hotspot data ratio;
and an execution module, configured to perform file migration processing on the storage node when it is determined that at least part of the ledger data stored on the storage node is to be migrated.
Optionally, the execution module includes:
a first calculation submodule, configured to determine the storage node as a storage node to be migrated, and to calculate the total data amount of the ledger data to be migrated from the storage node to be migrated according to the total ledger data volume stored on it, the mean total ledger data volume of the plurality of storage nodes, and a preset influence factor;
a first determining submodule, configured to determine the number of pieces of ledger data to be migrated according to that total data amount and the data amount of a single piece of ledger data;
a second determining submodule, configured to determine that number of pieces of to-be-migrated ledger data from the storage node to be migrated;
and a migration submodule, configured to, for each piece of to-be-migrated ledger data, select a target migration storage node from the storage nodes other than the storage node to be migrated, and store the to-be-migrated ledger data on the target migration storage node.
Optionally, the second determining submodule includes:
an obtaining submodule, configured to obtain the number of times each piece of ledger data on the storage node to be migrated has been queried;
a second calculation submodule, configured to calculate the average query count from the query counts of all pieces of ledger data on the storage node to be migrated;
a third determining submodule, configured to, for each piece of ledger data on the storage node to be migrated, determine the piece of ledger data as candidate to-be-migrated ledger data when its query count is less than the average query count, so as to obtain a candidate to-be-migrated ledger data pool;
and a first selection submodule, configured to randomly select that number of pieces of to-be-migrated ledger data from the candidate to-be-migrated ledger data pool.
Optionally, the migration submodule includes:
a fourth determining submodule, configured to determine a first candidate storage node pool consisting of the storage nodes other than the storage node to be migrated;
a fifth determining submodule, configured to determine the conflicting storage nodes, namely the storage nodes that store the primary file or a replica file of the to-be-migrated ledger data;
a first exclusion submodule, configured to exclude the conflicting storage nodes from the first candidate storage node pool to obtain a second candidate storage node pool;
and a second selection submodule, configured to select, from the second candidate storage node pool, a target migration storage node that meets a preset condition.
Optionally, the migration submodule includes:
a sixth determining submodule, configured to determine a first candidate storage node pool consisting of the storage nodes other than the storage node to be migrated;
a seventh determining submodule, configured to determine the conflicting storage nodes, namely the storage nodes that store the primary file or a replica file of the to-be-migrated ledger data;
a second exclusion submodule, configured to exclude the conflicting storage nodes from the first candidate storage node pool to obtain a second candidate storage node pool;
a third exclusion submodule, configured to exclude the other storage nodes to be migrated from the second candidate storage node pool to obtain a third candidate storage node pool;
and a third selection submodule, configured to select, from the third candidate storage node pool, a target migration storage node that meets a preset condition.
Optionally, the preset condition includes at least one of the following: the remaining disk space is greater than a preset space, the disk occupancy rate is less than a preset occupancy rate, the hotspot data ratio is less than a preset ratio, and the response duration is less than a preset duration.
Optionally, the judging module includes:
an input submodule, configured to input the remaining disk space, the disk occupancy rate, the total ledger data volume, and the hotspot data ratio into a trained support vector machine, and to obtain from the support vector machine an output indicating whether to perform file migration processing.
A third aspect of the embodiments of the present disclosure provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of any one of the first aspect.
A fourth aspect of the embodiments of the present disclosure provides an electronic device, including:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of any one of the first aspect.
The above technical solution achieves at least the following beneficial technical effects:
For any storage node in a distributed storage system that stores blockchain ledger data, a total ledger data volume on that node greater than the mean total ledger data volume of all storage nodes indicates that the node's data storage volume is unbalanced relative to the other storage nodes, and therefore that its load may also be unbalanced relative to theirs. Whether to migrate at least part of the ledger data stored on the storage node is then determined according to at least one of the node's remaining disk space, disk occupancy rate, total ledger data volume, and hotspot data ratio. When it is determined that at least part of the ledger data stored on the storage node is to be migrated, file migration processing is performed on the storage node. File migration reduces the total ledger data volume of the storage node and also reduces the load caused by accesses to the ledger data stored on it. That is, applying the above method to each storage node in the distributed storage system makes both the data storage volume and the load of the storage nodes more balanced.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
Fig. 1 is a flowchart illustrating a file migration method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a block diagram illustrating a blockchain ledger data storage system in accordance with an exemplary embodiment of the present disclosure.
Fig. 3 is a block diagram illustrating a blockchain and blockchain ledger data storage system in accordance with an exemplary embodiment of the present disclosure.
Fig. 4 is a block diagram illustrating a file migration apparatus according to an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
Fig. 1 is a flowchart illustrating a file migration method according to an exemplary embodiment of the present disclosure. The file migration method is applied to a distributed storage system that includes a plurality of storage nodes, each of which stores blockchain ledger data. As shown in Fig. 1, the file migration method includes the following steps:
S11: For any storage node, when the total ledger data volume stored on the storage node is greater than the mean total ledger data volume of the plurality of storage nodes, determine whether to migrate at least part of the ledger data stored on the storage node according to at least one of the storage node's remaining disk space, disk occupancy rate, total ledger data volume, and hotspot data ratio.
In general, when the total ledger data volume stored on a storage node is greater than the mean computed from the total ledger data volumes of all storage nodes, the amount of data stored on that node is unbalanced relative to the amount stored on some other storage nodes in the distributed storage system. In that case, the load of accessing the ledger data stored on the node may also be unbalanced relative to the load of accessing the ledger data stored on other storage nodes. The load may equally remain balanced, however: different pieces of ledger data are queried different numbers of times, so each piece contributes a different access load to the storage node that holds it.
Therefore, for any storage node whose total ledger data volume is greater than the mean total ledger data volume of the plurality of storage nodes, whether at least part of the ledger data stored on it needs to be migrated is further determined according to at least one of the node's remaining disk space, disk occupancy rate, total ledger data volume, and hotspot data ratio.
S12: When it is determined that at least part of the ledger data stored on the storage node is to be migrated, perform file migration processing on the storage node.
With this file migration method, for any storage node in a distributed storage system that stores blockchain ledger data, a total ledger data volume on that node greater than the mean total ledger data volume of all storage nodes indicates that the node's data storage volume is unbalanced relative to the other storage nodes. Whether to migrate at least part of the ledger data stored on the storage node is then determined according to at least one of the node's remaining disk space, disk occupancy rate, total ledger data volume, and hotspot data ratio. When it is determined that at least part of the ledger data stored on the storage node is to be migrated, file migration processing is performed on the storage node. File migration reduces the total ledger data volume of the storage node and also reduces the load caused by accesses to the ledger data stored on it. That is, applying the above method to each storage node in the distributed storage system makes both the data storage volume and the load of the storage nodes more balanced.
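Steps S11 and S12 amount to a two-stage check, sketched below. The mean comparison acts as a cheap pre-filter, and only nodes above the mean are passed to the finer decision. The callback `needs_migration` is a hypothetical stand-in for that finer decision (based on disk and hotspot information), and the function name `nodes_to_migrate` is likewise illustrative.

```python
from statistics import mean
from typing import Callable, Dict, List


def nodes_to_migrate(totals: Dict[str, float],
                     needs_migration: Callable[[str], bool]) -> List[str]:
    """Return the nodes that pass both the mean check and the finer check."""
    average = mean(totals.values())
    return [name for name, total in totals.items()
            # S11, first stage: only nodes above the cluster mean qualify.
            if total > average
            # S11, second stage: finer decision from disk/hotspot metrics.
            and needs_migration(name)]
```

For example, with totals of 100, 300, and 500 the mean is 300, so only the third node reaches the second stage; a node exactly at the mean is not considered.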
The file migration method of the present disclosure is applied to a distributed storage system, which may be the blockchain ledger data storage system shown in Fig. 2. More specifically, the method may be applied to the ledger data storage service module 120 in Fig. 2. To make the file migration method easier for those of ordinary skill in the art to understand, the distributed storage system shown in Fig. 2 is described first.
Fig. 2 is a block diagram illustrating a blockchain ledger data storage system according to an exemplary embodiment of the present disclosure. As shown in Fig. 2, the system 100 includes a ledger access service module 110, a ledger data storage service module 120, and a storage node cluster 130.
The ledger access service module 110 is configured to, in response to receiving a ledger data storage request initiated by a blockchain node, store the ledger meta information of the ledger data to be stored, and send the ledger data to be stored and a ledger data identifier to the ledger data storage service module 120. The ledger data storage request carries the ledger data to be stored and the ledger meta information, and the ledger meta information includes the ledger data identifier. The ledger data to be stored is the ledger data corresponding to a first block in the blockchain node. The ledger meta information further includes data representing the chain relationship between the first block and a second block, where the second block is the block preceding the first block in the blockchain node.
The ledger data storage service module 120 is configured to, in response to receiving the ledger data to be stored and the ledger data identifier, generate a ledger write request according to a preset ledger write policy and the ledger data to be stored, and send the ledger write request to a target storage master node in the storage node cluster 130.
The target storage master node is configured to store the ledger data to be stored according to the ledger write request and, after storing it, to feed back a first physical address at which the ledger data is stored to the ledger data storage service module 120, so that the ledger data storage service module 120 stores a mapping between the ledger data identifier and the first physical address.
Taking the blockchain node shown in Fig. 3 as an example, a blockchain is stored in the blockchain node, each block in the blockchain corresponds to one blockchain ledger file, and each blockchain ledger file includes the block data (i.e., the data in the block header and the block body) and the corresponding ledger data stored off-chain.
In the embodiments of the present disclosure, one blockchain node corresponds to one blockchain ledger data storage system 100, which stores off-chain the ledger data corresponding to all blocks on that blockchain node. The ledger data records the specific ledger contents (such as transaction details).
When a blockchain node needs to store newly generated ledger data, it may send a ledger data storage request to the ledger access service module 110; the request includes the ledger data to be stored and the ledger meta information.
In response to receiving a ledger data storage request initiated by a blockchain node, the ledger access service module 110 may store the ledger meta information carried in the request. The ledger meta information may include the ledger data identifier, for example the data identification ID shown in Fig. 3.
In some embodiments, the ledger meta information may further include data representing the chain relationship between the first block and the second block, for example: the ledger data hash value of the first block and the ledger data hash value of the second block; the ledger ID of the first block and the ledger ID of the second block; or the block hash value of the first block and the block hash value of the second block, where a block hash value is obtained by hashing the data of the whole block (i.e., the block header and the block body).
In some embodiments, the ledger meta information may also include any one or more of the following: a timestamp, the ledger channel name, the ledger data volume, and the like. A channel is a confidential transaction mechanism sometimes required in enterprise-level consortium chain scenarios. For example, when several participating nodes (blockchain nodes) want to conduct transactions that are not disclosed to non-participating nodes, they form an isolated channel. The channel corresponds to one large ledger, which includes the ledger file generated by each confidential transaction, and the ledger data of each ledger file records the content of that confidential transaction in detail.
At the same time as, or after, the ledger access service module 110 stores the ledger meta information of the ledger data to be stored, it may send the ledger data to be stored and the ledger data identifier to the ledger data storage service module 120.
In response to receiving the ledger data to be stored and the ledger data identifier, the ledger data storage service module 120 generates a ledger write request according to a preset ledger write policy and the ledger data to be stored, and sends the ledger write request to a target storage master node in the storage node cluster 130. The target storage master node is determined by the ledger data storage service module 120 based on the preset ledger write policy.
In response to receiving the ledger write request, the target storage master node stores the ledger data to be stored according to the request and, after the data is stored successfully, feeds back a first physical address at which the data is stored to the ledger data storage service module 120. The first physical address includes one or more of the following: the name of the target storage master node (host), the file storage address at which the ledger data to be stored is stored, the offset, and the like. Note that the physical address of ledger data stored in any memory consists of a segment address and an offset address; in the present disclosure, the file storage address represents the segment address and the offset represents the offset address.
In some embodiments, the first physical address may also include a specific starting address and ending address where ledger data is stored.
After receiving the first physical address fed back by the target storage host node, the ledger data storage service module 120 stores the ledger data identifier of the ledger data to be stored in association with the first physical address.
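The write path described above can be sketched as follows: the storage service forwards the data to the chosen master node, the node appends it and returns a physical address (host, file storage address, offset), and the service records the mapping from ledger data identifier to that address. The class and method names, the in-memory `bytearray` standing in for an on-disk file, and the path `/ledger/data.bin` are all hypothetical; the master node index is passed in because the write policy is described separately.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PhysicalAddress:
    host: str       # name of the target storage master node
    file_path: str  # file storage address (the "segment" part)
    offset: int     # byte offset inside that file


class StorageNode:
    def __init__(self, host: str) -> None:
        self.host = host
        self._file = bytearray()  # stands in for one on-disk ledger file

    def write(self, data: bytes) -> PhysicalAddress:
        offset = len(self._file)  # append-only: new data goes at the end
        self._file.extend(data)
        return PhysicalAddress(self.host, "/ledger/data.bin", offset)

    def read(self, address: PhysicalAddress, length: int) -> bytes:
        return bytes(self._file[address.offset:address.offset + length])


class LedgerDataStorageService:
    def __init__(self, nodes: List[StorageNode]) -> None:
        self.nodes = nodes
        self.address_by_id: Dict[str, PhysicalAddress] = {}

    def store(self, ledger_id: str, data: bytes,
              master_index: int) -> PhysicalAddress:
        address = self.nodes[master_index].write(data)
        # Keep the identifier-to-address mapping for later reads.
        self.address_by_id[ledger_id] = address
        return address
```

Because readers resolve data through this mapping rather than through a fixed location, a migration only needs to update the stored address once the data has been copied to its new node.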
In this way, when the ledger data to be stored is stored, the chained storage structure of the blockchain ledger data is preserved logically by storing the data that represents the chain relationship between the first block and the second block. By storing the mapping between the ledger data identifier and the first physical address, the ledger data to be stored can be placed on any storage node in the storage node cluster while that chained structure is preserved. Storing blockchain ledger data across a storage node cluster in this distributed manner therefore preserves the chained storage structure, addresses the "expansion" of blockchain ledger data, and makes it possible to store massive amounts of blockchain ledger data. Because storage nodes can be added to the storage node cluster dynamically, the storage system can also expand its capacity dynamically.
In some embodiments, the data characterizing the chained relationship between the first block and the second block comprises: the hash value of the first block and the hash value of the second block.
If the data characterizing the chained relationship between the first block and the second block is the hash value of the first block (i.e. the hash of the ledger data shown in fig. 3) and the hash value of the second block, then, in a ledger data reading scenario, the correctness/integrity of the ledger data read can be checked against the hash value of the first block when the ledger data corresponding to the first block is read from the storage node cluster. Therefore, while the chained storage structure of the blockchain ledger data is preserved, it can be ensured that the ledger data stored in the storage node cluster has not been tampered with, which improves the security of distributed ledger data storage.
In some embodiments, the ledger data storage service module 120 is further configured to calculate a hash value according to the ledger data identifier of the ledger data to be stored, and determine the target storage master node from the storage node cluster according to the hash value.
In a specific implementation, a hash algorithm may be applied to the ledger data identifier to obtain a corresponding hash value. The hash value is divided by the total number of storage nodes in the storage node cluster and the remainder is taken; the storage node whose number equals the remainder is the target storage master node. For example, suppose that the hash value calculated from the ledger data identifier of the ledger data to be stored is 15, and that the storage node cluster includes 4 storage nodes numbered 0, 1, 2, and 3. Dividing the hash value 15 by 4 leaves a remainder of 3, so the storage node numbered 3 is determined as the target storage master node. Because the remainder obtained by dividing the hash value by the total number S of storage nodes is always less than S, the remainder lies in the range [0, S-1]; when the storage nodes in the cluster are numbered from 0, the node numbers also lie in the range [0, S-1]. Therefore, a unique storage node number can be determined for any hash value, however large, without restricting the value range of the hash value, that is, without restricting the value range of the ledger data identifier and, accordingly, without limiting the number of ledger data.
In addition, in this way, multiple pieces of ledger data to be stored can be distributed evenly across the storage nodes, which balances the load of the storage nodes. For example, assume that there are three storage nodes 0, 1, and 2 and three pieces of ledger data to be stored, a, b, and c, whose hash values are 1, 2, and 3 respectively. Then, according to the above manner of determining the target storage master node, storage node 1 serves as the target storage master node for ledger data a, storage node 2 for ledger data b, and storage node 0 for ledger data c.
It should be explained here that a hash algorithm (also called a digest algorithm) maps input data of arbitrary length to an output value of fixed length.
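The modulo-based selection described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names are hypothetical, and SHA-256 is an assumed choice because the text only refers to "a hash algorithm".

```python
import hashlib


def node_for_hash(hash_value: int, node_count: int) -> int:
    """Map a hash value to a node number in the range [0, node_count - 1]."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return hash_value % node_count


def pick_master_node(ledger_id: str, node_count: int) -> int:
    """Hash a ledger data identifier and pick the target storage master node."""
    # SHA-256 is an assumption; any hash algorithm with a fixed-length output works.
    digest = hashlib.sha256(ledger_id.encode("utf-8")).digest()
    return node_for_hash(int.from_bytes(digest, "big"), node_count)


# Worked example from the text: hash value 15, 4 nodes numbered 0..3.
example_node = node_for_hash(15, 4)

# Second example from the text: hash values 1, 2, 3 spread over nodes 0, 1, 2.
placement = {name: node_for_hash(h, 3) for name, h in {"a": 1, "b": 2, "c": 3}.items()}
```

Because the remainder is always smaller than the node count, the result is a valid node number for a hash value of any size.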
In some embodiments, the preset ledger write-in policy includes a preset number representing the number of copies of a file, and the ledger write-in request includes a target storage slave node list;
the ledger data storage service module 120 is further configured to determine, from other storage nodes in the storage node cluster 130 except the target storage master node, the preset number of target storage slave nodes with the lowest disk occupancy rate, so as to obtain the target storage slave node list;
the target storage master node is further configured to send a replica file storage request to each target storage slave node based on the target storage slave node list, so that each target storage slave node stores a replica file of the ledger data to be stored.
For example, assume that the value of the preset number is 2, and that the storage node cluster consists of storage node A, storage node B, storage node C, storage node D, storage node E, and storage node F. If the ledger data storage service module 120 determines that storage node B is the target storage master node, then the storage nodes other than the target storage master node B are storage node A, storage node C, storage node D, storage node E, and storage node F.
If the disk occupancy rates of storage node A, storage node C, storage node D, storage node E, and storage node F decrease in that order, then the 2 target storage slave nodes with the lowest disk occupancy rates among these five nodes are storage node E and storage node F. That is, the target storage slave node list includes storage node E and storage node F.
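The slave-node selection in this example can be sketched as follows. The function name and the occupancy figures are hypothetical; only the rule (exclude the master, take the preset number of nodes with the lowest disk occupancy) comes from the text.

```python
from typing import Dict, List


def pick_slave_nodes(disk_occupancy: Dict[str, float], master: str, copies: int) -> List[str]:
    """Return the `copies` nodes with the lowest disk occupancy, excluding the master."""
    candidates = {node: used for node, used in disk_occupancy.items() if node != master}
    # Sort node names by occupancy, lowest first, and keep the first `copies` of them.
    return sorted(candidates, key=lambda node: candidates[node])[:copies]


# Illustrative occupancy values: A > C > D > E > F, with B as the master node.
occupancy = {"A": 0.90, "B": 0.30, "C": 0.80, "D": 0.70, "E": 0.60, "F": 0.50}
slave_list = pick_slave_nodes(occupancy, master="B", copies=2)
```

Here `slave_list` contains storage nodes F and E, ordered from the lowest occupancy upward.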
The target storage master node (such as storage node B) sends a replica file storage request to each target storage slave node (such as storage node E and storage node F), where the replica file storage request includes a replica file of the ledger data to be stored. Each target storage slave node (storage node E, storage node F) receives the replica file storage request and stores the replica file of the ledger data to be stored. The replica file refers to a text file obtained by copying the source file.
Storing distributed backups of the ledger data in this way safeguards the ledger data and prevents it from being lost. For example, it avoids the situation in which the ledger data cannot be read because the target storage master node that stores it has failed or gone offline.
In some embodiments, the ledger write request may further include a writing mode for the master file and the replica file, which may be either a synchronous writing mode or an asynchronous writing mode. Accordingly, the replica file storage request may also include the writing mode for the master file and the replica file. The writing mode can be preset in the preset ledger write-in policy (depending on the scenario). It should be noted that the master file refers to the ledger data stored on the master node, and the replica file refers to the ledger data stored on a slave node.
When the master file and the replica file are written synchronously, the state information indicating successful storage is fed back to the blockchain node only after both the target storage master node and the target storage slave nodes have completed their writes. This achieves strong transaction consistency, which means that multiple transactions (e.g., a master file storage transaction and a replica file storage transaction) must together move the database from one consistent state (e.g., a not-stored state) to another consistent state (e.g., a stored-successfully state).
When the master file and the replica file are written asynchronously, the state information indicating successful storage can be fed back to the blockchain node as soon as the target storage master node has written the data successfully, without waiting for the target storage slave nodes to store the replica file.
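The difference between the two writing modes can be sketched as follows. This is a simplified in-process illustration under stated assumptions: storage nodes are modeled as plain Python lists, `store_ledger` is a hypothetical name, and returning from the function stands in for feeding the success state back to the blockchain node.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def store_ledger(data, master, replicas, mode, pool):
    """Write the master file, start replica writes, and return when an ack may be sent."""
    if mode not in ("sync", "async"):
        raise ValueError("mode must be 'sync' or 'async'")
    master.append(data)  # the master file is always written before acknowledging
    futures = [pool.submit(replica.append, data) for replica in replicas]
    if mode == "sync":
        wait(futures)  # synchronous mode: acknowledge only after every replica is written
    return futures  # asynchronous mode: replica writes may still be in flight


node_b, node_e, node_f = [], [], []
with ThreadPoolExecutor(max_workers=2) as pool:
    store_ledger("block-1", node_b, [node_e, node_f], "sync", pool)
    replicas_done_at_ack = node_e == ["block-1"] and node_f == ["block-1"]
    pending = store_ledger("block-2", node_b, [node_e, node_f], "async", pool)
    wait(pending)  # only the example waits here; the caller was already acknowledged
```

In synchronous mode the replicas are guaranteed to be written when the function returns; in asynchronous mode that guarantee only holds after the returned futures complete.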
In some embodiments, the ledger data storage service module 120 is further configured to monitor disk occupancy of each of the storage nodes to determine the target storage slave node list from the storage node cluster based on a magnitude relationship of disk occupancy between each of the storage nodes.
In some embodiments, ledger data storage services module 120 may periodically send a "heartbeat" signal to each storage node, which is used to query the storage node for current disk occupancy. After receiving the "heartbeat" signal, the storage node may feed back its current disk occupancy rate to the ledger data storage service module 120. Therefore, the purpose that the ledger data storage service module 120 monitors the disk occupancy rate of each storage node can be achieved.
In some embodiments, the target storage slave node is configured to, after storing the replica file of the ledger data to be stored, feed back a second physical address of the replica file of the ledger data to be stored to the target storage master node, so that the target storage master node feeds back the second physical address to the ledger data storage service module 120; the ledger data storage service module 120 is further configured to store a mapping relationship between the ledger data identifier and the second physical address.
Continuing the foregoing example in which storage node B is the target storage master node and storage node E is a target storage slave node: after storage node E stores the replica file of the ledger data to be stored, it may feed back to storage node B a second physical address at which the replica file is stored, where the second physical address may include the host name of storage node E, the file storage address at which the replica file is stored, the offset, and the like. After receiving the second physical address fed back by storage node E, storage node B feeds back the second physical address to the ledger data storage service module 120, so that the ledger data storage service module 120 stores the mapping relationship between the ledger data identifier and the second physical address.
Thus, the ledger data storage service module 120 may store the mapping relationship between the ledger data identifier of the ledger data to be stored and the first physical address, and may also store the mapping relationship between the ledger data identifier and the second physical address. Using the stored mapping relationships, the first physical address and/or the second physical address corresponding to a ledger data identifier can be looked up in a ledger data query scenario, so that the ledger data corresponding to the identifier can be retrieved from the first physical address or the second physical address.
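One possible shape for this mapping is sketched below. The class and field names are hypothetical; the text only states that a physical address may include a host name, a file storage address, and an offset, and that both master and replica addresses are associated with the ledger data identifier.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PhysicalAddress:
    host: str        # name of the storage node holding the data
    file_path: str   # file storage address (the "segment address")
    offset: int      # offset within the file (the "offset address")


class AddressIndex:
    """Maps a ledger data identifier to every physical address holding that data."""

    def __init__(self) -> None:
        self._by_id: Dict[str, List[PhysicalAddress]] = {}

    def record(self, ledger_id: str, address: PhysicalAddress) -> None:
        self._by_id.setdefault(ledger_id, []).append(address)

    def lookup(self, ledger_id: str) -> List[PhysicalAddress]:
        # An empty list means no ledger data is stored under this identifier.
        return list(self._by_id.get(ledger_id, []))


index = AddressIndex()
index.record("ledger-7", PhysicalAddress("node-B", "/data/seg-001", 4096))  # first physical address
index.record("ledger-7", PhysicalAddress("node-E", "/data/seg-014", 0))     # second physical address
hosts = [address.host for address in index.lookup("ledger-7")]
```

Keeping all addresses under one identifier lets a query fall back to a replica when the master node is unavailable.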
In some embodiments, the ledger access service module 110 is further configured to receive a first ledger data query request initiated by the blockchain node for querying target ledger data, and to send the first ledger data query request to the ledger data storage service module 120 when it determines, according to a target ledger data identifier carried in the first ledger data query request, that the target ledger data is stored in the storage node cluster; the ledger data storage service module 120 is further configured to, in response to receiving the first ledger data query request, obtain a corresponding target physical address according to the target ledger data identifier, obtain the target ledger data stored in the storage node cluster based on the target physical address, and feed back the target ledger data to the blockchain node.
For example, in a ledger data query/read scenario, a blockchain node may initiate, to the ledger access service module 110, a first ledger data query request for querying/reading target ledger data, where the first ledger data query request may carry a target ledger data identifier.
In response to receiving the first ledger data query request initiated by the blockchain node, the ledger access service module 110 determines whether the target ledger data is stored in the storage node cluster according to the target ledger data identifier carried in the request and the ledger metadata information of all stored ledger data. When it determines that the target ledger data is stored in the storage node cluster, it sends the first ledger data query request to the ledger data storage service module 120. When the target ledger data is not stored in the storage node cluster, it feeds back to the blockchain node a result indicating that no corresponding ledger data exists.
When the ledger data storage service module 120 receives the first ledger data query request, it obtains the corresponding target physical address according to the target ledger data identifier carried in the request and the mapping relationships between each ledger data identifier and the first and second physical addresses, obtains the target ledger data from the storage node corresponding to the target physical address, and feeds back the target ledger data to the blockchain node through the ledger access service module 110.
In this way, in a ledger data reading scenario, whether the target ledger data is stored in the storage node cluster is first determined according to the target ledger data identifier. If it is not, information indicating that no corresponding ledger data exists is fed back directly to the blockchain node. If it is, the target physical address at which the target ledger data is stored is obtained from the ledger data storage service module 120, and the target ledger data is read based on the target physical address.
Since the ledger data storage service module 120 can store the mapping relationships between the ledger data identifier of each piece of ledger data and both the first physical address and the second physical address, the target physical address obtained from the ledger data storage service module 120 in a ledger data reading scenario may be the first physical address, corresponding to the storage master node that stores the ledger data, or the second physical address, corresponding to a storage slave node that stores a replica file of the ledger data.
Thus, in some embodiments, the target physical address comprises a plurality of physical addresses, and the ledger data storage service module 120 is further configured to determine a target query physical address from the plurality of physical addresses according to a preset ledger reading policy, and send a second ledger data query request to a target query storage node corresponding to the target query physical address to query the target ledger data from the target query storage node.
The preset ledger reading policy may include a plurality of selection conditions of different priorities for screening the query address. The selection conditions are applied in order of priority, from highest to lowest, to screen a target query physical address out of the plurality of target physical addresses.
For example, assume that the plurality of target physical addresses storing the target ledger data correspond to storage node B, storage node E, and storage node F respectively. If the selection condition of the first priority is the lowest CPU occupancy, then, when the CPU occupancy rates of storage node B, storage node E, and storage node F decrease in that order, the target physical address corresponding to storage node F may be determined as the target query physical address.
When the CPU occupancy rates of storage node B, storage node E, and storage node F are the same, further screening according to the selection condition of the second priority is required. If the selection condition of the second priority is the lowest disk IO occupancy (which can be understood as how busy the disk is), then, when the disk IO occupancy rates of storage node B, storage node E, and storage node F decrease in that order, the target physical address corresponding to storage node F may be determined as the target query physical address.
When the disk IO occupancy rates of storage node E and storage node F are equal and the lowest, further screening is carried out according to the selection condition of the third priority. If the selection condition of the third priority is the largest remaining memory space, then, when storage node E has more remaining memory space than storage node F, the target physical address corresponding to storage node E may be determined as the target query physical address.
It should be noted that the number of selection conditions, the priority level, and the condition content for screening the query address may be set according to actual requirements, and should not be limited to the above examples.
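The three-level screening in the example above can be sketched with a tuple sort key, since comparing tuples element by element is equivalent to applying the conditions in priority order and only moving to the next condition on a tie. The function name, dictionary keys, and metric values are hypothetical.

```python
from typing import Dict, List


def pick_query_node(candidates: List[Dict]) -> Dict:
    """Pick the node to read from: lowest CPU, then lowest disk IO, then most free space."""
    return min(
        candidates,
        # Negating free_space turns "largest remaining space" into a minimum.
        key=lambda node: (node["cpu"], node["disk_io"], -node["free_space"]),
    )


# Illustrative metrics: equal CPU occupancy, E and F tie on disk IO, E has more free space.
nodes = [
    {"name": "B", "cpu": 0.50, "disk_io": 0.70, "free_space": 100},
    {"name": "E", "cpu": 0.50, "disk_io": 0.30, "free_space": 200},
    {"name": "F", "cpu": 0.50, "disk_io": 0.30, "free_space": 150},
]
chosen = pick_query_node(nodes)["name"]
```

Adding, removing, or reordering conditions only requires changing the tuple, which matches the note that the conditions are configurable.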
A detailed embodiment of the file migration method will be described below with reference to the distributed storage system shown in fig. 2. In this embodiment, the file migration method of the present disclosure is described, by way of example, as applied to the ledger data storage service module 120 in fig. 2.
Optionally, determining, according to at least one of the disk remaining memory information, the disk occupancy rate, the total ledger data volume, and the hotspot data occupation rate of the storage node, whether to perform file migration on at least part of the ledger data stored on the storage node includes: inputting at least one of the disk remaining memory information, the disk occupancy rate, the total ledger data volume, and the hotspot data occupation rate into a trained support vector machine to obtain a result, output by the support vector machine, that characterizes whether to perform file migration processing.
In some embodiments, the ledger data storage services module 120 is further configured to:
monitoring the disk remaining capacity (namely the disk remaining memory), the total ledger data volume, and the hotspot data occupation rate of each storage node; and, for any storage node, determining whether to perform file migration on at least part of the ledger data stored on the storage node according to the disk remaining capacity, the disk occupancy rate, the total ledger data volume, and the hotspot data occupation rate of the storage node.
It is to be explained that the total ledger data amount of any storage node refers to the total data amount of all ledger data that the storage node has stored.
The hotspot data (redis) refers to data that is referenced (i.e., accessed/read/queried) more than a preset number of times.
The hotspot data occupation rate refers to the ratio of the total data amount of the hotspot data to the total ledger data amount.
In some embodiments, whether each ledger data is hot ledger data may be determined by:
setting a hotspot variable parameter for each piece of ledger data; if the ledger data was referenced during the previous unit time length (e.g., 60 seconds), increasing its hotspot variable parameter by a first preset value (e.g., 10, 5, or 1); and if the ledger data was not referenced during the previous unit time length, decreasing its hotspot variable parameter by a second preset value (e.g., 10, 5, or 1). The first preset value and the second preset value may be the same or different. When the hotspot variable parameter of a piece of ledger data is greater than a preset threshold (e.g., 50), the ledger data may be determined to be hotspot data.
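The hotspot bookkeeping described above can be sketched as follows. The function names are hypothetical, and the default step sizes and threshold simply reuse the example values from the text (10, 10, and 50).

```python
from typing import Iterable, Tuple


def update_hot_score(score: int, referenced: bool, step_up: int = 10, step_down: int = 10) -> int:
    """Apply one unit-time update to a ledger item's hotspot variable parameter."""
    return score + step_up if referenced else score - step_down


def is_hot(score: int, threshold: int = 50) -> bool:
    """Ledger data is hotspot data when its parameter exceeds the preset threshold."""
    return score > threshold


def hot_ratio(items: Iterable[Tuple[int, int]], threshold: int = 50) -> float:
    """Ratio of hotspot data volume to total volume; items are (size_bytes, score) pairs."""
    items = list(items)
    total = sum(size for size, _ in items)
    hot = sum(size for size, score in items if is_hot(score, threshold))
    return hot / total if total else 0.0


# An item referenced in six consecutive 60-second intervals reaches a score of 60.
score = 0
for _ in range(6):
    score = update_hot_score(score, referenced=True)
ratio = hot_ratio([(64, score), (192, 20)])  # one hot item of 64 units out of 256
```

With these values the item crosses the threshold only after the sixth referenced interval, since a score of exactly 50 is not greater than the threshold.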
In some embodiments, the determining whether to perform file migration on at least part of ledger data stored on the storage node according to the disk remaining capacity, the disk occupancy, the total ledger data amount, and the hotspot data occupancy rate of the storage node includes:
inputting the disk remaining capacity, the disk occupancy rate, the total ledger data volume, and the hotspot data occupation rate into a trained support vector machine to obtain a result, output by the support vector machine, that characterizes whether to perform file migration.
A support vector machine (SVM) is a classifier trained by supervised learning. The training samples comprise input samples and output samples, where the input samples comprise disk remaining capacity samples, disk occupancy rate samples, total ledger data volume samples, and hotspot data occupation rate samples. Each output sample corresponds to an input sample and characterizes whether to perform file migration.
To facilitate the understanding of the role of the support vector machine by those of ordinary skill in the art, the following is a brief explanation of the principles of the support vector machine:
Since the support vector machine is a generalized linear classifier for binary classification of data, its decision boundary is the maximum-margin hyperplane solved for the learning samples. Thus, a decision boundary $0 = w^{T}X + b = w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4 + b$ may first be defined, where $x_1$ characterizes the disk remaining capacity, $x_2$ characterizes the disk occupancy rate, $x_3$ characterizes the total ledger data volume, $x_4$ characterizes the hotspot data occupation rate, and $w_1$, $w_2$, $w_3$, $w_4$, and $b$ are the parameters to be solved in the process of training the support vector machine. The purpose of training the support vector machine is to find the optimal set of $w_1$, $w_2$, $w_3$, $w_4$, $b$.
It is worth explaining that a decision boundary can be understood as a surface in the feature space that divides the output labels of the support vector machine into two sets, one set characterizing file migration and the other characterizing no file migration. Namely, when file migration is characterized: $w^{T}X + b > 0$, $y_i > 0$. When no file migration is characterized: $w^{T}X + b < 0$, $y_i < 0$. Here $y_i$ is the output label. Further, multiplying $w^{T}X + b > 0$ by $y_i > 0$, or multiplying $w^{T}X + b < 0$ by $y_i < 0$, gives $y_i(w^{T}X + b) > 0$.
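How such a linear decision boundary turns four features into a migrate / do-not-migrate label can be sketched as follows. The weights and bias below are hypothetical placeholders chosen for illustration, not trained values, and treating a score of exactly zero as "no migration" is an assumption, since the text only covers the strictly positive and strictly negative cases.

```python
from typing import Sequence


def decide_migration(features: Sequence[float], weights: Sequence[float], bias: float) -> int:
    """Return 1 (migrate) if w^T x + b > 0, otherwise -1 (do not migrate)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else -1  # score == 0 is mapped to -1 by assumption


# Feature order: disk remaining capacity (GB), disk occupancy, total ledger data (GB), hotspot ratio.
weights = (-0.05, 2.0, 0.01, 1.0)  # hypothetical, not the result of training
bias = -2.0                        # hypothetical

busy_node = decide_migration((20.0, 0.9, 120.0, 0.6), weights, bias)   # score = 0.6
idle_node = decide_migration((400.0, 0.2, 30.0, 0.1), weights, bias)   # score = -21.2
```

In practice the weights and bias would come out of the training procedure that the following paragraphs outline.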
To find the farthest perpendicular distance from the output label to the decision boundary, i.e. to solve for the farthest distance from the variable $X = (x_1, x_2, x_3, x_4)$ to the decision boundary:
Figure BDA0003389833270000161
Wherein,
Figure BDA0003389833270000162
Based on the principle of finding extrema by differentiation, the problem of maximizing
Figure BDA0003389833270000163
can be converted into the problem of minimizing
Figure BDA0003389833270000164
that is:
Figure BDA0003389833270000165
Further, the Lagrange multiplier method is used to find the minimum of the above formula, and the Lagrangian is constructed as follows:
Figure BDA0003389833270000166
wherein,
Figure BDA0003389833270000167
$\mu_i$ is a Lagrange multiplier, which ties the constraint function to the original function. It is worth explaining that the Lagrange multiplier method is a method for finding the extrema of an original function $f(x_1, x_2, \ldots)$ subject to the constraint $g(x_1, x_2, \ldots) = 0$.
Further, a Gaussian kernel function is used to replace $(x_i x_j)$, that is,
Figure BDA0003389833270000168
δ characterizes the standard deviation, δ being 2 and n being 4 in the examples of the present disclosure.
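For orientation, a Gaussian kernel over the four features can be sketched as follows. The kernel formula of the disclosure is not recoverable from the text here, so the sketch assumes the standard Gaussian (RBF) form $\exp(-\lVert a - b\rVert^2 / (2\delta^2))$; the function name is hypothetical and the default $\delta = 2$ is the value stated above.

```python
import math
from typing import Sequence


def gaussian_kernel(a: Sequence[float], b: Sequence[float], delta: float = 2.0) -> float:
    """Standard Gaussian (RBF) kernel: exp(-||a - b||^2 / (2 * delta^2))."""
    if len(a) != len(b):
        raise ValueError("both vectors must have the same number of features")
    squared_distance = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-squared_distance / (2.0 * delta ** 2))


# Four features per sample (n = 4), as in the embodiment.
identical = gaussian_kernel((1.0, 0.5, 3.0, 0.2), (1.0, 0.5, 3.0, 0.2))
two_apart = gaussian_kernel((0.0, 0.0, 0.0, 0.0), (2.0, 0.0, 0.0, 0.0))
```

The kernel equals 1 for identical samples and decays toward 0 as samples move apart, with $\delta$ controlling how quickly.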
Substituting
Figure BDA0003389833270000169
into the formula
Figure BDA00033898332700001610
yields
Figure BDA00033898332700001611
Further, the SMO algorithm (Sequential Minimal Optimization) is used to find the optimal solution of $\mu_1$, $\mu_2$, $\mu_3$, $\mu_4$, as follows:
Figure BDA00033898332700001612
Figure BDA0003389833270000171
Figure BDA0003389833270000172
Figure BDA0003389833270000173
Substituting $|y_i| = 1$, the values of $x_1$, $x_2$, $x_3$, $x_4$,
Figure BDA0003389833270000174
and $\mu_i \geq 0$, the values of $\mu_1$, $\mu_2$, $\mu_3$, $\mu_4$, $y_1$, $y_2$, $y_3$, $y_4$ are calculated. Then, using the KKT conditions (Karush-Kuhn-Tucker conditions, necessary conditions for optimality in constrained optimization that are related to Lagrange multipliers), each parameter to be solved is determined as follows:
$w_1 = \mu_1 x_1 y_1$
$w_2 = \mu_2 x_2 y_2$
$w_3 = \mu_3 x_3 y_3$
$w_4 = \mu_4 x_4 y_4$
Figure BDA0003389833270000175
After an optimal set of $w_1$, $w_2$, $w_3$, $w_4$ is found, the result function of the trained support vector machine can be characterized as:
Figure BDA0003389833270000176
wherein, -1 can characterize no file migration, and 1 can characterize file migration.
It should be noted here that, in some scenarios, one storage node in a storage node cluster may carry a small load while another carries a large load. For example, the ledger data on one storage node may be referenced frequently while the ledger data on another storage node is referenced only rarely, which indicates that the loads of the two storage nodes are unbalanced. Therefore, the load of each storage node needs to be adjusted so that the load among the storage nodes in the storage node cluster is relatively balanced.
Before the load of each storage node is balanced, the load of the storage node needs to be determined. In the manner of the present disclosure described above, the load of any storage node can be judged according to the disk remaining capacity, the disk occupancy rate, the total ledger data volume, and the hotspot data occupation rate of the storage node. When the load of the storage node is determined to be large, file migration processing can be performed on the storage node to reduce its load. Therefore, in the manner of the present disclosure, the load among the storage nodes in the storage node cluster can be made more balanced.
Optionally, performing file migration processing on the storage node includes: determining the storage node as a storage node to be migrated, and calculating the total data amount of the ledger data to be migrated on the storage node to be migrated according to the total ledger data amount stored on the storage node to be migrated, the mean of the total ledger data amounts of the plurality of storage nodes, and a preset influence factor; determining the number of ledger data to be migrated according to the total data amount and the data amount of a single piece of ledger data; determining the corresponding number of ledger data to be migrated from the storage node to be migrated; and, for each piece of ledger data to be migrated, selecting a target migration storage node from the storage nodes other than the storage node to be migrated, and storing the ledger data to be migrated on the target migration storage node.
Illustratively, the ledger data storage services module 120 is further configured to perform the following steps:
when it is determined that file migration is to be performed on at least part of the ledger data stored on the storage node, determining the storage node as a storage node to be migrated, and calculating the total data amount of the ledger data to be migrated on the storage node to be migrated; determining the number of ledger data to be migrated according to the total data amount and the data amount of a single piece of ledger data; and determining the corresponding number of ledger data to be migrated from the storage node to be migrated, so as to perform file migration on the ledger data to be migrated.
In one embodiment, the ledger data storage service module 120 is configured to calculate a total amount of data of ledger data to be migrated on the storage node to be migrated according to the following formula:
$$\beta = \gamma - \mu \cdot \frac{1}{n}\sum_{i=1}^{n}\gamma_i$$
where $\beta$ represents the total data amount of the ledger data to be migrated, $\gamma$ represents the total ledger data amount on the storage node to be migrated, $n$ represents the total number of storage nodes in the storage node cluster, $\gamma_i$ characterizes the total ledger data amount of the $i$-th storage node, and $\mu$ characterizes the influence factor.
Optionally, $\mu = 1.68$.
By way of example, assume that a storage node cluster consists of storage node A, storage node B, and storage node C. Storage node A stores 40GB of ledger data, storage node B stores 20GB of ledger data, and storage node C stores 120GB of ledger data. When storage node C is determined to be the storage node to be migrated, according to the formula
$$\beta = 120 - 1.68 \times \frac{40 + 20 + 120}{3} = 120 - 100.8 = 19.2$$
the total data amount of the ledger data to be migrated is calculated to be 19.2GB (namely 19.2 × 1024 × 1024 × 1024 bytes).
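The calculation in this example can be sketched as follows. The relation used, the node's total minus the influence factor times the cluster mean, is inferred from the symbol definitions and from the worked figures above (40GB, 20GB, 120GB, and 1.68 giving 19.2GB); the function name is hypothetical, and clamping negative results to zero is an added assumption.

```python
from typing import Sequence


def data_to_migrate(node_total: float, all_totals: Sequence[float], influence: float = 1.68) -> float:
    """Total amount to migrate: node total minus influence factor times the cluster mean."""
    mean_total = sum(all_totals) / len(all_totals)
    # Clamping at zero is an assumption for nodes below the threshold; the text does not cover it.
    return max(0.0, node_total - influence * mean_total)


# Worked example from the text, in GB: A = 40, B = 20, C = 120, node C is to be migrated.
to_migrate_gb = data_to_migrate(120.0, [40.0, 20.0, 120.0])
```

With the example values the cluster mean is 60GB, so the result is 120 - 100.8 = 19.2GB, up to floating-point rounding.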
After the total data amount of the ledger data to be migrated on the storage node to be migrated is calculated, the number of ledger data to be migrated can be determined according to the total data amount and the data amount of a single piece of ledger data. It should be noted that the data amount of a single piece of ledger data is preset and may be 64 MB, 128 MB, or the like.
For example, assuming that the data amount of a single piece of ledger data is 64 MB (i.e. 64 × 1024 × 1024 bytes), the number k of ledger data to be migrated can be calculated by the following formula:
Figure BDA0003389833270000191
It should be noted that, in some embodiments, the number of ledger data to be migrated may also be determined according to the total data amount, the data amount of a single piece of ledger data, and a control parameter, where the control parameter is a positive integer. For example, assuming that the data amount of a single piece of ledger data is 64 MB (i.e. 64 × 1024 × 1024 bytes) and the control parameter is 2, the number k of ledger data to be migrated can be calculated by the following formula:
Figure BDA0003389833270000192
or,
Figure BDA0003389833270000193
where [ ] denotes the rounding operator.
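The basic count calculation, without the control parameter, can be sketched as follows. The function name is hypothetical, and rounding down is an assumption: the text names a rounding operator but does not say in which direction it rounds. The control-parameter variants are not sketched because their formulas are not recoverable from the text.

```python
import math

MIB = 1024 * 1024
GIB = 1024 * MIB


def items_to_migrate(total_bytes: float, item_bytes: int = 64 * MIB) -> int:
    """Number of ledger items to migrate: total amount divided by the single-item size."""
    if item_bytes <= 0:
        raise ValueError("item_bytes must be positive")
    # Rounding down is an assumption; the source only refers to "the rounding operator".
    return math.floor(total_bytes / item_bytes)


# Worked values from the text: 19.2GB to migrate, 64 MB per ledger item.
k = items_to_migrate(19.2 * GIB)
```

Here 19.2GB divided by 64 MB is 307.2, which rounds down to 307 items.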
After the number of the account book data to be migrated is determined, the corresponding number of the account book data to be migrated can be determined from the storage node to be migrated, and file migration is performed on the account book data to be migrated.
In some embodiments, the determining, from the to-be-migrated storage node, the corresponding number of to-be-migrated ledger data may include: and randomly determining the corresponding number of the account book data to be migrated from the storage nodes to be migrated.
Optionally, determining the corresponding number of ledger data to be migrated from the storage node to be migrated includes: acquiring the number of times each ledger data on the storage node to be migrated has been queried; calculating the average number of queries from the number of queries of each ledger data on the storage node to be migrated; for each ledger data on the storage node to be migrated, determining the ledger data as candidate ledger data to be migrated when its number of queries is less than the average number of queries, so as to obtain a candidate to-be-migrated ledger data pool; and randomly selecting the corresponding number of ledger data to be migrated from the candidate to-be-migrated ledger data pool.
For example, in some embodiments, the ledger data storage service module 120 may determine the corresponding number of ledger data to be migrated from the storage node to be migrated through the following steps:
acquiring the number of times each ledger data on the storage node to be migrated has been queried;
calculating the average number of queries from the number of queries of each ledger data on the storage node to be migrated;
for each ledger data on the storage node to be migrated, determining the ledger data as candidate ledger data to be migrated when its number of queries is less than the average number of queries, so as to obtain a candidate to-be-migrated ledger data pool;
and randomly selecting the corresponding number of ledger data to be migrated from the candidate to-be-migrated ledger data pool.
For example, assume that ledger data a, ledger data b, ledger data c, and ledger data d are stored on the storage node to be migrated. The historical number of queries is 10 for ledger data a, 15 for ledger data b, 20 for ledger data c, and 25 for ledger data d. The average number of queries is then (10 + 15 + 20 + 25) / 4 = 17.5. Because the numbers of queries of ledger data a and ledger data b are both less than 17.5, ledger data a and ledger data b are determined as candidate ledger data to be migrated, and the resulting candidate to-be-migrated ledger data pool includes ledger data a and ledger data b.
Further, once it is determined that the candidate to-be-migrated ledger data pool includes ledger data a and ledger data b, the corresponding number of ledger data to be migrated may be randomly selected from the pool. For example, assuming that the number is 1, either ledger data a or ledger data b may be determined as the ledger data to be migrated.
In some embodiments, the ledger data with the fewest queries may instead be selected from the candidate to-be-migrated ledger data pool, based on the number of queries of each candidate. For example, assuming that the number is 1, if candidate ledger data a has been queried 10 times and candidate ledger data b has been queried 15 times, then, since 10 is less than 15, candidate ledger data a may be determined as the ledger data to be migrated.
In this way, only ledger data whose number of queries is less than the average number of queries is determined as candidate ledger data to be migrated, so that hot spot data is, as far as possible, not migrated. This avoids data access failures caused by migrating hot spot data, such as the hot spot data being temporarily inaccessible while it is being migrated.
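The candidate selection described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the disclosed implementation: the function names `build_candidate_pool`, `pick_random`, and `pick_least_queried`, and the use of a dictionary mapping ledger identifiers to query counts, are hypothetical.

```python
import random


def build_candidate_pool(query_counts):
    """Return the ledger IDs whose query count is below the node-wide average."""
    if not query_counts:
        return []
    average = sum(query_counts.values()) / len(query_counts)
    return [ledger_id for ledger_id, count in query_counts.items() if count < average]


def pick_random(pool, number, rng=random):
    """Randomly pick up to `number` ledgers from the candidate pool."""
    return rng.sample(pool, min(number, len(pool)))


def pick_least_queried(pool, query_counts, number):
    """Pick the `number` candidates with the fewest queries (the alternative embodiment)."""
    return sorted(pool, key=lambda ledger_id: query_counts[ledger_id])[:number]


# Query counts taken from the worked example in the text.
query_counts = {"a": 10, "b": 15, "c": 20, "d": 25}
pool = build_candidate_pool(query_counts)            # average is 17.5
least = pick_least_queried(pool, query_counts, 1)
chosen = pick_random(pool, 1)
```

With the example counts, the pool contains ledger data a and b; the random variant may return either one, while the least-queried variant returns ledger data a.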
Optionally, the selecting, for each to-be-migrated ledger data, a target migration storage node from storage nodes other than the to-be-migrated storage node includes: determining a first candidate storage node pool except the storage node to be migrated; determining a conflict storage node of a main file or a duplicate file storing the to-be-migrated ledger data; excluding the conflicting storage node from the first candidate storage node pool to obtain a second candidate storage node pool; and selecting the target migration storage node which meets a preset condition from the second candidate storage node pool.
Illustratively, the ledger data storage services module 120 is further configured to perform the following steps:
for each account book data to be migrated, selecting a target migration storage node from other storage nodes except the storage node to be migrated, and sending a file migration request to the target migration storage node; the target migration storage node is configured to store the to-be-migrated ledger data carried in the file migration request, and feed back a third physical address where the to-be-migrated ledger data is stored to the ledger data storage service module, so that the ledger data storage service module updates the first physical address or the second physical address of the to-be-migrated ledger data.
For example, assume that the storage node cluster consists of storage node A, storage node B, storage node C, storage node D, storage node E, and storage node F. If the ledger data storage service module 120 determines storage node B as the storage node to be migrated, then the storage nodes other than storage node B include storage node A, storage node C, storage node D, storage node E, and storage node F. For each ledger data to be migrated, a target migration storage node can be selected from storage node A, storage node C, storage node D, storage node E, and storage node F.
Since the ledger data to be migrated may be a main file or may also be a duplicate file, after the ledger data to be migrated is subjected to file migration, the third physical address storing the ledger data to be migrated needs to be fed back to the ledger data storage service module 120, so that the ledger data storage service module 120 correspondingly updates the first physical address or the second physical address of the ledger data to be migrated.
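The address update described above can be sketched as follows. This is a minimal sketch under stated assumptions: the `LedgerLocation` record, the `node/path` address format, and the helper names are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class LedgerLocation:
    """Hypothetical record mapping one ledger data identifier to its two copies."""
    first_address: str   # location of the main file, assumed "node/path" format
    second_address: str  # location of the duplicate file, assumed "node/path" format


def node_of(address):
    """Extract the storage node name from an assumed "node/path" address."""
    return address.split("/", 1)[0]


def apply_migration(location, source_node, third_address):
    """Replace whichever address points at the source node with the third address."""
    if node_of(location.first_address) == source_node:
        location.first_address = third_address
    elif node_of(location.second_address) == source_node:
        location.second_address = third_address
    else:
        raise ValueError("ledger data is not stored on the source node")
    return location


# The main file lives on node B and is migrated to node C.
loc = LedgerLocation(first_address="B/ledger-a", second_address="F/ledger-a")
apply_migration(loc, source_node="B", third_address="C/ledger-a")
```

Only the address that pointed at the storage node to be migrated changes; the address of the other copy is left untouched.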
In some embodiments, the manner in which the ledger data storage service module 120 selects, for each ledger data to be migrated, a target migration storage node from storage nodes other than the storage node to be migrated, may include the following steps:
querying the corresponding target first physical address and target second physical address according to the ledger data identifier of the ledger data to be migrated, where one of the target first physical address and the target second physical address corresponds to the storage node to be migrated, and the storage node corresponding to the other address is the conflict storage node; excluding the storage nodes corresponding to the target first physical address and the target second physical address to obtain a second candidate storage node pool; and selecting a target migration storage node meeting a preset condition from the second candidate storage node pool.
It should be noted that the main file and a duplicate file of the same ledger data, or two duplicate files of the same ledger data, should not be stored on the same storage node. If they are stored on one storage node, the storage resources of that node are wasted, and neither the main file nor the duplicate file of the ledger data can be accessed once that storage node goes down.
Therefore, for each ledger data to be migrated, copies of the same ledger data (i.e., the main file and a duplicate file of the same ledger data, or two duplicate files of the same ledger data) must be prevented from being stored on the same storage node. To this end, the corresponding target first physical address and target second physical address can be queried, according to the ledger data identifier of the ledger data to be migrated, from the mapping relationships between ledger data identifiers and first physical addresses and between ledger data identifiers and second physical addresses stored in the ledger data storage service module 120. The second candidate storage node pool is obtained by excluding the storage nodes corresponding to the target first physical address and the target second physical address.
After determining the second candidate storage node pool, a target migration storage node meeting a preset condition may be selected from the second candidate storage node pool.
In some embodiments, the preset condition includes at least one of the following: the remaining disk memory is greater than a preset memory, the disk occupancy rate is less than a preset occupancy rate, the hot spot data ratio is less than a preset ratio, and the response duration is less than a preset duration.
For example, the preset memory may be 10 GB, the preset occupancy rate may be 65%, the preset ratio may be 50%, and the preset duration may be 5 seconds. For each candidate storage node in the second candidate storage node pool, when the remaining disk memory of the candidate storage node is greater than 10 GB, its disk occupancy rate is less than 65%, its hot spot data ratio is less than 50%, and its response duration is less than 5 seconds, the candidate storage node may be determined as the target migration storage node.
It should be noted that the response duration is the difference between the time at which the storage node receives a request from a requesting party and the time at which the storage node returns the requested target information to that party. The response duration is used to evaluate the performance of the storage node.
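The preset-condition check can be sketched as follows, using the example thresholds given above. This is a hedged illustration: the `NodeStats` and `Thresholds` types and all field names are hypothetical, and the sketch requires all four conditions at once (as in the example), whereas the text allows any subset of them to be used.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical snapshot of one candidate storage node."""
    name: str
    free_disk_gb: float
    disk_occupancy: float    # fraction between 0.0 and 1.0
    hot_data_ratio: float    # fraction between 0.0 and 1.0
    response_seconds: float


@dataclass
class Thresholds:
    """Example thresholds from the text: 10 GB, 65%, 50%, 5 seconds."""
    min_free_disk_gb: float = 10.0
    max_disk_occupancy: float = 0.65
    max_hot_data_ratio: float = 0.50
    max_response_seconds: float = 5.0


def response_duration(received_at, replied_at):
    """Response duration: reply time minus request-receipt time, in seconds."""
    return replied_at - received_at


def meets_preset_condition(node, limits):
    """Assumed variant: all four conditions must hold simultaneously."""
    return (node.free_disk_gb > limits.min_free_disk_gb
            and node.disk_occupancy < limits.max_disk_occupancy
            and node.hot_data_ratio < limits.max_hot_data_ratio
            and node.response_seconds < limits.max_response_seconds)


limits = Thresholds()
healthy = NodeStats("C", free_disk_gb=50.0, disk_occupancy=0.40,
                    hot_data_ratio=0.20, response_seconds=1.5)
nearly_full = NodeStats("D", free_disk_gb=8.0, disk_occupancy=0.90,
                        hot_data_ratio=0.20, response_seconds=1.5)
eligible = [n.name for n in (healthy, nearly_full) if meets_preset_condition(n, limits)]
```

A deployment that uses only some of the four conditions would drop the corresponding terms from `meets_preset_condition`.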
In order to avoid migrating the ledger data to be migrated to other storage nodes to be migrated, for each ledger data to be migrated, selecting a target migration storage node from other storage nodes except the storage node to be migrated includes: determining a first candidate storage node pool except the storage node to be migrated; determining a conflict storage node of a main file or a duplicate file storing the to-be-migrated ledger data; excluding the conflicting storage node from the first candidate storage node pool to obtain a second candidate storage node pool; excluding other storage nodes to be migrated from the second candidate storage node pool to obtain a third candidate storage node pool; and selecting the target migration storage node which meets a preset condition from the third candidate storage node pool.
For example, the manner of selecting, by the ledger data storage service module 120, for each ledger data to be migrated, a target migration storage node from other storage nodes except the storage node to be migrated may include:
querying the corresponding target first physical address and target second physical address according to the ledger data identifier of the ledger data to be migrated;
excluding the storage nodes corresponding to the target first physical address and the target second physical address to obtain a second candidate storage node pool;
excluding other storage nodes to be migrated from the second candidate storage node pool to obtain a third candidate storage node pool;
and selecting a target migration storage node meeting a preset condition from the third candidate storage node pool.
For example, assume that the storage node cluster consists of storage node A, storage node B, storage node C, storage node D, storage node E, and storage node F, and that the ledger data storage service module 120 determines storage node B and storage node E as storage nodes to be migrated. When file migration is performed on ledger data a on storage node B, the corresponding target first physical address and target second physical address are queried according to the ledger meta information of ledger data a. Assume that the target first physical address corresponds to storage node B and the target second physical address corresponds to storage node F. The second candidate storage node pool obtained after excluding storage node B and storage node F then includes storage node A, storage node C, storage node D, and storage node E. Further, the other storage node to be migrated, storage node E, is excluded from the second candidate storage node pool, and the resulting third candidate storage node pool includes storage node A, storage node C, and storage node D. A target migration storage node meeting the preset condition is then selected from storage node A, storage node C, and storage node D to store ledger data a.
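The successive exclusions in this example can be sketched as follows. The function name `candidate_pools` and its parameters are hypothetical, and node names are plain strings purely for illustration.

```python
def candidate_pools(cluster, source_node, copy_nodes, nodes_to_migrate):
    """Build the first, second, and third candidate storage node pools.

    cluster          -- all storage nodes in the cluster
    source_node      -- the storage node currently being migrated from
    copy_nodes       -- nodes holding the main file or a duplicate of the ledger data
    nodes_to_migrate -- every node that has been marked as a storage node to be migrated
    """
    # First pool: every node except the one being migrated from.
    first_pool = [n for n in cluster if n != source_node]
    # Second pool: drop conflict nodes that already hold a copy of this ledger data.
    second_pool = [n for n in first_pool if n not in copy_nodes]
    # Third pool: drop the other nodes that are themselves to be migrated.
    third_pool = [n for n in second_pool if n not in nodes_to_migrate]
    return first_pool, second_pool, third_pool


# Values from the worked example: B and E are to be migrated; copies live on B and F.
first, second, third = candidate_pools(
    cluster=["A", "B", "C", "D", "E", "F"],
    source_node="B",
    copy_nodes={"B", "F"},
    nodes_to_migrate={"B", "E"},
)
```

The target migration storage node is then chosen from the third pool by applying the preset condition to each remaining node.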
Fig. 4 is a block diagram illustrating a file migration apparatus according to an exemplary embodiment of the present disclosure. The file migration apparatus is applied to a distributed storage system, where the distributed storage system includes a plurality of storage nodes and each storage node is used to store block chain ledger data. As shown in Fig. 4, the apparatus 400 includes:
a determining module 410, configured to determine, for any one of the storage nodes, whether to perform file migration on at least part of the ledger data stored on the storage node according to at least one of information of a disk remaining memory, a disk occupancy rate, a total ledger data volume, and a hot spot data occupancy rate of the storage node when the total ledger data volume stored on the storage node is greater than a total ledger data volume average value of the plurality of storage nodes;
an executing module 420, configured to perform file migration processing on the storage node when it is determined that file migration is performed on at least part of the ledger data stored on the storage node.
With the above apparatus, for any storage node in a distributed storage system that stores block chain ledger data, when the total ledger data volume stored on the storage node is greater than the average of the total ledger data volumes of all storage nodes, the data storage volume of that storage node is unbalanced relative to the other storage nodes. Whether to perform file migration on at least part of the ledger data stored on the storage node is then judged according to at least one of the remaining disk memory, the disk occupancy rate, the total ledger data volume, and the hot spot data ratio of the storage node. When it is determined that file migration is to be performed on at least part of the ledger data stored on the storage node, file migration processing is performed on the storage node. File migration processing reduces the total ledger data volume of the storage node and also reduces the load caused by accesses to the ledger data stored on it. That is, by applying the above approach to each storage node in the distributed storage system, both the data storage volume and the load of the storage nodes become more balanced.
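The two-stage decision summarized above (an above-average check followed by a per-node judgment) can be sketched as follows. This is only a sketch under stated assumptions: the function names and the metric dictionary are hypothetical, and the `decide` callable is a simple stand-in rule, not the trained support vector machine that the disclosure describes for this judgment.

```python
from statistics import mean


def nodes_above_mean(total_ledger_bytes):
    """Return nodes whose total ledger data volume exceeds the cluster average."""
    average = mean(total_ledger_bytes.values())
    return [node for node, total in total_ledger_bytes.items() if total > average]


def nodes_to_migrate(total_ledger_bytes, metrics, decide):
    """Apply the per-node judgment only to nodes that passed the above-average check."""
    return [node for node in nodes_above_mean(total_ledger_bytes) if decide(metrics[node])]


# Illustrative numbers only; they do not come from the disclosure.
totals = {"A": 100, "B": 300, "C": 110, "D": 90}      # cluster average is 150
metrics = {
    "A": {"disk_occupancy": 0.30},
    "B": {"disk_occupancy": 0.90},
    "C": {"disk_occupancy": 0.35},
    "D": {"disk_occupancy": 0.25},
}


def decide(node_metrics):
    """Stand-in rule (assumption): migrate when disk occupancy exceeds 80%."""
    return node_metrics["disk_occupancy"] > 0.80


selected = nodes_to_migrate(totals, metrics, decide)
```

Passing the judgment in as a callable keeps the above-average filter separate from whatever classifier or rule makes the final decision.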
Optionally, the executing module 420 includes:
a first calculation submodule, configured to determine the storage node as a storage node to be migrated, and calculate the total data amount of the ledger data to be migrated on the storage node to be migrated according to the total ledger data volume stored on the storage node to be migrated, the average of the total ledger data volumes of the plurality of storage nodes, and a preset influence factor;
a first determining submodule, configured to determine the number of ledger data to be migrated according to the total data amount and the data amount of a single ledger data;
a second determining submodule, configured to determine the corresponding number of ledger data to be migrated from the storage node to be migrated;
and a migration submodule, configured to select, for each ledger data to be migrated, a target migration storage node from the storage nodes other than the storage node to be migrated, and store the ledger data to be migrated on the target migration storage node.
Optionally, the second determining submodule includes:
an obtaining submodule, configured to obtain the number of times each ledger data on the storage node to be migrated has been queried;
a second calculation submodule, configured to calculate the average number of queries from the number of queries of each ledger data on the storage node to be migrated;
a third determining submodule, configured to determine, for each ledger data on the storage node to be migrated, the ledger data as candidate ledger data to be migrated when its number of queries is less than the average number of queries, so as to obtain a candidate to-be-migrated ledger data pool;
and a first selection submodule, configured to randomly select the corresponding number of ledger data to be migrated from the candidate to-be-migrated ledger data pool.
Optionally, the migration submodule includes:
a fourth determining submodule, configured to determine a first candidate storage node pool except the storage node to be migrated;
a fifth determining submodule, configured to determine a conflict storage node of a main file or a duplicate file in which the to-be-migrated ledger data is stored;
a first exclusion submodule, configured to exclude the conflicting storage node from the first candidate storage node pool to obtain a second candidate storage node pool;
and the second selection submodule is used for selecting the target migration storage node which meets the preset condition from the second candidate storage node pool.
Optionally, the migration submodule includes:
a sixth determining submodule, configured to determine a first candidate storage node pool except the storage node to be migrated;
a seventh determining submodule, configured to determine a conflict storage node of a main file or a duplicate file in which the to-be-migrated ledger data is stored;
a second eliminating submodule, configured to eliminate the conflicting storage node from the first candidate storage node pool to obtain a second candidate storage node pool;
a third eliminating submodule, configured to eliminate other storage nodes to be migrated from the second candidate storage node pool, to obtain a third candidate storage node pool;
and the third selection submodule is used for selecting the target migration storage node which meets the preset condition from the third candidate storage node pool.
Optionally, the preset condition includes at least one of the following: the remaining disk memory is greater than the preset memory, the disk occupancy rate is less than the preset occupancy rate, the hot spot data ratio is less than the preset ratio, and the response duration is less than the preset duration.
Optionally, the determining module 410 includes:
an input submodule, configured to input the remaining disk memory, the disk occupancy rate, the total ledger data volume, and the hot spot data ratio into a trained support vector machine, to obtain a result output by the support vector machine that indicates whether to perform file migration processing.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 5 is a block diagram illustrating an electronic device 700 according to an example embodiment. The electronic device may be ledger data storage services module 120 shown in fig. 2. As shown in fig. 5, the electronic device 700 may include: a processor 701 and a memory 702. The electronic device 700 may also include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.
The processor 701 is configured to control the overall operation of the electronic device 700, so as to complete all or part of the steps of the file migration method. The memory 702 is used to store various types of data to support operation of the electronic device 700, such as instructions for any application or method operating on the electronic device 700 and application-related data, such as contact data, transmitted and received messages, pictures, audio, video, and the like. The memory 702 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 703 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. A received audio signal may further be stored in the memory 702 or transmitted through the communication component 705. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, such as a keyboard, a mouse, or buttons. These buttons may be virtual buttons or physical buttons. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or a combination of one or more of them, which is not limited herein.
Accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module, an NFC module, and the like.
In an exemplary embodiment, the electronic Device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the file migration method described above.
In another exemplary embodiment, a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the file migration method described above is also provided. For example, the computer readable storage medium may be the memory 702 described above including program instructions that are executable by the processor 701 of the electronic device 700 to perform the file migration method described above.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the file migration method described above when executed by the programmable apparatus.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Various simple modifications may be made to the technical solution of the present disclosure within its technical idea, and all such simple modifications fall within the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, various possible combinations will not be separately described in this disclosure.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (10)

1. A file migration method, applied to a distributed storage system, wherein the distributed storage system comprises a plurality of storage nodes, each storage node is used for storing block chain ledger data, and the method comprises:
for any storage node, when the total ledger data volume stored on the storage node is greater than the average of the total ledger data volumes of the plurality of storage nodes, judging whether to perform file migration on at least part of the ledger data stored on the storage node according to at least one of the remaining disk memory, the disk occupancy rate, the total ledger data volume, and the hot spot data ratio of the storage node;
and when it is determined that file migration is to be performed on at least part of the ledger data stored on the storage node, performing file migration processing on the storage node.
2. The method according to claim 1, wherein the performing file migration processing on the storage node comprises:
determining the storage node as a storage node to be migrated, and calculating the total data amount of the ledger data to be migrated on the storage node to be migrated according to the total ledger data volume stored on the storage node to be migrated, the average of the total ledger data volumes of the plurality of storage nodes, and a preset influence factor;
determining the number of ledger data to be migrated according to the total data amount and the data amount of a single ledger data;
determining the corresponding number of ledger data to be migrated from the storage node to be migrated;
and for each ledger data to be migrated, selecting a target migration storage node from the storage nodes other than the storage node to be migrated, and storing the ledger data to be migrated on the target migration storage node.
3. The method according to claim 2, wherein the determining the corresponding number of ledger data to be migrated from the storage node to be migrated comprises:
acquiring the number of times each ledger data on the storage node to be migrated has been queried;
calculating the average number of queries from the number of queries of each ledger data on the storage node to be migrated;
for each ledger data on the storage node to be migrated, determining the ledger data as candidate ledger data to be migrated when its number of queries is less than the average number of queries, so as to obtain a candidate to-be-migrated ledger data pool;
and randomly selecting the corresponding number of ledger data to be migrated from the candidate to-be-migrated ledger data pool.
4. The method of claim 2, wherein the selecting, for each of the ledger data to be migrated, a target migration storage node from storage nodes other than the storage node to be migrated comprises:
determining a first candidate storage node pool except the storage node to be migrated;
determining a conflict storage node of a main file or a duplicate file storing the to-be-migrated ledger data;
excluding the conflicting storage node from the first candidate storage node pool to obtain a second candidate storage node pool;
and selecting the target migration storage node which meets a preset condition from the second candidate storage node pool.
5. The method of claim 2, wherein the selecting, for each of the ledger data to be migrated, a target migration storage node from storage nodes other than the storage node to be migrated comprises:
determining a first candidate storage node pool except the storage node to be migrated;
determining a conflict storage node of a main file or a duplicate file storing the to-be-migrated ledger data;
excluding the conflicting storage node from the first candidate storage node pool to obtain a second candidate storage node pool;
excluding other storage nodes to be migrated from the second candidate storage node pool to obtain a third candidate storage node pool;
and selecting the target migration storage node which meets a preset condition from the third candidate storage node pool.
6. The method according to claim 4 or 5, wherein the preset condition includes at least one of the following: the remaining disk memory is greater than a preset memory, the disk occupancy rate is less than a preset occupancy rate, the hot spot data ratio is less than a preset ratio, and the response duration is less than a preset duration.
7. The method according to any one of claims 1 to 5, wherein the judging whether to perform file migration on at least part of the ledger data stored on the storage node according to at least one of the remaining disk memory, the disk occupancy rate, the total ledger data volume, and the hot spot data ratio of the storage node comprises:
inputting the remaining disk memory, the disk occupancy rate, the total ledger data volume, and the hot spot data ratio into a trained support vector machine, to obtain a result output by the support vector machine that indicates whether to perform file migration processing.
8. A file migration apparatus, wherein the apparatus is applied to a distributed storage system, the distributed storage system includes a plurality of storage nodes, each storage node is used for storing block chain ledger data, and the apparatus includes:
a judging module, configured to, for any storage node, when the total ledger data volume stored on the storage node is greater than the average of the total ledger data volumes of the plurality of storage nodes, judge whether to perform file migration on at least part of the ledger data stored on the storage node according to at least one of the remaining disk memory, the disk occupancy rate, the total ledger data volume, and the hot spot data ratio of the storage node;
and an execution module, configured to perform file migration processing on the storage node when it is determined that file migration is to be performed on at least part of the ledger data stored on the storage node.
9. A non-transitory computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 7.
CN202111460857.8A 2021-12-02 2021-12-02 File migration method and device, storage medium and electronic equipment Pending CN114138711A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111460857.8A CN114138711A (en) 2021-12-02 2021-12-02 File migration method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114138711A 2022-03-04

Family

ID=80387163

Country Status (1)

Country Link
CN (1) CN114138711A (en)

CN113553314A (en) Service processing method, device, equipment and medium of super-convergence system
CN113392067A (en) Data processing method, device and system for distributed database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination