CN117311620A

CN117311620A - Data processing method, device, equipment and storage medium

Info

Publication number: CN117311620A
Application number: CN202311257163.3A
Authority: CN
Inventors: 王帅阳
Original assignee: Jinan Inspur Data Technology Co Ltd
Current assignee: Jinan Inspur Data Technology Co Ltd
Priority date: 2023-09-26
Filing date: 2023-09-26
Publication date: 2023-12-29

Abstract

The invention relates to the technical field of computers and discloses a data processing method, apparatus, device and storage medium applied to an unstructured storage system. The method comprises: acquiring a file to be written and an initial storage pool list, wherein the initial storage pool list stores a plurality of normal storage pools isolated according to storage nodes; for any normal storage pool, identifying the storage pool capacity of the normal storage pool, and determining a reference storage pool and a capacity close difference based on the storage pool capacities of the normal storage pools, the capacity close difference being used to characterize how close the storage pool capacities of the normal storage pools must be to be regarded as being at the same data capacity; screening the normal storage pools in the initial storage pool list that are within the capacity close difference of the reference storage pool to obtain a target storage list; and determining, from the target storage list, a target storage pool for storing the file to be written based on the file information of the file to be written. The invention can reduce the internal overhead of a single storage pool and reduce the impact of storage pool faults on services.

Description

Data processing method, device, equipment and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a data processing method, apparatus, device, and storage medium.

Background

In a separate storage system for distributed unstructured storage, particularly a big data distributed file system, storage pools are generally used to store data. However, in a large-scale distributed unstructured storage system, multiple disks are generally managed in a single storage pool, so the storage pool has too many internal connections, its internal overhead is serious, and it may even fail to operate normally. Moreover, because the storage pool is closely tied to the fault domain, a problem in the storage pool can affect the input and output of the entire storage service.

Disclosure of Invention

In view of this, the present invention provides a data processing method, apparatus, device and storage medium to solve the following problem in existing distributed unstructured storage: because multiple disks are usually managed in a single storage pool, the storage pool has too many internal connections and serious internal overhead, and when the storage pool has a problem, the input and output of the entire namespace service are affected.

In a first aspect, the present invention provides a data processing method applied to an unstructured storage system, the method comprising: acquiring a file to be written and an initial storage pool list, wherein a plurality of storage nodes are stored in the initial storage pool list, at least one normal storage pool is stored in each storage node, and the normal storage pools are isolated according to the storage nodes; for any normal storage pool, identifying the storage pool capacity of the normal storage pool, and determining a reference storage pool and a capacity close difference based on the storage pool capacities of the normal storage pools, wherein the capacity close difference is used to characterize how close the storage pool capacities of the normal storage pools must be to be regarded as being at the same data capacity; screening the normal storage pools in the initial storage pool list that are within the capacity close difference of the reference storage pool to obtain a target storage list; and determining, from the target storage list, a target storage pool for storing the file to be written based on file information of the file to be written. Through this process, the disk data in a single storage pool can be reduced and managed separately, which ensures the stability of the system; meanwhile, grouping the storage pools by node forms new fault domains, so that a single-node fault is recovered and handled inside its own storage pool without affecting other storage pools, the faulty node is effectively isolated, and the impact of storage pool faults on services is reduced.

In some alternative embodiments, determining a target storage pool for storing the file to be written from the target storage list based on file information of the file to be written, includes:

acquiring file information of a file to be written, wherein the file information comprises a file name or a file path;

performing hash operation on the file information to obtain a hash value;

and taking the modulus of the hash value based on the serial numbers of the storage pools in the target storage list to obtain the target storage pool for storing the file to be written.
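The three steps above can be illustrated with a minimal sketch. It assumes that the "serial number" of a pool is simply its index in the target storage list and that the modulus is taken over the length of that list; the source does not name a hash function, so SHA-256 is used here purely as an assumption, and the name `pick_target_pool` is hypothetical.

```python
import hashlib


def pick_target_pool(file_info: str, target_pools: list[str]) -> str:
    """Map a file name or file path to one pool in the target storage list."""
    if not target_pools:
        raise ValueError("target storage list is empty")
    # Assumption: SHA-256 stands in for the unspecified "hash operation".
    digest = hashlib.sha256(file_info.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer hash value.
    hash_value = int.from_bytes(digest[:8], "big")
    # Assumption: the serial number of a pool is its index in the list.
    index = hash_value % len(target_pools)
    return target_pools[index]
```

Because the hash depends only on the file information, the same file name or path maps to the same pool for as long as the target storage list is unchanged; if the list changes between writes, the mapping can change as well.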

In some alternative embodiments, filtering normal storage pools in the initial storage pool list within a close capacity difference based on the reference storage pool to obtain a target storage list includes:

obtaining the storage pool capacity of each normal storage pool in the initial storage pool list;

calculating a capacity difference between a storage pool capacity of the normal storage pool and a storage pool capacity of the reference storage pool;

and comparing the capacity difference with the capacity close difference, and screening the normal storage pools in the initial storage pool list based on the comparison result to obtain a target storage list.

In some alternative embodiments, filtering the normal storage pool in the initial storage pool list based on the comparison result to obtain the target storage list includes:

when the comparison result indicates that the capacity difference is smaller than or equal to the capacity close difference, selecting the normal storage pool from the initial storage pool list;

and obtaining a target storage list based on the screened normal storage pool.

In some alternative embodiments, determining the reference storage pool and the close-capacity difference based on the storage pool capacities of the normal storage pools includes:

acquiring a normal storage pool with the minimum storage pool capacity in an initial storage pool list;

determining a normal storage pool with the minimum storage pool capacity as a reference storage pool;

and determining the capacity close difference based on the differences between the storage pool capacities of the normal storage pools.

In some alternative embodiments, obtaining an initial storage pool list includes:

acquiring storage nodes to which each data disk belongs;

dividing corresponding storage pools for the data disk based on the storage nodes;

determining an original storage pool list based on the storage pool corresponding to the storage node;

an initial storage pool list is determined based on the storage status of each storage pool in the original storage pool list.

In some alternative embodiments, determining the initial storage pool list based on the storage status of each storage pool in the original storage pool list includes:

monitoring the service state of each storage pool to obtain the storage pool state;

and when the storage pool state indicates that the service state of a storage pool is abnormal, removing that storage pool from the original storage pool list to obtain the initial storage pool list.

In a second aspect, the present invention provides a data processing apparatus for use in an unstructured storage system, the apparatus comprising an information acquisition module, an information determination module, a list determination module and a file storage module. The information acquisition module is used for acquiring a file to be written and an initial storage pool list, wherein a plurality of storage nodes are stored in the initial storage pool list, at least one normal storage pool is stored in each storage node, and the normal storage pools are isolated according to the storage nodes. The information determination module is used for identifying, for any normal storage pool, the storage pool capacity of the normal storage pool, and determining a reference storage pool and a capacity close difference based on the storage pool capacities of the normal storage pools, wherein the capacity close difference is used to characterize how close the storage pool capacities of the normal storage pools must be to be regarded as being at the same data capacity. The list determination module is used for screening the normal storage pools in the initial storage pool list that are within the capacity close difference of the reference storage pool to obtain a target storage list. The file storage module is used for determining, from the target storage list, a target storage pool for storing the file to be written based on the file information of the file to be written. Through this process, the disk data in a single storage pool can be reduced and managed separately, which ensures the stability of the system; meanwhile, grouping the storage pools by node forms new fault domains, so that a single-node fault is recovered and handled inside its own storage pool without affecting other storage pools, the faulty node is effectively isolated, and the impact of storage pool faults on services is reduced.

In a third aspect, the present invention provides a computer device comprising a memory and a processor, wherein the memory and the processor are communicatively connected, the memory stores computer instructions, and the processor executes the computer instructions so as to perform the data processing method of the first aspect or any corresponding implementation of the first aspect.

In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the data processing method of the first aspect or any of its corresponding embodiments.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic illustration of an application environment of an embodiment of the present invention;

FIG. 2 is a flow chart of a data processing method according to an embodiment of the present invention;

FIG. 3 is a flow chart of another data processing method according to an embodiment of the present invention;

FIG. 4 is a flow chart of a further data processing method according to an embodiment of the present invention;

FIG. 5 is a flow chart of a method for processing data according to an embodiment of the present invention;

FIG. 6 is a data processing logic diagram of unstructured distributed storage according to an embodiment of the present invention;

FIG. 7 is a block diagram showing the structure of a data processing apparatus according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The terms "first" and "second" in the description and claims of the invention and in the above-mentioned figures are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the term "include" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus. The term "plurality" in the present invention may mean at least two, for example, two, three or more, and embodiments of the present invention are not limited in this respect.

Referring to fig. 1, fig. 1 is a schematic diagram of an application environment according to an embodiment of the present invention, where the schematic diagram includes a storage server 100 that may include a processor 101 and a memory 102. The storage server 100 may be communicatively coupled to a storage management server 200 via a network 300, the storage management server 200 may be configured to provide services (e.g., management services, etc.) for computing programs installed on clients, and a database 201 may be provided on the storage management server 200 or independent of the storage management server 200 for providing data storage services for the storage management server 200. In addition, the storage management server 200 may have a processing engine 202 running therein, the processing engine 202 being operable to perform the steps performed by the storage management server 200.

Alternatively, the storage server 100 may be, but is not limited to, a terminal capable of computing data, such as a mobile terminal (e.g., a tablet computer), a notebook computer or a PC (Personal Computer), and the network may include, but is not limited to, a wireless network or a wired network. The wireless network includes Bluetooth, WIFI (Wireless Fidelity) and other networks that enable wireless communication. The wired network may include, but is not limited to, a wide area network, a metropolitan area network and a storage management server cluster. The storage management server 200 may include, but is not limited to, any hardware device capable of performing calculations.

In addition, in this embodiment, the above-mentioned data processing method may be applied, but not limited to, to an independent processing device with a relatively high processing capability, without performing data interaction. For example, the processing device may be, but is not limited to, a more powerful terminal device, i.e. the individual operations of the data processing method described above may be integrated in a single processing device. The above is merely an example, and is not limited in any way in the present embodiment.

Alternatively, in the present embodiment, the above-described data processing method may be performed by the storage management server 200, may be performed by the storage server 100, or may be performed by both the storage management server 200 and the storage server 100. The data processing method performed by the storage server 100 according to the embodiment of the present invention may be performed by a client installed thereon.

According to an embodiment of the present invention, there is provided a data processing method embodiment, it being noted that the steps shown in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that herein.

In this embodiment, a data processing method is provided that may be used in the above storage server. The storage server includes a namespace, and the namespace is bound to storage pools so that the storage pool into which a file to be written is written can be determined based on the storage space. FIG. 2 is a flowchart of the data processing method according to an embodiment of the present invention. As shown in FIG. 2, the flow includes the following steps:

Step S201, a file to be written and an initial storage pool list are obtained, a plurality of storage nodes are stored in the initial storage pool list, at least one normal storage pool is stored in each storage node, and the normal storage pools are isolated according to the storage nodes.

As described above, the file to be written and the initial storage pool list are acquired so that a target storage pool for storing the file to be written can be selected from the normal storage pools of the storage nodes in the initial storage pool list, which keeps resources balanced among the storage pools and keeps the system running stably. Because the normal storage pools are isolated according to the storage nodes, a single-node fault is recovered and handled inside its own storage pool without affecting other storage pools. In addition, files newly created by the metadata service are written into normal storage pools, so the faulty node is effectively isolated and the impact on services is reduced.

In some alternative embodiments, a file unique identification number may be created based on the file write request, and then a file to be written based on the file unique identification number may be created for writing of file data. When the initial storage pool list is acquired, the storage node to which each data disk belongs can be acquired; dividing corresponding storage pools for the data disk based on the storage nodes; determining an original storage pool list based on the storage pool corresponding to the storage node; and determining an initial storage pool list based on the storage states of all storage pools in the original storage pool list, namely determining a list formed by the storage pools with normal storage states as the initial storage pool list.

Step S202, for any normal storage pool, the storage pool capacity of the normal storage pool is identified, and the reference storage pool and the capacity close difference are determined based on the storage pool capacity of the normal storage pool, and the capacity close difference is used for representing the close difference that the storage pool capacities of the normal storage pools are in the same data capacity.

As described above, the storage pool capacity of each normal storage pool is identified, and the reference storage pool and the capacity close difference are determined based on these capacities, so that the target storage pool for storing the file to be written can be screened from the initial storage pool list based on the reference storage pool and the capacity close difference. This avoids the file system becoming unstable because data is stored unevenly among the storage pools.

In some alternative embodiments, the normal storage pool with the smallest storage pool capacity in the initial storage pool list may be acquired first; the normal storage pool with the smallest storage pool capacity is determined as the reference storage pool; and the capacity close difference is determined based on the differences between the storage pool capacities of the normal storage pools.

Specifically, the storage pool capacity of each normal storage pool in the initial storage pool list may be monitored to obtain the current storage pool capacity of each normal storage pool; the current storage pool capacities are then compared to find the normal storage pool with the minimum storage pool capacity, which is determined as the reference storage pool. To obtain the capacity close difference, the storage pool capacities of the normal storage pools are sorted by size, and the capacities of the normal storage pools at the head and at the tail of the sorted queue are subtracted in turn to obtain a capacity difference data set; the capacity differences in the data set are averaged, and the average is multiplied by an adjustment coefficient. The adjustment coefficient may be obtained by multiplying an error coefficient by the total failure coefficient of the storage pools. Optionally, the capacity close difference may instead be obtained by adjusting each capacity difference in the capacity difference data set based on a storage pool balance degree specified by the user and then taking the average.
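As an illustration only, the calculation described above might look like the following sketch. It rests on one possible reading of the head-and-tail subtraction: the capacities are sorted in descending order and the i-th value from the head is paired with the i-th value from the tail. The function name `reference_and_close_difference` and the single `adjustment` parameter (standing in for the product of the error coefficient and the total failure coefficient) are hypothetical.

```python
from statistics import mean


def reference_and_close_difference(
    capacities: dict[str, float], adjustment: float = 1.0
) -> tuple[str, float]:
    """Return (reference pool name, capacity close difference)."""
    if not capacities:
        raise ValueError("no normal storage pools")
    # The pool with the minimum storage pool capacity is the reference pool.
    reference = min(capacities, key=capacities.get)
    # Sort capacities from largest to smallest.
    ordered = sorted(capacities.values(), reverse=True)
    # Assumed pairing: head[i] minus tail[i], working inwards from both ends.
    diffs = [ordered[i] - ordered[-1 - i] for i in range(len(ordered) // 2)]
    # Average the capacity difference data set and scale it.
    close_difference = mean(diffs) * adjustment if diffs else 0.0
    return reference, close_difference
```

For example, with capacities 100, 80, 60 and 40 the difference data set under this reading is 60 and 20, whose average is 40; an adjustment coefficient of 0.5 then gives a capacity close difference of 20.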

In some optional embodiments, the normal storage pool with the largest remaining capacity in the original storage pool list may be acquired and directly determined as the target storage pool; alternatively, the normal storage pool with the smallest remaining capacity in the original storage pool list may be acquired and directly determined as the target storage pool. As a further alternative, the normal storage pool with the largest remaining capacity may be determined as the reference storage pool.

Step S203, screening the normal storage pools in the initial storage pool list that are within the capacity close difference, based on the reference storage pool, to obtain a target storage list.

As described above, the normal storage pools in the initial storage pool list that are within the capacity close difference are screened based on the reference storage pool, so that a plurality of normal storage pools with similar capacities are selected from the initial storage pool list and the target storage list is obtained from them. This improves the efficiency of determining the target storage pool and keeps resources balanced among the normal storage pools.

In some alternative embodiments, the storage pool capacity of each normal storage pool in the initial storage pool list may be obtained first; the capacity difference between the storage pool capacity of the normal storage pool and the storage pool capacity of the reference storage pool is calculated; and the capacity difference is compared with the capacity close difference, and the normal storage pools in the initial storage pool list are screened based on the comparison result to obtain a target storage list.

When the normal storage pools in the initial storage pool list are screened based on the comparison result to obtain the target storage list, a normal storage pool is selected from the initial storage pool list when the comparison result indicates that its capacity difference is smaller than or equal to the capacity close difference; the target storage list is then obtained from the selected normal storage pools.

Specifically, when the target storage list is acquired, the storage pool capacity of each normal storage pool in the initial storage pool list may be acquired first, and the storage pool capacity of the reference storage pool is subtracted from the storage pool capacity of the normal storage pool to obtain a capacity difference. The capacity difference is then compared with the capacity close difference, the normal storage pools whose capacity difference is smaller than or equal to the capacity close difference are selected from the initial storage pool list, and the target storage list is obtained from the selected normal storage pools.
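A minimal sketch of this screening step follows. The function name `screen_pools` and the dictionary layout are hypothetical, and the absolute value is an assumption added so that the comparison also behaves sensibly when the reference storage pool is not the one with the smallest capacity.

```python
def screen_pools(
    capacities: dict[str, float], reference: str, close_difference: float
) -> list[str]:
    """Keep the pools whose capacity is within close_difference of the reference."""
    ref_capacity = capacities[reference]
    target_list = []
    for name, capacity in capacities.items():
        # Capacity difference between this pool and the reference pool.
        capacity_diff = abs(capacity - ref_capacity)
        # A pool is kept when the difference is smaller than or equal to the limit.
        if capacity_diff <= close_difference:
            target_list.append(name)
    return target_list
```

The reference storage pool always appears in the result because its own capacity difference is zero, so the target storage list is never empty when the reference pool exists.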

Step S204, a target storage pool for storing the file to be written is determined from the target storage list based on the file information of the file to be written.

In this embodiment, the target storage pool for storing the file to be written is determined from the target storage list based on the file information of the file to be written, and the selected target storage pool is recorded in the metadata attributes of the file to be written, so that subsequent data is written into the target storage pool designated for the file. Alternatively, the normal storage pool with the smallest storage pool capacity can be used directly as the target storage pool.

According to the data processing method provided by the embodiment, firstly, the files to be written and the initial storage pool list are acquired so as to conveniently select one target storage pool for storing the written files from all normal storage pools of the initial storage pool list, so that the resource balance among all storage pools is ensured, and the stable operation of the system is ensured; the storage pool capacity of the normal storage pool is identified, and the reference storage pool and the capacity similar difference are determined based on the storage pool capacity of the normal storage pool, so that the target storage pool for storing the files to be written is screened out from the initial storage pool list based on the reference storage pool and the capacity similar difference, and the situation that the file system is unstable due to unbalanced data storage among the storage pools is avoided; the normal storage pools in the initial storage pool list and within the similar difference of capacity are screened based on the reference storage pool, so that a plurality of normal storage pools with similar capacity are selected from the initial storage pool list, a target storage list is obtained based on the plurality of normal storage pools with similar capacity, the determination efficiency of the target storage pool is improved, and the resource balance among the normal storage pools can be ensured; the target storage pool for storing the file to be written is determined from the target storage list based on the file information of the file to be written, so that the selected target storage pool is recorded into the metadata attribute of the file to be written, and the subsequent data is written into the target storage pool appointed by the file to be written. 
Therefore, the invention can reduce the disk data in a single storage pool and manage it separately, which ensures the stability of the system; meanwhile, grouping the storage pools by node forms new fault domains, so that a single-node fault is recovered and handled inside its own storage pool without affecting other storage pools, the faulty node is effectively isolated, and the impact of storage pool faults on services is reduced.

In this embodiment, a data processing method is provided that may be used in the above storage server. The storage server includes a namespace, and the namespace is bound to storage pools so that the storage pool into which a file to be written is written can be determined based on the storage space. FIG. 3 is a flowchart of the data processing method according to an embodiment of the present invention. As shown in FIG. 3, the flow includes the following steps:

Step S301, a file to be written and an initial storage pool list are obtained, a plurality of storage nodes are stored in the initial storage pool list, at least one normal storage pool is stored in each storage node, and the normal storage pools are isolated according to the storage nodes.

Specifically, the step S301 includes:

Step S3011, obtaining storage nodes to which each data disk belongs.

As described above, the storage nodes to which the data disks belong are acquired, so that the storage pool is conveniently divided based on the storage nodes, the isolation of the data disks among different storage nodes can be realized, and the fault domain of the storage nodes is reduced.

In some alternative embodiments, the storage nodes to which the data disks belong may be divided based on the number of data disks, based on the capacity of the data disks, or based on the number, performance and capacity of the data disks together.

When the storage nodes to which the data disks belong are divided based on the number of data disks, the data disks may be distributed evenly among the storage nodes according to the number of storage nodes. When the division is based on the capacity of the data disks, the data disks may be distributed among the storage nodes according to the number of storage nodes and the capacity of each data disk, so that the total storage capacities of the storage nodes are similar. When the division is based on the number, performance and capacity of the data disks, data disks of a first capacity may be assigned to one storage node based on their performance and data disks of a second capacity may be assigned to another storage node based on their performance, thereby keeping the storage capacity balanced among the storage nodes.
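The capacity-based division could be sketched as follows. The source only states the goal (similar total capacity per storage node) and not a procedure, so the greedy rule used here, which places the largest remaining disk on the node with the smallest running total, is an assumption; the name `assign_disks_by_capacity` is hypothetical.

```python
import heapq


def assign_disks_by_capacity(
    disk_capacities: dict[str, int], node_count: int
) -> list[list[str]]:
    """Distribute disks over node_count nodes so node totals stay similar."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    # Min-heap of (current total capacity, node index).
    heap = [(0, i) for i in range(node_count)]
    heapq.heapify(heap)
    nodes: list[list[str]] = [[] for _ in range(node_count)]
    # Assumed heuristic: place the largest disks first.
    by_size = sorted(disk_capacities.items(), key=lambda kv: kv[1], reverse=True)
    for disk, capacity in by_size:
        total, idx = heapq.heappop(heap)  # node with the smallest total so far
        nodes[idx].append(disk)
        heapq.heappush(heap, (total + capacity, idx))
    return nodes
```

This heuristic does not guarantee a perfectly even split, which is in line with the remark that disks which cannot be balanced may need separate handling.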

It can be understood that when the number and capacity of the data disks do not satisfy the relative balanced allocation, the storage capacity of one of the storage nodes can be made larger, and the storage node is preferentially used for writing data later, and the data disk which does not satisfy the balanced allocation can be determined as an independent storage node for storing the data with smaller data volume.

Step S3012, dividing a corresponding storage pool for the data disk based on the storage node.

As above, the data storage pools corresponding to the data disk are divided based on the storage nodes, so that data isolation and fault isolation among different storage pools are realized, and therefore, faults of a single storage node are recovered and processed in the storage pools without affecting other storage pools.

In some alternative embodiments, the plurality of storage nodes may be divided evenly into a plurality of storage pools to reduce the size of each pool, since an overly large pool incurs relatively high internal connection overhead. A grouped multi-pool technique is therefore used: for example, with one storage pool per 10 storage nodes, a 30-node cluster is divided into 3 storage pools, and the 3 storage pools are bound to one namespace so that the target storage pool for storing the file to be written can be determined.
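The grouping in this example (30 nodes, 10 nodes per pool, 3 pools) can be written as a short sketch. The node and pool names, and the function name `group_nodes_into_pools`, are hypothetical and chosen only for illustration.

```python
def group_nodes_into_pools(
    nodes: list[str], nodes_per_pool: int
) -> dict[str, list[str]]:
    """Split the node list into consecutive groups, one storage pool per group."""
    if nodes_per_pool <= 0:
        raise ValueError("nodes_per_pool must be positive")
    pools = {}
    for start in range(0, len(nodes), nodes_per_pool):
        pool_name = f"pool-{start // nodes_per_pool}"
        # The last pool may be smaller if the node count is not a multiple.
        pools[pool_name] = nodes[start:start + nodes_per_pool]
    return pools


# A 30-node cluster with 10 nodes per pool yields 3 storage pools.
cluster = [f"node-{i}" for i in range(30)]
namespace_pools = group_nodes_into_pools(cluster, 10)
```

Here `namespace_pools` plays the role of the set of storage pools bound to one namespace.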

Step S3013, determining an original storage pool list based on the storage pools corresponding to the storage nodes.

As described above, the original storage pool list is determined based on the storage pool corresponding to the storage node, so that the initial storage pool list is determined based on the original storage pool list.

Step S3014, an initial storage pool list is determined based on the storage status of each storage pool in the original storage pool list.

As described above, the initial storage pool list is determined based on the storage status of each storage pool in the original storage pool list, so that the target storage pool for storing the file to be written is determined based on the initial storage pool list.

In some optional embodiments, the service state of each storage pool in the original storage pool list may be monitored to obtain the storage pool state of each storage pool; when the storage pool state indicates that the service state of a storage pool is abnormal, that storage pool is removed from the original storage pool list to obtain the initial storage pool list. An abnormal service state may mean that the number of failed storage nodes in the storage pool has reached a failure threshold, or that the storage pool is not serviceable, where "not serviceable" means that the current storage node or storage pool cannot complete a data write within a threshold time.
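The elimination of abnormal storage pools can be sketched as follows. This is only an illustration under stated assumptions: the `PoolStatus` record, its field names, and the threshold value are hypothetical, and the two abnormal conditions are modelled directly from the description above (failed-node count reaching the threshold, or the pool not being serviceable).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PoolStatus:
    name: str
    failed_nodes: int   # number of failed storage nodes in the pool
    serviceable: bool   # False if a write cannot complete within the threshold time


def build_initial_pool_list(original: List[PoolStatus], failure_threshold: int) -> List[PoolStatus]:
    """Remove pools whose service state is abnormal; keep the others in order."""
    return [
        pool for pool in original
        if pool.failed_nodes < failure_threshold and pool.serviceable
    ]


original_list = [
    PoolStatus("pool-0", failed_nodes=0, serviceable=True),
    PoolStatus("pool-1", failed_nodes=2, serviceable=True),   # reaches the threshold
    PoolStatus("pool-2", failed_nodes=0, serviceable=False),  # not serviceable
]
initial_list = build_initial_pool_list(original_list, failure_threshold=2)
```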

In some optional embodiments, when determining the storage pool state of each storage pool, target data may be used to perform a read-write test on the data disks in each storage node of the storage pool to obtain a test result. When the test result indicates that the target data was read and written successfully, a test-success label is added to the storage node to indicate that its service state is normal; when the test result indicates that reading or writing the target data failed, a test-failure label is added to the storage node to indicate that its service state is abnormal. The read-write test of the target data may be a read-write test of continuous data or a read-write test of random data.
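A read-write test of this kind might look like the sketch below. It assumes, purely for illustration, that each data disk is reachable as a directory path and that random bytes serve as the target data; the label strings, the probe file name, and the function names are hypothetical.

```python
import os
import tempfile
from typing import Dict, List

TEST_OK = "test-success"
TEST_FAILED = "test-failure"


def read_write_test(disk_path: str, target_data: bytes) -> bool:
    """Write the target data to a probe file on the disk and read it back."""
    probe = os.path.join(disk_path, ".rw_probe")  # hypothetical probe file name
    try:
        with open(probe, "wb") as f:
            f.write(target_data)
        with open(probe, "rb") as f:
            ok = f.read() == target_data
        os.remove(probe)
        return ok
    except OSError:
        # Any I/O error counts as a failed read-write test.
        return False


def label_nodes(node_disks: Dict[str, List[str]], target_data: bytes) -> Dict[str, str]:
    """Label a node as successful only if every one of its disks passes the test."""
    return {
        node: TEST_OK if all(read_write_test(d, target_data) for d in disks) else TEST_FAILED
        for node, disks in node_disks.items()
    }


# Demonstration: a temporary directory stands in for a healthy disk,
# and a non-existent path stands in for a faulty one.
with tempfile.TemporaryDirectory() as healthy_disk:
    labels = label_nodes(
        {"node-a": [healthy_disk], "node-b": [os.path.join(healthy_disk, "missing")]},
        target_data=os.urandom(4096),
    )
```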

In step S302, for any normal storage pool, the storage pool capacity of the normal storage pool is identified, and a reference storage pool and a capacity close difference are determined based on the storage pool capacities of the normal storage pools, where the capacity close difference characterizes the difference within which the storage pool capacities of the normal storage pools are regarded as being at a similar data capacity.

For details, refer to step S202 in the embodiment shown in fig. 2; they are not repeated here.

Step S303, screening the normal storage pool in the initial storage pool list, which is in the capacity close difference, based on the reference storage pool to obtain a target storage list.

For details, refer to step S203 in the embodiment shown in fig. 2; they are not repeated here.

Step S304, determining a target storage pool for storing the file to be written from the target storage list based on the file information of the file to be written.

In this embodiment, the target storage pool for storing the file to be written is determined from the target storage list based on the file information of the file to be written. The selected target storage pool is then recorded in the metadata attributes of the file to be written, so that subsequent data is written to the target storage pool designated for that file.

For details, refer to step S204 in the embodiment shown in fig. 2; they are not repeated here.

According to the data processing method provided by this embodiment, the file to be written and the initial storage pool list are first acquired, so that a target storage pool for storing the file to be written can be selected from the normal storage pools in the initial storage pool list, which keeps resources balanced among the storage pools and keeps the system running stably. The storage pool capacity of each normal storage pool is identified, and the reference storage pool and the capacity close difference are determined based on those capacities, so that candidate storage pools for the file to be written are screened from the initial storage pool list based on the reference storage pool and the capacity close difference, which avoids file system instability caused by unbalanced data storage among the storage pools. The normal storage pools in the initial storage pool list that are within the capacity close difference are screened based on the reference storage pool, so that a plurality of normal storage pools with similar capacities are selected and the target storage list is obtained from them, which improves the efficiency of determining the target storage pool and keeps resources balanced among the normal storage pools. The target storage pool for storing the file to be written is determined from the target storage list based on the file information of the file to be written, the selected target storage pool is recorded in the metadata attributes of the file, and subsequent data is written to the target storage pool designated for that file.
Therefore, the invention can reduce the number of disks in each storage pool, handle the disks in separate groups, and keep the system stable. Meanwhile, after the storage pools are grouped by node, new fault domains are formed: a single-node fault is recovered and handled within its own storage pool without affecting other storage pools, so that faulty nodes are effectively isolated and the influence of storage pool faults on services is reduced.

In this embodiment, a data processing method is provided, which may be used in the above storage server. The storage server includes a namespace, and the namespace is bound to storage pools so that the storage pool to which a file is to be written is determined based on the namespace. Fig. 4 is a flowchart of the data processing method according to an embodiment of the present invention. As shown in fig. 4, the flow includes the following steps:

Step S401, a file to be written and an initial storage pool list are acquired, where a plurality of storage nodes are stored in the initial storage pool list, at least one normal storage pool is stored in each storage node, and the normal storage pools are isolated according to the storage nodes.

For details, refer to step S301 in the embodiment shown in fig. 3; they are not repeated here.

In step S402, for any normal storage pool, the storage pool capacity of the normal storage pool is identified, and a reference storage pool and a capacity close difference are determined based on the storage pool capacities of the normal storage pools, where the capacity close difference characterizes the difference within which the storage pool capacities of the normal storage pools are regarded as being at a similar data capacity.

Step S403, screening the normal storage pool in the initial storage pool list within the capacity close difference based on the reference storage pool to obtain a target storage list.

Specifically, step S403 includes:

Step S4031, the storage pool capacity of each normal storage pool in the initial storage pool list is acquired.

As described above, the storage pool capacity of each normal storage pool in the initial storage pool list is acquired so that the target storage list can be determined based on these capacities.

Step S4032, the capacity difference between the storage pool capacity of the normal storage pool and the storage pool capacity of the reference storage pool is calculated.

As described above, the capacity difference between the storage pool capacity of the normal storage pool and that of the reference storage pool is calculated, so that normal storage pools with similar storage capacities can be identified based on the capacity difference.

In some alternative embodiments, the storage pool capacity of each normal storage pool in the initial storage pool list may be obtained first, and the storage pool capacity of the reference storage pool may then be subtracted from it to obtain the capacity difference.

Step S4033, the capacity difference is compared with the capacity close difference, and the normal storage pools in the initial storage pool list are screened based on the comparison result to obtain the target storage list.

As described above, the capacity difference is compared with the capacity close difference, and the normal storage pools in the initial storage pool list are screened based on the comparison result, so that the target storage list is obtained from a plurality of normal storage pools with similar capacities, which improves the efficiency of determining the target storage pool and keeps resources balanced among the normal storage pools.

In some optional embodiments, when the comparison result indicates that the capacity difference is less than or equal to the capacity close difference, the normal storage pool is selected from the initial storage pool list, and the target storage list is obtained from the selected normal storage pools.
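Steps S4031 to S4033 can be sketched as a small filter. The sketch assumes that the reference storage pool is the normal storage pool with the minimum storage pool capacity (as in the reference storage pool determining unit described elsewhere in this document) and that capacities and the capacity close difference are given in GB; the function and variable names are hypothetical.

```python
from typing import Dict, List


def select_target_list(pool_capacity_gb: Dict[str, float], close_difference_gb: float) -> List[str]:
    """Keep the pools whose capacity is within the close difference of the reference pool."""
    if not pool_capacity_gb:
        return []
    # Assumption: the reference pool is the one with the minimum capacity.
    reference = min(pool_capacity_gb, key=pool_capacity_gb.get)
    reference_capacity = pool_capacity_gb[reference]
    return [
        name for name, capacity in pool_capacity_gb.items()
        if capacity - reference_capacity <= close_difference_gb
    ]


capacities = {"pool-0": 120.0, "pool-1": 150.0, "pool-2": 400.0}
target_list = select_target_list(capacities, close_difference_gb=50.0)
# pool-0 is the reference; pool-1 differs by 30 GB and is kept; pool-2 differs by 280 GB and is dropped.
```

The reference pool always passes the comparison, since its own capacity difference is zero, so the target storage list is never empty when at least one normal storage pool exists.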

In step S404, a target storage pool for storing the file to be written is determined from the target storage list based on the file information of the file to be written.

In this embodiment, a data processing method is provided, which may be used in the above storage server. The storage server includes a namespace, and the namespace is bound to storage pools so that the storage pool to which a file is to be written is determined based on the namespace. Fig. 5 is a flowchart of the data processing method according to an embodiment of the present invention. As shown in fig. 5, the flow includes the following steps:

Step S501, a file to be written and an initial storage pool list are acquired, where a plurality of storage nodes are stored in the initial storage pool list, at least one normal storage pool is stored in each storage node, and the normal storage pools are isolated according to the storage nodes.

For details, refer to step S201 in the embodiment shown in fig. 2; they are not repeated here.

Step S502, for any normal storage pool, the storage pool capacity of the normal storage pool is identified, and a reference storage pool and a capacity close difference are determined based on the storage pool capacities of the normal storage pools, where the capacity close difference characterizes the difference within which the storage pool capacities of the normal storage pools are regarded as being at a similar data capacity.

Step S503, based on the reference storage pool, screening the normal storage pool in the initial storage pool list, which is in the capacity close difference, to obtain the target storage list.

For details, refer to step S403 in the embodiment shown in fig. 4; they are not repeated here.

In step S504, a target storage pool for storing the file to be written is determined from the target storage list based on the file information of the file to be written.

Specifically, step S504 includes:

Step S5041, file information of the file to be written is acquired, the file information including a file name or a file path.

As described above, the file information of the file to be written is acquired so that the target storage pool can be determined based on the file name or the file path of the file to be written.

In step S5042, a hash operation is performed on the file information to obtain a hash value.

As described above, performing the hash operation on the file name or the file path of the file to be written allows a unique target storage pool to be determined in the target storage list, which ensures that each file maps to exactly one storage pool.

Step S5043, the hash value is taken modulo based on the numbers of the storage pools in the target storage list to obtain the target storage pool for storing the file to be written.

As described above, hashing the file name of the file to be written to a unique storage pool in the target storage list ensures the reliability of allocating a storage pool to the file to be written.

In some alternative embodiments, the file name of the file to be written may be obtained first; a hash operation is performed on the file name to obtain a hash value; and the hash value is taken modulo based on the numbers of the storage pools in the target storage list to obtain the target storage pool for storing the file to be written. It will be appreciated that hashing the file name of the file to be written to a unique storage pool in the target storage list ensures the reliability of allocating a storage pool to the file to be written.

In some alternative embodiments, the file path of the file to be written may be obtained first; a hash operation is performed on the file path to obtain a hash value; and the hash value is taken modulo based on the numbers of the storage pools in the target storage list to obtain the target storage pool for storing the file to be written. It will be appreciated that hashing the file path of the file to be written to a unique storage pool in the target storage list ensures the reliability of allocating a storage pool to the file to be written. Alternatively, a storage pool may be selected from the target storage list by a random function and determined as the target storage pool.
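The hash-and-modulo selection of steps S5041 to S5043 can be sketched as below. The embodiment does not name a hash function; SHA-256 is used here only as an assumption, and the pool numbers are taken to be the positions of the pools in the target storage list. The function name and example paths are hypothetical.

```python
import hashlib
from typing import List


def pick_target_pool(file_info: str, target_list: List[str]) -> str:
    """Map a file name or file path to exactly one pool in the target storage list."""
    if not target_list:
        raise ValueError("target storage list is empty")
    # Assumption: SHA-256 as the hash operation; any stable hash would serve.
    digest = hashlib.sha256(file_info.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest[:8], "big")
    # Modulo the number of pools gives an index (pool number) in the list.
    return target_list[hash_value % len(target_list)]


candidates = ["pool-0", "pool-1"]
first = pick_target_pool("/data/logs/app.log", candidates)
second = pick_target_pool("/data/logs/app.log", candidates)
```

A stable hash is used instead of Python's built-in `hash()`, which is randomized per process for strings and would not give the same pool across restarts. Note that the mapping depends on the contents of the target storage list, which is why the selected pool is recorded in the file's metadata rather than recomputed later.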

In some optional embodiments, the content corresponding to the file information of the file to be written may be obtained and encoded to obtain a content feature vector of the file to be written, and the content feature vector is taken modulo the maximum number in the target storage list to obtain the target storage pool for storing the file to be written.

In some alternative embodiments, the logic of an unstructured distributed storage namespace multi-pool storage deployment is shown in fig. 6. First, data pools are established for different node groups, and nodes are isolated among the data pools. The namespace is bound to storage pools, and information such as the designated storage pools and the policy is persisted to the metadata storage pool through the configuration management node. The monitoring node periodically collects the states of all storage pools and synchronizes them to the metadata service. The metadata service loads the configuration information at startup and, when processing a create request, determines the final storage pool of the file through file attribution calculation according to the monitored states and the configured policy.

Further, for the monitoring node: the monitoring node periodically sends a query request to each storage pool; the storage pool collects its own state and returns it to the monitoring node. The collected information mainly includes the storage pool water level (capacity usage) and the storage pool state. After collecting the storage pool information, the monitoring node synchronizes the basic storage pool information to the metadata service.

Further, for the configuration management node: a user configures the namespace multi-storage-pool policy. The multi-storage-pool configuration of a namespace mainly includes the storage pool list specified for the namespace, the configuration policy, and the capacity close difference, which is expressed in GB. The configuration policy is mainly a capacity policy, under which data is written to storage pools with lower capacity so that the capacities of the storage pools stay balanced; other policies may be supported later. The configuration policy is serialized and stored in the metadata storage pool, and the configuration is refreshed to the metadata server.
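As a rough illustration of serializing such a configuration, the sketch below models the three configured items as a record and round-trips it through JSON. The embodiment does not specify a serialization format; JSON, the `NamespacePoolConfig` class, its field names, and the policy string "capacity" are all assumptions made for this example.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class NamespacePoolConfig:
    namespace: str
    pool_list: List[str]        # storage pools specified for the namespace
    policy: str                 # e.g. a capacity policy
    close_difference_gb: int    # capacity close difference, in GB


def serialize_config(config: NamespacePoolConfig) -> bytes:
    """Serialize the configuration before storing it in the metadata storage pool."""
    return json.dumps(asdict(config), sort_keys=True).encode("utf-8")


def deserialize_config(raw: bytes) -> NamespacePoolConfig:
    """Rebuild the configuration, for example when the metadata service loads it."""
    return NamespacePoolConfig(**json.loads(raw.decode("utf-8")))


config = NamespacePoolConfig(
    namespace="ns-demo",
    pool_list=["pool-0", "pool-1", "pool-2"],
    policy="capacity",
    close_difference_gb=50,
)
stored = serialize_config(config)
loaded = deserialize_config(stored)
```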

Further, for the metadata service: when a service request for writing a file is received, a unique identifier of the file is created based on the service request, and file attribution calculation begins. First, the metadata service excludes abnormal storage pools according to the storage pool state information pushed by the monitoring node. It then selects the storage pool with the smallest capacity and, according to the capacity close difference, screens a list of storage pools with similar capacities from the normal storage pool list. Based on the file name, a unique designated storage pool, namely the target storage pool, is obtained by hashing into the list of storage pools with similar capacities; the target storage pool is bound to the file attributes and persisted to the metadata pool. Subsequent file data is written to the designated data pool.
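The file attribution calculation described for the metadata service can be sketched end to end as follows. This is a simplified illustration, not the actual metadata service: the `PoolInfo` record, the use of SHA-256, and the in-memory dictionary standing in for the persisted file attributes are all assumptions.

```python
import hashlib
from typing import Dict, List, NamedTuple


class PoolInfo(NamedTuple):
    name: str
    capacity_gb: float
    normal: bool  # state pushed by the monitoring node


def assign_pool(file_name: str, pools: List[PoolInfo], close_difference_gb: float,
                file_metadata: Dict[str, str]) -> str:
    """Exclude abnormal pools, filter by capacity, hash the file name, record the result."""
    normal_pools = [p for p in pools if p.normal]
    if not normal_pools:
        raise RuntimeError("no normal storage pool available")
    # The pool with the smallest capacity serves as the reference.
    reference = min(normal_pools, key=lambda p: p.capacity_gb)
    similar = [p.name for p in normal_pools
               if p.capacity_gb - reference.capacity_gb <= close_difference_gb]
    # Assumption: SHA-256 as the hash operation.
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    target = similar[int.from_bytes(digest[:8], "big") % len(similar)]
    # Stand-in for binding the pool to the file attributes and persisting them.
    file_metadata[file_name] = target
    return target


pool_states = [
    PoolInfo("pool-0", 120.0, True),
    PoolInfo("pool-1", 130.0, True),
    PoolInfo("pool-2", 90.0, False),   # abnormal, excluded even though it is the smallest
    PoolInfo("pool-3", 500.0, True),   # normal, but outside the close difference
]
metadata: Dict[str, str] = {}
chosen = assign_pool("report.csv", pool_states, close_difference_gb=50.0, file_metadata=metadata)
```

In this example only pool-0 and pool-1 remain as candidates, so every file name maps to one of those two pools.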

In this embodiment, a data processing device is further provided. The device is used to implement the foregoing embodiments and preferred embodiments, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

The present embodiment provides a data processing apparatus, as shown in fig. 7, including:

the information obtaining module 701 is configured to obtain a file to be written and an initial storage pool list, where a plurality of storage nodes are stored in the initial storage pool list, each storage node stores at least one normal storage pool, and the normal storage pools are isolated according to the storage nodes.

The information determining module 702 is configured to identify, for any normal storage pool, the storage pool capacity of the normal storage pool, and determine, based on the storage pool capacities of the normal storage pools, a reference storage pool and a capacity close difference, where the capacity close difference characterizes the difference within which the storage pool capacities of the normal storage pools are regarded as being at a similar data capacity.

The list determining module 703 is configured to filter, based on the reference storage pool, normal storage pools in the initial storage pool list that are within the capacity close difference, to obtain a target storage list.

The file storage module 704 is configured to determine a target storage pool for storing the file to be written from the target storage list based on the file information of the file to be written.

In some alternative embodiments, the file storage module 704 includes:

The file information acquisition unit is configured to acquire file information of the file to be written, where the file information includes a file name or a file path.

The hash operation unit is configured to perform a hash operation on the file information to obtain a hash value.

The storage pool determining unit is configured to take the hash value modulo based on the numbers of the storage pools in the target storage list to obtain the target storage pool for storing the file to be written.

In some alternative embodiments, the list determination module 703 includes:

The storage pool capacity acquisition unit is configured to acquire the storage pool capacity of each normal storage pool in the initial storage pool list.

The capacity difference calculation unit is configured to calculate the capacity difference between the storage pool capacity of the normal storage pool and the storage pool capacity of the reference storage pool.

The difference comparison unit is configured to compare the capacity difference with the capacity close difference, and screen the normal storage pools in the initial storage pool list based on the comparison result to obtain a target storage list.

In some alternative embodiments, the difference comparison unit includes:

The storage pool screening subunit is configured to select the normal storage pool from the initial storage pool list when the comparison result indicates that the capacity difference is less than or equal to the capacity close difference.

The storage list determining subunit is configured to obtain a target storage list based on the selected normal storage pools.

In some alternative embodiments, the information determination module 702 includes:

The storage pool acquisition unit is configured to acquire the normal storage pool with the minimum storage pool capacity in the initial storage pool list.

The reference storage pool determining unit is configured to determine the normal storage pool with the minimum storage pool capacity as the reference storage pool.

The close difference determining unit is configured to determine the capacity close difference based on the storage pool capacity differences of the normal storage pools.

In some alternative embodiments, the information acquisition module 701 includes:

The storage node acquisition unit is configured to acquire the storage node to which each data disk belongs.

The storage pool dividing unit is configured to divide corresponding storage pools for the data disks based on the storage nodes.

The first list determining unit is configured to determine an original storage pool list based on the storage pools corresponding to the storage nodes.

The second list determining unit is configured to determine an initial storage pool list based on the storage status of each storage pool in the original storage pool list.

In some alternative embodiments, the second list determining unit includes:

The storage pool state monitoring subunit is configured to monitor the service state of the storage pool to obtain the storage pool state.

The storage pool list determining subunit is configured to, when the storage pool state indicates that the service state of a storage pool is abnormal, remove that storage pool from the original storage pool list to obtain the initial storage pool list.

Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.

The data processing apparatus in this embodiment is presented in the form of a functional unit, where a unit refers to an ASIC (application specific integrated circuit) circuit, a processor and a memory that execute one or more software or firmware programs, and/or other devices that can provide the above-described functions.

The embodiment of the invention also provides computer equipment, which is provided with the data processing device shown in the figure 7.

Referring to fig. 8, which is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention, the computer device includes: one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as an array of storage servers, a set of blade storage servers, or a multiprocessor system). One processor 10 is illustrated in fig. 8.

The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, generic array logic, or any combination thereof.

The memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to implement the method of the embodiments described above.

The memory 20 may include a storage program area and a storage data area, where the storage program area may store an operating system and at least one application program required for a function, and the storage data area may store data created through use of the computer device, and the like. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, the memory 20 may optionally include memory located remotely from the processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, server clusters, mobile communication networks, and combinations thereof.

Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.

The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.

The embodiments of the present invention also provide a computer readable storage medium. The method according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as computer code that is recorded on a storage medium, or that is originally stored in a remote storage medium or a non-transitory machine readable storage medium, downloaded through a network, and stored in a local storage medium, so that the method described herein can be processed by such software stored on a storage medium using a general purpose computer, a special purpose processor, or programmable or special purpose hardware. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk, or the like; further, the storage medium may also comprise a combination of memories of the kinds described above. It will be appreciated that a computer, processor, microprocessor controller, or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the methods illustrated by the above embodiments.

Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims

1. A data processing method for application to an unstructured storage system, the method comprising:

acquiring a file to be written and an initial storage pool list, wherein a plurality of storage nodes are stored in the initial storage pool list, at least one normal storage pool is stored in each storage node, and the normal storage pools are isolated according to the storage nodes;

identifying the storage pool capacity of the normal storage pool for any normal storage pool, and determining a reference storage pool and a capacity close difference based on the storage pool capacity of the normal storage pool, wherein the capacity close difference characterizes the difference within which the storage pool capacities of the normal storage pools are regarded as being at a similar data capacity;

screening normal storage pools in the initial storage pool list within the capacity close difference based on the reference storage pool to obtain a target storage list;

and determining a target storage pool for storing the file to be written from the target storage list based on the file information of the file to be written.

2. The method of claim 1, wherein the determining a target storage pool for storing the file to be written from the target storage list based on file information of the file to be written comprises:

acquiring file information of the file to be written, wherein the file information comprises a file name or a file path;

performing hash operation on the file information to obtain a hash value;

and taking the modulus of the hash value based on the serial numbers of all the storage pools in the target storage list to obtain the target storage pool for storing the file to be written.

3. The method of claim 1, wherein the screening normal storage pools in the initial storage pool list that are within the capacity close difference based on the reference storage pool to obtain a target storage list comprises:

calculating a capacity difference between the storage pool capacity of the normal storage pool and the storage pool capacity of the reference storage pool;

and comparing the capacity difference with the capacity close difference, and screening a normal storage pool in the initial storage pool list based on a comparison result to obtain a target storage list.

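4. The method of claim 3, wherein the screening the normal storage pools in the initial storage pool list based on the comparison result to obtain the target storage list comprises:

when the comparison result indicates that the capacity difference is less than or equal to the capacity close difference, selecting the normal storage pool from the initial storage pool list;

and obtaining the target storage list based on the selected normal storage pool.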

5. The method of claim 1, wherein the determining a reference storage pool and a capacity close difference based on a storage pool capacity of the normal storage pool comprises:

acquiring a normal storage pool with the minimum storage pool capacity in the initial storage pool list;

determining the normal storage pool with the minimum storage pool capacity as the reference storage pool;

and determining the capacity close difference based on the storage pool capacity difference of each normal storage pool.

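6. The method of claim 1, wherein acquiring the initial storage pool list comprises:

acquiring the storage node to which each data disk belongs;

dividing corresponding storage pools for the data disks based on the storage nodes;

determining an original storage pool list based on the storage pools corresponding to the storage nodes;

and determining the initial storage pool list based on the storage status of each storage pool in the original storage pool list.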

7. The method of claim 6, wherein determining the initial storage pool list based on the storage status of each storage pool in the original storage pool list comprises:

monitoring the service state of the storage pool to obtain a storage pool state;

and when the storage pool state represents that the service state of the storage pool is abnormal, eliminating the storage pool with the abnormal service state in the original storage pool list to obtain the initial storage pool list.

8. A data processing apparatus for application to an unstructured storage system, the apparatus comprising:

the information acquisition module is used for acquiring a file to be written and an initial storage pool list, wherein a plurality of storage nodes are stored in the initial storage pool list, at least one normal storage pool is stored in each storage node, and the normal storage pools are isolated according to the storage nodes;

the information determining module is used for identifying the storage pool capacity of the normal storage pool for any normal storage pool, and determining a reference storage pool and a capacity close difference based on the storage pool capacity of the normal storage pool, wherein the capacity close difference characterizes the difference within which the storage pool capacities of the normal storage pools are regarded as being at a similar data capacity;

The list determining module is used for screening normal storage pools in the initial storage pool list and within the capacity close difference based on the reference storage pool to obtain a target storage list;

and the file storage module is used for determining a target storage pool for storing the file to be written from the target storage list based on the file information of the file to be written.

9. A computer device, comprising:

a memory and a processor in communication with each other, the memory having stored therein computer instructions which, upon execution, cause the processor to perform the method of any of claims 1 to 7.

10. A computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 7.