WO2023029485A1 - Data processing method, apparatus, computer device, and computer-readable storage medium - Google Patents


Info

Publication number
WO2023029485A1
WO2023029485A1 PCT/CN2022/086063 CN2022086063W WO2023029485A1 WO 2023029485 A1 WO2023029485 A1 WO 2023029485A1 CN 2022086063 W CN2022086063 W CN 2022086063W WO 2023029485 A1 WO2023029485 A1 WO 2023029485A1
Authority
WO
WIPO (PCT)
Prior art keywords
gateway
gateway device
data processing
storage
cluster
Prior art date
Application number
PCT/CN2022/086063
Other languages
English (en)
French (fr)
Inventor
沈姝
吴启庆
杨建�
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to EP22862631.3A priority Critical patent/EP4383076A1/en
Publication of WO2023029485A1 publication Critical patent/WO2023029485A1/zh
Priority to US18/590,120 priority patent/US20240205292A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices

Definitions

  • the present application relates to the field of storage technologies, and in particular to a data processing method, device, computer equipment, and computer-readable storage medium.
  • a storage system with a storage-computing separation architecture has evolved on the basis of a storage system with a storage-computing integrated architecture.
  • a storage system that separates storage from computing includes a computing cluster, a load balancing cluster, and a storage cluster.
  • the computing cluster accesses the storage cluster through the data forwarding cluster.
  • the process of data processing in a storage system that separates storage from computing can be as follows: a computing node in the computing cluster sends a data processing request to the load balancing device in the data forwarding cluster; the load balancing device uses a load balancing policy to select a gateway device from multiple gateway devices and forwards the data processing request to the selected gateway device; the gateway device sends the data processing request to a storage node in the storage cluster; and the storage node processes the data processing request, for example by reading or storing data.
  • the load balancing device in the data forwarding cluster can be a load balancer, for example the F5 load balancer provided by F5 Networks or the Nginx load balancer; a Linux virtual server (LVS) can also be deployed.
  • the present application provides a data processing method, device, computer equipment, and storage medium, capable of constructing a storage system that separates storage from computation.
  • the technical solution is as follows:
  • in a first aspect, a data processing method is provided. The method is applied to a storage system that separates storage from computing, the storage system includes a computing cluster and a storage cluster, and the method is executed by a computing node in the computing cluster. The method includes: receiving a data processing request, the data processing request indicating processing of a file; determining a first gateway device from a plurality of gateway devices in the storage cluster; and sending the data processing request to the first gateway device, where the first gateway device forwards the data processing request to a storage node in the storage cluster to process the file.
  • in this method, the computing node in the computing cluster implements the function of the load balancer: it selects a gateway device from the multiple gateway devices in the storage cluster and sends the data processing request to it. The data processing request therefore does not need to pass through a load balancing cluster to have a gateway device selected, so no load balancing cluster has to be deployed when building a storage system that separates storage from computing, which reduces the cost of building such a system.
  • the computing node records the correspondence between the plurality of gateway devices and indexes, and each of the plurality of gateway devices corresponds to one index; determining the first gateway device from the plurality of gateway devices in the storage cluster includes: performing hash calculation on the identifier of the file carried in the data processing request to obtain a hash value of the file; acquiring a target index based on the hash value; and, based on the correspondence between the plurality of gateway devices and the indexes, determining the gateway device corresponding to the target index among the plurality of gateway devices as the first gateway device.
  • the correspondence between the plurality of gateway devices and the indexes includes an identifier of each gateway device in the plurality of gateway devices and the index corresponding to each gateway device, and the identifier of each gateway device includes the Internet Protocol (IP) address of that gateway device.
  • the target index is the remainder of the hash value divided by the number of the plurality of gateway devices.
  • determining the first gateway device from the multiple gateway devices in the storage cluster includes: randomly selecting any gateway device from the multiple gateway devices as the first gateway device.
  • before determining the first gateway device from the plurality of gateway devices in the storage cluster, the method further includes: obtaining the status of each gateway device in the storage cluster from the monitoring node in the storage cluster, where the status is any one of an idle state, a busy state, or a fault state; and, based on the status of each gateway device in the storage cluster, determining the multiple gateway devices that are in an idle state.
  • sending the data processing request to the first gateway device, with the first gateway device forwarding the data processing request to the storage nodes in the storage cluster to process the file, includes: if the first gateway device is in a busy state or a fault state, determining a second gateway device from the plurality of gateway devices based on the recorded states of the plurality of gateway devices, and sending the data processing request to the second gateway device, where the second gateway device forwards the data processing request to the storage nodes in the storage cluster to process the file, and the second gateway device is any gateway device among the plurality of gateway devices that is in an idle state.
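  • the fallback from the first gateway device to a second gateway device can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the application: the function name `choose_gateway`, the string state values, and the behavior when no gateway device is idle (returning `None`) are hypothetical.

```python
from typing import Dict, Optional


def choose_gateway(first_gateway: str, recorded_states: Dict[str, str]) -> Optional[str]:
    """Return the first gateway if it is idle, otherwise any other idle gateway."""
    # Keep the first gateway device unless it is recorded as busy or faulty.
    if recorded_states.get(first_gateway) == "idle":
        return first_gateway
    # Otherwise pick a second gateway device that is recorded as idle.
    for gateway, state in recorded_states.items():
        if gateway != first_gateway and state == "idle":
            return gateway
    # Assumption: the text does not say what happens when nothing is idle.
    return None
```

  • a caller would send the data processing request to the returned gateway device, and would need its own policy (for example, retrying later) when the result is `None`.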
  • in a second aspect, a data processing method is provided. The method is applied to a storage system that separates storage from computing, the storage system includes a computing cluster and a storage cluster, and the method is executed by a gateway device in the storage cluster. The method includes: receiving a data processing request from a computing node in the computing cluster, where the data processing request indicates processing of a file, and the gateway device is determined by the computing node from a plurality of gateway devices in the storage cluster; and sending the data processing request to a storage node in the storage cluster, where the storage node processes the file based on the data processing request.
  • before receiving the data processing request from the computing node in the computing cluster, the method further includes: sending the status of the gateway device to the monitoring node in the storage cluster, where the status is any one of an idle state, a busy state, or a fault state.
  • the storage cluster includes M resource pools, each resource pool includes one main gateway software and N backup gateway software, the gateway device includes the main gateway software of one resource pool, the main gateway software in the gateway device receives and sends the data processing request, and M and N are both integers greater than or equal to 1;
  • the method further includes: if the main gateway software in the gateway device is in a fault state or a busy state, the gateway monitoring module in the gateway device enables one backup gateway software in the resource pool where the main gateway software is located, and the enabled backup gateway software receives and sends the data processing request.
  • the gateway monitoring module in the gateway device enabling one backup gateway software in the resource pool where the main gateway software is located includes: if the gateway device also includes K backup gateway software of the resource pool where the main gateway software is located, the gateway monitoring module enables any one of the K backup gateway software, where K is an integer greater than or equal to 1 and less than or equal to N; or, if a backup gateway software of the resource pool where the main gateway software is located is deployed in a backup gateway device, the gateway monitoring module sends an address update request to the backup gateway device, where the address update request instructs the backup gateway device to change its IP address to the IP address of the gateway device and to enable the backup gateway software in the backup gateway device.
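  • the two switching branches above can be sketched in Python as follows. This is only an illustration under assumptions: the class names `GatewayDevice` and `BackupDevice`, the function `fail_over`, the string states, and the example IP addresses are hypothetical, and a real address update request would be sent over the network rather than as a method call.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BackupDevice:
    ip: str
    backup_enabled: bool = False

    def handle_address_update(self, new_ip: str) -> None:
        # Take over the IP address of the gateway device and start the backup software.
        self.ip = new_ip
        self.backup_enabled = True


@dataclass
class GatewayDevice:
    ip: str
    main_state: str = "idle"  # "idle", "busy", or "fault"
    local_backups: List[str] = field(default_factory=list)  # the K local backups
    active_software: str = "main"


def fail_over(device: GatewayDevice, backup_device: Optional[BackupDevice] = None) -> str:
    """Sketch of the gateway monitoring module's decision."""
    if device.main_state == "idle":
        return "main"  # nothing to switch
    if device.local_backups:
        # Branch 1: enable any backup gateway software on the same device.
        device.active_software = device.local_backups[0]
        return device.active_software
    if backup_device is not None:
        # Branch 2: ask the backup gateway device to adopt this device's IP address.
        backup_device.handle_address_update(device.ip)
        return "remote-backup"
    raise RuntimeError("no backup gateway software available")
```

  • because the backup gateway device takes over the original IP address in the second branch, a computing node can keep using the address it has recorded for the gateway device.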
  • a storage system that separates storage from computing is provided, where the storage system includes a computing cluster and a storage cluster, and the computing cluster includes computing nodes;
  • the computing node is configured to receive a data processing request, where the data processing request indicates processing of a file; determine a first gateway device from a plurality of gateway devices in the storage cluster; and send the data processing request to the first gateway device;
  • the first gateway device is configured to receive the data processing request from the computing node in the computing cluster, and send the data processing request to a storage node in the storage cluster, where the storage node processes the file based on the data processing request.
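  • the computing node is further configured to: perform hash calculation on the identifier of the file carried in the data processing request to obtain a hash value of the file; acquire a target index based on the hash value; and, based on the recorded correspondence between the plurality of gateway devices and indexes, determine the gateway device corresponding to the target index among the plurality of gateway devices as the first gateway device.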
  • the correspondence between the plurality of gateway devices and the indexes includes an identifier of each gateway device in the plurality of gateway devices and the index corresponding to each gateway device, and the identifier of each gateway device includes the Internet Protocol (IP) address of that gateway device.
  • the target index is the remainder of the hash value divided by the number of the plurality of gateway devices.
  • the computing node is further configured to: randomly select any gateway device from the multiple gateway devices as the first gateway device.
  • the computing node is further configured to: obtain the status of each gateway device in the storage cluster from the monitoring node in the storage cluster, where the status is any one of an idle state, a busy state, or a fault state; and, based on the status of each gateway device in the storage cluster, determine the plurality of gateway devices that are in an idle state.
  • the computing node is further configured to: if the first gateway device is in a busy state or a fault state, determine a second gateway device from the multiple gateway devices based on the recorded states of the multiple gateway devices, and send the data processing request to the second gateway device, where the second gateway device is any gateway device in an idle state among the plurality of gateway devices;
  • a gateway device is configured to receive the data processing request from a computing node in the computing cluster, and send the data processing request to a storage node in the storage cluster, where the storage node processes the file based on the data processing request.
  • the first gateway device is further configured to: send the status of the first gateway device to the monitoring node in the storage cluster, where the status is any one of an idle state, a busy state, or a fault state.
  • the storage cluster includes M resource pools, each resource pool includes one main gateway software and N backup gateway software, the first gateway device includes the main gateway software of one resource pool, the main gateway software in the first gateway device receives and sends the data processing request, and M and N are both integers greater than or equal to 1; the gateway monitoring module in the first gateway device is configured to: if the main gateway software in the first gateway device is in a fault state or a busy state, enable one backup gateway software in the resource pool where the main gateway software is located, where the enabled backup gateway software receives and sends the data processing request.
  • the gateway monitoring module is configured to: if the first gateway device also includes K backup gateway software of the resource pool where the main gateway software is located, enable any one of the K backup gateway software, where K is an integer greater than or equal to 1 and less than or equal to N; or, if a backup gateway software of the resource pool where the main gateway software is located is deployed in a backup gateway device, send an address update request to the backup gateway device, where the address update request instructs the backup gateway device to change its IP address to the IP address of the first gateway device and to enable the backup gateway software in the backup gateway device.
  • a data processing device configured to execute the above data processing method.
  • the data processing apparatus includes a functional module configured to execute the data processing method provided in the above first aspect or any optional manner of the above first aspect.
  • a data processing device configured to execute the above data processing method.
  • the data processing apparatus includes a functional module for executing the data processing method provided in the second aspect above or in any optional manner of the second aspect above.
  • a computer device is provided. The computer device includes a processor, and the processor is configured to execute program code, so that the computer device performs the operations of the above data processing method, specifically the operations of the data processing method provided in the first aspect or any optional manner of the first aspect above.
  • a computer device is provided. The computer device includes a processor, and the processor is configured to execute program code, so that the computer device performs the operations of the above data processing method, specifically the operations of the data processing method provided in the second aspect or any optional manner of the second aspect above.
  • a computer-readable storage medium is provided. At least one piece of program code is stored in the storage medium, and the program code is read by a processor to cause a computer device to perform the operations of the above data processing method, specifically the operations of the data processing method provided in the first aspect or any optional manner of the first aspect above, or the operations of the data processing method provided in the second aspect or any optional manner of the second aspect above.
  • a computer program product is provided. The computer program product includes at least one piece of program code stored in a computer-readable storage medium; a processor of a computer device reads the program code from the computer-readable storage medium and executes it, so that the computer device performs the method provided in the first aspect or any optional implementation of the first aspect above, or the method provided in the second aspect or any optional implementation of the second aspect above.
  • Fig. 1 is a schematic diagram of a storage system that separates storage from computing provided by the present application.
  • Fig. 2 is a flow chart of a data processing method provided by the present application.
  • Fig. 3 is a flow chart of a data processing method provided by the present application.
  • Fig. 4 is a flow chart of a method for switching between active and standby gateway devices provided by the present application.
  • Fig. 5 is a schematic structural diagram of a data processing device provided by the present application.
  • Fig. 6 is a schematic structural diagram of a data processing device provided by the present application.
  • Fig. 7 is a schematic structural diagram of a computer device provided by the present application.
  • Fig. 1 is a schematic diagram of a storage system that separates storage and computing provided by the present application.
  • the system 100 includes at least one computing cluster 101 and at least one storage cluster 102, where each computing cluster 101 can correspond to one or more storage clusters 102, and each storage cluster 102 can correspond to one or more computing clusters 101.
  • the following embodiments are described by taking a computing cluster 101 that can access a storage cluster 102 and a storage cluster 102 that can be accessed by one or more computing clusters 101 as an example.
  • Each computing cluster 101 includes a plurality of computing nodes 1011 , and each computing node 1011 is used to provide data processing services for users, such as storing data provided by users in the storage cluster 102 or reading data from the storage cluster 102 for users.
  • Each computing node 1011 may be composed of at least one server.
  • Each computing cluster 101 can be used to deploy Hadoop database, database tool Hive, computing engine Spark or distributed database Hbase.
  • Each storage cluster 102 includes a plurality of storage nodes 1021 and a plurality of gateway devices 1022, each storage node 1021 is used to provide data storage services and data reading services, and each storage node 1021 may be composed of at least one server.
  • Each gateway device 1022 has the right to access each storage node 1021 in the storage cluster 102 .
  • at least one gateway software is deployed in each gateway device 1022, and each gateway software provides an application programming interface (API) for accessing the storage cluster 102, so that the computing cluster 101 can access the storage nodes 1021 in the storage cluster 102 through the gateway software in the gateway device 1022.
  • the storage cluster 102 in the system 100 may be either a Ceph storage cluster or a FusionStorage distributed storage cluster. If the storage cluster 102 is a Ceph storage cluster, the gateway software deployed in the gateway device 1022 is an object storage gateway (RADOS gateway, RGW), and the storage node 1021 may be an object storage device (object-based storage device, OSD).
  • a user can submit a data processing request (such as a read request for reading data or a write request for writing data) to any computing node 1011 in the computing cluster 101 through a user device.
  • Any computing node 1011 may select a gateway device 1022 from among the gateway devices 1022 deployed in the storage cluster 102, and forward the data processing request submitted by the user.
  • any computing node 1011 sends the data processing request submitted by the user to the selected gateway device 1022, the gateway device 1022 forwards the data processing request to at least one storage node 1021 in the storage cluster 102, and the at least one storage node 1021 processes the data processing request.
  • because the computing node 1011 in the computing cluster 101 can itself select the gateway device 1022 in the storage cluster 102 that forwards the data processing request submitted by the user, different data processing requests of the same computing node 1011 can be sent to different gateway devices 1022 in the storage cluster 102. Load balancing of the storage cluster 102 can therefore be ensured when multiple concurrent data processing requests are processed, so there is no need to deploy a load balancing cluster between the storage cluster 102 and the computing cluster 101, which reduces the construction cost of the storage system 100.
  • each storage cluster 102 further includes a monitoring node 1023, and the monitoring node 1023 is used to collect the status of each gateway device 1022 in the storage cluster 102.
  • the status of a gateway device 1022 is an idle state, a busy state, or a fault state, where the idle state and the busy state are non-fault states of the gateway device 1022, and the gateway device 1022 can work normally in a non-fault state.
  • the computing node 1011 in the computing cluster 101 can obtain the status of each gateway device 1022 in the storage cluster 102 from the monitoring node 1023, and when the computing node 1011 selects a gateway device 1022 for a data processing request, it can preferentially select a gateway device 1022 that is in an idle state. Selecting a gateway device 1022 in the idle state helps balance the load across the gateway devices 1022, because the data processing request does not need to queue on that gateway device 1022 to wait for forwarding.
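  • the interaction between the gateway devices, the monitoring node, and a computing node can be sketched as follows. This is a hedged Python illustration: `MonitoringNode`, `GatewayState`, and `preferred_gateways` are hypothetical names, the monitoring node is modeled as an in-memory object, and falling back to busy gateway devices when none is idle is an assumption that the application does not spell out.

```python
from enum import Enum
from typing import Dict, List


class GatewayState(Enum):
    IDLE = "idle"
    BUSY = "busy"
    FAULT = "fault"


class MonitoringNode:
    """Hypothetical in-memory stand-in for monitoring node 1023."""

    def __init__(self) -> None:
        self._states: Dict[str, GatewayState] = {}

    def report(self, gateway_id: str, state: GatewayState) -> None:
        # Called by a gateway device to publish its current state.
        self._states[gateway_id] = state

    def snapshot(self) -> Dict[str, GatewayState]:
        # Called by a computing node to fetch the state of every gateway device.
        return dict(self._states)


def preferred_gateways(states: Dict[str, GatewayState]) -> List[str]:
    """Idle gateways first; if none is idle, fall back to busy (non-fault) ones."""
    idle = [g for g, s in states.items() if s is GatewayState.IDLE]
    if idle:
        return idle
    return [g for g, s in states.items() if s is GatewayState.BUSY]
```

  • gateway devices in the fault state are never returned, which matches the statement that only non-fault gateway devices can work normally.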
  • to further illustrate the process in which a computing node selects a gateway device for a data processing request, refer to the flow chart of a data processing method provided by the present application shown in FIG. 2. The method is applied to a storage system that separates storage from computing, where the storage system that separates storage from computing may be the storage system shown in FIG. 1 above.
  • Step 201 the computing node receives a data processing request, and the data processing request indicates processing a file.
  • the computing node is any computing node in any computing cluster in the storage system where storage and computing are separated.
  • Any computing node in any computing cluster has a function of selecting a gateway device and a function of accessing the storage cluster through the selected gateway device.
  • any computing node can select, from the multiple gateway devices in the storage cluster, a gateway device to which to send a data processing request, and the selected gateway device then forwards the data processing request to a storage node in the storage cluster to complete the processing of the file.
  • the following embodiments are described by taking a computing node in any computing cluster selecting a gateway device to access a storage cluster as an example.
  • This file is a file to be processed specified by the user.
  • the data processing request may be a read request or a write request, and if the data processing request is a read request, the data processing request indicates to read the file. If the data processing request is a write request, the data processing request indicates to write the file into the storage cluster in the storage system, that is, to store the file.
  • the data processing request carries the identifier of the file, where the identifier of the file is used to uniquely indicate the file, and may be a name of the file. If the data processing request is a write request, the data processing request also carries the file.
  • the user submits a task to the computing node through a user device, or the user may submit the task on the computing node, and the computing node decomposes the task submitted by the user into multiple data processing requests.
  • the computing node receives a task submitted by a user and processes the task.
  • the computing node needs to process multiple files involved in the task, so the computing node generates one data processing request for each of the multiple files, and each data processing request indicates that one of the multiple files should be processed.
  • the computing node takes any one of the multiple data processing requests as the data processing request to be processed, and each of the multiple data processing requests is processed through the process shown in FIG. 2.
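  • decomposing a task into one data processing request per file can be sketched as follows. The names `DataProcessingRequest` and `decompose_task` and the field layout are assumptions made for illustration; the application only states that a request carries the file identifier and, for a write request, the file itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DataProcessingRequest:
    op: str                          # "read" or "write"
    file_id: str                     # identifier of the file, e.g. its name
    payload: Optional[bytes] = None  # file content, carried only by write requests


def decompose_task(op: str, files: Dict[str, Optional[bytes]]) -> List[DataProcessingRequest]:
    """Generate one data processing request for each file involved in the task."""
    return [
        DataProcessingRequest(op, file_id, data if op == "write" else None)
        for file_id, data in files.items()
    ]
```

  • each request produced this way would then go through steps 201 to 207 independently.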
  • the user equipment may also be referred to as a terminal, terminal station, user terminal, user device, access device, subscriber station, subscriber unit, mobile station, user agent, portable terminal, laptop terminal, desktop terminal, or by other names.
  • the user equipment may be a mobile phone, a notebook computer, a tablet computer, a desktop computer, a smart TV, a smart wearable device, a computer, an artificial intelligence (AI) product, a smart car, a smart instrument, or an Internet of Things (IoT) terminal, etc.
  • Step 202 the computing node determines a first gateway device from multiple gateway devices in the storage cluster.
  • the first gateway device is a gateway device selected by the computing node from the plurality of gateway devices for forwarding the data processing request.
  • a manner for the computing node to determine the first gateway device from the plurality of gateway devices includes any one of the following manners A or B.
  • Manner A: the computing node selects the first gateway device through hash calculation.
  • the computing node records the correspondence between the multiple gateway devices and indexes, and each of the multiple gateway devices corresponds to one index.
  • the correspondence between the plurality of gateway devices and the indexes includes an identifier of each gateway device among the plurality of gateway devices and the index corresponding to each gateway device, and the identifier of each gateway device includes the Internet Protocol (IP) address of that gateway device.
  • the identifier of each gateway device further includes a port identifier of a communication port of each gateway device.
  • a gateway mapping (map) table is stored in the computing node, and the gateway mapping table records the correspondence between the plurality of gateway devices and indexes.
  • for example, the gateway mapping table is as shown in Table 1 below, where M is an integer greater than or equal to 1.
  • manner A may be implemented through the following steps A1-A3.
  • Step A1 the computing node performs hash calculation on the identifier of the file carried in the data processing request to obtain the hash value of the file.
  • Step A2 the computing node obtains the target index based on the hash value.
  • the target index is also the index corresponding to the first gateway device.
  • the target index is the remainder of the hash value divided by the number of the plurality of gateway devices. For example, with M gateway devices, the computing node divides the hash value by M and takes the remainder as the target index.
  • Step A3 The computing node determines the gateway device corresponding to the target index among the plurality of gateway devices as the first gateway device based on the correspondence between the plurality of gateway devices and the index.
  • after obtaining the target index, the computing node queries the identifier of the gateway device corresponding to the target index in the gateway mapping table, and uses the gateway device indicated by the identifier as the first gateway device. For example, if the target index is 2, the computing node uses the gateway device 2 corresponding to the target index 2 as the first gateway device.
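  • steps A1 to A3 can be sketched in Python as follows. This is a minimal illustration, not the implementation of the application: the hash function (SHA-256), the zero-based indexes, the example addresses and ports, and the names `GATEWAY_MAP` and `select_gateway_by_hash` are all assumptions; the application does not name a hash function or an index base.

```python
import hashlib
from typing import Dict, Tuple

# Hypothetical gateway mapping table: index -> (IP address, port).
# Zero-based indexes are assumed so that hash % M is directly a valid index.
GATEWAY_MAP: Dict[int, Tuple[str, int]] = {
    0: ("192.0.2.10", 8080),
    1: ("192.0.2.11", 8080),
    2: ("192.0.2.12", 8080),
}


def select_gateway_by_hash(file_id: str, gateway_map: Dict[int, Tuple[str, int]]) -> Tuple[str, int]:
    """Pick the gateway for a file: hash the identifier, take it modulo M, look it up."""
    # Step A1: hash the file identifier (SHA-256 is an assumption).
    digest = hashlib.sha256(file_id.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Step A2: the target index is the remainder of the hash value divided by M.
    target_index = hash_value % len(gateway_map)
    # Step A3: look up the gateway device recorded for the target index.
    return gateway_map[target_index]
```

  • with this scheme the same file identifier always maps to the same gateway device as long as the number of gateway devices does not change, while different files are spread across the gateway devices.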
  • Manner B: the computing node selects the first gateway device based on a random selection rule.
  • the computing node randomly selects any gateway device from the plurality of gateway devices as the first gateway device.
  • for example, the computing node determines the gateway device corresponding to a randomly chosen index in the gateway mapping table as the first gateway device.
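  • random selection over the gateway mapping table can be sketched as follows. The function name `select_gateway_randomly` and the optional `rng` parameter (added so that the choice can be made reproducible) are assumptions for illustration only.

```python
import random
from typing import Dict, Optional


def select_gateway_randomly(gateway_map: Dict[int, str],
                            rng: Optional[random.Random] = None) -> str:
    """Pick a random index from the gateway mapping table and return its gateway."""
    rng = rng or random.Random()
    # Choose any index recorded in the mapping table with equal probability.
    target_index = rng.choice(list(gateway_map))
    return gateway_map[target_index]
```

  • unlike the hash-based manner, requests for the same file may reach different gateway devices here, which spreads load without depending on the file identifier.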
  • the first gateway device can also be selected according to the load of each gateway device. That is, the computing node can also implement the load balancing function: it balances service processing according to the load of each gateway device, makes full use of each gateway device, and improves the utilization of the devices in the entire system.
  • Step 203 the computing node sends the data processing request to the first gateway device, and the first gateway device forwards the data processing request to the storage nodes in the storage cluster to process the file.
  • Step 204 the first gateway device receives the data processing request.
  • Step 205 the first gateway device sends the data processing request to the storage nodes in the storage cluster.
  • the first gateway device determines at least one storage node from the multiple storage nodes of the storage cluster based on the data processing request, and sends the data processing request to the at least one storage node.
  • the at least one storage node is a storage node for processing the data processing request.
  • if the data processing request is a read request, the first gateway device queries, based on the identifier of the file in the data processing request, the metadata of the file from the recorded metadata of multiple files, where the metadata of each file includes the storage address of that file. After finding the metadata of the file, the first gateway device determines, among the plurality of storage nodes, the storage node to which the storage address of the file belongs as the at least one storage node.
  • the storage address of a file includes the storage address of at least one object corresponding to the file, and the at least one object constitutes the file, where an object is the smallest unit of data storage in the storage cluster.
  • the metadata of each file further includes an identifier of at least one object corresponding to that file, where each object identifier identifies an object and indicates the position of the object in the file.
  • the first gateway device obtains the file from the data processing request, and based on the size of the file, determines the at least one storage node from a plurality of storage nodes in the storage cluster , wherein the size of the remaining storage space of each storage node in the at least one storage node is greater than or equal to the size of the file.
  • The first gateway device may also split the file into at least one object, generate an identifier for each of the at least one object, and add the identifier to the corresponding object.
  • The first gateway device replaces the file in the data processing request with the at least one object to which the identifiers have been added, and sends the modified data processing request to the at least one storage node.
  • the first gateway device may also create metadata of the file, and add the identifier of the at least one object and the identifier of the file to the metadata of the file.
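  • The write path at the gateway device (split the file into objects, give each object an identifier that also encodes its position, and create the file metadata) might look like the sketch below. The identifier format `<file id>.<position>`, the fixed object size, and the function names are assumptions made for illustration only.

```python
def split_into_objects(file_id, data, object_size):
    """Split file bytes into objects and build the file's metadata.

    Each object identifier is "<file_id>.<position>" (a hypothetical format),
    so the identifier names the object and its position in the file.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    objects = {}
    for position, offset in enumerate(range(0, len(data), object_size)):
        objects[f"{file_id}.{position}"] = data[offset:offset + object_size]
    if not objects:
        # An empty file still maps to one (empty) object.
        objects[f"{file_id}.0"] = b""
    metadata = {"file_id": file_id, "object_ids": list(objects)}
    return objects, metadata


def reassemble(objects, metadata):
    """Rebuild the file by concatenating objects in metadata order."""
    return b"".join(objects[object_id] for object_id in metadata["object_ids"])
```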
  • Step 206 the storage node receives the data processing request.
  • Step 207 the storage node processes the file based on the data processing request.
  • The storage node stores the at least one object carried in the data processing request, and stores the identifier of the at least one object, the storage address of the at least one object, and the identifier of the file in the data processing request in association with one another, so as to establish a correspondence between the file and the at least one object.
  • the storage node queries the storage address of at least one object corresponding to the file according to the correspondence between the file and the object.
  • the storage node acquires the at least one object at the storage address of the at least one object.
  • After the storage node finishes processing the data processing request, it generates a data processing response and sends the response to the first gateway device, where the data processing response indicates that the file has been processed. If the data processing request is a read request, the data processing response carries the at least one object of the file read from the storage node. If the data processing request is a write request, the data processing response carries the storage address of the at least one object of the file.
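  • The storage node behaviour in steps 206 and 207 can be illustrated with a small in-memory model. This is only a sketch: the request and response field names, the integer storage addresses, and the `StorageNode` class are hypothetical and do not come from the application.

```python
class StorageNode:
    """Toy in-memory storage node (all names here are hypothetical)."""

    def __init__(self):
        self._store = {}         # storage address -> object bytes
        self._file_objects = {}  # file id -> list of (object id, storage address)
        self._next_address = 0

    def handle(self, request):
        if request["type"] == "write":
            placed = []
            for object_id, payload in request["objects"]:
                address = self._next_address
                self._next_address += 1
                self._store[address] = payload
                placed.append((object_id, address))
            # Associate the file id with its object ids and storage addresses.
            self._file_objects[request["file_id"]] = placed
            return {"status": "done", "addresses": [a for _, a in placed]}
        if request["type"] == "read":
            # Look up the addresses via the file-to-object correspondence.
            placed = self._file_objects[request["file_id"]]
            return {"status": "done", "objects": [self._store[a] for _, a in placed]}
        raise ValueError(f"unsupported request type: {request['type']}")
```

  • A write response carries the storage addresses and a read response carries the objects, matching the two response cases described above.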
  • The computing nodes in the computing cluster implement the function of the load balancer, so that a computing node can itself select a gateway device from the multiple gateway devices in the storage cluster and send it data processing requests. During data processing, no load balancing cluster is needed to select gateway devices for data processing requests, so a storage system that separates storage and computing can be built without deploying a load balancing cluster, which reduces the cost of building such a system.
  • the process shown in FIG. 2 is a process in which the computing node selects a gateway device from the recorded multiple gateway devices of the storage cluster to access the storage cluster.
  • the computing node may also consider the state of the gateway device when selecting the gateway device for a new data processing request.
  • Refer to the flow chart of a data processing method provided by the present application shown in FIG. 3. The process shown in FIG. 3 is one in which the computing node, based on the states it has recorded for the multiple gateway devices of the storage cluster, selects a gateway device from the multiple gateway devices to access the storage cluster.
  • the method is applied to a storage system that separates storage from computing, where the storage system that separates storage from computing may be the storage system shown in FIG. 1 above.
  • Step 301 for each gateway device in the storage cluster, the gateway monitoring module in each gateway device detects the state of the corresponding gateway software.
  • Each gateway device includes a gateway monitoring module and is provided with gateway software. The gateway monitoring module detects the state of the gateway software of the gateway device in real time or periodically, and reports that state.
  • the gateway software is used to forward the messages sent by the devices outside the storage cluster to the storage nodes in the storage cluster, so that the devices outside the storage cluster can access the storage nodes in the storage cluster.
  • the gateway software may forward the received data processing request to the storage nodes in the storage cluster.
  • the above step 205 may be executed by gateway software in the first gateway device.
  • the gateway software may be RGW software, or other gateway software capable of providing devices outside the storage cluster with an interface to access the storage cluster.
  • The state of the gateway software in the gateway device is any one of an idle state, a busy state, or a fault state.
  • The idle state means that the number of data processing requests waiting to be forwarded in the gateway device is less than or equal to a first threshold. If a gateway device in the idle state receives a new data processing request, it can forward the new request within a short time (for example, within a preset duration).
  • The busy state means that the number of data processing requests waiting to be forwarded in the gateway device is greater than the first threshold. If a gateway device in the busy state receives a new data processing request, it cannot forward the new request within a short time, and the new request must queue in the gateway device until it is forwarded.
  • the first threshold may be set by a person skilled in the art according to a specific implementation scenario, and here, the embodiment of the present application does not limit the first threshold.
  • The gateway monitoring module in the gateway device detects the state of the gateway software of the gateway device. For example, the gateway monitoring module first detects whether the gateway software in the gateway device has failed. If it has, the gateway software is determined to be in the fault state (that is, the gateway device is in the fault state). If the gateway software has not failed, the gateway monitoring module checks the number of data processing requests waiting to be forwarded in the gateway device. If that number is less than or equal to the first threshold, the load of the gateway software is light, and the gateway monitoring module determines that the gateway software is in the idle state (that is, the gateway device is in the idle state). If that number is greater than the first threshold, the load of the gateway software is high, and the gateway monitoring module determines that the gateway software is in the busy state (that is, the gateway device is in the busy state).
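  • The state decision made by the gateway monitoring module reduces to a fault check followed by a threshold comparison. A minimal sketch, in which the function name and the string state labels are assumptions:

```python
def classify_gateway_state(software_failed, pending_requests, first_threshold):
    """Return "fault", "idle" or "busy" for one gateway software instance.

    The fault check comes first; only healthy software is classified by
    comparing its pending-request count with the first threshold.
    """
    if software_failed:
        return "fault"
    if pending_requests <= first_threshold:
        return "idle"
    return "busy"
```

  • Note that a pending-request count equal to the first threshold still counts as idle, as stated in the definition of the idle state.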
  • Step 302 each gateway monitoring module sends the state of the corresponding gateway software to the monitoring node in the storage system.
  • The gateway software corresponding to a gateway monitoring module is the gateway software in the gateway device where that gateway monitoring module is located. Still taking one gateway device as an example, once the gateway monitoring module in the gateway device has detected the state of the gateway software in the gateway device, it sends a status notification message to the monitoring node, and the status notification message includes an identifier of the state the gateway software is in.
  • Each state of the gateway software is represented by its own identifier, and different states have different identifiers. For example, the identifier of the busy state is "00", the identifier of the idle state is "01", and the identifier of the fault state is "11".
  • the status notification message may be a heartbeat message or other message types.
  • Step 303 the monitoring node obtains and records the state of the gateway software of each gateway device.
  • Taking one gateway device as an example, when the monitoring node receives a status notification message from the gateway device, it stores the state identifier of the gateway software carried in the status notification message in association with the IP address of the gateway device.
  • A gateway state table is stored in the monitoring node and records the state of the gateway software in each gateway device in the storage cluster. Each time the monitoring node determines the state of the gateway software of a gateway device, it records that state in the gateway state table.
  • Taking a storage cluster that includes M gateway devices as an example, refer to the gateway state table shown in Table 2 below.
  • IP address of the gateway device | State of the gateway device
    IP address of gateway device 1   | busy state (00)
    IP address of gateway device 2   | idle state (01)
    ...                              | ...
    IP address of gateway device M   | fault state (11)
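  • How the monitoring node might fold a status notification message into the gateway state table is sketched below. The identifiers "00", "01" and "11" come from the example above; the message field names `gateway_ip` and `state_id` and the function name are assumptions.

```python
# State identifiers taken from the example above.
STATE_BY_ID = {"00": "busy", "01": "idle", "11": "fault"}


def record_status(gateway_state_table, message):
    """Store the reported state identifier under the sender's IP address.

    `message` is a hypothetical status notification with the fields
    "gateway_ip" and "state_id".
    """
    state_id = message["state_id"]
    if state_id not in STATE_BY_ID:
        raise ValueError(f"unknown state identifier: {state_id!r}")
    gateway_state_table[message["gateway_ip"]] = state_id
    return gateway_state_table
```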
  • In another possible implementation, the monitoring node does not obtain the fault state of the gateway software from a status notification message. Instead, the gateway monitoring module in each gateway device in the storage cluster periodically sends a status notification message to the monitoring node.
  • If the gateway software in a gateway device fails, the gateway monitoring module in that gateway device does not send a status notification message to the monitoring node.
  • If the monitoring node has not received a status notification message from a gateway device after a preset duration, the gateway software in that gateway device is taken to be in the fault state, and the monitoring node records the state of that gateway software as the fault state.
  • The preset duration is the period at which the gateway monitoring module sends status notification messages. In this case, the status notification message need not carry the identifier of the fault state.
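  • Inferring the fault state from missing status notification messages is a timeout check over the time each gateway device was last heard from. A minimal sketch, assuming the monitoring node keeps a hypothetical `last_seen` map of gateway IP address to timestamp in seconds:

```python
FAULT_ID = "11"  # fault state identifier from the example above


def mark_silent_gateways(gateway_state_table, last_seen, now, preset_duration):
    """Record the fault state for gateways that have been silent too long.

    A gateway is treated as faulty when more than `preset_duration` seconds
    have passed since its last status notification message.
    """
    for gateway_ip, seen_at in last_seen.items():
        if now - seen_at > preset_duration:
            gateway_state_table[gateway_ip] = FAULT_ID
    return gateway_state_table
```

  • In practice a small tolerance on top of the sending period helps avoid marking a gateway device as faulty because of ordinary network jitter; the strict comparison here is a simplification.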
  • the process shown in the above steps 302-303 is a process in which the gateway devices in the storage cluster actively report the status of their respective gateway software to the monitoring node.
  • In another possible implementation, the monitoring node queries the state of the gateway software in each gateway device and records the states it obtains. For example, the monitoring node sends a query request to each gateway device in the storage cluster, where the query request is used to query the state of the gateway software in the gateway device. After receiving the query request, the gateway monitoring module in each gateway device sends the state of the corresponding gateway software to the monitoring node, so that the monitoring node records the state of the gateway software.
  • Step 304 the computing node in the computing cluster obtains the state of the gateway software of each gateway device in the storage cluster from the monitoring node in the storage cluster, and uses the state of the gateway software of each gateway device as the state of that gateway device.
  • the computing node is any computing node in the computing cluster, and each computing node in the computing cluster can execute step 304 .
  • the computing node sends a gateway status acquisition request to the monitoring node, and the gateway status acquisition request indicates to acquire the status of each gateway device in the storage cluster.
  • After receiving the gateway status acquisition request, the monitoring node sends a gateway status acquisition response to the computing node, where the gateway status acquisition response carries the state identifier and the IP address of each gateway device in the storage cluster.
  • the gateway status acquisition response carries the gateway status table in the monitoring node.
  • the computing node receives the gateway state obtaining response from the monitoring node, and obtains an identifier of a state of each gateway device in the storage cluster from the gateway state obtaining response.
  • step 304 is a possible implementation manner for a computing node to obtain the status of each gateway device in the storage cluster.
  • The computing node obtains the state of the gateway software of each gateway device in the storage cluster from another computing node in the computing cluster.
  • The first computing node in the computing cluster obtains the state of each gateway device in the storage cluster from the monitoring node, and sends the state of each gateway device in the storage cluster to each computing node in the computing cluster.
  • The computing node receives the state of each gateway device in the storage cluster from the first computing node, where the first computing node is any computing node in the computing cluster other than the computing node.
  • the storage cluster does not include a monitoring node, and the computing node obtains the status of the gateway software of each gateway device from each gateway device in the storage cluster.
  • the process of the computing node obtaining the state of the gateway software of each gateway device from each gateway device in the storage cluster is similar to the process of obtaining the state of the gateway software of each gateway device by the monitoring node.
  • the embodiment of the present application does not repeat the process of the computing node obtaining the status of the gateway software of each gateway device from each gateway device in the storage cluster.
  • Each gateway device in the storage cluster sends the state of its gateway software to the first computing node in the computing cluster, and the computing node can then obtain the state of the gateway software of each gateway device in the storage cluster from the first computing node.
  • Step 305 the computing node determines multiple gateway devices in an idle state based on the states of each gateway device in the storage cluster.
  • If any gateway device in the storage cluster is in the idle state, the computing node determines that gateway device as one of the multiple gateway devices.
  • The computing node records the correspondence between each gateway device in the storage cluster and an index. If any gateway device in the storage cluster is in the busy state or the fault state, the computing node deletes the recorded correspondence between that gateway device and its index, and moves each gateway device that follows it up by one position, so that each of those gateway devices takes the index of the gateway device before it.
  • For example, if gateway device 1 is in the busy state or the fault state, the computing node deletes the correspondence between gateway device 1 and index 1 in Table 1 and maps the identifiers of gateway devices 2 to M to indexes 1 to M-1, so that the indexes of gateway devices 2 to M are updated to the former indexes of gateway devices 1 to M-1, as shown in Table 3 below.
  • Once the computing node has deleted the correspondence between the index and each gateway device in the fault state or the busy state, those gateway devices have, from the computing node's point of view, been removed from the storage cluster.
  • the multiple gateway devices recorded by the computing node at this time are all in an idle state.
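  • Step 305 amounts to filtering the gateway mapping table down to idle gateway devices and renumbering what remains. A minimal sketch, in which the table layout, the state labels, and the function name are assumptions:

```python
def keep_idle_gateways(gateway_mapping, states):
    """Drop busy or faulty gateways and renumber the rest from 1.

    `gateway_mapping` maps index -> gateway identifier and `states` maps
    gateway identifier -> "idle" | "busy" | "fault". The relative order of
    the remaining gateways is preserved.
    """
    idle = [gw for _, gw in sorted(gateway_mapping.items()) if states.get(gw) == "idle"]
    return {index: gw for index, gw in enumerate(idle, start=1)}
```

  • A gateway device with no recorded state is treated as not idle here, which is a conservative choice made for this sketch rather than something the text specifies.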
  • Step 306 the computing node receives a data processing request, and the data processing request indicates to process the file.
  • This step 306 is the same as the above-mentioned step 201, and this embodiment of the application does not repeat this step 306 here.
  • Step 307 the computing node determines a first gateway device from the multiple gateway devices.
  • this step 307 is the same as the above-mentioned step 202, and here, this embodiment of the application does not repeat this step 307.
  • Step 308 the computing node sends the data processing request to the first gateway device.
  • Step 309 the first gateway device receives the data processing request.
  • Step 310 the first gateway device sends the data processing request to the storage nodes in the storage cluster.
  • this step 310 is the same as the above-mentioned step 205, and here, this embodiment of the application does not repeat this step 310.
  • Step 311 the computing node records the corresponding relationship between the state of each gateway device in the storage system and the index of each gateway device.
  • the computing node may store the identifier of the state of each gateway device in the gateway mapping table, so that the state of each gateway device corresponds to the index of each gateway device.
  • the status of each gateway device is recorded in the gateway mapping table shown in Table 1, and the following Table 4 is obtained.
  • Step 312 the computing node determines a first gateway device from multiple gateway devices in the storage cluster.
  • this step 312 is the same as the above-mentioned step 202, and this embodiment of the present application does not repeat this step 312 here.
  • Step 313 If the first gateway device is in the busy state or the fault state, the computing node determines a second gateway device from among the multiple gateway devices based on the recorded states of the multiple gateway devices, where the second gateway device is any gateway device in the idle state among the multiple gateway devices.
  • the computing node queries the gateway mapping table, and determines any gateway device in an idle state recorded in the gateway mapping table as the second gateway device.
  • the computing node determines the gateway device 2 in the idle state as the second gateway device.
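  • Steps 311 to 313 (record the state next to each index, pick a first gateway device, and fall back to an idle one if the first pick is busy or faulty) can be sketched as below. The table layout `index -> (gateway, state)`, the random first pick, and the error raised when no idle gateway device exists are assumptions.

```python
import random


def choose_gateway(gateway_table, rng=random):
    """Return the first pick if it is idle, otherwise any idle gateway.

    `gateway_table` maps index -> (gateway identifier, state), a hypothetical
    layout for a mapping table that also records states.
    """
    first_gateway, first_state = gateway_table[rng.choice(sorted(gateway_table))]
    if first_state == "idle":
        return first_gateway
    idle = [gw for gw, state in gateway_table.values() if state == "idle"]
    if not idle:
        raise RuntimeError("no idle gateway device is available")
    return rng.choice(idle)
```

  • The text does not say what happens when no gateway device is idle; raising an error is simply one way to surface that case in the sketch.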
  • Step 314 the computing node sends the data processing request to the second gateway device, and the second gateway device forwards the data processing request to the storage nodes in the storage cluster to process the file.
  • Step 315 the second gateway device receives the data processing request.
  • Step 316 the second gateway device sends the data processing request to the storage nodes in the storage cluster.
  • this step 316 is the same as the above-mentioned step 205, and here, this embodiment of the application does not repeat this step 316.
  • Step 317 the storage node receives the data processing request.
  • the data processing request comes from the first gateway device or the second gateway device.
  • Step 318 the storage node processes the file based on the data processing request.
  • This step 318 is the same as the above-mentioned step 207, and this embodiment of the present application does not repeat this step 318 here.
  • The computing nodes in the computing cluster implement the function of the load balancer, so that a computing node can itself select a gateway device from the multiple gateway devices in the storage cluster and send it data processing requests. During data processing, no load balancing cluster is needed to select gateway devices for data processing requests, so a storage system that separates storage and computing can be built without deploying a load balancing cluster, which reduces the cost of building such a system.
  • When the computing node selects a gateway device, it selects one in the idle state rather than one in the busy state or the fault state. The data processing request therefore neither waits a long time to be forwarded in a busy gateway device nor goes unforwarded by a faulty gateway device, which improves the processing efficiency of data processing requests.
  • The gateway devices act as active and standby devices for one another. If the gateway device that the computing node selects first is in the fault state, the computing node selects another gateway device, one in the idle state, and sends the data processing request to it.
  • A storage cluster corresponds to one or more computing clusters, and each gateway device in the storage cluster includes one or more communication ports. Each communication port is used to communicate with one computing cluster, that is, to receive messages sent by the computing cluster and to send messages to the computing cluster, so that each gateway device can communicate with the one or more computing clusters through its one or more communication ports.
  • Each of the M resource pools of the storage cluster includes N+1 gateway software instances and corresponds to an IP address and at least one port identifier. Each port identifier in the at least one port identifier corresponds to a computing cluster and indicates that the communication port it identifies on a gateway device in the storage cluster communicates with the corresponding computing cluster.
  • both M and N are integers greater than 1.
  • The N+1 gateway software instances in each resource pool are deployed on R gateway devices, and at least one gateway software instance is deployed on each of the R gateway devices, where R is an integer greater than or equal to 1 and less than or equal to N+1. The relevant technical personnel randomly select one of the R gateway devices in each resource pool as the main gateway device of that resource pool, and use every other gateway device among the R gateway devices as a backup gateway device.
  • To make it easy to distinguish the main gateway device from the backup gateway devices in the same resource pool, the relevant technical personnel configure the IP address of the main gateway device in each resource pool as the IP address corresponding to that resource pool, and configure the IP address of each backup gateway device in the resource pool as any IP address other than the IP address corresponding to the resource pool.
  • For any resource pool and any computing cluster, the identifier of the communication port that the gateway devices of the resource pool use to communicate with that computing cluster is configured as the port identifier that corresponds to that computing cluster among the at least one port identifier of the resource pool. Each computing node in that computing cluster records the IP address corresponding to the resource pool, together with that port identifier, as the identifier of the main gateway device of the resource pool.
  • The main gateway devices of the M resource pools provide the computing cluster with the service of accessing the storage cluster. Put another way, M*R gateway devices are deployed in the storage cluster; the M main gateway devices among them provide the computing cluster with access to the storage cluster, while the backup gateway devices among them do not.
  • At least one gateway software instance of a resource pool is deployed on each main gateway device. Each main gateway device randomly selects one of the gateway software instances deployed on it and starts that instance to provide the computing cluster with the service of accessing the storage cluster.
  • The gateway software enabled by each main gateway device is regarded as the main gateway software of the resource pool to which that gateway device belongs, and all other gateway software in the resource pool is backup gateway software.
  • Each resource pool thus includes one main gateway software instance and N backup gateway software instances, and the gateway device where the main gateway software of a resource pool is located is the main gateway device of that resource pool.
  • The state of the main gateway device is the state of the main gateway software in the main gateway device.
  • The main gateway device can switch the main gateway software currently in use to backup gateway software of the resource pool to which it belongs, and switch one backup gateway software instance of that resource pool to main gateway software.
  • Refer to the flow chart of a method for switching between active and standby gateway devices provided by the present application shown in FIG. 4. The method is applied to a storage system that separates storage from computing, which may be the storage system shown in FIG. 1 above.
  • Step 401 the gateway monitoring module of the gateway device detects the state of the main gateway software in the gateway device.
  • The gateway device is the main gateway device of any one of the M resource pools of the storage cluster, and it includes the main gateway software of that resource pool.
  • The main gateway software in the gateway device receives and sends data processing requests.
  • The process in which the gateway monitoring module detects the state of the main gateway software in the gateway device is the process in which it detects the state of the gateway software in the gateway device. That process is described in step 301, and this embodiment of the present application does not repeat it for step 401 here.
  • Step 402 If the main gateway software is in a fault state or busy state, the gateway monitoring module activates a backup gateway software in the resource pool where the main gateway software is located, and the activated backup gateway software receives and sends data processing requests.
  • The gateway monitoring module enables the backup gateway software to replace the main gateway software that is in the fault state or the busy state, so that the enabled backup gateway software becomes the new main gateway software of the resource pool and the former main gateway software becomes backup gateway software of the resource pool.
  • The gateway monitoring module enables backup gateway software in the resource pool where the main gateway software is located in either of the following manners, Method 1 or Method 2.
  • Method 1 If K backup gateway software instances of the resource pool where the main gateway software is located are deployed on the gateway device, the gateway monitoring module enables any one of the K backup gateway software instances, where K is an integer greater than or equal to 1 and less than or equal to N.
  • Because the main gateway software and the K backup gateway software instances are all deployed on the gateway device, they share the IP address of the gateway device (that is, the IP address corresponding to the resource pool) and share the same communication port of the gateway device for communicating with the computing cluster. Therefore, after the gateway monitoring module enables any one of the K backup gateway software instances to replace the main gateway software in the busy state or the fault state, the gateway device remains the main gateway device of the resource pool.
  • The identifier of the main gateway device of the resource pool recorded in each computing node of the computing cluster consists of the IP address and the port identifier corresponding to the resource pool, and that recorded identifier does not change. So even though the gateway software in the main gateway device has been switched, the computing nodes do not perceive the switch.
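  • Method 1 can be modelled as swapping the active gateway software instance on one device while the device's (IP address, port) identifier stays fixed. The class, the instance names, and the port number below are hypothetical and only illustrate why the computing node sees no change.

```python
class GatewayDevice:
    """Toy model of a main gateway device hosting several software instances."""

    def __init__(self, ip, port, software):
        self.ip = ip
        self.port = port
        self.software = list(software)  # gateway software deployed on this device
        self.active = self.software[0]  # current main gateway software

    def identifier(self):
        """What a computing node records: (IP address, port identifier)."""
        return (self.ip, self.port)

    def switch_software(self):
        """Enable a local backup instance in place of the current main one."""
        backups = [s for s in self.software if s != self.active]
        if not backups:
            raise RuntimeError("no local backup gateway software")
        self.active = backups[0]
        return self.active
```

  • Because `identifier()` depends only on the device's IP address and port, calling `switch_software()` leaves the value recorded by computing nodes valid.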
  • Method 2 If backup gateway software of the resource pool where the main gateway software is located is deployed on a backup gateway device, the gateway monitoring module sends an address update request to the backup gateway device.
  • The address update request instructs the backup gateway device to change its IP address to the IP address of the gateway device and to enable the backup gateway software in the backup gateway device.
  • the address update request carries the IP address of the gateway device.
  • The gateway monitoring module can also change the IP address of the gateway device to any IP address other than the IP addresses corresponding to the M resource pools, so that the main gateway software becomes backup gateway software and the main gateway device becomes a backup gateway device.
  • After receiving the address update request, the backup gateway device parses the IP address of the gateway device out of the address update request and updates its own IP address to that IP address. After the address update is complete, the IP address of the backup gateway software in the backup gateway device is the IP address of the main gateway device, and the backup gateway device enables the backup gateway software in the backup gateway device. At this point, the backup gateway software in the backup gateway device becomes the main gateway software of the resource pool, and the backup gateway device becomes the new main gateway device of the resource pool.
  • The identifier of the main gateway device of the resource pool recorded in each computing node of the computing cluster consists of the IP address corresponding to the resource pool and the port identifier of the communication port, and every gateway device of the resource pool uses the communication port identified by the resource pool's port identifier when communicating with the computing cluster. Because the backup gateway device changes its IP address to the IP address of the main gateway device, its replacement of the busy or faulty main gateway device is not perceived by the computing node: the identifier of the main gateway device recorded by the computing node has not changed.
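  • Method 2 is an IP address takeover: the backup gateway device assumes the resource pool's IP address and the former main gateway device moves to an address outside the pool addresses. A minimal sketch, in which the dictionary fields and the function name are assumptions:

```python
def fail_over_to_backup(main, backup, spare_ip):
    """Let `backup` take over the pool IP address held by `main`.

    `main` and `backup` are hypothetical device records with the keys "ip"
    and "role". `spare_ip` is an address outside the resource pool addresses
    that the former main device moves to.
    """
    pool_ip = main["ip"]
    if spare_ip == pool_ip:
        raise ValueError("spare_ip must differ from the resource pool IP address")
    # Release the pool address first so two devices never hold it at once.
    main["ip"], main["role"] = spare_ip, "backup"
    backup["ip"], backup["role"] = pool_ip, "main"
    return backup
```

  • The computing node keeps sending to the same IP address and port, which now reach the new main gateway device.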
  • no monitoring node may be deployed in the storage cluster.
  • A backup gateway device does not work. Only when the current main gateway device fails or is busy does a backup gateway device switch to being the main gateway device and start working. Therefore, it is sufficient for the monitoring node to collect the state of the main gateway software; it does not need to collect the state of the backup gateway software in the resource pool.
  • what the computing node obtains from the monitoring node is the state of the primary gateway software in the resource pool.
  • Step 403 the backup gateway software in the gateway device receives a data processing request from a computing node in the computing cluster, where the data processing request indicates to process the file, and the gateway device is determined by the computing node from the multiple gateway devices in the storage cluster.
  • the gateway device may be the first gateway device determined by the computing node from multiple gateway devices in the storage cluster.
  • the process shown in step 403 is also the process in which the gateway device receives the data processing request of the computing nodes in the computing cluster.
  • Step 404 the backup gateway software sends the data processing request to the storage nodes in the storage cluster.
  • step 404 is also the process in which the gateway device sends the data processing request to the storage nodes in the storage cluster.
  • Step 405 the storage node receives the data processing request.
  • this step 405 is the same as the above-mentioned step 206, and this embodiment of the present application does not repeat this step 405 here.
  • Step 406 the storage node processes the file based on the data processing request.
  • this step 406 is the same as the above-mentioned step 207, and here, this embodiment of the application does not repeat this step 406.
  • the computing nodes in the computing cluster implement the function of the load balancer, so that a computing node can select a gateway device from the multiple gateway devices in the storage cluster and send the data processing request to it. During data processing, the computing node does not need a load balancing cluster to select a gateway device for the data processing request, so no load balancing cluster has to be deployed when building a storage system that separates storage and computing, which reduces the cost of building such a system.
  • each gateway device recorded by the computing node is in the idle state, so the gateway device it selects is in the idle state rather than in the busy state or the fault state. The data processing request therefore does not wait a long time to be forwarded by a busy gateway device and is not left unforwarded by a faulty gateway device, which improves the processing efficiency of data processing requests.
  • the computing node involved in the present application includes a computer-readable storage medium, and a load balancing software package is stored in the computer-readable storage medium, and the program codes in the load balancing software package are used to implement the above data processing method.
  • the processor of the computing node enables the computing node to execute the above data processing method by reading and running the program code in the load balancing software package.
  • the program code in the load balancing software package is generated based on a code generation library (CGLIB).
  • the program code in the load balancing software package is assembled from the bytecode in the CGLIB.
  • CGLIB is a powerful and high-performance code generation library, and the load balancing software package generated based on the CGLIB has a relatively small amount of code.
  • the estimated code volume of the load balancing software package may reach 3000 lines, but the code volume of the actually generated load balancing software package is less than 300 lines.
  • the CGLIB is widely used in aspect-oriented programming (AOP) frameworks and can be used to provide method interception, and the computing node can load the load balancing software package generated based on the CGLIB onto the computing node in the form of a patch.
  • the processor of the computing node can execute the load balancing software package loaded in the source code during the process of running the source code of the computing cluster, so that the computing node can execute the above data processing method. Therefore, on the basis of realizing load balancing, it is possible to avoid modifying the source code of the computing cluster, and to eliminate users' concerns about the reliability and security of the computing cluster with the new load balancing function.
  • FIG. 5 shows a data processing device provided by the present application.
  • the device 500 may be a part of the computing node in the foregoing embodiments or in FIGS. 2-5 , and is used to execute the method executed by the computing node.
  • the device 500 is applied to a storage system that separates storage from computing.
  • the storage system includes a computing cluster and a storage cluster.
  • the device 500 is configured as a computing node in the computing cluster.
  • the device 500 includes:
  • the receiving module 501, configured to receive a data processing request, where the data processing request indicates that a file is to be processed;
  • the determining module 502, configured to determine a first gateway device from multiple gateway devices in the storage cluster; and
  • the sending module 503, configured to send the data processing request to the first gateway device, so that the first gateway device forwards the data processing request to a storage node in the storage cluster to process the file.
  • the device 500 in the embodiment of the present invention may be implemented by a central processing unit (CPU), by an application-specific integrated circuit (ASIC), or by a programmable logic device (PLD).
  • the above-mentioned PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the device 500 and its modules can also be software modules.
  • the computing node records a correspondence between the plurality of gateway devices and indexes, and each of the plurality of gateway devices corresponds to one index; the determining module 502 is configured to:
  • perform a hash calculation on the identifier of the file carried in the data processing request to obtain a hash value of the file; obtain a target index based on the hash value; and, based on the correspondence between the plurality of gateway devices and the indexes, determine the gateway device corresponding to the target index among the plurality of gateway devices as the first gateway device.
  • the correspondence between the plurality of gateway devices and the indexes includes an identifier of each of the plurality of gateway devices and the index corresponding to each gateway device, and the identifier of each gateway device includes the Internet Protocol (IP) address of that gateway device.
  • the target index is the remainder obtained by dividing the hash value by the number of the plurality of gateway devices.
  • the determining module 502 is configured to randomly select any one of the plurality of gateway devices as the first gateway device.
  • the device 500 further includes:
  • An acquisition module configured to acquire the status of each gateway device in the storage cluster from a monitoring node in the storage cluster, where the status includes any one of an idle state, a busy state, or a fault state;
  • the determining module 502 is further configured to determine the plurality of gateway devices in an idle state based on the states of each gateway device in the storage cluster.
  • the determining module 502 is further configured to: if the first gateway device is in the busy state or the fault state, determine a second gateway device from the multiple gateway devices based on the recorded states of the multiple gateway devices, where the second gateway device is any gateway device in the idle state among the multiple gateway devices;
  • the sending module 503 is further configured to send the data processing request to the second gateway device, and the second gateway device forwards the data processing request to a storage node in the storage cluster to complete the processing of the file to be processed.
  • the device 500 corresponds to the computing node in the above method embodiment, and each module in the device 500 and the other operations and/or functions mentioned above implement the steps and methods performed by the computing node in the method embodiment. For details, refer to the foregoing method embodiments; for the sake of brevity, they are not repeated here.
  • when the device 500 selects a gateway device, the division into the above-mentioned functional modules is used only for illustration. In practice, the functions may be assigned to different functional modules as required, that is, the internal structure of the device 500 may be divided into different functional modules to complete all or part of the functions described above.
  • the device 500 provided in the above embodiment and the above method embodiment belong to the same idea, and its specific implementation process is detailed in the above method embodiment, and will not be repeated here.
  • apparatus 500 may be equivalent to the computing node 1011 in the system 100 , or equivalent to an execution component in the computing node 1011 .
  • Fig. 6 shows a data processing apparatus provided by the present application, and the apparatus 600 may be a part of the gateway device in the foregoing embodiments or Figs. 2-5, and is used to execute the method executed by the gateway device.
  • the apparatus 600 is applied to a storage system that separates storage from computing.
  • the storage system includes a computing cluster and a storage cluster.
  • the apparatus 600 is configured to be executed by a gateway device in the storage cluster.
  • the apparatus 600 includes:
  • the receiving module 601, configured to receive a data processing request from a computing node in the computing cluster, where the data processing request indicates that a file is to be processed, and the gateway device is determined by the computing node from multiple gateway devices in the storage cluster;
  • the sending module 602 is configured to send the data processing request to a storage node in the storage cluster, and the storage node processes the file based on the data processing request.
  • the device 600 in the embodiment of the present invention can be implemented by a central processing unit (CPU), by an application-specific integrated circuit (ASIC), or by a programmable logic device (PLD).
  • the above-mentioned PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the sending module 602 is also configured to send the state of the gateway device to the monitoring node in the storage cluster, where the state is any one of an idle state, a busy state, or a fault state.
  • the storage cluster includes M resource pools, each resource pool includes one primary gateway software and N pieces of standby gateway software, the gateway device includes the primary gateway software of one resource pool, and the primary gateway software in the gateway device receives and sends the data processing request, where M and N are integers greater than or equal to 1; the device 600 also includes:
  • the gateway monitoring module 603, configured to: if the primary gateway software in the gateway device is in the fault state or the busy state, enable one piece of standby gateway software in the resource pool where the primary gateway software is located, so that the enabled standby gateway software receives and sends the data processing request.
  • the gateway monitoring module 603 is configured to:
  • if the gateway device also includes K pieces of standby gateway software of the resource pool where the primary gateway software is located, enable any one of the K pieces of standby gateway software, where K is an integer greater than or equal to 1 and less than or equal to N; or
  • if a piece of standby gateway software of the resource pool where the primary gateway software is located is deployed in a standby gateway device, send an address update request to the standby gateway device, where the address update request instructs the standby gateway device to change its IP address to the IP address of the gateway device and to enable the standby gateway software in the standby gateway device.
  • the device 600 corresponds to the gateway device in the above method embodiment, and the modules in the device 600 and the other operations and/or functions described above implement the steps and methods performed by the gateway device in the method embodiment. For details, refer to the foregoing method embodiments; for the sake of brevity, they are not repeated here.
  • when the device 600 forwards the data processing request, the division into the above-mentioned functional modules is used only for illustration. In practice, the functions may be assigned to different functional modules as required, that is, the internal structure of the device 600 may be divided into different functional modules to complete all or part of the functions described above.
  • the device 600 provided in the above embodiment is based on the same idea as the above method embodiment, and its specific implementation process is detailed in the above method embodiment, and will not be repeated here.
  • the apparatus 600 may be equivalent to the gateway device 1022 in the system 100 , or equivalent to an execution component in the gateway device 1022 .
  • FIG. 7 is a schematic structural diagram of a computer device provided in the present application.
  • the computer device 700 may be any device involved in the content described in the parts of FIGS. 1-5 , such as a computing node, a gateway device, and the like.
  • the computer device 700 includes at least one processor 701 , a communication bus 702 , a memory 703 and at least one communication interface 704 .
  • the processor 701 may be a general-purpose central processing unit (CPU), a network processor (NP), a microprocessor, or one or more integrated circuits for implementing the scheme of the present application, such as an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • the aforementioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the communication bus 702 is used to transfer information between the aforementioned components.
  • the communication bus 702 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the memory 703 can be, but is not limited to, a read-only memory (ROM) or another type of static storage device that can store static information and instructions; a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; an electrically erasable programmable read-only memory (EEPROM); a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.); a magnetic disk storage medium or another magnetic storage device; or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • the memory 703 may exist independently, and is connected to the processor 701 through the communication bus 702 .
  • the memory 703 can also be integrated with the processor 701.
  • the communication interface 704 uses any transceiver-like apparatus to communicate with other devices or a communication network.
  • the communication interface 704 includes a wired communication interface, and may also include a wireless communication interface.
  • the wired communication interface may be an Ethernet interface, for example.
  • the Ethernet interface can be an optical interface, an electrical interface or a combination thereof.
  • the wireless communication interface may be a wireless local area network (wireless local area networks, WLAN) interface, a cellular network communication interface or a combination thereof.
  • the processor 701 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 7 .
  • a computer device may include multiple processors, such as a processor 701 and a processor 705 as shown in FIG. 7 .
  • each of these processors can be a single-core processor (single-CPU) or a multi-core processor (multi-CPU).
  • a processor herein may refer to one or more devices, circuits, and/or processing cores for processing data such as computer program instructions.
  • the computer device may further include an output device 706 and an input device 707 .
  • Output device 706 is in communication with processor 701 and can display information in a variety of ways.
  • the output device 706 may be a liquid crystal display (liquid crystal display, LCD), a light emitting diode (light emitting diode, LED) display device, a cathode ray tube (cathode ray tube, CRT) display device, or a projector (projector), etc.
  • the input device 707 communicates with the processor 701 and can receive user input in various ways.
  • the input device 707 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.
  • the memory 703 is used to store the program code 710 for implementing the solution of the present application, and the processor 701 can execute the program code 710 stored in the memory 703 . That is, the computer device 700 can implement the methods provided in the embodiments of FIGS. 2-5 above through the processor 701 and the program code 710 in the memory 703 .
  • a computer-readable storage medium is also provided, such as a memory including program code, where the program code can be executed by a processor in a computer device to implement the data processing method in the above-mentioned embodiments.
  • the computer-readable storage medium is a non-transitory computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, or an optical data storage device.
  • the embodiment of the present application also provides a computer program product. The computer program product includes at least one piece of program code stored in a computer-readable storage medium. The processor of the computer device reads the program code from the computer-readable storage medium and executes it, so that the computer device executes the above data processing method.
  • an embodiment of the present application also provides a device, which may specifically be a chip, a component, or a module. The device may include a processor and a memory that are connected to each other. The memory is used to store computer-executable instructions, and when the device is running, the processor can execute the computer-executable instructions stored in the memory, so that the chip executes the data processing methods in the above method embodiments.
  • the device, equipment, computer-readable storage medium, computer program product, and chip provided in this embodiment are all used to execute the corresponding method provided above. For the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method provided above; they are not repeated here.
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware or other arbitrary combinations.
  • the above-described embodiments may be implemented in whole or in part in the form of computer program products.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on the computer, the processes or functions according to the embodiments of the present invention will be generated in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that includes one or more sets of available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media.
  • the semiconductor medium may be a solid state drive (SSD).

Abstract

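A data processing method and apparatus, a computer device, and a computer-readable storage medium, belonging to the field of storage technologies. In the method, a computing node in a computing cluster implements the function of a load balancer, so that the computing node can select a gateway device from multiple gateway devices in a storage cluster and send a data processing request to it. During data processing, the computing node does not need a load balancing cluster to select a gateway device for the data processing request. Therefore, no load balancing cluster has to be deployed when building a storage system with separated storage and computing, which reduces the cost of building such a system.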

Description

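Data processing method and apparatus, computer device, and computer-readable storage medium
This application claims priority to Chinese Patent Application No. 202111010047.2, filed on August 31, 2021 and entitled "Data processing method and apparatus, computer device, and computer-readable storage medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of storage technologies, and in particular, to a data processing method and apparatus, a computer device, and a computer-readable storage medium.
Background
With the development and application of cloud computing, storage systems with a storage-compute separation architecture have evolved from storage systems with an integrated storage-compute architecture, in order to support elastic scaling of storage and computing as well as efficient storage. For example, a storage system with separated storage and computing includes a computing cluster, a load balancing cluster, and a storage cluster, and the computing cluster accesses the storage cluster through the load balancing cluster, which acts as a data forwarding cluster.
Currently, data processing in such a storage system may proceed as follows. A computing node in the computing cluster sends a data processing request to a load balancing device in the data forwarding cluster. The load balancing device uses a load balancing scheme to select one gateway device from multiple gateway devices in the storage cluster and forwards the data processing request to the selected gateway device. The gateway device sends the data processing request to a storage node in the storage cluster, and the storage node processes the request, for example, by reading or storing data. The load balancing device in the data forwarding cluster may be a load balancer, for example, an F5 load balancer provided by F5 Networks or an Nginx load balancer, or may be a device on which a Linux virtual server (LVS) is deployed.
However, when the storage system is relatively large, more than one load balancing device may have to be deployed in the load balancing cluster to achieve load balancing, and load balancing devices are expensive, so the cost of building a storage system with separated storage and computing keeps rising. How to provide a low-cost storage system is therefore a technical problem that urgently needs to be solved.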
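Summary
This application provides a data processing method and apparatus, a computer device, and a storage medium, which can reduce the cost of building a storage system with separated storage and computing. The technical solutions are as follows:
According to a first aspect, a data processing method is provided. The method is applied to a storage system with separated storage and computing, the storage system includes a computing cluster and a storage cluster, and the method is performed by a computing node in the computing cluster. The method includes: receiving a data processing request, where the data processing request indicates that a file is to be processed; determining a first gateway device from multiple gateway devices in the storage cluster; and sending the data processing request to the first gateway device, so that the first gateway device forwards the data processing request to a storage node in the storage cluster to complete processing of the file.
In this method, the computing node in the computing cluster implements the function of a load balancer, so that the computing node can select a gateway device from the multiple gateway devices in the storage cluster and send the data processing request to it. During data processing, the computing node does not need a load balancing cluster to select a gateway device for the data processing request. Therefore, no load balancing cluster has to be deployed when building a storage system with separated storage and computing, which reduces the cost of building such a system.
In a possible implementation, the computing node records a correspondence between the multiple gateway devices and indexes, and each of the multiple gateway devices corresponds to one index. The determining a first gateway device from multiple gateway devices in the storage cluster includes: performing a hash calculation on the identifier of the file carried in the data processing request to obtain a hash value of the file; obtaining a target index based on the hash value; and determining, based on the correspondence between the multiple gateway devices and the indexes, the gateway device corresponding to the target index among the multiple gateway devices as the first gateway device.
In a possible implementation, the correspondence between the multiple gateway devices and the indexes includes an identifier of each of the multiple gateway devices and the index corresponding to each gateway device, and the identifier of each gateway device includes the Internet Protocol (IP) address of that gateway device.
In a possible implementation, the target index is the remainder obtained by dividing the hash value by the number of the multiple gateway devices.
In a possible implementation, the determining a first gateway device from multiple gateway devices in the storage cluster includes:
randomly selecting any one of the multiple gateway devices as the first gateway device.
In a possible implementation, before the determining a first gateway device from multiple gateway devices in the storage cluster, the method further includes:
obtaining, from a monitoring node in the storage cluster, the state of each gateway device in the storage cluster, where the state is any one of an idle state, a busy state, or a fault state; and
determining, based on the state of each gateway device in the storage cluster, the multiple gateway devices that are in the idle state.
In a possible implementation, the sending the data processing request to the first gateway device, so that the first gateway device forwards the data processing request to a storage node in the storage cluster to complete processing of the file, includes:
if the first gateway device is in the busy state or the fault state, determining a second gateway device from the multiple gateway devices based on the recorded states of the multiple gateway devices, and sending the data processing request to the second gateway device, so that the second gateway device forwards the data processing request to a storage node in the storage cluster to complete processing of the file, where the second gateway device is any gateway device in the idle state among the multiple gateway devices.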
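The fallback described in the last implementation above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `choose_gateway`, the state strings, and the use of IP addresses as dictionary keys are assumptions made for the example.

```python
import random
from typing import Dict, Optional

# Hypothetical state labels; the patent only names the three states.
IDLE, BUSY, FAULT = "idle", "busy", "fault"


def choose_gateway(first: str, states: Dict[str, str],
                   rng: Optional[random.Random] = None) -> Optional[str]:
    """Return `first` if it is idle; otherwise return any other idle gateway.

    `states` maps a gateway identifier (here an IP address) to its recorded
    state. None is returned when no gateway is idle.
    """
    rng = rng or random.Random()
    if states.get(first) == IDLE:
        return first
    # The first gateway is busy or faulty: pick a "second gateway device"
    # among the gateways recorded as idle.
    idle = sorted(ip for ip, state in states.items()
                  if state == IDLE and ip != first)
    return rng.choice(idle) if idle else None
```

Returning `None` when nothing is idle is a design choice of this sketch; the source does not say what the computing node does in that case.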
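According to a second aspect, a data processing method is provided. The method is applied to a storage system with separated storage and computing, the storage system includes a computing cluster and a storage cluster, and the method is performed by a gateway device in the storage cluster. The method includes: receiving a data processing request from a computing node in the computing cluster, where the data processing request indicates that a file is to be processed, and the gateway device is determined by the computing node from multiple gateway devices in the storage cluster; and sending the data processing request to a storage node in the storage cluster, so that the storage node processes the file based on the data processing request.
In a possible implementation, before the receiving a data processing request from a computing node in the computing cluster, the method further includes: sending the state of the gateway device to a monitoring node in the storage cluster, where the state is any one of an idle state, a busy state, or a fault state.
In a possible implementation, the storage cluster includes M resource pools, each resource pool includes one primary gateway software and N pieces of standby gateway software, the gateway device includes the primary gateway software of one resource pool, and the primary gateway software in the gateway device receives and sends the data processing request, where M and N are integers greater than or equal to 1.
Before the receiving a data processing request from a computing node in the computing cluster, the method further includes: if the primary gateway software in the gateway device is in the fault state or the busy state, enabling, by a gateway monitoring module in the gateway device, one piece of standby gateway software in the resource pool where the primary gateway software is located, so that the enabled standby gateway software receives and sends the data processing request.
In a possible implementation, the enabling, by the gateway monitoring module in the gateway device, one piece of standby gateway software in the resource pool where the primary gateway software is located includes: if the gateway device further includes K pieces of standby gateway software of the resource pool where the primary gateway software is located, enabling, by the gateway monitoring module, any one of the K pieces of standby gateway software, where K is an integer greater than or equal to 1 and less than or equal to N; or, if a piece of standby gateway software of the resource pool where the primary gateway software is located is deployed in a standby gateway device, sending, by the gateway monitoring module, an address update request to the standby gateway device, where the address update request instructs the standby gateway device to change its IP address to the IP address of the gateway device and to enable the standby gateway software in the standby gateway device.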
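According to a third aspect, a storage system with separated storage and computing is provided. The storage system includes a computing cluster and a storage cluster, and the computing cluster includes a computing node.
The computing node is configured to: receive a data processing request; determine a first gateway device from multiple gateway devices in the storage cluster; and send the data processing request to the first gateway device, where the data processing request indicates that a file is to be processed.
The first gateway device is configured to: receive the data processing request from the computing node in the computing cluster; and send the data processing request to a storage node in the storage cluster, so that the storage node processes the file based on the data processing request.
In a possible implementation, the computing node is further configured to:
perform a hash calculation on the identifier of the file carried in the data processing request to obtain a hash value of the file;
obtain a target index based on the hash value; and
determine, based on the correspondence between the multiple gateway devices and the indexes, the gateway device corresponding to the target index among the multiple gateway devices as the first gateway device.
In a possible implementation, the correspondence between the multiple gateway devices and the indexes includes an identifier of each of the multiple gateway devices and the index corresponding to each gateway device, and the identifier of each gateway device includes the Internet Protocol (IP) address of that gateway device.
In a possible implementation, the target index is the remainder obtained by dividing the hash value by the number of the multiple gateway devices.
In a possible implementation, the computing node is further configured to randomly select any one of the multiple gateway devices as the first gateway device.
In a possible implementation, the computing node is further configured to: obtain, from a monitoring node in the storage cluster, the state of each gateway device in the storage cluster, where the state is any one of an idle state, a busy state, or a fault state; and determine, based on the state of each gateway device in the storage cluster, the multiple gateway devices that are in the idle state.
In a possible implementation, the computing node is further configured to: if the first gateway device is in the busy state or the fault state, determine a second gateway device from the multiple gateway devices based on the recorded states of the multiple gateway devices, and send the data processing request to the second gateway device, where the second gateway device is any gateway device in the idle state among the multiple gateway devices. The second gateway device is configured to: receive the data processing request from the computing node in the computing cluster; and send the data processing request to a storage node in the storage cluster, so that the storage node processes the file based on the data processing request.
In a possible implementation, the first gateway device is further configured to send the state of the first gateway device to the monitoring node in the storage cluster, where the state is any one of an idle state, a busy state, or a fault state.
In a possible implementation, the storage cluster includes M resource pools, each resource pool includes one primary gateway software and N pieces of standby gateway software, the gateway device includes the primary gateway software of one resource pool, and the primary gateway software in the first gateway device receives and sends the data processing request, where M and N are integers greater than or equal to 1. A gateway monitoring module in the first gateway device is configured to: if the primary gateway software in the first gateway device is in the fault state or the busy state, enable one piece of standby gateway software in the resource pool where the primary gateway software is located, so that the enabled standby gateway software receives and sends the data processing request.
In a possible implementation, the gateway monitoring module is configured to: if the first gateway device further includes K pieces of standby gateway software of the resource pool where the primary gateway software is located, enable any one of the K pieces of standby gateway software, where K is an integer greater than or equal to 1 and less than or equal to N; or, if a piece of standby gateway software of the resource pool where the primary gateway software is located is deployed in a standby gateway device, send an address update request to the standby gateway device, where the address update request instructs the standby gateway device to change its IP address to the IP address of the gateway device and to enable the standby gateway software in the standby gateway device.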
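According to a fourth aspect, a data processing apparatus is provided for performing the above data processing method. Specifically, the data processing apparatus includes functional modules for performing the data processing method provided in the first aspect or any optional implementation of the first aspect.
According to a fifth aspect, a data processing apparatus is provided for performing the above data processing method. Specifically, the data processing apparatus includes functional modules for performing the data processing method provided in the second aspect or any optional implementation of the second aspect.
According to a sixth aspect, a computer device is provided. The computer device includes a processor, and the processor is configured to execute program code so that the computer device performs the operations of the above data processing method, specifically the operations of the data processing method provided in the first aspect or any optional implementation of the first aspect.
According to a seventh aspect, a computer device is provided. The computer device includes a processor, and the processor is configured to execute program code so that the computer device performs the operations of the above data processing method, specifically the operations of the data processing method provided in the second aspect or any optional implementation of the second aspect.
According to an eighth aspect, a computer-readable storage medium is provided. The storage medium stores at least one piece of program code, and the program code is read by a processor so that a computer device performs the operations of the above data processing method, specifically the operations of the data processing method provided in the first aspect or any optional implementation of the first aspect, or the operations of the data processing method provided in the second aspect or any optional implementation of the second aspect.
According to a ninth aspect, a computer program product is provided. The computer program product includes at least one piece of program code stored in a computer-readable storage medium. A processor of a computer device reads the program code from the computer-readable storage medium and executes it, so that the computer device performs the method provided in the first aspect or any optional implementation of the first aspect, or the method provided in the second aspect or any optional implementation of the second aspect.
On the basis of the implementations provided in the foregoing aspects, this application may further combine them to provide more implementations.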
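Brief Description of the Drawings
FIG. 1 is a schematic diagram of a storage system with separated storage and computing provided in this application;
FIG. 2 is a flowchart of a data processing method provided in this application;
FIG. 3 is a flowchart of a data processing method provided in this application;
FIG. 4 is a flowchart of a method for switching between primary and standby gateway devices provided in this application;
FIG. 5 is a schematic structural diagram of a data processing apparatus provided in this application;
FIG. 6 is a schematic structural diagram of a data processing apparatus provided in this application;
FIG. 7 is a schematic structural diagram of a computer device provided in this application.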
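Detailed Description
The implementations of this application are described in further detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of a storage system with separated storage and computing provided in this application. Referring to FIG. 1, the system 100 includes at least one computing cluster 101 and at least one storage cluster 102. Each computing cluster 101 may correspond to one or more storage clusters 102, and one storage cluster 102 may correspond to one or more computing clusters 101. For ease of understanding, the following embodiments use an example in which one computing cluster 101 can access one storage cluster 102 and one storage cluster 102 can be accessed by one or more computing clusters 101.
Each computing cluster 101 includes multiple computing nodes 1011. Each computing node 1011 provides data processing services for users, for example, storing data provided by a user in the storage cluster 102 or reading data from the storage cluster 102 for a user. Each computing node 1011 may consist of at least one server. Each computing cluster 101 may be used to deploy a Hadoop database, the database tool Hive, the computing engine Spark, or the distributed database HBase.
Each storage cluster 102 includes multiple storage nodes 1021 and multiple gateway devices 1022. Each storage node 1021 provides data storage and data reading services and may consist of at least one server. Each gateway device 1022 has permission to access each storage node 1021 in the storage cluster 102. For example, at least one piece of gateway software is deployed in each gateway device 1022, and each piece of gateway software provides an application programming interface (API) for accessing the storage cluster 102, so that the computing cluster 101 can access the storage nodes 1021 in the storage cluster 102 through the gateway software in the gateway device 1022. The storage cluster 102 in the system 100 may be either a Ceph storage cluster or a FusionStorage distributed storage cluster. If the storage cluster 102 is a Ceph storage cluster, the gateway software deployed in the gateway device 1022 is an object storage gateway (RADOS gateway, RGW), and the storage node 1021 may be an object-based storage device (OSD).
Taking the interaction between one computing cluster 101 and one storage cluster 102 in the system 100 as an example, a user can submit a data processing request (such as a read request for reading data or a write request for writing data) to any computing node 1011 in the computing cluster 101 through a user device. That computing node 1011 can select one gateway device 1022 from the gateway devices 1022 deployed in the storage cluster 102 to forward the data processing request submitted by the user. For example, the computing node 1011 sends the data processing request submitted by the user to the selected gateway device 1022, the gateway device 1022 forwards the data processing request to at least one storage node 1021 in the storage cluster 102, and the at least one storage node 1021 processes the data processing request. Because a computing node 1011 in the computing cluster 101 can select the gateway device 1022 in the storage cluster 102 that forwards the user's data processing request, different data processing requests from the same computing node 1011 can be sent to different gateway devices 1022 of the storage cluster 102. Load balancing of the storage cluster 102 can therefore be ensured when highly concurrent data processing requests are processed, so no load balancing cluster has to be deployed between the storage cluster 102 and the computing cluster 101, which reduces the cost of building the storage system 100.
In a possible implementation, each storage cluster 102 further includes a monitoring node 1023, which collects the state of each gateway device 1022 in the storage cluster 102. The state of a gateway device 1022 is an idle state, a busy state, or a fault state. The idle state and the busy state are non-fault states of the gateway device 1022, and a gateway device 1022 in a non-fault state can work normally. A computing node 1011 in the computing cluster 101 can obtain the state of each gateway device 1022 in the storage cluster 102 from the monitoring node 1023 and, when selecting a gateway device 1022 for a data processing request, can preferentially select a gateway device 1022 in the idle state. Selecting a gateway device 1022 in the idle state helps balance the load across the gateway devices 1022, and the data processing request does not have to queue for forwarding on the selected idle gateway device 1022.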
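To further describe how a computing node selects a gateway device for a data processing request, refer to the flowchart of a data processing method provided in this application shown in FIG. 2. The method is applied to a storage system with separated storage and computing, which may be the storage system shown in FIG. 1.
Step 201: A computing node receives a data processing request, where the data processing request indicates that a file is to be processed.
The computing node is any computing node in any computing cluster of the storage system. Every computing node in that computing cluster can select a gateway device and access the storage cluster through the selected gateway device. For example, any such computing node can select one gateway device from the multiple gateway devices of the storage cluster and send the data processing request to it, and the selected gateway device then forwards the data processing request to a storage node in the storage cluster to complete processing of the file. For ease of understanding, the following embodiments use an example in which one computing node in a computing cluster selects a gateway device to access one storage cluster.
The file is the to-be-processed file specified by the user. The data processing request may be a read request or a write request. If the data processing request is a read request, it indicates that the file is to be read. If the data processing request is a write request, it indicates that the file is to be written to the storage cluster of the storage system, that is, the file is to be stored.
In a possible implementation, the data processing request carries the identifier of the file. The identifier uniquely indicates the file and may be the name of the file. If the data processing request is a write request, the data processing request also carries the file.
In a possible implementation, the user submits a task to the computing node through a user device, or submits the task directly on the computing node, and the computing node decomposes the task into multiple data processing requests. For example, the computing node receives and processes the task submitted by the user. If processing the task requires processing multiple files involved in the task, the computing node generates one data processing request for each of the files, and each data processing request indicates that one of the files is to be processed. The computing node then takes any one of the multiple data processing requests as the data processing request to be processed, and each of the multiple data processing requests is processed through the procedure shown in FIG. 2.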
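The request contents and the task decomposition described above can be sketched as follows. This is an illustrative sketch only: the class `DataProcessingRequest`, its field names, and the function `split_task` are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DataProcessingRequest:
    op: str                          # "read" or "write"
    file_id: str                     # uniquely identifies the file, e.g. its name
    payload: Optional[bytes] = None  # the file itself; carried only by write requests


def split_task(op: str, files: Dict[str, Optional[bytes]]) -> List[DataProcessingRequest]:
    """Generate one data processing request per file involved in a task."""
    if op not in ("read", "write"):
        raise ValueError(f"unsupported operation: {op}")
    requests = []
    for file_id, content in files.items():
        if op == "write" and content is None:
            raise ValueError(f"write request for {file_id} must carry the file")
        # A read request carries only the file identifier.
        requests.append(DataProcessingRequest(
            op, file_id, content if op == "write" else None))
    return requests
```

Each request produced this way would then go through steps 201 to 207 independently, so requests for different files may end up on different gateway devices.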
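The user device may also be referred to as a terminal, terminal station, user terminal, user apparatus, access apparatus, subscriber station, subscriber unit, mobile station, user agent, user equipment, portable terminal, laptop terminal, desktop terminal, or by other names. For example, the user device may be a mobile phone, laptop computer, tablet computer, desktop computer, smart TV, smart wearable device, computer, artificial intelligence (AI) product, smart car, smart instrument, or Internet of Things (IoT) terminal.
Step 202: The computing node determines a first gateway device from the multiple gateway devices of the storage cluster.
The first gateway device is the gateway device that the computing node selects from the multiple gateway devices to forward the data processing request.
The computing node determines the first gateway device from the multiple gateway devices in either of the following Mode A or Mode B.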
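Mode A: The computing node selects the first gateway device through a hash calculation.
The computing node records a correspondence between the multiple gateway devices and indexes, and each of the multiple gateway devices corresponds to one index. In a possible implementation, the correspondence includes an identifier of each of the multiple gateway devices and the index corresponding to each gateway device, and the identifier of each gateway device includes the Internet Protocol (IP) address of that gateway device. In a possible implementation, the identifier of each gateway device further includes the port identifier of the communication port of that gateway device.
For example, the computing node stores a gateway map table that records the correspondence between the multiple gateway devices and the indexes. Taking a storage cluster that includes M gateway devices as an example, see the gateway map table shown in Table 1 below, where M is an integer greater than or equal to 1.
Table 1
[Table 1, the gateway map table, is provided as an image in the original publication: Figure PCTCN2022086063-appb-000001]
In a possible implementation, Mode A can be implemented through the following steps A1 to A3.
Step A1: The computing node performs a hash calculation on the identifier of the file carried in the data processing request to obtain a hash value of the file.
Step A2: The computing node obtains a target index based on the hash value.
The target index is the index corresponding to the first gateway device. In a possible implementation, the target index is the remainder obtained by dividing the hash value by the number of the multiple gateway devices.
Taking M gateway devices as an example, the computing node computes the hash value modulo M, and the result is the target index, that is, the remainder of the hash value divided by the number of the multiple gateway devices.
Step A3: The computing node determines, based on the correspondence between the multiple gateway devices and the indexes, the gateway device corresponding to the target index among the multiple gateway devices as the first gateway device.
After obtaining the target index, the computing node looks up the identifier of the gateway device corresponding to the target index in the gateway map table and uses the gateway device indicated by that identifier as the first gateway device. For example, if the target index is 2, the computing node uses gateway device 2, which corresponds to target index 2, as the first gateway device.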
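Steps A1 to A3 can be sketched as follows. This is a minimal sketch under stated assumptions: the source does not name a hash function, so SHA-256 (truncated to 64 bits) is used here only because it is stable across processes; the indexes are assumed to run from 0 to M-1 so that they coincide with the possible remainders; and the addresses in the usage note are documentation placeholders.

```python
import hashlib
from typing import Dict


def target_index(file_id: str, gateway_count: int) -> int:
    """Steps A1 and A2: hash the file identifier, then take the remainder."""
    if gateway_count < 1:
        raise ValueError("at least one gateway device is required")
    digest = hashlib.sha256(file_id.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest[:8], "big")  # A1: hash value of the file
    return hash_value % gateway_count               # A2: hash value mod M


def select_first_gateway(file_id: str, gateway_map: Dict[int, str]) -> str:
    """Step A3: look up the gateway identifier recorded for the target index.

    `gateway_map` plays the role of the gateway map table: it maps an index
    (assumed to be 0..M-1) to a gateway identifier such as "IP:port".
    """
    return gateway_map[target_index(file_id, len(gateway_map))]
```

With a table such as `{0: "192.0.2.10:8080", 1: "192.0.2.11:8080", 2: "192.0.2.12:8080"}`, requests for the same file identifier always reach the same gateway device, while different files are spread across the gateway devices. One consequence of the modulo scheme is that changing M remaps most files to a different gateway device.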
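Mode B: The computing node selects the first gateway device based on a random selection rule.
In a possible implementation, the computing node randomly selects any one of the multiple gateway devices as the first gateway device. For example, the computing node determines the gateway device corresponding to any index in the gateway map table as the first gateway device.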
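Mode B can be sketched in a few lines. The function name `select_random_gateway` and the optional `rng` parameter are assumptions of this example; the injectable random generator is there only to make the behaviour reproducible in tests.

```python
import random
from typing import Dict, Optional


def select_random_gateway(gateway_map: Dict[int, str],
                          rng: Optional[random.Random] = None) -> str:
    """Pick any index of the gateway map table at random and return its gateway."""
    rng = rng or random.Random()
    index = rng.choice(sorted(gateway_map))
    return gateway_map[index]
```

Unlike Mode A, repeated requests for the same file are not tied to one gateway device here.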
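Optionally, in addition to Mode A or Mode B, the first gateway device can also be selected according to the load of each gateway device. In other words, the computing node can also implement a load balancing function that balances service processing according to the load of each gateway device, making full use of every gateway device and improving device utilization across the whole system.
Step 203: The computing node sends the data processing request to the first gateway device, so that the first gateway device forwards the data processing request to a storage node in the storage cluster to complete processing of the file.
Step 204: The first gateway device receives the data processing request.
Step 205: The first gateway device sends the data processing request to a storage node in the storage cluster.
In a possible implementation, after receiving the data processing request, the first gateway device determines, based on the data processing request, at least one storage node from the multiple storage nodes of the storage cluster and sends the data processing request to the at least one storage node. The at least one storage node is the storage node used to process the data processing request.
For example, if the data processing request is a read request, the first gateway device looks up the metadata of the file, based on the identifier of the file in the data processing request, among the recorded metadata of multiple files, where the metadata of each file includes the storage address of that file. After finding the metadata of the file, the first gateway device determines the storage nodes, among the multiple storage nodes, to which the storage address of the file belongs as the at least one storage node. In a possible implementation, the storage address of a file includes the storage addresses of at least one object corresponding to the file. The at least one object makes up the file, and an object is the smallest unit of data storage in the storage cluster. In a possible implementation, the metadata of each file further includes the identifier of each of the at least one object corresponding to the file, where each identifier indicates one object and the position of that object in the file.
For another example, if the data processing request is a write request, the first gateway device obtains the file from the data processing request and determines the at least one storage node from the multiple storage nodes of the storage cluster based on the size of the file, where the remaining storage space of each of the at least one storage node is greater than or equal to the size of the file.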
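The two ways in which the first gateway device picks storage nodes in step 205 can be sketched as follows. The data layouts are assumptions made for illustration: `metadata` maps a file identifier to a mapping from object identifier to the storage node holding that object, and `free_space` maps a storage node name to its remaining space in bytes. Neither layout is specified in the source.

```python
from typing import Dict, List


def nodes_for_read(file_id: str, metadata: Dict[str, Dict[str, str]]) -> List[str]:
    """Read request: return the storage nodes that hold the file's objects."""
    objects = metadata[file_id]           # raises KeyError for an unknown file
    return sorted(set(objects.values()))  # each node listed once


def nodes_for_write(file_size: int, free_space: Dict[str, int]) -> List[str]:
    """Write request: return the nodes whose remaining space fits the file."""
    return sorted(node for node, free in free_space.items() if free >= file_size)
```

The source only requires that every chosen node has enough remaining space; how many of the candidates returned by `nodes_for_write` are actually used is left open here.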
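The first gateway device may also split the file into at least one object, generate the identifier of each of the at least one object, and add the corresponding identifier to each object. The first gateway device replaces the file in the data processing request with the at least one identified object and sends the modified data processing request to the at least one storage node.
After that, the first gateway device may also create the metadata of the file and add the identifiers of the at least one object and the identifier of the file to the metadata.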
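The splitting and metadata creation described above can be sketched as follows. The identifier format (file identifier plus a zero-padded sequence number) and the `object_size` parameter are assumptions of this sketch; the source only requires that each identifier indicates one object and its position in the file.

```python
from typing import Dict, List, Tuple


def split_into_objects(file_id: str, data: bytes,
                       object_size: int) -> List[Tuple[str, bytes]]:
    """Split a file into fixed-size objects, each tagged with an identifier."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # An empty file still yields one (empty) object so that it can be stored.
    chunks = [data[i:i + object_size]
              for i in range(0, len(data), object_size)] or [b""]
    # The sequence number encodes the object's position in the file.
    return [(f"{file_id}.{seq:08d}", chunk) for seq, chunk in enumerate(chunks)]


def create_metadata(file_id: str, objects: List[Tuple[str, bytes]]) -> Dict[str, object]:
    """Create file metadata holding the file identifier and the object identifiers."""
    return {"file_id": file_id, "object_ids": [object_id for object_id, _ in objects]}
```

Concatenating the object payloads in identifier order reconstructs the original file, which is what a later read request relies on.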
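Step 206: The storage node receives the data processing request.
Step 207: The storage node processes the file based on the data processing request.
In a possible implementation, if the data processing request is a write request, the storage node stores the at least one object carried in the data processing request and stores the identifiers of the at least one object, the storage addresses of the at least one object, and the identifier of the file in the data processing request in association with one another, to establish the correspondence between the file and the at least one object.
In a possible implementation, if the data processing request is a read request, the storage node looks up the storage addresses of the at least one object corresponding to the file according to the correspondence between files and objects, and obtains the at least one object at those storage addresses.
After processing the data processing request, the storage node generates a data processing response and sends it to the first gateway device. The data processing response indicates that processing of the file is complete. If the data processing request is a read request, the data processing response carries the at least one object of the file read from the storage node. If the data processing request is a write request, the data processing response carries the storage addresses of the at least one object of the file.
As described above, the computing node in the computing cluster implements the function of a load balancer, so that the computing node can select a gateway device from the multiple gateway devices in the storage cluster and send the data processing request to it. During data processing, the computing node does not need a load balancing cluster to select a gateway device for the data processing request. Therefore, no load balancing cluster has to be deployed when building a storage system with separated storage and computing, which reduces the cost of building such a system.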
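The procedure shown in FIG. 2 is one in which the computing node selects one gateway device from the recorded multiple gateway devices of the storage cluster to access the storage cluster. As another possible implementation, to improve the processing efficiency of data processing requests, the computing node may also consider the states of the gateway devices when selecting a gateway device for a new data processing request. For example, refer to the flowchart of a data processing method provided in this application shown in FIG. 3. The procedure shown in FIG. 3 is one in which the computing node selects one gateway device from the multiple gateway devices, based on the recorded states of the multiple gateway devices of the storage cluster, to access the storage cluster. The method is applied to a storage system with separated storage and computing, which may be the storage system shown in FIG. 1.
Step 301: For each gateway device in the storage cluster, the gateway monitoring module in the gateway device detects the state of the corresponding gateway software.
Each gateway device includes a gateway monitoring module and has gateway software installed. The gateway monitoring module detects, in real time or periodically, the state of the gateway software of the gateway device to which it belongs, and reports that state. The gateway software forwards messages sent by devices outside the storage cluster to the storage nodes in the storage cluster, so that devices outside the storage cluster can access the storage nodes in the storage cluster. For example, the gateway software can forward a received data processing request to a storage node in the storage cluster. For another example, step 205 above can be performed by the gateway software in the first gateway device. The gateway software may be RGW software or other gateway software that can provide devices outside the storage cluster with an interface for accessing the storage cluster.
The state of the gateway software in a gateway device is any one of an idle state, a busy state, or a fault state. The idle state means that the number of data processing requests waiting to be forwarded in the gateway device is less than or equal to a first threshold. A gateway device in the idle state that receives a new data processing request can forward it within a short time (such as a preset duration). The busy state means that the number of data processing requests waiting to be forwarded in the gateway device is greater than the first threshold. A gateway device in the busy state that receives a new data processing request cannot forward it within a short time, and the new data processing request has to queue in the gateway device for forwarding. The first threshold can be set by a person skilled in the art according to the specific implementation scenario and is not limited in the embodiments of this application.
Taking one gateway device as an example, the gateway monitoring module in the gateway device detects the state of the gateway software of that gateway device. For example, the gateway monitoring module detects whether the gateway software in the gateway device has failed. If the gateway software has failed, the gateway monitoring module determines that the gateway software is in the fault state (that is, the gateway device is in the fault state). If the gateway software has not failed, the gateway monitoring module checks the number of data processing requests waiting to be forwarded in the gateway device. If that number is less than or equal to the first threshold, the load of the gateway software is low, and the gateway monitoring module determines that the gateway software is in the idle state (that is, the gateway device is in the idle state). If that number is greater than the first threshold, the load of the gateway software is high, and the gateway monitoring module determines that the gateway software is in the busy state (that is, the gateway device is in the busy state).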
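Step 302: Each gateway monitoring module sends the state of the corresponding gateway software to the monitoring node in the storage system.
The gateway software corresponding to a gateway monitoring module is the gateway software in the gateway device in which that module resides. Still taking one gateway device as an example, after the gateway monitoring module in the gateway device detects the state of the gateway software in the gateway device, it sends a status notification message to the monitoring node. The status notification message includes the identifier of the state of the gateway software. Each state of the gateway software is represented by its own identifier, and the identifiers of different states may be represented in different ways. For example, the identifier of the busy state is "00", the identifier of the idle state is "01", and the identifier of the fault state is "11".
The status notification message may be a heartbeat message or a message of another type.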
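Steps 301 and 302 can be summarized in a short Python sketch. The state identifiers "00", "01" and "11" come from the example in the text; the names `GatewayState`, `StatusNotification`, `detect_state` and `build_notification`, and the shape of the message, are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class GatewayState(Enum):
    BUSY = "00"
    IDLE = "01"
    FAULT = "11"


@dataclass
class StatusNotification:
    """Hypothetical status notification (could be carried in a heartbeat)."""
    gateway_ip: str
    state_id: str


def detect_state(software_failed, pending_requests, first_threshold):
    """Classify the gateway software as described for step 301."""
    if software_failed:
        return GatewayState.FAULT
    if pending_requests <= first_threshold:
        return GatewayState.IDLE   # low load: requests are forwarded quickly
    return GatewayState.BUSY       # high load: new requests would queue


def build_notification(gateway_ip, state):
    """Build the message sent to the monitoring node in step 302."""
    return StatusNotification(gateway_ip=gateway_ip, state_id=state.value)
```

The fault check comes first because the number of pending requests is only meaningful when the gateway software is running.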
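Step 303: The monitoring node obtains and records the state of the gateway software of each gateway device.
Still taking one gateway device as an example, after receiving a status notification message from a gateway device, the monitoring node stores the state identifier carried in the message in association with the IP address of that gateway device. In a possible implementation, the monitoring node stores a gateway state table, which records the state of the gateway software in each gateway device of the storage cluster. Each time the monitoring node receives the state of the gateway software of a gateway device, it records that state in the gateway state table. Taking a storage cluster that includes M gateway devices as an example, see the gateway state table shown in Table 2 below.
Table 2
IP address of the gateway device | State of the gateway device
IP address of gateway device 1 | Busy state (00)
IP address of gateway device 2 | Idle state (01)
IP address of gateway device M | Fault state (11)
In a possible implementation, the fault state of the gateway software is not obtained by the monitoring node from a status notification message. Instead, the gateway monitoring module in each gateway device of the storage cluster periodically sends status notification messages to the monitoring node. When the gateway software in a gateway device is in the fault state, the gateway monitoring module in that gateway device does not send a status notification message to the monitoring node. Accordingly, if the monitoring node has not received a status notification message from that gateway device after a preset duration, the gateway software in that gateway device is in the fault state, and the monitoring node records the state of the gateway software of that gateway device as the fault state. The preset duration is the period at which the gateway monitoring module sends status notification messages. In this case, the status notification message need not carry the identifier of the fault state.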
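The gateway state table and the timeout-based fault detection can be sketched as follows. This is an illustrative model only: the class `MonitorNode`, its method names, and the choice to pass the current time explicitly (which keeps the example deterministic) are assumptions, not part of the application.

```python
FAULT_ID = "11"  # fault-state identifier from the example in the text


class MonitorNode:
    """Keeps a gateway state table keyed by gateway IP address."""

    def __init__(self, report_period):
        self.report_period = report_period  # the preset duration
        self.state_table = {}  # gateway IP -> state identifier
        self.last_seen = {}    # gateway IP -> time of last notification

    def on_notification(self, gateway_ip, state_id, now):
        """Record the reported state (step 303)."""
        self.state_table[gateway_ip] = state_id
        self.last_seen[gateway_ip] = now

    def sweep(self, now):
        """Mark gateways silent for longer than the period as faulty."""
        for gateway_ip, seen in self.last_seen.items():
            if now - seen > self.report_period:
                self.state_table[gateway_ip] = FAULT_ID
```

A deployment would typically allow some slack beyond one reporting period before declaring a fault; the text itself sets the preset duration equal to the sending period.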
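The process shown in steps 302 and 303 above is one in which the gateway devices in the storage cluster actively report the state of their own gateway software to the monitoring node. In another possible implementation, the monitoring node queries each gateway device for the state of its gateway software and records the state obtained. For example, the monitoring node sends a query request to each gateway device in the storage cluster, where the query request is used to query the state of the gateway software in the gateway device. After receiving the query request, the gateway monitoring module in each gateway device sends the state of the corresponding gateway software to the monitoring node, so that the monitoring node can record it.
Step 304: A computing node in the computing cluster obtains, from the monitoring node in the storage cluster, the state of the gateway software of each gateway device in the storage cluster, and takes the state of the gateway software of each gateway device as the state of that gateway device.
The computing node is any computing node in the computing cluster, and every computing node in the computing cluster may perform step 304.
The computing node sends a gateway state acquisition request to the monitoring node, where the gateway state acquisition request indicates that the states of the gateway devices in the storage cluster are to be obtained.
After receiving the gateway state acquisition request, the monitoring node sends a gateway state acquisition response to the computing node, where the gateway state acquisition response carries the identifier of the state of each gateway device in the storage cluster and the IP address of each gateway device. For example, the gateway state acquisition response carries the gateway state table stored in the monitoring node.
The computing node receives the gateway state acquisition response from the monitoring node and obtains from it the identifier of the state of each gateway device in the storage cluster.
It should be noted that step 304 is one possible implementation in which a computing node obtains the states of the gateway devices in the storage cluster. In another possible implementation, the computing node obtains the state of the gateway software of each gateway device in the storage cluster from another computing node in the computing cluster. For example, a first computing node in the computing cluster obtains the states of the gateway devices in the storage cluster from the monitoring node and sends those states to each computing node in the computing cluster. The computing node then receives the states of the gateway devices in the storage cluster from the first computing node, where the first computing node is any computing node in the computing cluster other than the computing node itself.
As another example, the storage cluster does not include a monitoring node, and the computing node obtains the state of the gateway software of each gateway device directly from the gateway devices in the storage cluster. This process is analogous to the process in which the monitoring node obtains the state of the gateway software of each gateway device, and is not described again here.
As yet another example, each gateway device in the storage cluster sends the state of its gateway software to a first computing node in the computing cluster, and the computing node then obtains the state of the gateway software of each gateway device in the storage cluster from the first computing node.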
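Step 305: The computing node determines, based on the states of the gateway devices in the storage cluster, the plurality of gateway devices that are in the idle state.
In a possible implementation, if any gateway device in the storage cluster is in the idle state, the computing node determines that gateway device to be one of the plurality of gateway devices.
In another possible implementation, the computing node records the correspondence between the gateway devices in the storage cluster and indexes. If any gateway device in the storage cluster is in the busy state or the fault state, the computing node deletes the recorded correspondence between that gateway device and its index, and gives each gateway device that follows it the index of the gateway device immediately before it.
Taking Table 1 above as an example, if gateway device 1 is in the fault state and all other gateway devices in the storage cluster are in the idle state, the computing node deletes the correspondence between gateway device 1 and index 1 in Table 1, and maps the identifiers of gateway devices 2 to M to indexes 1 to M-1, so that the indexes of gateway devices 2 to M are updated to the indexes previously held by gateway devices 1 to M-1, as shown in Table 3 below.
Table 3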
Figure PCTCN2022086063-appb-000002
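Once the computing node has deleted the correspondence between every gateway device in the fault state or the busy state and its index, the computing node has, from its own point of view, removed the faulty and busy gateway devices from the storage cluster. All of the gateway devices that the computing node now records are in the idle state.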
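The re-indexing in step 305 amounts to filtering out the non-idle gateway devices and renumbering the rest. A minimal sketch, assuming indexes start at 1 as in the example above and using the hypothetical name `rebuild_index`:

```python
IDLE_ID = "01"  # idle-state identifier from the example in the text


def rebuild_index(gateways_in_index_order, states):
    """Drop busy or faulty gateways and renumber the idle ones from 1.

    gateways_in_index_order: gateway identifiers ordered by current index.
    states: mapping from gateway identifier to state identifier.
    """
    idle = [gw for gw in gateways_in_index_order if states.get(gw) == IDLE_ID]
    return {index: gw for index, gw in enumerate(idle, start=1)}
```

With gateway device 1 faulty and the others idle, gateway devices 2 to M end up at indexes 1 to M-1, which matches the example.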
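It should be noted that the process shown in steps 301 to 305 above may be performed in real time or periodically. The embodiments of this application do not limit the length of the period at which steps 301 to 305 are performed.
Step 306: The computing node receives a data processing request, where the data processing request indicates that a file is to be processed.
Step 306 is analogous to step 201 above and is not described again here.
Step 307: The computing node determines a first gateway device from the plurality of gateway devices.
Step 307 is analogous to step 202 above and is not described again here.
Step 308: The computing node sends the data processing request to the first gateway device.
Step 309: The first gateway device receives the data processing request.
Step 310: The first gateway device sends the data processing request to a storage node in the storage cluster.
Step 310 is analogous to step 205 above and is not described again here.
In another possible implementation, the process shown in steps 305 to 310 above may be replaced by the process shown in steps 311 to 314 below.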
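Step 311: The computing node records the correspondence between the state of each gateway device in the storage system and the index of that gateway device.
For example, the computing node may store the identifier of the state of each gateway device in the gateway mapping table, so that the state of each gateway device corresponds to its index. Taking a storage cluster that includes M gateway devices as an example, recording the state of each gateway device in the gateway mapping table shown in Table 1 yields Table 4 below.
Table 4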
Figure PCTCN2022086063-appb-000003
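Step 312: The computing node determines a first gateway device from the plurality of gateway devices of the storage cluster.
Step 312 is analogous to step 202 above and is not described again here.
Step 313: If the first gateway device is in the busy state or the fault state, the computing node determines, based on the recorded states of the plurality of gateway devices, a second gateway device from the plurality of gateway devices, where the second gateway device is any gateway device among the plurality of gateway devices that is in the idle state.
If the first gateway device is in the busy state or the fault state, the computing node queries the gateway mapping table and determines any gateway device recorded there as being in the idle state to be the second gateway device.
Taking the gateway mapping table shown in Table 4 as an example, if the first gateway device is gateway device 1, Table 4 shows that gateway device 1 is in the busy state. To prevent the data processing request from queuing for a long time in gateway device 1 before being forwarded, the computing node determines gateway device 2, which is in the idle state, to be the second gateway device.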
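Steps 312 and 313 can be sketched together: the computing node first picks a gateway device by hashing the file identifier and taking the remainder by the number of gateway devices, then falls back to an idle one if the first choice is busy or faulty. The hash function (SHA-256 truncated to 64 bits), the use of zero-based list positions as indexes, and the function names are assumptions made for this illustration; the application does not prescribe them.

```python
import hashlib

IDLE_ID = "01"  # idle-state identifier from the example in the text


def pick_first_gateway(file_id, gateways):
    """Hash the file identifier and map it to a gateway by remainder."""
    digest = hashlib.sha256(file_id.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest[:8], "big")
    target_index = hash_value % len(gateways)  # zero-based position (assumed)
    return gateways[target_index]


def pick_gateway(file_id, gateways, states):
    """Return the first gateway if idle, otherwise any idle gateway."""
    first = pick_first_gateway(file_id, gateways)
    if states.get(first) == IDLE_ID:
        return first
    idle = [gw for gw in gateways if states.get(gw) == IDLE_ID]
    if not idle:
        raise RuntimeError("no idle gateway device available")
    return idle[0]  # the text allows any idle gateway; the first one is used here
```

Hashing the file identifier keeps requests for the same file on the same gateway device as long as that device stays idle and the set of gateway devices does not change.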
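Step 314: The computing node sends the data processing request to the second gateway device, and the second gateway device forwards the data processing request to a storage node in the storage cluster to complete the processing of the file.
Step 315: The second gateway device receives the data processing request.
Step 316: The second gateway device sends the data processing request to a storage node in the storage cluster.
Step 316 is analogous to step 205 above and is not described again here.
Step 317: The storage node receives the data processing request.
The data processing request comes from the first gateway device or the second gateway device.
Step 318: The storage node processes the file based on the data processing request.
Step 318 is analogous to step 207 above and is not described again here.
As can be seen from the above, the computing node in the computing cluster implements the function of a load balancer, so that the computing node can itself select, from the plurality of gateway devices of the storage cluster, a gateway device to which to send the data processing request. During data processing, the computing node does not need a load balancing cluster to select a gateway device for the data processing request. Therefore, no load balancing cluster has to be deployed when building a storage system in which storage and compute are separated, which reduces the cost of building such a system. In addition, when selecting a gateway device, the computing node selects one in the idle state rather than one in the busy state or the fault state. The data processing request therefore neither queues for a long time in a busy gateway device nor goes unforwarded by a faulty gateway device, which improves the efficiency with which data processing requests are processed.
In the process shown in FIG. 3, the plurality of gateway devices back one another up. If the gateway device that the computing node selects first is in the fault state, the computing node selects another gateway device that is in the idle state and sends the data processing request to it.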
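In another possible implementation, one storage cluster corresponds to one or more computing clusters. Each gateway device in the storage cluster includes one or more communication ports, each of which is used to communicate with one computing cluster, that is, to receive messages sent by that computing cluster and to send messages to it, so that each gateway device can communicate with the one or more computing clusters through the one or more communication ports.
M resource pools are deployed in the storage cluster. Each resource pool includes N+1 pieces of gateway software and corresponds to one IP address and at least one port identifier. Each of the at least one port identifier corresponds to one computing cluster, indicating that the communication port identified by that port identifier in the gateway devices of the storage cluster communicates with the corresponding computing cluster. M and N are both integers greater than 1. The N+1 pieces of gateway software in each resource pool are deployed on R gateway devices, and at least one piece of gateway software is deployed in each of the R gateway devices, where R is an integer greater than or equal to 1 and less than or equal to N+1. For each resource pool, a technician randomly selects one of its R gateway devices as the primary gateway device of the resource pool, and the other gateway devices among the R gateway devices serve as standby gateway devices.
To distinguish the primary gateway device from the standby gateway devices in the same resource pool, the technician configures the IP address of the primary gateway device of each resource pool as the IP address corresponding to that resource pool, and configures the IP address of each standby gateway device in each resource pool as any IP address other than the IP addresses corresponding to the resource pools.
To facilitate switching between primary and standby gateway software within the same resource pool, for any computing cluster of the at least one computing cluster and any resource pool of the M resource pools, the technician configures the identifier of the communication port that each gateway device of that resource pool uses to communicate with that computing cluster as the port identifier, among the at least one port identifier corresponding to the resource pool, that corresponds to that computing cluster. In addition, in each computing node of that computing cluster, the IP address corresponding to the resource pool and the port identifier corresponding to that computing cluster are recorded as the identifier of the primary gateway device of the resource pool.
While the storage cluster is operating, the primary gateway devices of the M resource pools provide the computing cluster with the service of accessing the storage cluster. This can also be understood as follows: M*R gateway devices are deployed in the storage cluster, the M primary gateway devices among them provide the computing cluster with the service of accessing the storage cluster, and the standby gateway devices among them do not provide that service for the time being.
At least one piece of gateway software of a resource pool is deployed in each primary gateway device. Each primary gateway device randomly selects one of the deployed pieces of gateway software to provide the computing cluster with the service of accessing the storage cluster, that is, each primary gateway device randomly starts one piece of gateway software for that purpose. For ease of description, the gateway software started by a primary gateway device is referred to as the primary gateway software of the resource pool to which that gateway device belongs, and every other piece of gateway software in the resource pool is standby gateway software. In other words, each resource pool includes one piece of primary gateway software and N pieces of standby gateway software, and the gateway device on which the primary gateway software of a resource pool runs is the primary gateway device of that resource pool. In this case, the state of a primary gateway device is the state of the primary gateway software in it. When a piece of primary gateway software is in the fault state or the busy state, the primary gateway device can switch the primary gateway software currently in use to standby gateway software of its resource pool, and switch one piece of standby gateway software of that resource pool to primary gateway software. To further explain this process, see the flowchart of a primary/standby gateway device switching method provided in this application and shown in FIG. 4. The method is applied to a storage system in which storage and compute are separated, which may be the storage system shown in FIG. 1 above.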
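Step 401: The gateway monitoring module of a gateway device detects the state of the primary gateway software in the gateway device.
The gateway device is the primary gateway device of any one of the M resource pools of the storage cluster and includes the primary gateway software of that resource pool. When the primary gateway software is in the idle state, the primary gateway software in the gateway device receives and sends data processing requests.
The process in which the gateway monitoring module detects the state of the primary gateway software in the gateway device is the process in which the gateway monitoring module detects the state of the gateway software in the gateway device. That process is described in step 301 and is not described again here.
Step 402: If the primary gateway software is in the fault state or the busy state, the gateway monitoring module enables one piece of standby gateway software in the resource pool to which the primary gateway software belongs, and the enabled standby gateway software receives and sends data processing requests.
The gateway monitoring module enables the standby gateway software to replace the primary gateway software that is in the fault state or the busy state, so that the enabled standby gateway software becomes the new primary gateway software of the resource pool. The primary gateway software that is in the fault state or the busy state is switched to standby gateway software of the resource pool.
In a possible implementation, the gateway monitoring module enables one piece of standby gateway software in the resource pool to which the primary gateway software belongs in either of the following manners 1 and 2.
Manner 1: If the gateway device further includes K pieces of standby gateway software of the resource pool to which the primary gateway software belongs, the gateway monitoring module enables any one of the K pieces of standby gateway software, where K is an integer greater than or equal to 1 and less than or equal to N.
Because the primary gateway software and the K pieces of standby gateway software are all deployed on the gateway device, they share the IP address of the gateway device (that is, the IP address corresponding to the resource pool) and share the same communication port of the gateway device for communicating with the computing cluster. Therefore, when the gateway monitoring module enables any one of the K pieces of standby gateway software to replace the primary gateway software that is in the busy state or the fault state, the gateway device remains the primary gateway device of the resource pool. The identifier of the primary gateway device of the resource pool recorded in each computing node of the computing cluster consists of the IP address and the port identifier corresponding to the resource pool, and that recorded identifier does not change. Consequently, the computing node is unaware of the switch even though the gateway software in the primary gateway device has been switched.
Manner 2: If a piece of standby gateway software of the resource pool to which the primary gateway software belongs is deployed on a standby gateway device, the gateway monitoring module sends an address update request to the standby gateway device, where the address update request instructs the standby gateway device to change its IP address to the IP address of the gateway device and to enable the standby gateway software in the standby gateway device.
The address update request carries the IP address of the gateway device.
After sending the address update request, the gateway monitoring module may further change the IP address of the gateway device to any IP address other than the IP addresses corresponding to the M resource pools, so that the primary gateway software is switched to standby gateway software and the primary gateway device is switched to a standby gateway device.
After receiving the address update request, the standby gateway device parses the IP address of the gateway device from the request and updates its own IP address to that IP address. Once the address update is complete, the IP address of the standby gateway software in the standby gateway device is the IP address of the primary gateway device, and the standby gateway device enables the standby gateway software in it. At this point, the standby gateway software in the standby gateway device becomes the primary gateway software of the resource pool, and the standby gateway device becomes the new primary gateway device of the resource pool.
The identifier of the primary gateway device of the resource pool recorded in each computing node of the computing cluster consists of the IP address corresponding to the resource pool and the port identifier of the communication port, and the identifier of the communication port that every gateway device of the resource pool uses to communicate with the computing cluster is the port identifier corresponding to the resource pool. Because the standby gateway device changes its own IP address to the IP address of the primary gateway device when it replaces the primary gateway device that is in the busy state or the fault state, the computing node is unaware of the switch. From the point of view of the computing node, the recorded identifier of the primary gateway device has not changed.
In this case, because the computing node is unaware of the switching between primary and standby gateway software in a resource pool, a monitoring node need not be deployed in the storage cluster. Of course, a monitoring node may still be deployed in the storage cluster to avoid the situation in which all gateway software in a resource pool is in the fault state or the busy state without the computing node being aware of it. In general, a standby gateway device does not work. It switches to being the primary gateway device and starts working only when the current primary gateway device is faulty or busy. Therefore, the monitoring node only needs to collect the state of the primary gateway software of each resource pool and does not need to collect the state of the standby gateway software. Accordingly, what the computing node obtains from the monitoring node is the state of the primary gateway software of each resource pool.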
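The two switching manners of step 402 can be modeled in a few lines. This sketch only manipulates in-memory records: the class `GatewayDevice`, the function `switch_over`, and the `spare_ip` parameter (an address outside the resource-pool addresses) are assumptions for illustration, and real address takeover would involve network configuration that is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GatewayDevice:
    ip: str
    software: List[str]           # gateway software of the pool on this device
    active: Optional[str] = None  # currently enabled gateway software, if any


def switch_over(primary, standby_devices, spare_ip):
    """Replace the faulty or busy primary gateway software of a resource pool.

    Returns the device that acts as the primary gateway device afterwards.
    """
    failed = primary.active
    local_standbys = [s for s in primary.software if s != failed]
    if local_standbys:
        # Manner 1: standby software on the same device takes over, so the
        # IP address and port seen by the computing nodes stay the same.
        primary.active = local_standbys[0]
        return primary
    if not standby_devices:
        raise RuntimeError("no standby gateway software available in the pool")
    # Manner 2: a standby device takes over the pool IP address.
    standby = standby_devices[0]
    pool_ip = primary.ip          # carried in the address update request
    primary.ip = spare_ip         # old primary moves off the pool address
    primary.active = None
    standby.ip = pool_ip
    standby.active = standby.software[0]
    return standby
```

In both manners the pool IP address keeps pointing at a working gateway, which is why the computing nodes do not need to update the gateway identifier they recorded.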
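Step 403: The standby gateway software in the gateway device receives a data processing request from a computing node in the computing cluster, where the data processing request indicates that a file is to be processed, and the gateway device is determined by the computing node from the plurality of gateway devices in the storage cluster.
In this case, the gateway device may be the first gateway device determined by the computing node from the plurality of gateway devices in the storage cluster. The process shown in step 403 is the process in which the gateway device receives the data processing request from the computing node in the computing cluster.
Step 404: The standby gateway software sends the data processing request to a storage node in the storage cluster.
The process shown in step 404 is the process in which the gateway device sends the data processing request to a storage node in the storage cluster.
Step 405: The storage node receives the data processing request.
Step 405 is analogous to step 206 above and is not described again here.
Step 406: The storage node processes the file based on the data processing request.
Step 406 is analogous to step 207 above and is not described again here.
As can be seen from the above, the computing node in the computing cluster implements the function of a load balancer, so that the computing node can itself select, from the plurality of gateway devices of the storage cluster, a gateway device to which to send the data processing request. During data processing, the computing node does not need a load balancing cluster to select a gateway device for the data processing request. Therefore, no load balancing cluster has to be deployed when building a storage system in which storage and compute are separated, which reduces the cost of building such a system. In addition, because the computing node is unaware of the switching between primary and standby gateway devices in the storage cluster, every gateway device it records is in the idle state. The gateway device it selects is therefore also in the idle state rather than in the busy state or the fault state. The data processing request thus neither queues for a long time in a busy gateway device nor goes unforwarded by a faulty gateway device, which improves the efficiency with which data processing requests are processed.
It should be noted that the computing node in this application includes a computer-readable storage medium in which a load balancing software package is stored, and the program code in the load balancing software package is used to implement the above data processing method. For example, the processor of the computing node reads and runs the program code in the load balancing software package, so that the computing node performs the above data processing method.
In a possible implementation, the program code in the load balancing software package is generated based on the Code Generation Library (CGLIB). For example, the program code in the load balancing software package is assembled from bytecode in CGLIB. CGLIB is a powerful, high-performance code generation library, and a load balancing software package generated based on CGLIB contains relatively little code. For example, the load balancing software package might be expected to require about 3000 lines of code, whereas the package actually generated contains fewer than 300 lines.
CGLIB is widely used in aspect-oriented programming (AOP) frameworks and can be used to provide method interception. The computing node can therefore load the load balancing software package generated based on CGLIB into the source code of the computing cluster in the form of a patch. While running the source code of the computing cluster, the processor of the computing node can execute the load balancing software package loaded in that source code, so that the computing node can perform the above data processing method. In this way, load balancing is achieved without modifying the source code of the computing cluster, which removes users' concerns about the reliability and security of a computing cluster to which the load balancing function has been added.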
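The methods of the embodiments of this application are described above, and the apparatuses of the embodiments of this application are described below. It should be understood that the apparatus described below has any function of the computing node in the above methods.
FIG. 5 shows a data processing apparatus provided in this application. The apparatus 500 may be part of the computing node in the foregoing embodiments or in FIG. 2 to FIG. 5, and is configured to perform the method performed by the computing node. The apparatus 500 is applied to a storage system in which storage and compute are separated, the storage system includes a computing cluster and a storage cluster, the apparatus 500 is configured as a computing node in the computing cluster, and the apparatus 500 includes:
a receiving module 501, configured to receive a data processing request, where the data processing request indicates that a file is to be processed;
a determining module 502, configured to determine a first gateway device from a plurality of gateway devices of the storage cluster; and
a sending module 503, configured to send the data processing request to the first gateway device, where the first gateway device forwards the data processing request to a storage node in the storage cluster to complete the processing of the file.
It should be understood that the apparatus 500 in the embodiments of this application may be implemented by a central processing unit (CPU), by an application-specific integrated circuit (ASIC), or by a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The data processing methods shown in FIG. 2 to FIG. 4 may also be implemented in software, in which case the apparatus 500 and its modules may be software modules.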
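In a possible implementation, the computing node records a correspondence between the plurality of gateway devices and indexes, and each of the plurality of gateway devices corresponds to one index. The determining module 502 is configured to:
perform a hash calculation on the identifier of the file carried in the data processing request to obtain a hash value of the file;
obtain a target index based on the hash value; and
determine, based on the correspondence between the plurality of gateway devices and indexes, the gateway device among the plurality of gateway devices that corresponds to the target index to be the first gateway device.
In a possible implementation, the correspondence between the plurality of gateway devices and indexes includes the identifier of each of the plurality of gateway devices and the index corresponding to each gateway device, and the identifier of each gateway device includes the Internet Protocol (IP) address of that gateway device.
In a possible implementation, the target index is the remainder obtained by dividing the hash value by the number of the plurality of gateway devices.
In a possible implementation, the determining module 502 is configured to:
randomly select any one of the plurality of gateway devices as the first gateway device.
In a possible implementation, the apparatus 500 further includes:
an obtaining module, configured to obtain, from a monitoring node in the storage cluster, the state of each gateway device in the storage cluster, where the state is any one of an idle state, a busy state, or a fault state; and
the determining module 502 is further configured to determine, based on the states of the gateway devices in the storage cluster, the plurality of gateway devices that are in the idle state.
In a possible implementation, the determining module 502 is further configured to: if the first gateway device is in the busy state or the fault state, determine, based on the recorded states of the plurality of gateway devices, a second gateway device from the plurality of gateway devices, where the second gateway device is any gateway device among the plurality of gateway devices that is in the idle state; and
the sending module 503 is further configured to send the data processing request to the second gateway device, where the second gateway device forwards the data processing request to a storage node in the storage cluster to complete the processing of the file.
It should be understood that the apparatus 500 corresponds to the computing node in the above method embodiments, and that the modules of the apparatus 500 and the other operations and/or functions described above are intended to implement the steps and methods performed by the computing node in the method embodiments. For details, refer to the above method embodiments. For brevity, they are not repeated here.
It should be understood that the division into the functional modules described above is only an example of how the apparatus 500 selects a gateway device. In practice, the functions may be assigned to different functional modules as required, that is, the internal structure of the apparatus 500 may be divided into different functional modules to perform all or some of the functions described above. In addition, the apparatus 500 provided in the above embodiment is based on the same concept as the above method embodiments. For its specific implementation, refer to the above method embodiments. Details are not repeated here.
It should be understood that the apparatus 500 may correspond to the computing node 1011 in the system 100, or to an execution component in the computing node 1011.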
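FIG. 6 shows a data processing apparatus provided in this application. The apparatus 600 may be part of the gateway device in the foregoing embodiments or in FIG. 2 to FIG. 5, and is configured to perform the method performed by the gateway device. The apparatus 600 is applied to a storage system in which storage and compute are separated, the storage system includes a computing cluster and a storage cluster, the apparatus 600 is configured as a gateway device in the storage cluster, and the apparatus 600 includes:
a receiving module 601, configured to receive a data processing request from a computing node in the computing cluster, where the data processing request indicates that a file is to be processed, and the gateway device is determined by the computing node from a plurality of gateway devices in the storage cluster; and
a sending module 602, configured to send the data processing request to a storage node in the storage cluster, where the storage node processes the file based on the data processing request.
It should be understood that the apparatus 600 in the embodiments of this application may be implemented by a central processing unit (CPU), by an application-specific integrated circuit (ASIC), or by a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The data processing methods shown in FIG. 2 to FIG. 4 may also be implemented in software, in which case the apparatus 600 and its modules may be software modules.
In a possible implementation, the sending module 602 is further configured to:
send the state of the gateway device to a monitoring node in the storage cluster, where the state is any one of an idle state, a busy state, or a fault state.
In a possible implementation, the storage cluster includes M resource pools, each resource pool includes one piece of primary gateway software and N pieces of standby gateway software, the gateway device includes the primary gateway software of one resource pool, the primary gateway software in the gateway device receives and sends the data processing request, and M and N are both integers greater than or equal to 1. The apparatus 600 further includes:
a gateway monitoring module 603, configured to: if the primary gateway software in the gateway device is in the fault state or the busy state, enable one piece of standby gateway software in the resource pool to which the primary gateway software belongs, where the enabled standby gateway software receives and sends the data processing request.
In a possible implementation, the gateway monitoring module 603 is configured to:
if the gateway device further includes K pieces of standby gateway software of the resource pool to which the primary gateway software belongs, enable any one of the K pieces of standby gateway software, where K is an integer greater than or equal to 1 and less than or equal to N;
or,
if a piece of standby gateway software of the resource pool to which the primary gateway software belongs is deployed on a standby gateway device, send an address update request to the standby gateway device, where the address update request instructs the standby gateway device to change its IP address to the IP address of the gateway device and to enable the standby gateway software in the standby gateway device.
It should be understood that the apparatus 600 corresponds to the gateway device in the above method embodiments, and that the modules of the apparatus 600 and the other operations and/or functions described above are intended to implement the steps and methods performed by the gateway device in the method embodiments. For details, refer to the above method embodiments. For brevity, they are not repeated here.
It should be understood that the division into the functional modules described above is only an example of how the apparatus 600 forwards a data processing request. In practice, the functions may be assigned to different functional modules as required, that is, the internal structure of the apparatus 600 may be divided into different functional modules to perform all or some of the functions described above. In addition, the apparatus 600 provided in the above embodiment is based on the same concept as the above method embodiments. For its specific implementation, refer to the above method embodiments. Details are not repeated here.
It should be understood that the apparatus 600 may correspond to the gateway device 1022 in the system 100, or to an execution component in the gateway device 1022.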
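FIG. 7 is a schematic structural diagram of a computer device provided in this application. The computer device 700 may be any device involved in the content described with reference to FIG. 1 to FIG. 5, such as a computing node or a gateway device. The computer device 700 includes at least one processor 701, a communication bus 702, a memory 703, and at least one communication interface 704.
The processor 701 may be a general-purpose central processing unit (CPU), a network processor (NP), a microprocessor, or one or more integrated circuits used to implement the solutions of this application, for example, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The communication bus 702 is used to transfer information between the above components. The communication bus 702 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or only one type of bus.
The memory 703 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 703 may exist independently and be connected to the processor 701 through the communication bus 702. The memory 703 may also be integrated with the processor 701.
The communication interface 704 uses any transceiver-like apparatus to communicate with other devices or a communication network. The communication interface 704 includes a wired communication interface and may further include a wireless communication interface. The wired communication interface may be, for example, an Ethernet interface. The Ethernet interface may be an optical interface, an electrical interface, or a combination thereof. The wireless communication interface may be a wireless local area network (WLAN) interface, a cellular network communication interface, or a combination thereof.
In a specific implementation, as an embodiment, the processor 701 may include one or more CPUs, such as CPU0 and CPU1 shown in FIG. 7.
In a specific implementation, as an embodiment, the computer device may include a plurality of processors, such as the processor 701 and the processor 705 shown in FIG. 7. Each of these processors may be a single-core processor (single-CPU) or a multi-core processor (multi-CPU). A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (such as computer program instructions).
In a specific implementation, as an embodiment, the computer device may further include an output device 706 and an input device 707. The output device 706 communicates with the processor 701 and can display information in a variety of ways. For example, the output device 706 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector. The input device 707 communicates with the processor 701 and can receive user input in a variety of ways. For example, the input device 707 may be a mouse, a keyboard, a touchscreen device, or a sensing device.
In some embodiments, the memory 703 is configured to store program code 710 for executing the solutions of this application, and the processor 701 can execute the program code 710 stored in the memory 703. That is, the computer device 700 can implement the methods provided in the embodiments of FIG. 2 to FIG. 5 above by means of the processor 701 and the program code 710 in the memory 703.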
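In an exemplary embodiment, a computer-readable storage medium is further provided, for example a memory including program code, where the program code can be executed by a processor in a computer device to complete the data processing method in the above embodiments. For example, the computer-readable storage medium is a non-transitory computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, or an optical data storage device.
An embodiment of this application further provides a computer program product that includes at least one piece of program code stored in a computer-readable storage medium. The processor of a computer device reads the program code from the computer-readable storage medium and executes it, so that the computer device performs the above data processing method.
In addition, an embodiment of this application further provides an apparatus, which may specifically be a chip, a component, or a module. The apparatus may include a processor and a memory that are connected to each other. The memory is configured to store computer-executable instructions, and when the apparatus runs, the processor can execute the computer-executable instructions stored in the memory, so that the chip performs the data processing method in the above method embodiments.
The apparatus, device, computer-readable storage medium, computer program product, and chip provided in the embodiments are all used to perform the corresponding methods provided above. For the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods provided above. Details are not repeated here.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that contains one or more sets of usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid state drive (SSD).
The above are merely exemplary embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the protection scope of this application.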

Claims (17)

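  1. A data processing method, wherein the method is applied to a storage system in which storage and compute are separated, the storage system comprises a computing cluster and a storage cluster, the method is performed by a computing node in the computing cluster, and the method comprises:
    receiving a data processing request, wherein the data processing request indicates that a file is to be processed;
    determining a first gateway device from a plurality of gateway devices of the storage cluster; and
    sending the data processing request to the first gateway device, wherein the first gateway device forwards the data processing request to a storage node in the storage cluster to complete the processing of the file.
  2. The method according to claim 1, wherein the computing node records a correspondence between the plurality of gateway devices and indexes, and each of the plurality of gateway devices corresponds to one index; and
    the determining a first gateway device from a plurality of gateway devices of the storage cluster comprises:
    performing a hash calculation on an identifier of the file carried in the data processing request to obtain a hash value of the file;
    obtaining a target index based on the hash value; and
    determining, based on the correspondence between the plurality of gateway devices and indexes, the gateway device among the plurality of gateway devices that corresponds to the target index to be the first gateway device.
  3. The method according to claim 2, wherein the correspondence between the plurality of gateway devices and indexes comprises an identifier of each of the plurality of gateway devices and the index corresponding to each gateway device, and the identifier of each gateway device comprises an Internet Protocol (IP) address of that gateway device.
  4. The method according to claim 2 or 3, wherein the target index is the remainder obtained by dividing the hash value by the number of the plurality of gateway devices.
  5. The method according to claim 1, wherein the determining a first gateway device from a plurality of gateway devices of the storage cluster comprises:
    randomly selecting any one of the plurality of gateway devices as the first gateway device.
  6. The method according to any one of claims 1 to 5, wherein before the determining a first gateway device from a plurality of gateway devices of the storage cluster, the method further comprises:
    obtaining, from a monitoring node in the storage cluster, a state of each gateway device in the storage cluster, wherein the state is any one of an idle state, a busy state, or a fault state; and
    determining, based on the states of the gateway devices in the storage cluster, the plurality of gateway devices that are in the idle state.
  7. The method according to any one of claims 1 to 5, wherein the sending the data processing request to the first gateway device, wherein the first gateway device forwards the data processing request to a storage node in the storage cluster to complete the processing of the file, comprises:
    if the first gateway device is in a busy state or a fault state, determining, based on recorded states of the plurality of gateway devices, a second gateway device from the plurality of gateway devices, wherein the second gateway device is any gateway device among the plurality of gateway devices that is in an idle state; and
    sending the data processing request to the second gateway device, wherein the second gateway device forwards the data processing request to a storage node in the storage cluster to complete the processing of the file.
  8. A data processing apparatus, wherein the apparatus is applied to a storage system in which storage and compute are separated, the storage system comprises a computing cluster and a storage cluster, the apparatus is configured as a computing node in the computing cluster, and the apparatus comprises:
    a receiving module, configured to receive a data processing request, wherein the data processing request indicates that a file is to be processed;
    a determining module, configured to determine a first gateway device from a plurality of gateway devices of the storage cluster; and
    a sending module, configured to send the data processing request to the first gateway device, wherein the first gateway device forwards the data processing request to a storage node in the storage cluster to complete the processing of the file.
  9. The apparatus according to claim 8, wherein the computing node records a correspondence between the plurality of gateway devices and indexes, and each of the plurality of gateway devices corresponds to one index; and the determining module is configured to:
    perform a hash calculation on an identifier of the file carried in the data processing request to obtain a hash value of the file;
    obtain a target index based on the hash value; and
    determine, based on the correspondence between the plurality of gateway devices and indexes, the gateway device among the plurality of gateway devices that corresponds to the target index to be the first gateway device.
  10. The apparatus according to claim 9, wherein the correspondence between the plurality of gateway devices and indexes comprises an identifier of each of the plurality of gateway devices and the index corresponding to each gateway device, and the identifier of each gateway device comprises an Internet Protocol (IP) address of that gateway device.
  11. The apparatus according to claim 9 or 10, wherein the target index is the remainder obtained by dividing the hash value by the number of the plurality of gateway devices.
  12. The apparatus according to claim 8, wherein the determining module is configured to:
    randomly select any one of the plurality of gateway devices as the first gateway device.
  13. The apparatus according to any one of claims 8 to 12, wherein the apparatus further comprises:
    an obtaining module, configured to obtain, from a monitoring node in the storage cluster, a state of each gateway device in the storage cluster, wherein the state is any one of an idle state, a busy state, or a fault state; and
    the determining module is further configured to determine, based on the states of the gateway devices in the storage cluster, the plurality of gateway devices that are in the idle state.
  14. The apparatus according to any one of claims 8 to 13, wherein
    the determining module is further configured to: if the first gateway device is in a busy state or a fault state, determine, based on recorded states of the plurality of gateway devices, a second gateway device from the plurality of gateway devices, wherein the second gateway device is any gateway device among the plurality of gateway devices that is in an idle state; and
    the sending module is further configured to send the data processing request to the second gateway device, wherein the second gateway device forwards the data processing request to a storage node in the storage cluster to complete the processing of the file.
  15. A computer device, wherein the computer device comprises a processor, and the processor is configured to execute program code, so that the computer device performs the method according to any one of claims 1 to 7.
  16. A computer-readable storage medium, wherein the storage medium stores at least one piece of program code, and the at least one piece of program code is read by a processor so that a computer device performs the method according to any one of claims 1 to 7.
  17. A computer program product, wherein the computer program product comprises at least one piece of program code, and the at least one piece of program code is read by a processor in a computer device so that the computer device performs the method according to any one of claims 1 to 7.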
PCT/CN2022/086063 2021-08-31 2022-04-11 Data processing method and apparatus, computer device, and computer-readable storage medium WO2023029485A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22862631.3A EP4383076A1 (en) 2021-08-31 2022-04-11 Data processing method and apparatus, computer device, and computer-readable storage medium
US18/590,120 US20240205292A1 (en) 2021-08-31 2024-02-28 Data processing method and apparatus, computer device, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111010047.2A CN115729693A (zh) 2021-08-31 2021-08-31 Data processing method and apparatus, computer device, and computer-readable storage medium
CN202111010047.2 2021-08-31

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/590,120 Continuation US20240205292A1 (en) 2021-08-31 2024-02-28 Data processing method and apparatus, computer device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2023029485A1 true WO2023029485A1 (zh) 2023-03-09

Family

ID=85291211

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/086063 WO2023029485A1 (zh) 2021-08-31 2022-04-11 Data processing method and apparatus, computer device, and computer-readable storage medium

Country Status (4)

Country Link
US (1) US20240205292A1 (zh)
EP (1) EP4383076A1 (zh)
CN (1) CN115729693A (zh)
WO (1) WO2023029485A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117675932B (zh) * 2024-02-01 2024-05-14 腾讯科技(深圳)有限公司 Request processing method and apparatus, electronic device, system, and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106130920A (zh) * 2016-07-14 2016-11-16 腾讯科技(深圳)有限公司 Packet forwarding method and apparatus
CN112929424A (zh) * 2021-01-26 2021-06-08 成都佳发安泰教育科技股份有限公司 Gateway load balancing method, apparatus, device, and storage medium
US20210258380A1 (en) * 2019-11-08 2021-08-19 Goodblock Technologies, Inc. Resilient distributed storage system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WANG, HEKANG ET AL.: "Design and Implementation of Enterprise Net-Disk Based on Multi-Cloud Servers", JOURNAL OF INTEGRATION TECHNOLOGY, vol. 8,, no. 2, 31 March 2019 (2019-03-31), ISSN: 2095-3135 *
YANG, FEI; ZHU, ZHI-XIANG; LIANG, XIAO-JIANG: "Design and Implementation of Load Balancing Based on Ceph Object Storage Cluster", COMPUTER SYSTEMS AND APPLICATIONS, ZHONGGUO KEXUEYUAN RUANJIAN YANJIUSUO, CN, vol. 25, no. 4, 30 April 2016 (2016-04-30), CN , pages 268 - 271, XP009544253, ISSN: 1003-3254 *

Also Published As

Publication number Publication date
EP4383076A1 (en) 2024-06-12
CN115729693A (zh) 2023-03-03
US20240205292A1 (en) 2024-06-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22862631

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022862631

Country of ref document: EP

Effective date: 20240307

NENP Non-entry into the national phase

Ref country code: DE