WO2018188089A1 - Data processing method, storage system, and switching device - Google Patents

Data processing method, storage system, and switching device

Info

Publication number
WO2018188089A1
Authority
WO
WIPO (PCT)
Prior art keywords
osd
identifier
offset
network
switching device
Prior art date
Application number
PCT/CN2017/080655
Other languages
English (en)
French (fr)
Inventor
许慧锋
郭海涛
严春宝
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to JP2019526353A priority Critical patent/JP6724252B2/ja
Priority to EP17905455.6A priority patent/EP3474146B1/en
Priority to CN202210544122.1A priority patent/CN114880256A/zh
Priority to CN201780089594.XA priority patent/CN110546620B/zh
Priority to PCT/CN2017/080655 priority patent/WO2018188089A1/zh
Publication of WO2018188089A1 publication Critical patent/WO2018188089A1/zh
Priority to US16/360,906 priority patent/US10728335B2/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/10 Program control for peripheral devices
    • G06F13/102 Program control for peripheral devices where the programme performs an interfacing function, e.g. device driver
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662 Virtualisation aspects
    • G06F3/0665 Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/74 Address processing for routing
    • H04L45/745 Address table lookup; Address filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/35 Switches specially adapted for specific applications
    • H04L49/356 Switches specially adapted for specific applications for storage area networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0635 Configuration or reconfiguration of storage systems by changing the path, e.g. traffic rerouting, path reconfiguration

Definitions

  • the present application relates to the field of information technology, and in particular, to a data processing method, a storage system, and a switching device.
  • In a distributed storage system, a compute node performs IO (Input/Output) operations on a virtual disk. After calculating which OSD (Object Storage Device) an IO operation corresponds to, the compute node can initiate the IO operation on that OSD.
  • The process of calculating which OSD an IO operation corresponds to consumes computing power on the compute node, especially in scenarios with heavy IO traffic, and lowers the compute node's operating efficiency.
  • In view of this, the present application discloses a data processing method, a storage system, and a switching device, to reduce the computational load on the compute node and improve its operating efficiency.
  • In a first aspect, the present application provides a storage system that includes a compute node and a switching device connected to each other. The compute node sends a first network packet to the switching device, where the first network packet carries a resource identifier, a first offset, and an input/output (IO) command.
  • The switching device determines the OSD corresponding to the IO operation. Specifically, the switching device generates a second offset from the first offset and the size of an object storage device (OSD), and obtains, according to the resource identifier and the first offset, the first network address of the first storage node where the first OSD to undergo the IO operation is located, as well as the identifier of the first OSD.
  • The second offset is the address offset within the first OSD at which the IO operation is to be performed.
  • In this way, the compute node does not need to calculate which OSD the IO operation corresponds to; instead, the switching device performs the OSD lookup based on the information carried in the first network packet from the compute node, thereby reducing the compute node's computational load.
  • The storage system further includes a plurality of storage nodes. The first storage node is configured to receive a second network packet sent by the switching device, select the first OSD according to the identifier of the first OSD, and perform, according to the IO command carried in the second network packet, an IO operation at the storage address pointed to by the second offset within the first OSD.
  • In other words, the first storage node receives from the switching device a second network packet that carries the identifier of the first OSD to undergo the IO operation and the second offset, and can perform the IO operation based on the information carried in that packet. The present application therefore implements the OSD lookup locally on the switching device without affecting the IO operation of the first storage node; compared with the prior art, the switching device of the present application can be used with a variety of storage nodes without requiring changes to the storage nodes.
  • The IO operation may be a write operation or a read operation. When the IO operation is a write operation, the IO command includes a write IO command and the data to be written; when the IO operation is a read operation, the IO command includes a read IO command and a read length.
  • When the IO operation is a write operation, the first storage node is specifically configured to write, according to the write IO command, the data to be written at the storage address pointed to by the second offset within the first OSD, as sketched below.
  • Since the switching device locally performs the lookup of the OSD corresponding to the IO operation, the compute node only needs to place the write IO command and the data to be written in the first network packet and send that packet to the switching device; once the switching device finds the OSD based on the first network packet, it sends the identifier of the OSD, the write IO command, and the data to be written directly to the first storage node where the OSD is located. The first storage node can therefore perform the write operation based on this information without any modification, so the switching device of the present application does not require changes to the storage nodes and can be used with a variety of them.
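  • The following is a minimal sketch of the storage node's write step, assuming the OSD is modeled as a pre-allocated backing file; the function name and file-backed layout are illustrative, not taken from the patent.

```python
def apply_write(osd_path: str, second_offset: int, data: bytes) -> None:
    """Write `data` at the storage address pointed to by the second offset
    within the OSD (modeled here as a pre-allocated backing file)."""
    with open(osd_path, "r+b") as osd:  # backing file must already exist
        osd.seek(second_offset)         # jump to the in-OSD address
        osd.write(data)
```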
  • In some cases, the data to be written from the compute node needs to be stored in multiple copies, that is, stored on at least two storage nodes, for the purpose of enhancing data security.
  • To that end, the first network packet further carries a multi-copy operation code, and the switching device is specifically configured to obtain, according to the resource identifier and the first offset, the first network address, the identifier of the first OSD, the second network address of the second storage node where the second OSD to be written is located, and the identifier of the second OSD.
  • In addition to generating the second network packet and sending it to the first storage node, the switching device generates, according to the multi-copy operation code, a third network packet and sends it to the second storage node, where the third network packet carries the second offset, the write IO command, the data to be written, and the identifier of the second OSD, and its destination address is the second network address.
  • The second storage node receives the third network packet and writes, according to the write IO command, the data to be written at the storage address pointed to by the second offset within the second OSD.
  • The multi-copy operation code can thus be used to notify the switching device that the primary OSD and the secondary OSD mapped by the virtual disk both need to be written. The switching device generates multiple network packets and sends them to the storage node where the primary OSD is located and to the storage node where the secondary OSD is located, so that the primary OSD and the secondary OSD write the data to be written at the same time, ensuring data consistency. The present application therefore further extends the switching device's functions, enabling it to translate a write command from the compute node into write commands for multiple storage nodes based on the multi-copy operation code, as sketched below.
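  • A minimal sketch of this fan-out, assuming the packets are modeled as plain dictionaries; the field names and the `replicas` list of (node address, OSD identifier) pairs are illustrative stand-ins for the patent's second and third network packets.

```python
def fan_out_write(second_offset, io_command, data, replicas):
    """Build one per-replica packet for each (node_address, osd_id) pair."""
    packets = []
    for node_address, osd_id in replicas:
        packets.append({
            "dst": node_address,       # destination network address
            "osd_id": osd_id,          # identifier of the target OSD
            "offset": second_offset,   # second offset within that OSD
            "io_command": io_command,  # write IO command
            "data": data,              # data to be written
        })
    return packets
```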
  • The first storage node is further configured to send a first response packet to the switching device, and the second storage node is further configured to send a second response packet to the switching device. The first response packet carries a first write result and a packet type identifier, and its destination address is the network address of the compute node; the second response packet carries a second write result and a packet type identifier, and its destination address is likewise the network address of the compute node. The switching device is further configured to receive the first response packet and the second response packet, determine from the packet type identifier that they are key-value (KV) packets, and generate and send a third response packet carrying the first write result and the second write result, whose destination address is the network address of the compute node. The compute node is further configured to receive the third response packet and obtain the first write result and the second write result it carries.
  • In other words, after receiving the response packets returned by the at least two storage nodes, the switching device determines their type, combines and encapsulates the multiple response packets into a single response packet, and returns it to the compute node. Aggregating the responses from multiple storage nodes into a single response significantly reduces the burden on the compute node, and the advantage of this embodiment is even more pronounced in scenarios with heavy IO traffic. A sketch of the aggregation follows.
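  • A minimal sketch of this aggregation step, with dictionaries standing in for the response packets; field names are illustrative.

```python
def aggregate_responses(responses, compute_node_address):
    """Merge per-replica response packets into one third response packet
    addressed to the compute node."""
    return {
        "dst": compute_node_address,  # single destination: the compute node
        "write_results": [r["write_result"] for r in responses],
    }
```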
  • When a received packet carries the packet type identifier, the switching device confirms that the packet is a KV packet, parses it, and performs the OSD lookup based on the parsed information; when a packet does not carry the packet type identifier, the switching device confirms that it is a non-KV packet and forwards it according to its destination address. The switching device of the present application therefore also has the function of forwarding non-KV packets and is compatible with a variety of network systems.
  • When the IO operation is a read operation, the first storage node is specifically configured to read, according to the read IO command, data of the read length starting at the storage address pointed to by the second offset within the first OSD, where the IO commands carried in the first network packet and the second network packet specifically include the read IO command and the read length.
  • Since the switching device locally performs the lookup of the OSD corresponding to the IO operation, the compute node only needs to place the read IO command and the read length in the first network packet and send that packet to the switching device; once the switching device finds the OSD based on the first network packet, it sends the identifier of the OSD, the read IO command, and the read length directly to the first storage node where the OSD is located. The first storage node can therefore perform the read operation based on this information without any modification, so the switching device of the present application can be used with a variety of storage nodes. A sketch of the read step follows.
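  • A minimal sketch of the storage node's read step under the same file-backed OSD assumption as the write sketch above; names are illustrative.

```python
def apply_read(osd_path: str, second_offset: int, read_length: int) -> bytes:
    """Read `read_length` bytes starting at the storage address pointed to
    by the second offset within the OSD's backing file."""
    with open(osd_path, "rb") as osd:
        osd.seek(second_offset)
        return osd.read(read_length)
```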
  • In an optional implementation, the switching device is specifically configured to take the first offset modulo the size of the OSD and use the result as the second offset.
  • Where the resource identifier is the volume number of a virtual disk on the compute node, the present application applies to the case where the virtual disk is a block storage system; where the resource identifier is the file system identifier and file identifier of a virtual file on the compute node's virtual disk, the present application applies to the case where the virtual disk is a file storage system.
  • When mounted, the virtual volume or virtual file is divided into virtual spaces equal in size to the OSD, and each virtual space has a mapping relationship with an OSD. Since the first offset may be greater than the size of the OSD, it cannot be applied to the OSD directly; taking the first offset modulo the size of the OSD yields a second offset that is smaller than the size of the OSD (see the sketch below).
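  • A one-line sketch of the offset translation, assuming 1 MB OSDs as in the later FIG. 9 example; the constant and function name are illustrative.

```python
OSD_SIZE = 1024 * 1024  # assumed OSD size: 1 MB, matching the FIG. 9 example

def to_second_offset(first_offset: int, osd_size: int = OSD_SIZE) -> int:
    """Take the first offset modulo the OSD size; the result is always
    smaller than one OSD, so it can serve as an in-OSD address offset."""
    return first_offset % osd_size
```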
  • Where the resource identifier is the volume number of a virtual disk on the compute node, the switching device is specifically configured to obtain, according to the volume number and the first offset, the first network address of the first storage node where the first OSD to undergo the IO operation is located and the identifier of the first OSD; where the resource identifier is a file system identifier and a file identifier, the switching device obtains the same information according to the file system identifier, the file identifier, and the first offset.
  • In an optional implementation, the first network packet further carries a packet type identifier, and the switching device is further configured to determine, according to the packet type identifier, that the first network packet is a key-value (KV) packet.
  • When a received packet carries the packet type identifier, the switching device confirms that it is a KV packet, parses it, and performs the OSD lookup based on the parsed information; when a packet does not carry the packet type identifier, the switching device confirms that it is a non-KV packet and forwards it according to its destination address. The switching device of the present application therefore also has the function of forwarding non-KV packets and is compatible with a variety of network systems.
  • In an optional implementation, the first network packet is a TCP packet, and the packet type identifier is set in the Options and Padding fields of the IP header of the TCP packet.
  • Because a field of the IP header carries the packet type identifier, the switching device only needs to analyze the IP header when determining the packet type, without unpacking the IP data field, which speeds up packet processing. A sketch of such a header check follows.
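  • A minimal sketch of an IP-header-only check, assuming a hypothetical one-byte option value marks KV packets; the marker value and option layout are assumptions, since the patent does not fix a concrete option format.

```python
KV_OPTION_BYTE = 0x9A  # hypothetical option value marking a KV packet

def is_kv_packet(ip_header: bytes) -> bool:
    """Inspect only the IP header: if it is longer than the fixed 20 bytes,
    an Options field is present; compare its first byte to the marker."""
    ihl_bytes = (ip_header[0] & 0x0F) * 4  # IHL field, converted to bytes
    if ihl_bytes <= 20:                    # no Options/Padding field present
        return False
    return ip_header[20] == KV_OPTION_BYTE
```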
  • The present application provides a specific implementation in which the switching device obtains the OSD corresponding to the IO operation: the switching device is specifically configured to divide the first offset by the size of the OSD and round down to obtain a rounding result, obtain the key corresponding to the resource identifier and the rounding result, and search a lookup table to determine the first network address of the first storage node corresponding to that key and the identifier of the first OSD.
  • When mounted, the virtual volume or virtual file is in fact divided into virtual spaces equal in size to the OSD, and each virtual space has a mapping relationship with an OSD; a virtual space therefore corresponds to exactly one OSD. It is thus only necessary to ensure that one virtual space corresponds to one key, and the rounding operation achieves exactly that.
  • Alternatively, the switching device is specifically configured to divide the first offset by the size of the OSD and round down to obtain the rounding result, and to run a consistent hashing algorithm with the resource identifier and the rounding result as input parameters to obtain the corresponding key.
  • That is, the present application can obtain the key from the resource identifier and the rounding result either by a table lookup or by a hash operation, as sketched below.
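  • A minimal sketch of the hash variant; SHA-1 stands in for the unspecified hash or consistent-hash function, and the key format is an assumption (the embodiment only states that keys are 8 bytes).

```python
import hashlib

def derive_key(resource_id: str, first_offset: int, osd_size: int) -> str:
    """Round the first offset down to an OSD-sized slice, then hash it
    together with the resource identifier. Every offset inside the same
    slice yields the same key, so one virtual space maps to one key."""
    rounding_result = first_offset // osd_size
    digest = hashlib.sha1(f"{resource_id}:{rounding_result}".encode())
    return digest.hexdigest()[:16]  # 16 hex chars = an 8-byte key
```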
  • In an optional implementation, the lookup table includes a global view table and a partition map. The global view table includes the correspondence between keys and OSD numbers, where the OSD number uniquely identifies an OSD within the storage system; the partition map includes the correspondence between the OSD number and the network address of the storage node and the identifier of the OSD.
  • The switching device is specifically configured to search the global view table to determine the OSD number corresponding to the obtained key, and then search the partition map to determine the first network address of the first storage node where the first OSD corresponding to that OSD number is located and the identifier of the first OSD, as sketched below.
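  • A minimal sketch of the two-stage lookup, with plain dictionaries standing in for the two tables; the sample values are drawn from the embodiment (OSD number 0x00000000, node 192.168.1.12, OSD identifier 0000), while the key itself is hypothetical.

```python
def locate_osd(key, global_view, partition_map):
    """Stage 1: global view table maps key -> OSD number.
    Stage 2: partition map maps OSD number -> (node address, OSD id)."""
    osd_number = global_view[key]
    node_address, osd_id = partition_map[osd_number]
    return node_address, osd_id

# Illustrative tables:
global_view = {"00000000000000a1": 0x00000000}          # hypothetical key
partition_map = {0x00000000: ("192.168.1.12", "0000")}  # OSD1 on node 200
```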
  • In an optional implementation, the storage system further includes a metadata control node connected to the switching device. The metadata control node records the metadata of the storage system, including the global view table and the partition map, and the switching device is further configured to receive the global view table and the partition map sent by the metadata control node.
  • Because the switching device receives the global view table and the partition map from the metadata control node, it does not need to generate them itself, which improves system compatibility.
  • In a second aspect, the present application provides a data processing method, including: a compute node sends a first network packet to a switching device, where the first network packet carries a resource identifier, a first offset, and an input/output IO command; the switching device generates a second offset according to the first offset and the size of an object storage device OSD, obtains, according to the resource identifier and the first offset, the first network address of the first storage node where the first OSD to undergo the IO operation is located and the identifier of the first OSD, and generates and sends a second network packet to the first storage node, where the second network packet carries the second offset, the IO command, and the identifier of the first OSD, and the destination address of the second network packet is the first network address.
  • The second aspect is the method implementation corresponding to the first aspect, so the descriptions in the first aspect or any implementation of the first aspect apply correspondingly to the second aspect or any implementation of the second aspect, and the details are not repeated here.
  • In a third aspect, the present application provides a data processing method, including: a switching device receives a first network packet sent by a compute node, where the first network packet carries a resource identifier, a first offset, and an input/output IO command; the switching device generates a second offset according to the first offset and the size of an object storage device OSD, and obtains, according to the resource identifier and the first offset, the first network address of the first storage node where the first OSD to undergo the IO operation is located and the identifier of the first OSD; the switching device generates a second network packet and sends it to the first storage node, where the second network packet carries the second offset, the IO command, and the identifier of the first OSD, and the destination address of the second network packet is the first network address.
  • In this way, the compute node does not need to calculate which OSD the IO operation corresponds to; instead, the switching device performs the OSD lookup based on the information carried in the first network packet from the compute node, thereby reducing the compute node's computational load.
  • In a fourth aspect, the present application provides a switching device, including: a receiving module, configured to receive a first network packet sent by a compute node, where the first network packet carries a resource identifier, a first offset, and an input/output IO command; a processing module, configured to generate a second offset according to the first offset and the size of an object storage device OSD, obtain, according to the resource identifier and the first offset, the first network address of the first storage node where the first OSD to undergo the IO operation is located and the identifier of the first OSD, and generate a second network packet, where the second network packet carries the second offset, the IO command, and the identifier of the first OSD, and the destination address of the second network packet is the first network address; and a sending module, configured to send the second network packet to the first storage node.
  • The fourth aspect is the apparatus implementation corresponding to the third aspect or any implementation of the third aspect, so the descriptions in the third aspect or any implementation of the third aspect apply correspondingly to the fourth aspect or any implementation of the fourth aspect, and the details are not repeated here.
  • In an optional implementation, the IO command includes a write IO command and the data to be written, and the first network packet further carries a multi-copy operation code. The processing module is specifically configured to obtain, according to the resource identifier and the first offset, the first network address, the identifier of the first OSD, the second network address of the second storage node where the second OSD to be written is located, and the identifier of the second OSD; the processing module is further configured to generate, according to the multi-copy operation code, a third network packet, where the third network packet carries the second offset, the write IO command, the data to be written, and the identifier of the second OSD, and the destination address of the third network packet is the second network address; and the sending module is further configured to send the third network packet to the second storage node.
  • In an optional implementation, the receiving module is further configured to receive a first response packet sent by the first storage node and a second response packet sent by the second storage node, where the first response packet carries a first write result and a packet type identifier and its destination address is the network address of the compute node, and the second response packet carries a second write result and a packet type identifier and its destination address is likewise the network address of the compute node.
  • The processing module is further configured to determine, according to the packet type identifier, that the first response packet and the second response packet are key-value KV packets, generate a third response packet, and send the third response packet to the compute node, where the third response packet carries the first write result and the second write result, and the destination address of the third response packet is the network address of the compute node.
  • In an optional implementation, the processing module is specifically configured to take the first offset modulo the size of the OSD and use the result as the second offset.
  • Where the resource identifier is the volume number of a virtual disk on the compute node, the processing module is configured to obtain, according to the volume number and the first offset, the first network address of the first storage node where the first OSD to undergo the IO operation is located and the identifier of the first OSD; where the resource identifier is a file system identifier and a file identifier, the processing module obtains the same information according to the file system identifier, the file identifier, and the first offset.
  • In an optional implementation, the first network packet further carries a packet type identifier, and the processing module is further configured to determine, according to the packet type identifier, that the first network packet is a key-value KV packet.
  • In an optional implementation, the processing module is further configured to divide the first offset by the size of the OSD and round down to obtain a rounding result, obtain the key corresponding to the resource identifier and the rounding result, and search a lookup table to determine the first network address of the first storage node corresponding to that key and the identifier of the first OSD, where the lookup table includes the correspondence between the key, the network address of the storage node, and the identifier of the OSD.
  • In an optional implementation, the lookup table includes a global view table and a partition map, where the global view table includes the correspondence between keys and OSD numbers, the OSD number identifying the OSD within the storage system, and the partition map includes the correspondence between the OSD number and the network address of the storage node and the identifier of the OSD. The processing module is specifically configured to search the global view table to determine the OSD number corresponding to the obtained key, and then search the partition map to determine the first network address of the first storage node where the first OSD corresponding to that OSD number is located and the identifier of the first OSD.
  • In an optional implementation, the receiving module is further configured to receive the global view table and the partition map sent by the metadata control node.
  • In a fifth aspect, the present application provides a switching device having the functions of the switching device in the above methods. The functions may be implemented by hardware, or by hardware executing corresponding software, where the hardware or software includes one or more modules corresponding to the functions described above. The switching device may be a network-side device, such as a switch or a physical server implementing switching functions.
  • In a sixth aspect, the present application provides a switch that includes a processor, a memory, and a plurality of physical ports, and that performs the functions of the switching device described in the above aspects. The processor is configured to support the switching device in performing the corresponding functions in the above methods, such as generating or processing the data and/or information involved in those methods.
  • In a seventh aspect, the present application provides a computer storage medium for storing computer software instructions for use by the above switching device, comprising a program designed to perform the above aspects.
  • FIG. 1 is a schematic structural diagram of a storage system according to an embodiment of the present invention;
  • FIG. 2 is a schematic structural diagram of a computing node according to an embodiment of the present invention;
  • FIG. 3 is a schematic structural diagram of another computing node according to an embodiment of the present invention;
  • FIG. 4 is a schematic structural diagram of a storage node according to an embodiment of the present invention;
  • FIG. 5 is a schematic structural diagram of another storage node according to an embodiment of the present invention;
  • FIG. 6 is a schematic structural diagram of a metadata control node according to an embodiment of the present invention;
  • FIG. 7 is a schematic structural diagram of a switching device according to an embodiment of the present invention;
  • FIG. 8 is a flowchart of a first embodiment of an information processing method according to the present invention;
  • FIG. 9 is a mapping diagram of a volume of a virtual disk and OSDs according to an embodiment of the present invention;
  • FIG. 10 is a flowchart of a second embodiment of an information processing method according to the present invention;
  • FIG. 11 is a flowchart of a third embodiment of an information processing method according to the present invention;
  • FIG. 12 is a schematic diagram of the format of a TCP packet according to an embodiment of the present invention;
  • FIG. 13 is another schematic structural diagram of a switching device according to the present invention.
  • FIG. 1 is a schematic structural diagram of a storage system according to an embodiment of the present invention.
  • The storage system according to this embodiment of the present invention includes a switching device 300, a computing node cluster, a storage node cluster, and a metadata control node 400. The computing node cluster includes a plurality of computing nodes, such as the computing nodes 100, 100', ...; the storage node cluster includes a plurality of storage nodes, such as the first storage node 200, the second storage node 200', and so on.
  • Any computing node may establish a point-to-point connection with the metadata control node 400 and any storage node through the switching device 300.
  • The specific numbers of computing nodes, storage nodes, and metadata control nodes may be set according to actual needs. Since the embodiment of the present invention relates to distributed storage technology, the following embodiments are introduced using two storage nodes as an example.
  • The switching device 300 may be at least one switch or router. After receiving a network packet sent by a computing node, the switching device 300 obtains, according to the resource identifier and the offset carried in the packet, the network address of the storage node where the OSD to be written or read is located and the identifier of that OSD on the storage node, and then generates and sends a network packet carrying the IO command to that storage node, so that the storage node can read or write its own OSD according to the IO command.
  • In other words, the switching device 300 performs the key-value operation function to look up the network address of the storage node and the OSD, and the computing node does not need to establish a connection with the metadata control node for every IO operation to perform the network address and OSD lookup, which reduces both the computational load on the computing node and the network load.
  • The following describes the specific structures of the computing node 100, the first storage node 200, the second storage node 200', the metadata control node 400, and the switching device 300 with reference to FIG. 2 to FIG. 7.
  • FIG. 2 is a schematic structural diagram of a computing node according to an embodiment of the present invention. The computing node 100 includes hardware 101, an operating system 102, a VBS (Virtual Block System) component 104, and a virtual machine 105.
  • the hardware 101 includes a physical network card 1011
  • the operating system 102 includes a virtual machine monitor 1022 and a physical network card driver 1021
  • The virtual machine 105 includes an application 1051 and a virtual machine operating system 1052, in which a virtual disk 10531 is provided.
  • The physical network card 1011 works as a network component at the network link layer and is the interface connecting the computing node to the transmission medium in the network. The physical network card 1011 sets a network address for the computing node 100, which may be a MAC address or an IP address, and the computing node can be identified on the network by this network address.
  • the physical network card 1011 can be configured with one or more network addresses. In this embodiment, the physical network card 1011 can be configured with a network address, but it is worth noting that in a relatively complex network environment, multiple network addresses can be set.
  • the physical network card 1011 can set the IP address of the compute node 100 to 192.168.1.11.
  • the physical network card driver 1021 is disposed in the operating system 102 to provide an interface for the physical network card 1011.
  • the VBS component 104 can control the physical network card 1011 to receive or send network packets through the interface provided by the physical network card driver 1021.
  • VBS component 104 is installed and operates on operating system 102, which provides distributed storage access services to computing node 100.
  • the VBS component 104 is provided with a network port in the operating system 102.
  • the VBS component 104 communicates with the external network through its own network port.
  • the network port of the VBS component 104 is, for example, 10001, and the network port of the VBS component of each computing node in the storage system is the same.
  • The virtual machine monitor 1022 is disposed in the operating system 102 and virtualizes, for the virtual machine operating system 1052, a set of virtual hardware independent of the actual hardware 101.
  • the application 1051 runs on the virtual machine operating system 1052, and the virtual machine operating system 1052 is provided with a virtual disk 10531.
  • the application 1051 can read and write the virtual disk 10531.
  • the virtual disk 10531 is provided with a plurality of virtual volumes, each of which is assigned a volume number.
  • When a virtual volume is read or written, the application 1051 generates a read/write command including the volume number and a first offset, where the first offset points to the read/write position in the virtual volume corresponding to that volume number; the virtual machine operating system 1052 sends the read/write command to the virtual machine monitor 1022, and the VBS component 104 obtains the read/write command from the virtual machine monitor 1022.
  • In another example, the virtual disk 10531 is provided with a plurality of virtual files, each provided with a file system identifier and a file identifier. The file system identifier indicates the file system where the file is located, which may be, for example, FAT (File Allocation Table), NTFS (New Technology File System), or another file format; the file identifier identifies the file within that file system.
  • When a virtual file is read or written, the application 1051 generates a read/write command including the file system identifier, the file identifier, and a first offset, where the first offset points to the read/write position in the virtual file corresponding to the file system identifier and the file identifier; the virtual machine operating system 1052 sends the read/write command to the virtual machine monitor 1022, and the VBS component 104 obtains the read/write command from the virtual machine monitor 1022.
  • FIG. 3 is a schematic structural diagram of another computing node according to an embodiment of the present invention.
  • the computing node 100 may not adopt a virtualization structure.
  • the application 1051 runs directly on the operating system 102.
  • the VBS component 104 provides a distributed storage access service for the node.
  • The operating system 102 can map the distributed storage space provided by the VBS component 104 to the virtual disk 10521, so that when accessing the virtual disk 10521, the application 1051 actually accesses the distributed storage space via the VBS component 104.
  • In an example, the virtual disk 10521 is provided with a plurality of virtual volumes, each assigned a volume number. When a virtual volume is read or written, the application 1051 generates a read/write command including the volume number and a first offset, where the first offset points to the read/write position in the virtual volume corresponding to that volume number, and the operating system 102 sends the read/write command to the VBS component 104.
  • In another example, the virtual disk 10521 is provided with a plurality of virtual files, each provided with a file system identifier and a file identifier. The file system identifier indicates the file system where the file is located, which may be, for example, FAT (File Allocation Table) or NTFS (New Technology File System); the file identifier identifies the file within that file system. When a virtual file is read or written, the application 1051 generates a read/write command including the file system identifier, the file identifier, and a first offset pointing to the read/write position in the virtual file corresponding to the file system identifier and the file identifier, and the operating system 102 sends the read/write command to the VBS component 104.
  • FIG. 4 is a schematic structural diagram of a storage node according to an embodiment of the present invention.
  • the first storage node 200 includes hardware 201, an operating system 202, and an OSD component 203.
  • the hardware 201 includes a physical network card 2011, a disk controller 2012, an OSD1, an OSD2, an OSD3, and an OSD4.
  • the operating system 202 includes a disk drive 2021 and a physical network card driver 2022.
  • the OSD component 203 is installed and runs on the operating system 202.
  • each OSD is provided with an identifier, and the OSD component 203 manages different OSDs by the identifier of the OSD.
  • The OSDs are provided on physical disks. In an example, one physical disk is provided with one OSD; in another example, one physical disk is provided with multiple OSDs, where each OSD has the same size, for example, 10 MB (MByte).
  • each OSD is assigned an OSD identifier, and the OSD identifier can be used to identify the OSD in the storage node.
  • For example, the identifier of OSD1 is 0000, the identifier of OSD2 is 0001, the identifier of OSD3 is 0002, and the identifier of OSD4 is 0003; the identifiers of the OSDs in the first storage node 200 are recorded by the OSD component 203.
  • each OSD is further assigned an OSD number, which can be used to identify the OSD in the storage system, and the OSD number is recorded in the metadata control node 400.
  • For example, the OSD number of OSD1 in the first storage node 200 is 0x00000000.
  • the physical network card 2011 sets a network address for the first storage node 200, and the OSD component 203 can control the physical network card 2011 to send or receive network packets through the interface provided by the physical network card driver 2022.
  • the IP address set by the physical network card 2011 for the first storage node 200 is 192.168.1.12.
  • the disk drive 2021 is disposed in the operating system 202, and an interface is provided in the operating system 202 through which the OSD component 203 controls the disk controller 2012.
  • The disk controller 2012 may be, for example, a disk drive adapter, which receives, through the interface provided in the operating system 202 by the disk driver 2021, the SCSI (Small Computer System Interface) instructions issued by the OSD component 203, parses them, and reads from and writes to the physical disks where the OSDs reside.
  • The OSD component 203 is provided with a network port in the operating system 202 and communicates with the external network through its own network port; for example, the network port of the OSD component 203 is 10002.
  • FIG. 5 is a schematic structural diagram of another storage node according to an embodiment of the present invention. It should be noted that the structure of the second storage node 200' is substantially the same as that of the first storage node 200; the difference is that the physical network card of the second storage node 200' provides it with a network address different from that of the first storage node 200. Illustratively, the IP address of the second storage node 200' is 192.168.1.13.
  • the OSD component 203' also records the identifier of the OSD on the second storage node 200'.
  • For example, the identifier of OSD1' may be 0000, the identifier of OSD2' may be 0001, the identifier of OSD3' may be 0002, and the identifier of OSD4' may be 0003.
  • the network port of the OSD component 203' on the operating system 202' of the second storage node 200' can be, for example, 10002.
  • FIG. 6 is a schematic structural diagram of a metadata control node according to an embodiment of the present invention.
  • the metadata control node 400 includes a hardware 401, an operating system 402, and an MDC (Meta Data Controller) component 403.
  • the hardware 401 includes a memory 4011 and a physical network card 4012.
  • the operating system 402 includes a memory driver 4021 and a physical network card driver 4022.
  • the physical NIC driver 4022 provides an interface in the operating system 402 through which the MDC component 403 running on the operating system 402 controls the physical NIC 4012 to receive or transmit network messages.
  • physical network card 4012 sets the IP address of metadata control node 400 to 192.168.1.14.
  • the memory driver 4021 provides an interface at the operating system 402 through which the MDC component 403 writes data to or reads data from the memory 4011.
  • The MDC component 403 records the correspondence between keys and OSDs. Specifically, the MDC component 403 receives the status information reported by each storage node in the storage system and organizes it, together with the keys it records, into a lookup table stored in the memory 4011, where the status information includes the identifiers of the OSDs recorded by the OSD component in each storage node and the IP address of the storage node where they reside. In the lookup table, each key corresponds to a primary OSD and a secondary OSD.
  • The secondary OSD is a backup of the primary OSD: data written to the primary OSD also needs to be written to the secondary OSD at the same time to meet data consistency requirements. It is worth noting that the lookup table described here shows only the scenario in which one key corresponds to one primary OSD and one secondary OSD; in other examples, one key may correspond to only a primary OSD, or to one primary OSD and three or more secondary OSDs.
  • In this embodiment, the lookup table includes a global view table and a partition map. The global view table includes the correspondence between each key in the storage system and the corresponding OSD number, where the OSD number identifies the OSD within the storage system; the partition map includes the correspondence between each OSD number and the network address of the corresponding storage node and the identifier of the corresponding OSD.
  • In an example, the keys and OSD numbers are all expressed in hexadecimal, with each key being 8 bytes long and each OSD number 4 bytes long.
  • In an example, when the operating system of a computing node loads a virtual volume, the VBS component hashes the volume number of the virtual volume together with each offset in the virtual volume to obtain the keys, so that there is a one-to-one correspondence between (volume number, offset) pairs and keys. Further, the VBS component may first round each offset down by the size of the OSD and hash the volume number together with the rounding result to obtain the key.
  • Likewise, when the operating system of a computing node loads a virtual file, the VBS component hashes the file system identifier and file identifier of the virtual file together with each offset in the virtual file to obtain the keys, so that there is a one-to-one correspondence between (file system identifier, file identifier, offset) triples and keys. Further, the VBS component may first round each offset down by the size of the OSD and hash the file system identifier and file identifier together with the rounding result to obtain the key.
  • In other words, the virtual volume or virtual file is divided into units of the OSD size, and within a virtual volume or virtual file, each OSD-sized space corresponds to a single OSD.
  • The OSD number is the number of each OSD in the storage system and may be generated by a computing node. In an example, when calculating the first key, the computing node allocates an OSD number, for example 0, to that key; when calculating the next key, it allocates another OSD number, for example 1, and so on, so that keys and OSD numbers are in one-to-one correspondence. The allocated OSD numbers can be synchronized between the computing nodes so that duplicate OSD numbers do not appear in the storage system.
  • In an optional embodiment, the number of secondary OSD identifiers may be two or more; that is, in the storage system, a primary OSD may be provided with two or more backups.
  • The MDC component 403 is provided with a network port in the operating system 402 and communicates with the external network through its own network port; for example, the port of the MDC component 403 is 10003.
  • In the storage system, all OSD components use one unified network port and all VBS components use another unified network port, the two being different; the network port of the VBS components is recorded in each OSD component.
  • FIG. 7 is a schematic structural diagram of a switching device according to an embodiment of the present invention.
  • the switching device 300 includes a processor 301, a memory 302, a bus 303, and physical ports 1-4.
  • the processor 301, the memory 302, and the physical ports 1-4 are respectively connected to the bus 303.
  • The physical ports 1-4 can be connected to computing nodes or storage nodes through cables, receive network packets sent by the computing nodes or storage nodes, and pass them over the bus 303 to the processor 301 for parsing and processing.
  • cables include, but are not limited to, twisted pairs or fibers.
  • the processor 301 can select a physical port according to the destination address of the network packet, and send the network packet from the selected physical port to the computing node or the storage node connected to the physical port.
  • The number of ports shown in FIG. 7 is only an example; the number of physical ports is not limited in this embodiment of the present invention. In other examples, the number of physical ports may be 8, 16, 32, 64, or another number, set as needed.
  • multiple switching devices can also be connected in a cascade manner, thereby implementing port expansion.
  • In this embodiment, the physical port 1 shown in FIG. 7 is connected through a cable to the computing node 100 shown in FIG. 1, the physical port 2 is connected through a cable to the first storage node 200 shown in FIG. 1, the physical port 3 is connected through a cable to the second storage node 200' shown in FIG. 1, and the physical port 4 is connected through a cable to the metadata control node 400 shown in FIG. 1; that is, the ports of the switching device are connected to the physical network cards of the respective nodes through different cables.
  • The port forwarding rule table is pre-recorded in the memory 302 of the switching device 300, for example as follows:
  • physical port 1 - 192.168.1.11 (the IP address of the computing node 100); physical port 2 - 192.168.1.12 (the IP address of the first storage node 200); physical port 3 - 192.168.1.13 (the IP address of the second storage node 200'); physical port 4 - 192.168.1.14 (the IP address of the metadata control node 400).
  • During forwarding, the processor 301 analyzes the destination network address of a received network packet, queries the port forwarding rule table for the physical port corresponding to that destination address, and forwards the network packet through that physical port to the computing node or storage node connected to it, as sketched below.
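  • A minimal sketch of the rule table and port selection, using the IP-to-port mapping given in the embodiment; the dict-based representation is an assumption.

```python
# Port forwarding rule table from the embodiment (destination IP -> port).
PORT_FORWARDING_RULES = {
    "192.168.1.11": 1,  # computing node 100
    "192.168.1.12": 2,  # first storage node 200
    "192.168.1.13": 3,  # second storage node 200'
    "192.168.1.14": 4,  # metadata control node 400
}

def select_physical_port(destination_ip: str) -> int:
    """Pick the egress physical port for a packet's destination address."""
    return PORT_FORWARDING_RULES[destination_ip]
```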
  • In this embodiment, the network address is described using an IP address as an example; in an optional embodiment, the network address may be a MAC (Media Access Control) address, in which case the port forwarding rule table records the correspondence between physical ports and MAC addresses.
  • In an optional embodiment, a switch or a router may be disposed between a physical port and the storage node, computing node, or metadata control node; the switching device 300 may select a physical port according to the destination address of a received network packet and send the packet to that switch or router, which then forwards it to the storage node, computing node, or metadata control node where the destination address is located.
  • In an optional embodiment, the switching device 300 can receive the global view table and the partition map sent by the metadata control node 400 through the physical port 4 and store them in the memory 302.
  • In another optional embodiment, the global view table and the partition map are written to the memory 302 when the switching device 300 is manufactured, that is, the global view table and the partition map are fixed in the memory 302.
  • In a further optional embodiment, the functions of the metadata control node 400 described above may also be implemented by the switching device 300.
  • FIG. 8 is a flowchart of a first embodiment of an information processing method according to the present invention, showing the flow of an IO operation in the storage system. As shown in FIG. 8, the information processing method according to this embodiment of the present invention includes:
  • Step 501: The computing node 100 generates a first network packet and sends it to the switching device 300, where the first network packet carries a resource identifier, a first offset, and an input/output IO command; the destination address of the first network packet may be empty, and its source address is the network address of the computing node.
  • In an example, the resource identifier is the volume number of a virtual disk on the computing node; in other examples, the resource identifier is a file system identifier and a file identifier.
  • In an optional embodiment, the first network packet further carries a packet type identifier, which is used to notify the switching device 300 how to process the first network packet.
  • For example, where the resource identifier is the file identifier and file system identifier of a file on the virtual disk of the computing node, and the computing node 100 has the structure shown in FIG. 2, the application 1051 issues a command: "write the data to be written to the readme.txt file in the current directory, starting at the 256 KB position".
  • The file system of the virtual machine operating system 1052 generates a write IO command according to the above command, acquires the data to be written, and queries its own file allocation table to obtain the file identifier of the readme.txt file.
  • The virtual machine operating system 1052 sends the file identifier, the file system identifier, the first offset, and the IO command to the virtual machine monitor 1022, where the IO command includes the write IO command and the data to be written.
  • The virtual machine monitor 1022 sends the file identifier, the file system identifier, the first offset, and the IO command to the VBS component 104, which generates the first network packet carrying the file identifier, the file system identifier, the first offset, and the IO command; the VBS component 104 then controls the physical network card 1011, through the interface provided by the physical network card driver 1021, to send the first network packet to the physical port 1 of the switching device 300.
  • Step 502: The switching device 300 receives the first network packet, generates a second offset according to the first offset and the size of the object storage device OSD, and obtains, according to the resource identifier and the first offset, the first network address of the first storage node where the first OSD to undergo the IO operation is located and the identifier of the first OSD.
  • In an example, the switching device 300 takes the first offset modulo the size of the OSD and uses the result as the second offset.
  • For ease of description, refer to FIG. 9, which is a mapping diagram between the volumes of a virtual disk and OSDs according to an embodiment of the present invention. In FIG. 9, volume 1 corresponds to OSD1, OSD5, OSD7, and OSD4, and volume 2 corresponds to OSD2, OSD6, OSD3, and OSD8.
  • When the application 1051 of the computing node 100 performs an IO operation at the first offset A in virtual volume 1 of the virtual disk 10521, the IO operation is actually performed at the second offset B of OSD1. Suppose the size of volume 1 is 4M, A is 3.5M, and the size of each OSD is 1M: taking 3.5M modulo 1M gives B = 0.5M, that is, the IO operation is performed on the data starting at the 0.5M position of OSD1.
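  • A minimal sketch of this modulo computation, assuming byte-granularity offsets and the 1M OSD size of the FIG. 9 example (all names here are illustrative, not part of the patent text):

```python
# Translate an offset inside a virtual volume (first offset) into an
# offset inside the target OSD (second offset) by taking it modulo the
# OSD size, as in step 502.

OSD_SIZE = 1 * 1024 * 1024  # each OSD is 1M in the FIG. 9 example

def second_offset(first_offset: int) -> int:
    return first_offset % OSD_SIZE

# FIG. 9 example: A = 3.5M inside volume 1 maps to B = 0.5M inside OSD1.
A = int(3.5 * 1024 * 1024)
assert second_offset(A) == int(0.5 * 1024 * 1024)
```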
  • In some examples, the switching device 300 performs a rounding (integer-division) operation of the first offset by the size of the OSD to obtain a rounding result, obtains the key corresponding to the resource identifier and the rounding result, and looks up the comparison table to determine the first network address of the first storage node corresponding to the key and the identifier of the first OSD.
  • As shown in FIG. 9, the OSDs corresponding to the first offset A (3.5M) and a third offset C (3.9M) are both OSD1. If A were used directly as an input parameter and hashed with the volume number of virtual volume 1, and C were likewise hashed with the volume number, two different results would be obtained, i.e., A and C would map to two different OSDs. Therefore, A is rounded against the OSD size to give 3.5/1 = 3, and C is rounded to give 3.9/1 = 3; hashing the rounding result 3 with the volume number of volume 1 yields the same result in both cases, which guarantees that both A and C correspond to OSD1.
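  • A sketch of this key derivation, using SHA-256 purely as a stand-in for the hash run over the volume number and the rounding result (the embodiment does not mandate a particular hash; the 8-byte key length follows the global view table example):

```python
import hashlib

OSD_SIZE = 1 * 1024 * 1024  # 1M per OSD, as in the FIG. 9 example

def derive_key(resource_id: str, first_offset: int) -> bytes:
    # Round the offset down to an OSD-sized boundary first, so that every
    # offset inside the same OSD-sized span of the volume yields one key.
    rounding_result = first_offset // OSD_SIZE
    digest = hashlib.sha256(f"{resource_id}:{rounding_result}".encode())
    return digest.digest()[:8]  # 8-byte keys, as in the global view table

# A (3.5M) and C (3.9M) round to the same result 3, hence the same key:
A, C = int(3.5 * OSD_SIZE), int(3.9 * OSD_SIZE)
assert derive_key("volume-1", A) == derive_key("volume-1", C)
```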
  • In another example, the first offset may be a block number. For example, the virtual volume includes a plurality of blocks of the same data length, each block numbered in turn; the block number (the first offset) can then be used to locate a specific block in the virtual volume.
  • Correspondingly, the OSD can be partitioned into blocks of the same data length, and a block number (the second offset) can likewise be used to locate a specific block in the OSD.
  • In some examples, the switching device 300 performs the steps of generating the second offset according to the first offset and the size of the object storage device OSD, and of obtaining, according to the resource identifier and the first offset, the first network address of the first storage node where the first OSD on which the IO operation is to be performed is located and the identifier of the first OSD, only after determining from the packet type identifier that the first network packet is a key-value KV packet. When the switching device 300 determines that the first network packet does not carry the packet type identifier, it determines that the first network packet is a non-KV packet and directly selects a physical port according to the destination address of the first network packet to send the first network packet. By determining from the packet type identifier whether the first network packet is a KV packet, the switching device 300 can process KV packets and non-KV packets at the same time, which improves system compatibility.
  • Further, the comparison table includes a global view table and a partition map. The global view table includes the correspondence between keys and OSD numbers, where an OSD number is used to identify an OSD in the storage system; the partition map includes the correspondence between OSD numbers and the network addresses of storage nodes and the identifiers of OSDs.
  • For example, the switching device 300 may look up the global view table to determine the OSD number corresponding to the obtained key, and then look up the partition map to determine the first network address of the first storage node where the first OSD corresponding to that OSD number is located and the identifier of the first OSD, as the sketch below illustrates.
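  • A sketch of the two-level lookup; the key, OSD number, address, and identifier values reproduce the illustrative table entries of the description, and the dictionary representation is an assumption of this sketch:

```python
# Global view table: key -> OSD number (system-wide OSD identifier).
global_view = {
    bytes.fromhex("0000000000000000"): 0x00000000,
}

# Partition map: OSD number -> (storage node network address, OSD identifier).
partition_map = {
    0x00000000: ("192.168.1.12", "0000"),
}

def locate_osd(key: bytes):
    osd_number = global_view[key]                     # first lookup
    node_address, osd_id = partition_map[osd_number]  # second lookup
    return node_address, osd_id

assert locate_osd(bytes(8)) == ("192.168.1.12", "0000")
```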
  • Further, the switching device 300 is also configured to receive the global view table and the partition map sent by the metadata control node 400.
  • Step 503: The switching device 300 generates a second network packet and sends the second network packet to the first storage node 200, where the second network packet carries the second offset, the IO command, and the identifier of the first OSD. The destination address of the second network packet is the first network address, and the source address may be the network address of the computing node 100.
  • For example, the switching device 300 selects port 2 according to the first network address and sends the second network packet through port 2.
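  • A sketch of this port selection, reproducing the port forwarding rule table of the embodiment (addresses as given in the description):

```python
# Port forwarding rule table of the switching device 300.
port_forwarding_rules = {
    "192.168.1.11": 1,  # computing node 100
    "192.168.1.12": 2,  # first storage node 200
    "192.168.1.13": 3,  # second storage node 200'
    "192.168.1.14": 4,  # metadata control node 400
}

def select_port(destination_address: str) -> int:
    return port_forwarding_rules[destination_address]

# The second network packet is addressed to the first storage node,
# so it leaves through physical port 2.
assert select_port("192.168.1.12") == 2
```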
  • Step 504: The first storage node 200 receives the second network packet sent by the switching device 300 and performs, according to the IO command, an IO operation at the storage address pointed to by the second offset in the first OSD.
  • In some examples, when the IO command includes a write IO command and data to be written, the first storage node 200 writes, according to the write IO command, the data to be written to the storage address pointed to by the second offset in the first OSD.
  • In other examples, when the IO command includes a read IO command and a read length, the first storage node 200 reads, according to the read IO command, data of the read length from the storage address pointed to by the second offset in the first OSD.
  • For example, the physical network card 2011 of the first storage node 200 receives the second network packet from physical port 2, and the OSD component 203 obtains the second network packet from the physical network card 2011 through the interface provided by the physical network card driver 2022.
  • Step 505: The first storage node 200 sends a first response packet to the switching device 300, where the first response packet carries the IO operation result and a packet type identifier, and the destination address of the first response packet is the network address of the computing node.
  • For example, the OSD component 203 generates the first response packet carrying the IO operation result and the packet type identifier, and controls the physical network card 2011, through the interface provided by the physical network card driver 2022, to send the first response packet to physical port 2.
  • Step 506: The switching device 300 determines, according to the packet type identifier, that the first response packet is a key-value KV packet, generates a second response packet, and sends it to the computing node 100, where the second response packet carries the IO operation result, its destination address is the network address of the computing node 100, and its source address is the network address of the first storage node 200.
  • For example, the switching device 300 selects physical port 1 according to the network address 192.168.1.11 of the computing node 100 and sends the second response packet to the computing node 100 through physical port 1.
  • Step 507: The computing node 100 receives the second response packet and obtains the IO operation result carried in the second response packet.
  • For example, the physical network card 1011 of the computing node 100 receives the second response packet from physical port 1; the VBS component 104 receives the second response packet from the physical network card 1011 through the interface provided by the physical network card driver 1021, parses the second response packet, obtains the IO operation result, and sends the IO operation result to the application 1051.
  • In this embodiment of the present invention, since the switching device performs the OSD lookup locally, the computing node neither needs to compute the key nor needs to establish a network connection with the metadata control node for every IO operation, which reduces the computation load on the computing node and reduces the network load.
  • In a specific implementation scenario, to improve the reliability of data storage, the computing node may write data to multiple storage nodes in the form of multiple copies.
  • Illustratively, this embodiment of the present invention is described by taking as an example the computing node writing data to two storage nodes in 2-copy form.
  • FIG. 10 is a flowchart of a second embodiment of an information processing method according to the present invention. The information processing method includes:
  • Step 601: The computing node 100 generates a first network packet, where the first network packet carries a resource identifier, a first offset, a write IO command, data to be written, and a multi-copy operation code. The computing node 100 sends the first network packet to the switching device 300.
  • The specific way in which the computing node 100 generates and sends the first network packet is similar to step 501 and is not described again in this embodiment of the present invention.
  • Step 602: The switching device 300 receives the first network packet, generates a second offset according to the first offset and the size of the object storage device OSD, and obtains, according to the resource identifier and the first offset, the first network address of the first storage node where the first OSD on which the IO operation is to be performed is located and the identifier of the first OSD, as well as the second network address of the second storage node where the second OSD to be written is located and the identifier of the second OSD.
  • Similar to step 502, the switching device 300 looks up the comparison table and, according to the multi-copy operation code, determines the two storage nodes into which the data to be written needs to be written and the identifier of the target OSD on each of those storage nodes.
  • Step 603: The switching device 300 generates and sends a second network packet and a third network packet.
  • Specifically, the switching device 300 generates a corresponding network packet for each storage node. The second network packet carries the second offset, the write IO command, the data to be written, and the identifier of the first OSD, and its destination address is the first network address; the first OSD is the OSD of the first storage node 200 into which the data is to be written.
  • The third network packet carries the second offset, the write IO command, the data to be written, and the identifier of the second OSD, and its destination address is the second network address; the second OSD is the OSD of the second storage node 200' into which the data is to be written.
  • In this step, the switching device 300 may set the second network packet to carry message identifier 1 and the third network packet to carry message identifier 2, so that when a subsequently received KV packet carries one of these message identifiers, the switching device can confirm that the KV packet is a response to the second network packet or to the third network packet (see the sketch after this step).
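  • A sketch of this fan-out, assuming a simple record type for KV packets (all field names are illustrative, not prescribed by the patent):

```python
from dataclasses import dataclass

@dataclass
class KVPacket:
    dest_address: str   # network address of the target storage node
    osd_id: str         # identifier of the target OSD on that node
    second_offset: int
    payload: bytes      # write IO command plus data to be written (abstracted)
    message_id: int     # 1 for the second packet, 2 for the third packet

def fan_out(replicas, second_offset, payload):
    # One packet per replica; the message identifier lets the switch later
    # match each response packet to the network packet that caused it.
    return [
        KVPacket(address, osd_id, second_offset, payload, message_id=i + 1)
        for i, (address, osd_id) in enumerate(replicas)
    ]

packets = fan_out([("192.168.1.12", "0000"), ("192.168.1.13", "0000")],
                  second_offset=512 * 1024, payload=b"data to be written")
assert [p.message_id for p in packets] == [1, 2]
```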
  • Step 604: The first storage node 200 receives the second network packet, writes the data to be written to the storage address pointed to by the second offset in the first OSD according to the write IO command, and generates a first response packet.
  • The first response packet carries a first write result and a packet type identifier, and the destination address of the first response packet is the network address of the computing node.
  • In this step, the first storage node 200 sets the first response packet to carry message identifier 1.
  • Step 605: The second storage node 200' receives the third network packet, writes the data to be written to the storage address pointed to by the second offset in the second OSD according to the write IO command, and generates a second response packet.
  • The second response packet carries a second write result and the packet type identifier, and the destination address of the second response packet is the network address of the computing node.
  • In this step, the second storage node 200' sets the second response packet to carry message identifier 2.
  • Step 606: The switching device 300 receives the first response packet and the second response packet, determines according to the packet type identifier that the first response packet and the second response packet are key-value KV packets, generates a third response packet, and sends the third response packet, where the third response packet carries the first write result and the second write result and its destination address is the network address of the computing node.
  • In this step, after determining that the first response packet is a KV packet, the switching device 300 confirms from message identifier 1 that it is a response to the second network packet; after determining that the second response packet is a KV packet, the switching device 300 confirms from message identifier 2 that it is a response to the third network packet. The switching device 300 can then aggregate the write results carried by the first response packet and the second response packet into the third response packet and send it to the computing node.
  • The correspondence between network packets and response packets can thus be confirmed through the message identifier; when the switching device 300 has to process KV packets sent by different computing nodes at the same time, the message identifier ensures that the write results are matched to the same computing node.
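  • A sketch of this aggregation step, under the same illustrative message-identifier scheme (a real deployment would additionally track which computing node each identifier belongs to):

```python
def aggregate_responses(responses, expected_ids=frozenset({1, 2})):
    # responses: iterable of (message_id, write_result) pairs extracted
    # from the KV response packets returned by the storage nodes.
    results = {}
    for message_id, write_result in responses:
        if message_id in expected_ids:
            results[message_id] = write_result
    if results.keys() == expected_ids:
        # All replicas answered: combine both write results into the single
        # third response packet returned to the computing node.
        return {"write_results": [results[i] for i in sorted(expected_ids)]}
    return None  # still waiting for at least one replica

assert aggregate_responses([(1, "ok")]) is None
assert aggregate_responses([(1, "ok"), (2, "ok")]) == {"write_results": ["ok", "ok"]}
```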
  • Step 607: The computing node 100 receives the third response packet and obtains the first write result and the second write result carried in the third response packet.
  • This embodiment of the present invention is a multi-copy write flow. When the switching device 300 determines that the first network packet carries a multi-copy operation code, it sends the data to be written to at least two storage nodes for storage; after receiving the response packets returned by the at least two storage nodes, it determines the type of each response packet, combines the multiple response packets returned by the at least two storage nodes, and encapsulates them into a single response packet returned to the computing node 100.
  • This embodiment of the present invention further extends the functions of the switching device 300: the switching device 300 can convert a write command from the computing node into write commands for multiple storage nodes according to the multi-copy operation code, and can aggregate the responses from the multiple storage nodes into a single response returned to the computing node, thereby significantly reducing the burden on the computing node. The advantages of this embodiment are especially evident in scenarios with heavy IO traffic.
  • FIG. 11 is a flowchart of a third embodiment of an information processing method according to the present invention. Note that steps 701 to 703 of the information processing method shown in FIG. 11 are identical to steps 601 to 603 in FIG. 10; the difference lies in steps 704 to 709, so the description of steps 701 to 703 is omitted here.
  • Step 704: The first storage node 200 receives the second network packet, writes the data to be written to the storage address pointed to by the second offset in the first OSD according to the write IO command, and generates a fourth response packet and sends it to the switching device 300. The fourth response packet carries the first write result, and its destination address is the network address of the computing node 100.
  • For example, the first storage node 200 sends the fourth response packet to physical port 2 of the switching device 300.
  • Step 705: The switching device 300 forwards the fourth response packet to the computing node 100.
  • For example, the switching device 300 determines that the fourth response packet does not carry the packet type identifier, determines that the fourth response packet is a non-KV packet, selects physical port 1 according to the destination address of the fourth response packet, and forwards the fourth response packet to the computing node 100 through physical port 1.
  • Step 706: The computing node 100 receives the fourth response packet and obtains the first write result carried in the fourth response packet.
  • Step 707: The second storage node 200' receives the third network packet, writes the data to be written to the storage address pointed to by the second offset in the second OSD according to the write IO command, generates a fifth response packet, and sends it to the switching device 300. The fifth response packet carries the second write result, and its destination address is the network address of the computing node 100.
  • For example, the second storage node 200' sends the fifth response packet to physical port 3 of the switching device 300.
  • Step 708: The switching device 300 forwards the fifth response packet to the computing node 100.
  • For example, the switching device 300 determines that the fifth response packet does not carry the packet type identifier, determines that the fifth response packet is a non-KV packet, selects physical port 1 according to the destination address of the fifth response packet, and forwards the fifth response packet to the computing node 100 through physical port 1.
  • Step 709: The computing node 100 receives the fifth response packet and obtains the second write result carried in the fifth response packet.
  • This embodiment of the present invention is another multi-copy write flow. When the switching device 300 determines that the first network packet carries a multi-copy operation code, it sends the data to be written to at least two storage nodes for storage; after receiving the response packets returned by the at least two storage nodes, it determines the type of each response packet and forwards the response packets directly to the computing node 100.
  • In this embodiment, the storage node does not need to set the packet type identifier in the response packet, so the storage node does not need to be modified, which benefits system compatibility; this embodiment is particularly suitable for scenarios with light IO traffic.
  • In this embodiment of the present invention, the packets involved may include TCP packets and UDP packets, and the packet type identifier may be set in the Options field and the Padding field of the IP header of the TCP or UDP packet. Because the Options and Padding fields are generally idle in TCP and UDP packets, carrying the packet type identifier in these IP-header fields means that, when determining the packet type, the switching device only needs to analyze the IP header and does not need to unpack the IP data field, which speeds up packet processing.
  • The network address in this embodiment of the present invention may be a MAC address or an IP address.
  • FIG. 12 is a schematic diagram of the format of a TCP packet according to an embodiment of the present invention. As shown in FIG. 12, the IP header provides an Options field and a Padding field, and the packet type identifier may be carried in the Options field and the Padding field.
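  • A sketch of the header-only check, assuming a hypothetical two-byte marker value carried in the IP options (a real deployment would encode the identifier as a properly registered IP option type rather than this bare marker):

```python
KV_MARKER = b"\x9e\x01"  # hypothetical packet type identifier bytes

def is_kv_packet(ip_packet: bytes) -> bool:
    # Inspect only the IP header: the IHL field gives the header length,
    # and anything past the fixed 20 bytes is Options/Padding. The IP
    # payload is never touched, which is what speeds up classification.
    ihl_words = ip_packet[0] & 0x0F
    header_length = ihl_words * 4
    options = ip_packet[20:header_length]
    return KV_MARKER in options
```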
  • In this embodiment of the present invention, the destination address of the second network packet sent by the switching device is the first storage node, so the destination port of the second network packet may be the network port of the OSD component of the first storage node, for example 10002.
  • From the destination port of the second network packet, the OSD component can learn that the packet is addressed to itself and can therefore parse it. The same holds for the third network packet, which is not described again here.
  • Likewise, the destination address of a response packet sent by a storage node or by the switch is a computing node, so the destination port of the response packet may be the network port of the VBS component of the computing node, for example 10001.
  • From the destination port of the response packet, the VBS component can learn that the packet is addressed to itself and can therefore parse it.
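  • A sketch of this port-based dispatch (port values as given in the description):

```python
VBS_PORT = 10001  # network port of every VBS component in the storage system
OSD_PORT = 10002  # network port of every OSD component

def component_for(destination_port: int) -> str:
    # A node decides which component should parse an incoming packet
    # purely from the packet's destination port.
    return {VBS_PORT: "VBS", OSD_PORT: "OSD"}.get(destination_port, "other")

assert component_for(10002) == "OSD"  # second/third network packets
assert component_for(10001) == "VBS"  # response packets
```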
  • FIG. 13 is a schematic structural diagram of another apparatus of a switching device according to the present invention.
  • The switching device includes: a receiving module 801, configured to receive a first network packet sent by a computing node, where the first network packet carries a resource identifier, a first offset, and an input/output IO command; a processing module 802, configured to generate a second offset according to the first offset and the size of the object storage device OSD, obtain, according to the resource identifier and the first offset, the first network address of the first storage node where the first OSD on which the IO operation is to be performed is located and the identifier of the first OSD, and generate a second network packet, where the second network packet carries the second offset, the IO command, and the identifier of the first OSD, and the destination address of the second network packet is the first network address; and a sending module 803, configured to send the second network packet to the first storage node.
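  • A sketch of this module decomposition, reusing the helper sketches above (second_offset, derive_key, locate_osd); the method bodies only indicate the division of labour between modules 801, 802, and 803 and are not a definitive implementation:

```python
class SwitchingDeviceSketch:
    # Receiving module 801: accept the first network packet.
    def receive(self, resource_id: str, first_offset: int, io_command: bytes):
        return self.process(resource_id, first_offset, io_command)

    # Processing module 802: derive the second offset, locate the first OSD,
    # and build the second network packet.
    def process(self, resource_id: str, first_offset: int, io_command: bytes):
        offset_in_osd = second_offset(first_offset)
        node_address, osd_id = locate_osd(derive_key(resource_id, first_offset))
        return self.send(node_address, osd_id, offset_in_osd, io_command)

    # Sending module 803: emit the second network packet to the first
    # storage node (transport details omitted in this sketch).
    def send(self, node_address, osd_id, offset_in_osd, io_command):
        return (node_address, osd_id, offset_in_osd, io_command)
```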
  • Optionally, the IO command includes a write IO command and data to be written, and the first network packet further carries a multi-copy operation code. The processing module 802 is specifically configured to obtain, according to the resource identifier and the first offset, the first network address, the identifier of the first OSD, the second network address of the second storage node where the second OSD to be written is located, and the identifier of the second OSD.
  • The processing module 802 is further configured to generate a third network packet according to the multi-copy operation code, where the third network packet carries the second offset, the write IO command, the data to be written, and the identifier of the second OSD, and the destination address of the third network packet is the second network address.
  • The sending module 803 is further configured to send the third network packet to the second storage node.
  • Optionally, the receiving module 801 is further configured to receive a first response packet sent by the first storage node and a second response packet sent by the second storage node, where the first response packet carries a first write result and a packet type identifier, the destination address of the first response packet is the network address of the computing node, the second response packet carries a second write result and the packet type identifier, and the destination address of the second response packet is the network address of the computing node.
  • The processing module 802 is further configured to determine, according to the packet type identifier, that the first response packet and the second response packet are key-value KV packets, generate a third response packet, and send the third response packet to the computing node, where the third response packet carries the first write result and the second write result, and the destination address of the third response packet is the network address of the computing node.
  • Optionally, the processing module 802 is specifically configured to perform a modulo operation of the first offset by the size of the OSD and use the result as the second offset.
  • Optionally, the resource identifier is the volume number of a virtual disk on the computing node, and the processing module 802 is specifically configured to obtain, according to the volume number and the first offset, the first network address of the first storage node where the first OSD on which the IO operation is to be performed is located and the identifier of the first OSD.
  • Optionally, the resource identifier is a file system identifier and a file identifier, and the processing module 802 is specifically configured to obtain, according to the file system identifier, the file identifier, and the first offset, the first network address of the first storage node where the first OSD on which the IO operation is to be performed is located and the identifier of the first OSD.
  • Optionally, the first network packet further carries a packet type identifier, and the processing module 802 is further configured to determine, according to the packet type identifier, that the first network packet is a key-value KV packet.
  • Optionally, the processing module 802 is further configured to perform a rounding operation of the first offset by the size of the OSD to obtain a rounding result, obtain the key corresponding to the resource identifier and the rounding result, and look up the comparison table to determine the first network address of the first storage node corresponding to the key and the identifier of the first OSD, where the comparison table includes the correspondences among keys, network addresses of storage nodes, and identifiers of OSDs.
  • Optionally, the comparison table includes a global view table and a partition map. The global view table includes the correspondence between keys and OSD numbers, where an OSD number is used to identify an OSD in the storage system, and the partition map includes the correspondence between OSD numbers and the network addresses of storage nodes and the identifiers of OSDs. The processing module 802 is specifically configured to look up the global view table to determine the OSD number corresponding to the obtained key, and to look up the partition map to determine the first network address of the first storage node where the first OSD corresponding to that OSD number is located and the identifier of the first OSD.
  • Optionally, the receiving module 801 is further configured to receive the global view table and the partition map sent by the metadata control node.
  • The present application provides a switch that includes a processor 301, a memory 302, and a plurality of physical ports 1-4, and that performs the functions of the switching device 300 described in the above aspects.
  • The processor 301, the memory 302, and the physical ports 1-4 are each connected to the bus 303.
  • The physical ports 1-4 can be connected to computing nodes or storage nodes through cables and receive network packets from them; the network packets are passed through the transceiver and the bus to the processor for parsing and processing.
  • The first physical port 1 is configured to receive a first network packet sent by the computing node, where the first network packet carries a resource identifier, a first offset, and an input/output IO command. The processor 301 runs program instructions to perform the steps of: generating a second offset according to the first offset and the size of the object storage device OSD; obtaining, according to the resource identifier and the first offset, the first network address of the first storage node where the first OSD on which the IO operation is to be performed is located and the identifier of the first OSD; and generating a second network packet, where the second network packet carries the second offset, the IO command, and the identifier of the first OSD, and the destination address of the second network packet is the first network address. The second physical port 2 is configured to send the second network packet to the first storage node.
  • The processor 301 is further configured to support the switching device in performing the corresponding functions in the above methods, for example generating or processing the data and/or information involved in the above methods.
  • In this embodiment of the present invention, the key lookup, the computation of the destination address, and the final encapsulation of the network IO are completed not by the computing node but by the switch, which reduces the load on the computing node. Moreover, by incorporating processing engines such as an FPGA (Field-Programmable Gate Array), a dedicated CPU, an ASIC (Application-Specific Integrated Circuit), or an NP (Network Processor), the switch processes data of this fixed format with efficiency and performance far exceeding those of a general-purpose processor.
  • Any of the apparatus embodiments described above is merely illustrative: the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • In the accompanying drawings of the apparatus embodiments provided by the present invention, the connection relationships between modules indicate that they have communication connections between them, which may specifically be implemented as one or more communication buses or signal lines.
  • Through the description of the above implementations, those skilled in the art can clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware; of course, it may also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like.
  • In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can be various, such as analog circuits, digital circuits, or dedicated circuits.
  • However, for the present invention, a software program implementation is the better implementation in most cases.
  • Based on such an understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc, and includes a number of instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.


Abstract

A data processing method, a storage system, and a switching device (300). The storage system includes a computing node (100) and a switching device (300) that are connected to each other. The computing node (100) sends a first network packet to the switching device (300), the first network packet carrying a resource identifier, a first offset, and an input/output IO command (501). The switching device (300) generates a second offset according to the first offset and the size of an object storage device OSD, obtains, according to the resource identifier and the first offset, a first network address of the first storage node (200) where the first OSD on which the IO operation is to be performed is located and an identifier of the first OSD (502), and generates a second network packet and sends the second network packet to the first storage node (200) (503). This reduces the computation load of the computing node (100) and improves the operating efficiency of the computing node (100).


Claims (47)

  1. 一种存储系统,其特征在于,包括计算节点和交换设备,
    所述计算节点,用于向所述交换设备发送第一网络报文,其中,所述第一网络报文携带有资源标识、第一偏移量和输入输出IO命令;
    所述交换设备,用于根据所述第一偏移量和对象存储设备OSD的大小产生第二偏移量,并根据所述资源标识和所述第一偏移量获取待进行IO操作的第一OSD所在的第一存储节点的第一网络地址以及所述第一OSD的标识;
    所述交换设备,还用于产生第二网络报文并将所述第二网络报文发送至所述第一存储节点,其中,所述第二网络报文携带有所述第二偏移量、所述IO命令以及所述第一OSD的标识,且所述第二网络报文的目的地址为所述第一网络地址。
  2. 根据权利要求1所述的存储系统,其特征在于,所述存储系统还包括多个存储节点,
    所述多个存储节点中的所述第一存储节点,用于接收所述交换设备发送的所述第二网络报文,根据所述IO命令在所述第一OSD中所述第二偏移量指向的存储地址进行IO操作。
  3. 根据权利要求2所述的存储系统,其特征在于,所述IO命令包括写IO命令和待写入数据,
    所述第一存储节点,具体用于根据所述写IO命令在所述第一OSD中所述第二偏移量指向的存储地址写入所述待写入数据。
  4. 根据权利要求3所述的存储系统,其特征在于,所述第一网络报文还携带有多副本操作码,
    所述交换设备,具体用于根据所述资源标识和所述第一偏移量获取所述第一网络地址、所述第一OSD的标识、待写入的第二OSD所在的第二存储节点的第二网络地址和所述第二OSD的标识;
    所述交换设备,还用于根据所述多副本操作码产生第三网络报文并发送所述第三网络报文至所述第二存储节点,其中所述第三网络报文携带有所述第二偏移量、所述写IO命令、所述待写入数据以及所述第二OSD的标识,且所述第三网络报文的目的地址为所述第二网络地址;
    所述第二存储节点,用于接收所述第三网络报文,根据所述写IO命令在所述第二OSD中所述第二偏移量指向的存储地址写入所述待写入数据。
  5. 根据权利要求4所述的存储系统,其特征在于,
    所述第一存储节点,还用于向所述交换设备发送第一响应报文,其中,所述第一响应报文携带有第一写入结果和报文类型标识符,所述第一响应报文的目的地址为所述计算节点的网络地址;
    所述第二存储节点,还用于向所述交换设备发送第二响应报文,其中,所述第二响应报文携带有第二写入结果和所述报文类型标识符,所述第二响应报文的目的地址为所述计算节点的网络地址;
    所述交换设备,还用于接收所述第一响应报文和所述第二响应报文,根据所述报文类型标识符确定所述第一响应报文和所述第二响应报文为键值KV报文,产生第三响应 报文并发送所述第三响应报文至所述计算节点,其中所述第三响应报文携带有所述第一写入结果和所述第二写入结果,且所述第三响应报文的目的地址为所述计算节点的网络地址;
    所述计算节点,还用于获取所述第三响应报文携带的所述第一写入结果和所述第二写入结果。
  6. 根据权利要求2所述的存储系统,其特征在于,所述IO命令包括读IO命令和读取长度,所述第一存储节点,具体用于根据所述读IO命令在所述第一OSD中所述第二偏移量指向的存储地址读取所述读取长度的数据。
  7. 根据权利要求1-6任一项所述的存储系统,其特征在于,所述交换设备,具体用于将所述第一偏移量对所述OSD的大小作取模运算,所得结果作为所述第二偏移量。
  8. 根据权利要求1-7任一项所述的存储系统,其特征在于,所述资源标识为所述计算节点上的虚拟磁盘的卷号,
    所述交换设备,具体用于根据所述卷号以及所述第一偏移量获取待进行IO操作的第一OSD所在的第一存储节点的第一网络地址以及所述第一OSD的标识。
  9. 根据权利要求1-7任一项所述的存储系统,其特征在于,所述资源标识为文件系统标识和文件标识,
    所述交换设备,具体用于根据所述文件系统标识、所述文件标识以及所述第一偏移量获取待进行IO操作的第一OSD所在的第一存储节点的第一网络地址以及所述第一OSD的标识。
  10. 根据权利要求1-9任一项所述的存储系统,其特征在于,所述第一网络报文还携带有报文类型标识符,
    所述交换设备,还用于根据所述报文类型标识符确定所述第一网络报文为键值KV报文。
  11. 根据权利要求1-10任一项所述的存储系统,其特征在于,
    所述交换设备,具体用于将所述第一偏移量对所述OSD的大小作取整运算,得到取整结果,获取所述资源标识和所述取整结果对应的key,查找对照表确定所述key对应的所述第一存储节点的第一网络地址以及所述第一OSD的标识,其中,所述对照表包括key、存储节点的网络地址、以及OSD的标识的对应关系。
  12. 根据权利要求11所述的存储系统,其特征在于,所述对照表包括全局视图表和分区地图表,所述全局视图表包括所key和OSD编号的对应关系,其中,所述OSD编号用于在所述存储系统中标识OSD,所述分区地图表包括OSD编号与存储节点的网络地址和OSD的标识的对应关系,
    所述交换设备,具体用于查找所述全局视图表确定获取的所述key对应的OSD编号,并查找所述分区地图表确定所述OSD编号对应的第一OSD所在的第一存储节点的第一网络地址以及所述第一OSD的标识。
  13. 根据权利要求12所述的存储系统,其特征在于,所述存储系统还包括与所述交换设备连接的元数据控制节点,
    所述交换设备,还用于接收所述元数据控制节点发送的所述全局视图表和所述分区地图表。
  14. A data processing method, comprising:
    sending, by a compute node, a first network packet to a switching device, the first network packet carrying a resource identifier, a first offset, and an input/output (IO) command;
    generating, by the switching device, a second offset based on the first offset and the size of an object storage device (OSD), and obtaining, based on the resource identifier and the first offset, a first network address of a first storage node on which a first OSD to undergo the IO operation is located and an identifier of the first OSD; and
    generating, by the switching device, a second network packet and sending the second network packet to the first storage node, the second network packet carrying the second offset, the IO command, and the identifier of the first OSD, and having the first network address as its destination address.
  15. The method according to claim 14, further comprising:
    receiving, by the first storage node, the second network packet sent by the switching device, and performing, according to the IO command, an IO operation at the storage address in the first OSD to which the second offset points.
  16. The method according to claim 15, wherein the IO command comprises a write IO command and to-be-written data, and the performing, according to the IO command, of an IO operation at the storage address in the first OSD to which the second offset points specifically comprises:
    writing, by the first storage node according to the write IO command, the to-be-written data at the storage address in the first OSD to which the second offset points.
  17. The method according to claim 16, wherein the first network packet further carries a multi-copy operation code, and the obtaining, based on the resource identifier and the first offset, of the first network address of the first storage node on which the first OSD to undergo the IO operation is located and the identifier of the first OSD specifically comprises:
    obtaining, by the switching device based on the resource identifier and the first offset, the first network address, the identifier of the first OSD, a second network address of a second storage node on which a second OSD to be written is located, and an identifier of the second OSD; and the method further comprises:
    generating, by the switching device according to the multi-copy operation code, a third network packet and sending the third network packet to the second storage node, the third network packet carrying the second offset, the write IO command, the to-be-written data, and the identifier of the second OSD, and having the second network address as its destination address; and
    receiving, by the second storage node, the third network packet, and writing, according to the write IO command, the to-be-written data at the storage address in the second OSD to which the second offset points.
  18. The method according to claim 17, further comprising:
    sending, by the first storage node, a first response packet to the switching device, the first response packet carrying a first write result and a packet type identifier, and having the network address of the compute node as its destination address;
    sending, by the second storage node, a second response packet to the switching device, the second response packet carrying a second write result and the packet type identifier, and having the network address of the compute node as its destination address;
    receiving, by the switching device, the first response packet and the second response packet, determining, according to the packet type identifier, that the first response packet and the second response packet are key-value (KV) packets, generating a third response packet, and sending the third response packet to the compute node, the third response packet carrying the first write result and the second write result, and having the network address of the compute node as its destination address; and
    obtaining, by the compute node, the first write result and the second write result carried in the third response packet.
  19. The method according to claim 15, wherein the IO command comprises a read IO command and a read length, and the performing, according to the IO command, of an IO operation at the storage address in the first OSD to which the second offset points specifically comprises:
    reading, by the first storage node according to the read IO command, data of the read length from the storage address in the first OSD to which the second offset points.
  20. The method according to any one of claims 14 to 19, wherein the generating, by the switching device, of the second offset based on the first offset and the size of the OSD specifically comprises:
    performing, by the switching device, a modulo operation on the first offset by the size of the OSD, the result serving as the second offset.
  21. The method according to any one of claims 14 to 20, wherein the resource identifier is a volume number of a virtual disk on the compute node, and the obtaining, based on the resource identifier and the first offset, of the first network address of the first storage node on which the first OSD to undergo the IO operation is located and the identifier of the first OSD specifically comprises:
    obtaining, by the switching device based on the volume number and the first offset, the first network address of the first storage node on which the first OSD to undergo the IO operation is located and the identifier of the first OSD.
  22. The method according to any one of claims 14 to 20, wherein the resource identifier comprises a file system identifier and a file identifier, and the obtaining, based on the resource identifier and the first offset, of the first network address of the first storage node on which the first OSD to undergo the IO operation is located and the identifier of the first OSD specifically comprises:
    obtaining, by the switching device based on the file system identifier, the file identifier, and the first offset, the first network address of the first storage node on which the first OSD to undergo the IO operation is located and the identifier of the first OSD.
  23. The method according to any one of claims 14 to 22, wherein the first network packet further carries a packet type identifier, and before the switching device generates the second offset based on the first offset and the size of the OSD and obtains, based on the resource identifier and the first offset, the first network address of the first storage node on which the first OSD to undergo the IO operation is located and the identifier of the first OSD, the method further comprises:
    determining, by the switching device according to the packet type identifier, that the first network packet is a key-value (KV) packet.
  24. The method according to any one of claims 14 to 23, wherein the generating, by the switching device, of the second offset based on the first offset and the size of the OSD, and the obtaining, based on the resource identifier and the first offset, of the first network address of the first storage node on which the first OSD to undergo the IO operation is located and the identifier of the first OSD specifically comprise:
    performing, by the switching device, a rounding operation on the first offset by the size of the OSD to obtain a rounded result, obtaining a key corresponding to the resource identifier and the rounded result, and looking up a mapping table to determine the first network address of the first storage node and the identifier of the first OSD that correspond to the key, the mapping table containing correspondences among keys, network addresses of storage nodes, and identifiers of OSDs.
  25. The method according to claim 24, wherein the mapping table comprises a global view table and a partition map table, the global view table containing correspondences between keys and OSD numbers, an OSD number being used to identify an OSD within the storage system, the partition map table containing correspondences between OSD numbers and the network addresses of storage nodes and identifiers of OSDs, and the looking up of the mapping table to determine the first network address of the first storage node and the identifier of the first OSD that correspond to the key specifically comprises:
    looking up, by the switching device, the global view table to determine the OSD number corresponding to the obtained key, and looking up the partition map table to determine the first network address of the first storage node on which the first OSD corresponding to the OSD number is located and the identifier of the first OSD.
  26. The method according to claim 25, wherein before the switching device looks up the global view table to determine the OSD number corresponding to the obtained key and looks up the partition map table to determine the first network address of the first storage node on which the first OSD corresponding to the OSD number is located and the identifier of the first OSD, the method further comprises:
    receiving, by the switching device, the global view table and the partition map table sent by a metadata control node.
  27. A data processing method, comprising:
    receiving, by a switching device, a first network packet sent by a compute node, the first network packet carrying a resource identifier, a first offset, and an input/output (IO) command;
    generating, by the switching device, a second offset based on the first offset and the size of an object storage device (OSD), and obtaining, based on the resource identifier and the first offset, a first network address of a first storage node on which a first OSD to undergo the IO operation is located and an identifier of the first OSD; and
    generating, by the switching device, a second network packet and sending the second network packet to the first storage node, the second network packet carrying the second offset, the IO command, and the identifier of the first OSD, and having the first network address as its destination address.
  28. The method according to claim 27, wherein the IO command comprises a write IO command and to-be-written data, the first network packet further carries a multi-copy operation code, and the obtaining, based on the resource identifier and the first offset, of the first network address of the first storage node on which the first OSD to undergo the IO operation is located and the identifier of the first OSD specifically comprises:
    obtaining, by the switching device based on the resource identifier and the first offset, the first network address, the identifier of the first OSD, a second network address of a second storage node on which a second OSD to be written is located, and an identifier of the second OSD; and the method further comprises:
    generating, by the switching device according to the multi-copy operation code, a third network packet and sending the third network packet to the second storage node, the third network packet carrying the second offset, the write IO command, the to-be-written data, and the identifier of the second OSD, and having the second network address as its destination address.
  29. The method according to claim 28, further comprising:
    receiving, by the switching device, a first response packet sent by the first storage node and a second response packet sent by the second storage node, the first response packet carrying a first write result and a packet type identifier and having the network address of the compute node as its destination address, the second response packet carrying a second write result and the packet type identifier and having the network address of the compute node as its destination address; and
    determining, by the switching device according to the packet type identifier, that the first response packet and the second response packet are key-value (KV) packets, generating a third response packet, and sending the third response packet to the compute node, the third response packet carrying the first write result and the second write result, and having the network address of the compute node as its destination address.
  30. The method according to any one of claims 27 to 29, wherein the generating, by the switching device, of the second offset based on the first offset and the size of the OSD specifically comprises:
    performing, by the switching device, a modulo operation on the first offset by the size of the OSD, the result serving as the second offset.
  31. The method according to any one of claims 27 to 30, wherein the resource identifier is a volume number of a virtual disk on the compute node, and the obtaining, based on the resource identifier and the first offset, of the first network address of the first storage node on which the first OSD to undergo the IO operation is located and the identifier of the first OSD specifically comprises:
    obtaining, by the switching device based on the volume number and the first offset, the first network address of the first storage node on which the first OSD to undergo the IO operation is located and the identifier of the first OSD.
  32. The method according to any one of claims 27 to 30, wherein the resource identifier comprises a file system identifier and a file identifier, and the obtaining, based on the resource identifier and the first offset, of the first network address of the first storage node on which the first OSD to undergo the IO operation is located and the identifier of the first OSD specifically comprises:
    obtaining, by the switching device based on the file system identifier, the file identifier, and the first offset, the first network address of the first storage node on which the first OSD to undergo the IO operation is located and the identifier of the first OSD.
  33. The method according to any one of claims 27 to 32, wherein the first network packet further carries a packet type identifier, and before the switching device generates the second offset based on the first offset and the size of the OSD and obtains, based on the resource identifier and the first offset, the first network address of the first storage node on which the first OSD to undergo the IO operation is located and the identifier of the first OSD, the method further comprises:
    determining, by the switching device according to the packet type identifier, that the first network packet is a key-value (KV) packet.
  34. The method according to any one of claims 27 to 33, wherein the generating, by the switching device, of the second offset based on the first offset and the size of the OSD, and the obtaining, based on the resource identifier and the first offset, of the first network address of the first storage node on which the first OSD to undergo the IO operation is located and the identifier of the first OSD specifically comprise:
    performing, by the switching device, a rounding operation on the first offset by the size of the OSD to obtain a rounded result, obtaining a key corresponding to the resource identifier and the rounded result, and looking up a mapping table to determine the first network address of the first storage node and the identifier of the first OSD that correspond to the key, the mapping table containing correspondences among keys, network addresses of storage nodes, and identifiers of OSDs.
  35. The method according to claim 34, wherein the mapping table comprises a global view table and a partition map table, the global view table containing correspondences between keys and OSD numbers, an OSD number being used to identify an OSD within the storage system, the partition map table containing correspondences between OSD numbers and the network addresses of storage nodes and identifiers of OSDs, and the looking up of the mapping table to determine the first network address of the first storage node and the identifier of the first OSD that correspond to the key specifically comprises:
    looking up, by the switching device, the global view table to determine the OSD number corresponding to the obtained key, and looking up the partition map table to determine the first network address of the first storage node on which the first OSD corresponding to the OSD number is located and the identifier of the first OSD.
  36. The method according to claim 35, wherein before the switching device looks up the global view table to determine the OSD number corresponding to the obtained key and looks up the partition map table to determine the first network address of the first storage node on which the first OSD corresponding to the OSD number is located and the identifier of the first OSD, the method further comprises:
    receiving, by the switching device, the global view table and the partition map table sent by a metadata control node.
  37. A switching device, comprising:
    a receiving module, configured to receive a first network packet sent by a compute node, the first network packet carrying a resource identifier, a first offset, and an input/output (IO) command;
    a processing module, configured to generate a second offset based on the first offset and the size of an object storage device (OSD), obtain, based on the resource identifier and the first offset, a first network address of a first storage node on which a first OSD to undergo the IO operation is located and an identifier of the first OSD, and generate a second network packet, the second network packet carrying the second offset, the IO command, and the identifier of the first OSD, and having the first network address as its destination address; and
    a sending module, configured to send the second network packet to the first storage node.
  38. The switching device according to claim 37, wherein the IO command comprises a write IO command and to-be-written data, the first network packet further carries a multi-copy operation code,
    the processing module is specifically configured to obtain, based on the resource identifier and the first offset, the first network address, the identifier of the first OSD, a second network address of a second storage node on which a second OSD to be written is located, and an identifier of the second OSD;
    the processing module is further configured to generate, according to the multi-copy operation code, a third network packet, the third network packet carrying the second offset, the write IO command, the to-be-written data, and the identifier of the second OSD, and having the second network address as its destination address; and
    the sending module is further configured to send the third network packet to the second storage node.
  39. The switching device according to claim 38, wherein:
    the receiving module is further configured to receive a first response packet sent by the first storage node and a second response packet sent by the second storage node, the first response packet carrying a first write result and a packet type identifier and having the network address of the compute node as its destination address, the second response packet carrying a second write result and the packet type identifier and having the network address of the compute node as its destination address; and
    the processing module is further configured to determine, according to the packet type identifier, that the first response packet and the second response packet are key-value (KV) packets, generate a third response packet, and send the third response packet to the compute node, the third response packet carrying the first write result and the second write result, and having the network address of the compute node as its destination address.
  40. The switching device according to any one of claims 37 to 39, wherein
    the processing module is specifically configured to perform a modulo operation on the first offset by the size of the OSD, the result serving as the second offset.
  41. The switching device according to any one of claims 37 to 40, wherein the resource identifier is a volume number of a virtual disk on the compute node, and
    the processing module is specifically configured to obtain, based on the volume number and the first offset, the first network address of the first storage node on which the first OSD to undergo the IO operation is located and the identifier of the first OSD.
  42. The switching device according to any one of claims 37 to 40, wherein the resource identifier comprises a file system identifier and a file identifier, and
    the processing module is specifically configured to obtain, based on the file system identifier, the file identifier, and the first offset, the first network address of the first storage node on which the first OSD to undergo the IO operation is located and the identifier of the first OSD.
  43. The switching device according to any one of claims 37 to 42, wherein the first network packet further carries a packet type identifier, and
    the processing module is further configured to determine, according to the packet type identifier, that the first network packet is a key-value (KV) packet.
  44. The switching device according to any one of claims 37 to 43, wherein
    the processing module is further configured to perform a rounding operation on the first offset by the size of the OSD to obtain a rounded result, obtain a key corresponding to the resource identifier and the rounded result, and look up a mapping table to determine the first network address of the first storage node and the identifier of the first OSD that correspond to the key, the mapping table containing correspondences among keys, network addresses of storage nodes, and identifiers of OSDs.
  45. The switching device according to claim 44, wherein the mapping table comprises a global view table and a partition map table, the global view table containing correspondences between keys and OSD numbers, an OSD number being used to identify an OSD within the storage system, and the partition map table containing correspondences between OSD numbers and the network addresses of storage nodes and identifiers of OSDs, and
    the processing module is specifically configured to look up the global view table to determine the OSD number corresponding to the obtained key, and to look up the partition map table to determine the first network address of the first storage node on which the first OSD corresponding to the OSD number is located and the identifier of the first OSD.
  46. The switching device according to claim 45, wherein
    the receiving module is further configured to receive the global view table and the partition map table sent by a metadata control node.
  47. A switch, comprising a processor, a memory, a bus, and a plurality of physical ports, the processor, the memory, and the plurality of physical ports each being connected to the bus, and the memory storing program instructions, wherein:
    a first physical port is configured to receive a first network packet sent by a compute node, the first network packet carrying a resource identifier, a first offset, and an input/output (IO) command;
    the processor runs the program instructions to perform the following steps: generating a second offset based on the first offset and the size of an object storage device (OSD), obtaining, based on the resource identifier and the first offset, a first network address of a first storage node on which a first OSD to undergo the IO operation is located and an identifier of the first OSD, and generating a second network packet, the second network packet carrying the second offset, the IO command, and the identifier of the first OSD, and having the first network address as its destination address; and
    a second physical port is configured to send the second network packet to the first storage node.
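Tying the independent claims together, the following self-contained sketch shows the forwarding step a switch as in claim 47 might perform on a first network packet. The dict-based packet layout, the table contents, and the addresses are illustrative assumptions, not anything mandated by the claims.

    OSD_SIZE = 1 << 20                             # assumed OSD size (1 MiB)
    GLOBAL_VIEW = {("vol-7", 5): 42}               # key -> OSD number
    PARTITION_MAP = {42: ("10.0.0.3", "osd-12")}   # OSD number -> (node address, OSD identifier)

    def forward(first_packet: dict) -> dict:
        """Turn a first network packet into the second network packet of claims 1, 14, 27, and 47."""
        first_offset = first_packet["offset"]
        second_offset = first_offset % OSD_SIZE    # offset within the target OSD
        key = (first_packet["resource_id"], first_offset // OSD_SIZE)
        node_addr, osd_id = PARTITION_MAP[GLOBAL_VIEW[key]]
        # The second packet keeps the IO command, carries the second offset and the
        # first OSD's identifier, and is destined for the first storage node.
        return {"dest_addr": node_addr, "osd_id": osd_id,
                "offset": second_offset, "io": first_packet["io"]}

    pkt = {"resource_id": "vol-7", "offset": 5 * (1 << 20) + 4 * 1024, "io": "WRITE <data>"}
    print(forward(pkt))
    # {'dest_addr': '10.0.0.3', 'osd_id': 'osd-12', 'offset': 4096, 'io': 'WRITE <data>'}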
PCT/CN2017/080655 2017-04-14 2017-04-14 Data processing method, storage system, and switching device WO2018188089A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
JP2019526353A JP6724252B2 (ja) 2017-04-14 2017-04-14 Data processing method, storage system, and switching device
EP17905455.6A EP3474146B1 (en) 2017-04-14 2017-04-14 Data processing method, storage system and exchange device
CN202210544122.1A CN114880256A (zh) 2017-04-14 2017-04-14 Data processing method, storage system, and switching device
CN201780089594.XA CN110546620B (zh) 2017-04-14 2017-04-14 Data processing method, storage system, and switching device
PCT/CN2017/080655 WO2018188089A1 (zh) 2017-04-14 2017-04-14 Data processing method, storage system, and switching device
US16/360,906 US10728335B2 (en) 2017-04-14 2019-03-21 Data processing method, storage system, and switching device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/080655 WO2018188089A1 (zh) 2017-04-14 2017-04-14 Data processing method, storage system, and switching device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/360,906 Continuation US10728335B2 (en) 2017-04-14 2019-03-21 Data processing method, storage system, and switching device

Publications (1)

Publication Number Publication Date
WO2018188089A1 true WO2018188089A1 (zh) 2018-10-18

Family

ID=63793026

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/080655 WO2018188089A1 (zh) Data processing method, storage system, and switching device

Country Status (5)

Country Link
US (1) US10728335B2 (zh)
EP (1) EP3474146B1 (zh)
JP (1) JP6724252B2 (zh)
CN (2) CN110546620B (zh)
WO (1) WO2018188089A1 (zh)


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021120141A1 (en) * 2019-12-20 2021-06-24 Intel Corporation Asset caching in cloud rendering computing architectures
CN111142801B * 2019-12-26 2021-05-04 星辰天合(北京)数据科技有限公司 Method and apparatus for detecting network sub-health in a distributed storage system
CN113472688B * 2020-03-30 2023-10-20 瑞昱半导体股份有限公司 Circuit applied in a network device and method for operating a network device
CN111857577B * 2020-06-29 2022-04-26 烽火通信科技股份有限公司 Method and apparatus for managing physical hard disks in a distributed storage system
CN114356232B * 2021-12-30 2024-04-09 西北工业大学 Data read/write method and apparatus
CN115426322B * 2022-08-23 2023-09-19 绿盟科技集团股份有限公司 Virtual storage method and apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130054889A1 (en) * 2011-08-26 2013-02-28 Vmware, Inc. Computer system accessing object storage system
CN104536702A * 2014-12-31 2015-04-22 华为技术有限公司 Storage array system and data write request processing method
WO2016041128A1 (zh) * 2014-09-15 2016-03-24 华为技术有限公司 Data write request processing method and storage array
CN105516263A * 2015-11-28 2016-04-20 华为技术有限公司 Data distribution method and apparatus in a storage system, compute node, and storage system

Family Cites Families (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8558795B2 (en) * 2004-03-12 2013-10-15 Riip, Inc. Switchless KVM network with wireless technology
US7382776B1 (en) * 2003-04-15 2008-06-03 Brocade Communication Systems, Inc. Performing block storage virtualization at a switch
JP2004334481A (ja) 2003-05-07 2004-11-25 Fujitsu Ltd Virtualization information management apparatus
US7228320B2 (en) * 2004-11-17 2007-06-05 Hitachi, Ltd. System and method for creating an object-level snapshot in a storage system
US7747836B2 (en) * 2005-03-08 2010-06-29 Netapp, Inc. Integrated storage virtualization and switch system
US7609649B1 (en) * 2005-04-26 2009-10-27 Cisco Technology, Inc. Methods and apparatus for improving network based virtualization performance
US7366808B2 (en) * 2005-11-23 2008-04-29 Hitachi, Ltd. System, method and apparatus for multiple-protocol-accessible OSD storage subsystem
TWI307026B (en) 2005-12-30 2009-03-01 Ind Tech Res Inst System and method for storage management
US7979652B1 (en) * 2007-12-20 2011-07-12 Amazon Technologies, Inc. System and method for M-synchronous replication
US8578126B1 (en) * 2009-10-29 2013-11-05 Netapp, Inc. Mapping of logical start addresses to physical start addresses in a system having misalignment between logical and physical data blocks
EP2603847B1 (en) 2010-08-11 2018-06-27 Hitachi, Ltd. Storage apparatus and control method thereof
US8473643B2 (en) * 2011-05-05 2013-06-25 Hitachi, Ltd. Method and apparatus of tier storage management awareness networking
CN105786414A (zh) * 2016-03-24 2016-07-20 天津书生云科技有限公司 Storage system, storage system access method, and storage system access apparatus
KR20140038799A (ko) * 2012-09-21 2014-03-31 엘지전자 주식회사 Image display apparatus, server, and method of operating the same
US9756128B2 (en) * 2013-04-17 2017-09-05 Apeiron Data Systems Switched direct attached shared storage architecture
CN105900518B * 2013-08-27 2019-08-20 华为技术有限公司 System and method for mobile network function virtualization
CN103986602B * 2014-05-16 2017-12-12 华为技术有限公司 Method for starting an operating system, related device, and system
WO2016054818A1 (zh) * 2014-10-11 2016-04-14 华为技术有限公司 Data processing method and apparatus
BR112016030547B1 * 2014-11-06 2022-11-16 Huawei Cloud Computing Technologies Co., Ltd Distributed storage and replication system and method
US9836229B2 (en) * 2014-11-18 2017-12-05 Netapp, Inc. N-way merge technique for updating volume metadata in a storage I/O stack
CN105704098B * 2014-11-26 2019-03-01 杭州华为数字技术有限公司 Data transmission method for a virtualized network, node controller, and system
EP3217294B1 (en) * 2014-11-28 2018-11-28 Huawei Technologies Co. Ltd. File access method and apparatus and storage device
AU2014415350B2 (en) * 2014-12-27 2019-02-21 Huawei Technologies Co., Ltd. Data processing method, apparatus and system
CN105138281B * 2015-08-05 2018-12-07 华为技术有限公司 Physical disk sharing method and apparatus
WO2017046864A1 (ja) 2015-09-15 2017-03-23 株式会社日立製作所 Storage system, computer system, and storage system control method
CN105391771B * 2015-10-16 2018-11-02 北京云启志新科技股份有限公司 Multi-tenant-oriented cloud network system
CN105657081B * 2016-04-07 2019-01-18 华为技术有限公司 Method, apparatus, and system for providing DHCP service
CN106406758B * 2016-09-05 2019-06-18 华为技术有限公司 Data processing method based on a distributed storage system, and storage device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3474146A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111276110A (zh) * 2018-12-04 2020-06-12 杭州海康威视数字技术股份有限公司 Character display method and apparatus, and electronic device
CN111276110B (zh) * 2018-12-04 2021-10-19 杭州海康威视数字技术股份有限公司 Character display method and apparatus, and electronic device

Also Published As

Publication number Publication date
CN110546620A (zh) 2019-12-06
CN114880256A (zh) 2022-08-09
US20190222648A1 (en) 2019-07-18
EP3474146A4 (en) 2019-07-24
JP6724252B2 (ja) 2020-07-15
US10728335B2 (en) 2020-07-28
EP3474146B1 (en) 2022-02-23
CN110546620B (zh) 2022-05-17
JP2019531563A (ja) 2019-10-31
EP3474146A1 (en) 2019-04-24


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17905455

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019526353

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2017905455

Country of ref document: EP

Effective date: 20190118

NENP Non-entry into the national phase

Ref country code: DE