WO2021077745A1 - Data reading and writing apparatus of a distributed storage system

Publication number
WO2021077745A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
osd
request
placement group
identification information
Prior art date
Application number
PCT/CN2020/092831
Other languages
English (en)
Chinese (zh)
Inventor
王曙光
孟祥瑞
Original Assignee
浪潮电子信息产业股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浪潮电子信息产业股份有限公司
Publication of WO2021077745A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0611 Improving I/O performance in relation to response time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • This application relates to the field of computer technology, and in particular to a data reading and writing method, device, storage server, and readable storage medium of a distributed storage system.
  • The industry places increasingly high requirements on the performance and reliability of distributed storage.
  • When a client reads or writes an object, it must first calculate the placement group from the object name and then use that placement group to calculate the group's members. Calculating placement group membership requires multiple loops and recursive hash calculations, which wastes system CPU resources and increases front-end IO latency.
  • The purpose of this application is to provide a data reading and writing method, device, storage server, and readable storage medium for a distributed storage system, to solve the problem that traditional data reading and writing solutions occupy considerable system CPU resources during reading and writing and thereby increase front-end IO latency.
  • The specific solution is as follows:
  • This application provides a method for reading and writing data in a distributed storage system, applied to a Ceph client, including:
  • Determining the target OSD corresponding to the target placement group in the pre-created cache file and sending the IO request to the target OSD includes:
  • Calculating the target OSD corresponding to the target placement group according to the CRUSH algorithm includes:
  • Determining the target placement group according to the identification information of the target object includes:
  • Performing a modulo operation on the hash value of the identification information of the target object to obtain the identification information of the target placement group.
  • The method further includes:
  • Obtaining the file to be written, dividing it into multiple target objects according to a preset object size, and determining the identification information of each target object; generating an IO request carrying the identification information of the target object.
  • Determining the target OSD corresponding to the target placement group in the pre-created cache file includes:
  • The method further includes:
  • This application provides a data reading and writing device of a distributed storage system, which is applied to a Ceph client and includes:
  • A request acquisition module, used to acquire an IO request, where the IO request includes the identification information of the target object;
  • A placement group determination module, used to determine the target placement group according to the identification information of the target object;
  • A request sending module, used to determine the target OSD corresponding to the target placement group in the pre-created cache file, and to send the IO request to the target OSD to realize data reading and writing.
  • This application provides a storage server of a distributed storage system, including:
  • A memory, used to store a computer program;
  • A processor, used to execute the computer program to implement the steps of the method for reading and writing data in a distributed storage system as described above.
  • This application provides a readable storage medium on which a computer program is stored. When the computer program is executed by a processor, it implements the steps of the data reading and writing method of a distributed storage system as described above.
  • The data reading and writing method, device, storage server, and readable storage medium of a distributed storage system provided by this application include: obtaining an IO request, the IO request including the identification information of the target object; determining the target placement group according to the identification information of the target object; determining the target OSD corresponding to the target placement group in the pre-created cache file, and sending the IO request to the target OSD to achieve data reading and writing.
  • The solution caches the correspondence between placement groups and OSDs in the cache file. Therefore, there is no need to determine the target OSD corresponding to the target placement group through tedious calculations during data reading and writing, which reduces the waste of system CPU resources in the IO path and can greatly reduce IO latency.
  • FIG. 1 is an implementation flowchart of Embodiment 1 of a data reading and writing method for a distributed storage system provided by this application;
  • FIG. 2 is an implementation flowchart of Embodiment 2 of a data reading and writing method for a distributed storage system provided by this application;
  • FIG. 3 is a functional block diagram of an embodiment of a data reading and writing device of a distributed storage system provided by this application;
  • FIG. 4 is a schematic structural diagram of an embodiment of a storage server of a distributed storage system provided by this application.
  • The core of this application is to provide a method, device, storage server, and readable storage medium for reading and writing data in a distributed storage system, which reduces the waste of system CPU resources during data reading and writing and can greatly reduce IO latency.
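  • The first embodiment includes:
  • S101: Obtain an IO request, where the IO request includes the identification information of a target object;
  • S102: Determine a target placement group according to the identification information of the target object;
  • S103: Determine a target OSD corresponding to the target placement group in a pre-created cache file, and send the IO request to the target OSD, so as to implement data reading and writing.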
  • This embodiment is applied to a Ceph client.
  • The client mentioned here refers to a service process that accesses the cluster.
  • In the NAS scenario, it refers to nfsserver and samba; in the block scenario, it refers to the tgt process; in the object scenario, it refers to rgw.
  • OSD stands for Object Storage Device.
  • Each disk, SSD, RAID group, or partition can become an OSD.
  • A file is divided into several objects.
  • The objects are first mapped to a placement group, and the placement group is then mapped to an OSD group.
  • The target OSD in this embodiment refers to the primary OSD in the OSD group, that is, the OSD that can perform data read and write operations.
  • The IO request in this embodiment includes the identification information of the target object.
  • If the request is a read request, the main information included is the identification information of the target object, the offset of the object read request, and the length to be read;
  • If the request is a write request, the main information included is the identification information of the target object, the offset of the object write request, the length of the data to be written, and the content of the data to be written.
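The request fields listed above can be pictured as a small record type. The following is a minimal sketch only: the class name `IORequest`, its field names, and the validation rules are illustrative assumptions, not structures taken from Ceph or from this application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IORequest:
    """Hypothetical container for the fields the description lists."""
    object_id: str                 # identification information of the target object
    op: str                        # "read" or "write"
    offset: int                    # offset of the read or write within the object
    length: int                    # length to read, or length of the data to write
    data: Optional[bytes] = None   # payload, present only for writes

    def __post_init__(self):
        if self.op not in ("read", "write"):
            raise ValueError("op must be 'read' or 'write'")
        if self.offset < 0 or self.length < 0:
            raise ValueError("offset and length must be non-negative")
        if self.op == "write":
            # A write carries its content, and the declared length must match it.
            if self.data is None or len(self.data) != self.length:
                raise ValueError("write requests need data of the declared length")
        elif self.data is not None:
            raise ValueError("read requests carry no data")
```

A read request is then built as `IORequest("obj", "read", 0, 4096)`, and a write request additionally passes `data`.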
  • When the Ceph client receives the IO request, it first calculates the target placement group to which the target object is mapped, according to the identification information of the target object carried in the IO request. As a specific implementation, it calculates the hash value of the identification information of the target object and takes the modulus of that hash value to obtain the placement group corresponding to the target object.
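The hash-then-modulo step can be sketched as follows. This is an illustration under stated assumptions: `zlib.crc32` stands in for whatever hash function the client actually uses, the modulus is assumed to be the number of placement groups in the pool, and the function name and the `pool_id` and `pg_count` parameters are hypothetical.

```python
import zlib


def placement_group_for(object_id: str, pg_count: int, pool_id: int = 1) -> str:
    """Map an object identifier to a placement group identifier.

    Assumption: crc32 is only a stand-in hash chosen because it is
    deterministic and in the standard library.
    """
    if pg_count <= 0:
        raise ValueError("pg_count must be positive")
    hash_value = zlib.crc32(object_id.encode("utf-8"))
    pg_index = hash_value % pg_count   # the modulo step from the description
    return f"{pool_id}.{pg_index}"     # identifier of the form "<pool>.<index>"
```

Because the mapping depends only on the object identifier and the modulus, every client computes the same placement group for the same object without consulting any shared state.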
  • The CRUSH algorithm (Controlled Replication Under Scalable Hashing) is a distributed selection algorithm for data storage.
  • A cache file is preset, and the cache file records the mapping relationship between placement groups and OSDs. Therefore, after the target placement group is determined, there is no need to calculate the corresponding target OSD through the cumbersome CRUSH algorithm; the target OSD can be determined directly by querying the cache file, and the IO request is then forwarded to the target OSD, avoiding the tedious calculation process.
  • The CRUSH algorithm is used to calculate the target OSD corresponding to the target placement group.
  • The CRUSH algorithm is used to calculate the location where the object should be written or read.
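The lookup path described above can be sketched as a cache in front of an expensive mapping function. This is a sketch under assumptions: the class `PgOsdCache` and the function `stub_crush` are hypothetical, `stub_crush` merely returns a fixed list and does not implement CRUSH, a cache miss is assumed to fall back to the CRUSH calculation, and the primary OSD is assumed to be the first entry of the OSD group.

```python
from typing import Callable, Dict, List


class PgOsdCache:
    """In-memory stand-in for the cache file mapping placement groups to OSDs."""

    def __init__(self, compute_osds: Callable[[str], List[int]]):
        self._compute_osds = compute_osds      # expensive path, e.g. CRUSH
        self._mapping: Dict[str, List[int]] = {}
        self.misses = 0                        # how often the expensive path ran

    def primary_osd(self, pg_id: str) -> int:
        osds = self._mapping.get(pg_id)
        if osds is None:
            # Cache miss: compute once and remember the result.
            self.misses += 1
            osds = self._compute_osds(pg_id)
            self._mapping[pg_id] = osds
        # Assumption: the first OSD in the group is the primary.
        return osds[0]


def stub_crush(pg_id: str) -> List[int]:
    """Placeholder for the real CRUSH calculation; returns a fixed OSD group."""
    return [1, 2, 3]
```

With this shape, only the first request for a placement group pays for the mapping calculation; later requests for the same group are a dictionary lookup.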
  • This embodiment provides a method for reading and writing data in a distributed storage system.
  • The solution includes: obtaining an IO request, the IO request including the identification information of the target object; determining the target placement group according to the identification information of the target object; determining, in the pre-created cache file, the target OSD corresponding to the target placement group, and sending the IO request to the target OSD to realize data reading and writing. It can be seen that the solution caches the correspondence between placement groups and OSDs in the cache file. Therefore, there is no need to determine the target OSD corresponding to the target placement group through tedious calculations during data reading and writing, which reduces the waste of system CPU resources in the IO path and can greatly reduce IO latency.
  • The second embodiment of a data reading and writing method for a distributed storage system provided by the present application is described in detail below.
  • The second embodiment is based on the first embodiment and extends it to a certain degree.
  • The mapping relationship between placement groups and OSDs is recorded in the form of a list, and this embodiment takes a data writing scenario as an example for description.
  • The second embodiment is applied to the Ceph client and specifically includes:
  • S201: Obtain a file to be written, divide the file to be written into multiple target objects according to a preset object size, and determine the identification information of each target object; generate an IO request carrying the identification information of the target object;
  • The aforementioned preset object size is adjusted according to actual needs and is usually 2 MB or 4 MB.
  • Each object obtained by segmentation has unique identification information, which is generally composed of the File ID of the file to be written and the number of the segment.
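The segmentation step can be sketched as below. The function name `split_into_objects`, the 4 MB default, and the `<file_id>.<segment number>` identifier format with an eight-digit hexadecimal segment number are illustrative assumptions; the description only states that the identifier combines the File ID with the segment number.

```python
from typing import Iterator, Tuple

DEFAULT_OBJECT_SIZE = 4 * 1024 * 1024  # assumed preset object size of 4 MB


def split_into_objects(
    file_id: str, data: bytes, object_size: int = DEFAULT_OBJECT_SIZE
) -> Iterator[Tuple[str, bytes]]:
    """Yield (object_id, chunk) pairs covering the file contents in order."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    for segment, start in enumerate(range(0, len(data), object_size)):
        # Identifier built from the file ID and the segment number.
        object_id = f"{file_id}.{segment:08x}"
        yield object_id, data[start:start + object_size]
```

Each yielded pair would then become one write request carrying the object's identification information; the last chunk may be shorter than the preset size.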
  • S203: Perform a modulo operation on the hash value of the identification information of the target object to obtain the identification information of the target placement group;
  • The identification information of a PG takes forms such as 1.0, 1.1, and 1.2, and the cache list can be implemented in code using a map structure.
  • The specific content is as follows:
  • The [1,2,3] following the identification information of each placement group is the identification information of the OSDs corresponding to that placement group.
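To make the map structure concrete, the sketch below stores such a cache list as text and reads it back. The one-line-per-placement-group layout (`<pg id> <OSD list>`) and the function names are hypothetical and are not the listing from the application; only the idea of a PG identifier followed by a bracketed OSD list comes from the description.

```python
import json
from typing import Dict, List


def dump_cache(mapping: Dict[str, List[int]]) -> str:
    """Serialize the PG-to-OSD map, one placement group per line."""
    lines = []
    for pg_id, osds in sorted(mapping.items()):
        osd_list = json.dumps(osds, separators=(",", ":"))  # compact list text
        lines.append(f"{pg_id} {osd_list}")
    return "\n".join(lines)


def load_cache(text: str) -> Dict[str, List[int]]:
    """Parse the text produced by dump_cache back into a map."""
    mapping: Dict[str, List[int]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        pg_id, osd_list = line.split(" ", 1)
        mapping[pg_id] = json.loads(osd_list)
    return mapping
```

In memory the cache is simply a dictionary keyed by PG identifier, so a lookup for the target placement group is a single hash-table access.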
  • The OSD Map records how many OSDs the cluster contains, which nodes these OSDs belong to, and the respective weights of these nodes and OSDs. These relationships are stored in a tree structure. OSDs and nodes going online or offline change the OSD Map, and a change in the OSD Map changes the OSDs corresponding to a PG, so the PG cache list should then be cleared and rebuilt.
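The clear-and-rebuild rule can be sketched as follows, assuming each OSD Map carries a version number that changes whenever the map changes (Ceph calls this number an epoch). The class `EpochAwareCache` and its method names are hypothetical.

```python
from typing import Dict, List, Optional


class EpochAwareCache:
    """PG-to-OSD cache that is discarded whenever the OSD Map version changes."""

    def __init__(self) -> None:
        self.epoch: Optional[int] = None
        self.entries: Dict[str, List[int]] = {}

    def on_osd_map(self, epoch: int) -> bool:
        """Record the current OSD Map version; return True if the cache was cleared."""
        if epoch == self.epoch:
            return False
        # The map changed, so every cached PG-to-OSD entry may be stale.
        self.entries.clear()
        self.epoch = epoch
        return True

    def put(self, pg_id: str, osds: List[int]) -> None:
        self.entries[pg_id] = osds
```

Clearing the whole cache is the simplest policy consistent with the description; entries are then repopulated on demand as requests arrive.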
  • The data reading and writing method of a distributed storage system provided by this embodiment caches the correspondence between placement groups and OSDs in a cache file. Therefore, there is no need to determine the target OSD corresponding to the target placement group through tedious calculations during data reading and writing, which greatly reduces CPU consumption and latency on the IO path, saves system CPU resources, and greatly reduces front-end IO latency.
  • The following describes a data reading and writing device of a distributed storage system provided by an embodiment of the present application.
  • The data reading and writing device described below and the data reading and writing method described above may be referred to in correspondence with each other.
  • The device is applied to a Ceph client and includes:
  • The request obtaining module 301, used to obtain an IO request, where the IO request includes the identification information of the target object;
  • The placement group determining module 302, used to determine the target placement group according to the identification information of the target object;
  • The request sending module 303, used to determine the target OSD corresponding to the target placement group in the pre-created cache file, and to send the IO request to the target OSD to implement data reading and writing.
  • The data reading and writing device of the distributed storage system of this embodiment is used to implement the aforementioned data reading and writing method of the distributed storage system. Therefore, for the specific implementation of the device, see the foregoing embodiments of the data reading and writing method. For example, the request obtaining module 301, the placement group determining module 302, and the request sending module 303 are respectively used to implement steps S101, S102, and S103 of the above method. The specific implementation can therefore refer to the description of the corresponding embodiments and is not repeated here.
  • In addition, because the data reading and writing device of this embodiment is used to implement the aforementioned data reading and writing method, its function corresponds to that of the method and is not repeated here.
  • This application also provides a storage server of a distributed storage system, as shown in FIG. 4, including:
  • The memory 100, used to store a computer program;
  • The processor 200, configured to execute the computer program to implement the steps of the method for reading and writing data in a distributed storage system as described above.
  • This application also provides a readable storage medium on which a computer program is stored. When the computer program is executed by a processor, it implements the steps of the data reading and writing method of a distributed storage system as described above.
  • The steps of the method or algorithm described in combination with the embodiments disclosed in this document can be implemented directly by hardware, by a software module executed by a processor, or by a combination of the two.
  • The software module can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a data reading and writing method and apparatus, a storage server, and a readable storage medium of a distributed storage system. The method comprises: obtaining an IO request, the IO request including identification information of a target object; determining a target placement group according to the identification information of the target object; and determining, in a pre-created cache file, a target OSD corresponding to the target placement group, and sending the IO request to the target OSD to implement data reading and writing. As can be seen, the method caches the correspondence between placement groups and OSDs in the cache file; therefore, during data reading and writing, no tedious calculation is needed to determine the target OSD corresponding to the target placement group, which reduces the waste of system CPU resources in the IO path and can greatly reduce IO latency.
PCT/CN2020/092831 2019-10-25 2020-05-28 Data reading and writing apparatus of a distributed storage system WO2021077745A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911025181.2A CN110780819A (zh) 2019-10-25 A data reading and writing method for a distributed storage system
CN201911025181.2 2019-10-25

Publications (1)

Publication Number Publication Date
WO2021077745A1 (fr) 2021-04-29

Family

ID=69386814

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092831 WO2021077745A1 (fr) 2019-10-25 2020-05-28 Data reading and writing apparatus of a distributed storage system

Country Status (2)

Country Link
CN (1) CN110780819A (fr)
WO (1) WO2021077745A1 (fr)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
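CN110780819A (zh) * 2019-10-25 2020-02-11 浪潮电子信息产业股份有限公司 A data reading and writing method for a distributed storage system
CN112363674B (zh) * 2020-11-12 2022-04-22 新华三技术有限公司成都分公司 A data writing method and apparatus
CN112486413B (zh) * 2020-11-27 2022-08-05 杭州朗和科技有限公司 A data reading method, apparatus, medium, and computing device
CN112783443A (zh) * 2021-01-18 2021-05-11 北京聚云科技有限公司 Data reading method, apparatus, and electronic device
CN112925482B (zh) * 2021-03-15 2022-08-23 北京首都在线科技股份有限公司 Data processing method, apparatus, system, electronic device, and computer storage medium
CN114564460B (zh) * 2022-02-25 2024-01-19 苏州浪潮智能科技有限公司 Parameter tuning method, apparatus, device, and medium based on a distributed storage system
CN114327299B (zh) * 2022-03-01 2022-06-03 苏州浪潮智能科技有限公司 A method, apparatus, device, and medium for sequential-read prefetching
CN116737619B (zh) * 2023-08-15 2023-11-03 苏州浪潮智能科技有限公司 Data request system, method, apparatus, computer device, and storage medium
CN117348820A (zh) * 2023-12-04 2024-01-05 浪潮云信息技术股份公司 A data processing method, system, device, and storage medium for a distributed system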

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140214773A1 (en) * 2013-01-30 2014-07-31 Hewlett-Packard Development Company, L.P. Reconstructing a state of a file system using a preserved snapshot
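CN108287660A (zh) * 2017-01-09 2018-07-17 中国移动通信集团河北有限公司 Data storage method and device
CN110222013A (zh) * 2019-06-11 2019-09-10 深信服科技股份有限公司 A method, system, device, and storage medium for determining cluster storage capacity
CN110780819A (zh) * 2019-10-25 2020-02-11 浪潮电子信息产业股份有限公司 A data reading and writing method for a distributed storage system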

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103905540A (zh) * 2014-03-25 2014-07-02 浪潮电子信息产业股份有限公司 Object storage data distribution mechanism based on two-level hashing
US10003649B2 (en) * 2015-05-07 2018-06-19 Dell Products Lp Systems and methods to improve read/write performance in object storage applications

Also Published As

Publication number Publication date
CN110780819A (zh) 2020-02-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20879427

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20879427

Country of ref document: EP

Kind code of ref document: A1