WO2021077745A1 - Data reading and writing apparatus of a distributed storage system - Google Patents
Data reading and writing apparatus of a distributed storage system
- Publication number
- WO2021077745A1 (application PCT/CN2020/092831)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- osd
- request
- placement group
- identification information
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0611—Improving I/O performance in relation to response time
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/172—Caching, prefetching or hoarding of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
Definitions
- This application relates to the field of computer technology, and in particular to a data reading and writing method, device, storage server, and readable storage medium of a distributed storage system.
- The industry places increasingly high demands on the performance and reliability of distributed storage.
- When the client reads or writes an object, it must first calculate the placement group from the object name and then calculate the members of that placement group. Calculating placement group membership requires multiple loops and recursive hash calculations, which wastes system CPU resources and increases front-end IO latency.
- The purpose of this application is to provide a data reading and writing method, device, storage server, and readable storage medium for a distributed storage system, to solve the problem that traditional data reading and writing solutions occupy considerable system CPU resources during reading and writing and thereby increase front-end IO latency.
- the specific plan is as follows:
- this application provides a method for reading and writing data in a distributed storage system, which is applied to a Ceph client, including:
- the determining the target OSD corresponding to the target placement group in the pre-created cache file and sending the IO request to the target OSD includes:
- the calculating the target OSD corresponding to the target placement group according to the crush algorithm includes:
- the determining the target placement group according to the identification information of the target object includes:
- a modulo operation is performed on the hash value of the identification information of the target object to obtain the identification information of the target placement group.
- the method further includes:
- Obtain the file to be written, divide it into multiple target objects according to a preset object size, and determine the identification information of each target object; generate an IO request carrying the identification information of the target object.
- the determining the target OSD corresponding to the target placement group in the pre-created cache file includes:
- the method further includes:
- this application provides a data reading and writing device of a distributed storage system, which is applied to a Ceph client, and includes:
- a request acquisition module, used to acquire an IO request, where the IO request includes the identification information of the target object;
- a placement group determination module, used to determine the target placement group according to the identification information of the target object;
- a request sending module, used to determine the target OSD corresponding to the target placement group in the pre-created cache file, and send the IO request to the target OSD to realize data reading and writing.
- this application provides a storage server of a distributed storage system, including:
- a memory, used to store a computer program;
- a processor, used to execute the computer program to implement the steps of the method for reading and writing data in a distributed storage system as described above.
- The present application provides a readable storage medium on which a computer program is stored.
- When the computer program is executed by a processor, it implements the steps of the data reading and writing method of a distributed storage system as described above.
- The data reading and writing method, device, storage server, and readable storage medium of a distributed storage system provided by this application include: obtaining an IO request, the IO request including the identification information of the target object; determining the target placement group according to the identification information of the target object; determining the target OSD corresponding to the target placement group in the pre-created cache file, and sending the IO request to the target OSD to achieve data reading and writing.
- The solution caches the correspondence between placement groups and OSDs in the cache file. The target OSD for a target placement group therefore does not have to be determined through lengthy calculation during data reading and writing, which reduces wasted system CPU resources in the IO path and can greatly reduce IO latency.
- FIG. 1 is an implementation flowchart of Embodiment 1 of a data reading and writing method for a distributed storage system provided by this application;
- FIG. 2 is an implementation flowchart of Embodiment 2 of a method for reading and writing data in a distributed storage system provided by this application;
- FIG. 3 is a functional block diagram of an embodiment of a data reading and writing device of a distributed storage system provided by this application;
- FIG. 4 is a schematic structural diagram of an embodiment of a storage server of a distributed storage system provided by this application.
- The core of this application is to provide a method, device, storage server, and readable storage medium for reading and writing data in a distributed storage system, which reduce the waste of system CPU resources during data reading and writing and can greatly reduce IO latency.
- the first embodiment includes:
- S103 Determine a target OSD corresponding to the target placement group in a pre-created cache file, and send the IO request to the target OSD, so as to implement data reading and writing.
- This embodiment is applied to a Ceph client.
- the client mentioned here refers to a service process that accesses the cluster.
- In the NAS scenario, it refers to nfsserver and samba;
- in the block scenario, it refers to the tgt process;
- in the object scenario, it refers to rgw.
- OSD stands for Object Storage Device.
- Each disk, SSD, RAID group or partition can become an OSD.
- the file is divided into several objects.
- the objects are first mapped to the placement group, and then from the placement group to the OSD group.
- the target OSD in this embodiment refers to the main OSD in the OSD group, that is, the OSD that can perform data read and write operations.
- the IO request in this embodiment includes the identification information of the target object.
- If the request is a read request, the main information included is the identification information of the target object, the offset of the read within the object, and the length to be read;
- if the request is a write request, the main information included is the identification information of the target object, the offset of the write within the object, the length of the data to be written, and the content of the data to be written.
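The request fields listed above can be pictured as a small record type. The following is a minimal sketch only; the class name `IORequest`, its field names, and the two helper constructors are hypothetical and do not come from the application or from Ceph.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IORequest:
    """Hypothetical container for the request fields the text lists."""
    object_id: str                # identification information of the target object
    offset: int                   # offset of the read or write within the object
    length: int                   # bytes to read, or length of the data to write
    data: Optional[bytes] = None  # payload; present only for write requests

    @property
    def is_write(self) -> bool:
        # A request carrying a payload is treated as a write request.
        return self.data is not None


def make_read(object_id: str, offset: int, length: int) -> IORequest:
    """Build a read request: object ID, offset, and length to read."""
    return IORequest(object_id, offset, length)


def make_write(object_id: str, offset: int, data: bytes) -> IORequest:
    """Build a write request: object ID, offset, data length, and data."""
    return IORequest(object_id, offset, len(data), data)
```

Deriving `length` from the payload in `make_write` keeps the two fields from disagreeing; a real client may carry them separately.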
- When the Ceph client receives the IO request, it first calculates the target placement group to which the target object is mapped, according to the identification information of the target object carried in the IO request. As a specific implementation, it calculates the hash value of the identification information of the target object and takes the modulus to obtain the placement group corresponding to the target object.
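The hash-and-modulo step can be sketched as follows. This is an illustration under stated assumptions, not Ceph's implementation: `zlib.crc32` stands in for whatever hash function the client really uses, and the function name, the `pool_id` and `pg_num` parameters, and the `pool.seed` output format are hypothetical.

```python
import zlib


def placement_group_id(object_id: str, pool_id: int, pg_num: int) -> str:
    """Map an object ID to a placement group ID by hashing and taking the modulus.

    zlib.crc32 is only a stand-in hash chosen for this sketch.
    """
    if pg_num <= 0:
        raise ValueError("pg_num must be positive")
    hash_value = zlib.crc32(object_id.encode("utf-8"))  # unsigned 32-bit value
    seed = hash_value % pg_num                          # modulo operation
    return f"{pool_id}.{seed}"
```

The same object ID always maps to the same placement group for a given `pg_num`, which is what lets a later lookup key on the placement group alone.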
- CRUSH (Controlled Replication Under Scalable Hashing) is a distributed selection algorithm for data storage.
- A cache file is preset, and it records the mapping relationship between placement groups and OSDs. Therefore, after the target placement group is determined, there is no need to calculate the corresponding target OSD through the cumbersome CRUSH algorithm; the target OSD can be determined directly by querying the cache file, and the IO request is then forwarded to it.
- the CRUSH algorithm is used to calculate the target OSD corresponding to the target placement group.
- the CRUSH algorithm is used to calculate the location where the object should be written or read.
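The lookup described above can be sketched as a dictionary query. Two points in this sketch are assumptions rather than statements from the text: that a cache miss falls back to a CRUSH computation whose result is then stored, and that the primary OSD is the first entry of the OSD list. `resolve_target_osd` and `compute_osds` are hypothetical names; `compute_osds` merely stands in for the CRUSH calculation.

```python
from typing import Callable, Dict, List


def resolve_target_osd(
    pg_id: str,
    cache: Dict[str, List[int]],
    compute_osds: Callable[[str], List[int]],
) -> int:
    """Return the target OSD for a placement group, preferring the cache."""
    osds = cache.get(pg_id)
    if not osds:
        # Cache miss (assumed behaviour): run the expensive computation once
        # and remember the result for later requests.
        osds = list(compute_osds(pg_id))
        if not osds:
            raise LookupError(f"no OSDs found for placement group {pg_id}")
        cache[pg_id] = osds
    # Assumption: the first OSD in the list is the primary (target) OSD.
    return osds[0]
```

With this shape, the expensive computation runs at most once per placement group until the cache is cleared.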
- This embodiment provides a method for reading and writing data in a distributed storage system.
- The solution includes: obtaining an IO request, the IO request including the identification information of the target object; determining the target placement group according to the identification information of the target object; determining, in the pre-created cache file, the target OSD corresponding to the target placement group, and sending the IO request to the target OSD to realize data reading and writing. The solution thus caches the correspondence between placement groups and OSDs in the cache file, so the target OSD does not have to be determined through lengthy calculation during reading and writing, which reduces wasted system CPU resources in the IO path and can greatly reduce IO latency.
- the second embodiment of a data reading and writing method for a distributed storage system provided by the present application will be described in detail below.
- the second embodiment is implemented based on the foregoing embodiment 1, and is expanded to a certain extent on the basis of the first embodiment.
- the mapping relationship between the placement group and the OSD is recorded in the form of a list, and this embodiment takes a data writing scenario as an example for description.
- the second embodiment is applied to the Ceph client and specifically includes:
- S201 Obtain a file to be written, divide the file to be written into multiple target objects according to a preset object size, and determine the identification information of each target object; generate an IO request carrying the identification information of the target object;
- The aforementioned preset object size is adjusted according to actual needs, usually 2 MB or 4 MB.
- the objects obtained by segmentation will have unique identification information, which is generally composed of the File ID of the file to be written and the number of the segment.
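The splitting step in S201 can be sketched as follows. This is a minimal illustration: the function name, the `file_id.index` naming format, and the 4 MB default are assumptions chosen to match the description, not the application's actual code.

```python
from typing import Iterator, Tuple

DEFAULT_OBJECT_SIZE = 4 * 1024 * 1024  # 4 MB, one of the sizes the text mentions


def split_into_objects(
    file_id: str, data: bytes, object_size: int = DEFAULT_OBJECT_SIZE
) -> Iterator[Tuple[str, bytes]]:
    """Yield (object_id, chunk) pairs for a file to be written.

    Each object ID combines the file ID with the segment number, as the
    text describes; the exact "file_id.index" format is hypothetical.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    for index, start in enumerate(range(0, len(data), object_size)):
        yield f"{file_id}.{index}", data[start:start + object_size]
```

The last object may be shorter than `object_size`, and an empty file yields no objects.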
- S203 Perform a modulo operation on the hash value of the identification information of the target object to obtain the identification information of the target placement group;
- the identification information of the PG is similar to 1.0, 1.1, 1.2, and the above cache list can be implemented in the code using a map structure.
- the specific content is as follows:
- the [1,2,3] following the identification information of each placement group is the identification information of the OSD corresponding to the placement group.
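The cache list can be pictured as a map from placement group ID to a list of OSD IDs. The sketch below uses made-up entries and a made-up one-line-per-entry text format for the cache file; the original listing is not reproduced here, and `dump_cache` and `load_cache` are hypothetical helpers.

```python
from typing import Dict, List

# Illustrative entries only: placement group ID -> OSD IDs of that group.
pg_cache: Dict[str, List[int]] = {
    "1.0": [1, 2, 3],
    "1.1": [2, 3, 1],
    "1.2": [3, 1, 2],
}


def dump_cache(cache: Dict[str, List[int]]) -> str:
    """Serialize the map, one 'pg_id [osd,osd,...]' entry per line."""
    return "\n".join(
        f"{pg_id} [{','.join(str(osd) for osd in osds)}]"
        for pg_id, osds in cache.items()
    )


def load_cache(text: str) -> Dict[str, List[int]]:
    """Parse text produced by dump_cache back into a map."""
    cache: Dict[str, List[int]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        pg_id, _, osd_part = line.partition(" ")
        ids = osd_part.strip().strip("[]")
        cache[pg_id] = [int(item) for item in ids.split(",") if item.strip()]
    return cache
```

A map gives constant-time lookup on the IO path, which is the point of replacing the per-request calculation.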
- The OSD Map records how many OSDs the cluster contains, which nodes these OSDs belong to, and the weights of these nodes and OSDs. These relationships are stored in a tree structure. OSDs and nodes going online or offline change the OSD Map, and a change to the OSD Map changes the OSDs corresponding to each PG, so the PG cache list should be cleared and rebuilt.
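The clear-and-rebuild rule can be sketched with a version number attached to the cache. Tracking an OSD Map version (called `epoch` here) is an assumption of this sketch; the text only says the cache list is cleared when the OSD Map changes. The class and method names are hypothetical.

```python
from typing import Dict, List, Optional


class PGCache:
    """Placement group -> OSD list cache tied to one OSD Map version."""

    def __init__(self, epoch: int = 0) -> None:
        self._epoch = epoch
        self._entries: Dict[str, List[int]] = {}

    def on_osd_map_update(self, new_epoch: int) -> bool:
        """Clear the cache if the OSD Map changed; return True if cleared."""
        if new_epoch == self._epoch:
            return False
        self._entries.clear()  # old mappings may point at the wrong OSDs
        self._epoch = new_epoch
        return True

    def put(self, pg_id: str, osds: List[int], epoch: int) -> None:
        """Store a mapping, ignoring results computed against another map."""
        if epoch != self._epoch:
            return
        self._entries[pg_id] = list(osds)

    def get(self, pg_id: str) -> Optional[List[int]]:
        """Return the cached OSD list, or None on a miss."""
        return self._entries.get(pg_id)
```

After a clear, entries are repopulated lazily as requests arrive, so only the placement groups actually in use pay the recomputation cost.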
- The data reading and writing method of a distributed storage system provided by this embodiment caches the correspondence between placement groups and OSDs in a cache file. The target OSD corresponding to the target placement group therefore does not have to be determined through lengthy calculation during data reading and writing, which greatly reduces CPU consumption and latency on the IO path, saves system CPU resources, and greatly reduces front-end IO latency.
- the following describes a data reading and writing device of a distributed storage system provided by an embodiment of the present application.
- The data reading and writing device described below and the data reading and writing method described above may be referred to in correspondence with each other.
- The device is applied to a Ceph client and includes:
- a request obtaining module 301, used to obtain an IO request, where the IO request includes the identification information of the target object;
- a placement group determining module 302, used to determine the target placement group according to the identification information of the target object;
- a request sending module 303, used to determine the target OSD corresponding to the target placement group in the pre-created cache file, and send the IO request to the target OSD to implement data reading and writing.
- The data reading and writing device of the distributed storage system of this embodiment is used to implement the aforementioned data reading and writing method of the distributed storage system. Therefore, for the specific implementation of the device, see the embodiments of the method described above. For example, the request obtaining module 301, the placement group determining module 302, and the request sending module 303 are respectively used to implement steps S101, S102, and S103 of the above method. The specific implementation can therefore be found in the description of the corresponding embodiments and is not repeated here.
- Because the data reading and writing device of this embodiment implements the aforementioned method, its function corresponds to that of the method and is not repeated here.
- this application also provides a storage server of a distributed storage system, as shown in Figure 4, including:
- a memory 100, used to store a computer program;
- a processor 200, configured to execute the computer program to implement the steps of the method for reading and writing data in a distributed storage system as described above.
- This application provides a readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the steps of the method for reading and writing data in a distributed storage system as described above.
- the steps of the method or algorithm described in combination with the embodiments disclosed in this document can be directly implemented by hardware, a software module executed by a processor, or a combination of the two.
- The software module can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the technical field.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a data reading and writing method and apparatus, a storage server, and a readable storage medium of a distributed storage system. The method comprises: obtaining an IO request, the IO request including identification information of a target object; determining a target placement group according to the identification information of the target object; determining the target OSD corresponding to the target placement group in a pre-created cache file, and sending the IO request to the target OSD to implement data reading and writing. As can be seen, the method caches the correspondence between placement groups and OSDs in the cache file; consequently, during data reading and writing there is no need for lengthy calculations to determine the target OSD corresponding to the target placement group, which reduces the waste of system CPU resources in the IO path and can considerably reduce IO latency.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911025181.2A CN110780819A (zh) | 2019-10-25 | 2019-10-25 | Data reading and writing method for a distributed storage system |
CN201911025181.2 | 2019-10-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021077745A1 (fr) | 2021-04-29 |
Family
ID=69386814
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/092831 WO2021077745A1 (fr) | 2019-10-25 | 2020-05-28 | Data reading and writing apparatus of a distributed storage system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110780819A (fr) |
WO (1) | WO2021077745A1 (fr) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
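CN110780819A (zh) * | 2019-10-25 | 2020-02-11 | 浪潮电子信息产业股份有限公司 | Data reading and writing method for a distributed storage system |
CN112363674B (zh) * | 2020-11-12 | 2022-04-22 | 新华三技术有限公司成都分公司 | Data writing method and apparatus |
CN112486413B (zh) * | 2020-11-27 | 2022-08-05 | 杭州朗和科技有限公司 | Data reading method, apparatus, medium, and computing device |
CN112783443A (zh) * | 2021-01-18 | 2021-05-11 | 北京聚云科技有限公司 | Data reading method and apparatus, and electronic device |
CN112925482B (zh) * | 2021-03-15 | 2022-08-23 | 北京首都在线科技股份有限公司 | Data processing method, apparatus, system, electronic device, and computer storage medium |
CN114564460B (zh) * | 2022-02-25 | 2024-01-19 | 苏州浪潮智能科技有限公司 | Parameter tuning method, apparatus, device, and medium based on a distributed storage system |
CN114327299B (zh) * | 2022-03-01 | 2022-06-03 | 苏州浪潮智能科技有限公司 | Sequential-read prefetching method, apparatus, device, and medium |
CN116737619B (zh) * | 2023-08-15 | 2023-11-03 | 苏州浪潮智能科技有限公司 | Data request system, method, apparatus, computer device, and storage medium |
CN117348820A (zh) * | 2023-12-04 | 2024-01-05 | 浪潮云信息技术股份公司 | Data processing method, system, device, and storage medium for a distributed system |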
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140214773A1 (en) * | 2013-01-30 | 2014-07-31 | Hewlett-Packard Development Company, L.P. | Reconstructing a state of a file system using a preserved snapshot |
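CN108287660A (zh) * | 2017-01-09 | 2018-07-17 | 中国移动通信集团河北有限公司 | Data storage method and device |
CN110222013A (zh) * | 2019-06-11 | 2019-09-10 | 深信服科技股份有限公司 | Method, system, device, and storage medium for determining cluster storage capacity |
CN110780819A (zh) * | 2019-10-25 | 2020-02-11 | 浪潮电子信息产业股份有限公司 | Data reading and writing method for a distributed storage system |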
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103905540A (zh) * | 2014-03-25 | 2014-07-02 | 浪潮电子信息产业股份有限公司 | Object storage data distribution mechanism based on two-level hashing |
US10003649B2 (en) * | 2015-05-07 | 2018-06-19 | Dell Products Lp | Systems and methods to improve read/write performance in object storage applications |
- 2019-10-25: CN application CN201911025181.2A, published as CN110780819A (zh), status: active, pending
- 2020-05-28: WO application PCT/CN2020/092831, published as WO2021077745A1 (fr), status: active, application filing
Also Published As
Publication number | Publication date |
---|---|
CN110780819A (zh) | 2020-02-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20879427; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 20879427; Country of ref document: EP; Kind code of ref document: A1 |