CN113192558A - Reading and writing method for third-generation gene sequencing data and distributed file system - Google Patents

Publication number
CN113192558A
Authority
CN
China
Prior art keywords
data
reading
request
hard disk
writing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110578909.5A
Other languages
Chinese (zh)
Inventor
宁建峰
宁建强
戈素梅
李宁宁
刘政委
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lexun Technology Co ltd
Beijing Free Cat Technology Co ltd
Original Assignee
Beijing Lexun Technology Co ltd
Beijing Free Cat Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Lexun Technology Co ltd and Beijing Free Cat Technology Co ltd
Priority to CN202110578909.5A
Publication of CN113192558A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The application provides a reading and writing method for third-generation gene sequencing data and a distributed file system. The method comprises: dividing a hard disk storage space into a plurality of data storage pools; and, after receiving a data write request, storing the write data into the data storage pools in sequence, switching to the next data storage pool once the current one is fully written. The scheme exploits two characteristics of gene sequencing application data: the data is written once and thereafter only read, never modified; and data writing is performed asynchronously, so its real-time requirement is not high.

Description

Reading and writing method for third-generation gene sequencing data and distributed file system
Technical Field
The application relates to the technical field of information processing, in particular to a reading and writing method for third-generation gene sequencing data and a distributed file system.
Background
Gene sequencing is a typical application of high-performance computing, and third-generation gene sequencing is gradually becoming the mainstream sequencing technology. A gene sequencing system is a standard high-performance computing cluster whose architecture is shown in fig. 1. The whole system comprises a computing node cluster and a distributed file system: the computing node cluster comprises n computing nodes j1, j2, ..., jn; the distributed file system comprises i storage servers f1, f2, ..., fi; and the computing nodes and the storage servers are connected over a network through m switches h1, ..., hm.
A third-generation gene sequencing system places several main requirements on a distributed file system. First, the sequencing process performs a large amount of random data extraction on gene files, so the random read latency of the distributed file system must be low. Second, a large amount of new data arrives while sequencing is running, so the distributed file system must provide high write bandwidth while keeping random read latency low. Third, all computing nodes run in parallel during sequencing, so the distributed file system must provide consistent performance to every computing node; it must not happen that some nodes are fast while others are slow.
Current distributed file systems are mostly positioned as general-purpose file systems and generally manage data on hard disks through a local file system. This causes several problems for third-generation gene sequencing. First, the local file system is built on mechanical hard disks, and random read performance is poor because accessing the local file system's metadata requires multiple hard disk accesses. Second, under mixed reads and writes, written data is first placed in a cache and, once a certain amount has accumulated, flushed back to the hard disk in bulk; a large burst of writes in a short time therefore strongly affects hard disk reads, making read latency uncontrollable. Third, when there are many computing nodes, the capability of the computing nodes far exceeds that of the storage cluster; the storage service uses a simple flow control mechanism that directly rejects requests it cannot accept, and the computing node retries until the request succeeds. Because retries may continue to be rejected, access latency is uncontrollable.
To this end, improvements to existing distributed file systems are needed.
Disclosure of Invention
The embodiments of the application aim to provide a reading and writing method for third-generation gene sequencing data and a distributed file system, so as to solve the prior-art problem of low efficiency when reading and writing third-generation gene sequencing data.
In order to achieve the above objects, some embodiments of the present application provide a method for reading and writing third generation gene sequencing data, comprising the steps of:
dividing a hard disk storage space into a plurality of data storage pools;
and after receiving a data write request, sequentially storing write data into the data storage pools, wherein one data storage pool is switched to the next data storage pool after being fully written.
In some embodiments of the present application, the method for reading and writing third generation gene sequencing data further comprises the following steps:
and after receiving a data reading request, directly reading the requested data if the data storage pool in which the requested data is located is in a full write state.
In some embodiments of the present application, the method for reading and writing third generation gene sequencing data further comprises the following steps:
after receiving a data repair request, determining an updated hard disk storage space;
and storing the written repair data into the updated hard disk storage space.
In some embodiments of the present application, the method for reading and writing third generation gene sequencing data further comprises the following steps:
after a system mounting signal is detected, scanning data retrieval information in a hard disk storage space, wherein the data retrieval information comprises a data storage directory entry and a data index node; caching the data retrieval information into a system memory;
and after receiving a random data reading request, determining a target directory and a target index node where the requested data is located according to the data retrieval information in the system memory.
In some embodiments of the present application, the method for reading and writing third generation gene sequencing data further comprises the following steps:
setting a flow window;
and after receiving a data writing request or a data reading request, adjusting the data writing flow or the data reading flow according to the flow window.
In some embodiments of the present application, the step of adjusting the data write flow or the data read flow according to the flow window includes:
acquiring the delay time of the data writing request or the data reading request;
if the delay time length is less than the expected time length, increasing the flow window according to a set proportion; and if the delay time length is greater than the expected time length, reducing the flow window according to a set proportion.
Based on the same inventive concept, some embodiments of the present application further provide a storage server for third generation gene sequencing data, comprising:
at least one hard disk;
the data management module is used for dividing the hard disk storage space of the hard disk into a plurality of data storage pools; and after receiving a data write request, sequentially storing write data into the data storage pools, wherein one data storage pool is switched to the next data storage pool after being fully written.
The storage server for third generation gene sequencing data in some embodiments of the present application, further comprising:
the data management module is further configured to, after receiving a data reading request, directly read the requested data if the data storage pool where the requested data is located is in a full write state; and/or,
after receiving a data repair request, determine an updated hard disk storage space, and store the written repair data into the updated hard disk storage space; and/or,
after a system mounting signal is detected, scan data retrieval information in a hard disk storage space, wherein the data retrieval information comprises a data storage directory entry and a data index node, and cache the data retrieval information into a system memory;
and after receiving a random data reading request, determine a target directory and a target index node where the requested data is located according to the data retrieval information in the system memory.
The storage server for third generation gene sequencing data in some embodiments of the present application, further comprising:
a QoS management module, which is used for setting a flow window and, after receiving a data writing request or a data reading request, adjusting the data writing flow or the data reading flow according to the flow window; the QoS management module is further used for:
acquiring the delay time of the data writing request or the data reading request; if the delay time length is less than the expected time length, increasing the flow window according to a set proportion; and if the delay time length is greater than the expected time length, reducing the flow window according to a set proportion.
Some embodiments of the present application further provide a distributed file system, comprising a plurality of storage servers for third generation gene sequencing data as described in any of the above aspects, further comprising:
and the global data distribution manager is used for performing differentiated processing of writing operation and reading operation on all hard disk storage spaces in the plurality of storage servers.
Compared with the prior art, the technical scheme provided by the application has at least the following beneficial effects. The hard disk storage space is divided into a plurality of data storage pools, and after a data write request is received, the write data is stored into the data storage pools in sequence, switching to the next data storage pool once the current one is fully written. This changes the traditional data writing mode: once a data storage pool is fully written, it never needs to be written again, so any subsequent read of a fully written data storage pool is unaffected by data write operations. The scheme is designed around the characteristics of gene sequencing application data: the data is written once and thereafter only read, never modified, and data writing is performed asynchronously, so its real-time requirement is not high. On this basis, the scheme in the embodiments of the application can at least ensure the read performance of gene sequencing application data.
Drawings
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
FIG. 1 is a diagram of a system architecture of a computer cluster used in a prior art gene sequencing system;
FIG. 2 is a flow chart of a method for reading and writing third generation gene sequencing data according to an embodiment of the present application;
FIG. 3 is a flow chart of a method for reading and writing third generation gene sequencing data according to another embodiment of the present application;
FIG. 4 is a flow chart of a method for reading and writing third generation gene sequencing data according to yet another embodiment of the present application;
FIG. 5 is a block diagram of a storage server for third generation gene sequencing data according to one embodiment of the present application;
FIG. 6 is a block diagram of a storage server for third generation gene sequencing data according to another embodiment of the present application;
FIG. 7 is a block diagram of a distributed file system according to an embodiment of the present application.
Detailed Description
In this section, reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Some embodiments of the present application provide a method for reading and writing third generation gene sequencing data, which can be used in a computer system for performing the reading and writing operations of the third generation gene sequencing data, as shown in fig. 2, and can include the following steps:
S101: The hard disk storage space is divided into a plurality of data storage pools. In this step, the storage space of each hard disk may be divided according to a set space size, so that all hard disk storage space is divided into a plurality of data storage pools; different data storage pools may have the same or different sizes.
S102: After a data write request is received, the write data is stored into the data storage pools in sequence, switching to the next data storage pool once the current one is fully written. The data storage pools may be numbered according to a certain rule, and the system can determine the number of each data storage pool. When writing data, the system can obtain the size of the write data in real time and determine the space size of each data storage pool, so space can be allocated for the write data in advance according to the sizes of the different pools. For example, when the write data is 50G, it is first written into the data storage pool numbered "10", which has 30G of space; after pool "10" is full, the remainder is written into the data storage pool numbered "11", which has 20G of space. In some application scenarios, the data storage pools may instead be selected based on the size of the write data and the storage space of each data storage pool. The order in this step may therefore follow a predetermined numbering order or result from a selection made according to the size of the write data.
The scheme provided by this embodiment changes the conventional data writing mode. Some data storage pools are in a full write state; once a data storage pool is full, no further write operation is performed on it, so any subsequent read operation on a full data storage pool is completely unaffected by data write operations. The scheme is designed around the characteristics of gene sequencing application data: the data is written once and thereafter only read, never modified, and data writing is performed asynchronously, so its real-time requirement is not high. On this basis, the scheme in the embodiments of the application can at least ensure the read performance of gene sequencing application data.
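Steps S101 and S102 can be illustrated with a minimal Python sketch. It is only a model under stated assumptions, not the implementation described in the application: the class names `StoragePool` and `PoolWriter` are hypothetical, pools are filled strictly in list order, and sizes are plain numbers in a single consistent unit (here, G).

```python
from dataclasses import dataclass


@dataclass
class StoragePool:
    """One fixed-capacity region of a hard disk (hypothetical model)."""
    number: int
    capacity: int
    used: int = 0

    @property
    def is_full(self) -> bool:
        return self.used >= self.capacity


@dataclass
class PoolWriter:
    """Fills pools in list order; a full pool never receives writes again."""
    pools: list
    current: int = 0  # index of the pool currently accepting writes

    def write(self, size: int) -> list:
        """Place `size` units of data; return (pool number, amount) pairs."""
        free = sum(p.capacity - p.used for p in self.pools[self.current:])
        if size > free:
            raise ValueError("not enough free space in the remaining pools")
        placed = []
        remaining = size
        while remaining > 0:
            pool = self.pools[self.current]
            chunk = min(remaining, pool.capacity - pool.used)
            if chunk > 0:
                pool.used += chunk
                placed.append((pool.number, chunk))
                remaining -= chunk
            # Switch to the next pool only once the current one is full.
            if pool.is_full and self.current < len(self.pools) - 1:
                self.current += 1
        return placed


# The 50G example from the text: pool "10" has 30G, pool "11" has 20G.
writer = PoolWriter([StoragePool(10, 30), StoragePool(11, 20)])
print(writer.write(50))  # [(10, 30), (11, 20)]
```

The sketch covers only the predetermined numbering order; the alternative mentioned in the text, selecting pools by the size of the write data, would replace the in-order advance with a selection step.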
The reading and writing method for third generation gene sequencing data provided in some embodiments of the present application, as shown in fig. 3, may further include the following steps:
S103: After a data repair request is received, the updated hard disk storage space is determined. When a hard disk fails, it is replaced by a new (updated) hard disk, which provides the updated hard disk storage space; the data of the failed hard disk must then be restored.
S104: The repair data is written into the updated hard disk storage space.
With this scheme, data repair departs from the common strategy of distributed file systems: repaired data is written only to the newly replaced hard disk. Repair writes therefore do not interfere with other hard disks, and data reading is not affected by writes during data repair after a hard disk failure.
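The effect of steps S103 and S104 can be shown by comparing which disks receive repair writes under two policies. This is a hedged sketch: the function names, block identifiers and disk names are hypothetical, and the round-robin "scatter" policy is only an assumed stand-in for the "common strategy" that the text does not spell out.

```python
def repair_plan(lost_blocks, surviving_disks, replacement_disk, scatter=False):
    """Return {block_id: target_disk} for rebuilding a failed disk's blocks.

    scatter=True models an assumed conventional policy that spreads rebuilt
    blocks round-robin over the surviving disks. scatter=False models the
    policy in the text: only the replacement disk is written.
    """
    if scatter:
        return {block: surviving_disks[i % len(surviving_disks)]
                for i, block in enumerate(lost_blocks)}
    return {block: replacement_disk for block in lost_blocks}


def disks_written(plan):
    """Set of disks that receive repair writes under a plan."""
    return set(plan.values())


blocks = ["b1", "b2", "b3"]
targeted = repair_plan(blocks, ["d1", "d2"], "d_new")
scattered = repair_plan(blocks, ["d1", "d2"], "d_new", scatter=True)
print(disks_written(targeted))   # only the replacement disk is written
print(disks_written(scattered))  # surviving disks absorb repair writes
```

Under the targeted plan the surviving disks see no repair writes at all, which is what keeps reads on those disks undisturbed during repair.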
The reading and writing method for third generation gene sequencing data provided in some embodiments of the present application, as shown in fig. 4, may further include the following steps:
S105: After a data read request is received, the requested data is read directly if the data storage pool in which it is located is in a full write state. As described above, a read operation can be performed directly on a full data storage pool: owing to the characteristics of third-generation gene sequencing data, no data in a full data storage pool is ever changed, so no data write operation affects the read operation.
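The read decision in step S105 can be sketched as a check on the state of the pool that holds the requested data. The `PoolState` enum and the `read_path` function are hypothetical names; the text specifies only the full-pool case, so the handling of a pool that is still being filled is an assumption made for illustration.

```python
from enum import Enum


class PoolState(Enum):
    FILLING = "filling"  # pool is still receiving writes
    FULL = "full"        # pool is in the full write state


def read_path(pool_state: PoolState) -> str:
    """Classify a read by the state of the pool holding the requested data."""
    if pool_state is PoolState.FULL:
        # No writer ever touches a full pool, so the read can go straight
        # to the hard disk without competing with writes.
        return "direct"
    # Assumption: a read from a pool still being filled may share the disk
    # with ongoing writes; the text does not describe this case.
    return "shared-with-writes"


print(read_path(PoolState.FULL))
```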
Further, the method may further include:
S106: After a system mount signal is detected, data retrieval information in the hard disk storage space is scanned, where the data retrieval information includes data storage directory entries and data index nodes; the data retrieval information is then cached in system memory.
S107: After a random data read request is received, the target directory and the target index node where the requested data is located are determined from the data retrieval information in system memory.
For a random read of a large file, the metadata path of a local file system such as EXT4 (fourth extended file system) currently needs to access the hard disk at least 3 times: finding the corresponding directory entry in the parent directory, reading the file's inode (index node) information, and reading the file's layout extent (contiguous storage space) information. On a mechanical hard disk each random access takes about 7 ms, so metadata access alone costs at least about 21 ms per random read, and the performance of existing random read operations is difficult to reconcile with the needs of third-generation gene sequencing applications. To accelerate EXT4 metadata access, this scheme modifies the kernel VFS (Virtual File System) cache of the Linux system. When an EXT4 file system is mounted (mounting is the process by which an operating system makes the files and directories on a storage device, such as a hard disk, a CD-ROM or a shared resource, accessible to users through the computer's file system), a background scanning service is started. It scans the files on the EXT4 file system and tags the corresponding directory entries and inode information with a special mark, so that the kernel's reclaim mechanism skips them and the data retrieval information, comprising the directory entries and inode information, stays cached in memory for random read operations. Because third-generation gene sequencing data consists of large files, the directory entries and inode information occupy little memory, and this memory can still be forcibly reclaimed when it must be actively reclaimed. By adding this caching mechanism to the operating system, the scheme keeps the directory entries and index node information of EXT4 metadata cached for a long time and improves metadata access performance.
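The latency argument above can be made concrete with a small user-space model. This is not kernel VFS code: `PinnedMetadataCache`, its methods and the example file path are hypothetical. The sketch also assumes, conservatively, that the extent lookup still costs one disk access on a cache hit, because the text only states that directory entries and inode information are kept in memory.

```python
SEEK_MS = 7            # approximate cost of one random access, per the text
METADATA_ACCESSES = 3  # directory entry, inode, extent information


class PinnedMetadataCache:
    """Keeps directory entries and inodes in memory after a mount-time scan."""

    def __init__(self):
        self._entries = {}  # path -> (directory entry, inode), never evicted

    def scan_on_mount(self, files):
        """Background scan: load and pin metadata for every file found."""
        for path, inode_number in files.items():
            dentry = {"name": path, "ino": inode_number}
            inode = {"ino": inode_number, "pinned": True}
            self._entries[path] = (dentry, inode)

    def metadata_cost_ms(self, path):
        """Estimated metadata latency before a random read of `path`."""
        if path in self._entries:
            # Directory entry and inode come from memory; only the extent
            # lookup is assumed to reach the disk.
            return (METADATA_ACCESSES - 2) * SEEK_MS
        return METADATA_ACCESSES * SEEK_MS


cache = PinnedMetadataCache()
cache.scan_on_mount({"/data/run1.fast5": 12})   # hypothetical file
print(cache.metadata_cost_ms("/data/run1.fast5"))  # 7
print(cache.metadata_cost_ms("/data/other.fast5"))  # 21
```

Under these assumptions a cached lookup removes two of the three disk accesses; if extent information is also held in memory, the metadata cost would drop further.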
The reading and writing method for third generation gene sequencing data provided in some embodiments of the present application may further include the following steps:
S201: A flow window is set. The flow window may be established using a mechanism similar to the TCP sliding window.
S202: After a data write request or a data read request is received, the data write flow or the data read flow is adjusted according to the flow window.
This embodiment applies flow control to data writing and data reading using a mechanism similar to the TCP sliding window, which avoids large numbers of request bursts and retransmissions and thus avoids wasting network bandwidth and storage resources.
In addition, in order to optimize the size of the flow window and achieve the most effective flow control, in some embodiments, the step of adjusting the data write flow or the data read flow according to the flow window in step S202 includes:
S2021: The delay duration of the data write request or the data read request is acquired.
S2022: If the delay duration is less than the expected duration, the flow window is increased by a set proportion; if the delay duration is greater than the expected duration, the flow window is reduced by a set proportion.
The expected duration is the duration that meets the read-write efficiency requirement and can be set from an empirical value. In a specific implementation, a node may first apply to the system control center for a flow window of an initial size, and use it to limit the number of requests sent to each node and the number of requests received from other nodes. If the delay duration of each request is within the expected duration, the flow window may be expanded by a set proportion, for example by 10% each time. If a node that sends or receives requests runs short of resources or becomes congested, the delay duration will exceed the expected duration, and expansion of the flow window stops. If a node that receives or sends requests keeps reporting resource shortage or congestion, the flow window for that node is reduced proportionally, for example by 10% each time, to relieve the pressure on the node until it no longer reports resource shortage or congestion. In this embodiment, the data transmission rate is thus controlled by a flow window whose size is regulated automatically, so that nodes are not overloaded and request retransmission is avoided, which improves data read-write performance.
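The adjustment rule in S2021 and S2022 can be sketched as follows. The class name `FlowWindow`, the parameter names, and the lower and upper bounds on the window are assumptions added for illustration; only the comparison against an expected duration and the proportional step (10% in the example) come from the text.

```python
class FlowWindow:
    """Request window that grows or shrinks by a fixed proportion."""

    def __init__(self, initial, expected_ms, step=0.10,
                 minimum=1.0, maximum=4096.0):
        self.size = float(initial)      # current window, in requests
        self.expected_ms = expected_ms  # expected delay duration
        self.step = step                # set proportion, e.g. 10%
        self.minimum = minimum          # assumed floor, not from the text
        self.maximum = maximum          # assumed ceiling, not from the text

    def update(self, delay_ms):
        """Adjust the window from the observed delay of a request."""
        if delay_ms < self.expected_ms:
            self.size = min(self.maximum, self.size * (1 + self.step))
        elif delay_ms > self.expected_ms:
            self.size = max(self.minimum, self.size * (1 - self.step))
        return self.size

    def max_in_flight(self):
        """Whole number of requests that may be outstanding right now."""
        return max(1, int(self.size))


window = FlowWindow(initial=100, expected_ms=20)
window.update(10)  # faster than expected: window grows by 10%
window.update(30)  # slower than expected: window shrinks by 10%
print(window.max_in_flight())
```

A sender would keep at most `max_in_flight()` requests outstanding to a given node and call `update` with each measured delay, so that pressure on a congested node falls without the node having to reject requests.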
In some embodiments, a storage server 10 for third generation gene sequencing data is provided, as shown in fig. 5, comprising a data management module 101 and at least one hard disk 102. The data management module 101 is used for dividing the hard disk storage space of the hard disk into a plurality of data storage pools and, after receiving a data write request, storing the write data into the data storage pools in sequence, switching to the next data storage pool once the current one is fully written. This design relies on the characteristics of gene sequencing application data: the data is written once and thereafter only read, never modified, and data writing is performed asynchronously, so its real-time requirement is not high.
In some embodiments of the present application, the data management module 101 is further configured to, after receiving a data read request, directly read the requested data if the data storage pool in which the requested data is located is in a full write state. This again relies on the data being written once and thereafter only read, never modified, with data writing performed asynchronously and with a low real-time requirement.
In some embodiments of the present application, the data management module 101 is further configured to determine an updated hard disk storage space after receiving a data repair request, and to store the written repair data into the updated hard disk storage space. With this scheme, data repair departs from the common strategy of distributed file systems: repaired data is written only to the newly replaced hard disk. Repair writes therefore do not interfere with other hard disks, and data reading is not affected by writes during data repair after a hard disk failure.
In some embodiments of the present application, after detecting a system mount signal, the data management module 101 scans data retrieval information in the hard disk storage space, where the data retrieval information includes data storage directory entries and data index nodes, and caches the data retrieval information in system memory; after receiving a random data read request, it determines the target directory and the target index node where the requested data is located from the data retrieval information in system memory. By adding a caching mechanism to the operating system, the scheme keeps the directory entries and index node information of EXT4 metadata cached for a long time and improves metadata access performance.
As shown in fig. 6, the storage server for third generation gene sequencing data in some embodiments of the present application further includes a QoS management module 103, configured to set a flow window and, after receiving a data write request or a data read request, to adjust the data write flow or the data read flow according to the flow window. Applying flow control to data writing and data reading with a mechanism similar to the TCP sliding window avoids large numbers of request bursts and retransmissions and thus avoids wasting network bandwidth and storage resources.
The QoS management module 103 in some embodiments of the present application is further configured to obtain the delay duration of the data write request or the data read request; if the delay duration is less than the expected duration, it increases the flow window by a set proportion, and if the delay duration is greater than the expected duration, it reduces the flow window by a set proportion. This optimizes the size of the flow window and makes the flow control more effective.
In some embodiments, a distributed file system is further provided, whose architecture is shown in fig. 7. It includes a plurality of storage servers 10 for third generation gene sequencing data as described in any of the above, and further includes a global data distribution manager 20, which performs differentiated processing of write operations and read operations on all hard disk storage spaces in the plurality of storage servers 10. In the system, the global data distribution manager 20 may separate the distribution of read data and write data across the hard disks of different storage servers 10, and the data management module 101 in each storage server 10 may divide the hard disk storage space in that server into data storage pools, ensuring that writing to the next data storage pool begins only after the current data storage pool is fully written. Furthermore, the QoS management module 103 in each storage server 10 controls the write and read traffic of that server, which avoids blocking in other storage servers 10 caused by one storage server 10 receiving or issuing too many requests, and avoids large numbers of rejected requests caused by insufficient storage service capability in one storage server 10. In each storage server 10, reading of EXT4 metadata is accelerated by VFS caching. With these improvements, the system provided by this embodiment is better suited to the data read and write operations of third-generation gene sequencing applications.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A reading and writing method for third generation gene sequencing data is characterized by comprising the following steps:
dividing a hard disk storage space into a plurality of data storage pools;
and after receiving a data write request, sequentially storing write data into the data storage pools, wherein one data storage pool is switched to the next data storage pool after being fully written.
2. The method of claim 1, further comprising the steps of:
and after receiving a data reading request, directly reading the requested data if the data storage pool in which the requested data is located is in a full write state.
3. The method of claim 1, further comprising the steps of:
after receiving a data repair request, determining an updated hard disk storage space;
and storing the written repair data into the updated hard disk storage space.
4. The method of any one of claims 1 to 3, further comprising the steps of:
after detecting a system mount signal, scanning data retrieval information in the hard disk storage space, wherein the data retrieval information comprises a data storage directory entry and a data index node;
caching the data retrieval information in system memory; and
after receiving a random data read request, determining, from the data retrieval information in the system memory, the target directory and the target index node where the requested data is located.
5. The method of claim 4, further comprising the steps of:
setting a flow window; and
after receiving a data write request or a data read request, adjusting the data write flow or the data read flow according to the flow window.
6. The method of claim 5, wherein the step of adjusting the data write flow or the data read flow according to the flow window comprises:
acquiring the delay duration of the data write request or the data read request; and
if the delay duration is less than an expected duration, enlarging the flow window by a set proportion; and if the delay duration is greater than the expected duration, shrinking the flow window by a set proportion.
7. A storage server for third-generation gene sequencing data, comprising:
at least one hard disk; and
a data management module configured to divide the hard disk storage space of the hard disk into a plurality of data storage pools and, after receiving a data write request, to store the write data into the data storage pools sequentially, wherein writing switches to the next data storage pool only after the current data storage pool has been fully written.
8. The storage server for third-generation gene sequencing data of claim 7, wherein the data management module is further configured to:
after receiving a data read request, directly read the requested data if the data storage pool in which the requested data is located is in a fully written state; and/or
after receiving a data repair request, determine an updated hard disk storage space, and store the written repair data into the updated hard disk storage space; and/or
after detecting a system mount signal, scan data retrieval information in the hard disk storage space, wherein the data retrieval information comprises a data storage directory entry and a data index node; cache the data retrieval information in system memory;
and after receiving a random data read request, determine, from the data retrieval information in the system memory, the target directory and the target index node where the requested data is located.
9. The storage server for third-generation gene sequencing data of claim 8, further comprising:
a QoS management module configured to set a flow window and, after receiving a data write request or a data read request, to adjust the data write flow or the data read flow according to the flow window, by:
acquiring the delay duration of the data write request or the data read request; if the delay duration is less than an expected duration, enlarging the flow window by a set proportion; and if the delay duration is greater than the expected duration, shrinking the flow window by a set proportion.
10. A distributed file system, comprising a plurality of storage servers for third-generation gene sequencing data as claimed in any one of claims 7 to 9, and further comprising:
a global data distribution manager configured to perform differentiated processing of write operations and read operations on all hard disk storage space in the plurality of storage servers.
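
The flow-window adjustment recited in claims 5, 6 and 9 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the QoS management module itself: the class name `FlowWindow`, the idea that the window counts in-flight requests, the use of one proportion for both growth and shrinkage, and the `min_size`/`max_size` bounds are all hypothetical and do not appear in the claims.

```python
class FlowWindow:
    """Latency-driven window: grows when requests finish faster than
    expected, shrinks when they finish slower."""

    def __init__(self, size: float, expected_latency: float,
                 ratio: float = 0.25,
                 min_size: float = 1.0, max_size: float = 1024.0):
        self.size = size                          # assumed unit: in-flight requests
        self.expected_latency = expected_latency  # seconds
        self.ratio = ratio                        # the "set proportion"
        self.min_size = min_size                  # assumed lower bound
        self.max_size = max_size                  # assumed upper bound

    def record(self, latency: float) -> float:
        """Adjust the window from one observed request latency (seconds)."""
        if latency < self.expected_latency:
            self.size = min(self.max_size, self.size * (1 + self.ratio))
        elif latency > self.expected_latency:
            self.size = max(self.min_size, self.size * (1 - self.ratio))
        return self.size

    def admits(self, in_flight: int) -> bool:
        """Whether one more request fits inside the current window."""
        return in_flight < int(self.size)
```

A server would call `record` with the measured delay of each completed read or write request and consult `admits` before accepting the next one. Scaling by a proportion rather than a fixed step means the window adapts quickly when it is large and cautiously when it is small.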

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110578909.5A CN113192558A (en) 2021-05-26 2021-05-26 Reading and writing method for third-generation gene sequencing data and distributed file system

Publications (1)

Publication Number Publication Date
CN113192558A (en) 2021-07-30

Family

ID=76985344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110578909.5A Withdrawn CN113192558A (en) 2021-05-26 2021-05-26 Reading and writing method for third-generation gene sequencing data and distributed file system

Country Status (1)

Country Link
CN (1) CN113192558A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114420210A (en) * 2022-03-28 2022-04-29 山东大学 Rapid trimming method and system for biological sequencing sequence

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060112155A1 (en) * 2004-11-24 2006-05-25 Agami Systems, Inc. System and method for managing quality of service for a storage system
CN102859499A (en) * 2010-04-30 2013-01-02 株式会社日立制作所 Computer system and storage control method of same
CN103049680A (en) * 2012-12-29 2013-04-17 深圳先进技术研究院 gene sequencing data reading method and system
US20150193350A1 (en) * 2012-07-27 2015-07-09 Tencent Technology (Shezhen) Comany Limited Data storage space processing method and processing system, and data storage server
US20170161300A1 (en) * 2013-08-13 2017-06-08 Maxta, Inc. Shared data storage leveraging dispersed storage devices
CN108537007A (en) * 2017-03-04 2018-09-14 上海逐玛信息技术有限公司 A kind of access method for gene sequencing data
CN110109886A (en) * 2018-02-01 2019-08-09 中兴通讯股份有限公司 The file memory method and distributed file system of distributed file system
CN111787062A (en) * 2020-05-28 2020-10-16 北京航空航天大学 Wide area network file system-oriented adaptive fast increment pre-reading method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
武伟: "《操作系统教程》", 28 February 2004, 北京:机械工业出版社 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210730