WO2021120789A1 - Data writing method and apparatus, storage server, and computer-readable storage medium

Info

Publication number
WO2021120789A1
WO2021120789A1 PCT/CN2020/119881 CN2020119881W WO2021120789A1 WO 2021120789 A1 WO2021120789 A1 WO 2021120789A1 CN 2020119881 W CN2020119881 W CN 2020119881W WO 2021120789 A1 WO2021120789 A1 WO 2021120789A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
write
request
cache
swiped
Prior art date
Application number
PCT/CN2020/119881
Other languages
English (en)
French (fr)
Inventor
张煜
周可
王桦
胡健鹰
吉永光
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2021120789A1
Priority to US17/523,730 (US11947829B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 Data buffering arrangements
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F12/0868 Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
    • G06F12/0888 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/061 Improving I/O performance
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 Providing a specific technical effect
    • G06F2212/1016 Performance improvement
    • G06F2212/15 Use in a specific computing environment
    • G06F2212/154 Networked environment
    • G06F2212/21 Employing a record carrier using a specific recording technology
    • G06F2212/217 Hybrid disk, e.g. using both magnetic and solid state storage devices
    • G06F2212/31 Providing disk cache in a specific location of a storage system
    • G06F2212/314 In storage network, e.g. network attached cache
    • G06F2212/50 Control mechanisms for virtual memory, cache or TLB
    • G06F2212/502 Control mechanisms for virtual memory, cache or TLB using adaptive policy

Definitions

  • This application relates to the field of storage technology, in particular to data writing technology.
  • When the client generates a write request, the storage server determines, according to the write strategy, whether to load the data to be written into the cache and when to flush the dirty data in the cache to the back-end hard disk drive. The write strategy therefore has a great impact on system performance, data consistency, and write traffic to the cache media.
  • In related technologies, there are three basic write strategies: write-through, write-back, and write-around.
  • The write-through strategy writes the data to be written into the cache and the back-end hard disk drive at the same time.
  • The write-back strategy writes the data to be written into the cache first and then asynchronously flushes it to the back-end hard disk drive.
  • The write-around strategy writes the data to be written directly to the back-end hard disk drive, and loads the missed data into the cache only when there is a read operation.
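  • The three basic write strategies can be illustrated with a minimal sketch. The class `TieredStore`, its dict-backed tiers, and the choice to drop a stale cached copy on write-around are assumptions made for illustration; they are not taken from the application.

```python
class TieredStore:
    """Toy two-tier store: a dict-backed cache in front of a dict-backed hard drive.

    Hypothetical illustration only; real caches track sizes, eviction, and errors.
    """

    def __init__(self):
        self.cache = {}     # fast tier (for example an SSD cache)
        self.hdd = {}       # back-end hard disk drive
        self.dirty = set()  # blocks in the cache not yet persisted to the hdd

    def write_through(self, block, data):
        # Cache and back end are updated together.
        self.cache[block] = data
        self.hdd[block] = data

    def write_back(self, block, data):
        # Only the cache is updated now; the block is marked dirty.
        self.cache[block] = data
        self.dirty.add(block)

    def write_around(self, block, data):
        # The cache is bypassed. Dropping a stale cached copy is an
        # assumption of this sketch, made so later reads see the new data.
        self.cache.pop(block, None)
        self.dirty.discard(block)
        self.hdd[block] = data

    def flush(self):
        # Asynchronous in a real system; synchronous here for simplicity.
        for block in sorted(self.dirty):
            self.hdd[block] = self.cache[block]
        self.dirty.clear()

    def read(self, block):
        # A read miss loads the block into the cache.
        if block not in self.cache:
            self.cache[block] = self.hdd[block]
        return self.cache[block]
```

  • In this sketch only `write_around` leaves the cache untouched by new data, which is the property the application relies on for write-only data.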
  • This application provides a data writing method and device, a storage server and a computer-readable storage medium, which can effectively improve the efficiency of the writing strategy.
  • A first aspect of the present application provides a data writing method, which is executed by a server and includes:
  • A second aspect of the present application provides a data writing device, including:
  • a first writing module, configured to write the write data corresponding to a write request into the write buffer when the write request is received;
  • an acquiring module, configured to acquire historical access data of the data block corresponding to the data to be flushed in the write buffer when a data flushing operation is triggered for the write buffer;
  • a judging module, configured to judge, based on the historical access data, whether the data to be flushed is write-only data;
  • a second writing module, configured to write the data to be flushed into the hard disk drive when the data to be flushed is write-only data;
  • a third writing module, configured to write the data to be flushed into the cache when the data to be flushed is not write-only data.
  • A third aspect of the present application provides a storage server, including:
  • a processor, used to execute a program stored in a memory;
  • the memory, used to store the program, where the program is at least used to execute the data writing method provided in the first aspect.
  • A fourth aspect of the present application provides a computer-readable storage medium having a computer program stored thereon; when the computer program is executed by a processor, the steps of the above-mentioned data writing method are implemented.
  • A fifth aspect of the present application provides a computer program product, including instructions which, when run on a computer, cause the computer to execute the data writing method described in the first aspect.
  • A data writing method includes: when a write request is received, writing the write data corresponding to the write request into the write buffer; when a data flushing operation is triggered for the write buffer, obtaining the historical access data of the data block corresponding to the data to be flushed; judging, based on the historical access data, whether the data to be flushed is write-only data; if so, writing the data to be flushed into the hard disk drive; if not, writing the data to be flushed into the cache.
  • The data generated by the client is divided into different types, namely write-only data and ordinary data. Because write-only data has only write operations and no read operations within a certain time window, loading it into the cache does not increase the read hit rate of the cache and also causes a large amount of unnecessary write traffic to the cache. The data writing method provided in this application therefore judges the type of the data to be flushed in the write buffer and adopts different writing methods for write-only data and ordinary data. Specifically, a write-through strategy is adopted for ordinary data.
  • FIG. 1 is an architecture diagram of a data writing system provided by an embodiment of the application
  • FIG. 3 is a flowchart of a second data writing method provided by an embodiment of the application.
  • FIG. 4 is a flowchart of a third data writing method provided by an embodiment of the application.
  • FIG. 5 is a flowchart of a classifier training method provided by an embodiment of the application.
  • FIG. 6 is a flowchart of a fourth data writing method provided by an embodiment of the application.
  • FIG. 7 is a system architecture diagram of an application embodiment provided by this application.
  • FIG. 8 is a flowchart of an application embodiment provided by this application.
  • FIG. 9 is a structural diagram of a data writing device provided by an embodiment of the application.
  • FIG. 10 is a structural diagram of a storage server provided by an embodiment of this application.
  • The data generated by the client is divided into write-only data and ordinary data. Since write-only data has only write operations within a certain time window, a write-around strategy can be used to write it to the back-end hard disk drive. Because ordinary data may have read operations within a certain time window, a write-through or write-back strategy can be used to write it to the cache. Writing write-only data directly to the back-end hard disk drive effectively reduces the useless write traffic to the cache, leaves more cache space for ordinary data, increases the utilization of the cache space, and improves the read hit rate of the cache.
  • FIG. 1 shows an architecture diagram of a data writing system provided by an embodiment of the present application.
  • the data writing system includes a client 100, a storage server 200, and a storage area 300.
  • Write data is transmitted between the client 100 and the storage server 200, and between the storage server 200 and the storage area 300, through the network 400.
  • the storage area 300 includes a write buffer 31, a cache (SSD) 32, and a back-end hard disk drive (HDD) 33.
  • The write buffer 31 may be composed of DRAM (Dynamic Random Access Memory), which is not specifically limited here.
  • The storage server 200 determines the data type of the data to be flushed. If it is write-only data, it is written directly to the back-end hard disk drive 33; otherwise, a write-through strategy can be used to write it into the cache 32.
  • the cache 32 is used to temporarily store data. When the client generates a read request, if the data corresponding to the read request is in the cache 32, it can be returned directly without accessing the back-end hard drive 33, which improves the reading efficiency.
  • the cache 32 in this embodiment may be an SSD cache, etc., which is not specifically limited here.
  • the embodiment of the present application discloses a data writing method, which improves the efficiency of the writing strategy.
  • Referring to FIG. 2, a flowchart of a data writing method provided by an embodiment of the present application, the method includes:
  • S101 The client generates a write request, and sends the write request to the storage server;
  • In this step, the client generates a write request that includes the write data; a write address may also be specified in the write request, which is not specifically limited here.
  • The client sends the write request to the storage server, so that the storage server processes the write data corresponding to the write request according to a preset write strategy.
  • S102 The storage server writes the write data corresponding to the write request into the write buffer
  • the write data corresponding to all write requests generated by the client will be written into the write buffer first, and then sequentially recorded in the log file, which is used for data recovery.
  • S103 When a data flushing operation is triggered for the write buffer, the storage server obtains the historical access data of the data block corresponding to the data to be flushed in the write buffer, and judges, based on the historical access data, whether the data to be flushed is write-only data; if yes, go to S104; if not, go to S105;
  • write-only data refers to data in which only write operations but no read operations exist within a certain time window, and data other than write-only data is called ordinary data.
  • the storage server determines the data to be flushed in the write buffer, and obtains the historical access data of the data block corresponding to the data to be flushed.
  • When the client generates a write request, it can specify the write address, and the data block here is the data block corresponding to the write address.
  • The historical access data of a data block characterizes the historical access situation of that data block. From the historical access data, the probability that the data block stores write-only data can be determined, and on that basis it can be judged whether the data to be flushed, which is to be stored in the data block, is write-only data.
  • This embodiment does not limit the specific method of determining the data type.
  • The data type of the data to be flushed can be determined by a statistical method or by a classifier, both of which are within the protection scope of this embodiment.
  • This step does not limit the trigger condition for the data flushing operation.
  • For example, the data flushing operation can be triggered when the amount of data in the write buffer reaches a threshold. That is, the step of obtaining historical access data of the data block corresponding to the data to be flushed when a data flushing operation is triggered for the write buffer includes: when the amount of data in the write buffer reaches the threshold, triggering the data flushing operation for the write buffer and obtaining the historical access data of the data block corresponding to the data to be flushed.
  • S104 The storage server writes the data to be flushed into the hard disk drive;
  • The cache is used to temporarily store data.
  • When the client generates a read request, if the data corresponding to the read request is stored in the cache, it can be returned directly to the client without accessing the back-end hard disk drive, so reading is more efficient.
  • Write-only data has only write operations and no read operations within a certain time window. Loading write-only data into the cache therefore does not increase the read hit rate of the cache and also causes a large amount of unnecessary write traffic to the cache.
  • A write-around strategy is therefore adopted for write-only data: when the data to be flushed is write-only data, it is written directly into the back-end hard disk drive.
  • This step includes: flushing the dirty data in the cache that corresponds to the data to be flushed to the hard disk drive, and writing the data to be flushed into the hard disk drive.
  • S105 The storage server writes the data to be flushed into the cache.
  • A write-through strategy or a write-back strategy can be used for ordinary data, which is not limited here.
  • With the write-through strategy, this step includes: simultaneously writing the data to be flushed into the cache and the hard disk drive.
  • With the write-back strategy, this step includes: writing the data to be flushed into the cache and, when a preset condition is met, flushing the data in the cache to the hard disk drive.
  • The preset condition can be that the read hit rate is less than a preset value: when the read hit rate of the cache is less than the preset value, the data in the data blocks with lower read hit rates in the cache is flushed to the hard disk drive.
  • The preset condition can also be that the cache is full: when the cache is full, part of the data in the cache is flushed to the hard disk drive. Flushing the data in the cache to the hard disk drive can also be triggered by other preset conditions, which is not specifically limited here.
  • The data generated by the client is divided into different types, namely write-only data and ordinary data.
  • Write-only data has only write operations and no read operations within a certain time window, so the client's read requests will not hit it within that window. Writing write-only data to the cache therefore does not increase the read hit rate of the cache and also causes a large amount of unnecessary write traffic to the cache.
  • The data writing method provided by the embodiments of the present application therefore adopts different writing methods for write-only data and ordinary data.
  • Ordinary data is written into the cache using a write-through or write-back strategy, while write-only data is written directly to the back-end hard disk drive using a write-around strategy.
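  • The flow of steps S102 to S105 can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the class `StorageServer`, the injected predicate `is_write_only`, the count-based `flush_threshold`, and the in-memory dicts standing in for the write buffer, cache, and hard disk drive are all hypothetical. Write-through is used for ordinary data here; write-back would be equally consistent with the text.

```python
class StorageServer:
    """Hypothetical sketch of the write path: buffer, classify on flush, route."""

    def __init__(self, is_write_only, flush_threshold=4):
        self.is_write_only = is_write_only      # assumed predicate: block id -> bool
        self.flush_threshold = flush_threshold  # assumed trigger: number of buffered blocks
        self.write_buffer = {}  # block id -> data (DRAM in the described system)
        self.log = []           # sequential log used for recovery
        self.cache = {}         # SSD cache
        self.hdd = {}           # back-end hard disk drive

    def handle_write(self, block, data):
        # S102: every write lands in the write buffer first and is logged.
        self.write_buffer[block] = data
        self.log.append((block, data))
        if len(self.write_buffer) >= self.flush_threshold:
            self.flush_write_buffer()

    def flush_write_buffer(self):
        # S103: decide the data type of each block being flushed.
        for block, data in self.write_buffer.items():
            if self.is_write_only(block):
                # S104 (write-around): persist any cached copy of the block
                # first, then write the new data straight to the hard drive.
                # Removing the cached copy is an assumption of this sketch.
                stale = self.cache.pop(block, None)
                if stale is not None:
                    self.hdd[block] = stale
                self.hdd[block] = data
            else:
                # S105 (write-through): cache and hard drive together.
                self.cache[block] = data
                self.hdd[block] = data
        self.write_buffer.clear()
```

  • Passing the decision in as a predicate keeps the routing independent of whether a statistical method or a classifier makes the write-only judgment.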
  • In this embodiment, the type of the data to be flushed is determined by a statistical method, and the storage server is used as the execution subject for the description. Specifically:
  • Referring to FIG. 3, a flowchart of a second data writing method provided by an embodiment of the present application, the method includes:
  • S203 Based on a statistical algorithm, judge whether the data to be flushed is write-only data according to the historical access data; if so, go to S204; if not, go to S205;
  • In this step, a statistical method is used to determine the type of the data to be flushed from historical access data, where the historical access data is that of the data block corresponding to the data to be flushed.
  • When the client generates a write request, it can specify the write address, and the data block here is the data block corresponding to the write address.
  • The historical access data of a data block describes its historical access and can include the numbers of write requests and read requests within a certain time window. From the historical access data of the data block, the probability that the data block stores write-only data can be determined, and from that it can be determined whether the data to be flushed, which is to be stored in the data block, is write-only data.
  • This step includes: determining, based on the historical access data, the number of write requests for the data block corresponding to the data to be flushed in multiple time windows; calculating, from the number of write requests, the write-only probability of that data block, where the write-only probability describes the probability that the data block stores write-only data; and judging whether the write-only probability is greater than the third preset value. If so, the data to be flushed is determined to be write-only data; if not, it is determined to be ordinary data.
  • In a specific implementation, the number of write requests and the total number of requests for the data block corresponding to the data to be flushed in multiple time windows are determined from the historical access data.
  • The time window is not limited here. For example, one time window can be one day and the multiple time windows can span one month, that is, the number of write requests and the total number of requests for the data block are determined for every day within a month.
  • The total number of requests includes the number of write requests and the number of read requests.
  • The write request ratio of the data block, that is, the ratio of the number of write requests to the total number of requests, is calculated for each time window, and the average of the write request ratios over all time windows is taken as the write-only probability of the data block.
  • The write-only probability characterizes the probability that the data block stores write-only data. When the write-only probability is greater than the third preset value, the data to be stored in the data block, that is, the data to be flushed, is determined to be write-only data; otherwise, the data to be flushed is determined to be ordinary data.
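  • The statistical judgment described above can be sketched as follows. The function names, the `(writes, reads)` tuple layout, the skipping of empty windows, and the example threshold of 0.9 for the third preset value are assumptions made for illustration.

```python
def write_only_probability(windows):
    """Average write-request ratio over time windows.

    `windows` is a list of (write_count, read_count) pairs, one per time
    window (for example one pair per day over a month).
    """
    ratios = []
    for writes, reads in windows:
        total = writes + reads
        if total == 0:
            continue  # assumption: windows with no requests are ignored
        ratios.append(writes / total)
    # Assumption: a block with no recorded requests gets probability 0.0.
    return sum(ratios) / len(ratios) if ratios else 0.0


def is_write_only(windows, threshold=0.9):
    """Compare the write-only probability with the (hypothetical) third preset value."""
    return write_only_probability(windows) > threshold
```

  • For example, three daily windows with write ratios of 1.0, 0.8, and 1.0 average to about 0.93, which exceeds the assumed threshold of 0.9, so the block would be treated as holding write-only data.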
  • S205 Write the data to be flushed into the cache.
  • This embodiment uses a statistical method to determine the type of the data to be flushed, and uses different writing methods for write-only data and ordinary data.
  • In this embodiment, a classifier is used to determine the type of the data to be flushed, and the storage server is again used as the execution subject for the description. Specifically:
  • Referring to FIG. 4, a flowchart of a third data writing method provided by an embodiment of the present application, the method includes:
  • In this embodiment, a classifier is used to determine the type of the data to be flushed.
  • The feature information of the data block corresponding to the data to be flushed is extracted as the first feature information.
  • The feature information can include either or both of a time feature and a request type feature.
  • The time feature describes when the data block is accessed, and the request type feature describes the requests that access the data block.
  • The time feature may include either or both of the last access timestamp and the average reuse time difference.
  • The last access timestamp describes the time point of the last access to the data block, and the average reuse time difference describes the average time interval between accesses to the data block.
  • The time feature thus indicates the recency and the access interval of the data block: the last access timestamp describes how recent the access is, and the average reuse time difference describes the access interval.
  • The interval between the current time and the time point of the last access to the data block can also be used to represent the recency of access, which is not specifically limited here.
  • The time feature can be standardized; for example, its unit is set to hours and its value is capped at 100.
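  • The time features can be computed along the following lines. This is a minimal sketch: the function name, the use of timestamps in seconds, the choice of the "current time minus last access" variant for recency, and the fallback to the cap when a block has fewer than two accesses are assumptions.

```python
def time_features(access_times_s, now_s, cap_hours=100.0):
    """Return (recency_hours, avg_reuse_hours), each capped at `cap_hours`.

    `access_times_s` holds access timestamps in seconds; `now_s` is the
    current time in seconds.
    """
    times = sorted(access_times_s)
    if not times:
        # Assumption: a never-accessed block gets the maximum value for both.
        return (cap_hours, cap_hours)

    # Recency: interval between the current time and the last access.
    recency = (now_s - times[-1]) / 3600.0

    if len(times) > 1:
        # Average reuse time difference: mean gap between consecutive accesses.
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        avg_reuse = sum(gaps) / len(gaps) / 3600.0
    else:
        # Assumption: a block accessed only once has no reuse, so use the cap.
        avg_reuse = cap_hours

    return (min(recency, cap_hours), min(avg_reuse, cap_hours))
```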
  • By default, this embodiment includes a step of training the classifier.
  • Machine learning algorithms can be used for the training. Feature information commonly used in machine learning algorithms can also include spatial features, that is, address information of the data to be written, such as volume ID and offset.
  • However, the address changes over time, which affects the prediction of the classifier. Therefore, this embodiment does not use spatial features and instead uses request type features to enrich the training features.
  • The request type feature may include any one or a combination of the average request size, the large request ratio, the small request ratio, and the write request ratio.
  • The large request ratio describes the proportion of requests whose size is greater than a first preset value among the requests accessing the data block, the small request ratio describes the proportion of requests whose size is smaller than a second preset value among the requests accessing the data block, and the write request ratio describes the proportion of write requests among the requests accessing the data block.
  • The first preset value is greater than or equal to the second preset value.
  • The average request size is the average size of the requests accessing the data block; its unit can be kilobytes, and its upper limit can be 100 KB.
  • The large request ratio can be the proportion of requests larger than 64 KB among all access requests, and the small request ratio can be the proportion of requests smaller than 8 KB among all access requests.
  • A data area can be used as the statistical granularity of the feature information, with each data area including multiple data blocks, to balance memory usage and prediction accuracy.
  • The access granularity of the storage system, that is, the minimum read and write granularity, is a data block, such as 8 KB, and the size of a data area is an integer multiple of the access granularity; for example, the data area may be 1 MB.
  • The feature information of each data area is determined from the historical access data of all data blocks in the data area, and the data blocks in the same data area share the feature information of that data area. That is, the step of acquiring the feature information of the data block corresponding to the data to be flushed as the first feature information includes: determining the data area to which that data block belongs, and extracting the feature information of the data area as the first feature information.
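  • The request type features and their aggregation per data area can be sketched as follows, using the example values from the text (1 MB data areas, 64 KB and 8 KB size thresholds, 100 KB cap). The function names, the `(offset, size, is_write)` tuple layout, the single-volume address space, and the attribution of a request to the area containing its starting offset are assumptions of this sketch.

```python
from collections import defaultdict

AREA_SIZE = 1024 * 1024      # example data area size: 1 MB
LARGE_THRESHOLD = 64 * 1024  # example first preset value: 64 KB
SMALL_THRESHOLD = 8 * 1024   # example second preset value: 8 KB
AVG_SIZE_CAP_KB = 100.0      # example upper limit for the average request size


def area_of(offset_bytes):
    """Map a byte offset to the data area that contains it."""
    return offset_bytes // AREA_SIZE


def request_features(requests):
    """Aggregate request type features per data area.

    `requests` is an iterable of (offset_bytes, size_bytes, is_write).
    Returns {area_id: feature dict}; all blocks in an area share its features.
    """
    per_area = defaultdict(list)
    for offset, size, is_write in requests:
        # Assumption: a request counts toward the area of its starting offset.
        per_area[area_of(offset)].append((size, is_write))

    features = {}
    for area, reqs in per_area.items():
        n = len(reqs)
        avg_kb = sum(size for size, _ in reqs) / n / 1024
        features[area] = {
            "avg_size_kb": min(avg_kb, AVG_SIZE_CAP_KB),
            "large_ratio": sum(size > LARGE_THRESHOLD for size, _ in reqs) / n,
            "small_ratio": sum(size < SMALL_THRESHOLD for size, _ in reqs) / n,
            "write_ratio": sum(1 for _, is_write in reqs if is_write) / n,
        }
    return features
```

  • Looking up the features for a block being flushed then reduces to `request_features(history)[area_of(offset)]`, which keeps per-block memory overhead low at the cost of coarser statistics.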
  • S303 Using a pre-trained classifier, determine whether the data to be flushed is write-only data according to the first feature information; if yes, go to S304; if not, go to S305;
  • S305 Write the data to be flushed into the cache.
  • In this embodiment, a classifier is used to determine the type of the data to be flushed; the relatively high prediction accuracy of the classifier improves the accuracy of determining the data type.
  • For the data to be flushed, its class is first determined by the classifier, and different operations are then performed according to the classification result: a write-back or write-through strategy is used for ordinary data, and a write-around strategy is used for write-only data.
  • As a result, the write traffic to the cache is greatly reduced, more cache space is reserved for ordinary data, and the utilization of the cache space is improved, thereby improving the read performance of the storage system.
  • Machine learning (ML) is an interdisciplinary subject that draws on probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance.
  • Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration. Specifically:
  • FIG. 5 is a flowchart of a method for training a classifier provided by an embodiment of the present application. As shown in FIG. 5, the method includes:
  • This embodiment uses a machine learning algorithm to train the classifier. The specific machine learning algorithm is not limited here; supervised machine learning algorithms such as Naive Bayes, logistic regression, decision trees, AdaBoost, and random forests can be used. The classifiers were run on an x86-64 computing server with two Intel Xeon E5-2670-v3 CPUs and 128GB of DRAM. The classification results of the different machine learning algorithms are shown in Table 1, where the training time is the time spent training on one day of training samples.
  • training samples are collected within a certain time window to train the classifier; for example, training samples are collected every day, and the trained classifier is used the next day to distinguish the data type of the data to be flushed. Every supervised machine learning algorithm takes a few hundred milliseconds to train on one day of samples, and the classifier model is only a few megabytes in size, so the impact of training the classifier on the storage system is negligible.
  • the training sample may be replacement data: when a data replacement operation is triggered for the cache, that is, when first data in the cache is replaced with second data from the hard disk drive, the first data is the replacement data in this step. The replacement data is labeled based on its historical read data. For example, if the replacement data has one or more read hits, it is labeled 0 (ordinary data); otherwise it is labeled 1 (write-only data).
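The labeling rule for replacement data can be sketched as follows. This is only an illustration: the `EvictedEntry` record and its field names are assumptions introduced here, while the 0/1 label convention is the one stated in the description.

```python
from dataclasses import dataclass

ORDINARY = 0    # label for ordinary data (at least one read hit while cached)
WRITE_ONLY = 1  # label for write-only data (never read while cached)


@dataclass
class EvictedEntry:
    """Hypothetical record kept for a block when it is replaced in the cache."""
    block_id: int
    read_hits: int


def label_evicted(entry: EvictedEntry) -> int:
    """Label replacement data from its read history, as described above."""
    return ORDINARY if entry.read_hits >= 1 else WRITE_ONLY
```

Labeling at replacement time means the label reflects the whole cache residency of the data, so no separate ground-truth collection is needed.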
  • S402: Obtain the feature information of the data block corresponding to the replacement data as second feature information, and train the classifier using the labeled replacement data and the second feature information.
  • the feature information of the training sample is extracted as the second feature information.
  • the second feature information here may also include time features and request-type features, which will not be repeated here.
  • the marked training sample and the feature information of the training sample are input into the classifier, which is used to train the classifier to obtain the trained classifier.
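As an illustration of training a classifier on labeled samples and their feature information, the following is a minimal pure-Python Gaussian Naive Bayes sketch. Naive Bayes is one of the algorithms the description lists; the two features, the toy sample values, and the function names are assumptions made for this example and do not come from the application.

```python
import math
from collections import defaultdict


def train_gaussian_nb(samples, labels):
    """Fit a class prior and per-feature mean/variance for each label."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    total = len(samples)
    model = {}
    for y, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor on the variance avoids division by zero for constant features.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-6)
            for col, m in zip(columns, means)
        ]
        model[y] = (n / total, means, variances)
    return model


def predict(model, x):
    """Return the label with the highest log-posterior for feature vector x."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score

    return max(model, key=lambda y: log_posterior(*model[y]))


# Toy samples (invented): [hours since last access, write request ratio]; 1 = write-only.
X = [[50.0, 1.0], [60.0, 0.95], [2.0, 0.3], [1.0, 0.4]]
y = [1, 1, 0, 0]
model = train_gaussian_nb(X, y)
```

Calling `predict(model, features)` at flush time then yields 1 for data predicted to be write-only and 0 for ordinary data.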
  • the trained classifier is used to determine the type of data, that is, write-only data or ordinary data.
  • the data area can also be used as the statistical granularity of the feature information of the training sample. That is, the step of obtaining the feature information of the data block corresponding to the replacement data as the second feature information includes: determining the data area to which the data block corresponding to the replacement data belongs; and extracting the feature information of the data area as the second feature information.
  • the machine learning algorithm is used to train the classifier.
  • when the feature information of the training samples is extracted, request-type features are used to enrich the training features of the classifier, so that the classifier classifies better and the accuracy of determining the data type is improved.
  • the embodiment of the application discloses a data writing method. Compared with the previous embodiments, this embodiment further explains and optimizes the technical solution, and is likewise described with the storage server as the execution subject. Specifically:
  • FIG. 6 is a flowchart of a fourth data writing method provided by an embodiment of the present application. As shown in FIG. 6, the method includes:
  • S503: Obtain historical access data of the data block corresponding to the data to be flushed, and extract feature information of that data block from the historical access data as the first feature information;
  • S504: Using the pre-trained classifier, determine whether the data to be flushed is write-only data according to the first feature information; if yes, go to S505; if not, go to S507;
  • this embodiment provides a write strategy based on machine learning, which can accurately identify different types of data, that is, write-only data and ordinary data, and then process the different types of data differently.
  • the write-back strategy is adopted for ordinary data, and the write-around strategy is adopted for write-only data, which minimizes the write traffic to the cache while reserving more cache space for ordinary data and improving the utilization of the cache space, thereby improving the read performance of the storage system.
  • the data writing system includes a client, a classifier, and a storage area of the cloud block storage product Tencent CBS.
  • a hypervisor (also called a virtual machine monitor, VMM) is set up in the client to monitor the client's write requests.
  • the classifier collects training samples for training machine learning models.
  • the storage area includes a write buffer and a back-end storage.
  • the back-end storage includes an SSD cache and an HDD.
  • the write strategy determines whether to load the data to be written into the SSD cache and when to flush dirty data in the SSD cache to the HDD.
  • the processing flow of the write strategy is as follows:
  • Step 1: Write the write data corresponding to all write requests into the write buffer and record it sequentially in the log file; notify the storage server when the amount of dirty data in the write buffer reaches the threshold.
  • Step 2: When data needs to be flushed from the write buffer, the classifier obtains the features of the area containing the data block currently being flushed and determines its type. If the data stored in that data block is write-only data, go to Step 3; if it is ordinary data, go to Step 4.
  • Step 3: Check whether the SSD cache holds dirty data for the data block currently being flushed. If so, first flush that dirty data from the SSD cache; the data being flushed from the write buffer is then written directly to the HDD, and the process ends.
  • Step 4: Write the data being flushed into the SSD cache, and then flush it asynchronously to the HDD using the write-back strategy.
  • the write-back strategy is used here instead of the write-through strategy because the SSD is non-volatile, so the write-back strategy is sufficient to ensure data consistency.
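Steps 3 and 4 above can be sketched with in-memory dictionaries standing in for the SSD cache and the HDD. This is a minimal illustration under stated assumptions, not the implementation of the application: the `Storage` class and function names are hypothetical, and dropping the stale cached copy after a write-around is an assumption added here so that later reads do not return old data.

```python
class Storage:
    """Toy stand-in for the back-end storage: an SSD cache in front of an HDD."""

    def __init__(self):
        self.ssd = {}  # block_id -> (data, dirty flag)
        self.hdd = {}  # block_id -> data


def flush_block(store, block_id, data, is_write_only):
    """Route one block leaving the write buffer (Steps 3 and 4)."""
    if is_write_only:
        # Step 3: flush any older dirty copy first, then bypass the SSD cache.
        cached = store.ssd.pop(block_id, None)  # assumption: drop the stale copy
        if cached is not None and cached[1]:
            store.hdd[block_id] = cached[0]
        store.hdd[block_id] = data
        return "hdd"
    # Step 4: write-back, i.e. cache the data now and persist it later.
    store.ssd[block_id] = (data, True)
    return "ssd"


def write_back(store):
    """Stand-in for the asynchronous flush: persist every dirty SSD entry."""
    for block_id, (data, dirty) in list(store.ssd.items()):
        if dirty:
            store.hdd[block_id] = data
            store.ssd[block_id] = (data, False)
```

In a real system `write_back` would run in the background; here it is called explicitly to keep the example deterministic.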
  • the classifier trains the model once a day to make predictions for the next day. Training samples are collected while the system is running: when data needs to be evicted from the SSD cache, the evicted data is flushed to the HDD and a training sample is added.
  • the write traffic of the other strategies is normalized to that of the write-around strategy.
  • the write-around strategy achieves the lowest write traffic to the SSD (1), because write data is first written directly to the back-end storage. However, the write-around strategy has the lowest hit rate, 92.13%, and the highest read latency, 1.29 ms. Although the write-around strategy achieves the lowest SSD write traffic, it degrades cache performance in terms of hit rate and read latency.
  • the write-back strategy results in the highest SSD write traffic (7.49).
  • compared with the write-back strategy, the write strategy provided in this embodiment increases the hit rate from 94.68% to 97.22%, an increase of 2.61%, and reduces the read latency and write traffic by 37.52% (from 934.16 μs to 583.64 μs) and 41.52% (from 7.49 to 4.38), respectively. It can be seen that the write strategy provided in this embodiment achieves the best performance.
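The reported reductions in read latency and write traffic can be reproduced from the table values as relative reductions. The sketch below only restates that arithmetic; the helper name is hypothetical.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0


# Values taken from the comparison of the write-back strategy with this embodiment.
read_latency_cut = relative_reduction(934.16, 583.64)  # microseconds
write_traffic_cut = relative_reduction(7.49, 4.38)     # normalized write traffic

print(f"read latency reduced by {read_latency_cut:.2f}%")
print(f"write traffic reduced by {write_traffic_cut:.2f}%")
```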
  • the following describes a data writing device provided by an embodiment of the present application.
  • the data writing device described below and the data writing method described above can be cross-referenced.
  • FIG. 9 is a structural diagram of a data writing device provided by an embodiment of the present application. As shown in FIG. 9, the device includes:
  • the first writing module 901 is configured to write the write data corresponding to the write request into the write buffer when a write request is received;
  • the acquiring module 902 is configured to acquire the historical access data of the data block corresponding to the data to be flushed in the write buffer when a data flushing operation is triggered for the write buffer;
  • the judgment module 903 is configured to judge whether the data to be flushed is write-only data based on the historical access data;
  • the second writing module 904 is configured to write the data to be flushed into the hard disk drive when the data to be flushed is write-only data;
  • the third writing module 905 is configured to write the data to be flushed into the cache when the data to be flushed is not write-only data.
  • the data generated by the client is divided into different types, that is, write-only data and ordinary data. Because write-only data has only write operations and no read operations within a certain time window, loading write-only data into the cache not only fails to increase the read hit rate of the cache, but also causes a large amount of unnecessary write traffic to the cache. Therefore, the data writing device provided in the embodiments of the present application adopts different writing methods for write-only data and ordinary data: ordinary data is written into the cache using a write-through or write-back strategy, while write-only data is written directly to the back-end hard disk drive using the write-around strategy.
  • the data writing device provided by the embodiment of the present application achieves the minimum write traffic to the cache and improves the efficiency of the write strategy.
  • the second writing module 904 includes:
  • the first flushing unit is configured to flush the dirty data corresponding to the data to be flushed from the cache to the hard disk drive;
  • the first writing unit is configured to write the data to be flushed into the hard disk drive.
  • the third writing module 905 includes:
  • the second writing unit is configured to write the data to be flushed into the cache;
  • the second flushing unit is configured to flush the data in the cache to the hard disk drive when a preset condition is met.
  • the third writing module 905 is specifically configured to write the data to be flushed into the cache and the hard disk drive at the same time.
  • the acquiring module 902 is specifically configured to: when the amount of data in the write buffer reaches a threshold, trigger a data flushing operation for the write buffer, and acquire the historical access data of the data block corresponding to the data to be flushed in the write buffer.
  • the judgment module 903 is specifically configured to: extract the feature information of the data block corresponding to the data to be flushed from the historical access data as the first feature information; and use a pre-trained classifier to determine, according to the first feature information, whether the data to be flushed is write-only data.
  • the device further includes:
  • a labeling module, configured to label the replacement data based on the historical read data of the replacement data in the cache when a data replacement operation is triggered for the cache;
  • a training module, configured to obtain the feature information of the data block corresponding to the replacement data as the second feature information, and to train the classifier using the labeled replacement data and the second feature information.
  • the training module includes:
  • a determining unit, configured to determine the data area to which the data block corresponding to the replacement data belongs, wherein the data area includes a plurality of data blocks and the size of the data area is an integer multiple of the access granularity;
  • an extraction unit, configured to extract the feature information of the data area as the second feature information;
  • a training unit, configured to train the classifier using the labeled replacement data and the second feature information.
  • the feature information includes at least one of a time feature and a request-type feature; the time feature describes the time at which the data block is accessed, and the request-type feature describes the requests that access the data block.
  • the time feature includes at least one of the last access timestamp and the average reuse time difference; the last access timestamp describes the time point of the last access to the data block, and the average reuse time difference describes the average time interval between accesses to the data block;
  • the request-type feature includes any one or a combination of the average request size, the large request ratio, the small request ratio, and the write request ratio; the large request ratio is the ratio of requests whose size is greater than a first preset value among the requests that access the data block, the small request ratio is the ratio of requests whose size is smaller than a second preset value among the requests that access the data block, and the write request ratio is the ratio of write requests among the requests that access the data block; the first preset value is greater than or equal to the second preset value.
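The request-type features can be computed from a list of requests as sketched below. The thresholds (64KB and 8KB) and the 100KB cap on the average request size are the example values given in the description; the `Request` record, the dictionary keys, and the handling of an empty request list are assumptions made for this illustration.

```python
from dataclasses import dataclass

LARGE_THRESHOLD = 64 * 1024  # first preset value (example value: 64KB)
SMALL_THRESHOLD = 8 * 1024   # second preset value (example value: 8KB)
AVG_SIZE_CAP_KB = 100.0      # upper limit on the average request size (100KB)


@dataclass
class Request:
    """Hypothetical record of one request that accessed a data block or area."""
    size: int       # request size in bytes
    is_write: bool  # True for a write request, False for a read request


def request_features(requests):
    """Return the four request-type features for a list of requests."""
    n = len(requests)
    if n == 0:
        # Assumption: with no history, every feature defaults to zero.
        return {"avg_size_kb": 0.0, "large_ratio": 0.0,
                "small_ratio": 0.0, "write_ratio": 0.0}
    avg_kb = sum(r.size for r in requests) / n / 1024
    return {
        "avg_size_kb": min(avg_kb, AVG_SIZE_CAP_KB),
        "large_ratio": sum(r.size > LARGE_THRESHOLD for r in requests) / n,
        "small_ratio": sum(r.size < SMALL_THRESHOLD for r in requests) / n,
        "write_ratio": sum(r.is_write for r in requests) / n,
    }
```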
  • the judgment module 903 is specifically configured to judge, based on a statistical algorithm, whether the data to be flushed is write-only data according to the historical access data.
  • the judgment module 903 includes:
  • the calculation unit is configured to determine, based on the historical access data, the number of write requests for the data block corresponding to the data to be flushed in multiple time windows, and to calculate the write-only probability of that data block according to the number of write requests, wherein the write-only probability describes the probability that the data block stores write-only data.
  • the present application also provides a storage server.
  • FIG. 10 is a structural diagram of a storage server 200 provided in an embodiment of the present application. As shown in FIG. 10, the storage server may include a processor 21 and a memory 22.
  • the processor 21 may include multiple processing cores, such as a 4-core processor, an 8-core processor, and so on.
  • the processor 21 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • the processor 21 may also include a main processor and a coprocessor.
  • the main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor used to process data in the standby state.
  • the processor 21 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen.
  • the processor 21 may also include an AI (Artificial Intelligence) processor, which is used to process computing operations related to machine learning.
  • the memory 22 may include multiple computer-readable storage media, which may be non-transitory.
  • the memory 22 may also include high-speed random access memory and non-volatile memory, such as multiple disk storage devices and flash memory storage devices.
  • the memory 22 is used to store at least the following computer program 221, which, after being loaded and executed by the processor 21, can implement the relevant steps of the data writing method executed on the server side disclosed in any of the foregoing embodiments.
  • the resources stored in the memory 22 may also include an operating system 222 and data 223, etc., and the storage mode may be short-term storage or permanent storage.
  • the operating system 222 may include Windows, Unix, Linux, and so on.
  • the storage server 200 may further include a display screen 23, an input/output interface 24, a communication interface 25, a sensor 26, a power supply 27, and a communication bus 28.
  • the structure of the storage server shown in FIG. 10 does not constitute a limitation on the storage server in the embodiment of the present application.
  • the storage server may include more or fewer components than those shown in FIG. 10.
  • a computer-readable storage medium including program instructions that, when executed by a processor, implement the steps of the data writing method executed by the storage server in any of the foregoing embodiments.
  • a computer program product including instructions, which when run on a computer, cause the computer to execute the data writing method in any of the above-mentioned embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

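A data writing method and device, a storage server, and a computer-readable storage medium. The method includes: when a write request is received, writing the write data corresponding to the write request into a write buffer (S201); when a data flush operation is triggered for the write buffer, acquiring the historical access data of the data block corresponding to the data to be flushed (S202); determining, using a statistical method based on the historical access data, whether the data to be flushed is write-only data (S203); if so, writing the data to be flushed into a hard disk drive (S204); if not, writing the data to be flushed into a cache (S205). This data writing method can effectively reduce useless write traffic to the cache while leaving more room in the cache for ordinary data, which improves cache space utilization and thus the cache read hit rate and the read performance of the storage system; it achieves the lowest write traffic to the cache and improves the efficiency of the write strategy.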

Description

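Data writing method and device, storage server, and computer-readable storage medium
This application claims priority to Chinese Patent Application No. 201911304631.1, entitled "Data writing method and device, storage server, and computer-readable storage medium", filed with the China Patent Office on December 17, 2019, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of storage technology, and in particular to data writing technology.
Background
When a client generates a write request, the storage server decides, according to the write strategy, whether to load the data to be written into the cache and when to flush dirty data in the cache to the back-end hard disk drive. The write strategy therefore has a great impact on system performance, data consistency, and the write traffic to the cache medium.
In the related art, there are three basic write strategies: write-through, write-back, and write-around. The write-through strategy writes the data to be written to the cache and the back-end hard disk drive at the same time; the write-back strategy first writes the data to the cache and then flushes it asynchronously to the back-end hard disk drive; the write-around strategy writes the data directly to the back-end hard disk drive and loads missed data into the cache only when a read operation occurs.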
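The three basic write strategies can be contrasted with a toy two-tier store. This is a minimal sketch for illustration only: the `TieredStore` class and its method names are hypothetical, dictionaries stand in for the cache and the hard disk drive, and invalidating a stale cached copy on write-around is an assumption added here.

```python
class TieredStore:
    """Toy cache-plus-disk store illustrating the three basic write strategies."""

    def __init__(self):
        self.cache = {}     # fast tier, e.g. an SSD cache
        self.disk = {}      # back-end hard disk drive
        self.dirty = set()  # cached keys not yet persisted to disk

    def write_through(self, key, value):
        """Write to the cache and the disk at the same time."""
        self.cache[key] = value
        self.disk[key] = value

    def write_back(self, key, value):
        """Write to the cache now; the disk is updated later by flush()."""
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        """Stand-in for the asynchronous flush of dirty data to the disk."""
        for key in list(self.dirty):
            self.disk[key] = self.cache[key]
        self.dirty.clear()

    def write_around(self, key, value):
        """Write directly to the disk, bypassing the cache."""
        self.cache.pop(key, None)  # assumption: invalidate any stale cached copy
        self.dirty.discard(key)
        self.disk[key] = value

    def read(self, key):
        """On a miss, load the data from the disk into the cache."""
        if key not in self.cache:
            self.cache[key] = self.disk[key]
        return self.cache[key]
```

Only `write_around` leaves the cache untouched on a write, which is why it produces the least cache write traffic but the most read misses.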
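Under constantly changing and diverse cloud workloads, a fixed write strategy is bound to be inefficient. How to improve the efficiency of the write strategy is therefore a technical problem to be solved by those skilled in the art.
Summary
This application provides a data writing method and device, a storage server, and a computer-readable storage medium, which can effectively improve the efficiency of the write strategy.
To achieve the above objective, a first aspect of this application provides a data writing method, executed by a server, including:
when a write request is received, writing the write data corresponding to the write request into a write buffer;
when a data flush operation is triggered for the write buffer, acquiring historical access data of the data block corresponding to the data to be flushed in the write buffer;
determining, based on the historical access data, whether the data to be flushed is write-only data;
if so, writing the data to be flushed into a hard disk drive;
if not, writing the data to be flushed into a cache.
To achieve the above objective, a second aspect of this application provides a data writing device, including:
a first writing module, configured to write the write data corresponding to a write request into a write buffer when the write request is received;
an acquiring module, configured to acquire, when a data flush operation is triggered for the write buffer, historical access data of the data block corresponding to the data to be flushed in the write buffer;
a judgment module, configured to determine, based on the historical access data, whether the data to be flushed is write-only data;
a second writing module, configured to write the data to be flushed into a hard disk drive when the data to be flushed is write-only data;
a third writing module, configured to write the data to be flushed into a cache when the data to be flushed is not write-only data.
To achieve the above objective, a third aspect of this application provides a storage server, including:
a processor and a memory;
wherein the processor is configured to execute the program stored in the memory;
the memory is configured to store a program, the program being used at least to execute the data writing method provided in the first aspect.
To achieve the above objective, a fourth aspect of this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above data writing method.
To achieve the above objective, a fifth aspect of this application provides a computer program product including instructions that, when run on a computer, cause the computer to execute the data writing method of the first aspect.
As can be seen from the above solutions, the data writing method provided in this application includes: when a write request is received, writing the write data corresponding to the write request into a write buffer; when a flush operation is triggered for the data in the write buffer, acquiring the historical access data of the data block corresponding to the data to be flushed; determining, based on the historical access data, whether the data to be flushed is write-only data; if so, writing the data to be flushed into a hard disk drive; if not, writing the data to be flushed into a cache.
In this application, the data generated by the client is divided into different types, namely write-only data and ordinary data. Write-only data has only write operations and no read operations within a certain time window, so loading it into the cache not only fails to raise the cache read hit rate but also causes a large amount of unnecessary write traffic to the cache. The data writing method provided in this application therefore determines the type of the data to be flushed from the write buffer so that write-only data and ordinary data can be written differently. Specifically, ordinary data is written into the cache using the write-through or write-back strategy, while write-only data is written directly to the back-end hard disk drive using the write-around strategy. This effectively reduces useless write traffic to the cache while leaving more room in the cache for ordinary data, which improves cache space utilization and the cache read hit rate and thus the read performance of the storage system. The data writing method provided in this application thereby achieves the lowest write traffic to the cache and improves the efficiency of the write strategy. This application also discloses a data writing device, a storage server, and a computer-readable storage medium, which achieve the same technical effects.
It should be understood that the above general description and the following detailed description are only exemplary and do not limit this application.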
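Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative effort. The drawings are provided for further understanding of this disclosure and constitute a part of the specification; together with the detailed description below, they serve to explain this disclosure but do not limit it. In the drawings:
FIG. 1 is an architecture diagram of a data writing system provided by an embodiment of this application;
FIG. 2 is a flowchart of a first data writing method provided by an embodiment of this application;
FIG. 3 is a flowchart of a second data writing method provided by an embodiment of this application;
FIG. 4 is a flowchart of a third data writing method provided by an embodiment of this application;
FIG. 5 is a flowchart of a classifier training method provided by an embodiment of this application;
FIG. 6 is a flowchart of a fourth data writing method provided by an embodiment of this application;
FIG. 7 is a system architecture diagram of an application embodiment provided by this application;
FIG. 8 is a flowchart of an application embodiment provided by this application;
FIG. 9 is a structural diagram of a data writing device provided by an embodiment of this application;
FIG. 10 is a structural diagram of a storage server provided by an embodiment of this application.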
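Detailed Description
In typical cloud block storage products such as Tencent CBS and Ceph (a distributed file system), when a client generates a write request, the corresponding write data is first written into a write buffer and recorded sequentially in a log file. When the amount of dirty data in the write buffer exceeds a certain threshold, the data in the write buffer is committed to a cache, such as an SSD (Solid State Drive) cache, and the written data in the cache is then flushed asynchronously, under the write-back strategy, to back-end hard disk drive (HDD) storage.
However, the inventors of this application found that, in one month of IO (Input/Output) logs collected from CBS, 47.09% of the data has only write operations; that is, 47.09% of the data is write-only data. Because user read requests never hit such write-only data, committing it to the cache gives users no perceptible acceleration, and for the cache, writing write-only data into it causes a large amount of unnecessary write traffic regardless of whether the write-back or write-through strategy is used. On the other hand, simply deploying the write-around strategy for all data would hurt the read experience for the other, non-write-only data (that is, ordinary data): it would inevitably lower the hit rate of read requests and seriously degrade the quality of service.
Therefore, in this application, the data generated by the client is divided into write-only data and ordinary data. Because write-only data has only write operations within a certain time window, it can be written to the back-end hard disk drive using the write-around strategy; because ordinary data may have read operations within a certain time window, it can be written to the cache using the write-through or write-back strategy. Writing write-only data directly to the back-end hard disk drive effectively reduces useless write traffic to the cache while leaving more room in the cache for ordinary data, which improves cache space utilization and the cache read hit rate.
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.
To facilitate understanding of the data writing method provided in this application, the system in which it is used is introduced first. FIG. 1 shows the architecture of a data writing system provided by an embodiment of this application. As shown in FIG. 1, the data writing system includes a client 100, a storage server 200, and a storage area 300; write data is transmitted over a network 400 between the client 100 and the storage server 200 and between the storage server 200 and the storage area 300.
The storage area 300 includes a write buffer 31, a cache (SSD) 32, and a back-end hard disk drive (HDD) 33. When the client 100 generates a write request, the corresponding write data is first written into the write buffer 31, which may consist of DRAM (Dynamic Random Access Memory); this is not specifically limited here. When a data flush operation is triggered for the write buffer 31, the storage server 200 determines the data type of the data to be flushed. If it is write-only data, it is written directly to the back-end hard disk drive 33; otherwise, it can be written to the cache 32 and the hard disk drive 33 at the same time using the write-through strategy, or written to the cache 32 first and then flushed asynchronously to the hard disk drive 33 using the write-back strategy; this is not specifically limited here. The cache 32 holds data temporarily: when the client generates a read request and the requested data is in the cache 32, the data can be returned directly without accessing the back-end hard disk drive 33, which improves read efficiency. The cache 32 in this embodiment may be an SSD cache or the like; this is not specifically limited here.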
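An embodiment of this application discloses a data writing method that improves the efficiency of the write strategy.
FIG. 2 is a flowchart of a data writing method provided by an embodiment of this application. As shown in FIG. 2, the method includes:
S101: The client generates a write request and sends the write request to the storage server;
In this step, the client generates a write request that includes the write data; a write address may also be specified in the write request, which is not specifically limited here. The client sends the write request to the storage server so that the storage server processes the corresponding write data according to the predefined write strategy.
S102: The storage server writes the write data corresponding to the write request into the write buffer;
In this step, the write data corresponding to all write requests generated by the client is first written into the write buffer and recorded sequentially in a log file, which is used for data recovery.
S103: When a data flush operation is triggered for the write buffer, the storage server acquires the historical access data of the data block corresponding to the data to be flushed in the write buffer, and determines, based on the historical access data, whether the data to be flushed is write-only data; if so, go to S104; if not, go to S105;
This step determines the type of the data to be flushed, that is, whether it is write-only data or ordinary data, so that different write strategies can subsequently be applied to different types of data. Write-only data is data that has only write operations and no read operations within a certain time window; all other data is called ordinary data.
When a data flush operation is triggered for the write buffer, the storage server identifies the data to be flushed in the write buffer and acquires the historical access data of the corresponding data block. The client may specify a write address when generating a write request, and the data block here is the one corresponding to that write address. The historical access data of a data block characterizes how the block has been accessed in the past; from it, the probability that the block stores write-only data can be determined, and on that basis it can be judged whether the data about to be stored in the block is write-only data.
It should be noted that this embodiment does not limit how the data type is determined; for example, the data type of the data to be flushed can be determined using a statistical method or a classifier, both of which fall within the scope of protection of this embodiment.
It can be understood that this step does not limit the trigger condition of the data flush operation. For example, the flush operation may be triggered when the amount of data in the write buffer reaches a threshold. That is, the step of acquiring the historical access data of the data block corresponding to the data to be flushed when a data flush operation is triggered for the write buffer includes: when the amount of data in the write buffer reaches a threshold, triggering a data flush operation for the write buffer and acquiring the historical access data of the data block corresponding to the data to be flushed.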
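The write buffer with sequential logging and a threshold-triggered flush can be sketched as follows. This is a minimal illustration under stated assumptions: the `WriteBuffer` class, its `on_flush` callback, and the byte-count threshold are hypothetical names and choices made here, and an in-memory list stands in for the log file.

```python
class WriteBuffer:
    """Toy write buffer: logs every write and flushes once a size threshold is hit."""

    def __init__(self, threshold_bytes, on_flush):
        self.threshold = threshold_bytes
        self.on_flush = on_flush  # callback that receives {address: data}
        self.pending = {}         # address -> latest buffered data
        self.log = []             # sequential log kept for data recovery
        self.size = 0             # bytes currently buffered

    def write(self, address, data):
        """Buffer one write, log it, and trigger a flush when the threshold is reached."""
        self.log.append((address, data))
        # An overwrite of the same address replaces the old data in the size count.
        self.size += len(data) - len(self.pending.get(address, b""))
        self.pending[address] = data
        if self.size >= self.threshold:
            batch, self.pending, self.size = self.pending, {}, 0
            self.on_flush(batch)
```

The `on_flush` callback is where the type of each block to be flushed would be determined and the block routed to the hard disk drive or the cache.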
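S104: The storage server writes the data to be flushed into the hard disk drive;
It can be understood that the cache holds data temporarily: when the client generates a read request and the requested data is stored in the cache, the data can be returned to the client directly without accessing the back-end hard disk drive, so reads are efficient. Write-only data, however, has only write operations and no read operations within a certain time window, so loading it into the cache not only fails to raise the cache read hit rate but also causes a large amount of unnecessary write traffic to the cache.
Therefore, in this step, the write-around strategy is applied to write-only data: when the data to be flushed is write-only data, it is written directly to the back-end hard disk drive. This leaves more room in the cache for ordinary data and improves cache space utilization and the cache read hit rate.
In a specific implementation, to further save cache space, prevent dirty data corresponding to the data to be flushed from remaining in the cache, and ensure data consistency, the cache is first checked for dirty data corresponding to the data to be flushed. If such dirty data exists, it is first flushed from the cache to the hard disk drive, and the data to be flushed in the write buffer is then written to the hard disk drive. That is, this step includes: flushing the dirty data corresponding to the data to be flushed from the cache to the hard disk drive; and writing the data to be flushed into the hard disk drive.
S105: The storage server writes the data to be flushed into the cache.
Because ordinary data is read, it needs to be written into the cache to improve the read performance of the storage system. In this step, either the write-through strategy or the write-back strategy can be used for ordinary data; this is not limited here.
If the write strategy used for ordinary data is write-through, this step includes: writing the data to be flushed into the cache and the hard disk drive at the same time.
If the write strategy used for ordinary data is write-back, this step includes: writing the data to be flushed into the cache; and, when a preset condition is met, flushing the data in the cache to the hard disk drive. The preset condition may be that the read hit rate is below a preset value: when the read hit rate of the cache falls below the preset value, the data in cached data blocks with a low read hit rate is flushed to the hard disk drive. The preset condition may also be that the cache is full: when the cache is full, part of the data in the cache is flushed to the hard disk drive. Flushing the data in the cache to the hard disk drive can of course also be triggered by other preset conditions; this is not specifically limited here.
In the embodiments of this application, the data generated by the client is divided into different types, namely write-only data and ordinary data. Write-only data has only write operations and no read operations within a certain time window; in other words, client read requests do not hit write-only data within that window, so writing it into the cache not only fails to raise the cache read hit rate but also causes a large amount of unnecessary write traffic to the cache. On this basis, the data writing method provided by the embodiments of this application writes write-only data and ordinary data differently: ordinary data is written into the cache using the write-through or write-back strategy, while write-only data is written directly to the back-end hard disk drive using the write-around strategy. This effectively reduces useless write traffic to the cache while leaving more room in the cache for ordinary data, which improves cache space utilization and the cache read hit rate and thus the read performance of the storage system. The data writing method provided by the embodiments of this application thereby achieves the lowest write traffic to the cache and improves the efficiency of the write strategy.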
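This embodiment determines the type of the data to be flushed based on statistics and is described with the storage server as the execution subject. Specifically:
FIG. 3 is a flowchart of a second data writing method provided by an embodiment of this application. As shown in FIG. 3, the method includes:
S201: When a write request is received, write the write data corresponding to the write request into the write buffer;
S202: When a data flush operation is triggered for the write buffer, acquire the historical access data of the data block corresponding to the data to be flushed;
S203: Based on a statistical algorithm, determine from the historical access data whether the data to be flushed is write-only data; if so, go to S204; if not, go to S205;
This embodiment uses a statistical method to determine the type of the data to be flushed from historical access data, namely the historical access data of the data block corresponding to the data to be flushed. The client may specify a write address when generating a write request, and the data block here is the one corresponding to that write address. The historical access data of a data block describes how the block has been accessed in the past and may include the number of write requests, the number of read requests, and so on within a certain time window. From this data, the probability that the data block stores write-only data can be determined, and from that it can be determined whether the data about to be stored in the block is write-only data.
As a preferred implementation, this step includes: determining, based on the historical access data, the number of write requests for the data block corresponding to the data to be flushed in multiple time windows; calculating the write-only probability of that data block from the number of write requests, where the write-only probability describes the probability that the data block stores write-only data; and judging whether the write-only probability is greater than a third preset value; if so, the data to be flushed is determined to be write-only data, and if not, it is determined to be ordinary data.
In a specific implementation, the number of write requests and the total number of requests for the data block corresponding to the data to be flushed are determined for multiple time windows based on the historical access data. The time window is not limited here; for example, one time window may be one day and the multiple time windows may span one month, in which case the number of write requests and the total number of requests for the data block are determined for each day of the month. The total number of requests includes both write requests and read requests. The write request ratio of the data block is calculated for each time window, that is, the ratio of the number of write requests to the total number of requests in that window, and the average of the write request ratios over all time windows is taken as the write-only probability of the data block, which characterizes the probability that the data block stores write-only data. When the write-only probability is greater than the third preset value, the data about to be stored in the data block, that is, the data to be flushed, is determined to be write-only data; otherwise, the data to be flushed is determined to be ordinary data.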
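The write-only probability described above can be sketched as the mean per-window write request ratio. In this minimal illustration the function names are hypothetical, the default threshold of 0.9 is an assumed stand-in for the third preset value, and skipping windows with no requests is an assumption, since the description does not say how empty windows are handled.

```python
def write_only_probability(windows):
    """Average the write request ratio over time windows.

    `windows` is a list of (write_count, total_count) pairs, one per time window.
    """
    # Assumption: windows without any request carry no information and are skipped.
    ratios = [writes / total for writes, total in windows if total > 0]
    if not ratios:
        return 0.0  # assumption: with no history, treat the block as ordinary
    return sum(ratios) / len(ratios)


def is_write_only(windows, threshold=0.9):
    """Compare the write-only probability with the (assumed) third preset value."""
    return write_only_probability(windows) > threshold
```

For example, a block with daily counts `[(10, 10), (5, 10)]` has ratios 1.0 and 0.5, so its write-only probability is 0.75 and it is treated as ordinary data under the assumed threshold.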
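S204: Write the data to be flushed into the hard disk drive;
S205: Write the data to be flushed into the cache.
It can be seen that this embodiment uses a statistical method, based on the historical access data of the data block corresponding to the data to be flushed, to determine the type of the data to be flushed, and writes write-only data and ordinary data differently: ordinary data is written into the cache using the write-through or write-back strategy, while write-only data is written directly to the back-end hard disk drive using the write-around strategy. This effectively reduces useless write traffic to the cache while leaving more room in the cache for ordinary data, which improves cache space utilization and the cache read hit rate and thus the read performance of the storage system.
This embodiment uses a classifier to determine the type of the data to be flushed and is likewise described with the storage server as the execution subject. Specifically:
FIG. 4 is a flowchart of a third data writing method provided by an embodiment of this application. As shown in FIG. 4, the method includes:
S301: When a write request is received, write the write data corresponding to the write request into the write buffer;
S302: When a data flush operation is triggered for the write buffer, acquire the historical access data of the data block corresponding to the data to be flushed, and extract the feature information of that data block from the historical access data as the first feature information;
This embodiment uses a classifier to determine the type of the data to be flushed. In this step, the feature information of the data block corresponding to the data to be flushed is first extracted from that block's historical access data as the first feature information. The feature information may include either or both of a time feature and a request-type feature: the time feature describes when the data block is accessed, and the request-type feature describes the requests that access the data block.
Specifically, the time feature may include either or both of the last access timestamp and the average reuse time difference. The last access timestamp describes the time point of the most recent access to the data block, and the average reuse time difference describes the average time interval between accesses to the data block. In other words, the time feature captures the access recency and access interval of the data block: the last access timestamp describes the recency and the average reuse time difference describes the interval. The recency can of course also be expressed as the time elapsed between the current time and the most recent access to the data block; this is not specifically limited here. In addition, the time features can be normalized, for example by expressing them in hours and capping the value at 100.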
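The time features can be derived from a list of access timestamps as sketched below. This illustration uses the elapsed-time form of recency mentioned above and the normalization from the description (hours, capped at 100); the function name, the seconds-based input, and the treatment of a block with a single access are assumptions made here.

```python
def time_features(access_times_s, now_s, cap_hours=100.0):
    """Return (recency, average reuse time difference), both in hours and capped.

    `access_times_s` holds ascending access timestamps in seconds and must contain
    at least one entry; `now_s` is the current time in seconds.
    """
    recency_h = (now_s - access_times_s[-1]) / 3600
    if len(access_times_s) > 1:
        gaps = [b - a for a, b in zip(access_times_s, access_times_s[1:])]
        avg_reuse_h = sum(gaps) / len(gaps) / 3600
    else:
        # Assumption: a block accessed only once is treated as never reused.
        avg_reuse_h = cap_hours
    return min(recency_h, cap_hours), min(avg_reuse_h, cap_hours)
```

Capping both values keeps rarely accessed blocks from dominating the feature range.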
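It can be understood that this embodiment presupposes a classifier training step. In a specific implementation, a machine learning algorithm can be used for training. Spatial features, that is, the address information of the data to be written, such as the volume ID and offset, are also commonly used as feature information in machine learning algorithms. However, addresses change over time, which affects the classifier's predictions, so this embodiment does not use spatial features and instead uses request-type features to enrich the training features.
Specifically, the request-type feature may include any one or a combination of the average request size, the large request ratio, the small request ratio, and the write request ratio. The large request ratio is the ratio of requests whose size is greater than a first preset value among the requests that access the data block, the small request ratio is the ratio of requests whose size is smaller than a second preset value among the requests that access the data block, and the write request ratio is the ratio of write requests among the requests that access the data block; the first preset value is greater than or equal to the second preset value.
For example, the average request size is the average size of the requests that access the data block; its unit may be kilobytes and its upper limit may be 100KB. The large request ratio may be the proportion of requests larger than 64KB among all access requests, and the small request ratio the proportion of requests smaller than 8KB among all access requests.
It should be noted that if feature information were collected for every data block, feature extraction would consume too many system resources and occupy too much memory, making feature extraction inefficient. To improve the efficiency of feature extraction, the statistical granularity of the feature information can be increased; however, too coarse a granularity makes the predictions of the trained classifier inaccurate. Therefore, the data area can be used as the statistical granularity of the feature information, with each data area containing multiple data blocks, as a trade-off between memory usage and prediction accuracy. The access granularity of the storage system, that is, the minimum read/write granularity, is a data block, for example 8KB, and the size of a data area is an integer multiple of the access granularity; for example, a data area may be 1MB.
In a specific implementation, taking a statistical granularity of 1MB as an example, the feature information of each data area is determined from the historical access data of all data blocks in that data area, and the data blocks in the same data area share the feature information of the data area. That is, the step of acquiring the feature information of the data block corresponding to the data to be flushed as the first feature information includes: determining the data area to which the data block corresponding to the data to be flushed belongs, and extracting the feature information of that data area as the first feature information.
S303: Using a pre-trained classifier, determine from the first feature information whether the data to be flushed is write-only data; if so, go to S304; if not, go to S305;
In this step, the first feature information extracted in the previous step is input into the pre-trained classifier to obtain the type of the data to be flushed. If it is write-only data, go to S304; if it is ordinary data, go to S305.
S304: Write the data to be flushed into the hard disk drive;
S305: Write the data to be flushed into the cache.
It can be seen that this embodiment uses a classifier to determine the type of the data to be flushed; the relatively high prediction accuracy of the classifier improves the accuracy of determining the data type. For each piece of data to be flushed from the write buffer, its class is first determined by the classifier, and different operations are then performed according to the classification result: the write-back or write-through strategy is used for ordinary data, and the write-around strategy is used for write-only data. This greatly reduces the write traffic to the cache while reserving more cache space for ordinary data and improving the utilization of the cache space, thereby improving the read performance of the storage system.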
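This embodiment trains the classifier using a machine learning algorithm. Machine learning (ML) is an interdisciplinary subject that draws on probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration. Specifically:
FIG. 5 is a flowchart of a classifier training method provided by an embodiment of this application. As shown in FIG. 5, the method includes:
S401: When a data replacement operation is triggered for the cache, label the replacement data based on the historical read data of the replacement data in the cache;
This embodiment uses a machine learning algorithm to train the classifier. The specific machine learning algorithm is not limited here; supervised machine learning algorithms such as Naive Bayes, logistic regression, decision trees, AdaBoost, and random forests can be used. The classifiers were run on an x86-64 computing server with two Intel Xeon E5-2670-v3 CPUs and 128GB of DRAM. The classification results of the different machine learning algorithms are shown in Table 1, where the training time is the time spent training on one day of training samples.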
Table 1
Classifier            Accuracy (%)   Recall (%)   Training time (ms)   Prediction time (ms)
Logistic regression   83.54          75.21        500.84               1.01
Naive Bayes           83.89          83.91        440.88               1.25
Decision tree         80.96          68.61        482.62               1.16
AdaBoost              87.85          85.64        462.47               14.73
Random forest         88.32          82.64        523.68               20.23
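As Table 1 shows, random forest outperforms the other algorithms in accuracy but needs more time for prediction, 20.23 μs per request on average. AdaBoost is second only to random forest in accuracy and has a shorter prediction time, 14.73 μs per request on average. Naive Bayes balances accuracy, recall, and prediction time well; logistic regression is inferior to Naive Bayes in recall; and the decision tree has the lowest accuracy and recall.
In a specific implementation, training samples are collected within a certain time window to train the classifier; for example, training samples are collected every day, and the trained classifier is used the next day to distinguish the data type of the data to be flushed. Every supervised machine learning algorithm takes a few hundred milliseconds to train on one day of samples, and the classifier model is only a few megabytes in size, so the impact of training the classifier on the storage system is negligible.
In this embodiment, the training sample may be replacement data: when a data replacement operation is triggered for the cache, that is, when first data in the cache is replaced with second data from the hard disk drive, the first data is the replacement data in this step. The replacement data is labeled based on its historical read data. For example, if the replacement data has one or more read hits, it is labeled 0 (ordinary data); otherwise it is labeled 1 (write-only data).
S402: Obtain the feature information of the data block corresponding to the replacement data as second feature information, and train the classifier using the labeled replacement data and the second feature information.
In this step, the feature information of the training samples is extracted as the second feature information, which may likewise include time features and request-type features; these are not described again here. The labeled training samples and their feature information are input into the classifier to train it, yielding the trained classifier. The trained classifier is used to determine the type of data, that is, write-only data or ordinary data.
It can be understood that, to improve the efficiency of extracting the feature information of the training samples, the data area can also be used here as the statistical granularity of that feature information. That is, the step of obtaining the feature information of the data block corresponding to the replacement data as the second feature information includes: determining the data area to which the data block corresponding to the replacement data belongs; and extracting the feature information of the data area as the second feature information.
It can be seen that this embodiment trains the classifier using a machine learning algorithm. When the feature information of the training samples is extracted, request-type features are used to enrich the training features of the classifier, so that the classifier classifies better and the accuracy of determining the data type is improved.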
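An embodiment of this application discloses a data writing method. Compared with the previous embodiments, this embodiment further explains and optimizes the technical solution and is likewise described with the storage server as the execution subject. Specifically:
FIG. 6 is a flowchart of a fourth data writing method provided by an embodiment of this application. As shown in FIG. 6, the method includes:
S501: When a write request is received, write the write data corresponding to the write request into the write buffer;
S502: When the amount of data in the write buffer reaches a threshold, trigger a data flush operation for the write buffer;
S503: Acquire the historical access data of the data block corresponding to the data to be flushed, and extract the feature information of that data block from the historical access data as the first feature information;
S504: Using the pre-trained classifier, determine from the first feature information whether the data to be flushed is write-only data; if so, go to S505; if not, go to S507;
S505: Flush the dirty data corresponding to the data to be flushed from the cache to the hard disk drive;
S506: Write the data to be flushed into the hard disk drive;
S507: Write the data to be flushed into the cache;
S508: When a preset condition is met, flush the data in the cache to the hard disk drive.
It can be seen that this embodiment provides a write strategy based on machine learning that can accurately identify different types of data, that is, write-only data and ordinary data, and then process the different types of data differently. The write-back strategy is used for ordinary data and the write-around strategy for write-only data, which minimizes the write traffic to the cache while reserving more cache space for ordinary data and improving the utilization of the cache space, thereby improving the read performance of the storage system.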
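For ease of understanding, an application scenario of this application is described. As shown in FIG. 7, the data writing system includes a client, a classifier, and the storage area of the cloud block storage product Tencent CBS. A hypervisor (also called a virtual machine monitor, VMM) is set up in the client to monitor the client's write requests. The classifier collects training samples for training the machine learning model. The storage area includes a write buffer and back-end storage; the back-end storage includes an SSD cache and an HDD. The write strategy decides whether to load the data to be written into the SSD cache and when to flush dirty data in the SSD cache to the HDD. As shown in FIG. 8, when there is a request from the client, the write strategy proceeds as follows:
Step 1: Write the write data corresponding to all write requests into the write buffer and record it sequentially in the log file; notify the storage server when the amount of dirty data in the write buffer reaches the threshold.
Step 2: When data needs to be flushed from the write buffer, the classifier obtains the features of the area containing the data block currently being flushed and determines its type. If the data stored in that data block is write-only data, go to Step 3; if it is ordinary data, go to Step 4.
Step 3: Check whether the SSD cache holds dirty data for the data block currently being flushed. If so, first flush that dirty data from the SSD cache; the data being flushed from the write buffer is then written directly to the HDD, and the process ends.
Step 4: Write the data being flushed into the SSD cache, and then flush it asynchronously to the HDD using the write-back strategy. The write-back strategy is used here instead of the write-through strategy because the SSD is non-volatile, so the write-back strategy is sufficient to ensure data consistency.
The classifier trains the model once a day to make predictions for the next day. Training samples are collected while the system is running: when data needs to be evicted from the SSD cache, the evicted data is flushed to the HDD and a training sample is added.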
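The experimental results are shown in Table 2:
Table 2
Write strategy                      Hit rate   SSD write traffic   Read latency
Write-around                        92.13%     1                   1.29 ms
Write-back                          94.68%     7.49                934.16 μs
Write strategy of this embodiment   97.22%     4.38                583.64 μs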
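In Table 2, the write traffic of the other strategies is normalized to that of the write-around strategy. The write-around strategy achieves the lowest write traffic to the SSD (1), because write data is first written directly to the back-end storage. However, the write-around strategy has the lowest hit rate, 92.13%, and the highest read latency, 1.29 ms. Although the write-around strategy achieves the lowest SSD write traffic, it degrades cache performance in terms of hit rate and read latency. The write-back strategy results in the highest SSD write traffic (7.49). Compared with the write-back strategy, the write strategy provided in this embodiment increases the hit rate from 94.68% to 97.22%, an increase of 2.61%, and reduces the read latency and write traffic by 37.52% (from 934.16 μs to 583.64 μs) and 41.52% (from 7.49 to 4.38), respectively. It can be seen that the write strategy provided in this embodiment achieves the best performance.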
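A data writing device provided by an embodiment of this application is described below. The data writing device described below and the data writing method described above may be cross-referenced.
FIG. 9 is a structural diagram of a data writing device provided by an embodiment of this application. As shown in FIG. 9, the device includes:
a first writing module 901, configured to write the write data corresponding to a write request into the write buffer when the write request is received;
an acquiring module 902, configured to acquire, when a data flush operation is triggered for the write buffer, the historical access data of the data block corresponding to the data to be flushed in the write buffer;
a judgment module 903, configured to determine, based on the historical access data, whether the data to be flushed is write-only data;
a second writing module 904, configured to write the data to be flushed into the hard disk drive when the data to be flushed is write-only data;
a third writing module 905, configured to write the data to be flushed into the cache when the data to be flushed is not write-only data.
In the embodiments of this application, the data generated by the client is divided into different types, namely write-only data and ordinary data. Because write-only data has only write operations and no read operations within a certain time window, loading it into the cache not only fails to raise the cache read hit rate but also causes a large amount of unnecessary write traffic to the cache. The data writing device provided by the embodiments of this application therefore writes write-only data and ordinary data differently: ordinary data is written into the cache using the write-through or write-back strategy, while write-only data is written directly to the back-end hard disk drive using the write-around strategy. This effectively reduces useless write traffic to the cache while leaving more room in the cache for ordinary data, which improves cache space utilization and the cache read hit rate and thus the read performance of the storage system. The data writing device provided by the embodiments of this application thereby achieves the lowest write traffic to the cache and improves the efficiency of the write strategy.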
在上述实施例的基础上,作为一种优选实施方式,所述第二写入模块904包括:
第一刷新单元,用于将所述缓存中所述待下刷数据对应的脏数据刷新至所述硬盘驱动器;
第一写入单元,用于将所述待下刷数据写入硬盘驱动器中。
在上述实施例的基础上,作为一种优选实施方式,所述第三写入模块905包括:
第二写入单元,用于将所述待下刷数据写入缓存中;
第二刷新单元,用于当满足预设条件时,将所述缓存中的数据刷新至所述硬盘驱动器中。
在上述实施例的基础上,作为一种优选实施方式,所述第三写入模块905具体用于:将所述待下刷数据同时写入所述缓存和所述硬盘驱动器中。
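On the basis of the foregoing embodiment, as a preferred implementation, the obtaining module 902 is specifically configured to: when the amount of data in the write buffer reaches a threshold, trigger a data flush operation for the write buffer, and obtain the historical access data of the data block corresponding to the data to be flushed in the write buffer.
On the basis of the foregoing embodiment, as a preferred implementation, the judging module 903 is specifically configured to: extract, from the historical access data, feature information of the data block corresponding to the data to be flushed, as first feature information; and determine, by means of a pre-trained classifier and according to the first feature information, whether the data to be flushed is write-only data.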
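On the basis of the foregoing embodiment, as a preferred implementation, the apparatus further includes:
a labeling module, configured to, when a data replacement operation is triggered for the cache, label the replaced data based on the historical read data of the replaced data in the cache;
a training module, configured to obtain feature information of the data block corresponding to the replaced data, as second feature information, and to train the classifier using the labeled replaced data and the second feature information.
On the basis of the foregoing embodiment, as a preferred implementation, the training module includes:
a determining unit, configured to determine the data region to which the data block corresponding to the replaced data belongs, where the data region includes multiple data blocks and the size of the data region is an integer multiple of the access granularity;
an extracting unit, configured to extract feature information of the data region as the second feature information;
a training unit, configured to train the classifier using the labeled replaced data and the second feature information.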
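The labeling and region lookup performed by these units might look like the sketch below. Everything concrete in it is an assumption made for illustration: the block and region sizes, the names `region_of`, `EvictedBlock`, `label`, and `make_sample`, and in particular the labeling rule (a block that was never read while cached counts as write-only), which the text only describes as "based on the historical read data".

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096          # hypothetical access granularity, in bytes
BLOCKS_PER_REGION = 256    # hypothetical: region size = 256 * BLOCK_SIZE


def region_of(block_id: int) -> int:
    """Map a block to its data region (a region spans whole blocks)."""
    return block_id // BLOCKS_PER_REGION


@dataclass
class EvictedBlock:
    block_id: int
    reads_while_cached: int  # read count recorded for the block in the cache


def label(evicted: EvictedBlock) -> int:
    """Return 1 for write-only, 0 for normal (assumed rule: never read)."""
    return 1 if evicted.reads_while_cached == 0 else 0


def make_sample(evicted: EvictedBlock,
                region_features: Dict[int, List[float]]) -> Tuple[List[float], int]:
    """Pair the features of the block's region with the block's label."""
    features = list(region_features[region_of(evicted.block_id)])
    return features, label(evicted)
```

Samples produced this way could then be fed to whichever classifier is trained once a day; the choice of model is outside this sketch.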
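On the basis of the foregoing embodiment, as a preferred implementation, the feature information includes at least one of a time feature and a request feature. The time feature describes the time at which a data block is accessed, and the request feature describes the requests that access a data block.
On the basis of the foregoing embodiment, as a preferred implementation, the time feature includes at least one of a last access timestamp and an average reuse time difference. The last access timestamp describes the point in time at which the data block was last accessed, and the average reuse time difference describes the average time interval between accesses to the data block.
The request feature includes any one or a combination of the average request size, the large request ratio, the small request ratio, and the write request ratio. The large request ratio describes the proportion of requests accessing the data block whose size is greater than a first preset value; the small request ratio describes the proportion of requests accessing the data block whose size is smaller than a second preset value; and the write request ratio describes the proportion of requests accessing the data block that are write requests. The first preset value is greater than or equal to the second preset value.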
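The six features named above can be computed from a per-block access log roughly as follows. This is a sketch only: the `Access` record, the function name, and the default thresholds (64 KiB and 8 KiB, standing in for the first and second preset values) are assumptions, not values given in the application.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Access:
    timestamp: float  # seconds
    size: int         # request size in bytes
    is_write: bool


def extract_features(accesses: List[Access],
                     large_threshold: int = 64 * 1024,   # assumed first preset value
                     small_threshold: int = 8 * 1024     # assumed second preset value
                     ) -> Dict[str, float]:
    """Compute time features and request features for one block or region."""
    if not accesses:
        raise ValueError("at least one access is required")
    ordered = sorted(accesses, key=lambda a: a.timestamp)
    n = len(ordered)
    # Gaps between consecutive accesses; empty when there is a single access.
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    return {
        "last_access_timestamp": ordered[-1].timestamp,
        "avg_reuse_time_diff": sum(gaps) / len(gaps) if gaps else 0.0,
        "avg_request_size": sum(a.size for a in ordered) / n,
        "large_request_ratio": sum(a.size > large_threshold for a in ordered) / n,
        "small_request_ratio": sum(a.size < small_threshold for a in ordered) / n,
        "write_request_ratio": sum(a.is_write for a in ordered) / n,
    }
```

Because the first preset value is at least as large as the second, a request can be counted as large, small, or neither, but never both.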
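On the basis of the foregoing embodiment, as a preferred implementation, the judging module 903 is specifically configured to determine, based on a statistical algorithm and according to the historical access data, whether the data to be flushed is write-only data.
On the basis of the foregoing embodiment, as a preferred implementation, the judging module 903 includes:
a calculating unit, configured to determine, based on the historical access data, the number of write requests to the data block corresponding to the data to be flushed within multiple time windows, and to calculate from those write request counts the write-only probability of that data block, where the write-only probability describes the probability that the data block stores write-only data;
a judging unit, configured to determine whether the write-only probability is greater than a third preset value; if so, the data to be flushed is determined to be write-only data, and if not, it is determined to be normal data.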
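One possible reading of the statistical approach is sketched below. The text does not give the formula for the write-only probability, so the estimator here is an assumption: each time window is summarized as a pair of write count and total request count, and the probability is the fraction of active windows in which every request was a write. The function names and the 0.5 default for the third preset value are likewise hypothetical.

```python
from typing import List, Tuple

# One entry per time window: (write_requests, total_requests) for the block.
Window = Tuple[int, int]


def write_only_probability(windows: List[Window]) -> float:
    """Fraction of active windows whose requests were all writes (assumed estimator)."""
    active = [(writes, total) for writes, total in windows if total > 0]
    if not active:
        return 0.0  # no history: treat the block as normal data
    write_only_windows = sum(1 for writes, total in active if writes == total)
    return write_only_windows / len(active)


def is_write_only(windows: List[Window], threshold: float = 0.5) -> bool:
    """Compare the probability with the third preset value (0.5 is an assumption)."""
    return write_only_probability(windows) > threshold
```

For example, a block with windows `[(3, 3), (2, 2), (1, 4), (0, 0)]` has three active windows, two of which saw only writes, so it is classified as write-only at a threshold of 0.5 but not at 0.8.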
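For the apparatus in the foregoing embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.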
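This application further provides a storage server. FIG. 10 is a structural diagram of a storage server 20 provided by an embodiment of this application. As shown in FIG. 10, the storage server may include a processor 21 and a memory 22.
The processor 21 may include multiple processing cores, for example a 4-core or an 8-core processor. The processor 21 may be implemented in at least one hardware form among DSP (digital signal processor), FPGA (field-programmable gate array), and PLA (programmable logic array). The processor 21 may also include a main processor and a coprocessor: the main processor, also called the CPU (central processing unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 21 may integrate a GPU (graphics processing unit), which is responsible for rendering and drawing the content to be shown on the display. In some embodiments, the processor 21 may also include an AI (artificial intelligence) processor, which handles computation related to machine learning.
The memory 22 may include multiple computer-readable storage media, which may be non-transitory. The memory 22 may also include high-speed random access memory and non-volatile memory, such as multiple magnetic disk storage devices or flash storage devices. In this embodiment, the memory 22 is at least used to store a computer program 221 which, after being loaded and executed by the processor 21, implements the relevant steps of the server-side data writing method disclosed in any of the foregoing embodiments. The resources stored in the memory 22 may also include an operating system 222 and data 223, stored either temporarily or permanently. The operating system 222 may include Windows, Unix, Linux, and the like.
In some embodiments, the storage server 20 may further include a display 23, an input/output interface 24, a communication interface 25, a sensor 26, a power supply 27, and a communication bus 28.
Of course, the structure of the storage server shown in FIG. 10 does not limit the storage server in the embodiments of this application. In practice, the storage server may include more or fewer components than shown in FIG. 10, or may combine certain components.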
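In another exemplary embodiment, a computer-readable storage medium including program instructions is further provided. When executed by a processor, the program instructions implement the steps of the data writing method performed by the storage server in any of the foregoing embodiments.
In another exemplary embodiment, a computer program product including instructions is further provided. When run on a computer, it causes the computer to perform the data writing method in any of the foregoing embodiments.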
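The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the parts that are the same or similar may be read with reference to each other. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method. It should be noted that a person of ordinary skill in the art may make several improvements and modifications to this application without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of this application.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any of their variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.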

Claims (16)

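  1. A data writing method, performed by a server, comprising:
    when a write request is received, writing the write data corresponding to the write request into a write buffer;
    when a data flush operation is triggered for the write buffer, obtaining historical access data of the data block corresponding to the data to be flushed in the write buffer;
    determining, based on the historical access data, whether the data to be flushed is write-only data;
    if so, writing the data to be flushed to a hard disk drive;
    if not, writing the data to be flushed to a cache.
  2. The data writing method according to claim 1, wherein the determining, based on the historical access data, whether the data to be flushed is write-only data comprises:
    extracting, from the historical access data, feature information of the data block corresponding to the data to be flushed, as first feature information;
    determining, by means of a pre-trained classifier and according to the first feature information, whether the data to be flushed is the write-only data.
  3. The data writing method according to claim 2, further comprising:
    when a data replacement operation is triggered for the cache, labeling the replaced data based on historical read data of the replaced data in the cache;
    obtaining feature information of the data block corresponding to the replaced data, as second feature information;
    training the classifier using the labeled replaced data and the second feature information.
  4. The data writing method according to claim 3, wherein the obtaining feature information of the data block corresponding to the replaced data, as second feature information, comprises:
    determining the data region to which the data block corresponding to the replaced data belongs, wherein the data region includes multiple data blocks and the size of the data region is an integer multiple of the access granularity;
    extracting feature information of the data region as the second feature information.
  5. The data writing method according to any one of claims 2 to 4, wherein the feature information includes at least one of a time feature and a request feature; the time feature describes the time at which a data block is accessed, and the request feature describes the requests that access a data block.
  6. The data writing method according to claim 5, wherein the time feature includes at least one of a last access timestamp and an average reuse time difference; the last access timestamp describes the point in time at which the data block was last accessed, and the average reuse time difference describes the average time interval between accesses to the data block;
    the request feature includes any one or a combination of an average request size, a large request ratio, a small request ratio, and a write request ratio; the large request ratio describes the proportion of requests accessing the data block whose size is greater than a first preset value, the small request ratio describes the proportion of requests accessing the data block whose size is smaller than a second preset value, the write request ratio describes the proportion of requests accessing the data block that are write requests, and the first preset value is greater than or equal to the second preset value.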
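  7. The data writing method according to claim 1, wherein the determining, based on the historical access data, whether the data to be flushed is the write-only data comprises:
    determining, based on a statistical algorithm and according to the historical access data, whether the data to be flushed is the write-only data.
  8. The data writing method according to claim 7, wherein the determining, based on a statistical algorithm and according to the historical access data, whether the data to be flushed is the write-only data comprises:
    determining, based on the historical access data, the number of write requests to the data block corresponding to the data to be flushed within multiple time windows;
    calculating, from the number of write requests, the write-only probability of the data block corresponding to the data to be flushed, wherein the write-only probability describes the probability that the data block stores write-only data;
    determining whether the write-only probability is greater than a third preset value; if so, determining that the data to be flushed is the write-only data, and if not, determining that the data to be flushed is normal data.
  9. The data writing method according to any one of claims 1 to 8, wherein the writing the data to be flushed to a hard disk drive comprises:
    flushing the dirty data in the cache that corresponds to the data to be flushed to the hard disk drive;
    writing the data to be flushed to the hard disk drive.
  10. The data writing method according to any one of claims 1 to 8, wherein the writing the data to be flushed to a cache comprises:
    writing the data to be flushed to the cache;
    when a preset condition is met, flushing the data in the cache to the hard disk drive.
  11. The data writing method according to any one of claims 1 to 8, wherein the writing the data to be flushed to a cache comprises:
    writing the data to be flushed to the cache and the hard disk drive at the same time.
  12. The data writing method according to any one of claims 1 to 8, wherein the obtaining, when a data flush operation is triggered for the write buffer, historical access data of the data block corresponding to the data to be flushed in the write buffer comprises:
    when the amount of data in the write buffer reaches a threshold, triggering a data flush operation for the write buffer, and obtaining the historical access data of the data block corresponding to the data to be flushed in the write buffer.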
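  13. A data writing apparatus, comprising:
    a first writing module, configured to, when a write request is received, write the write data corresponding to the write request into a write buffer;
    an obtaining module, configured to, when a data flush operation is triggered for the write buffer, obtain historical access data of the data block corresponding to the data to be flushed in the write buffer;
    a judging module, configured to determine, based on the historical access data, whether the data to be flushed is write-only data;
    a second writing module, configured to write the data to be flushed to a hard disk drive when the data to be flushed is write-only data;
    a third writing module, configured to write the data to be flushed to a cache when the data to be flushed is not write-only data.
  14. A storage server, comprising:
    a processor and a memory;
    wherein the processor is configured to execute a program stored in the memory;
    the memory is configured to store the program, and the program is at least used to perform the data writing method according to any one of claims 1 to 12.
  15. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the data writing method according to any one of claims 1 to 12.
  16. A computer program product, comprising instructions which, when run on a computer, cause the computer to perform the data writing method according to claims 1 to 12 above.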
PCT/CN2020/119881 2019-12-17 2020-10-09 Data writing method, device, storage server, and computer readable storage medium WO2021120789A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/523,730 US11947829B2 (en) 2019-12-17 2021-11-10 Data writing method, device, storage server, and computer readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911304631.1 2019-12-17
CN201911304631.1A CN111104066B (zh) 2019-12-17 2019-12-17 Data writing method, device, storage server, and computer readable storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/523,730 Continuation US11947829B2 (en) 2019-12-17 2021-11-10 Data writing method, device, storage server, and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2021120789A1 true WO2021120789A1 (zh) 2021-06-24

Family

ID=70422581

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/119881 WO2021120789A1 (zh) 2019-12-17 2020-10-09 Data writing method, device, storage server, and computer readable storage medium

Country Status (3)

Country Link
US (1) US11947829B2 (zh)
CN (1) CN111104066B (zh)
WO (1) WO2021120789A1 (zh)

Also Published As

Publication number Publication date
US11947829B2 (en) 2024-04-02
US20220066691A1 (en) 2022-03-03
CN111104066A (zh) 2020-05-05
CN111104066B (zh) 2021-07-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20903869

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20903869

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20.10.2022)
