WO2022088374A1

WO2022088374A1 - Data processing method and apparatus

Info

Publication number: WO2022088374A1
Application number: PCT/CN2020/132911
Authority: WO
Inventors: 占志刚; 程威
Original assignee: 北京泽石科技有限公司; 泽石科技(武汉)有限公司
Priority date: 2020-10-30
Filing date: 2020-11-30
Publication date: 2022-05-05
Also published as: CN112306414A

Abstract

A data processing method and apparatus. The method comprises: establishing a multi-dimensional function model, wherein the multi-dimensional function model comprises multi-dimensional indexes, and the multi-dimensional indexes respectively correspond to a plurality of conditions for screening data blocks (S102); determining a multi-dimensional space corresponding to the multi-dimensional function model, and a plurality of centroids of the multi-dimensional space (S104); according to the plurality of centroids, clustering a plurality of data blocks by means of a clustering algorithm, so as to obtain a plurality of block clusters corresponding to the plurality of centroids (S106); and selecting a block cluster corresponding to a centroid, of which the multi-dimensional index meets a plurality of conditions, as a target block cluster (S108). By means of the method, the technical problem in the related art of poor consistency and universality of data block processing due to characteristic measurement of data blocks being limited in terms of management manner of data storage blocks is solved.

Description

Data processing method and device

The present disclosure takes the Chinese patent document with the application number of 202011193950.2 and the title of “Data Processing Method and Apparatus” filed on October 30, 2020 as a priority document, the entire contents of which are incorporated into the present disclosure by reference.

Technical Field

The present disclosure relates to the field of data processing, and in particular, to a data processing method and apparatus.

Background

NAND Flash block management covers the use of clean blocks and the recovery of dirty blocks. Its main concern is wear leveling, which includes dynamic leveling (garbage collection and the like) and static leveling. The implementation of either may need to be adjusted according to requirements.

Using clean blocks is relatively simple: only the wear degree of the block needs to be considered. For garbage collection of dirty blocks, classic algorithms include the Greedy policy, the Cost-Benefit policy, and the Cost-Age-Times (CAT) policy. These algorithms measure only a limited set of block characteristics and are mostly guided by experience, so it is difficult to find a global optimum with them, and they may lack consistency and generality.

For the above problems, no effective solution has been proposed yet.

Summary

The embodiments of the present disclosure provide a data processing method and apparatus, so as to at least solve the technical problem in the related art that management methods for data storage blocks measure only limited features of the data blocks, resulting in poor consistency and generality of data block processing.

According to an aspect of the embodiments of the present disclosure, a data processing method is provided, including: establishing a multi-dimensional function model, wherein the multi-dimensional function model includes multi-dimensional indices, and the multi-dimensional indices respectively correspond to multiple conditions for filtering data blocks; determining the multi-dimensional space corresponding to the multi-dimensional function model, and multiple centroids of the multi-dimensional space; clustering multiple data blocks by a clustering algorithm according to the multiple centroids, to obtain multiple block clusters corresponding to the multiple centroids; and selecting the block cluster corresponding to the centroid whose multi-dimensional index satisfies the multiple conditions as the target block cluster.

In some disclosed embodiments, determining the multi-dimensional space corresponding to the multi-dimensional function model and the multiple centroids of the multi-dimensional space includes: determining the number of centroids according to the number of multi-dimensional indices of the multi-dimensional function model, wherein the number of centroids is one more than the number of multi-dimensional indices; and determining the coordinates of that number of centroids in the multi-dimensional space, wherein each centroid is a range endpoint of the multiple data blocks in the multi-dimensional space and lies on a coordinate axis of the multi-dimensional space.

In some disclosed embodiments, clustering the multiple data blocks by a clustering algorithm according to the multiple centroids, to obtain multiple block clusters corresponding to the multiple centroids, includes: determining the coordinates of the multiple data blocks in the multi-dimensional space; weighting the coordinates of the multiple data blocks; calculating, from the weighted coordinates, the Euclidean distance between each data block and each of the multiple centroids; clustering the multiple data blocks using the size of the Euclidean distance as the clustering condition; and obtaining the multiple block clusters corresponding to the multiple centroids.

In some disclosed embodiments, selecting the block cluster corresponding to the centroid whose multi-dimensional index satisfies the multiple conditions as the target block cluster includes: determining, according to the multiple conditions, the target multi-dimensional index that satisfies the multiple conditions; determining, according to the target multi-dimensional index, the target centroid whose coordinates correspond to the target multi-dimensional index; and taking the block cluster corresponding to the target centroid as the target block cluster.

In some disclosed embodiments, after selecting the block cluster corresponding to the centroid whose multi-dimensional index satisfies the multiple conditions as the target block cluster, the method further includes: determining an actual centroid according to the coordinates of the data blocks in the target block cluster; and performing clustering operations on subsequent data blocks according to the actual centroid.

In some disclosed embodiments, each condition is a filtering condition that meets the requirements of a data processing mode; the data processing mode includes at least one of the following: performing a write operation on the multiple data blocks; performing a reclaim operation on the multiple data blocks.

In some disclosed embodiments, the multiple conditions include at least one of the following: the number of valid pages of the data block is 0 or less than a preset number; the wear degree of the data block is less than a preset wear degree; the heat of the data block is 0 or less than a preset heat.

According to another aspect of the embodiments of the present disclosure, a data processing apparatus is also provided, comprising: an establishing module configured to establish a multi-dimensional function model, wherein the multi-dimensional function model includes multi-dimensional indices, and the multi-dimensional indices respectively correspond to multiple conditions for screening data blocks; a determining module configured to determine the multi-dimensional space corresponding to the multi-dimensional function model, and multiple centroids of the multi-dimensional space; a clustering module configured to cluster multiple data blocks by a clustering algorithm according to the multiple centroids, to obtain multiple block clusters corresponding to the multiple centroids; and a selection module configured to select the block cluster corresponding to the centroid whose multi-dimensional index satisfies the multiple conditions as the target block cluster.

According to another aspect of the embodiments of the present disclosure, a computer storage medium is also provided, where the computer storage medium includes a stored program, and when the program runs, the device where the computer storage medium is located is controlled to execute any one of the data processing methods described above.

According to another aspect of the embodiments of the present disclosure, a processor is further provided, and the processor is configured to run a program, wherein when the program runs, any one of the data processing methods described above is executed.

In the embodiments of the present disclosure, a multi-dimensional function model is established, wherein the multi-dimensional function model includes multi-dimensional indices that respectively correspond to multiple conditions for screening data blocks; the multi-dimensional space corresponding to the multi-dimensional function model and multiple centroids of the multi-dimensional space are determined; according to the multiple centroids, multiple data blocks are clustered by a clustering algorithm to obtain multiple block clusters corresponding to the multiple centroids; and the block cluster corresponding to the centroid whose multi-dimensional index satisfies the multiple conditions is selected as the target block cluster. By establishing the multi-dimensional function model, selecting centroids in it for clustering, and thereby classifying the data blocks, the data blocks can be screened quickly across multiple dimensions. This achieves the technical effect of improving the efficiency of measuring data blocks and improving the consistency and generality of data block processing, and solves the technical problem in the related art that management methods for data storage blocks measure only limited features of the data blocks, resulting in poor consistency and generality of data block processing.

Brief Description of the Drawings

The accompanying drawings described herein are used to provide a further understanding of the present disclosure and constitute a part of the present application. The exemplary embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not constitute an improper limitation of the present disclosure. In the drawings:

FIG. 1 is a flowchart of a data processing method according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of clustering in a multi-dimensional space according to an embodiment of the present disclosure;

FIG. 3 is a flowchart of a storage data block management method according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure.

Detailed Description

To help those skilled in the art better understand the solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. Based on the embodiments in the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present disclosure.

It should be noted that the terms "first", "second" and the like in the description and claims of the present disclosure and in the above drawings are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, such that the embodiments of the disclosure described herein can be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.

According to an embodiment of the present disclosure, a method embodiment of a data processing method is provided. It should be noted that the steps shown in the flowchart of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.

FIG. 1 is a flowchart of a data processing method according to an embodiment of the present disclosure. As shown in FIG. 1 , the method includes the following steps:

Step S102, establishing a multi-dimensional function model, wherein the multi-dimensional function model includes multi-dimensional indices, and the multi-dimensional indices respectively correspond to multiple conditions for screening data blocks;

Step S104, determining the multi-dimensional space corresponding to the multi-dimensional function model, and multiple centroids of the multi-dimensional space;

Step S106, clustering multiple data blocks by a clustering algorithm according to the multiple centroids, to obtain multiple block clusters corresponding to the multiple centroids;

Step S108, selecting the block cluster corresponding to the centroid whose multi-dimensional index satisfies the multiple conditions as the target block cluster.

Through the above steps, a multi-dimensional function model is established, wherein the multi-dimensional function model includes multi-dimensional indices that respectively correspond to multiple conditions for screening data blocks; the multi-dimensional space corresponding to the multi-dimensional function model and multiple centroids of the multi-dimensional space are determined; according to the multiple centroids, multiple data blocks are clustered by a clustering algorithm to obtain multiple block clusters corresponding to the multiple centroids; and the block cluster corresponding to the centroid whose multi-dimensional index satisfies the multiple conditions is selected as the target block cluster. By establishing the multi-dimensional function model, selecting centroids in it for clustering, and thereby classifying the data blocks, the data blocks can be screened quickly across multiple dimensions. This achieves the technical effect of improving the efficiency of measuring data blocks and improving the consistency and generality of data block processing, and solves the technical problem in the related art that management methods for data storage blocks measure only limited features of the data blocks, resulting in poor consistency and generality of data block processing.

The above multi-dimensional function model can be d=f(x, y, z...), where x, y, z... represent the multi-dimensional indices that affect the value of the function, that is, its independent variables. The multi-dimensional indices can be the conditions or indicators, in multiple dimensions, that apply when a data block is used, that is, the multiple conditions for filtering data blocks. For example, when writing a clean data block or reclaiming a dirty data block, three factors need to be considered: the number of valid pages of the data block; the wear degree of the data block, that is, its number of program/erase cycles; and the heat conversion value of the data block. Let M be the difference between the initial programming time of the data block and the current time; the larger the difference, the smaller the heat. To prevent the locality principle from causing data that has just been moved to be rewritten soon afterwards, a block with lower heat is usually selected for garbage collection. With N set to a fixed constant, the heat value equals N-M.

In some disclosed embodiments, when writing a clean data block, a block with 0 valid pages, a low degree of wear, and a heat of 0 (an empty block) is preferentially selected. When reclaiming dirty data blocks, a block with fewer valid pages, less wear, and lower heat is selected. Both the writing process and the reclaiming process involve filtering a large number of data blocks.

The multi-dimensional space corresponding to the above multi-dimensional function can be a space established by using the index of each dimension as a coordinate axis. The feature values of a data block on the multi-dimensional indices are used as its coordinate values on the corresponding axes, so that the multiple data blocks are represented as points in the multi-dimensional space.
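The three indices described above can be sketched as a small feature-extraction routine. This is only an illustration: the names `Block`, `heat_value`, `features`, and the value chosen for the constant N are hypothetical and do not come from the disclosure, and timestamps are assumed to be plain integers. How an erased block comes to have heat 0 is not specified in the text and is left out here.

```python
from dataclasses import dataclass

# Hypothetical value for the fixed constant N; chosen only for illustration.
HEAT_CEILING_N = 1_000


@dataclass
class Block:
    valid_pages: int         # x: number of valid pages
    erase_count: int         # y: wear, counted as program/erase cycles
    first_program_time: int  # time of the block's initial programming


def heat_value(block: Block, now: int, n: int = HEAT_CEILING_N) -> int:
    """Return z = N - M, where M is the time elapsed since initial programming."""
    m = now - block.first_program_time
    return n - m


def features(block: Block, now: int) -> tuple:
    """Map a block to its point (x, y, z) in the three-dimensional index space."""
    return (block.valid_pages, block.erase_count, heat_value(block, now))
```

A block programmed long ago gets a large M and therefore a small heat value, which matches the statement that a larger difference means less heat.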

In a multi-dimensional space, the number of the above-mentioned centroids is one more than the dimension of the space. When the multi-dimensional space is a three-dimensional space, the number of centroids is 4, namely: (1) (x=0, y=0, z=0); (2) (x=maximum, y=0, z=0); (3) (x=0, y=maximum, z=0); (4) (x=0, y=0, z=maximum). Here, x=maximum means the maximum value, over the multiple data blocks, of the index corresponding to the x-axis; similarly, y=maximum means the maximum value of the index corresponding to the y-axis, and z=maximum means the maximum value of the index corresponding to the z-axis.

By analogy, the five centroids in a four-dimensional space are: (1) (w=0, x=0, y=0, z=0); (2) (w=maximum, x=0, y=0, z=0); (3) (w=0, x=maximum, y=0, z=0); (4) (w=0, x=0, y=maximum, z=0); (5) (w=0, x=0, y=0, z=maximum). The centroid coordinates for spaces of other dimensions follow the same pattern: the centroids are the coordinate origin of the multi-dimensional space together with, on each coordinate axis, the point at the maximum value of the multiple data blocks on that axis. The centroids lie on the coordinate axes of the multi-dimensional space.

The above-mentioned clustering algorithm may be a K-means clustering algorithm. Through the clustering algorithm, the multiple data blocks can be clustered into data block clusters, that is, sets of data blocks, corresponding to the multiple centroids. The data blocks are thereby classified, and the data block cluster whose centroid's multi-dimensional index satisfies the multiple conditions for screening data blocks is determined as the target block cluster.

By establishing the multi-dimensional function model, selecting centroids in it for clustering, and thereby classifying the data blocks, the data blocks can be screened quickly across multiple dimensions. This achieves the technical effect of improving the efficiency of measuring data blocks and improving the consistency and generality of data block processing, and solves the technical problem in the related art that management methods for data storage blocks measure only limited features of the data blocks, resulting in poor consistency and generality of data block processing.
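The construction of the initial centroids for a space of any dimension can be sketched as follows. The function name `initial_centroids` is hypothetical, and blocks are assumed to be given as equal-length tuples of non-negative feature values.

```python
def initial_centroids(points):
    """Return the origin plus one centroid per axis, placed at that axis's maximum."""
    dims = len(points[0])
    # Maximum value of each index over all data blocks.
    maxima = [max(p[d] for p in points) for d in range(dims)]
    centroids = [tuple(0 for _ in range(dims))]  # centroid (1): the origin
    for d in range(dims):
        coords = [0] * dims
        coords[d] = maxima[d]  # non-zero only on axis d
        centroids.append(tuple(coords))
    return centroids
```

For n indices this yields n+1 centroids, so a three-dimensional space produces 4 and a four-dimensional space produces 5.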

In some disclosed embodiments, determining the multi-dimensional space corresponding to the multi-dimensional function model and the multiple centroids of the multi-dimensional space includes: determining the number of centroids according to the number of multi-dimensional indices of the multi-dimensional function model, wherein the number of centroids is one more than the number of multi-dimensional indices; and determining the coordinates of that number of centroids in the multi-dimensional space, wherein each centroid is a range endpoint of the multiple data blocks in the multi-dimensional space and lies on a coordinate axis of the multi-dimensional space.

When determining the initial centroids, not only the number of centroids but also the maximum values of the multiple data blocks on the coordinate axes must be determined, so as to fix the initial centroid coordinates. The range endpoint can be the maximum or the minimum of the coordinate range, depending mainly on the coordinate system and on how the data blocks are distributed in it. In this embodiment, when writing a clean data block or reclaiming a dirty data block, the indicators of a data block are all non-negative, so the range endpoint may be the maximum value of the coordinate range on each axis.

In some disclosed embodiments, clustering the multiple data blocks by a clustering algorithm according to the multiple centroids, to obtain multiple block clusters corresponding to the multiple centroids, includes: determining the coordinates of the multiple data blocks in the multi-dimensional space; weighting the coordinates of the multiple data blocks; calculating, from the weighted coordinates, the Euclidean distance between each data block and each of the multiple centroids; clustering the multiple data blocks using the size of the Euclidean distance as the clustering condition; and obtaining the multiple block clusters corresponding to the multiple centroids.

The conventional K-means clustering algorithm usually uses the Euclidean distance from a target point to a centroid as the condition for deciding cluster membership. In this embodiment, the Euclidean distance between the weighted coordinates of a data block and the coordinates of a centroid is used as the measure of whether the data block can be placed in that centroid's data block cluster. For example, in the above three-dimensional space, after the coordinates (x, y, z) are weighted, the logical coordinates actually used by the algorithm are (ax, by, cz), where a, b, and c are the weights of the corresponding features and a+b+c=1. By adjusting the feature weights, all data blocks can be reasonably distributed among the sets corresponding to the 4 centroids.
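A minimal sketch of the weighted distance and the resulting nearest-centroid assignment is given below. The text does not say whether the centroid coordinates are weighted as well; this sketch assumes they are, so each per-axis difference is scaled by its weight. The function names are hypothetical.

```python
import math


def weighted_distance(point, centroid, weights):
    """Euclidean distance after scaling each axis by its feature weight.

    Assumption: the same weights apply to the block and to the centroid,
    so the distance is taken between (a*x, b*y, c*z) style coordinates.
    """
    return math.sqrt(
        sum((w * (p - c)) ** 2 for p, c, w in zip(point, centroid, weights))
    )


def nearest_centroid(point, centroids, weights):
    """Index of the centroid closest to the point in weighted coordinates."""
    return min(
        range(len(centroids)),
        key=lambda i: weighted_distance(point, centroids[i], weights),
    )
```

Raising one weight stretches the space along that axis, which makes differences in that feature count for more when blocks are assigned to clusters.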

In some disclosed embodiments, selecting the block cluster corresponding to the centroid whose multi-dimensional index satisfies the multiple conditions as the target block cluster includes: determining, according to the multiple conditions, the target multi-dimensional index that satisfies the multiple conditions; determining, according to the target multi-dimensional index, the target centroid whose coordinates correspond to the target multi-dimensional index; and taking the block cluster corresponding to the target centroid as the target block cluster.

For example, when writing a clean data block, a block with 0 valid pages, a low degree of wear, and a heat of 0, that is, an empty block, is preferentially selected. When reclaiming dirty data blocks, blocks with fewer valid pages, less wear, and lower heat are selected. Therefore, both when writing clean data blocks and when reclaiming dirty data blocks, centroid (1) (x=0, y=0, z=0) is selected as the target centroid: the blocks contained in the block cluster it represents satisfy the conditions better than the blocks in the block clusters represented by the other centroids. The data block cluster corresponding to centroid (1) is therefore the target data block cluster.
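The selection step can be sketched as a lookup of the centroid that corresponds to the target multi-dimensional index. The text only says the target centroid's coordinates "correspond to" the target index; interpreting that as "closest to" (in plain, unweighted Euclidean distance) is an assumption of this sketch, as is the function name.

```python
import math


def select_target_cluster(centroids, clusters, target_index):
    """Return the cluster whose centroid lies closest to the target index.

    centroids[i] is the centroid of clusters[i]; target_index is the point
    that best satisfies the screening conditions, e.g. (0, 0, 0).
    """
    best = min(
        range(len(centroids)),
        key=lambda i: math.dist(centroids[i], target_index),
    )
    return clusters[best]
```

With the initial centroids and a target index of (0, 0, 0), this returns the cluster of centroid (1), as in the example above.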

In some disclosed embodiments, after selecting the block cluster corresponding to the centroid whose multi-dimensional index satisfies the multiple conditions as the target block cluster, the method further includes: determining an actual centroid according to the coordinates of the data blocks in the target block cluster; and performing clustering operations on subsequent data blocks according to the actual centroid.

The actual centroid is recalculated from the coordinates of the points corresponding to the data blocks in the target data block cluster; it can be obtained by averaging the coordinates of all data blocks in the cluster. Once the target data block cluster and its actual centroid have been obtained, data block management for the rest of the life cycle of the solid-state storage device takes the actual centroid as its starting point. In contrast to the actual centroid, the centroids determined above for the multi-dimensional space of the multi-dimensional function model are the initial centroids. The initial centroids all lie on the coordinate axes of the multi-dimensional space, whereas the actual centroid may not. For example, the next time data blocks are filtered, the actual centroid can be used as the origin of the multi-dimensional coordinate system for that filtering. It should be noted that the centroids and the target block cluster can be updated iteratively with each round of data block screening, to ensure validity and accuracy for the next use.

In some disclosed embodiments, each of the above conditions corresponding to the multi-dimensional indices is a screening condition that meets the requirements of a data processing mode; the data processing mode includes at least one of the following: performing a write operation on the multiple data blocks; performing a reclaim operation on the multiple data blocks.
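The recalculation of the actual centroid by averaging can be sketched as below. The function name `actual_centroid` is hypothetical, and raising an error on an empty cluster is a choice made for this sketch, not something the text prescribes.

```python
def actual_centroid(cluster_points):
    """Mean of the member coordinates, taken axis by axis."""
    n = len(cluster_points)
    if n == 0:
        raise ValueError("cannot take the centroid of an empty cluster")
    # zip(*points) groups the values of each axis together.
    return tuple(sum(axis) / n for axis in zip(*cluster_points))
```

The result generally has several non-zero coordinates, which is why the actual centroid need not lie on a coordinate axis.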

It should be noted that the embodiments of the present application further provide an implementation manner, which will be described in detail below.

During normal use of the solid-state storage device, a number of centroids are selected as cluster centers in an unsupervised learning manner, and all blocks are automatically aggregated into several classes (clusters). These centroid-centered NAND Flash block clusters represent the sets of blocks to be used in the solid-state storage device.

The purpose of this embodiment is to solve the problem that the existing general algorithms cannot take into account wear leveling and writing efficiency, and provide a general block management method that takes into account multiple target characteristics.

For the sample set D = {x_1, x_2, ..., x_m}, the K-means algorithm minimizes the squared error of the cluster partition C = {C_1, C_2, ..., C_k}:

$$E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert_2^2$$

where

$$\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$$

is the mean vector of cluster C_i. The formula describes how closely the samples in each cluster gather around the cluster mean vector. The smaller the value of E, the higher the similarity of the samples within each cluster.
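As a small numerical illustration of the squared error E, the following sketch evaluates it for a given partition. The function name `squared_error` is hypothetical; clusters are assumed to be lists of equal-length coordinate tuples, and empty clusters are skipped as contributing nothing.

```python
def squared_error(clusters):
    """Sum over clusters of squared distances from each sample to the cluster mean."""
    total = 0.0
    for members in clusters:
        if not members:
            continue  # an empty cluster contributes nothing to E
        n = len(members)
        mu = [sum(axis) / n for axis in zip(*members)]  # mean vector of the cluster
        total += sum((v - m) ** 2 for x in members for v, m in zip(x, mu))
    return total
```

Grouping the two nearby samples (0, 0) and (2, 0) apart from (5, 5) gives a much smaller E than placing all three in one cluster, which is the sense in which a smaller E means tighter clusters.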

The purpose of this embodiment is to divide the samples into several categories (clusters) according to the similarity of comprehensive features, and select a category with better comprehensive features as the target during writing and garbage collection.

FIG. 3 is a flowchart of a method for managing storage data blocks according to an embodiment of the present disclosure. As shown in FIG. 3, first, a multi-dimensional function model d=f(x, y, z...) is established, where x, y, z, etc. are discrete feature values of a NAND Flash block. For example, x is the number of valid pages of the block; y is the wear degree of the block, including the number of program and/or erase operations; and z is the heat conversion value of the block. Let M be the difference between the initial programming time of the block and the current time; the larger the difference, the smaller the heat. To prevent the locality principle from causing data that has just been moved to be rewritten soon afterwards, a block with lower heat is usually selected for garbage collection. With N set to a fixed constant, z=N-M.

The principles of block usage are: (1) when writing, select a block with 0 valid pages, less wear, and a heat of 0, that is, an empty block; (2) when performing garbage collection, select a block with fewer valid pages, less wear, and lower heat.

All NAND Flash blocks in a solid-state storage device can be regarded as discrete points in a multi-dimensional space, such as a three-dimensional space, and these points have characteristic values such as x, y, and z.

After the multi-dimensional space is determined, the initial centroids are selected as the cluster centers. The number of centroids equals the model dimension plus 1; in a three-dimensional model, the number of centroids is K=4. The coordinates of the four initial centroids are: (1) (x=0, y=0, z=0); (2) (x=max, y=0, z=0); (3) (x=0, y=max, z=0); (4) (x=0, y=0, z=max). That is, they are 4 points on the coordinate axes of the three-dimensional coordinate system, where max refers to the maximum value of the corresponding feature over all blocks.

The conventional K-means clustering algorithm usually uses the Euclidean distance from a target point to a centroid as the condition for deciding cluster membership, whereas the present disclosure uses the weighted Euclidean distance. In the three-dimensional space, after the coordinates (x, y, z) are weighted, the logical coordinates actually used by the algorithm are (ax, by, cz), where a, b, and c are the weights of the corresponding features and a+b+c=1. By adjusting the feature weights, all blocks can be reasonably distributed among the 4 sets.

After the initial centroids are determined, the data must be allocated. As described above, K-means clustering is performed using the Euclidean distance from the weighted logical coordinates to the initial centroids as the clustering condition. After the K-means clustering is completed, 4 NAND Flash block clusters (sets) are obtained.

After the initial clusters are obtained, the initial centroids are no longer needed. Each centroid is then recalculated from the coordinates of the elements in its cluster, as the average of the coordinates of all cluster elements. Once the initial clusters and their centroids have been obtained, block management for the rest of the life cycle of the solid-state storage device takes them as its starting point.

Among them, the centroid derived from coordinates (1) (x=0, y=0, z=0) and the cluster obtained by clustering around it form the target cluster from which blocks are selected during writing and garbage collection. This cluster is the set of optimal solutions obtained after all features are considered together. Selecting blocks from this cluster for writing and garbage collection allows the solid-state storage device to reach a relatively good wear leveling state while also taking read and write performance into account. When block elements are added to or removed from a cluster, they join or leave the corresponding cluster automatically in the manner of unsupervised learning, and the position of the centroid is adjusted in real time.
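Putting the pieces together, the clustering stage can be sketched as an ordinary K-means loop seeded with the axis centroids and run on the weighted logical coordinates. This is a sketch under assumptions rather than the disclosed implementation: the function name `kmeans_blocks`, the iteration cap, the choice to build the initial centroids in weighted coordinates, and the rule that an empty cluster keeps its previous centroid are all introduced here.

```python
import math


def kmeans_blocks(points, weights, max_iters=100):
    """Cluster blocks into dims+1 clusters; return (assignment, centroids)."""
    dims = len(points[0])
    # Weighted logical coordinates, e.g. (a*x, b*y, c*z).
    logical = [tuple(w * v for w, v in zip(weights, p)) for p in points]
    # Initial centroids: the origin plus the maximum on each axis.
    centroids = [(0.0,) * dims]
    for d in range(dims):
        top = max(p[d] for p in logical)
        centroids.append(tuple(top if i == d else 0.0 for i in range(dims)))
    assignment = None
    for _ in range(max_iters):
        # Assignment step: each block goes to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in logical
        ]
        if new_assignment == assignment:
            break  # converged: no block changed cluster
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(logical, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return assignment, centroids
```

Cluster index 0 is the one seeded at the origin, so under this sketch `assignment[i] == 0` marks block `i` as belonging to the cluster that the text treats as the target for writing and garbage collection.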
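The text does not say how the centroid is adjusted when a block joins or leaves a cluster. One common way to do this without rescanning every member is a running mean, sketched below; the class name `Cluster` and its methods are hypothetical, and the caller is assumed to pass only points that really are members when removing.

```python
class Cluster:
    """Running-mean centroid for a cluster whose membership changes over time."""

    def __init__(self, dims):
        self.count = 0
        self.centroid = [0.0] * dims

    def add(self, point):
        """Fold a new member into the mean in O(dims) time."""
        self.count += 1
        for i, v in enumerate(point):
            self.centroid[i] += (v - self.centroid[i]) / self.count

    def remove(self, point):
        """Take a member out of the mean; membership is not verified here."""
        if self.count == 0:
            raise ValueError("cluster is empty")
        if self.count == 1:
            self.count = 0
            self.centroid = [0.0] * len(self.centroid)
            return
        for i, v in enumerate(point):
            self.centroid[i] = (self.centroid[i] * self.count - v) / (self.count - 1)
        self.count -= 1
```

Each update touches only the block that moved, which suits a controller that must keep centroids current as blocks are written and reclaimed.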

The clusters obtained based on the coordinates (2) (3) (4) and their subsequent centroids are a collection of blocks with poor eigenvalues, which are all poor solutions for the target operation of the storage device.

FIG. 2 is a schematic diagram of clustering in a multi-dimensional space according to an embodiment of the present disclosure. As shown in FIG. 2 , the block represented by the coordinates of the circle near the centroid (1) (x=0, y=0, z=0) is located Collection, which is the target block collection for storage device writing and garbage collection.

FIG. 4 is a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 4, according to another aspect of the embodiment of the present disclosure, a data processing apparatus is further provided, including: a establishing module 42, a determining module 44. The clustering module 46 and the selection module 48 are described in detail below.

The establishment module 42 is configured to establish a multi-dimensional function model, wherein the multi-dimensional function model includes multi-dimensional indices, and the multi-dimensional indices respectively correspond to a plurality of conditions for screening data blocks. The determination module 44 is connected to the establishment module 42 and is configured to determine the multi-dimensional space corresponding to the multi-dimensional function model and a plurality of centroids of the multi-dimensional space. The clustering module 46 is connected to the determination module 44 and is configured to cluster a plurality of data blocks through a clustering algorithm according to the plurality of centroids, to obtain a plurality of block clusters corresponding to the plurality of centroids. The selection module 48 is connected to the clustering module 46 and is configured to select the block cluster corresponding to the centroid whose multi-dimensional index satisfies the plurality of conditions as the target block cluster.

With the above apparatus, the establishment module 42 establishes a multi-dimensional function model, wherein the multi-dimensional function model includes multi-dimensional indices, and the multi-dimensional indices respectively correspond to a plurality of conditions for screening data blocks; the determination module 44 determines the multi-dimensional space corresponding to the multi-dimensional function model and a plurality of centroids of the multi-dimensional space; the clustering module 46 clusters a plurality of data blocks through a clustering algorithm according to the plurality of centroids, to obtain a plurality of block clusters corresponding to the plurality of centroids; and the selection module 48 selects the block cluster corresponding to the centroid whose multi-dimensional index satisfies the plurality of conditions as the target block cluster. By establishing the multi-dimensional function model, selecting centroids in the model for clustering, and classifying the plurality of data blocks, the data blocks can be screened quickly across multiple dimensions. This achieves the technical effect of "improving the measurement efficiency of data blocks, and improving the consistency and versatility of data block processing", and thereby solves the technical problem in the related art that the manner of managing data storage blocks limits the characteristic measurement of data blocks, resulting in poor consistency and universality of data block processing.
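The cooperation of the four modules can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation of the disclosure: the axes are assumed to be normalized to the range [0, 1], the centroids are assumed to be the origin plus one endpoint on each axis (one more centroid than there are indices), the assignment of axes to valid-page count, wear degree and hotness is assumed, and the weights, block names and coordinates are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def initial_centroids(dims: int) -> List[Point]:
    """Origin plus one endpoint per axis, giving dims + 1 centroids."""
    origin = tuple(0.0 for _ in range(dims))
    endpoints = [
        tuple(1.0 if i == axis else 0.0 for i in range(dims))
        for axis in range(dims)
    ]
    return [origin] + endpoints


def apply_weights(point: Sequence[float], weights: Sequence[float]) -> Point:
    """Scale each coordinate of a block by the weight of its axis."""
    return tuple(w * v for w, v in zip(weights, point))


def cluster_blocks(
    blocks: Dict[str, Point],
    centroids: Sequence[Point],
    weights: Sequence[float],
) -> Dict[int, List[str]]:
    """Assign every block to the centroid at the smallest Euclidean distance."""
    clusters: Dict[int, List[str]] = {i: [] for i in range(len(centroids))}
    for name, coords in blocks.items():
        p = apply_weights(coords, weights)
        nearest = min(
            range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
        )
        clusters[nearest].append(name)
    return clusters


# Assumed axes: x = valid pages, y = wear degree, z = hotness (all normalized).
blocks = {
    "blk0": (0.1, 0.1, 0.0),
    "blk1": (0.9, 0.2, 0.1),
    "blk2": (0.1, 0.8, 0.2),
    "blk3": (0.2, 0.1, 0.9),
    "blk4": (0.2, 0.0, 0.1),
}
weights = (1.0, 1.0, 1.0)  # placeholder weights, not taken from the disclosure

centroids = initial_centroids(dims=3)
clusters = cluster_blocks(blocks, centroids, weights)

# Index 0 is the origin centroid, whose indices satisfy all screening conditions.
target_cluster = clusters[0]
```

With these sample values, the blocks with few valid pages, low wear and low hotness fall into the cluster of the origin centroid, which is then taken as the target block cluster for writing and garbage collection.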

According to another aspect of the embodiments of the present disclosure, a computer storage medium is also provided, and the computer storage medium includes a stored program, wherein when the program is executed, a device where the computer storage medium is located is controlled to execute any one of the data processing methods described above.

According to another aspect of the embodiments of the present disclosure, a processor is also provided, and the processor is configured to run a program, wherein when the program runs, any one of the data processing methods described above is executed.

The above-mentioned serial numbers of the embodiments of the present disclosure are only for description, and do not represent the advantages or disadvantages of the embodiments.

In the above-mentioned embodiments of the present disclosure, the description of each embodiment has its own emphasis. For parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the several embodiments provided in the present disclosure, it should be understood that the disclosed technical content may be implemented in other manners. The apparatus embodiments described above are only illustrative. For example, the division of the units may be a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection between units or modules through some interfaces, and may be in electrical or other forms.

The units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.

The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present disclosure in essence, or the part that contributes to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media that can store program code.

The above are only preferred embodiments of the present disclosure. It should be pointed out that those skilled in the art can make several improvements and modifications without departing from the principles of the present disclosure, and such improvements and modifications should also be regarded as falling within the protection scope of the present disclosure.

Claims

A data processing method comprising:

establishing a multi-dimensional function model, wherein the multi-dimensional function model includes a multi-dimensional index, and the multi-dimensional index corresponds to a plurality of conditions for screening data blocks;

determining a multidimensional space corresponding to the multidimensional function model, and a plurality of centroids of the multidimensional space;

clustering, according to the plurality of centroids, a plurality of data blocks through a clustering algorithm to obtain a plurality of block clusters corresponding to the plurality of centroids;

selecting the block cluster corresponding to the centroid whose multi-dimensional index satisfies the plurality of conditions as the target block cluster.
The method according to claim 1, wherein determining a multi-dimensional space corresponding to the multi-dimensional function model and a plurality of centroids of the multi-dimensional space comprises:

determining the number of the plurality of centroids according to the number of the multidimensional indices of the multidimensional function model, wherein the number of the centroids is one more than the number of the multidimensional indices;

determining the coordinates, in the multi-dimensional space, of that number of centroids, wherein the centroids are range endpoints of a plurality of data blocks in the multi-dimensional space, and the centroids are located on the coordinate axes of the multi-dimensional space.
The method according to claim 2, wherein clustering, according to the plurality of centroids, a plurality of data blocks by a clustering algorithm to obtain a plurality of block clusters corresponding to the plurality of centroids comprises:

determining the coordinates of the plurality of data blocks in the multidimensional space;

weighting the coordinates of the plurality of data blocks;

calculating, according to the weighted coordinates of the plurality of data blocks, the Euclidean distances of the plurality of data blocks relative to the plurality of centroids;

clustering the plurality of data blocks, using the magnitude of the Euclidean distance as a clustering condition, to obtain the plurality of block clusters corresponding to the plurality of centroids.
The method according to claim 3, wherein selecting the block cluster corresponding to the centroid whose multi-dimensional index satisfies the plurality of conditions as the target block cluster comprises:

determining a target multidimensional index satisfying the plurality of conditions according to a plurality of conditions;

determining, according to the target multi-dimensional index, the target centroid whose coordinates correspond to the target multi-dimensional index;

taking the block cluster corresponding to the target centroid as the target block cluster.
The method according to claim 1, wherein after selecting the block cluster corresponding to the centroid whose multidimensional index satisfies the multiple conditions as the target block cluster, the method further comprises:

determining an actual centroid according to the coordinates of the data blocks in the target block cluster;

performing the clustering operation on subsequent data blocks according to the actual centroid.
The method according to claim 1, wherein the condition is a screening condition that meets the requirements of a data processing method; the data processing method includes at least one of the following:

performing a write operation on the plurality of data blocks;

performing a reclamation operation on the plurality of data blocks.
The method of claim 6, wherein the plurality of screening conditions include at least one of the following:

the number of valid pages of the data block is 0 or is less than a preset number;

the wear degree of the data block is less than a preset wear degree;

the hotness of the data block is 0 or is less than a preset hotness.
A data processing device, comprising:

an establishment module, configured to establish a multi-dimensional function model, wherein the multi-dimensional function model includes a multi-dimensional index, and the multi-dimensional index corresponds to a plurality of conditions for screening data blocks;

a determination module, configured to determine a multi-dimensional space corresponding to the multi-dimensional function model, and a plurality of centroids of the multi-dimensional space;

a clustering module, configured to cluster a plurality of data blocks through a clustering algorithm according to the plurality of centroids to obtain a plurality of block clusters corresponding to the plurality of centroids;

a selection module, configured to select the block cluster corresponding to the centroid whose multi-dimensional index satisfies the plurality of conditions as the target block cluster.
A computer storage medium, the computer storage medium comprising a stored program, wherein when the program is executed, a device where the computer storage medium is located is controlled to execute the data processing method according to any one of claims 1 to 7.
A processor, wherein the processor is configured to run a program, wherein when the program runs, the data processing method according to any one of claims 1 to 7 is executed.