CN115470190A - Multi-storage-pool data classification storage method and system and electronic equipment - Google Patents

Multi-storage-pool data classification storage method and system and electronic equipment

Publication number
CN115470190A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210908596.XA
Other languages
Chinese (zh)
Inventor
李贵斌
吴学含
张翼
薛强
李家伟
蔡维珑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Application filed by Tianyi Cloud Technology Co Ltd filed Critical Tianyi Cloud Technology Co Ltd
Priority to CN202210908596.XA priority Critical patent/CN115470190A/en
Publication of CN115470190A publication Critical patent/CN115470190A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • G06F16/1824Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

The specification discloses a method, a system and an electronic device for classified storage of data in multiple storage pools, which aim to solve the problems of low flexibility and reliance on a single decision index in existing methods. The method comprises the following steps: performing feature extraction on data to be stored, and determining data feature information of the data to be stored; performing data classification prediction on the data to be stored according to the data feature information to determine the data type of the data to be stored; and classifying and storing the data to be stored into a storage pool corresponding to the data type. The system comprises a feature extraction module, a data classification module and a distribution storage module. The electronic device comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor implements the multi-storage-pool data classification storage method when executing the program.

Description

Multi-storage-pool data classification storage method and system and electronic equipment
Technical Field
The present disclosure relates to the field of data storage technologies, and in particular, to a method, a system, and an electronic device for classified storage of data in multiple storage pools.
Background
Cloud storage can coordinate a large number of heterogeneous storage devices of different types in a network through technologies such as software definition and distributed clustering, and provide a high-capacity, high-performance, low-cost and highly available data storage service to external users. In a heterogeneous cloud storage environment, a large gap exists between the performance and the purchase cost of a Solid State Disk (SSD) and a mechanical hard disk (HDD), and the SSD and the HDD are usually used for storing different types of data. With the rapid development of data applications, the data types of different applications differ greatly, so that their I/O patterns are also markedly heterogeneous.
In order to ensure the access performance of data and reduce the hardware construction cost, the cloud storage service needs to be able to distinguish the types of client application data and store different types of data into the most appropriate data pool. Currently, a cloud storage service generally establishes, directly for a specific application, access rules that bind that application's data to a specific type of hardware, so as to achieve classified storage of the data. This approach lacks flexibility: the various access rules need to be set manually, the heterogeneity of data I/O patterns is ignored, and the resulting data storage cannot reasonably meet the requirements of multiple applications. Moreover, the access rules are often constructed with only cost or availability in mind, the multi-dimensional requirements of the user are not comprehensively considered, and the user experience is poor.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a method, a system, and an electronic device for classified storage of data in multiple storage pools, so as to solve the problems of low flexibility and single decision index of the existing method.
In a first aspect, an embodiment of the present specification provides a method for classified storage of data in multiple storage pools, including:
performing feature extraction on data to be stored, and determining data feature information of the data to be stored;
performing data classification prediction on the data to be stored according to the data characteristic information to determine the data type of the data to be stored;
and classifying and storing the data to be stored into a storage pool corresponding to the data type.
Optionally, the data characteristic information includes a data size, a data request type, a life cycle, creation time, last access time, and last modification time of the data to be stored;
the data types include a hot read and hot write type, a hot read and cold write type, a cold read and hot write type, and a cold read and cold write type.
Optionally, the performing data classification prediction on the data to be stored according to the data feature information includes:
taking the data characteristic information as input, processing the input by utilizing a plurality of pre-trained class predictors, and outputting corresponding class probabilities, wherein the plurality of class predictors respectively correspond to a plurality of data types;
and selecting the classification predictor of which the class probability exceeds a preset threshold value, and determining the data type corresponding to the classification predictor as the data type of the data to be stored.
Optionally, the method for processing the input by using the classification predictor includes:
calculating the class probability corresponding to the input using an expected class probability formula;
the expected class probability formula is:
$$P = \frac{1}{1 + e^{-(\alpha + \beta x)}}$$
wherein P represents the class probability, α and β represent regression parameters of the classification predictor, and x represents the data characteristic information.
Optionally, the training method of the class predictor includes:
constructing a training data set, wherein the training data set comprises a plurality of items of sample characteristic data and type labels corresponding to the sample characteristic data;
and taking the plurality of items of sample characteristic data as the input of the classification predictor and the corresponding type labels as the expected output, training the regression parameters of the classification predictor by minimizing a loss function to determine the optimal parameters.
Optionally, the training of the regression parameters of the classification predictor by minimizing a loss function includes:
solving a minimum value of the loss function by adopting maximum likelihood estimation, and taking the regression parameters of the classification predictor when the loss function takes the minimum value as the optimal parameters;
and when the minimum value of the loss function is solved by adopting maximum likelihood estimation, the regression parameters are adjusted by adopting a stochastic gradient ascent method.
Optionally, the classifying and storing the data to be stored into the storage pool corresponding to the data type includes:
in response to determining that the data type of the data to be stored is the hot-read hot-write type, distributing and writing the data to be stored into a hot data pool, wherein the hot data pool is composed of solid state disks;
in response to determining that the data type of the data to be stored is the hot-read cold-write type, distributing and writing the data to be stored into a hot-read data pool, wherein the hot-read data pool comprises a solid state disk and a mechanical hard disk, and the solid state disk is used as a read cache in the hot-read data pool;
in response to determining that the data type of the data to be stored is the cold-read hot-write type, distributing and writing the data to be stored into a hot-write data pool, wherein the hot-write data pool comprises a solid state disk and a mechanical hard disk, and the solid state disk is used as a write cache in the hot-write data pool;
and in response to determining that the data type of the data to be stored is the cold-read cold-write type, distributing and writing the data to be stored into a cold data pool, wherein the cold data pool is composed of mechanical hard disks.
Optionally, when the data to be stored is categorized and stored in the storage pool corresponding to the data type, the method further includes:
and storing the data to be stored in a multi-copy mode.
In a second aspect, the present specification further provides a multi-storage pool data classification storage system, the system including:
the characteristic extraction module is used for extracting the characteristics of the data to be stored and determining the data characteristic information of the data to be stored;
the data classification module is used for carrying out data classification prediction on the data to be stored according to the data characteristic information so as to determine the data type of the data to be stored; and
the distribution storage module is used for classifying and storing the data to be stored into the storage pool corresponding to the data type.
In a third aspect, an embodiment of the present specification further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the method for classifying and storing data in multiple storage pools according to the first aspect.
As can be seen from the above, the method, the system, and the electronic device for storing data in multiple storage pools by classification provided by the embodiments of the present specification have the following beneficial technical effects:
The data characteristic information of the data to be stored is extracted, the data type of the data to be stored is determined based on the data characteristic information, and the data to be stored is written into the corresponding storage pool. In this way, the data characteristic attributes are comprehensively considered, and the data are classified according to the differing performance requirements of different types of application data. The data are then placed into a storage pool that matches their storage requirements according to the classification result, so that adaptive classification of data types is completed without manual intervention. This effectively addresses the low flexibility and single decision index of existing methods, distributes and stores user data in a reasonable manner, improves user data storage performance, and optimizes the user experience.
Drawings
The features and advantages of the present description will be more clearly understood by reference to the accompanying drawings, which are schematic and are not to be understood as limiting in any way, in which:
FIG. 1 is a schematic diagram illustrating a multi-storage pool data classification storage method according to one or more embodiments of the present disclosure;
FIG. 2 is a schematic diagram illustrating a method for data classification prediction in a multi-storage pool data classification storage method according to one or more embodiments of the present disclosure;
FIG. 3 is a schematic diagram illustrating a method for training a class predictor in a multiple storage pool data classification storage method according to one or more embodiments of the present disclosure;
FIG. 4 is a schematic diagram illustrating an architecture of a multiple storage pool data classification storage system according to one or more embodiments of the present disclosure;
FIG. 5 is a schematic diagram illustrating an electronic device architecture for multi-storage pool data classified storage according to one or more embodiments of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments obtained by a person skilled in the art without making any inventive step based on the embodiments in this description belong to the protection scope of this description.
Cloud storage can coordinate a large number of heterogeneous storage devices of different types in a network through technologies such as software definition and distributed clustering, and provide a high-capacity, high-performance, low-cost and highly available data storage service to external users. In a heterogeneous cloud storage environment, a large gap exists between the performance and the purchase cost of a Solid State Disk (SSD) and a mechanical hard disk (HDD), and the SSD and the HDD are generally used for storing different types of data. With the rapid development of data applications, the data types of different applications differ greatly, so that their I/O patterns are also markedly heterogeneous.
In order to ensure the access performance of data and reduce the hardware construction cost, the cloud storage service needs to be able to distinguish the types of client application data and store different types of data into the most appropriate data pool. Currently, a cloud storage service generally establishes, directly for a specific application, access rules that bind that application's data to a specific type of hardware, so as to achieve classified storage of the data. This approach lacks flexibility: the various access rules need to be set manually, the heterogeneity of data I/O patterns is ignored, and the resulting data storage cannot reasonably meet the requirements of multiple applications. Moreover, the access rules are often constructed with only cost or availability in mind, the multi-dimensional requirements of the user are not comprehensively considered, and the user experience is poor.
In view of the above problems, an object of the present specification is to provide a method for classifying and storing data in multiple storage pools, which includes integrating heterogeneous storage devices to construct multiple data storage pools of different types, classifying data characteristics of application data of different types, and placing the data in different storage pools according to data classification results, so as to flexibly and reasonably perform adaptive classified storage on the application data.
In one aspect, the present specification provides a method for classified storage of data in multiple storage pools.
As shown in fig. 1, one or more alternative embodiments of the present specification provide a method for classifying and storing data in multiple storage pools, including:
S1: Performing feature extraction on the data to be stored, and determining data feature information of the data to be stored.
A storage request related to the data to be stored can be acquired, and the data characteristic information of the data to be stored is extracted and determined from the storage request; or, the feature extraction may be performed by directly scanning the data to be stored.
The extracted and determined data characteristic information may include static characteristic information of the data to be stored, such as data size, data request type, data keyword information, data-related user information, and the like; dynamic characteristic information of the data to be stored, such as data creation time, data life cycle, data access time and the like, can also be included.
In some preferred embodiments, the data characteristic information may include the data size $S$, operation type $T_y$, life cycle $D$, creation time $C_t$, last access time $A_t$, and last modification time $M_t$ of the data to be stored. This data characteristic information can comprehensively reflect the characteristic attributes of the data to be stored at both the static level and the dynamic level.
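As a minimal sketch of step S1, the six features named above could be packed into a numeric vector as follows. The `StorageRequest` class, its field names, the units (bytes, days, Unix seconds), and the numeric encoding of the operation type are all assumptions made for illustration; the specification does not define them.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical numeric encoding of the operation type T_y.
# The specification does not say how T_y is encoded.
OPERATION_CODES = {"read": 0.0, "write": 1.0}


@dataclass
class StorageRequest:
    """Illustrative container for the metadata of one storage request."""
    size_bytes: int          # S: data size
    operation: str           # T_y: operation (request) type
    lifecycle_days: float    # D: life cycle
    created_at: float        # C_t: creation time (Unix seconds, assumed)
    last_access_at: float    # A_t: last access time
    last_modified_at: float  # M_t: last modification time


def extract_features(req: StorageRequest) -> List[float]:
    """Return the feature vector [S, T_y, D, C_t, A_t, M_t] for one request."""
    return [
        float(req.size_bytes),
        OPERATION_CODES[req.operation],
        float(req.lifecycle_days),
        float(req.created_at),
        float(req.last_access_at),
        float(req.last_modified_at),
    ]
```

In practice the raw values differ by many orders of magnitude, so some form of scaling before classification would likely be needed; that step is outside what the specification describes.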
S2: Performing data classification prediction on the data to be stored according to the data characteristic information to determine the data type of the data to be stored.
In some alternative embodiments, the data types may include a hot-read hot-write type, a hot-read cold-write type, a cold-read hot-write type, and a cold-read cold-write type. Hot-read hot-write data refers to data that needs to be both read and written frequently; hot-read cold-write data refers to data with a high read access frequency and a relatively low write access frequency; cold-read hot-write data refers to data with a low read access frequency and a relatively high write access frequency; and cold-read cold-write data refers to data with low read and write frequencies.
The data characteristic information represents a plurality of characteristic attributes of the data to be stored, and the data to be stored can be accurately classified according to it. The data classification algorithm used for classification prediction of the data to be stored can be a logistic regression algorithm, a linear discriminant analysis algorithm, a K-nearest-neighbor classification algorithm, a naive Bayes algorithm, a decision tree algorithm, a support vector machine algorithm, and the like. Alternatively, a deep learning algorithm may be adopted, in which a neural network analyzes and learns the data characteristic information to determine the data type of the data to be stored.
S3: Classifying and storing the data to be stored into a storage pool corresponding to the data type.
A plurality of storage pools corresponding to a plurality of data types can be constructed in advance, and after the data type of the data to be stored is determined, the data to be stored is classified and written into the corresponding storage pool. Wherein a plurality of the storage pools can match storage requirements of the corresponding data type data.
The data classification storage method for the multiple storage pools extracts the data characteristic information of the data to be stored, determines the data type of the data to be stored based on the data characteristic information, and writes the data to be stored into the corresponding storage pool. In such a way, the data characteristic attributes are comprehensively considered and classified according to the difference of different types of application data on performance requirements. And according to the data classification result, placing the data into a storage pool which can be matched with the storage requirement, and completing the self-adaptive classification of the data types without manual intervention. The problems of low flexibility and single decision index of the existing method can be effectively solved, the user data is scientifically and reasonably distributed and stored, the user data storage performance is improved, and the user experience is optimized.
As shown in fig. 2, in a multi-storage pool data classification storage method provided in one or more alternative embodiments of this specification, the performing data classification prediction on the data to be stored according to the data feature information includes:
S201: Taking the data characteristic information as input, processing the input by utilizing a plurality of pre-trained classification predictors, and outputting corresponding class probabilities.
In some optional embodiments, the data types are divided into four types, i.e., a hot-read hot-write type, a hot-read cold-write type, a cold-read hot-write type, and a cold-read cold-write type, and four classification predictors corresponding to the four data types may be set to process the data characteristic information of the data to be stored, respectively, so as to determine the class probabilities that the data to be stored belongs to each of the four data types.
S202: Selecting the classification predictor whose class probability exceeds a preset threshold value, and determining the data type corresponding to that classification predictor as the data type of the data to be stored.
The category probability is used for representing the probability that the data to be stored belongs to the corresponding data type. And for the data to be stored, if the class probability output by a certain class predictor is higher than the preset threshold, the data type of the data to be stored is the data type corresponding to the class predictor.
It is understood that the preset threshold is generally set to 0.5. Taking the class predictor corresponding to the hot-read hot-write type as an example, if the class probability output after this class predictor processes the data characteristic information of the data to be stored is higher than 0.5, the data type of the data to be stored is the hot-read hot-write type; otherwise, the data type is not the hot-read hot-write type, and in that case the data characteristic information continues to be processed by the other classification predictors.
The multi-storage pool data classification storage method utilizes a plurality of classification predictors trained in advance to process the data characteristic information of the data to be stored, can respectively obtain the class probability corresponding to a plurality of data types, and can accurately determine the data type of the data to be stored by comparing the class probability with a preset threshold value.
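Steps S201 and S202 can be sketched as a loop over one predictor per data type. The function name `classify`, the `Predictor` type alias, and the choice to return `None` when no predictor exceeds the threshold are assumptions for illustration; the specification does not state what happens when no class probability exceeds the threshold.

```python
from typing import Callable, Dict, Optional, Sequence

# A predictor maps a feature vector to a class probability in [0, 1].
Predictor = Callable[[Sequence[float]], float]


def classify(
    features: Sequence[float],
    predictors: Dict[str, Predictor],
    threshold: float = 0.5,
) -> Optional[str]:
    """Try each per-type predictor in turn and return the first data type
    whose class probability exceeds the threshold.

    Returning None when no predictor fires is an assumption of this sketch.
    """
    for data_type, predictor in predictors.items():
        if predictor(features) > threshold:
            return data_type
    return None
```

A caller would pass four trained predictors keyed by the four data type names. Ordering the dictionary determines which predictor is consulted first when more than one would exceed the threshold.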
In one or more alternative embodiments of the present specification, a method for classifying and storing data in multiple storage pools is provided, where the method for processing the input using the classification predictor includes:
calculating the class probability corresponding to the input using an expected class probability formula;
the expected class probability formula is:
$$P = \frac{1}{1 + e^{-(\alpha + \beta x)}}$$
wherein P represents the class probability, α and β represent regression parameters of the class predictor, and x represents the data feature information.
In some optional embodiments, the classification predictor classifies the data to be stored using a Logistic Regression (LR) algorithm, and the expected class probability formula is the one used by the logistic regression algorithm.
Here x represents the data characteristic information. In some optional embodiments, the data characteristic information includes the data size $S$, operation type $T_y$, life cycle $D$, creation time $C_t$, last access time $A_t$, and last modification time $M_t$ of the data to be stored. The data characteristic information can be represented in vector form:
$$X = [x_1, x_2, x_3, x_4, x_5, x_6] = [S, T_y, D, C_t, A_t, M_t]$$
For a data feature information vector X, the expected class probability formula may be expressed as:
$$P = \frac{1}{1 + e^{-(\alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_6 x_6)}}$$
where $\beta_1, \beta_2, \ldots, \beta_6$ represent the regression parameters corresponding to the six feature vector components.
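The class probability computation can be sketched as below, assuming the standard logistic (sigmoid) form with an intercept α and one coefficient per feature. The function name `class_probability` is hypothetical, and the branch on the sign of `z` is only a numerical-stability detail of this sketch, not something taken from the specification.

```python
import math
from typing import Sequence


def class_probability(x: Sequence[float], alpha: float, beta: Sequence[float]) -> float:
    """Logistic class probability P = 1 / (1 + exp(-(alpha + sum_j beta_j * x_j)))."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    z = alpha + sum(b * xi for b, xi in zip(beta, x))
    # Evaluate the sigmoid without overflowing math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

With all parameters at zero the function returns exactly 0.5, which is the usual threshold value mentioned in the text.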
In order to realize accurate data classification of the data to be stored by the classification predictor, the optimal regression parameters need to be determined. The classification predictor may be pre-trained with training sample data to continuously adjust the regression parameters to determine optimal parameters.
As shown in FIG. 3, in a method for classifying and storing data in multiple storage pools according to one or more alternative embodiments of the present specification, the method for training the class predictor includes:
S301: Constructing a training data set, wherein the training data set comprises a plurality of items of sample characteristic data and type labels corresponding to the sample characteristic data.
Historical storage data can be acquired, the corresponding data characteristic information is determined to serve as the sample characteristic data, and a corresponding type label is added to each item of sample characteristic data through labeling. The type label is one of four types, namely the hot-read hot-write type, the hot-read cold-write type, the cold-read hot-write type and the cold-read cold-write type. Training samples of different orders of magnitude can also be generated by simulation through a third-party test data generation tool such as a databreaker and the like. Each item of sample feature data comprises six feature values: data size $S$, operation type $T_y$, life cycle $D$, creation time $C_t$, last access time $A_t$, and last modification time $M_t$.
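Because each class predictor corresponds to one data type, one plausible way to prepare its training targets is to convert the four-way type labels into a binary label list per type (1 if the sample has that type, 0 otherwise). This one-vs-rest encoding is an assumption of the sketch below; the specification only says that type labels serve as the expected output. The names `DATA_TYPES` and `one_vs_rest_labels` are hypothetical.

```python
from typing import Dict, List, Sequence

DATA_TYPES = (
    "hot-read hot-write",
    "hot-read cold-write",
    "cold-read hot-write",
    "cold-read cold-write",
)


def one_vs_rest_labels(type_labels: Sequence[str]) -> Dict[str, List[int]]:
    """Build one binary label list per data type from four-way type labels."""
    unknown = set(type_labels) - set(DATA_TYPES)
    if unknown:
        raise ValueError(f"unknown type labels: {sorted(unknown)}")
    return {
        data_type: [1 if label == data_type else 0 for label in type_labels]
        for data_type in DATA_TYPES
    }
```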
S302: Taking the plurality of items of sample characteristic data as the input of the classification predictor and the corresponding type labels as the expected output, training the regression parameters of the classification predictor by minimizing a loss function so as to determine the optimal parameters.
The training set is fed into the classification predictor based on the logistic regression algorithm, and the output of the classification predictor is compared with the type labels to determine the loss function of the classification predictor. The optimal parameters are determined by adjusting the regression parameters so as to minimize the loss function.
In one or more alternative embodiments of the present disclosure, in a method for classified storage of data in multiple storage pools, a minimum value of a loss function may be solved by maximum likelihood estimation, and the regression parameter of the classified predictor when the loss function takes the minimum value may be used as the optimal parameter.
When the minimum value of the loss function is solved by adopting maximum likelihood estimation, the regression parameters are adjusted by adopting a stochastic gradient ascent method.
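The relationship between the loss function and the likelihood can be made explicit under the usual logistic regression assumptions. The specification does not write the loss out; the expressions below assume binary labels $y_i \in \{0, 1\}$ for a given class predictor, write $P_i$ for the class probability of sample $i$, and take the loss to be the negative log-likelihood.

```latex
% Likelihood of the regression parameters over n training samples
f(\alpha, \beta) = \prod_{i=1}^{n} P_i^{\,y_i} \, (1 - P_i)^{\,1 - y_i}

% Log-likelihood and the assumed loss (negative log-likelihood)
\ln f(\alpha, \beta) = \sum_{i=1}^{n} \bigl[ y_i \ln P_i + (1 - y_i) \ln (1 - P_i) \bigr],
\qquad
L(\alpha, \beta) = -\ln f(\alpha, \beta)

% Gradient components used by gradient ascent on the log-likelihood
\frac{\partial \ln f}{\partial \beta_j} = \sum_{i=1}^{n} (y_i - P_i)\, x_{ij},
\qquad
\frac{\partial \ln f}{\partial \alpha} = \sum_{i=1}^{n} (y_i - P_i)
```

Under this assumption, maximizing the likelihood and minimizing the loss select the same parameters, which is why an ascent method can be used to solve a minimization problem.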
In the above process, a stochastic gradient ascent method may be employed, which maximizes the likelihood function and thereby minimizes the loss function. The regression parameter β is adjusted iteratively according to the gradient of the likelihood function:
$$\beta := \beta + w \, \nabla_{\beta} f(\beta)$$
where $\nabla_{\beta} f(\beta)$ represents the gradient of the likelihood function f and w is the step size of each adjustment of the regression parameters. That is, in each iteration the regression parameter β is moved along the gradient direction of the likelihood function f.
The regression parameters are adjusted according to the method, and the maximum value point of the likelihood function can be quickly determined.
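The training loop can be sketched as follows. This is a minimal illustration that performs stochastic gradient ascent on the log-likelihood of a logistic model, updating the parameters after each sample; the function names, the default step size and epoch count, and the fixed random seed are assumptions, not values from the specification.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_sga(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    step: float = 0.1,   # w: step size (assumed default)
    epochs: int = 200,   # number of passes over the data (assumed default)
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Fit alpha and beta for one class predictor by stochastic gradient ascent.

    For a single sample (x, y) the log-likelihood gradient is (y - p) * x
    with respect to beta and (y - p) with respect to alpha, so each update
    moves the parameters along that gradient.
    """
    rng = random.Random(seed)
    alpha = 0.0
    beta = [0.0] * len(samples[0])
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in random order each epoch
        for i in order:
            x, y = samples[i], labels[i]
            p = sigmoid(alpha + sum(b * xi for b, xi in zip(beta, x)))
            err = y - p
            alpha += step * err
            beta = [b + step * err * xi for b, xi in zip(beta, x)]
    return alpha, beta
```

For example, training on the one-dimensional samples `[[-2.0], [-1.0], [1.0], [2.0]]` with labels `[0, 0, 1, 1]` yields parameters that assign a probability above 0.5 to positive inputs and below 0.5 to negative ones. Real feature vectors would likely need scaling first, since timestamps and sizes are very large numbers.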
In one or more alternative embodiments of the present specification, in a method for classified storage of multiple storage pools of data, the classifying and storing the data to be stored into the storage pool corresponding to the data type includes:
in response to determining that the data type of the data to be stored is the hot-read hot-write type, distributing and writing the data to be stored into a hot data pool; in response to determining that the data type of the data to be stored is the hot-read cold-write type, distributing and writing the data to be stored into a hot-read data pool; in response to determining that the data type of the data to be stored is the cold-read hot-write type, distributing and writing the data to be stored into a hot-write data pool; and in response to determining that the data type of the data to be stored is the cold-read cold-write type, distributing and writing the data to be stored into a cold data pool.
The hot data pool is composed of solid state disks, and the hot data pool is extremely high in read-write performance and used for hot data with high access frequency and high performance requirements.
The hot-read data pool comprises mechanical hard disks and a small number of solid state disks, and the solid state disks are used as a read cache in the hot-read data pool. The hot-read data pool has high data read performance and is suitable for data types with a large number of read operations and a small number of write operations.
The hot-write data pool comprises mechanical hard disks and a small number of solid state disks, and the solid state disks are used as a write cache in the hot-write data pool. The hot-write data pool has high data write performance and is suitable for data types with a large number of write operations and a small number of read operations.
The cold data pool is completely composed of mechanical hard disks, has relatively low performance but large capacity, and is suitable for filing data with low access frequency and low performance requirement.
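The mapping from data type to storage pool described above can be sketched as a lookup table. The `DataType` enum, the `Pool` dataclass, and the pool names are hypothetical identifiers chosen for this illustration; only the media composition and SSD role of each pool come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class DataType(Enum):
    HOT_READ_HOT_WRITE = "hot-read hot-write"
    HOT_READ_COLD_WRITE = "hot-read cold-write"
    COLD_READ_HOT_WRITE = "cold-read hot-write"
    COLD_READ_COLD_WRITE = "cold-read cold-write"


@dataclass(frozen=True)
class Pool:
    name: str                # hypothetical pool identifier
    media: Tuple[str, ...]   # device types that make up the pool
    ssd_role: str            # how the SSDs are used in the pool


POOLS: Dict[DataType, Pool] = {
    DataType.HOT_READ_HOT_WRITE: Pool("hot-data-pool", ("SSD",), "primary storage"),
    DataType.HOT_READ_COLD_WRITE: Pool("hot-read-pool", ("SSD", "HDD"), "read cache"),
    DataType.COLD_READ_HOT_WRITE: Pool("hot-write-pool", ("SSD", "HDD"), "write cache"),
    DataType.COLD_READ_COLD_WRITE: Pool("cold-data-pool", ("HDD",), "none"),
}


def select_pool(data_type: DataType) -> Pool:
    """Return the storage pool that matches the predicted data type."""
    return POOLS[data_type]
```

Keeping the mapping in a single table means a deployment with different pool layouts would only need to change `POOLS`, not the dispatch logic.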
In some optional embodiments, when the data to be stored is classified and stored in the storage pool corresponding to the data type, the data to be stored is stored in a multi-copy form. Such an approach may improve the reliability of the data.
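The disclosure does not state how many copies are kept or how they are placed. As one possible sketch, assuming three copies by default and a hash-based choice of distinct devices within the selected pool (the function `place_replicas` and the device names are hypothetical):

```python
import zlib
from typing import List, Sequence


def place_replicas(object_id: str, devices: Sequence[str], copies: int = 3) -> List[str]:
    """Pick `copies` distinct devices from one pool for the given object.

    The starting device is derived from a CRC32 hash of the object id, and the
    remaining copies go to the following devices in order. This placement rule
    is an assumption for illustration, not the method of the disclosure.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if copies > len(devices):
        raise ValueError("not enough devices in the pool for the requested copies")
    start = zlib.crc32(object_id.encode("utf-8")) % len(devices)
    return [devices[(start + i) % len(devices)] for i in range(copies)]
```

Placing each copy on a different device is what lets the data survive the loss of a single disk.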
With the multi-storage-pool data classification storage method, data to be stored is placed into the data pool that matches its classification result. Because the data pools differ in their read and write characteristics, the method balances the high performance and high cost of solid state disks against the large capacity and low cost of mechanical hard disks in a heterogeneous storage cluster, and makes full use of the performance of the storage devices.
It should be noted that the method of the embodiments of the present disclosure may be executed by a single device, such as a computer or a server. The method may also be applied in a distributed scenario and completed by a plurality of devices cooperating with each other. In such a distributed scenario, one of the devices may perform only one or more steps of the method, and the devices interact with each other to complete the method.
It should be noted that the above describes some embodiments of the disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, and corresponding to the method of any of the above embodiments, the disclosure also provides a multi-storage pool data classification storage system.
Referring to fig. 4, the multi-storage pool data classification storage system includes:
a feature extraction module, configured to perform feature extraction on the data to be stored and determine data feature information of the data to be stored;
a data classification module, configured to perform data classification prediction on the data to be stored according to the data feature information, so as to determine the data type of the data to be stored; and
an allocation storage module, configured to classify and store the data to be stored into the storage pool corresponding to the data type.
In one or more alternative embodiments of the present specification, the data feature information determined by the feature extraction module includes a data size, a data request type, a lifetime, a creation time, a last access time, and a last modification time of the data to be stored. The data types determined by the data classification module comprise a hot-read/hot-write type, a hot-read/cold-write type, a cold-read/hot-write type, and a cold-read/cold-write type.
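The six features listed above could be carried in a small record and flattened into a numeric vector for a classifier. The sketch below is illustrative: the field names, units (bytes, seconds), the 0/1 encoding of the request type, and the use of ages relative to `now` are all assumptions, since the disclosure does not specify an encoding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataFeatures:
    """Feature record for one piece of data to be stored (field names are hypothetical)."""
    size_bytes: int
    request_type: str        # "read" or "write"; this encoding is an assumption
    lifetime_s: float
    created_at: float        # timestamps as seconds since some epoch
    last_access_at: float
    last_modified_at: float

    def to_vector(self, now: float) -> List[float]:
        """Flatten the record into numbers, expressing timestamps as ages."""
        return [
            float(self.size_bytes),
            1.0 if self.request_type == "write" else 0.0,
            self.lifetime_s,
            now - self.created_at,
            now - self.last_access_at,
            now - self.last_modified_at,
        ]
```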
In one or more optional embodiments of the present specification, in the multi-storage pool data classification storage system, the data classification module is further configured to take the data feature information as an input, process the input using a plurality of pre-trained classification predictors, and output the corresponding class probabilities, where the plurality of classification predictors correspond to the plurality of data types, respectively; and to select the classification predictor whose class probability exceeds a preset threshold and determine the data type corresponding to that classification predictor as the data type of the data to be stored.
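The selection step can be sketched as follows, with one predictor per data type. The predictors are passed in as plain callables so that the sketch does not assume a particular probability formula. The function name `classify`, the default threshold of 0.5, and the rule that the highest probability wins when several predictors exceed the threshold are assumptions; the text only says a predictor whose probability exceeds a preset threshold is selected.

```python
from typing import Callable, Dict, Optional, Sequence

# A predictor maps a feature vector to a class probability in [0, 1].
Predictor = Callable[[Sequence[float]], float]


def classify(
    features: Sequence[float],
    predictors: Dict[str, Predictor],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the data type whose predictor reports a probability above threshold.

    If several predictors exceed the threshold, the one with the highest
    probability is chosen (an assumption). Returns None when no predictor
    exceeds the threshold.
    """
    best_type: Optional[str] = None
    best_p = threshold
    for data_type, predict in predictors.items():
        p = predict(features)
        if p > best_p:
            best_type, best_p = data_type, p
    return best_type
```

Returning `None` makes the "no predictor is confident" case explicit, so a caller can fall back to a default pool.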
In one or more alternative embodiments of the present specification, the data classification module is further configured to calculate the class probability corresponding to the input using an expected class probability formula; the expected class probability formula is:
[Formula image BDA0003773200550000121, not reproduced in this text]
where P represents the class probability, α and β represent regression parameters of the classification predictor, and x represents the data feature information.
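The formula itself is an image that is not reproduced in this text. If the classification predictor is a standard logistic regression, which is consistent with the symbols defined above but is an assumption here, the class probability would take the following form (with β and x read as vectors when there are several features):

```latex
P = \frac{1}{1 + e^{-(\alpha + \beta x)}}
  = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}
```

Under this assumed form, P lies strictly between 0 and 1 and increases monotonically with α + βx, which is what allows it to be compared against a preset threshold.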
In one or more alternative embodiments of the present specification, the multi-storage pool data classification storage system further comprises a classification training module. The classification training module is configured to construct a training data set, the training data set comprising a plurality of items of sample feature data and the type labels corresponding to the sample feature data; and to train the regression parameters of the classification predictor by minimizing a loss function, using the items of sample feature data as the input of the classification predictor and the corresponding type labels as the expected output, so as to determine optimal parameters.
In one or more alternative embodiments of the present specification, the classification training module is further configured to: solve for the minimum value of the loss function using maximum likelihood estimation, and take the regression parameters of the classification predictor at which the loss function attains its minimum value as the optimal parameters; and, when solving for the minimum value of the loss function using maximum likelihood estimation, adjust the regression parameters using a stochastic gradient ascent method.
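A minimal training sketch follows, assuming the predictor is a logistic regression and the loss is the negative log-likelihood, so that minimizing the loss is the same as maximizing the likelihood by gradient ascent. The function names, learning rate, epoch count, and toy data are illustrative assumptions, not values from the disclosure.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sga(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.1,
    epochs: int = 200,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Fit alpha (intercept) and beta (weights) by stochastic gradient ascent.

    Each step moves the parameters along the gradient of the log-likelihood
    of a single sample, which is (y - p) for alpha and (y - p) * x for beta.
    """
    rng = random.Random(seed)
    alpha = 0.0
    beta = [0.0] * len(samples[0])
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in random order each epoch
        for i in order:
            x, y = samples[i], labels[i]
            p = sigmoid(alpha + sum(b * xi for b, xi in zip(beta, x)))
            err = y - p
            alpha += lr * err
            beta = [b + lr * err * xi for b, xi in zip(beta, x)]
    return alpha, beta
```

For one predictor per data type, the labels would be 1 for samples of that type and 0 for all others, and the routine would be run once per type.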
In one or more alternative embodiments of the present specification, in the multi-storage pool data classification storage system, the allocation storage module is further configured to: in response to determining that the data type of the data to be stored is the hot-read/hot-write type, allocate and write the data to be stored into a hot data pool, where the hot data pool is composed of solid state disks; in response to determining that the data type of the data to be stored is the hot-read/cold-write type, allocate and write the data to be stored into a hot-read data pool, where the hot-read data pool comprises solid state disks and mechanical hard disks, and the solid state disks serve as the read cache of the hot-read data pool; in response to determining that the data type of the data to be stored is the cold-read/hot-write type, allocate and write the data to be stored into a hot-write data pool, where the hot-write data pool comprises solid state disks and mechanical hard disks, and the solid state disks serve as the write cache of the hot-write data pool; and in response to determining that the data type of the data to be stored is the cold-read/cold-write type, allocate and write the data to be stored into a cold data pool, where the cold data pool is composed of mechanical hard disks.
In one or more alternative embodiments of the present specification, in the multi-storage pool data classification storage system, the allocation storage module is further configured to store the data to be stored in multiple copies.
For convenience of description, the above system is described as being divided into modules by function, and the modules are described separately. Of course, when implementing the present disclosure, the functions of the modules may be implemented in one or more pieces of software and/or hardware.
The system of the foregoing embodiment is used to implement the corresponding multi-storage pool data classification storage method of any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to the method of any embodiment, the disclosure further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the program to implement the method for classifying and storing data in multiple storage pools according to any embodiment.
Fig. 5 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment. The electronic device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. The processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 are communicatively coupled to each other within the device via the bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs. When the technical solutions provided by the embodiments of the present specification are implemented by software or firmware, the relevant program code is stored in the memory 1020 and called by the processor 1010 for execution.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The input/output module may be configured as a component within the device (not shown in the drawings) or may be external to the device to provide the corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication between the present device and other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi, or Bluetooth).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device in the foregoing embodiment is used to implement the corresponding multi-storage pool data classification storage method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, and corresponding to the method of any of the above embodiments, the present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the multi-storage pool data classification storage method according to any of the above embodiments.
The computer-readable media of the present embodiments include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
The storage medium of the foregoing embodiment stores computer instructions for causing a computer to execute the multi-storage pool data classification storage method described in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the concept of the present disclosure, technical features of the above embodiments or of different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present disclosure as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the disclosure. Further, devices may be shown in block diagram form in order to avoid obscuring embodiments of the disclosure, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the disclosure are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that the embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures, such as Dynamic RAM (DRAM), may use the discussed embodiments.
It is understood that the specific implementation of the present application involves related data such as user information, location information, and navigation data. When the above embodiments of the present application are applied to specific products or technologies, user permission or consent needs to be obtained, and the collection, use, and processing of the related data need to comply with the relevant laws, regulations, and standards of the relevant countries and regions.
The disclosed embodiments are intended to embrace all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalents, improvements, and the like that may be made without departing from the spirit or scope of the embodiments of the present disclosure are intended to be included within the scope of the disclosure.

Claims (10)

1. A method for classified storage of data in multiple storage pools, the method comprising:
performing feature extraction on data to be stored, and determining data feature information of the data to be stored;
performing data classification prediction on the data to be stored according to the data characteristic information to determine the data type of the data to be stored;
and classifying and storing the data to be stored into a storage pool corresponding to the data type.
2. The method of claim 1, wherein the data feature information comprises a data size, a data request type, a lifetime, a creation time, a last access time, and a last modification time of the data to be stored;
the data types include a hot-read/hot-write type, a hot-read/cold-write type, a cold-read/hot-write type, and a cold-read/cold-write type.
3. The method according to claim 1, wherein the performing data classification prediction on the data to be stored according to the data feature information comprises:
taking the data characteristic information as input, processing the input by utilizing a plurality of pre-trained classification predictors, and outputting corresponding class probability, wherein the classification predictors respectively correspond to a plurality of data types;
and selecting the classification predictor of which the class probability exceeds a preset threshold value, and determining the data type corresponding to the classification predictor as the data type of the data to be stored.
4. The method of claim 3, wherein processing the input using the classification predictor comprises:
calculating the class probability corresponding to the input using an expected class probability formula;
the expected class probability formula is:
[Formula image FDA0003773200540000011, not reproduced in this text]
where P represents the class probability, α and β represent regression parameters of the classification predictor, and x represents the data feature information.
5. The method of claim 4, wherein training the classification predictor comprises:
constructing a training data set, wherein the training data set comprises a plurality of items of sample feature data and the type labels corresponding to the sample feature data;
and training the regression parameters of the classification predictor by minimizing a loss function, using the items of sample feature data as the input of the classification predictor and the corresponding type labels as the expected output, so as to determine optimal parameters.
6. The method of claim 5, wherein training the regression parameters of the classification predictor by minimizing a loss function comprises:
solving for the minimum value of the loss function using maximum likelihood estimation, and taking the regression parameters of the classification predictor at which the loss function attains its minimum value as the optimal parameters;
and, when solving for the minimum value of the loss function using maximum likelihood estimation, adjusting the regression parameters using a stochastic gradient ascent method.
7. The method according to claim 2, wherein the classifying and storing the data to be stored into the storage pool corresponding to the data type comprises:
in response to determining that the data type of the data to be stored is the hot-read/hot-write type, allocating and writing the data to be stored into a hot data pool, wherein the hot data pool is composed of solid state disks;
in response to determining that the data type of the data to be stored is the hot-read/cold-write type, allocating and writing the data to be stored into a hot-read data pool, wherein the hot-read data pool comprises solid state disks and mechanical hard disks, and the solid state disks serve as the read cache of the hot-read data pool;
in response to determining that the data type of the data to be stored is the cold-read/hot-write type, allocating and writing the data to be stored into a hot-write data pool, wherein the hot-write data pool comprises solid state disks and mechanical hard disks, and the solid state disks serve as the write cache of the hot-write data pool;
and in response to determining that the data type of the data to be stored is the cold-read/cold-write type, allocating and writing the data to be stored into a cold data pool, wherein the cold data pool is composed of mechanical hard disks.
8. The method according to claim 1, wherein when storing the data to be stored in the storage pool corresponding to the data type in a classified manner, further comprising:
and storing the data to be stored in a multi-copy mode.
9. A multi-storage pool data classification storage system, comprising:
a feature extraction module, configured to perform feature extraction on the data to be stored and determine data feature information of the data to be stored;
a data classification module, configured to perform data classification prediction on the data to be stored according to the data feature information, so as to determine the data type of the data to be stored; and
an allocation storage module, configured to classify and store the data to be stored into the storage pool corresponding to the data type.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 8 when executing the program.
CN202210908596.XA 2022-07-29 2022-07-29 Multi-storage-pool data classification storage method and system and electronic equipment Pending CN115470190A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210908596.XA CN115470190A (en) 2022-07-29 2022-07-29 Multi-storage-pool data classification storage method and system and electronic equipment

Publications (1)

Publication Number Publication Date
CN115470190A true CN115470190A (en) 2022-12-13

Family

ID=84365742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210908596.XA Pending CN115470190A (en) 2022-07-29 2022-07-29 Multi-storage-pool data classification storage method and system and electronic equipment

Country Status (1)

Country Link
CN (1) CN115470190A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117076523A (en) * 2023-10-13 2023-11-17 北京云成金融信息服务有限公司 Local data time sequence storage method
CN117076523B (en) * 2023-10-13 2024-02-09 华能资本服务有限公司 Local data time sequence storage method

Similar Documents

Publication Publication Date Title
KR101868830B1 (en) Weight generation in machine learning
US11526799B2 (en) Identification and application of hyperparameters for machine learning
KR101868829B1 (en) Generation of weights in machine learning
JP2022552980A (en) Systems and methods for machine learning interpretability
CN111079944B (en) Transfer learning model interpretation realization method and device, electronic equipment and storage medium
JP7171471B2 (en) LEARNING MODEL GENERATION SUPPORT DEVICE AND LEARNING MODEL GENERATION SUPPORT METHOD
US20230206083A1 (en) Optimizing gradient boosting feature selection
US20150302022A1 (en) Data deduplication method and apparatus
JP7017600B2 (en) System for mitigating hostile samples for ML and AI models
CN111062431A (en) Image clustering method, image clustering device, electronic device, and storage medium
CN109416621B (en) Utilizing computer storage systems supporting shared objects to restore free space in non-volatile storage
US9424484B2 (en) Feature interpolation
CN115470190A (en) Multi-storage-pool data classification storage method and system and electronic equipment
KR102228196B1 (en) Method and apparatus for deciding ensemble weight about base meta learner
CN110348581B (en) User feature optimizing method, device, medium and electronic equipment in user feature group
US9286349B2 (en) Dynamic search system
US11790087B2 (en) Method and apparatus to identify hardware performance counter events for detecting and classifying malware or workload using artificial intelligence
CN114417964A (en) Satellite operator classification method and device and electronic equipment
US10108636B2 (en) Data deduplication method
CN117251351B (en) Database performance prediction method and related equipment
KR102289411B1 (en) Weighted feature vector generation device and method
CN116107761B (en) Performance tuning method, system, electronic device and readable storage medium
KR102461825B1 (en) Apparatus for Searching Fraudulent News and Driving Method Thereof, and Computer Readable Recording Medium
US20210365831A1 (en) Identifying claim complexity by integrating supervised and unsupervised learning
US20230351211A1 (en) Scoring correlated independent variables for elimination from a dataset

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination