CN116431081A - Distributed data storage method, system, device and storage medium - Google Patents

Distributed data storage method, system, device and storage medium

Info

Publication number
CN116431081A
Authority
CN
China
Prior art keywords
data
function
access frequency
access
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310697781.3A
Other languages
Chinese (zh)
Other versions
CN116431081B (en
Inventor
何兴国
张越
赖春媚
周涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Turing Technology Co ltd
Original Assignee
Guangzhou Turing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Turing Technology Co ltd
Priority to CN202310697781.3A
Publication of CN116431081A
Application granted
Publication of CN116431081B
Legal status: Active
Anticipated expiration

Classifications

    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0676 Magnetic disk device
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the invention discloses a distributed data storage method, system, device and storage medium. Server performance data, database table data, and the access frequency of the database table data are obtained, wherein the access frequency comprises a table data access frequency and a metadata access frequency, and the server performance data comprises the memory size, the hard disk size and the processor performance of the server. The database table data is divided into a plurality of classification tables according to the access frequency, and the classification tables are sent to the corresponding access areas. A server storage path for the database table data in each access area is then determined according to the access frequency and the server performance data, and the database table data is stored according to the storage path. The utilization efficiency of server performance is improved, and the feedback time is reduced. The embodiment of the invention can be widely applied in the technical field of data processing.

Description

Distributed data storage method, system, device and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a distributed data storage method, system, device, and storage medium.
Background
Distributed storage is a technology in which a plurality of networked nodes jointly provide storage services, with the data stored in a scattered manner on a plurality of independent servers.
However, different servers differ in performance and in network transmission performance, and different data is read at different frequencies. If the data is simply stored in a scattered manner, the feedback time when the data is accessed is prolonged by the limits of server performance, and the performance resources of the servers cannot be fully utilized.
Disclosure of Invention
Accordingly, an objective of the embodiments of the present invention is to provide a distributed data storage method, system, device and storage medium, which provide different storage strategies according to the access frequency of data and the performance of a server, so as to reduce the feedback time.
In a first aspect, an embodiment of the present invention provides a distributed data storage method, including the following steps:
acquiring server performance data, database table data and access frequency of the database table data; wherein the access frequency comprises a table data access frequency and a metadata access frequency; the server performance data comprises the memory size, the hard disk size and the processor performance of the server;
dividing the database table data into a plurality of classification tables according to the access frequency, and transmitting the classification tables to corresponding access areas;
and in each access area, determining a server storage path of database table data according to the access frequency and the server performance data, and storing the database table data according to the storage path.
Further, the determining a server storage path of database table data according to the access frequency and the server performance data specifically includes:
determining an access value according to a preset access frequency function and the access frequency;
determining a server performance value according to a preset server performance function and the server performance data;
and matching the access value with the server performance value to determine a server storage path of database table data.
Further, the access frequency function includes a first function and a second function, and the determining an access value according to a preset access frequency function and the access frequency specifically includes:
calculating a first value according to the first function and the table data access frequency;
calculating a second value according to the second function and the metadata access frequency;
and carrying out weighted summation on the first numerical value and the second numerical value to obtain an access value.
Further, the server performance function includes a third function and a fourth function, and the determining a server performance value according to the preset server performance function and the server performance data specifically includes:
calculating a total probability density according to the third function and each parameter in the server performance data;
and calculating a server performance value according to the fourth function and the total probability density.
Further, the first function is determined by:
determining a probability mass function and a Dirac function according to the historical table data access data;
and multiplying the probability mass function by the Dirac function and accumulating the products as the first function.
Further, the second function is determined by:
determining a probability density function based on the historical metadata access data;
the probability density function is integrated as a second function.
Further, the level of each classification table is different, and the data storage method further includes:
calculating the frequency change rate according to the access frequency in a preset time period;
and if the frequency change rate is larger than a preset value, dividing the corresponding database table data into a classification table with the highest level.
In a second aspect, an embodiment of the present invention provides a distributed data storage system, including:
the first module is used for acquiring server performance data, database table data and access frequency of the database table data; wherein the access frequency comprises a table data access frequency and a metadata access frequency; the server performance data comprises the memory size, the hard disk size and the processor performance of the server;
the second module is used for dividing the database table data into a plurality of classification tables according to the access frequency and sending the classification tables to the corresponding access areas;
and a third module, configured to determine, in each access area, a server storage path of database table data according to the access frequency and the server performance data, and store the database table data according to the storage path.
In a third aspect, an embodiment of the present invention provides a distributed data storage device, including:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method as described above.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium in which a processor executable program is stored, characterized in that the processor executable program is for performing the method as described above when being executed by a processor.
The embodiment of the invention has the following beneficial effects: firstly, acquiring server performance data, database table data and access frequency of the database table data; wherein the access frequency comprises a table data access frequency and a metadata access frequency; the server performance data comprises the memory size, the hard disk size and the processor performance of the server; dividing the database table data into a plurality of classification tables according to the access frequency, and transmitting the classification tables to corresponding access areas; then, in each access area, determining a server storage path of database table data according to the access frequency and the server performance data, and storing the database table data according to the storage path; the database table data are classified and divided into areas according to the access frequency, servers with different performances are matched according to the areas, different storage strategies are provided according to the access frequency of the data and the performances of the servers, so that the high-frequency access data are distributed with better server performances and network transmission performances, the utilization efficiency of the server performances is improved, and the feedback time is reduced.
Drawings
FIG. 1 is a flowchart illustrating steps of a distributed data storage method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a distributed data storage system according to an embodiment of the present invention;
FIG. 3 is a block diagram of a distributed data storage device according to an embodiment of the present invention.
Detailed Description
The invention will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
In the following description, the terms "first", "second", "third" and the like are merely used to distinguish similar objects and do not represent a specific ordering of the objects. It is to be understood that "first", "second" and "third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the invention described herein can be practiced in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the embodiments of the invention is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
As shown in fig. 1, an embodiment of the present invention provides a distributed data storage method, which includes the following steps.
Step S110, obtaining server performance data, database table data and access frequency of the database table data; wherein the access frequency comprises a table data access frequency and a metadata access frequency; the server performance data comprises server memory size, hard disk size and processor performance.
Specifically, the database table data is the data to be classified and stored, and it takes the form of a data table. In a specific embodiment, for example, the information database of an enterprise includes employee information stored in the form of a data table. The data table includes metadata and table data: the metadata is the data attributes, such as name and age, and the table data is the specific data. The access frequency is the number of times the data table is accessed in a preset time period, such as the number of times the metadata or the table data is searched or queried in a week. The server performance includes the memory size, the hard disk size, the processor performance, the communication rate among servers, and the like. Which of these are used is determined by the actual application scenario and is not particularly limited in the embodiment of the invention.
Step S120, dividing the database table data into a plurality of classification tables according to the access frequency, and transmitting the classification tables to the corresponding access areas.
Specifically, database table data are divided according to the recorded access frequency and a preset level range, so that a plurality of different classification tables are obtained, and the access frequency of data in the same classification table is the same level. In a specific embodiment, the level ranges can be divided at equal intervals, for example, the access frequency is a first level within 0-5 times per day, the access frequency is a second level within 6-10 times per day, so as to obtain a complete level range, and the database tables are divided according to the obtained level ranges and the access frequency of the database tables, so that a plurality of classification tables are obtained; the access frequency range can also be set, and the data falling in the set frequency range can be divided into a class of classification tables, for example, the data with the access frequency within 0-50 times per week is divided into a low-frequency classification table, the data with the access frequency within 51-100 times per week is divided into a medium-frequency classification table, and the data with the access frequency above 100 times per week is divided into a high-frequency classification table. 
After the divided multiple classification tables are obtained, the classification tables are sent to the corresponding access areas to wait for subsequent storage operation, for example, the divided multiple classification tables are divided into a high-frequency classification table, an intermediate-frequency classification table and a low-frequency classification table, the access areas are also divided into a high-frequency access area, an intermediate-frequency access area and a low-frequency access area, the high-frequency classification table is sent to the high-frequency access area in a wired transmission or wireless transmission mode, the intermediate-frequency classification table is sent to the intermediate-frequency access area in a wired transmission or wireless transmission mode, and the low-frequency classification table is sent to the low-frequency access area in a wired transmission or wireless transmission mode.
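The classification and routing described above can be sketched as follows. This is a minimal illustration, assuming the weekly thresholds from the example in the text (0-50 low, 51-100 medium, above 100 high); the function names `classify` and `route_to_access_areas` and the table names are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def classify(accesses_per_week: int) -> str:
    """Map a weekly access count to a classification level.

    Thresholds follow the weekly example in the text; they are
    illustrative and would be preset per deployment.
    """
    if accesses_per_week <= 50:
        return "low"
    if accesses_per_week <= 100:
        return "medium"
    return "high"


def route_to_access_areas(table_frequencies: dict[str, int]) -> dict[str, list[str]]:
    """Group table names by the access area matching their level."""
    areas: dict[str, list[str]] = defaultdict(list)
    for table, frequency in table_frequencies.items():
        areas[classify(frequency)].append(table)
    return dict(areas)


# Hypothetical tables with their weekly access counts.
areas = route_to_access_areas({"employees": 180, "archive": 12, "orders": 75})
print(areas)
```

In this sketch the access area is represented only by its level name; how a classification table is actually transmitted to an area (wired or wireless) is outside the scope of the example.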
Step S130, determining a server storage path of database table data in each access area according to the access frequency and the server performance data, and storing the database table data according to the storage path.
Specifically, the data storage path is determined by jointly considering the data access frequency and the server performance. The data access frequency reflects how much users demand the data: the higher the access frequency, the greater the demand, so a high-performance server is needed for storage and reading to ensure a good user experience. In a specific embodiment, the performance of the current server is determined from the server performance data; for example, if the processor frequency of the current server is 4 GHz, the memory size is 16 GB, and the disk size is 500 GB, the performance of the current server can be judged excellent. The classification table level of the current data is determined from the access frequency of the database table data; for example, if the access frequency is 180 times per week, which exceeds 100 times per week, the data is high-frequency data and is placed in the high-frequency classification table, and the high-frequency classification table data is stored on the server with excellent performance.
Optionally, the determining the server storage path of the database table data according to the access frequency and the server performance data specifically includes:
Step S131, determining an access value according to a preset access frequency function and the access frequency.
In a specific embodiment, if the access frequency of the data is 15 times per week, the corresponding access value is 0.2 after normalization operation is performed through a preset function; the access frequency of the other data is 52 times per week, and the corresponding access value is 0.5 after normalization operation is carried out through a preset access frequency function; and obtaining an access value corresponding to the data for the subsequent determination of the storage path.
Step S132, determining a server performance value according to a preset server performance function and the server performance data.
Specifically, the performance of the current server can be evaluated from the server performance data, but server performance is affected by many factors, including but not limited to the processing frequency of the server processor, the memory size, and the disk size, and the performance requirements differ between application scenarios. For example, for a server used for storage, disk size is the major factor in evaluating server performance; for a server used to read data, the processing frequency of the processor is the major factor. Evaluating server performance manually is inefficient, and the server must be re-evaluated whenever the application scenario changes. Instead, the server performance data is normalized through the preset server performance function to obtain a server performance value. The server can then determine its own performance value from its performance data, and only the preset server performance function needs to be changed when the application scenario changes.
Step S133, matching the access value with the server performance value, and determining a server storage path of the database table data.
Specifically, the performance of a server is characterized by its server performance value, and the access frequency of database table data is characterized by its access value. For example, in a high-frequency access area containing a plurality of high-performance servers, the server performance value of each server is calculated, the access value of each piece of database table data in the access area is calculated, and the servers with high server performance values are allocated to the database table data with high access values for the storage operation.
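One way to picture the matching in step S133 is a rank-based pairing inside a single access area. The sketch below is an assumption, not the patent's stated algorithm: the text only says that servers with high performance values are allocated to data with high access values, so the round-robin wrap-around when tables outnumber servers, the function name `match_tables_to_servers`, and all identifiers are hypothetical.

```python
def match_tables_to_servers(
    access_values: dict[str, float],
    server_values: dict[str, float],
) -> dict[str, str]:
    """Pair tables with servers by rank within one access area.

    Tables are ordered by descending access value and servers by
    descending performance value; the i-th table goes to the i-th
    server, wrapping around if there are more tables than servers
    (the wrap-around is an illustrative assumption).
    """
    if not server_values:
        raise ValueError("at least one server is required")
    tables = sorted(access_values, key=access_values.get, reverse=True)
    servers = sorted(server_values, key=server_values.get, reverse=True)
    return {table: servers[i % len(servers)] for i, table in enumerate(tables)}


# Hypothetical access values and server performance values.
paths = match_tables_to_servers(
    {"a": 0.9, "b": 0.2, "c": 0.5},
    {"s1": 0.3, "s2": 0.8},
)
print(paths)
```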
Optionally, the access frequency function includes a first function and a second function, and the access value is determined according to a preset access frequency function and an access frequency.
Specifically, the access frequency of the table data of the database includes a table data access frequency and a metadata access frequency, and the table data access frequency and the metadata access frequency reflect that the data is accessed in different modes, so that when the access value corresponding to the data is calculated, the contribution degree of the table data access frequency and the metadata access frequency needs to be comprehensively considered, the contribution degree of the table data access frequency is determined through a first function and the table data access frequency, and the contribution degree of the metadata access frequency is determined through a second function and the metadata access frequency.
The method for determining the access value specifically comprises the following steps:
Step S1311, calculating a first value according to the first function and the table data access frequency.
Specifically, the first function is a probability mass function of the table data access frequency of the database table data, and its value is taken as the first value. The calculation formula is as follows:

$$P(x) = \sum_{i} p_i \, \delta(x - x_i)$$

where $P(x)$ is the probability mass function of the table data access frequency $x$, $p_i$ is the probability mass when the table data access frequency is $x_i$, and $\delta$ is the Dirac function. If the acquired table data access frequencies are $x_1$, $x_2$, $x_3$ and the corresponding probability masses are $p_1$, $p_2$, $p_3$, then at access frequency $x_2$ the probability mass function, taken as the first value, is $P(x_2) = p_1\,\delta(x_2 - x_1) + p_2\,\delta(x_2 - x_2) + p_3\,\delta(x_2 - x_3)$. Because the Dirac function is infinite at $x = x_i$ and zero elsewhere, only the term for $x_2$ contributes, so the probability mass function at access frequency $x_2$ is calculated as $p_2$. In a specific embodiment, if the access frequencies are 1, 2 and 3 and the corresponding probability masses are 0.2, 0.3 and 0.2, the probability mass function when the access frequency is 2 is 0.3.
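The worked example for the first function can be reproduced with a short sketch. Note one deliberate substitution: a true Dirac delta is not a finite number at a point, so the code uses a unit indicator (1 where the frequency matches, 0 elsewhere) to play the selecting role the text assigns to the Dirac function. The function name `first_value` is hypothetical.

```python
def first_value(x: float, frequencies: list[float], masses: list[float]) -> float:
    """Evaluate the selected probability mass at access frequency x.

    A unit indicator stands in for the Dirac function: each mass is
    kept only where its frequency equals x, and the kept masses are
    accumulated.
    """
    return sum(p for x_i, p in zip(frequencies, masses) if x_i == x)


# Frequencies 1, 2, 3 with masses 0.2, 0.3, 0.2, as in the text.
value = first_value(2, [1, 2, 3], [0.2, 0.3, 0.2])
print(value)
```

A frequency that was never observed selects no term, so the function returns 0 for it.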
Step S1312, calculating a second value according to the second function and the metadata access frequency.
Specifically, the second function is the integral of the probability density of the metadata access frequency of the database table data, and its value is taken as the second value. The calculation formula is as follows:

$$F(x) = \int f(x)\,dx$$

where $F(x)$ is the integral of the probability density at metadata access frequency $x$, and $f(x)$ is the probability density function when the metadata access frequency is $x$.
It should be noted that the probability density function and the probability mass function described above need to be determined according to the specific application scenario and the characteristics of the data.
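As an illustration of the second function, the sketch below integrates a probability density numerically. Several things here are assumptions rather than content of the patent: the text does not state the integration limits, so the integral is taken from a lower bound up to the access frequency x; the uniform density and its range of 0 to 100 accesses are chosen only as an example; and the names `uniform_pdf` and `second_value` are hypothetical.

```python
from typing import Callable


def uniform_pdf(x: float, low: float, high: float) -> float:
    """Density of a continuous uniform distribution on [low, high]."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def second_value(
    x: float,
    pdf: Callable[[float], float],
    lower: float,
    steps: int = 10_000,
) -> float:
    """Midpoint-rule integral of pdf from `lower` to x (assumed limits)."""
    if x <= lower:
        return 0.0
    width = (x - lower) / steps
    return sum(pdf(lower + (i + 0.5) * width) for i in range(steps)) * width


# Metadata access frequency of 25 under a uniform density on [0, 100].
value = second_value(25.0, lambda t: uniform_pdf(t, 0.0, 100.0), lower=0.0)
print(round(value, 6))
```

With these assumed limits the second value lies between 0 and 1, which keeps it on a scale comparable to a probability mass.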
Step S1313, performing weighted summation on the first value and the second value to obtain an access value.
Specifically, the weighted sum of the first value and the second value is calculated by the following formula:

$$S = P(x_t) + F(x_m)$$

where $S$ is the weighted sum, $P(x_t)$ is the first value, $F(x_m)$ is the second value, $x_t$ is the table data access frequency, and $x_m$ is the metadata access frequency.
In the embodiment of the present invention, the weights of the first value and the second value are both 1, but the weight in the above formula is not particularly limited, and may be specifically set according to practical applications.
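The weighted summation can be written as a small helper with adjustable weights. The default weights of 1 follow the embodiment; the function name `access_value` and the sample inputs are hypothetical.

```python
def access_value(
    first: float,
    second: float,
    w_first: float = 1.0,
    w_second: float = 1.0,
) -> float:
    """Weighted sum of the first value and the second value."""
    return w_first * first + w_second * second


# Both weights equal to 1, as in the embodiment.
s = access_value(0.3, 0.25)
# A hypothetical setting that emphasizes table data accesses.
s_weighted = access_value(0.3, 0.25, w_first=2.0, w_second=0.5)
print(s, s_weighted)
```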
Optionally, the first function is determined by:
Step S210, determining a probability mass function and a Dirac function according to the historical table data access data.
Specifically, the actual probability mass function is determined by analyzing the characteristics of the historical table data access data, such as the data type. For example, if the data is determined to follow a discrete geometric distribution, the probability mass function of the data can be determined from the probability mass of the geometric distribution. The Dirac function is used to select the probability mass function value for each access frequency.
Step S220, multiplying the probability mass function by the Dirac function and accumulating the products as the first function.
Specifically, after the probability mass function and the dirac function are determined, the probability mass functions of different access frequencies are selected in a multiplication mode, and the probability mass functions of different access frequencies are accumulated after being multiplied by the dirac function, so that a first function is obtained.
Optionally, the second function is determined by:
Step S230, determining a probability density function according to the historical metadata access data.
Specifically, the actual probability density function is determined by analyzing the characteristics of the historical metadata access data, such as the data type. For example, if the data is determined to follow a continuous uniform distribution, the probability density function of the data can be determined from the probability density of the uniform distribution.
Step S240, integrating the probability density function as a second function.
Specifically, after the probability density function is obtained, since the data is continuous data, the second function of the data in different metadata access frequencies cannot be obtained through the accumulation mode of discrete data, and the second function needs to be obtained through integral operation on the probability density function.
Optionally, the server performance function includes a third function and a fourth function, and the server performance value is determined according to a preset server performance function and server performance data.
The specific calculation mode is as follows:
Step S1321, calculating the total probability density according to the third function and each parameter in the server performance data.
Specifically, the third function is a probability density function of the performance data of the server, and may be obtained by statistics on historical performance data of the server. In a specific embodiment, during the use of the server, 1000 historical sample data of the performance of the server are recorded, where each sample data records the performance values of the memory, the disk and the processor of the server, and then the probability density function can be estimated according to the kernel density estimation method and the above historical sample data as a third function. After the third function is obtained, the performance values of the memory, the disk and the processor in the current state of the server are recorded, and the probability distribution, namely the total probability density, of the server is obtained by calculating according to the third function and the performance values of the memory, the disk and the processor in the current state.
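The kernel density estimation step can be sketched in pure Python. The text names kernel density estimation but not a kernel or bandwidth, so the Gaussian product kernel, the per-dimension bandwidths, the sample values, and the name `gaussian_kde_3d` are all assumptions made for illustration.

```python
import math
from typing import Callable, Sequence

Point = tuple[float, float, float]


def gaussian_kde_3d(
    samples: Sequence[Point],
    bandwidths: Point,
) -> Callable[[Point], float]:
    """Build a 3-D density estimate with a Gaussian product kernel.

    Each sample is (memory, disk, processor). One bandwidth per
    dimension is used; bandwidth selection is not addressed here.
    """
    n = len(samples)
    # Normalizing constant: n * prod(h_j * sqrt(2 * pi)).
    norm = n * math.prod(h * math.sqrt(2.0 * math.pi) for h in bandwidths)

    def density(point: Point) -> float:
        total = 0.0
        for sample in samples:
            exponent = sum(
                ((p - s) / h) ** 2 for p, s, h in zip(point, sample, bandwidths)
            )
            total += math.exp(-0.5 * exponent)
        return total / norm

    return density


# Hypothetical history: (memory in GB, disk in GB, processor in GHz).
history = [(8.0, 500.0, 3.0), (16.0, 1000.0, 4.0), (4.0, 250.0, 2.0)]
third_function = gaussian_kde_3d(history, bandwidths=(2.0, 100.0, 0.5))
total_density = third_function((8.0, 500.0, 3.0))
print(total_density)
```

The text records 1000 samples; three are used here only to keep the example short.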
Step S1322, calculating a server performance value according to the fourth function and the total probability density.
Specifically, the fourth function performs a triple integration and a weighted-average operation on the total probability density, and the result is taken as the server performance value. In a specific embodiment, the integration range of the triple integral and the weight of the weighted average are determined from the volume V spanned by the value ranges of the server's memory size, disk size and processor frequency. For example, if the memory size ranges from 1 GB to 16 GB, the disk size from 100 GB to 1 TB (1000 GB), and the processor frequency from 1 GHz to 4 GHz, the volume is V = (16 - 1) × (1000 - 100) × (4 - 1) = 40500. Once the integration range is determined, the total probability density is triple-integrated over it, the reciprocal of the volume is used as the weight of the weighted average, and the weighted result is taken as the server performance value, which represents the average performance level of the server across its configurations.
According to the above method steps, the server performance value is calculated as follows:
$$P = \frac{1}{V}\iiint f(x, y, z)\,dx\,dy\,dz$$
where $P$ is the server performance value, $1/V$ is the weight of the weighted average, $V$ is the volume spanned by the value ranges of the server memory size, disk size and processor frequency, $x$ is the memory size, $y$ is the disk size, $z$ is the processor frequency, and $f(x, y, z)$ is the probability density function of the server memory size, disk size and processor frequency.
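Step S1322 can be sketched numerically as follows. This is an illustration under stated assumptions: each range width is taken as upper bound minus lower bound, the integral is approximated with a midpoint rule on a coarse grid, and the constant density used in the example is a hypothetical stand-in for the third function.

```python
def server_performance_value(density, mem_range, disk_range, cpu_range, steps=20):
    """Approximate (1/V) times the triple integral of density over the box."""
    (m0, m1), (d0, d1), (c0, c1) = mem_range, disk_range, cpu_range
    volume = (m1 - m0) * (d1 - d0) * (c1 - c0)
    hm, hd, hc = (m1 - m0) / steps, (d1 - d0) / steps, (c1 - c0) / steps

    integral = 0.0
    for i in range(steps):
        m = m0 + (i + 0.5) * hm
        for j in range(steps):
            d = d0 + (j + 0.5) * hd
            for k in range(steps):
                c = c0 + (k + 0.5) * hc
                integral += density(m, d, c)
    integral *= hm * hd * hc  # cell volume of the midpoint grid

    return integral / volume  # weight 1/V applied to the integral


# Ranges from the embodiment: 1-16 GB memory, 100-1000 GB disk, 1-4 GHz CPU.
mem, disk, cpu = (1.0, 16.0), (100.0, 1000.0), (1.0, 4.0)
box_volume = (mem[1] - mem[0]) * (disk[1] - disk[0]) * (cpu[1] - cpu[0])


# Hypothetical density: uniform over the box, so it integrates to 1.
def uniform_density(m, d, c):
    return 1.0 / box_volume


performance = server_performance_value(uniform_density, mem, disk, cpu)
```

With a density that integrates to 1 over the box, the result reduces to 1/V; a density estimated from real samples would be passed in place of `uniform_density`.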
Optionally, the data storage method further includes:
Step S140, calculating the frequency change rate according to the access frequency in the preset time period.
Specifically, the frequency change rate is calculated as follows:
$$\frac{dS}{dx} = \alpha\,\frac{dF_1}{dx_1} + \beta\,\frac{dF_2}{dx_2}$$
where $dS/dx$ is the rate of change of $S$ with respect to $x$; $\alpha$ and $\beta$ are the weights of the respective rates of change; $dF_1/dx_1$ is the derivative of $F_1$ with respect to $x_1$; $dF_2/dx_2$ is the derivative of $F_2$ with respect to $x_2$; $x_1$ is the table data access frequency; $x_2$ is the metadata access frequency; and $x$ is the data access frequency.
As the formula shows, the rate of change of $S$ with respect to $x$ equals the weighted sum of the rates of change of the individual functions.
Step S150, if the frequency change rate is greater than a preset value, assigning the corresponding database table data to the highest-level classification table.
Specifically, the change rate indicates how stable the access frequency is. An excessively large change rate means that the access frequency is low in some periods and high in others, so measuring the access frequency over any single period would give a misleading result. A preset value is therefore set: an access frequency whose change rate exceeds the preset value is judged to be unstable, and the corresponding data is assigned to the highest-level classification table to safeguard the user experience.
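Steps S140 and S150 can be sketched as follows. The sketch assumes that the change rate is a weighted sum of numerically estimated derivatives of two component functions; the functions `f1` and `f2`, the weights, the preset value and the level names are all hypothetical placeholders, not values taken from the source.

```python
def derivative(f, x: float, h: float = 1e-5) -> float:
    """Central-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def frequency_change_rate(f1, f2, table_freq, meta_freq, w1=0.5, w2=0.5):
    """Weighted sum of the rates of change of the two component functions."""
    return w1 * derivative(f1, table_freq) + w2 * derivative(f2, meta_freq)


def assign_level(rate: float, preset: float, current_level: str, highest_level: str) -> str:
    """Step S150: unstable data is assigned to the highest-level classification table."""
    return highest_level if rate > preset else current_level


# Hypothetical smooth stand-ins for the first and second functions.
def f1(x):
    return x ** 2


def f2(x):
    return 3.0 * x


rate = frequency_change_rate(f1, f2, table_freq=2.0, meta_freq=5.0)
level = assign_level(rate, preset=1.0, current_level="low", highest_level="high")
```

Here the derivatives are 4 and 3, so the weighted rate is 3.5, which exceeds the assumed preset value of 1.0 and moves the data to the highest level.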
In a specific implementation, a set of discrete data access frequencies and the corresponding probability mass function values is obtained first. The resulting data table is shown in Table 1:
Table 1 (image SMS_52): discrete data access frequencies and the corresponding probability mass function values
From the data table, the probability mass for an access frequency of 2 is calculated as the sum of five terms (formula images SMS_54 to SMS_58), which evaluates to 0.3. This probability mass is taken as the access value of the current access frequency.
Based on the access value, the data with an access frequency of 2 is judged to be low-access-frequency data and is sent to the low-frequency area. In the low-frequency area, the value ranges of the current server's memory size, disk size and processor frequency are acquired: the memory size ranges from 1 GB to 16 GB, the disk size from 100 GB to 1 TB (1000 GB), and the processor frequency from 1 GHz to 4 GHz. The volume is therefore V = (16 - 1) × (1000 - 100) × (4 - 1) = 40500, and the weight of the weighted average is 1/40500. The performance parameters of the server in its current state are then read, and the performance value is calculated to be 0.4. The access value is matched against the performance value, and it is judged that the current server can provide good server performance and transmission performance for the data with an access frequency of 2. The data is then stored on the current server.
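The access value calculation in this example can be sketched as follows. Because Table 1 is only available as an image, the probability mass values below are hypothetical, chosen only so that the mass at an access frequency of 2 is 0.3 as in the embodiment. The low-frequency threshold is likewise an assumption, and the Dirac function is treated as an indicator that equals 1 only at the matching discrete frequency.

```python
def first_function(pmf: dict, freq: int) -> float:
    """Accumulate p(x_i) * delta(freq - x_i) over all recorded frequencies."""
    return sum(p * (1.0 if freq == x else 0.0) for x, p in pmf.items())


def access_area(access_value: float, threshold: float = 0.5) -> str:
    """Pick an access area from the access value (hypothetical threshold)."""
    return "high-frequency" if access_value >= threshold else "low-frequency"


# Hypothetical stand-in for Table 1: access frequency -> probability mass.
table_1 = {1: 0.1, 2: 0.3, 3: 0.2, 4: 0.25, 5: 0.15}

access_value = first_function(table_1, 2)
area = access_area(access_value)
```

Only the term whose frequency equals 2 contributes to the sum, so the five-term accumulation collapses to the single mass value for that frequency.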
The embodiment of the invention has the following beneficial effects. This embodiment acquires server performance data, database table data and the access frequency of the database table data, where the access frequency comprises a table data access frequency and a metadata access frequency, and the server performance data comprises the memory size, the hard disk size and the processor performance of the server. The database table data is divided into a plurality of classification tables according to the access frequency, and the classification tables are sent to the corresponding access areas. In each access area, a server storage path for the database table data is then determined according to the access frequency and the server performance data, and the database table data is stored according to that storage path. Because the access frequency of the data and the performance of the server are both calculated and different storage strategies are applied accordingly, frequently accessed data obtains better server performance and network transmission performance, server performance is used more efficiently, and the user experience is improved.
As shown in FIG. 2, an embodiment of the present invention further provides a distributed data storage system, including:
the first module is used for acquiring server performance data, database table data and access frequency of the database table data; wherein the access frequency comprises a table data access frequency and a metadata access frequency; the server performance data comprises the memory size, the hard disk size and the processor performance of the server;
the second module is used for dividing the database table data into a plurality of classification tables according to the access frequency and sending the classification tables to the corresponding access areas;
and a third module, configured to determine, in each access area, a server storage path of database table data according to the access frequency and the server performance data, and store the database table data according to the storage path.
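The cooperation of the three modules can be sketched as follows. This is a simplified illustration, not the claimed system: the two-area split, the threshold, the nearest-value matching rule, and all table names, paths and numbers are assumptions, since the text does not specify how the access value is matched against the server performance value.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    access_value: float


@dataclass
class Server:
    path: str
    performance_value: float


def classify(tables, threshold=0.5):
    """Second module: split tables into access areas by access value."""
    areas = {"high": [], "low": []}
    for t in tables:
        areas["high" if t.access_value >= threshold else "low"].append(t)
    return areas


def choose_server(table, servers):
    """Third module: pick the server whose performance value is closest (assumed rule)."""
    return min(servers, key=lambda s: abs(s.performance_value - table.access_value))


def plan_storage(tables, servers_by_area, threshold=0.5):
    """Map each table name to the storage path chosen within its access area."""
    plan = {}
    for area, members in classify(tables, threshold).items():
        for t in members:
            plan[t.name] = choose_server(t, servers_by_area[area]).path
    return plan


# First module stand-in: values that would come from the acquisition step.
tables = [Table("orders", 0.9), Table("logs", 0.3)]
servers_by_area = {
    "high": [Server("/srv/a", 0.95), Server("/srv/b", 0.6)],
    "low": [Server("/srv/c", 0.4), Server("/srv/d", 0.1)],
}

plan = plan_storage(tables, servers_by_area)
```

Keeping classification and server selection as separate functions mirrors the division between the second and third modules.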
It can be seen that the content of the above method embodiment applies to this system embodiment: the functions implemented by this system embodiment are the same as those of the method embodiment, and the beneficial effects achieved are likewise the same.
As shown in fig. 3, an embodiment of the present invention further provides a distributed data storage device, including:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the steps of the distributed data storage method described in the method embodiments above.
The memory, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs and non-transitory computer-executable programs. The memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, which can be connected to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It can be seen that the content of the above method embodiment applies to this device embodiment: the functions implemented by this device embodiment are the same as those of the method embodiment, and the beneficial effects achieved are likewise the same.
Furthermore, embodiments of the present application disclose a computer program product or a computer program, which is stored in a computer readable storage medium. The computer program may be read from a computer readable storage medium by a processor of a computer device, the processor executing the computer program causing the computer device to perform the method as described above. Similarly, the content in the above method embodiment is applicable to the present storage medium embodiment, and the specific functions of the present storage medium embodiment are the same as those of the above method embodiment, and the achieved beneficial effects are the same as those of the above method embodiment.
The embodiment of the present invention also provides a computer-readable storage medium storing a program executable by a processor, which when executed by the processor is configured to implement the above-described method.
It is to be understood that all or some of the steps, systems, and methods disclosed above may be implemented in software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As known to those skilled in the art, the term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
While the preferred embodiment of the present invention has been described in detail, the invention is not limited to the embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the invention, and these modifications and substitutions are intended to be included in the scope of the present invention as defined in the appended claims.

Claims (10)

1. A method of distributed data storage, comprising the steps of:
acquiring server performance data, database table data and access frequency of the database table data; wherein the access frequency comprises a table data access frequency and a metadata access frequency; the server performance data comprises the memory size, the hard disk size and the processor performance of the server;
dividing the database table data into a plurality of classification tables according to the access frequency, and transmitting the classification tables to corresponding access areas;
and in each access area, determining a server storage path of database table data according to the access frequency and the server performance data, and storing the database table data according to the storage path.
2. The data storage method according to claim 1, wherein the determining the server storage path of the database table data according to the access frequency and the server performance data specifically comprises:
determining an access value according to a preset access frequency function and the access frequency;
determining a server performance value according to a preset server performance function and the server performance data;
and matching the access value with the server performance value to determine a server storage path of database table data.
3. The data storage method according to claim 2, wherein the access frequency function includes a first function and a second function, and the determining the access value according to the preset access frequency function and the access frequency specifically includes:
calculating a first value according to the first function and the table data access frequency;
calculating a second value according to the second function and the metadata access frequency;
and carrying out weighted summation on the first numerical value and the second numerical value to obtain an access value.
4. The data storage method according to claim 2, wherein the server performance function includes a third function and a fourth function, and the determining the server performance value according to the preset server performance function and the server performance data specifically includes:
calculating a total probability density according to the third function and each parameter in the server performance data;
and calculating a server performance value according to the fourth function and the total probability density.
5. A data storage method according to claim 3, wherein the first function is determined by:
determining a probability mass function and a Dirac function according to the historical table data access data;
and multiplying the probability mass function by the Dirac function and accumulating the products to obtain the first function.
6. A data storage method according to claim 3, wherein the second function is determined by:
determining a probability density function based on the historical metadata access data;
and integrating the probability density function to obtain the second function.
7. The data storage method of claim 1, wherein the level of each classification table is different, the data storage method further comprising:
calculating the frequency change rate according to the access frequency in a preset time period;
and if the frequency change rate is larger than a preset value, dividing the corresponding database table data into a classification table with the highest level.
8. A data storage system, comprising:
the first module is used for acquiring server performance data, database table data and access frequency of the database table data; wherein the access frequency comprises a table data access frequency and a metadata access frequency; the server performance data comprises the memory size, the hard disk size and the processor performance of the server;
the second module is used for dividing the database table data into a plurality of classification tables according to the access frequency and sending the classification tables to the corresponding access areas;
and a third module, configured to determine, in each access area, a server storage path of database table data according to the access frequency and the server performance data, and store the database table data according to the storage path.
9. A data storage device, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of any of claims 1-7.
10. A computer readable storage medium, in which a processor executable program is stored, characterized in that the processor executable program is for performing the method according to any of claims 1-7 when being executed by a processor.
CN202310697781.3A 2023-06-13 2023-06-13 Distributed data storage method, system, device and storage medium Active CN116431081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310697781.3A CN116431081B (en) 2023-06-13 2023-06-13 Distributed data storage method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN116431081A true CN116431081A (en) 2023-07-14
CN116431081B CN116431081B (en) 2023-11-07

Family

ID=87085846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310697781.3A Active CN116431081B (en) 2023-06-13 2023-06-13 Distributed data storage method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN116431081B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117389469A (en) * 2023-09-21 2024-01-12 华南理工大学 Internet data storage method, device, system and medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102841931A (en) * 2012-08-03 2012-12-26 中兴通讯股份有限公司 Storage method and storage device of distributive-type file system
JP2018041174A (en) * 2016-09-05 2018-03-15 日本電気株式会社 Database management device, database management method and program
US20210104332A1 (en) * 2019-09-25 2021-04-08 Brilliance Center B.V. System for anonymously tracking and/or analysing health in a population of subjects
CN112905113A (en) * 2021-02-08 2021-06-04 中国工商银行股份有限公司 Data access processing method and device
CN114518848A (en) * 2022-02-15 2022-05-20 北京百度网讯科技有限公司 Hierarchical storage system, and method, apparatus, device, and medium for processing storage data
CN114756624A (en) * 2022-04-11 2022-07-15 润联软件系统(深圳)有限公司 Data processing method, device and equipment for full-scale nodes and storage medium
CN115048053A (en) * 2022-06-15 2022-09-13 中国工商银行股份有限公司 Data storage method and device and electronic equipment
WO2022248714A1 (en) * 2021-05-27 2022-12-01 Cambridge Enterprise Limited Improvements in and relating to encoding and computation on distributions of data
CN115883590A (en) * 2022-12-09 2023-03-31 北京易华录信息技术股份有限公司 Optical-magnetic-electric fusion media asset data distributed storage and management method and device

Also Published As

Publication number Publication date
CN116431081B (en) 2023-11-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant