CN116431081B - Distributed data storage method, system, device and storage medium - Google Patents

Info

Publication number
CN116431081B
Authority
CN
China
Prior art keywords
data
function
access frequency
server
access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310697781.3A
Other languages
Chinese (zh)
Other versions
CN116431081A (en)
Inventor
何兴国
张越
赖春媚
周涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Turing Technology Co ltd
Original Assignee
Guangzhou Turing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Turing Technology Co ltd filed Critical Guangzhou Turing Technology Co ltd
Priority to CN202310697781.3A priority Critical patent/CN116431081B/en
Publication of CN116431081A publication Critical patent/CN116431081A/en
Application granted granted Critical
Publication of CN116431081B publication Critical patent/CN116431081B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0644Management of space entities, e.g. partitions, extents, pools
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0674Disk device
    • G06F3/0676Magnetic disk device
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a distributed data storage method, system, device and storage medium. Server performance data, database table data and the access frequency of the database table data are obtained, wherein the access frequency comprises a table data access frequency and a metadata access frequency, and the server performance data comprises the memory size, the hard disk size and the processor performance of the server. The database table data is divided into a plurality of classification tables according to the access frequency, and the classification tables are sent to corresponding access areas. A server storage path for the database table data in each access area is then determined according to the access frequency and the server performance data, and the database table data is stored according to the storage path. This improves the utilization efficiency of server performance and reduces the feedback time. The embodiments of the application can be widely applied in the technical field of data processing.

Description

Distributed data storage method, system, device and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a distributed data storage method, system, device, and storage medium.
Background
Distributed storage is a technology in which a plurality of networked nodes jointly provide storage services, with data stored in a scattered manner across a plurality of independent servers.
However, different servers differ in processing performance and network transmission performance, and different data is read at different frequencies. If data is simply scattered across servers, the feedback time for data access is lengthened by the limited performance of the servers, and at the same time the performance resources of the servers cannot be fully utilized.
Disclosure of Invention
Accordingly, an objective of the embodiments of the present application is to provide a distributed data storage method, system, device and storage medium, which provide different storage strategies according to the access frequency of data and the performance of a server, so as to reduce the feedback time.
In a first aspect, an embodiment of the present application provides a distributed data storage method, including the following steps:
acquiring server performance data, database table data and access frequency of the database table data; wherein the access frequency comprises a table data access frequency and a metadata access frequency; the server performance data comprises the memory size, the hard disk size and the processor performance of the server;
dividing the database table data into a plurality of classification tables according to the access frequency, and transmitting the classification tables to corresponding access areas;
and in each access area, determining a server storage path of database table data according to the access frequency and the server performance data, and storing the database table data according to the storage path.
Further, the determining a server storage path of database table data according to the access frequency and the server performance data specifically includes:
determining an access value according to a preset access frequency function and the access frequency;
determining a server performance value according to a preset server performance function and the server performance data;
and matching the access value with the server performance value to determine a server storage path of database table data.
Further, the access frequency function includes a first function and a second function, and the determining an access value according to a preset access frequency function and the access frequency specifically includes:
calculating a first value according to the first function and the table data access frequency;
calculating a second value according to the second function and the metadata access frequency;
and carrying out weighted summation on the first numerical value and the second numerical value to obtain an access value.
Further, the server performance function includes a third function and a fourth function, and the determining a server performance value according to the preset server performance function and the server performance data specifically includes:
calculating a total probability density according to the third function and each parameter in the server performance data;
and calculating a server performance value according to the fourth function and the total probability density.
Further, the first function is determined by:
determining a probability mass function and a Dirac function according to historical table data access data;
and multiplying the probability mass function by the Dirac function and accumulating the products as the first function.
Further, the second function is determined by:
determining a probability density function based on the historical metadata access data;
and integrating the probability density function to obtain the second function.
Further, the level of each classification table is different, and the data storage method further includes:
calculating the frequency change rate according to the access frequency in a preset time period;
and if the frequency change rate is larger than a preset value, dividing the corresponding database table data into a classification table with the highest level.
In a second aspect, an embodiment of the present application provides a distributed data storage system, including:
the first module is used for acquiring server performance data, database table data and access frequency of the database table data; wherein the access frequency comprises a table data access frequency and a metadata access frequency; the server performance data comprises the memory size, the hard disk size and the processor performance of the server;
the second module is used for dividing the database table data into a plurality of classification tables according to the access frequency and sending the classification tables to the corresponding access areas;
and a third module, configured to determine, in each access area, a server storage path of database table data according to the access frequency and the server performance data, and store the database table data according to the storage path.
In a third aspect, an embodiment of the present application provides a distributed data storage device, including:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method as described above.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium in which a processor executable program is stored, characterized in that the processor executable program is for performing the method as described above when being executed by a processor.
The embodiment of the application has the following beneficial effects: firstly, acquiring server performance data, database table data and access frequency of the database table data; wherein the access frequency comprises a table data access frequency and a metadata access frequency; the server performance data comprises the memory size, the hard disk size and the processor performance of the server; dividing the database table data into a plurality of classification tables according to the access frequency, and transmitting the classification tables to corresponding access areas; then, in each access area, determining a server storage path of database table data according to the access frequency and the server performance data, and storing the database table data according to the storage path; the database table data are classified and divided into areas according to the access frequency, servers with different performances are matched according to the areas, different storage strategies are provided according to the access frequency of the data and the performances of the servers, so that the high-frequency access data are distributed with better server performances and network transmission performances, the utilization efficiency of the server performances is improved, and the feedback time is reduced.
Drawings
FIG. 1 is a flowchart illustrating steps of a distributed data storage method according to an embodiment of the present application;
FIG. 2 is a block diagram of a distributed data storage system according to an embodiment of the present application;
FIG. 3 is a block diagram of a distributed data storage device according to an embodiment of the present application.
Detailed Description
The application will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
In the following description, the terms "first", "second", "third" and the like are merely used to distinguish similar objects and do not represent a specific ordering of the objects, it being understood that the "first", "second", "third" may be interchanged with a specific order or sequence, as permitted, to enable embodiments of the application described herein to be practiced otherwise than as illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the embodiments of the application is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
As shown in fig. 1, an embodiment of the present application provides a distributed data storage method, which includes the following steps.
Step S110, obtaining server performance data, database table data and access frequency of the database table data; wherein the access frequency comprises a table data access frequency and a metadata access frequency; the server performance data comprises server memory size, hard disk size and processor performance.
Specifically, the database table data is the data to be classified and stored, and takes the form of data tables. In a specific embodiment, for example, the information database of an enterprise includes employee information stored as a data table. The data table includes metadata and table data: the metadata is the data attributes, such as name and age, and the table data is the specific data values. The access frequency is the number of times the data table is accessed within a preset time period, for example the number of times the metadata or the table data is searched or queried in one week. The server performance includes the memory size, the hard disk size, the processor performance, the communication rate between servers, and the like; it is determined according to the actual application scenario and is not specifically limited in the embodiments of the application.
And step S120, dividing the database table data into a plurality of classification tables according to the access frequency, and transmitting the classification tables to the corresponding access areas.
Specifically, the database table data is divided according to the recorded access frequency and preset level ranges to obtain a plurality of different classification tables, such that the data in one classification table has access frequencies of the same level. In a specific embodiment, the level ranges can be divided at equal intervals: for example, an access frequency of 0-5 times per day is the first level, an access frequency of 6-10 times per day is the second level, and so on until a complete set of level ranges is obtained; the database tables are then divided according to these level ranges and their access frequencies to obtain a plurality of classification tables. Alternatively, access frequency ranges can be set directly, and data falling within a set range is placed in the same classification table: for example, data accessed 0-50 times per week is placed in a low-frequency classification table, data accessed 51-100 times per week in a medium-frequency classification table, and data accessed more than 100 times per week in a high-frequency classification table.
After the divided multiple classification tables are obtained, the classification tables are sent to the corresponding access areas to wait for subsequent storage operation, for example, the divided multiple classification tables are divided into a high-frequency classification table, an intermediate-frequency classification table and a low-frequency classification table, the access areas are also divided into a high-frequency access area, an intermediate-frequency access area and a low-frequency access area, the high-frequency classification table is sent to the high-frequency access area in a wired transmission or wireless transmission mode, the intermediate-frequency classification table is sent to the intermediate-frequency access area in a wired transmission or wireless transmission mode, and the low-frequency classification table is sent to the low-frequency access area in a wired transmission or wireless transmission mode.
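As an illustration of step S120, the following minimal Python sketch groups tables into low-, medium- and high-frequency classification tables using the weekly thresholds from the example above (0-50, 51-100, above 100). The names `level_for` and `classify_tables`, the dictionary input and the table names are hypothetical and are not part of the application.

```python
from collections import defaultdict

# Weekly access-count upper bounds taken from the example in the text;
# any count above the last bound is treated as high frequency.
LEVEL_BOUNDS = [(50, "low"), (100, "medium")]


def level_for(accesses_per_week):
    """Return the classification level for a weekly access count."""
    for upper, level in LEVEL_BOUNDS:
        if accesses_per_week <= upper:
            return level
    return "high"


def classify_tables(access_counts):
    """Group table names into classification tables by access level."""
    classified = defaultdict(list)
    for table, count in access_counts.items():
        classified[level_for(count)].append(table)
    return dict(classified)


# Hypothetical table names and weekly access counts.
example = {"employees": 180, "payroll": 52, "archive": 15}
result = classify_tables(example)
```

Each resulting group would then be sent to the access area of the same level.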
Step S130, determining a server storage path of database table data in each access area according to the access frequency and the server performance data, and storing the database table data according to the storage path.
Specifically, the data storage path is determined by jointly considering the data access frequency and the server performance. The data access frequency reflects how much users need the data: the higher the access frequency, the greater the user demand, so a server with good performance is needed for storage and reading to ensure a good user experience. In a specific embodiment, the performance of the current server is determined according to the server performance data; for example, if the processor frequency of the current server is 4 GHz, the memory size is 16 GB and the disk size is 500 GB, the performance of the current server can be determined to be excellent. The classification table level of the current data is determined according to the access frequency of the database table data; for example, if the access frequency is 180 times per week, which is more than 100 times per week, the data is high-frequency data and is placed in the high-frequency classification table. The data of the high-frequency classification table is then stored on a server with excellent performance.
Optionally, the determining the server storage path of the database table data according to the access frequency and the server performance data specifically includes:
Step S131, determining an access value according to a preset access frequency function and the access frequency.
In a specific embodiment, if the access frequency of the data is 15 times per week, the corresponding access value is 0.2 after normalization operation is performed through a preset function; the access frequency of the other data is 52 times per week, and the corresponding access value is 0.5 after normalization operation is carried out through a preset access frequency function; and obtaining an access value corresponding to the data for the subsequent determination of the storage path.
And step S132, determining a server performance value according to a preset server performance function and the server performance data.
Specifically, the performance of the current server can be evaluated from the server performance data. However, server performance is affected by many factors, including but not limited to the processor frequency, the memory size and the disk size, and the performance requirements differ between application scenarios. For example, for a server used for storage, disk size is the major factor in evaluating performance; for a server used for reading data, the processor frequency is the major factor. Evaluating server performance manually is inefficient, and the server must be re-evaluated whenever the application scenario changes. By normalizing the server performance data through the preset server performance function to obtain a server performance value, each server can determine its own performance value from its performance data, and only the preset server performance function needs to be changed when the application scenario changes.
And step S133, matching the access value with the server performance value, and determining a server storage path of the database table data.
Specifically, the performance of a server is characterized by its server performance value, and the access frequency of database table data is characterized by its access value. For example, a high-frequency access area contains a plurality of servers with good performance. The server performance value of each server and the access value of each piece of database table data in the access area are calculated, and servers with high server performance values are allocated to database table data with high access values for the storage operation.
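The matching in step S133 can be sketched as a rank-based pairing: tables are ordered by access value, servers by performance value, and the two orderings are paired. This is only one plausible reading of "matching"; the function name, the round-robin reuse of servers when tables outnumber servers, and all example values are assumptions of this sketch.

```python
def match_tables_to_servers(access_values, performance_values):
    """Pair tables with servers in descending order of value.

    The table with the highest access value is assigned to the server
    with the highest performance value, and so on. If there are more
    tables than servers, servers are reused in round-robin order
    (an assumption; the text does not cover this case).
    """
    if not performance_values:
        raise ValueError("at least one server is required")
    tables = sorted(access_values, key=access_values.get, reverse=True)
    servers = sorted(performance_values, key=performance_values.get, reverse=True)
    return {table: servers[i % len(servers)] for i, table in enumerate(tables)}


# Hypothetical access values and server performance values.
access = {"orders": 0.9, "users": 0.5, "logs": 0.2}
performance = {"srv-a": 0.7, "srv-b": 0.95}
paths = match_tables_to_servers(access, performance)
```

Here the most frequently accessed table lands on the best-performing server; a real deployment would also need to account for remaining capacity on each server.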
Optionally, the access frequency function includes a first function and a second function, and the access value is determined according to a preset access frequency function and an access frequency.
Specifically, the access frequency of the table data of the database includes a table data access frequency and a metadata access frequency, and the table data access frequency and the metadata access frequency reflect that the data is accessed in different modes, so that when the access value corresponding to the data is calculated, the contribution degree of the table data access frequency and the metadata access frequency needs to be comprehensively considered, the contribution degree of the table data access frequency is determined through a first function and the table data access frequency, and the contribution degree of the metadata access frequency is determined through a second function and the metadata access frequency.
The method for determining the access value specifically comprises the following steps:
Step S1311, calculating a first value according to the first function and the table data access frequency.
Specifically, the first function is the probability mass function of the table data access frequency of the database table data, and its value is taken as the first value. The specific calculation formula is as follows:
$$P(x) = \sum_{i} p_i \, \delta(x - x_i)$$
wherein $P(x)$ is the probability mass function of the table data access frequency $x$, $p_i$ is the probability mass when the table data access frequency is $x_i$, and $\delta$ is the Dirac function. Suppose the acquired table data access frequencies are $x_1$, $x_2$, $x_3$ and the corresponding probability masses are $p_1$, $p_2$, $p_3$. Then, at the access frequency $x_2$, the corresponding probability mass function is calculated as the first value: $P(x_2) = p_1\,\delta(x_2 - x_1) + p_2\,\delta(x_2 - x_2) + p_3\,\delta(x_2 - x_3)$. Since the Dirac function is nonzero only where $x$ equals $x_i$ (where its value is infinite) and zero elsewhere, only the matching term is selected, so at the access frequency $x_2$ the corresponding probability mass function is calculated as $p_2$. In a specific embodiment, if the access frequencies are 1, 2 and 3 and the corresponding probability masses are 0.2, 0.3 and 0.2, the probability mass function when the access frequency is 2 is $P(2) = 0.3$.
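The worked example above can be reproduced with a short Python sketch. Because a true Dirac function does not take a finite value at zero, the sketch replaces it with a selector that returns 1 when its argument is zero and 0 otherwise; that substitution, and the function name `first_function`, are assumptions made for illustration.

```python
def first_function(x, frequencies, masses):
    """Sum p_i * selector(x - x_i) over all recorded access frequencies."""

    def selector(u):
        # Stand-in for the Dirac function: 1 at zero, 0 elsewhere.
        return 1.0 if u == 0 else 0.0

    return sum(p * selector(x - xi) for xi, p in zip(frequencies, masses))


# Values from the example in the text: frequencies 1, 2, 3 with
# probability masses 0.2, 0.3, 0.2, evaluated at access frequency 2.
first_value = first_function(2, [1, 2, 3], [0.2, 0.3, 0.2])
```

With these inputs only the term for access frequency 2 is selected, so the first value is the probability mass 0.3 given in the text.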
Step S1312, calculating a second value according to the second function and the metadata access frequency.
Specifically, the second function is the integral of the probability density of the metadata access frequency of the database table data, and its value is taken as the second value. The specific calculation formula is as follows:
$$F(x) = \int f(x)\,\mathrm{d}x$$
wherein $F(x)$ is the integral of the probability density for the metadata access frequency $x$, and $f(x)$ is the probability density function when the metadata access frequency is $x$.
It should be noted that the probability density function and the probability mass function described above need to be determined according to the specific application scenario and data features.
Step S1313, performing weighted summation on the first value and the second value to obtain an access value.
Specifically, the weighted sum of the first value and the second value is calculated by the following formula:
$$S = P(x_t) + F(x_m)$$
wherein $S$ is the weighted sum, $P(x_t)$ is the first value, $F(x_m)$ is the second value, $x_t$ is the table data access frequency, and $x_m$ is the metadata access frequency.
In the embodiment of the present application, the weights of the first value and the second value are both 1, but the weight in the above formula is not particularly limited, and may be specifically set according to practical applications.
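A minimal sketch of step S1313 follows, with both weights defaulting to 1 as in this embodiment and exposed as parameters so they can be set per application. The function name `access_value`, the parameter names and the input numbers are hypothetical.

```python
def access_value(first_value, second_value, w_table=1.0, w_meta=1.0):
    """Weighted sum of the first value (table data) and second value (metadata)."""
    return w_table * first_value + w_meta * second_value


# Hypothetical first and second values.
s_default = access_value(0.3, 0.25)                      # both weights equal to 1
s_weighted = access_value(0.3, 0.25, w_table=2.0, w_meta=0.5)
```

Raising `w_table` relative to `w_meta` would make table data accesses count more heavily toward the access value than metadata accesses.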
Optionally, the first function is determined by:
Step S210, determining a probability mass function and a Dirac function according to historical table data access data.
Specifically, the actual probability mass function is determined by analyzing the characteristics of the historical access data of the table data, for example the distribution type of the data: if the data is determined to follow a discrete geometric distribution, the probability mass function of the data can be determined from the probability mass of the geometric distribution. The Dirac function is used to select the probability mass for each access frequency.
Step S220, multiplying the probability mass function by the Dirac function and accumulating the products as the first function.
Specifically, after the probability mass function and the dirac function are determined, the probability mass functions of different access frequencies are selected in a multiplication mode, and the probability mass functions of different access frequencies are accumulated after being multiplied by the dirac function, so that a first function is obtained.
Optionally, the second function is determined by:
Step S230, determining a probability density function according to the historical metadata access data.
Specifically, the actual probability density function is determined by analyzing the characteristics of the historical access data of the metadata, for example the distribution type of the data: if the data is determined to follow a continuous uniform distribution, the probability density function of the data can be determined from the probability density of the uniform distribution.
Step S240, integrating the probability density function as a second function.
Specifically, after the probability density function is obtained, since the data is continuous data, the second function of the data in different metadata access frequencies cannot be obtained through the accumulation mode of discrete data, and the second function needs to be obtained through integral operation on the probability density function.
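Steps S230 and S240 can be illustrated with a uniform density integrated numerically. The text does not state the limits of integration; this sketch assumes the integral runs from the lower end of the distribution's support up to the metadata access frequency, which makes the second function a cumulative distribution. The function names, the midpoint rule and the 0-100 range are likewise assumptions.

```python
def uniform_pdf(x, low, high):
    """Probability density of a continuous uniform distribution on [low, high]."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def second_function(x, pdf, lower, steps=1000):
    """Midpoint-rule integral of pdf from `lower` to x (assumed limits)."""
    if x <= lower:
        return 0.0
    width = (x - lower) / steps
    return sum(pdf(lower + (i + 0.5) * width) for i in range(steps)) * width


# Hypothetical case: metadata access frequency uniformly distributed
# between 0 and 100 accesses per week, evaluated at 25 accesses per week.
second_value = second_function(25.0, lambda t: uniform_pdf(t, 0.0, 100.0), 0.0)
```

For a uniform density the integral could be written in closed form; the numerical version is shown because it works unchanged for any density fitted to the historical data.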
Optionally, the server performance function includes a third function and a fourth function, and the server performance value is determined according to a preset server performance function and server performance data.
The specific calculation mode is as follows:
Step S1321, calculating the total probability density according to the third function and each parameter in the server performance data.
Specifically, the third function is a probability density function of the performance data of the server, and may be obtained by statistics on historical performance data of the server. In a specific embodiment, during the use of the server, 1000 historical sample data of the performance of the server are recorded, where each sample data records the performance values of the memory, the disk and the processor of the server, and then the probability density function can be estimated according to the kernel density estimation method and the above historical sample data as a third function. After the third function is obtained, the performance values of the memory, the disk and the processor in the current state of the server are recorded, and the probability distribution, namely the total probability density, of the server is obtained by calculating according to the third function and the performance values of the memory, the disk and the processor in the current state.
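The kernel density estimation mentioned above can be sketched in plain Python with a product of Gaussian kernels over the three performance dimensions. The text does not specify the kernel or the bandwidths, so the Gaussian kernel, the fixed bandwidths and the randomly generated history below are assumptions used only to make the example runnable.

```python
import math
import random


def gaussian_kde_3d(samples, bandwidths):
    """Build a density estimate f(memory, disk, cpu) from 3-D samples."""
    n = len(samples)
    # Normalising constant of a product of three Gaussian kernels.
    norm = n * math.prod(bandwidths) * (2.0 * math.pi) ** 1.5

    def density(point):
        total = 0.0
        for sample in samples:
            exponent = sum(
                ((p - s) / h) ** 2 for p, s, h in zip(point, sample, bandwidths)
            )
            total += math.exp(-0.5 * exponent)
        return total / norm

    return density


# Hypothetical history: 1000 samples of (memory GB, disk GB, processor GHz).
random.seed(0)
history = [
    (random.uniform(1, 16), random.uniform(100, 1000), random.uniform(1, 4))
    for _ in range(1000)
]
third_function = gaussian_kde_3d(history, bandwidths=(2.0, 100.0, 0.5))

# Total probability density at the server's current state (hypothetical values).
total_density = third_function((8.0, 500.0, 2.5))
```

In practice the bandwidths would be chosen from the data, for example by a rule of thumb or cross-validation, rather than fixed by hand.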
Step S1322, calculating a server performance value according to the fourth function and the total probability density.
Specifically, the fourth function performs a triple integration and a weighted average on the obtained total probability density, and the result is used as the server performance value. In a specific embodiment, the integration range of the triple integral and the weight of the weighted average are determined by the volume V spanned by the value ranges of the server's memory size, disk size, and processor frequency. For example, if the memory size ranges from 1 GB to 16 GB, the disk size from 100 GB to 1 TB (1000 GB), and the processor frequency from 1 GHz to 4 GHz, then V = (16 - 1) × (1000 - 100) × (4 - 1) = 40500. Once the integration range is obtained, the total probability density is triple-integrated over it, the reciprocal of the volume is used as the weight of the weighted average, and the weighted result is taken as the server performance value, which represents the average performance level of the server across its possible configurations.
According to the above method steps, the formula for calculating the server performance value is as follows:

$$P = \frac{1}{V} \iiint f(x, y, z)\, dx\, dy\, dz$$

where $P$ is the server performance value, $1/V$ is the weight of the weighted average, $V$ is the volume spanned by the value ranges of the server memory size, disk size, and processor frequency, $x$ is the memory size, $y$ is the disk size, $z$ is the processor frequency, and $f(x, y, z)$ is the probability density function of the server memory size, disk size, and processor frequency.
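The volume-weighted triple integral described above can be approximated numerically. The sketch below uses a midpoint rule on a regular grid; the function name, the grid resolution, and the toy density are assumptions for illustration and are not part of the embodiment.

```python
def server_performance_value(density, ranges, steps=20):
    """Approximate (1/V) * triple integral of `density` over `ranges`.

    `ranges` holds (lower, upper) bounds for memory size, disk size and
    processor frequency; V is the product of the three range lengths.
    """
    (x0, x1), (y0, y1), (z0, z1) = ranges
    hx, hy, hz = (x1 - x0) / steps, (y1 - y0) / steps, (z1 - z0) / steps
    volume = (x1 - x0) * (y1 - y0) * (z1 - z0)
    integral = 0.0
    for i in range(steps):
        x = x0 + (i + 0.5) * hx  # midpoint of each cell
        for j in range(steps):
            y = y0 + (j + 0.5) * hy
            for k in range(steps):
                z = z0 + (k + 0.5) * hz
                integral += density(x, y, z)
    integral *= hx * hy * hz
    return integral / volume, volume


def toy_density(x, y, z):
    """Hypothetical stand-in for the estimated probability density."""
    return 1.0


value, volume = server_performance_value(
    toy_density, ((1, 16), (100, 1000), (1, 4))
)
```

Ordering each range as (lower, upper) keeps every side length, and therefore the volume and the weight 1/V, positive.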
Optionally, the data storage method further includes:
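Step S140, calculating the frequency change rate according to the access frequency in a preset time period.
Specifically, the formula for calculating the frequency change rate is as follows:

$$\frac{dS}{dx} = \frac{\partial S}{\partial f_1}\,\frac{df_1}{dx} + \frac{\partial S}{\partial f_2}\,\frac{df_2}{dx}$$

where $\frac{dS}{dx}$ is the rate of change of $S$ with respect to $x$; $\frac{\partial S}{\partial f_1}$ and $\frac{\partial S}{\partial f_2}$ are the rates of change of $S$ with respect to $f_1$ and $f_2$, respectively; $\frac{df_1}{dx}$ is the derivative of $f_1$ with respect to $x$; $\frac{df_2}{dx}$ is the derivative of $f_2$ with respect to $x$; $f_1$ is the table data access frequency; $f_2$ is the metadata access frequency; and $x$ is the access frequency of the data.
As can be seen from the above formula, the rate of change of $S$ with respect to $x$ is equal to the weighted sum of the rates of change of the respective functions with respect to $x$.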
Step S150, if the frequency change rate is greater than a preset value, assigning the corresponding database table data to the highest-level classification table.
Specifically, the change rate indicates how stable the access frequency is. An excessively large change rate means that the access frequency is low in some periods and high in others, so measuring the access frequency over any single period would give a skewed result. Therefore, a preset value is set; an access frequency whose change rate exceeds the preset value is judged to be unstable, and the corresponding data is assigned to the highest-level classification table to ensure user experience.
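The threshold check in step S150 can be sketched as follows. Note that this sketch substitutes a simple finite difference between consecutive per-interval access counts for the derivative-based change rate of step S140; that substitution, the function names, and the convention that level 0 is the highest level are all assumptions made for illustration.

```python
def frequency_change_rate(counts):
    """Largest absolute change between consecutive per-interval access counts."""
    return max((abs(b - a) for a, b in zip(counts, counts[1:])), default=0.0)


def classify(counts, base_level, preset_value, highest_level=0):
    """Return the classification level for a table.

    A table whose access frequency is unstable (change rate above the
    preset value) goes to the highest-level classification table;
    otherwise it keeps the level assigned from its access frequency.
    """
    if frequency_change_rate(counts) > preset_value:
        return highest_level
    return base_level
```

A table with counts such as `[2, 3, 2, 3]` keeps its base level under a preset value of 5, while a bursty table such as `[1, 50, 2]` is promoted.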
In a specific implementation, a set of discrete data access frequencies and the corresponding probability mass function values are first obtained; the data table they form is shown in Table 1:
TABLE 1
From the data table, the probability mass at an access frequency of 2 is calculated as the sum of the corresponding terms, giving 0.3, and this probability mass is taken as the access value of the current access frequency. Based on the access value, the data with an access frequency of 2 is judged to be low-access-frequency data and is sent to the low-frequency area. In the low-frequency area, the value ranges of the memory size, disk size, and processor frequency of the current server are acquired: 1 GB to 16 GB for the memory size, 100 GB to 1 TB (1000 GB) for the disk size, and 1 GHz to 4 GHz for the processor frequency. The resulting volume is V = (16 - 1) × (1000 - 100) × (4 - 1) = 40500, so the weight of the weighted average is 1/40500. The performance parameters of the server in its current state are then read, and the performance value is calculated to be 0.4. The access value is then matched against the performance value, and it is determined that the current server can provide good server performance and transmission performance for the data with an access frequency of 2. The data is therefore stored in the current server.
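The access-value calculation in this example can be illustrated with a short Python sketch of the first function, that is, the sum of products of the probability mass function and the Dirac function. The table values below are hypothetical except for the 0.3 at an access frequency of 2, and a Kronecker-style indicator is used as a discrete stand-in for the Dirac function; both are assumptions of this sketch.

```python
def delta(a, b):
    """Discrete stand-in for the Dirac function: 1 when a == b, else 0."""
    return 1.0 if a == b else 0.0


def first_function(freq, pmf_table):
    """Sum of pmf(x_i) * delta(freq, x_i) over the tabulated frequencies."""
    return sum(p * delta(freq, x) for x, p in pmf_table.items())


# Hypothetical table: only the 0.3 at frequency 2 comes from the worked example.
pmf_table = {1: 0.1, 2: 0.3, 3: 0.4, 4: 0.2}
access_value = first_function(2, pmf_table)
```

Only the term whose tabulated frequency equals the queried frequency survives the sum, so the access value is simply the probability mass recorded for that frequency.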
The embodiment of the application has the following beneficial effects: in this embodiment, server performance data, database table data, and the access frequency of the database table data are acquired, wherein the access frequency comprises a table data access frequency and a metadata access frequency, and the server performance data comprises the memory size, the hard disk size, and the processor performance of the server; the database table data is divided into a plurality of classification tables according to the access frequency, and the classification tables are sent to the corresponding access areas; a server storage path for the database table data in each access area is then determined according to the access frequency and the server performance data, and the database table data is stored according to the storage path. By calculating the access frequency of the data and the performance of the server and applying different storage strategies accordingly, frequently accessed data obtains better server performance and network transmission performance, which improves both the utilization efficiency of server performance and the user experience.
As shown in FIG. 2, an embodiment of the present application further provides a distributed data storage system, including:
the first module is used for acquiring server performance data, database table data and access frequency of the database table data; wherein the access frequency comprises a table data access frequency and a metadata access frequency; the server performance data comprises the memory size, the hard disk size and the processor performance of the server;
the second module is used for dividing the database table data into a plurality of classification tables according to the access frequency and sending the classification tables to the corresponding access areas;
and a third module, configured to determine, in each access area, a server storage path of database table data according to the access frequency and the server performance data, and store the database table data according to the storage path.
It can be seen that the content of the above method embodiment applies to this system embodiment; the functions specifically implemented by this system embodiment are the same as those of the method embodiment, and the beneficial effects achieved are also the same as those of the method embodiment.
As shown in fig. 3, an embodiment of the present application further provides a distributed data storage device, including:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the steps of the distributed data storage method described in the method embodiments above.
Wherein the memory is operable as a non-transitory computer readable storage medium storing a non-transitory software program and a non-transitory computer executable program. The memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes remote memory provided remotely from the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It can be seen that the content of the above method embodiment applies to this device embodiment; the functions specifically implemented by this device embodiment are the same as those of the method embodiment, and the beneficial effects achieved are also the same as those of the method embodiment.
Furthermore, the embodiment of the application also discloses a computer program product or a computer program, and the computer program product or the computer program is stored in a computer readable storage medium. The computer program may be read from a computer readable storage medium by a processor of a computer device, the processor executing the computer program causing the computer device to perform the method as described above. Similarly, the content in the above method embodiment is applicable to the present storage medium embodiment, and the specific functions of the present storage medium embodiment are the same as those of the above method embodiment, and the achieved beneficial effects are the same as those of the above method embodiment.
The embodiment of the present application also provides a computer-readable storage medium storing a program executable by a processor, which when executed by the processor is configured to implement the above-described method.
It is to be understood that all or some of the steps, systems, and methods disclosed above may be implemented in software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and include any information delivery media.
While the preferred embodiment of the present application has been described in detail, the application is not limited to the embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims (8)

1. A method of distributed data storage, comprising the steps of:
acquiring server performance data, database table data and access frequency of the database table data; wherein the access frequency comprises a table data access frequency and a metadata access frequency; the server performance data comprises the memory size, the hard disk size and the processor performance of the server;
dividing the database table data into a plurality of classification tables according to the access frequency, and transmitting the classification tables to corresponding storage areas;
and in each storage area, determining a probability mass function and a Dirac function according to historical table data access data, taking the sum of products of the probability mass function and the Dirac function as a first function, determining a probability density function according to historical metadata access data, integrating the probability density function to determine a second function, determining a server storage path of database table data according to the access frequency, the first function, the second function and the server performance data, and storing the database table data according to the storage path.
2. The data storage method according to claim 1, wherein the determining the server storage path of the database table data according to the access frequency and the server performance data specifically comprises:
determining an access value according to a preset access frequency function and the access frequency;
determining a server performance value according to a preset server performance function and the server performance data;
and matching the access value with the server performance value to determine a server storage path of database table data.
3. The data storage method according to claim 2, wherein the access frequency function includes a first function and a second function, and the determining the access value according to the preset access frequency function and the access frequency specifically includes:
calculating a first value according to the first function and the table data access frequency;
calculating a second value according to the second function and the metadata access frequency;
and carrying out weighted summation on the first numerical value and the second numerical value to obtain an access value.
4. The data storage method according to claim 2, wherein the server performance function includes a third function and a fourth function, and the determining the server performance value according to the preset server performance function and the server performance data specifically includes:
calculating a total probability density according to the third function and each parameter in the server performance data;
and calculating a server performance value according to the fourth function and the total probability density.
5. The data storage method of claim 1, wherein the level of each classification table is different, the data storage method further comprising:
calculating the frequency change rate according to the access frequency in a preset time period;
and if the frequency change rate is larger than a preset value, dividing the corresponding database table data into a classification table with the highest level.
6. A data storage system, comprising:
the first module is used for acquiring server performance data, database table data and access frequency of the database table data; wherein the access frequency comprises a table data access frequency and a metadata access frequency; the server performance data comprises the memory size, the hard disk size and the processor performance of the server;
the second module is used for dividing the database table data into a plurality of classification tables according to the access frequency and sending the classification tables to the corresponding storage areas;
and the third module is used for determining a probability mass function and a Dirac function according to the historical table data access data in each storage area, taking the sum of products of the probability mass function and the Dirac function as a first function, determining a probability density function according to the historical metadata access data, integrating the probability density function to determine a second function, determining a server storage path of database table data according to the access frequency, the first function, the second function and the server performance data, and storing the database table data according to the storage path.
7. A data storage device, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of any of claims 1-5.
8. A computer readable storage medium, in which a processor executable program is stored, characterized in that the processor executable program is for performing the method according to any of claims 1-5 when being executed by a processor.
CN202310697781.3A 2023-06-13 2023-06-13 Distributed data storage method, system, device and storage medium Active CN116431081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310697781.3A CN116431081B (en) 2023-06-13 2023-06-13 Distributed data storage method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310697781.3A CN116431081B (en) 2023-06-13 2023-06-13 Distributed data storage method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN116431081A CN116431081A (en) 2023-07-14
CN116431081B true CN116431081B (en) 2023-11-07

Family

ID=87085846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310697781.3A Active CN116431081B (en) 2023-06-13 2023-06-13 Distributed data storage method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN116431081B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant