CN116594568A - Data storage method and device, electronic equipment and storage medium


Info

Publication number
CN116594568A
Authority
CN
China
Prior art keywords
data storage
storage node
file
index
health
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310629886.5A
Other languages
Chinese (zh)
Inventor
张振野
席超敏
郭思曼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Yaxin Technology Co ltd
Original Assignee
Guangzhou Yaxin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Yaxin Technology Co ltd
Priority to CN202310629886.5A
Publication of CN116594568A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062 Securing storage systems
    • G06F3/0622 Securing storage systems in relation to access
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the application provide a data storage method, a data storage device, electronic equipment and a storage medium, and relate to the technical field of computers. The method comprises: acquiring, at preset time intervals, real-time values of the target acquisition indexes of the host of each first data storage node; acquiring, at preset time intervals, health index parameters of the first data storage node according to the real-time values; and, for each file, acquiring second data storage nodes according to the health index parameters of each first data storage node, acquiring a third data storage node from the second data storage nodes according to the identifier of the file, and storing the file data in the third data storage node. By calculating the health index parameter of each data storage node, an optimal data storage node is selected for each file from the data storage nodes with higher health degrees. The selected data storage nodes can store all of the file data, so file data is not easily lost.

Description

Data storage method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data storage method, apparatus, electronic device, and storage medium.
Background
Big data storage is a key technology in the big data field, and as big data technology has continued to attract attention in recent years, how to better optimize distributed data storage has become a hot topic. Achieving load balancing for distributed data storage while guaranteeing data availability, fault tolerance and read-write performance is a serious problem for big data engineers.
In the prior art, a distributed file system client first obtains the storage load information of each data storage node through a management node, then determines the available data storage nodes according to the storage load information of all data storage nodes, and finally stores the file data to be stored into the available data storage nodes according to an algorithm for distributed data storage (such as a Round-Robin algorithm, a hash algorithm or a consistent hashing algorithm).
However, in the above storage process, only the storage load information of the data storage nodes (i.e. how much file data each node already stores) is considered when the available data storage nodes are determined. The available data storage nodes obtained by the prior art may therefore include nodes with high read-write pressure or potential software and hardware faults, on which file data cannot be stored, so part of the file data is easily lost during storage.
Disclosure of Invention
The application aims to solve at least one of the above technical defects, and the technical solutions provided by the embodiments of the application are as follows:
In a first aspect, an embodiment of the present application provides a data storage method, including:
for each first data storage node, acquiring, at every first preset time interval, real-time values of the target acquisition indexes of the host corresponding to the first data storage node; the target acquisition indexes characterize the current performance of the host corresponding to the first data storage node;
at every second preset time interval, acquiring health index parameters of the first data storage node according to the real-time values within a first preset time period before the current moment; wherein the second preset time interval is not less than the first preset time interval;
when a file data storage request is received, for each file in the file data storage request, acquiring at least one second data storage node according to the health index parameters of each first data storage node within a second preset time period before the current moment, acquiring a third data storage node from the second data storage nodes according to the identifier of the file, and storing the data of the file in the third data storage node.
In an optional embodiment of the present application, the acquiring health index parameters of the first data storage node according to the real-time values within a first preset time period before the current moment specifically includes:
for each target acquisition index, taking the average of the real-time values within the first preset time period as a first reference value if the target acquisition index is a hardware-type index, and taking the median of the real-time values within the first preset time period as the first reference value if the target acquisition index is a non-hardware-type index;
and acquiring the health index parameters of the first data storage node according to the first reference values.
In an alternative embodiment of the present application, the hardware-type indexes include: the ratio of central processing unit (CPU) idle time to total time, the memory usage ratio, the disk read-write request response time and the total number of disk reads and writes;
the non-hardware-type indexes include: the amount of data sent and received, and the network delay.
In an optional embodiment of the present application, the obtaining the health indicator parameter of the first data storage node according to each first reference value specifically includes:
For each target acquisition index, acquiring the health degree of the target acquisition index according to the first reference value, and calculating the health index parameter of the target acquisition index according to the health degree;
and taking the minimum of the health index parameters of the target acquisition indexes as the health index parameter of the first data storage node.
In an alternative embodiment of the present application, the method further includes:
if the health index parameter corresponding to the disk read-write request response time is not smaller than a first preset threshold, the health index parameter corresponding to the ratio of CPU idle time to total time is not acquired; the first preset threshold indicates whether the ratio of CPU idle time to total time is meaningful as a reference.
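The rule above can be sketched as a small filter over per-index health index parameters. This is a minimal illustration only: the short names `cpu_iowait` and `disk_await` follow the index names given in the description, while the function name `applicable_params`, the dictionary layout and the threshold value 80.0 are assumptions made for the example.

```python
from typing import Dict


def applicable_params(params: Dict[str, float], first_threshold: float) -> Dict[str, float]:
    """Return the per-index health index parameters that should be considered.

    If the parameter for disk_await is not smaller than the threshold,
    the parameter for cpu_iowait is left out (hypothetical helper).
    """
    kept = dict(params)  # copy so the caller's dictionary is not modified
    if params.get("disk_await", float("-inf")) >= first_threshold:
        kept.pop("cpu_iowait", None)
    return kept


# Disk latency parameter at or above the threshold: cpu_iowait is skipped.
print(applicable_params({"cpu_iowait": 40.0, "disk_await": 90.0}, 80.0))
# Disk latency parameter below the threshold: cpu_iowait is kept.
print(applicable_params({"cpu_iowait": 40.0, "disk_await": 50.0}, 80.0))
```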
In an optional embodiment of the present application, the acquiring at least one second data storage node according to the health index parameters of each first data storage node within the second preset time period before the current moment specifically includes:
for each first data storage node, acquiring the average of the health index parameters of the first data storage node within the second preset time period, and taking the average as a second reference value;
taking each first data storage node whose second reference value is larger than a second preset threshold as a second data storage node; wherein a second reference value larger than the second preset threshold indicates that the first data storage node is in a healthy state.
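The screening step can be sketched as follows, assuming the health index parameters recorded for each node during the second preset time period are already available as a list. The function name `select_healthy_nodes`, the input layout and the handling of nodes without any record are assumptions for illustration, not details given in the text.

```python
from statistics import fmean
from typing import Dict, List


def select_healthy_nodes(history: Dict[str, List[float]], second_threshold: float) -> List[str]:
    """Return nodes whose average health index parameter exceeds the threshold."""
    healthy = []
    for node, params in history.items():
        if not params:
            continue  # assumption: a node with no recorded parameters is not eligible
        second_reference = fmean(params)  # the "second reference value"
        if second_reference > second_threshold:  # strictly larger, as in the text
            healthy.append(node)
    return healthy


history = {"node-1": [90.0, 80.0], "node-2": [50.0, 60.0], "node-3": []}
print(select_healthy_nodes(history, 70.0))
```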
In an alternative embodiment of the present application, the acquiring the third data storage node from the second data storage nodes according to the identifier of the file specifically includes:
performing a consistent hash calculation on the identifier of the file to obtain a calculation result, and acquiring the third data storage node from the second data storage nodes according to the calculation result;
wherein a correspondence between calculation results and the second data storage nodes is preset.
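One common way to realise such a preset correspondence is a hash ring with virtual nodes. The sketch below is an assumption about how the mapping could look, not the construction used in the application: the class name `HashRing`, the use of SHA-256, and the number of virtual nodes are all illustrative choices.

```python
import bisect
import hashlib
from typing import List


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring over the second (healthy) data storage nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node is placed at several positions (virtual nodes) on the ring.
        self._points = sorted((_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self._keys = [position for position, _ in self._points]

    def node_for(self, file_id: str) -> str:
        """Return the first node clockwise from the hash of the file identifier."""
        index = bisect.bisect_right(self._keys, _hash(file_id)) % len(self._keys)
        return self._points[index][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("file-0001"))
```

With this layout, removing one node from the candidate set only remaps the files that were assigned to that node, which is the property that motivates consistent hashing over a plain modulo scheme.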
In a second aspect, an embodiment of the present application provides a data storage device, including:
a real-time value acquisition module, configured to acquire, for each first data storage node and at every first preset time interval, real-time values of the target acquisition indexes of the host corresponding to the first data storage node; the target acquisition indexes characterize the current performance of the host corresponding to the first data storage node;
a health index parameter acquisition module, configured to acquire, at every second preset time interval, health index parameters of the first data storage node according to the real-time values within a first preset time period before the current moment; wherein the second preset time interval is not less than the first preset time interval;
a file data storage module, configured to, when a file data storage request is received, for each file in the file data storage request, acquire at least one second data storage node according to the health index parameters of each first data storage node within a second preset time period before the current moment, acquire a third data storage node from the second data storage nodes according to the identifier of the file, and store the data of the file in the third data storage node.
In an alternative embodiment of the present application, the health indicator parameter obtaining module is specifically configured to:
for each target acquisition index, taking the average value of the real-time values belonging to the first preset time period as a first reference value if the target acquisition index is a hardware type index, and taking the median of the real-time values belonging to the first preset time period as the first reference value if the target acquisition index is a non-hardware type index;
and acquiring the health index parameters of the first data storage node according to each first reference value.
In an alternative embodiment of the present application, the hardware-type indexes include: the ratio of CPU idle time to total time, the memory usage ratio, the disk read-write request response time and the total number of disk reads and writes;
the non-hardware-type indexes include: the amount of data sent and received, and the network delay.
In an alternative embodiment of the present application, the health indicator parameter obtaining module is further configured to:
for each target acquisition index, acquiring the health degree of the target acquisition index according to the first reference value, and calculating the health index parameter of the target acquisition index according to the health degree;
and taking the minimum value of the health index parameters in each target acquisition index as the health index parameter of the first data storage node.
In an alternative embodiment of the present application, the health indicator parameter acquisition module may be further configured to:
if the health index parameter corresponding to the disk read-write request response time is not smaller than a first preset threshold, the health index parameter corresponding to the ratio of CPU idle time to total time is not acquired; the first preset threshold indicates whether the ratio of CPU idle time to total time is meaningful as a reference.
In an alternative embodiment of the present application, the file data storage module is specifically configured to:
for each first data storage node, acquiring an average value of each health index parameter of the first data storage node in a second preset time period, and taking the average value as a second reference value;
taking each first data storage node whose second reference value is larger than a second preset threshold as a second data storage node; wherein a second reference value larger than the second preset threshold indicates that the first data storage node is in a healthy state.
In an alternative embodiment of the application, the file data storage module is further for:
performing consistent hash calculation on the identification of any file, obtaining a calculation result, and obtaining a third data storage node from each second data storage node according to the calculation result;
wherein a correspondence between calculation results and the second data storage nodes is preset.
In a third aspect, an embodiment of the present application provides an electronic device including a memory, a processor, and a computer program stored on the memory;
the processor executes a computer program to implement the method provided in the first aspect embodiment or any alternative embodiment of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor implements the method provided in the embodiment of the first aspect or any of the alternative embodiments of the first aspect.
The technical scheme provided by the embodiment of the application has the beneficial effects that:
The health index parameter of each data storage node is calculated from real-time values of a plurality of target acquisition indexes of the host corresponding to that node; data storage nodes with higher health degrees are then screened out from all data storage nodes, and for each file an optimal data storage node is selected from them to store the file data. Because real-time data of each target acquisition index of the host is acquired, the current performance of the host of each data storage node is taken into account, so the selected data storage nodes are under lower read-write pressure and free of software and hardware problems. The data storage nodes finally obtained can therefore store all of the file data smoothly, and file data is not easily lost.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a schematic flow chart of a data storage method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a method for obtaining a third data storage node by a consistent hashing algorithm in an example of an embodiment of the present application;
FIG. 3 is a flow chart of data storage in one example of an embodiment of the application;
FIG. 4 is a block diagram illustrating a data storage device according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the drawings in the present application. It should be understood that the embodiments described below with reference to the drawings are exemplary descriptions for explaining the technical solutions of the embodiments of the present application, and do not limit those technical solutions.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and "comprising," when used in this specification, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof, all of which may be included in the present specification. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates at least one of the items defined by the term, e.g., "A and/or B" may be implemented as "A", or as "B", or as "A and B".
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
The technical solutions of the embodiments of the present application and technical effects produced by the technical solutions of the present application are described below by describing several exemplary embodiments. It should be noted that the following embodiments may be referred to, or combined with each other, and the description will not be repeated for the same terms, similar features, similar implementation steps, and the like in different embodiments.
FIG. 1 is a schematic flow chart of a data storage method according to an embodiment of the present application. The method may be executed by a distributed file system and, as shown in FIG. 1, may include:
Step S101, for each first data storage node, acquiring, at every first preset time interval, real-time values of the target acquisition indexes of the host corresponding to the first data storage node; the target acquisition indexes characterize the current performance of the host corresponding to the first data storage node.
The first data storage nodes are all of the preset data storage nodes, and each data storage node is deployed on a corresponding host. In the embodiment of the application, the host may be a physical host or a cloud host, which is not limited herein. A target acquisition index is an index, measured on the host corresponding to the first data storage node, that reflects the current performance of that host. The first preset time interval determines the time points at which the real-time values of the target acquisition indexes are acquired, and may be set to 1 minute in an embodiment of the present application (i.e., the real-time values of the target acquisition indexes are acquired once every minute). The real-time values may be obtained by an Agent (agent plug-in).
Specifically, in the embodiment of the application, at every first preset time interval, the distributed file system acquires the real-time values of the target acquisition indexes on the host corresponding to each first data storage node through the Agent plug-in.
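A single collection round might look like the sketch below. The agent interface is not specified in the text, so each agent is modelled here as a plain callable returning a dictionary of index values; the names `collect_once`, `agents` and `samples` are hypothetical, and a scheduler (not shown) is assumed to call the function once per first preset time interval, for example every 60 seconds.

```python
from typing import Callable, Dict, List, Tuple

# One sample: (timestamp in seconds, index name -> real-time value)
Sample = Tuple[float, Dict[str, float]]


def collect_once(
    agents: Dict[str, Callable[[], Dict[str, float]]],
    now: float,
    samples: Dict[str, List[Sample]],
) -> None:
    """Poll every node's agent once and append the readings to its sample list."""
    for node, read_metrics in agents.items():
        samples.setdefault(node, []).append((now, read_metrics()))


# Stand-in agents returning fixed readings (illustrative values only).
agents = {
    "node-1": lambda: {"nw_latency": 3.0, "disk_await": 12.0},
    "node-2": lambda: {"nw_latency": 7.5, "disk_await": 40.0},
}
samples: Dict[str, List[Sample]] = {}
collect_once(agents, 0.0, samples)
collect_once(agents, 60.0, samples)
print(len(samples["node-1"]))
```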
Step S102, at every second preset time interval, acquiring health index parameters of the first data storage node according to the real-time values within a first preset time period before the current moment; wherein the second preset time interval is not smaller than the first preset time interval.
The health index parameter can be understood as a score for the first data storage node: a lower value indicates that the corresponding first data storage node is under greater read-write pressure or has potential software or hardware faults. The first preset time period is the time range from which real-time values are selected as inputs to the health index parameter, and in an exemplary embodiment of the present application may be set to the 15 minutes preceding the current time (i.e. the real-time values acquired within the 15 minutes before the current time are used to calculate the health index parameter). The second preset time interval determines the time points at which the health index parameter of the first data storage node is acquired, and may be set to 15 minutes in an embodiment of the present application (i.e., the health index parameter of the first data storage node is acquired once every 15 minutes).
Specifically, in the embodiment of the present application, at every second preset time interval, the health index parameter of each first data storage node is calculated from the real-time values of the target acquisition indexes acquired within the 15 minutes before the current time, and the calculated health index parameter is stored in the management node. The management node retains the health index parameters of each data storage node for a limited storage period. For example, the storage period may be set to 2 days.
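The limited storage period on the management node can be sketched as a small in-memory store that discards records older than the retention period and returns the records inside a requested look-back period. The class name `HealthStore`, its methods and the in-memory layout are assumptions for illustration; the text does not describe how the management node stores these values.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class HealthStore:
    """Keeps (timestamp, health index parameter) records per node for a limited period."""

    def __init__(self, retention_seconds: float = 2 * 24 * 3600) -> None:
        self.retention_seconds = retention_seconds  # 2 days, as in the example above
        self._records: Dict[str, Deque[Tuple[float, float]]] = defaultdict(deque)

    def record(self, node: str, timestamp: float, param: float) -> None:
        """Store a new parameter and drop records older than the retention period."""
        queue = self._records[node]
        queue.append((timestamp, param))
        while queue and queue[0][0] < timestamp - self.retention_seconds:
            queue.popleft()

    def recent(self, node: str, now: float, period_seconds: float) -> List[float]:
        """Return the parameters recorded within period_seconds before now."""
        return [p for ts, p in self._records.get(node, ()) if now - period_seconds <= ts <= now]


store = HealthStore()
store.record("node-1", 0.0, 82.0)
store.record("node-1", 900.0, 77.0)  # 15 minutes later
print(store.recent("node-1", now=900.0, period_seconds=24 * 3600))
```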
Step S103, when a file data storage request is received, for each file in the file data storage request, acquiring at least one second data storage node according to the health index parameters of each first data storage node within a second preset time period before the current moment, acquiring a third data storage node from the second data storage nodes according to the identifier of the file, and storing the data of the file in the third data storage node.
The file data storage request contains at least one file, and the data contained in each file needs to be stored in a data storage node. The second data storage nodes are the screened-out data storage nodes that are currently under lower read-write pressure and have no potential software or hardware faults. The third data storage node is the data storage node determined for storing the data of a given file. The second preset time period is the time range from which health index parameters are selected as reference factors for screening, and may be set to the day before the current time in the embodiment of the present application (that is, the health index parameters obtained within the day before the current time may be used as the reference factors for acquiring the second data storage nodes). The identifier of the file may be a number corresponding to the file, and is used to determine the third data storage node corresponding to the file.
Specifically, when a file data storage request is received, for each file in the request, the management node first selects at least one second data storage node from the first data storage nodes according to the health index parameters obtained for each first data storage node within the previous day; a target data storage node (namely the third data storage node) is then determined for the file according to the identifier of the file, and the file data of the file is stored in the third data storage node.
According to the scheme provided by the application, the health index parameter of each data storage node is calculated from the real-time values of the target acquisition indexes of the host corresponding to that node; data storage nodes with higher health degrees are then screened out from all data storage nodes, and for each file an optimal data storage node is selected from them to store the file data. Because real-time data of each target acquisition index of the host is acquired, the current performance of the host of each data storage node is taken into account, so the selected data storage nodes are under lower read-write pressure and free of software and hardware problems. The data storage nodes finally obtained can therefore store all of the file data smoothly, and file data is not easily lost.
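The way the stages of step S103 compose can be sketched with the screening rule and the node-selection rule passed in as callables. Everything named here (`place_files`, `is_healthy`, `pick_node`) is hypothetical, and raising an error when no node passes screening is an assumption, since the text does not say what happens in that case.

```python
from typing import Callable, Dict, Iterable, List


def place_files(
    file_ids: Iterable[str],
    first_nodes: List[str],
    is_healthy: Callable[[str], bool],
    pick_node: Callable[[str, List[str]], str],
) -> Dict[str, str]:
    """Map each file identifier to the node that should store its data."""
    placement: Dict[str, str] = {}
    for file_id in file_ids:
        # Screen the first nodes down to the second (healthy) nodes.
        second_nodes = [node for node in first_nodes if is_healthy(node)]
        if not second_nodes:
            # Assumption: fail loudly rather than write to an unhealthy node.
            raise RuntimeError("no healthy data storage node available")
        # Choose the third node for this file from the second nodes.
        placement[file_id] = pick_node(file_id, second_nodes)
    return placement


scores = {"node-1": 90.0, "node-2": 40.0, "node-3": 75.0}  # illustrative averages
result = place_files(
    ["f1", "f2"],
    list(scores),
    is_healthy=lambda node: scores[node] > 70.0,
    pick_node=lambda file_id, nodes: nodes[len(file_id) % len(nodes)],  # stand-in picker
)
print(result)
```

The stand-in picker in the usage lines is deliberately trivial; a consistent hash over the file identifier would take its place in practice.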
In an optional embodiment of the present application, the acquiring health index parameters of the first data storage node according to the real-time values within a first preset time period before the current moment specifically includes:
for each target acquisition index, taking the average of the real-time values within the first preset time period as a first reference value if the target acquisition index is a hardware-type index, and taking the median of the real-time values within the first preset time period as the first reference value if the target acquisition index is a non-hardware-type index;
and acquiring the health index parameters of the first data storage node according to the first reference values.
A hardware-type index is an index related to host hardware; in the embodiment of the present application, these are indexes related to hardware such as the CPU, memory and disk. A non-hardware-type index is an index not related to hardware; in the embodiment of the present application, these are indexes related to network conditions. The first reference value measures, for the corresponding target acquisition index, the read-write pressure or the software and hardware state of the data storage node over the whole first preset time period.
Specifically, if the target acquisition index is of the hardware type, the average of all real-time values of the index acquired within the first preset time period is calculated and taken as the first reference value of the index; if the target acquisition index is of the non-hardware type, the median of all real-time values of the index acquired within the first preset time period is calculated and taken as the first reference value of the index. After the first reference values of all target acquisition indexes have been obtained, the health index parameter of the first data storage node is acquired according to all of the first reference values.
It should be noted that, in actual operation, the first reference value of each target acquisition index may be obtained by a different method; for example, the first reference value of a hardware-type index may also be obtained by calculating the median. The choice should be made according to the actual data.
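The default rule (average for hardware-type indexes, median otherwise) can be sketched as below. The index names follow the short names used in the description, the set `HARDWARE_METRICS` mirrors the hardware-type list given in the text, and the function name is an assumption for illustration.

```python
from statistics import fmean, median
from typing import Sequence

# Hardware-type indexes as listed in the text; any other index is treated as non-hardware.
HARDWARE_METRICS = {"cpu_iowait", "mem_used", "disk_await", "disk_iops"}


def first_reference_value(metric: str, samples: Sequence[float]) -> float:
    """Average for hardware-type indexes, median for non-hardware-type indexes."""
    if not samples:
        raise ValueError("no real-time values in the first preset time period")
    if metric in HARDWARE_METRICS:
        return fmean(samples)
    return median(samples)


window = [1.0, 2.0, 9.0]  # illustrative real-time values from one time period
print(first_reference_value("disk_await", window))  # average of the window
print(first_reference_value("nw_latency", window))  # median of the window
```

The median is less sensitive than the average to a single spike in the window, which may be one reason to prefer it for bursty network readings; the text itself leaves the choice to the actual data.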
In an alternative embodiment of the present application, the index of the hardware type includes: the CPU idle time and total time ratio, the memory usage ratio, the disk read-write request response time and the total disk read-write number;
an indicator of a non-hardware type, comprising: data amount and network delay.
Specifically, the ratio of central processing unit (CPU) idle time to total time (cpu_iowait) refers to the CPU time spent idle while an IO (input/output) task is outstanding, divided by the total CPU time. The memory usage ratio (mem_used) refers to the used memory of the swap area (swap used) divided by the total memory of the swap area. The disk usage ratio (disk_used) refers to the used disk capacity divided by the total disk capacity. The disk read-write request response time (disk_await) refers to the response time of each read-write request (in milliseconds). The total number of disk reads and writes (disk_iops) refers to the total number of disk read and write IOs per second (in kilobytes per second). The amount of data sent and received (nw_ioxkb) refers to the amount of data sent and received per second (in kilobytes per second). The network delay (nw_latency) is the time (in milliseconds) that data takes to travel through the transmission medium.
In an optional embodiment of the present application, the obtaining the health indicator parameter of the first data storage node according to each first reference value specifically includes:
for each target acquisition index, acquiring the health degree of the target acquisition index according to the first reference value, and calculating the health index parameter of the target acquisition index according to the health degree;
and taking the minimum value of the health index parameters in each target acquisition index as the health index parameter of the first data storage node.
The health degree represents how healthy a target acquisition index is, and can be calculated for each target acquisition index according to a preset formula. The corresponding health index parameter is obtained by a mathematical conversion of the health degree.
Specifically, the manner in which each target acquisition indicator obtains the health degree according to the first reference value is shown in the following table 1:
TABLE 1
The calculated health degree is multiplied by 100 to obtain the corresponding health index parameter.
Since each first data storage node has health index parameters for a plurality of target acquisition indexes, the minimum of all these health index parameters is used as the health index parameter of the first data storage node.
It can be understood that, if the health index parameter of any one target acquisition index of a first data storage node is low, the node can be considered to have a problem (such as high read-write pressure or a software or hardware fault). Taking the minimum of all the health index parameters as the health index parameter of the first data storage node therefore reflects the state of the storage node accurately.
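The conversion and the minimum rule can be illustrated with a short sketch. The per-index formulas of Table 1 are not reproduced here, so the sketch takes already computed health degrees as input and assumes they lie between 0 and 1; both function names are hypothetical.

```python
def health_index_parameter(health_degree):
    """Convert a health degree (assumed to be in [0, 1]) into a health index
    parameter by multiplying it by 100."""
    return health_degree * 100


def node_health_index_parameter(health_degrees):
    """Take a mapping of index name -> health degree and return the minimum
    health index parameter over all indexes as the node-level value."""
    if not health_degrees:
        raise ValueError("no target acquisition indexes for this node")
    return min(health_index_parameter(d) for d in health_degrees.values())
```

For instance, a node whose indexes have health degrees 1.0, 0.75 and 0.5 is rated by its weakest index.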
In an alternative embodiment of the present application, the method specifically includes:
if the health index parameter corresponding to the response time of the disk read-write request is not smaller than the first preset threshold value, the health index parameter corresponding to the ratio of the CPU idle time to the total time is not obtained; the first preset threshold value represents whether the ratio of CPU idle time to total time has a reference meaning or not.
Specifically, when the disk read-write request response time is high (in which case, according to the above formula, the corresponding health index parameter is low), the read-write pressure on the disk is large, and the time spent in the kernel IO queue and the time spent executing IO on the storage device may be too long (that is, the CPU may have a problem). In this case, the ratio of CPU idle time to total time is collected and its health index parameter is further calculated. When the disk read-write request response time is low (in which case, according to the above formula, the corresponding health index parameter is high), the read-write pressure on the disk is small. It then cannot be determined whether the CPU has a problem (the real-time value of the ratio of CPU idle time to total time lies within its normal range and has no reference meaning), so the health index parameter of that ratio does not need to be calculated.
For example, in the embodiment of the present application, the first preset threshold may be set to 80, that is, when the health indicator parameter corresponding to the response time of the disk read/write request is not less than 80, the health indicator parameter corresponding to the ratio of the idle time to the total time of the CPU is not acquired.
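A minimal sketch of this rule, using the example threshold of 80 from the text. The function name, the dictionary layout, and the choice to keep the CPU value when no disk response-time parameter is available are assumptions made for illustration.

```python
FIRST_PRESET_THRESHOLD = 80  # example value given in the description


def applicable_health_parameters(health_params, threshold=FIRST_PRESET_THRESHOLD):
    """Return a copy of the per-index health index parameters, leaving out
    cpu_iowait when the disk_await parameter is not smaller than the threshold.

    If disk_await is missing, cpu_iowait is kept (an assumption of this sketch).
    """
    selected = dict(health_params)
    disk_await_param = selected.get("disk_await")
    if disk_await_param is not None and disk_await_param >= threshold:
        selected.pop("cpu_iowait", None)
    return selected
```

With this filter applied before the minimum is taken, a low CPU value only lowers the node-level parameter when the disk response time suggests it is meaningful.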
In an optional embodiment of the present application, the obtaining at least one second data storage node according to the health indicator parameter of each first data storage node in the second preset time period before the current time specifically includes:
for each first data storage node, acquiring an average value of each health index parameter of the first data storage node in a second preset time period, and taking the average value as a second reference value;
taking the first data storage node with the second reference value larger than a second preset threshold value as a second data storage node; wherein the second preset threshold characterizes the first data storage node in a healthy state.
The second reference value is used for measuring the read-write pressure or the software and hardware state of the corresponding data storage node in the whole second preset time period.
Specifically, for each first data storage node, all the health index parameters calculated in the second preset time period are obtained, the average value of the health index parameters is calculated, and the calculated average value is used as a second reference value. And then taking each first data storage node with the second reference value larger than a second preset threshold value as a second data storage node. For example, in an embodiment of the present application, the second preset threshold may be set to 60, i.e. the first data storage nodes having the second reference value greater than 60 will be considered as healthy data storage nodes, and these first data storage nodes will be referred to as second data storage nodes.
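The screening step can be sketched as below, using the example threshold of 60. The function name and the input layout (node identifier mapped to the list of health index parameters calculated in the second preset time period) are hypothetical.

```python
from statistics import mean

SECOND_PRESET_THRESHOLD = 60  # example value given in the description


def select_second_nodes(history, threshold=SECOND_PRESET_THRESHOLD):
    """Return the identifiers of the first data storage nodes whose second
    reference value (mean health index parameter) is greater than the threshold.

    Nodes without any calculated parameter are skipped (an assumption)."""
    healthy = []
    for node_id, params in history.items():
        if params and mean(params) > threshold:
            healthy.append(node_id)
    return healthy
```

Note that the comparison is strict: a node whose second reference value equals the threshold is not selected.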
In an alternative embodiment of the present application, the obtaining the third data storage node from each second data storage node according to the identifier of any file specifically includes:
performing consistent hash calculation on the identification of any file, obtaining a calculation result, and obtaining a third data storage node from each second data storage node according to the calculation result;
and the calculation result and each second data storage node are preset with corresponding relations.
Wherein a consistent hash calculation is used to select the data storage node to be stored for each file.
Specifically, each second data storage node is first decomposed into two corresponding virtual nodes, and the virtual nodes are distributed evenly on a virtual circle according to a preset rule. The preset rule is that the two virtual nodes of the same data storage node are placed at two positions that are symmetrical about the center of the virtual circle. A consistent hash calculation is then performed on the identifier of any file (for example, dividing the number of the file by 2^32) to obtain a calculation result, and each calculation result corresponds to a position on the virtual circle. For each position, the first virtual node encountered in the clockwise direction is the target virtual node, and the second data storage node corresponding to the target virtual node is the third data storage node. In other words, each calculation result corresponds to a position, each position corresponds to a target virtual node, and each target virtual node corresponds to a second data storage node, so each calculation result corresponds to a second data storage node.
Fig. 2 is a schematic flow chart of obtaining a third data storage node through a consistent hashing algorithm according to an embodiment of the present application. As shown in fig. 2, s1, s2, and s3 correspond to three actual data storage nodes, and in this algorithm each data storage node may be divided into two virtual nodes (s1 is divided into v1 and v2, s2 is divided into v3 and v4, and s3 is divided into v5 and v6). After the consistent hash calculation (Hash in the figure) is performed on O1 and O2, the virtual node corresponding to O1 is v4 and the virtual node corresponding to O2 is v2, so the data storage node corresponding to O1 is s2 and the data storage node corresponding to O2 is s1.
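The ring lookup described above can be sketched as follows. Several details are assumptions made for illustration: the ring has 2^32 positions, the file identifier is an integer file number that is reduced modulo 2^32, and the virtual nodes are placed at evenly spaced positions so that the two virtual nodes of one storage node lie about half a circle apart. The function names `build_ring` and `locate_node` are hypothetical.

```python
from bisect import bisect_left

RING_SIZE = 2 ** 32  # assumed size of the virtual circle


def build_ring(node_ids):
    """Place two virtual nodes per storage node on the ring, evenly spaced,
    with the two virtual nodes of the same storage node half a circle apart.
    Returns a list of (position, node_id) pairs sorted by position."""
    n = len(node_ids)
    if n == 0:
        raise ValueError("at least one second data storage node is required")
    spacing = RING_SIZE // (2 * n)
    ring = []
    for i, node_id in enumerate(node_ids):
        ring.append((i * spacing, node_id))        # first virtual node
        ring.append(((i + n) * spacing, node_id))  # opposite virtual node
    ring.sort()
    return ring


def locate_node(ring, file_number):
    """Map a file number to a ring position and return the storage node of
    the first virtual node found clockwise from that position."""
    position = file_number % RING_SIZE
    positions = [p for p, _ in ring]
    idx = bisect_left(positions, position)
    if idx == len(ring):  # past the last virtual node: wrap around
        idx = 0
    return ring[idx][1]
```

Because only the healthy second data storage nodes are passed to `build_ring`, a file is never mapped to a node that failed the health screening. A production implementation would typically hash a string identifier first; the modulo step here only mirrors the example in the text.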
Fig. 3 is a schematic flow chart of file data storage according to an embodiment of the present application. As shown in fig. 3, in step (1), the distributed file system collects real-time data of each target acquisition index (CPU, memory, disk, network) of the host corresponding to a data storage node through an Agent plug-in on the data storage node. In step (2), after the collection is completed, the Agent reports the collected real-time data to a management node, and the management node calculates the health index parameters of the data storage node from the real-time data and stores them. In step (3), when the client sends a file data storage request to the distributed file system, the client obtains, through a data stream, the health index parameter information and the file storage information of each data storage node stored in the management node, and obtains from the management node the third data storage node in which each file needs to be stored. In step (4), the client writes the data of each file into the corresponding third data storage node through an output stream.
Fig. 4 is a block diagram of a data storage device according to an embodiment of the present application, and as shown in fig. 4, the file data storage device 400 may include: a real-time value acquisition module 401, a health index parameter acquisition module 402, and a file data storage module 403, wherein,
the real-time value acquisition module 401 is configured to acquire, for each first data storage node, a real-time value of each target acquisition index of the host corresponding to the first data storage node at a first preset time interval; the target acquisition index characterizes the current performance of the host corresponding to the first data storage node;
the health index parameter obtaining module 402 is configured to obtain health index parameters of the first data storage node according to real-time values in a first preset time period before the current time every second preset time interval; wherein the second preset time interval is not less than the first preset time interval;
the file data storage module 403 is configured to, when receiving a file data storage request, obtain, for any file in the file data storage request, at least one second data storage node according to a health indicator parameter in a second preset time period before the current time, obtain, according to an identifier of any file, a third data storage node from each second data storage node, and store data of any file in the third data storage node.
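The interaction between the sampling interval and the first preset time period can be illustrated with a small time-window buffer. This is only a sketch under stated assumptions: the class name `SampleWindow` is hypothetical, timestamps are plain numbers in seconds supplied by the caller, and samples older than the period are simply discarded.

```python
from collections import deque


class SampleWindow:
    """Keep the real-time values of one target acquisition index that fall
    within the most recent period_seconds."""

    def __init__(self, period_seconds):
        self.period_seconds = period_seconds
        self._samples = deque()  # (timestamp, value) pairs in arrival order

    def add(self, timestamp, value):
        """Record one value, for example once per first preset time interval."""
        self._samples.append((timestamp, value))

    def values_before(self, now):
        """Drop samples older than the window and return the values collected
        in the period ending at the moment now."""
        cutoff = now - self.period_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()
        return [value for timestamp, value in self._samples if timestamp <= now]
```

The acquisition module would call `add` at every first preset time interval, and the health index parameter acquisition module would call `values_before` at every second preset time interval to obtain the values to aggregate.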
According to the scheme provided by the application, the health index parameters of each data storage node are calculated by acquiring the real-time values of the target acquisition indexes of the host corresponding to each data storage node, then the data storage node with higher health degree is screened out from all the data storage nodes, and the optimal data storage node is selected from the data storage nodes with higher health degree for each file so as to store file data. Because the real-time data of each target acquisition index of the host is acquired, the current performance of the host of each data storage node is considered, so that the read-write pressure of the acquired data storage node is smaller and the problems of software and hardware are avoided, and further, the data storage node finally acquired can smoothly store all file data, and the loss of the file data is not easy to cause.
In an alternative embodiment of the present application, the health indicator parameter obtaining module is specifically configured to:
for each target acquisition index, taking the average value of the real-time values belonging to the first preset time period as a first reference value if the target acquisition index is a hardware type index, and taking the median of the real-time values belonging to the first preset time period as the first reference value if the target acquisition index is a non-hardware type index;
And acquiring the health index parameters of the first data storage node according to each first reference value.
In an alternative embodiment of the present application, the hardware-type index includes: the ratio of CPU idle time to total time, the memory usage ratio, the disk read-write request response time, and the total number of disk reads and writes;
the non-hardware-type index includes: the amount of data sent and received, and the network delay.
In an alternative embodiment of the present application, the health indicator parameter obtaining module is further configured to:
for each target acquisition index, acquiring the health degree of the target acquisition index according to the first reference value, and calculating the health index parameter of the target acquisition index according to the health degree;
and taking the minimum value of the health index parameters in each target acquisition index as the health index parameter of the first data storage node.
In an alternative embodiment of the present application, the health indicator parameter acquisition module may be further configured to:
if the health index parameter corresponding to the response time of the disk read-write request is not smaller than the first preset threshold value, the health index parameter corresponding to the ratio of the CPU idle time to the total time is not obtained; the first preset threshold value represents whether the ratio of CPU idle time to total time has a reference meaning or not.
In an alternative embodiment of the present application, the file data storage module is specifically configured to:
for each first data storage node, acquiring an average value of each health index parameter of the first data storage node in a second preset time period, and taking the average value as a second reference value;
taking the first data storage node with the second reference value larger than a second preset threshold value as a second data storage node; wherein the second preset threshold characterizes the first data storage node in a healthy state.
In an alternative embodiment of the application, the file data storage module is further for:
performing consistent hash calculation on the identification of any file, obtaining a calculation result, and obtaining a third data storage node from each second data storage node according to the calculation result;
and the calculation result and each second data storage node are preset with corresponding relations.
Referring now to fig. 5, there is shown a schematic diagram of an electronic device (e.g., a terminal device or server performing the method of fig. 1) 500 suitable for use in implementing an embodiment of the present application. The electronic device in the embodiment of the present application may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a car-mounted terminal (e.g., car navigation terminal), a wearable device, etc., and a fixed terminal such as a digital TV, a desktop computer, etc. The electronic device shown in fig. 5 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the present application.
The electronic device includes a memory and a processor: the memory is used for storing a program for executing the methods of the method embodiments, and the processor is configured to execute the program stored in the memory. Herein, the processor may be referred to as the processing device 501 described below, and the memory may include at least one of the Read Only Memory (ROM) 502, the Random Access Memory (RAM) 503, and the storage device 508 described below, specifically as follows:
as shown in fig. 5, the electronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic device 500 are also stored. The processing means 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 508 including, for example, magnetic tape, hard disk, etc.; and communication means 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 shows an electronic device having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or from the storage means 508, or from the ROM 502. The above-described functions defined in the method of the embodiment of the present application are performed when the computer program is executed by the processing means 501.
The computer readable medium of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, with computer readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, the client and the server may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
for each first data storage node, acquiring real-time values of target acquisition indexes of a host corresponding to the first data storage node at intervals of a first preset time interval; the target acquisition index characterizes the current performance of the host corresponding to the first data storage node; every second preset time interval, acquiring health index parameters of the first data storage node according to real-time values in a first preset time period before the current moment; wherein the second preset time interval is not less than the first preset time interval; when a file data storage request is received, for any one file in the file data storage request, at least one second data storage node is obtained according to health index parameters in a second preset time period before the current moment of each first data storage node, a third data storage node is obtained from each second data storage node according to the identification of any one file, and the data of any one file is stored in the third data storage node.
Computer program code for carrying out operations of the present application may be written in one or more programming languages, including, but not limited to, object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules or units involved in the embodiments of the present application may be implemented in software or in hardware. In some cases, the name of a module or unit does not constitute a limitation of the module or unit itself; for example, the first constraint acquisition module may also be described as "a module that acquires the first constraint".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of the present application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
The foregoing is only a partial embodiment of the present invention, and it should be noted that it will be apparent to those skilled in the art that modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the present invention.

Claims (10)

1. A method of data storage, comprising:
for each first data storage node, acquiring real-time values of target acquisition indexes of a host corresponding to the first data storage node at intervals of a first preset time interval; the target acquisition index characterizes the current performance of a host corresponding to the first data storage node;
Every second preset time interval, acquiring health index parameters of the first data storage node according to real-time values in a first preset time period before the current moment; wherein the second preset time interval is not less than the first preset time interval;
when a file data storage request is received, for any one file in the file data storage request, at least one second data storage node is obtained according to health index parameters of each first data storage node in a second preset time period before the current moment, a third data storage node is obtained from each second data storage node according to the identification of any one file, and the data of any one file is stored in the third data storage node.
2. The method according to claim 1, wherein the obtaining the health indicator parameter of the first data storage node according to the real-time values each belonging to the first preset time period before the current time comprises:
for each target acquisition index, taking an average value of real-time values in the first preset time period as a first reference value if the target acquisition index is a hardware type index, and taking a median value of real-time values in the first preset time period as the first reference value if the target acquisition index is a non-hardware type index;
And acquiring the health index parameters of the first data storage node according to each first reference value.
3. The method of claim 2, wherein the hardware-type index comprises: the ratio of CPU idle time to total time, the memory usage ratio, the disk read-write request response time, and the total number of disk reads and writes;
the non-hardware-type index comprises: the amount of data sent and received, and the network delay.
4. A method according to claim 3, wherein said obtaining the health indicator parameter of the first data storage node from each first reference value comprises:
for each target acquisition index, acquiring the health degree of the target acquisition index according to the first reference value, and calculating the health index parameter of the target acquisition index according to the health degree;
and taking the minimum value of the health index parameter in each target acquisition index as the health index parameter of the first data storage node.
5. The method as recited in claim 4, wherein the method further comprises:
if the health index parameter corresponding to the response time of the disk read-write request is not smaller than a first preset threshold value, the health index parameter corresponding to the ratio of the CPU idle time to the total time is not obtained; the first preset threshold value characterizes whether the ratio of the CPU idle time to the total time has a reference meaning or not.
6. The method according to claim 1, wherein the obtaining at least one second data storage node according to the health indicator parameter of the first data storage node within the second preset time period before the current time comprises:
for each first data storage node, acquiring an average value of each health index parameter of the first data storage node in the second preset time period, and taking the average value as a second reference value;
taking the first data storage node with the second reference value larger than a second preset threshold value as a second data storage node; wherein the second preset threshold characterizes the first data storage node in a healthy state.
7. The method of claim 1, wherein obtaining a third data storage node from the second data storage nodes according to the identification of the file comprises:
performing a consistent hash calculation on the identification of the file to obtain a calculation result, and obtaining the third data storage node from the second data storage nodes according to the calculation result;
wherein a correspondence between calculation results and the second data storage nodes is preset.
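A common way to realize the consistent hash lookup of claim 7 is a hash ring with virtual nodes, sketched below in Python. The claim does not specify the hash function or the form of the preset correspondence, so MD5, the 64 virtual nodes per storage node, and the class and method names are assumptions.

```python
import bisect
import hashlib
from typing import Iterable


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring (MD5 is an assumed choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hash ring built from the healthy (second) data storage nodes."""

    def __init__(self, nodes: Iterable[str], replicas: int = 64) -> None:
        # Each node is placed on the ring `replicas` times to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._keys = [position for position, _ in self._ring]

    def node_for(self, file_id: str) -> str:
        """Return the node owning the first ring position at or after the file's hash."""
        if not self._ring:
            raise ValueError("no healthy data storage node available")
        index = bisect.bisect_right(self._keys, _hash(file_id)) % len(self._ring)
        return self._ring[index][1]
```

For example, `HashRing(["node-1", "node-2", "node-3"]).node_for("file-42")` always returns the same node for the same file identification, and dropping a node that does not own the file leaves the file's target unchanged.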
8. A data storage device, comprising:
the real-time value acquisition module is configured to acquire, at every first preset time interval, the real-time value of each target acquisition index of the host corresponding to each first data storage node; wherein the target acquisition index characterizes the current performance of the host corresponding to the first data storage node;
the health index parameter acquisition module is configured to obtain, at every second preset time interval, the health index parameter of the first data storage node according to the real-time values within a first preset time period before the current time; wherein the second preset time interval is not less than the first preset time interval;
the file data storage module is configured to, when a file data storage request is received, for any file in the file data storage request: obtain at least one second data storage node according to the health index parameters of each first data storage node within a second preset time period before the current time, obtain a third data storage node from the second data storage nodes according to the identification of the file, and store the data of the file in the third data storage node.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to carry out the steps of the method of any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1-7.
CN202310629886.5A 2023-05-30 2023-05-30 Data storage method and device, electronic equipment and storage medium Pending CN116594568A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310629886.5A CN116594568A (en) 2023-05-30 2023-05-30 Data storage method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310629886.5A CN116594568A (en) 2023-05-30 2023-05-30 Data storage method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116594568A true CN116594568A (en) 2023-08-15

Family

ID=87599048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310629886.5A Pending CN116594568A (en) 2023-05-30 2023-05-30 Data storage method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116594568A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117908802A (en) * 2024-03-19 2024-04-19 北京鉴智科技有限公司 Data storage method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113934958B (en) Page loading method and device, electronic equipment and computer readable medium
CN114500339B (en) Node bandwidth monitoring method and device, electronic equipment and storage medium
CN111966950B (en) Log sending method and device, electronic equipment and computer readable medium
CN111580974B (en) GPU instance allocation method, device, electronic equipment and computer readable medium
CN116594568A (en) Data storage method and device, electronic equipment and storage medium
CN110443647B (en) Information delivery method and device
JP7216757B2 (en) Method and apparatus for obtaining information
CN111858381B (en) Application fault tolerance capability test method, electronic device and medium
CN111898061B (en) Method, apparatus, electronic device and computer readable medium for searching network
CN112507676B (en) Method and device for generating energy report, electronic equipment and computer readable medium
CN112732979B (en) Information writing method, information writing device, electronic equipment and computer readable medium
CN112019406B (en) Flow monitoring method and device, electronic equipment and computer readable medium
CN111694672B (en) Resource allocation method, task submission method, device, electronic equipment and medium
CN111680754B (en) Image classification method, device, electronic equipment and computer readable storage medium
CN111726476B (en) Image processing method, device, equipment and computer readable medium
CN112685271A (en) Pressure measurement data processing method and device, electronic equipment and readable storage medium
CN111143355B (en) Data processing method and device
CN112163176A (en) Data storage method and device, electronic equipment and computer readable medium
CN113760178A (en) Cache data processing method and device, electronic equipment and computer readable medium
CN114651237A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN112464039A (en) Data display method and device of tree structure, electronic equipment and medium
CN112311842A (en) Method and device for information interaction
CN113765692B (en) Current limiting method, device, electronic equipment and computer readable medium
CN116755889B (en) Data acceleration method, device and equipment applied to server cluster data interaction
CN112073202B (en) Information generation method and device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination