CN116820323A - Data storage method, device, electronic equipment and computer readable storage medium - Google Patents

Info

Publication number
CN116820323A
Authority
CN
China
Prior art keywords
backed
data
data segment
access
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210278287.9A
Other languages
Chinese (zh)
Inventor
毕杰山
姜国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210278287.9A
Publication of CN116820323A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data storage method, a device, electronic equipment and a computer readable storage medium. After access record information is acquired, a list of data segments to be backed up is constructed from it. Target file data of a target data segment in the list is then deleted, either according to the access time information recorded in the data segments to be backed up or according to the access frequency statistics of the data segments to be backed up within a preset recording time period. The target file data is stored in a second storage object, so that it is completely backed up and the integrity of the data is maintained. The application can be widely applied to data processing on devices such as smart phones, tablet computers, notebook computers, desktop computers, servers and vehicle-mounted terminals.

Description

Data storage method, device, electronic equipment and computer readable storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data storage method, a data storage device, an electronic device, and a computer readable storage medium.
Background
An index in data storage technology is a data structure that supports fast querying and updating of the associated data. Taking database storage as an example, in addition to the data itself, the database system maintains data structures that satisfy specific search algorithms and point to the data actually needed in the database, so that advanced search algorithms can be run over them. These data structures are indexes.
As the scale of stored data grows, so does the storage required for indexes. These indexes include low-frequency indexes with a low access frequency, medium-frequency indexes with a moderate access frequency, and high-frequency indexes with a high access frequency. Because data queries generally have real-time requirements, indexes are usually stored on a solid state disk (Solid State Disk or Solid State Drive, abbreviated as SSD), which offers fast reads. However, the hardware cost of solid state disks is high, so the related art generally transfers low-frequency indexes to a third-party storage medium (such as a mechanical hard disk or another database) for backup storage, to save solid state disk storage cost.
In practice, only the high-frequency indexes need to reside on the solid state disk. In the related art, however, the solid state disk also stores the medium-frequency indexes, which still incurs a high storage cost.
Disclosure of Invention
In view of the above, embodiments of the present invention provide a data storage method, apparatus, electronic device, and computer readable storage medium, which can reduce the storage cost of a first storage object while considering access efficiency.
A first aspect of the present invention provides a data storage method, comprising:
acquiring access record information;
constructing a data segment list to be backed up according to the access record information; wherein the data segment list to be backed up comprises at least one data segment to be backed up; the data segment to be backed up records at least one piece of access record information; the data segment to be backed up is stored in a first storage object;
deleting target file data of a target data segment in the data segment list to be backed up according to the access time information recorded in the data segment to be backed up; or deleting the target file data of the target data segment in the data segment list to be backed up according to the access frequency statistical information of the data segment to be backed up in the preset recording time period;
storing the target file data to a second storage object, and generating index link information of the target file data in the target data segment; wherein the index link information is used for representing the storage position of the target file data in the second storage object, and the access speed of the first storage object is greater than that of the second storage object.
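The storing, linking and deleting steps of the first aspect can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the patent's implementation: the names `Segment`, `offload` and `read`, the use of a plain dict as the second storage object, and the `remote://` location format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A data segment to be backed up, held in the first (fast) storage object."""
    name: str
    files: dict = field(default_factory=dict)  # file name -> bytes kept locally
    links: dict = field(default_factory=dict)  # file name -> location in second store


def offload(segment, file_names, second_store):
    """Copy the named files to the second storage object, record an index
    link for each one, then delete the local copy."""
    for fname in file_names:
        key = f"remote://{segment.name}/{fname}"  # hypothetical location format
        second_store[key] = segment.files[fname]  # back up first
        segment.links[fname] = key                # index link information
        del segment.files[fname]                  # then free fast storage


def read(segment, fname, second_store):
    """Read a file locally if it is still present, otherwise follow the link."""
    if fname in segment.files:
        return segment.files[fname]
    return second_store[segment.links[fname]]
```

In this sketch the copy to the second storage object happens before the local delete, so a failure part-way through never leaves a file without any copy; the ordering is a design choice of the sketch.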
A second aspect of the present invention provides a data storage device comprising:
the first module is used for acquiring access record information;
the second module is used for constructing a data segment list to be backed up according to the access record information; wherein the data segment list to be backed up comprises at least one data segment to be backed up; the data segment to be backed up records at least one piece of access record information; the data segment to be backed up is stored in a first storage object;
the third module is used for deleting the target file data of the target data segment in the data segment list to be backed up according to the access time information recorded in the data segment to be backed up; or deleting the target file data of the target data segment in the data segment list to be backed up according to the access frequency statistical information of the data segment to be backed up in the preset recording time period;
a fourth module, configured to store the target file data to a second storage object, and generate index link information of the target file data in the target data segment; wherein the index link information is used for representing the storage position of the target file data in the second storage object, and the access speed of the first storage object is greater than that of the second storage object.
Optionally, the third module includes:
a first unit, configured to obtain each data segment to be backed up in the data segment to be backed up list;
a second unit, configured to identify a data access type of the data segment to be backed up; wherein the data access types comprise a local access type, a hybrid access type and a remote access type;
a third unit, configured to screen out target data segments from the data segments to be backed up of the local access type according to the access time information recorded in the data segments to be backed up and a preset high-frequency access time window value; or to screen out target data segments from the data segments to be backed up of the hybrid access type according to the access time information recorded in the data segments to be backed up and a preset intermediate-frequency access time window value;
and a fourth unit, configured to delete the target file data of the target data segment.
Optionally, the third unit includes:
a fifth unit, configured to subtract the latest access time of the data segment to be backed up from the current time when the data segment to be backed up of the local access type is completely stored in the second storage object, to obtain a first time window value of the data segment to be backed up;
and a sixth unit, configured to determine that the data segment to be backed up is a first target data segment when the first time window value is greater than the preset high-frequency access time window value.
Optionally, the third unit includes:
a seventh unit, configured to subtract the latest access time of the data segment to be backed up of the local access type from the current time to obtain a second time window value of the data segment to be backed up;
and an eighth unit, configured to determine that the data segment to be backed up is a second target data segment when the second time window value is greater than a sum of the preset high-frequency access time window value and the intermediate-frequency access time window value.
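The time-window comparison performed by the fifth to eighth units can be illustrated with a short sketch. It is a simplification under assumptions: the function name `classify_by_time` and the return labels are hypothetical, and the checks on data access type and on whether the segment is already fully stored in the second storage object are left out.

```python
from datetime import datetime, timedelta


def classify_by_time(last_access, now, high_window, mid_window):
    """Return which kind of target data segment (if any) a segment is,
    based on how long it has gone without being accessed."""
    idle = now - last_access  # the "time window value" of the segment
    if idle > high_window + mid_window:
        return "second_target"  # idle longer than both windows combined
    if idle > high_window:
        return "first_target"   # idle longer than the high-frequency window
    return None                 # still within the high-frequency window


# Illustrative values only.
now = datetime(2022, 1, 10)
high_window = timedelta(days=1)
mid_window = timedelta(days=3)
label = classify_by_time(now - timedelta(days=2), now, high_window, mid_window)
```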
Optionally, the third module includes:
a ninth unit, configured to obtain each data segment to be backed up in the data segment list to be backed up, and record access frequency statistics information of each data segment to be backed up;
a tenth unit, configured to screen out a target data segment from the data segments to be backed up according to the access frequency statistics information of the data segments to be backed up in the preset recording time period, a preset high-frequency access frequency threshold, and a preset intermediate-frequency access frequency threshold;
and an eleventh unit configured to delete target file data of the target data segment.
Optionally, the tenth unit includes:
a twelfth unit, configured to determine, when the data segment to be backed up has been completely stored in the second storage object, an accessed frequency value of the data segment to be backed up in a preset recording time period from access frequency statistics information of the data segment to be backed up;
and a thirteenth unit, configured to determine, when the accessed frequency value is smaller than the high frequency access threshold and the accessed frequency value is greater than or equal to the medium frequency access threshold, that the data segment to be backed up is a first target data segment.
Optionally, the tenth unit includes:
and a fourteenth unit, configured to determine, when the accessed frequency value is smaller than the intermediate frequency access threshold, that the data segment to be backed up is a second target data segment.
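The threshold comparison performed by the twelfth to fourteenth units can be sketched in the same way. The function names, the return labels and the use of plain numbers as timestamps are assumptions made for illustration; the sketch also assumes the intermediate-frequency threshold does not exceed the high-frequency threshold.

```python
def count_accesses(access_times, period_start, period_end):
    """Count accesses that fall inside the recording period [start, end)."""
    return sum(1 for t in access_times if period_start <= t < period_end)


def classify_by_frequency(accessed_count, high_threshold, mid_threshold):
    """Return which kind of target data segment (if any) a segment is,
    based on how often it was accessed in the recording period."""
    if accessed_count < mid_threshold:
        return "second_target"  # below the intermediate-frequency threshold
    if accessed_count < high_threshold:
        return "first_target"   # between the two thresholds
    return None                 # high-frequency segment: leave it in place
```

A segment whose count equals the intermediate-frequency threshold falls into the first group, matching the "greater than or equal to" wording above.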
Optionally, the fourth unit and the seventh unit comprise at least one of:
a fifteenth unit, configured to delete first file data in the first target data segment, and update the data access type of the first target data segment to the hybrid access type;
or
a sixteenth unit, configured to delete second file data in the second target data segment, and update the data access type of the second target data segment to the remote access type.
Optionally, the fourth module includes:
a seventeenth unit, configured to generate an independent handle in the target data segment, where the independent handle is used to characterize the index link information;
or
an eighteenth unit, configured to add the index link information to the metadata description file of the target data segment.
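As a rough illustration of the second option, index link information could be recorded in a segment's metadata description file along the following lines. The JSON layout, the `index_links` key and the function name are hypothetical; the source does not specify the format of the metadata description file.

```python
import json


def add_index_link(metadata_json, file_name, location):
    """Return a copy of the segment metadata (a JSON string) with an index
    link recording where `file_name` now lives in the second storage object."""
    meta = json.loads(metadata_json)
    meta.setdefault("index_links", {})[file_name] = location
    return json.dumps(meta, sort_keys=True)


# Illustrative metadata and location values only.
original = json.dumps({"segment": "seg-0001", "files": ["rows.dat", "terms.idx"]})
updated = add_index_link(original, "rows.dat", "remote://seg-0001/rows.dat")
```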
Optionally, the second module includes:
a nineteenth unit, configured to generate an original data segment according to the acquired access record information when the acquired data amount of the access record information is greater than a first threshold value, or when an acquisition duration of the access record information exceeds a first time period;
a twentieth unit, configured to continuously merge the original data segments that meet a data segment merging condition to generate the data segment to be backed up, and, once the data volume of the data segment to be backed up is greater than a second threshold, freeze the data segment to be backed up so that it is no longer merged;
and a twenty-first unit, configured to construct the data segment list to be backed up according to the generated at least one data segment to be backed up.
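The generate, merge and freeze behaviour of the nineteenth to twenty-first units can be sketched as below. The function names are hypothetical, segments are represented only by their sizes, and the merging condition is simplified to "merge neighbouring segments in arrival order", which the source does not state.

```python
def should_generate_segment(data_amount, elapsed, first_threshold, first_period):
    """An original data segment is generated when enough data has been
    collected or when collection has run for long enough."""
    return data_amount > first_threshold or elapsed > first_period


def merge_until_frozen(original_sizes, second_threshold):
    """Merge original segments in order; once a merged segment grows past
    the second threshold it is frozen and takes no further merges.

    Returns (sizes of frozen segments, size of the still-open segment)."""
    frozen = []
    current = 0
    for size in original_sizes:
        current += size
        if current > second_threshold:
            frozen.append(current)  # frozen: becomes a segment to be backed up
            current = 0
    return frozen, current
```

The frozen segments returned here would form the data segment list to be backed up.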
Optionally, the data storage device further includes a fifth module, where the fifth module is configured to implement a parameter configuration step of the cluster node, and the fifth module includes at least one of:
a twenty-second unit, configured to configure an address of the second storage object and an access right credential of the second storage object;
or, a twenty-third unit, configured to configure a storage mode of the cluster node to be a hybrid storage mode;
or, a twenty-fourth unit, configured to configure a backup policy parameter of the to-be-backed up data segment list, where the backup policy parameter includes a time period policy parameter and a statistical policy parameter;
or, a twenty-fifth unit configured to configure a high-frequency access time window value and an intermediate-frequency access time window value;
or, a twenty-sixth unit configured to configure a high-frequency access frequency threshold and a medium-frequency access frequency threshold;
wherein the time period policy parameter indicates that the cluster node deletes the target file data of the target data segment in the data segment list to be backed up according to the access time information recorded in the data segment to be backed up;
and the statistical policy parameter indicates that the cluster node deletes the target file data of the target data segment in the data segment list to be backed up according to the access frequency statistical information of the data segment to be backed up in the preset recording time period.
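The configuration items listed for the fifth module could be grouped into a single settings object, as in the sketch below. All field names, default values and the validation rules are assumptions for illustration; the source names the parameters but not their format or defaults.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class NodeBackupConfig:
    """Hypothetical per-node settings mirroring the parameters listed above."""
    second_store_address: str               # address of the second storage object
    second_store_credential: str            # access right credential
    storage_mode: str = "hybrid"            # hybrid storage mode
    backup_policy: str = "time_period"      # "time_period" or "statistical"
    high_freq_window: timedelta = timedelta(days=1)  # illustrative default
    mid_freq_window: timedelta = timedelta(days=7)   # illustrative default
    high_freq_threshold: int = 100          # illustrative default
    mid_freq_threshold: int = 10            # illustrative default

    def __post_init__(self):
        if self.backup_policy not in ("time_period", "statistical"):
            raise ValueError(f"unknown backup policy: {self.backup_policy}")
        if self.mid_freq_threshold > self.high_freq_threshold:
            raise ValueError("intermediate threshold must not exceed high threshold")
```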
A third aspect of the invention provides an electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
A fourth aspect of the invention provides a computer readable storage medium storing a program for execution by a processor to implement a method as described above.
A fifth aspect of the invention provides a computer program which, when executed by a processor, implements a method as hereinbefore described.
In the embodiments of the invention, after the access record information is acquired, a list of data segments to be backed up is constructed from it, and target file data of a target data segment in the list is deleted either according to the access time information recorded in the data segments to be backed up or according to the access frequency statistics of the data segments to be backed up within a preset recording time period. This reduces the target file data held in the first storage object and saves its storage space. The target file data is stored in the second storage object, so that it is fully backed up and the integrity of the file data is maintained. In addition, index link information for the target file data is generated in the target data segment, so that the target file data stored in the second storage object can be located quickly by exploiting the high access speed of the first storage object. Access efficiency is thus preserved while the storage cost of the first storage object is reduced.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a data structure of a data segment according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a data structure of a row storage file according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a data structure of a column storage file according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a data structure of an index file according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a data structure of an index list according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating a data segment merging operation according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an implementation environment provided by an embodiment of the present invention;
FIG. 8 is a flowchart illustrating steps of a data storage method according to an embodiment of the present invention;
FIG. 9 is a flowchart illustrating steps for constructing a list of data segments to be backed up according to an embodiment of the present invention;
FIG. 10 is a flowchart illustrating a step of transferring target file data according to a time period policy parameter according to an embodiment of the present application;
FIG. 11 is a flowchart illustrating another step of transferring target file data according to a time period policy parameter according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a storage situation of a data segment to be backed up corresponding to three data access types in a first storage object according to an embodiment of the present application;
FIG. 13 is a flowchart illustrating steps for transferring target file data according to statistical policy parameters according to an embodiment of the present application;
FIG. 14 is a schematic structural diagram of a data storage device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the detailed description and specific examples, while indicating the embodiment of the application, are intended for purposes of illustration only and are not intended to limit the scope of the application.
Before the embodiments of the present application are described in further detail, the terms involved in the embodiments are explained as follows:
Access record information (Document): a data record generated by the terminal device during operation, for example access record information of an object, device operation log information, or information on instructions the device is about to execute.
Data segment (Segment): a set of non-rewritable index and data files generated for the Documents produced within a continuous time window. Each Segment comprises a plurality of constituent files.
Segment metadata: metadata describing the constituent files of a Segment, including the specific file names, the descriptive metadata of each type of file, and data block index information.
Row storage file: an internal constituent file of a Segment, in which data is organized and stored by row.
Column storage file: an internal constituent file of a Segment, in which data is organized and stored by column.
Segment merge operation (Segment Merge): merging multiple Segments into one large Segment.
Index data (Index): a larger collection of Documents, made up of multiple Segments. Each Index has a separate data storage directory under which all Segments belonging to that Index are stored.
Index list (Index Group): multiple Indexes created at a predefined time period. Indexes under the same Index Group share the same data structure definition.
Specifically, as shown in FIG. 1, a data segment (Segment) 100 may include a row storage file 101, a column storage file 102, index files (e.g., a dictionary data file 103 and an inverted list data file 104), a metadata file 105, and the like.
As shown in FIG. 2, data in the row storage file is organized by row, and a row of data can be fetched quickly by its Document ID (Doc1, Doc2 and Doc3 in FIG. 2 are three Document IDs). During a read, given a Document ID, the file data of the corresponding row is read: row 201 for Doc1, row 202 for Doc2, and row 203 for Doc3.
A schematic diagram of the column storage file is shown in FIG. 3. Data in the column storage file is organized by column; the columns in FIG. 3 are FieldA, FieldB and FieldC. During a read, given the column number of a column, the file data of the corresponding column is read: column 301 for FieldA, column 302 for FieldB, and column 303 for FieldC.
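The difference between the row-organized and column-organized layouts can be shown with a small in-memory sketch. The field values are made up for illustration, and the dict-based layout is an assumption; the on-disk formats of the files in FIG. 2 and FIG. 3 are not described in the source.

```python
# Row-organized: one record per Document ID, fetched whole.
rows = {
    "Doc1": {"FieldA": 1, "FieldB": "a", "FieldC": 1.5},
    "Doc2": {"FieldA": 2, "FieldB": "b", "FieldC": 2.5},
    "Doc3": {"FieldA": 3, "FieldB": "c", "FieldC": 3.5},
}


def read_row(rows, doc_id):
    """Row storage: look up a whole record by its Document ID."""
    return rows[doc_id]


def to_columns(rows):
    """Column storage: regroup the same data so each field's values sit together."""
    columns = {}
    for doc_id in rows:
        for field_name, value in rows[doc_id].items():
            columns.setdefault(field_name, []).append(value)
    return columns


columns = to_columns(rows)
```

Reading one document touches a single entry in `rows`, while scanning one field across all documents touches a single list in `columns`.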
A schematic diagram of the index file is shown in FIG. 4. The index file 400 includes dictionary data (Term Dictionary) 401 and inverted list data (Posting List) 402.
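How dictionary data and inverted list data work together can be sketched with a toy inverted index. This is a generic textbook construction under assumptions (whitespace tokenization, integer document IDs, the function name `build_inverted_index`), not the file format shown in FIG. 4.

```python
def build_inverted_index(docs):
    """Build a sorted dictionary of terms and, for each term, the sorted
    list of IDs of the documents that contain it."""
    inverted_lists = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            inverted_lists.setdefault(term, []).append(doc_id)
    for ids in inverted_lists.values():
        ids.sort()
    dictionary = sorted(inverted_lists)
    return dictionary, inverted_lists


# Illustrative documents only.
docs = {1: "disk read error", 2: "disk write", 3: "read timeout"}
dictionary, inverted_lists = build_inverted_index(docs)
```

A query for a term first finds it in the dictionary, then reads its inverted list to get the matching document IDs.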
A schematic diagram of the index list (Index Group) is shown in FIG. 5. FIG. 5 shows an Index Group 500 built from Indexes created at a period of one day. The Index Group contains three Indexes (created on 2021.01.01, 2021.01.02 and 2021.01.03 respectively), and each Index contains three Segments. Each Segment may contain multiple pieces of access record information (Documents), as shown at 501 in FIG. 5.
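Grouping records into one index per day, as in FIG. 5, can be sketched as follows. The function name, the `(timestamp, payload)` record shape and the use of the date string as the index name are assumptions for illustration; segments within each index are omitted.

```python
from collections import defaultdict
from datetime import datetime


def group_by_day(documents):
    """Place each (timestamp, payload) record into the index for its day."""
    index_group = defaultdict(list)
    for timestamp, payload in documents:
        index_group[timestamp.strftime("%Y.%m.%d")].append(payload)
    return dict(index_group)


# Illustrative records only.
documents = [
    (datetime(2021, 1, 1, 9, 0), "a"),
    (datetime(2021, 1, 1, 23, 59), "b"),
    (datetime(2021, 1, 2, 0, 0), "c"),
]
index_group = group_by_day(documents)
```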
In some embodiments, original data segments that meet the data segment merging condition are merged continuously to obtain a large data segment. Merging multiple small Segments into one large Segment reduces the total number of file handles in use and the number of random data accesses, which improves read performance. For example, as shown in FIG. 6, three small data segments 601 are merged into one large data segment 602.
As the scale of data storage grows, the amount of index data stored grows with it. To save space on the relatively expensive solid state disk, the related art generally transfers low-frequency index data, which is accessed less often, from the solid state disk to a third-party storage medium for backup storage. However, the high-frequency and medium-frequency indexes still occupy considerable solid state disk space and still incur a high storage cost.
Based on the above, an embodiment of the invention provides a data storage method. After access record information is acquired, a list of data segments to be backed up is constructed from it, and target file data of a target data segment in the list is deleted either according to the access time information recorded in the data segments to be backed up or according to the access frequency statistics of the data segments to be backed up within a preset recording time period. This reduces the target file data held in the first storage object and saves its storage space. The target file data is stored in the second storage object, so that it is fully backed up and the integrity of the file data is maintained. In addition, index link information for the target file data is generated in the target data segment, so that the target file data stored in the second storage object can be located quickly by exploiting the high access speed of the first storage object. Access efficiency is thus preserved while the storage cost of the first storage object is reduced.
Referring to FIG. 7, FIG. 7 is a schematic diagram of an implementation environment provided by an embodiment of the present invention. The implementation environment includes a terminal device 701 and a server 702, which may be connected directly or indirectly through wired or wireless communication; the embodiment of the present invention is not limited in this respect. As shown in FIG. 7, the terminal device 701 and the server 702 may be communicatively connected through a communication network 703.
The terminal device may be any electronic product capable of performing data processing operations on the first storage object in the embodiments of the present invention, including but not limited to writing, querying and removing data. Examples include, but are not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch and a vehicle-mounted terminal. As shown in FIG. 7, the first storage object 704 is installed in the terminal device. In the embodiments of the present invention, the first storage object is a storage medium that supports fast data reading and writing, for example a solid state disk installed in a notebook or desktop computer. The terminal device may support human-computer interaction through one or more of a keyboard, a touch pad, a touch screen, a remote controller, voice interaction, a handwriting device and the like, so as to perform data processing operations on the first storage object.
The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data and artificial intelligence platforms. The server includes a second storage object 705. In the embodiments of the present invention, the second storage object is a storage medium that has a larger storage space and can receive data transferred from the first storage object, such as a mechanical hard disk, cloud storage or an object storage service. It will be appreciated that the data access speed of the first storage object is greater than that of the second storage object. The server may receive a remote instruction sent by the terminal device; the remote instruction is used to execute the data storage method of the embodiments of the present invention and transfer data from the first storage object to the second storage object.
In some embodiments, the servers described above may also be implemented as nodes in a blockchain system.
As shown in FIG. 7, more and more index data is generated as the terminal device is used, and this index data is stored in the first storage object. Part of this index data is accessed by the terminal device at high frequency, while the remaining medium-frequency and low-frequency index data also stays in the first storage object. The medium-frequency and low-frequency index data occupies storage space in the first storage object and increases the overall storage cost. An embodiment of the present invention therefore proposes a data storage method in which, as shown in FIG. 7, the terminal device transfers target file data of the medium-frequency and low-frequency index data from the first storage object to the second storage object, releasing storage space in the first storage object and reducing storage cost.
With reference to the implementation environment shown in FIG. 7, FIG. 8 is a flowchart of the steps of a data storage method according to an embodiment of the present invention. The method may be performed by the terminal device 701 in FIG. 7 and includes, but is not limited to, the following steps S800 to S830:
S800, acquiring access record information;
Specifically, during operation the terminal device may generate a large number of data records (Documents), such as access record information of objects, device operation log information, and information on instructions the device is about to execute. The embodiments of the present invention take Documents that are access record information as an example. In this step, the terminal device first acquires the access record information that needs to be stored; the data storage method proposed by the present invention is then executed to store it, so as to reduce storage cost.
S810, constructing a data segment list to be backed up according to the access record information; wherein the data segment list to be backed up comprises at least one data segment to be backed up; the data segment to be backed up records at least one piece of access record information; the data segment to be backed up is stored in a first storage object;
Specifically, to avoid security problems caused by the loss of index data, index data usually needs to be backed up. Taking Documents that are access record information as an example, a group of index data segments (Segments) can be generated for the Documents produced in a continuous time window. It should be noted that the data segment to be backed up in the embodiments of the present invention may be a group of index data segments or a single index data segment. In addition, multiple groups of Segments may be merged into one large data segment (Segment) through a segment merge operation (Segment Merge), and the large data segment is then used as the data segment to be backed up, which reduces the number of random accesses the terminal makes to the index data and the number of file handles in use in the storage space. The data segment list to be backed up (Index) is constructed from a plurality of data segments to be backed up.
It should be noted that, in the embodiments of the present invention, the data segment list to be backed up is constructed in the first storage object from the data segments to be backed up that correspond to the access record information. The list so constructed is therefore also stored in the first storage object, and the subsequent steps aim to back it up from the first storage object to the second storage object, so as to save storage space in the first storage object.
S820, deleting the target file data of the target data segment in the data segment list to be backed up according to the access time information recorded in the data segment to be backed up; or deleting the target file data of the target data segment in the data segment list to be backed up according to the access frequency statistical information of the data segment to be backed up in the preset recording time period;
Specifically, during daily operation and maintenance, emergency maintenance and similar processes, the access record information of the terminal device can be located quickly and accurately through index data (namely the data segment to be backed up in the embodiment of the invention), which improves the management precision and management efficiency of the terminal device. As the terminal device is used more frequently and for longer, the volume of index data corresponding to the access record information also grows. After the terminal device generates the index data, the index data is stored in the first storage object to support the terminal device's need for frequent access to, and high-speed reading and writing of, the index data.
However, the index data includes not only the frequently accessed high-frequency index but also the medium-frequency index and the low-frequency index, which are also stored in the first storage object, and since the first storage object represented by the solid state disk is generally costly, these medium-frequency index and low-frequency index do not need to be frequently accessed by the terminal device, but occupy the storage space in the first storage object, thereby resulting in an increase in storage cost.
Therefore, in the embodiment of the invention, it is proposed that part of index data with lower access frequency in the index data is transferred from the first storage object to the second storage object, and the transferred part of index data in the first storage object is deleted, so that the occupancy rate of storage space in the first storage object is reduced, and space is made for index data with higher access frequency. It will be appreciated that when only index data with a relatively high access frequency needs to be stored in the first storage object, the storage space required by the first storage object is reduced, so that the purpose of saving storage cost can be achieved.
It should be noted that the first storage object in the embodiment of the present invention refers to a storage medium capable of supporting rapid data reading and writing, for example, a solid state disk installed in a notebook computer or a desktop computer; the second storage object refers to a storage medium that has a large storage space and can receive data transferred from the first storage object, such as a mechanical hard disk, cloud storage or an object storage service. It will be appreciated that the data access speed of the first storage object is greater than the data access speed of the second storage object.
In the embodiment of the present invention, the terminal device may select, from the data segment list to be backed up, the data segment to be backed up whose data is to be deleted; this data segment is called a target data segment. The target data segment further includes several kinds of data, such as Segment metadata, a row-store file, a column-store file and index-related files, and when deleting from the target data segment, the specific target file data to be deleted can be selected.
As mentioned above, the target file data in the first storage object is deleted, and the deleted data is transferred to the second storage object, so as to reduce the storage space of the first storage object occupied by the medium-frequency index and the low-frequency index. It will be appreciated that the target data segments selected in the first storage object should therefore be the medium-frequency index and the low-frequency index.
However, "high frequency", "intermediate frequency" and "low frequency" are relative concepts, and the standards for them may differ between first storage objects with different read/write speeds, between terminal devices with different access speeds, or between different types of data segment lists to be backed up. Therefore, assuming the data storage method is implemented in a cluster node environment, the terminal device in each node may configure the backup policy parameters corresponding to the current data segment list to be backed up before implementing the data storage method of the embodiment of the present invention. The backup policy parameters are used to select target data segments in the data segment list to be backed up according to preset standards, so that the terminal device can delete the target file data in the target data segments.
In the embodiment of the invention, whether index data belongs to the high-frequency index, the medium-frequency index or the low-frequency index can be determined from two aspects, access time and access frequency, so the backup policy parameters comprise a time period policy parameter and a statistical policy parameter. The time period policy parameter indicates that the cluster node deletes target file data of a target data segment in the data segment list to be backed up according to access time information recorded in the data segment to be backed up. If the embodiment of the invention implements the data storage method according to the time period policy parameter, the initial parameter may be configured accordingly, for example, a type parameter (type) under the backup policy "index. The statistical policy parameter indicates that the cluster node deletes the target file data of the target data segment in the data segment list to be backed up according to the access frequency statistical information of the data segment to be backed up in the preset recording time period. If the embodiment of the invention implements the data storage method according to the statistical policy parameter, the initial parameter may be configured accordingly; for example, in the computer programming code implementation, the type under the backup policy "index. Segment. Retrieval. Policy" may be designated as "auto" (referring to the statistical policy parameter, which uses the access frequency as the judgment standard).
That is, after the time period policy parameter is configured, the terminal device may select which data segments to be backed up belong to the medium frequency index or even the low frequency index according to the access time information recorded in the data segments to be backed up, determine the data segments to be backed up as target data segments, and delete the target file data of the target data segments in the data segment list to be backed up.
Similarly, the terminal device may also determine a target data segment in the data segment to be backed up according to the access frequency statistics information of the data segment to be backed up in the preset recording time period after configuring the statistics policy parameter, and then delete the target file data in the target data segment.
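The choice between the two backup policies described above can be illustrated with a minimal Python sketch. It is only an assumption-laden illustration, not the implementation of the embodiment: the class `SegmentStats`, the function `select_targets`, the thresholds `max_idle_seconds` and `min_access_count`, and the policy label "period" are hypothetical names introduced here; only the value "auto" for the statistical policy comes from the description.

```python
from dataclasses import dataclass

# "auto" is the value the description gives for the statistical policy.
# "period" is an assumed label for the time period policy.
PERIOD = "period"
AUTO = "auto"


@dataclass
class SegmentStats:
    name: str
    last_access: float   # time of the most recent access, in seconds
    access_count: int    # accesses counted within the recording time period


def select_targets(segments, policy_type, now,
                   max_idle_seconds=0.0, min_access_count=0):
    """Return the names of segments treated as medium- or low-frequency."""
    if policy_type == PERIOD:
        # Time period policy: judge by how long ago the segment was accessed.
        return [s.name for s in segments
                if now - s.last_access > max_idle_seconds]
    if policy_type == AUTO:
        # Statistical policy: judge by access frequency in the recording period.
        return [s.name for s in segments
                if s.access_count < min_access_count]
    raise ValueError(f"unknown policy type: {policy_type!r}")
```

Under either policy the function only selects target data segments; deleting their target file data and transferring it to the second storage object are separate steps.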
S830, storing the target file data into a second storage object, and generating index link information of the target file data in a target data segment; the index link information is used for representing the storage position of the target file data in the second storage object, and the access speed of the first storage object is greater than that of the second storage object.
Specifically, in the foregoing, in order to alleviate the storage pressure in the first storage object, the embodiment of the present invention proposes deleting the target file data in the target data segment in the to-be-backed-up data segment list. It will be appreciated that the index data accessed by the terminal device at medium frequency or low frequency is not without value, and in order to maintain the integrity of the data, the index data still needs to be stored in other storage media, so in the embodiment of the present invention, the data deleted in the first storage object needs to be transferred to the second storage object.
The high-frequency index still needs to be stored in the first storage object because the access speed of the first storage object is greater than that of the second storage object, so that the real-time performance of the data reading process is ensured.
In the above description, the target data segments transferred from the first storage object to the second storage object in the embodiment of the present invention are mainly the medium frequency index and the low frequency index, and the terminal device may need to access the target data segments at random. Therefore, the terminal device needs to generate index linking information of the target file data in the target data segment, the index linking information being stored in the first storage object. In the embodiment of the present invention, the index link information is used to characterize the storage location of the target file data in the second storage object, that is, when the terminal device needs to access the target file data, since the original target file data in the local first storage object has been deleted, the terminal device needs to know the storage location of the target file data to be accessed in the second storage object through the index link information, and then the terminal device can access the target file data according to the storage location.
It can be understood that, since the target file data originally stored in the first storage object is transferred to the second storage object, the terminal device is changed from directly accessing the target file data locally to indirectly accessing the target file data from the second storage object, and since the data access speed of the first storage object is fast, the terminal device can still quickly locate the target file data in the second storage object according to the index link information. Therefore, the data storage method provided by the embodiment of the invention can reduce the storage cost of the first storage object on the premise of ensuring the access efficiency to the medium-frequency or low-frequency index.
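The role of the index link information can be sketched as follows. This is a simplified illustration under stated assumptions: the two dictionaries merely stand in for the first and second storage objects, and the function names `transfer` and `read` and the key layout are hypothetical, not taken from the description.

```python
first_store = {}   # stands in for the fast local medium (e.g. a solid state disk)
second_store = {}  # stands in for the slower, cheaper medium (e.g. object storage)


def transfer(segment, file_name):
    """Move one file of a segment to the second store and leave a link behind."""
    key = f"{segment}/{file_name}"
    # Store the target file data in the second store and remove it locally.
    second_store[key] = first_store.pop((segment, file_name))
    # The index link information stays in the first store and records where
    # the target file data now lives in the second store.
    first_store[(segment, file_name)] = {"link": key}


def read(segment, file_name):
    """Read a file, following the index link if the data was transferred."""
    entry = first_store[(segment, file_name)]
    if isinstance(entry, dict) and "link" in entry:
        return second_store[entry["link"]]
    return entry
```

The lookup of the link happens entirely in the fast first store; only the final fetch of the file data touches the slower second store.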
Through steps S820-S830, the terminal device deletes the target file data specified in the target data segment from the first storage object, and after generating index link information corresponding to the target file data in the first storage object, the terminal device accesses the second storage object, and stores the target file data in the second storage object. That is, in some embodiments, assuming that one target data segment is composed of target file data and non-target file data, the non-target file data including at least index linking information, the non-target file data in one target data segment is stored in the first storage object and the target file data is stored in the second storage object. Thus, the embodiment of the invention is actually a "first storage object+second storage object" mixed storage mode implemented on the target data segment. Therefore, if the data storage method is implemented in the cluster node environment, the terminal device in each node may configure the relevant parameters of the cluster node before implementing the data storage method according to the embodiment of the present invention.
Firstly, in this step, the terminal device needs to access the second storage object, and store the target file data into the second storage object, so that the terminal device in each node needs to configure the address of the second storage object and the access authority credential of the second storage object first.
Secondly, in order for the terminal device to execute the data storage method provided by the embodiment of the present invention, it is necessary to configure the storage mode of the cluster node to be a hybrid storage mode. For example, setting a corresponding parameter in the Index configuration, setting "Index. Storage" (this column characterizes the storage mode of Index) to "hybrid" (refers to the hybrid storage mode), and enabling the terminal device to implement hybrid storage of the target data segment.
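The two configuration prerequisites above (the address and credential of the second storage object, and the hybrid storage mode) might be checked as in the sketch below. All key names are assumptions for illustration: `remote.address` and `remote.credential` are hypothetical, and `index.storage` is an assumed spelling of the setting named in the description.

```python
def validate_node_config(config):
    """Check the settings a node needs before hybrid storage can be used.

    Raises ValueError if the second storage object cannot be reached with
    the given settings; returns True only when hybrid mode is enabled.
    """
    required = ("remote.address", "remote.credential")  # hypothetical keys
    missing = [key for key in required if not config.get(key)]
    if missing:
        raise ValueError(f"missing settings: {', '.join(missing)}")
    # Assumed key spelling; the value "hybrid" is the one named in the text.
    return config.get("index.storage") == "hybrid"
```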
Through steps S800-S830, the embodiment of the present invention provides a data storage method implemented by the terminal device 801 in fig. 8. Access record information is first obtained, and a data segment list to be backed up is then constructed according to the access record information. Target file data of a target data segment in the list is then deleted, either according to access time information recorded in the data segment to be backed up or according to the access frequency statistical information of the data segment to be backed up in the preset recording time period. In this way, the invention reduces the target file data of the target data segment stored in the first storage object and saves the storage space of the first storage object. The target file data is stored in the second storage object, so that it is fully backed up and the integrity of the file data is maintained. In addition, the invention generates the index link information of the target file data in the target data segment, so the target file data stored in the second storage object can be located quickly by exploiting the high access speed of the first storage object, and access efficiency is preserved while the storage cost of the first storage object is reduced.
As mentioned above, in order to reduce the total number of file handles occupied in the first storage object, a plurality of Segments may be combined through Segment Merge into one large Segment. It will be appreciated that the number of Segments within a large Segment should be limited for easier management. To prevent newly generated Segments from being merged into the same large Segment indefinitely, in some embodiments the large Segment needs to be frozen before the data segment list to be backed up is constructed, as described below in conjunction with fig. 9.
Referring to fig. 9, fig. 9 is a flowchart of steps for constructing a list of data segments to be backed up, which may be performed by the terminal device 701 in fig. 7, and includes, but is not limited to, steps S900-S970:
S900, receiving newly written access record information;
Specifically, as the terminal device is used, newly written access record information (i.e., Documents) is continuously received in the first storage object.
S910, judging whether the data volume of the access record information is larger than a first threshold value or judging whether the acquisition time length of the access record information exceeds a first time period;
Specifically, as mentioned above, Segments are typically index data generated over a continuous period of time, so a first threshold may be preset, which represents the amount of newly written Document data that is sufficient to generate a group of Segments. When the data amount of the Documents newly written in the first storage object is less than or equal to the preset first threshold, the data amount is insufficient, and the process returns to step S900 to continue receiving new Documents; when the data amount of the Documents newly written in the first storage object is greater than the preset first threshold, the process goes to step S920.
In other embodiments, whether a Segment needs to be generated may also be determined according to the acquisition duration of the access record information. The acquisition duration refers to the length of time between the acquisition time of the last Document in the previous Segment and the acquisition time of the most recently acquired Document for the current Segment. The acquisition duration is evaluated against a preset first time period, which is the period at which the first storage object generates a new Segment. For example, the first time period may be set to 5 min; the first storage object continuously records the acquisition time of the latest Document and compares it with the acquisition time of the last Document in the previous Segment to determine the acquisition duration. By comparing the acquisition duration with the first time period, it can be determined whether a new Segment needs to be generated. When the acquisition duration does not exceed the first time period, the process returns to step S900 to continue receiving new Documents; when the acquisition duration exceeds the first time period, the process goes to step S920.
S920, generating a new original data segment;
Specifically, when the data amount of the acquired access record information (Documents) is greater than the first threshold, or when the acquisition duration of the access record information exceeds the first time period, a new original data segment is generated according to the acquired access record information. The original data segment is a Segment.
It should be noted that the original data segment refers to a Segment for which the access frequency of the terminal device has not yet been determined; the target data segment refers to those Segments among the original data segments that the terminal device accesses at medium or low frequency, and it is determined by means of the preset backup policy parameters described above.
S930, judging whether the new data segment meets the data segment merging condition;
Specifically, after a new Segment (i.e., an original data segment) is generated, it is necessary to determine whether the new Segment can be merged with other Segments according to a preset data segment merging condition. An example of a data segment merging condition is that Segments generated when the terminal device accesses the same service are to be merged. The first storage object judges whether the new Segment meets the data segment merging condition according to the relevant parameters of the new Segment. If the condition is not met, the process returns to step S900 and the first storage object continues to receive new Documents; if the data segment merging condition is met, the process goes to step S940.
S940, executing data segment merging operation;
Specifically, the merging operation is performed continuously on the original data segments that meet the data segment merging condition, so as to generate a data segment to be backed up. The data segment to be backed up is the large Segment described above and consists of at least one original data segment.
S950, judging whether the data volume of the data segment to be backed up obtained after the data segment merging operation is larger than a second threshold value;
Specifically, as mentioned above, to prevent newly generated Segments from being merged into the same large Segment indefinitely, the data amount of the large Segment needs to be obtained continuously and compared with a preset second threshold. The second threshold represents the maximum data amount of the current large Segment. When the data amount of the large Segment after the Merge is less than or equal to the second threshold, the process returns to step S900, continues to receive new Documents, and repeats steps S910-S940; when the data amount of the large Segment after the Merge is greater than the second threshold, the process goes to step S960.
S960, freezing the data segment to be backed up;
Specifically, when the data amount of the data segment to be backed up (i.e., the large Segment) is greater than the preset second threshold, the data amount of the current data segment to be backed up has exceeded the preset maximum value, and a freezing operation is performed on it so that merging into it stops. That is, the frozen data segment to be backed up can still be accessed by the terminal device, but newly generated original data segments are no longer merged into it.
S970, constructing a data segment list to be backed up according to the frozen data segments to be backed up;
Specifically, since no new Segments are merged into the frozen data segments to be backed up (i.e., the large Segments), they are relatively stable. The frozen data segments to be backed up are therefore constructed into the data segment list to be backed up (i.e., the Index) of the embodiment of the present invention, and the target file data in the data segment list to be backed up is transferred according to the method steps in fig. 2.
Through steps S900-S970, the embodiment of the present invention constructs a data segment list to be backed up according to the access record information. In essence, original data segments that meet the data segment merging condition are continuously merged into the data segment to be backed up; when the data amount of the data segment to be backed up is greater than the second threshold, the current data segment to be backed up is frozen, and no new original data segment is merged into it. When the data storage method in the embodiment of the invention is executed, the data transfer operation can be performed on the data segment list to be backed up that contains the frozen data segments to be backed up.
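The flow of steps S900-S970 can be summarized in a short Python sketch. It is a simplified illustration rather than the implementation of the embodiment: sizes are counted in Documents instead of data volume, the time-based trigger of S910 is omitted, the merging condition of S930 is assumed to always hold, and the names `LargeSegment` and `build_index` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LargeSegment:
    documents: list = field(default_factory=list)
    frozen: bool = False


def build_index(documents, first_threshold, second_threshold):
    """Group documents into segments, merge them, and freeze full large segments."""
    index = []                # frozen large segments (the list to be backed up)
    pending = []              # documents not yet turned into a segment
    current = LargeSegment()  # large segment that still accepts merges
    for doc in documents:                       # S900: receive a new document
        pending.append(doc)
        if len(pending) <= first_threshold:     # S910: not enough data yet
            continue
        current.documents.extend(pending)       # S920 and S940: new segment, merged
        pending = []
        if len(current.documents) > second_threshold:  # S950: size check
            current.frozen = True               # S960: stop further merging
            index.append(current)               # S970: add to the list
            current = LargeSegment()
    return index, current, pending
```

Only frozen large segments enter the returned list; the segment that is still open for merging and any leftover documents are returned separately so a caller can keep feeding new documents.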
Based on the above, assume that in the cluster node implementation environment each terminal device is first configured so that the storage mode of the cluster node is the hybrid storage mode. While the first storage object continuously receives new access record information, the terminal device performs merging and freezing operations on the original data segments to obtain the data segment list to be backed up. Finally, the target file data in the data segment list to be backed up is transferred according to the pre-configured backup policy parameters of the list; this step corresponds to step S820. The backup policy parameters include a time period policy parameter and a statistical policy parameter, and a method for transferring target file data according to the time period policy parameter is described below.
Referring to fig. 10, fig. 10 is a flowchart illustrating a first step of transferring target file data according to a time period policy parameter according to an embodiment of the present invention, where the method is executed by the terminal device 701 in fig. 7, and the method specifically includes, but is not limited to, steps S1000 to S1030:
S1000, acquiring each data segment to be backed up in a data segment list to be backed up;
Specifically, according to step S810 described above, the data segment list to be backed up refers to the Index, and the data segments to be backed up refer to the frozen large Segments. The terminal device may periodically check the data segment list to be backed up, obtain all the frozen data segments to be backed up, and check them one by one.
S1010, identifying the data access type of the data segment to be backed up; the data access types comprise a local access type, a hybrid access type and a remote access type;
Specifically, since the terminal device may periodically check the data segment list to be backed up, and some data segments in the list may already have had part of their target file data transferred to the second storage object, the data access types of the data segments to be backed up may differ. In an embodiment of the present invention, the data access types of the data segment to be backed up include a local access type (Local), a hybrid access type (Mixed), and a remote access type (Remote).
The local access type means that all file data in the current data segment to be backed up is stored in the first storage object; the data access type of such a data segment is the local access type. It will be appreciated that such a data segment may be index data that the terminal device needs to access at high frequency, so all of its file data is stored in the local first storage object to ensure the access speed of the terminal device.
When the data storage method is realized in the embodiment of the invention, the data transfer is carried out in units of file data, so that the access types of the transferred data segments to be backed up can be divided into a mixed access type and a remote access type according to the specific types of the transferred target file data.
The hybrid access means that part of the file data in the target data segment is transferred to the second storage object, and the rest part of the file data still exists in the first storage object; the remote access type means that all file data which can be transferred in the target data segment are transferred to the second storage object, and the terminal equipment needs to remotely access the file data which belongs to the current target data segment through index link information in the local first storage object.
It can be understood that, when the access type of the data segment to be backed up is the local access type or the hybrid access type, some transferable file data of the current data segment to be backed up still remains in the local first storage object, and whether the remaining file data needs to be stored in the second storage object can be determined according to the subsequent steps. When the access type of the data segment to be backed up is the remote access type, all transferable file data in the current data segment to be backed up has already been transferred, and no subsequent operation on the data segment is needed.
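The three data access types can be derived from which transferable files of a segment have already been moved, as in the sketch below. The enum values Local, Mixed and Remote follow the description; the function names `classify` and `needs_check` and the example file names are hypothetical.

```python
from enum import Enum


class AccessType(Enum):
    LOCAL = "Local"
    MIXED = "Mixed"
    REMOTE = "Remote"


def classify(transferable_files, transferred_files):
    """Derive a segment's access type from which transferable files have moved."""
    transferable = set(transferable_files)
    moved = transferable & set(transferred_files)
    if not moved:
        return AccessType.LOCAL    # everything is still in the first storage object
    if moved == transferable:
        return AccessType.REMOTE   # every transferable file has been moved
    return AccessType.MIXED        # some moved, some still local


def needs_check(access_type):
    """Only Local and Mixed segments still hold transferable data locally."""
    return access_type is not AccessType.REMOTE
```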
S1020, screening out a target data segment from the data segment to be backed up of the local access type according to the access time information recorded in the data segment to be backed up and a preset high-frequency access time window value; or screening out target data segments from the data segments to be backed up of the mixed access type according to the access time information recorded in the data segments to be backed up and a preset intermediate frequency access time window value;
Specifically, the three access types of the data segments to be backed up can be understood as a progressive relationship: when all file data in the data segment to be backed up is stored in the first storage object, the segment is of the local access type; when part of the transferable file data is stored in the first storage object and part is stored in the second storage object, the segment is of the hybrid access type; and when all of the transferable file data is stored in the second storage object, the segment is of the remote access type. In the embodiment of the invention, whether the current data segment to be backed up needs to be changed to the access type of the next stage can be determined according to the preset time period policy parameters; the access types that can be changed to the next stage are the local access type and the hybrid access type.
For a data segment to be backed up of the local access type, a target data segment is screened out according to the access time information recorded in the data segment to be backed up and a preset high-frequency access time window value. The high-frequency access time window value is preconfigured before the data storage method of the embodiment of the invention is implemented, and represents the maximum access time window value at which the current data segment to be backed up still meets the high-frequency access standard. It may be understood that if the access time window value of the current data segment to be backed up is already greater than the high-frequency access time window value, part of the target file data in the current data segment to be backed up needs to be deleted from the first storage object, and the deleted target file data is transferred to the second storage object.
In other embodiments, in practice, in order to ensure the integrity and security of data when implementing the data storage method of the embodiment of the present invention, the data segment to be backed up of the local access type may be first stored in the second storage object completely, that is, the step of storing the data in the first storage object in the second storage object for backup is completed. The data storage method provided by the embodiment of the invention is mainly used for reducing the occupation of the medium frequency or low frequency index to the storage space of the first storage object, so that a target data segment is required to be selected from the data segments to be backed up, and target file data in the target data segment is deleted, thereby completing the purpose of reducing the storage space occupation rate of the first storage object.
When target file data is deleted from the first storage object, corresponding index link information must be generated in the first storage object so that the terminal device can access the target file data remotely. This step has been described above and is not repeated here.
Therefore, according to the above embodiment, a data segment to be backed up of the local access type is processed as follows. When the data segment to be backed up of the local access type has been completely stored in the second storage object, the latest access time of the data segment is subtracted from the current time to obtain a first time window value of the data segment to be backed up. The first time window value represents the time elapsed, from the latest access time to the current time, for a data segment to be backed up whose current data access state is the local access state. The shorter the first time window value, the more frequently the terminal device is taken to access the current data segment to be backed up; the segment can then be regarded as a high-frequency index and needs to be retained in the first storage object to guarantee the access speed of the terminal device. Conversely, when the first time window value is longer, for example greater than the preset high-frequency access time window value, the current data segment to be backed up is at best an intermediate-frequency index, that is, at least part of the file data in the segment can be deleted to free space in the first storage object. The data segment to be backed up is therefore determined to be a first target data segment. After the first target data segment is determined, the first file data specified by the first target data segment is deleted from the first storage object.
As described above, because the second storage object already stores the target file data and the first file data has been deleted from the first storage object, at least part of the file data of the current data segment to be backed up has in effect been transferred. The terminal device therefore needs to update the data access type of the current data segment to be backed up from the local access type to the hybrid access type. If the comparison of the first time window value with the high-frequency access time window value shows that no file data of the current data segment to be backed up needs to be deleted from the first storage object, the access type of the segment remains the local access type.
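The local access type check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `Segment` class, its field names, and the function `check_local_segment` are hypothetical, and the actual file deletion is represented only by a comment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Segment:
    """Hypothetical record for one frozen data segment to be backed up."""
    name: str
    access_type: str        # "local", "hybrid", or "remote"
    last_access: datetime   # latest access time recorded in the segment
    fully_backed_up: bool   # True once the second storage object holds a full copy


def check_local_segment(seg: Segment, now: datetime, hot_window: timedelta) -> bool:
    """Return True if seg was determined to be a first target data segment."""
    if seg.access_type != "local" or not seg.fully_backed_up:
        return False
    first_window = now - seg.last_access  # first time window value
    if first_window <= hot_window:
        return False  # still treated as a high-frequency index; keep it local
    # The first file data would be deleted from the first storage object here.
    seg.access_type = "hybrid"
    return True
```

With an assumed high-frequency access time window value of three days, a segment last accessed nine days ago becomes a first target data segment, while one accessed yesterday stays local.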
Similar to the processing procedure of the data segment to be backed up in the local access type, the target data segment can be screened from the data segment to be backed up in the hybrid access type according to the access time information recorded in the data segment to be backed up and the preset intermediate frequency access time window value. The intermediate frequency access time window value is preconfigured before the data storage method of the embodiment of the invention is implemented.
In some embodiments, the latest access time of the data segment to be backed up of the hybrid access type is subtracted from the current time to obtain a second time window value of the current data segment to be backed up. It can be understood that, whether the data access type of the current data segment to be backed up is the local access type or the hybrid access type, the time window that measures its access frequency is computed in the same way: the latest access time of the data segment is subtracted from the current time. The second time window value in this step represents the time elapsed, from the latest access time to the current time, for a data segment to be backed up whose current data access state is the hybrid access state.
The second time window value is compared with the sum of the high-frequency access time window value and the intermediate-frequency access time window value. If the second time window value is less than or equal to this sum, the current data segment to be backed up can still be regarded as an intermediate-frequency index: no further file data of the segment is deleted from the first storage object, and the data access type remains the hybrid access type, so that the access speed of the terminal device to such data segments is not greatly affected. If the second time window value is greater than this sum, the terminal device accesses the current data segment to be backed up very infrequently, and the data segment is determined to be a second target data segment. A second target data segment is one in which all deletable file data must be deleted, so the second file data in the second target data segment is deleted.
The progressive deletion of target file data from a target data segment can be illustrated as follows. Initially, the data access type of a data segment A to be backed up is the local access type. After data segment A has been completely stored in the second storage object, the terminal device periodically checks it and subtracts the latest access time of data segment A (assumed to be a first time) from the current check time to obtain a first time window value. When the first time window value is greater than the high-frequency access time window value, data segment A is determined to be a first target data segment, the first file data in data segment A is deleted, the data access type is changed to the hybrid access type, and the segment waits for the next periodic check by the terminal device. At the next check, the terminal device subtracts the latest access time of data segment A (assumed to be a second time, which may be the same as or later than the first time) from the time of that check to obtain a second time window value. When the second time window value is greater than the sum of the high-frequency access time window value and the medium-frequency access time window value, data segment A is determined to be a second target data segment, the second file data in data segment A is deleted, and the data access type is changed to the remote access type.
The flow of steps in the above example is described below in conjunction with fig. 11.
Referring to fig. 11, fig. 11 is a flowchart illustrating a second step of transferring target file data according to a time period policy parameter according to an embodiment of the present application. As shown in fig. 11, the terminal device periodically checks each frozen data segment to be backed up in the first storage object, and identifies a data access type corresponding to the data segment to be backed up. When the data segment to be backed up is local access type data (local), firstly checking whether the current data segment to be backed up is completely stored in a second storage object, if not, skipping the current data segment to be backed up by the terminal equipment, and checking the next frozen data segment to be backed up; if yes, determining a first time window value according to the current moment and the latest access time of the current data segment to be backed up, judging whether the first time window value is larger than the high-frequency access time window value, if not, skipping the current data segment to be backed up by the terminal equipment, and checking the next frozen data segment to be backed up; if yes, the terminal equipment determines the current data segment to be backed up as a first target data segment, deletes first file data (the first file data is a row memory file and a column memory file in the target data segment) in the first target data segment from the first storage object, and updates the data access type of the current data segment to be backed up into a hybrid access type. After the current data segment to be backed up is processed, the terminal equipment continues to check the next frozen data segment to be backed up. 
If the current data segment to be backed up is hybrid access type data, a second time window value is determined according to the current moment and the latest access time of the current data segment to be backed up, and it is judged whether the second time window value is greater than the sum of the high-frequency access time window value and the medium-frequency access time window value. If not, the terminal device skips the current data segment to be backed up and checks the next frozen data segment to be backed up; if yes, the terminal device determines that the current data segment to be backed up is a second target data segment, deletes the second file data (the second file data is the index file in the target data segment) in the second target data segment from the first storage object, and updates the data access type of the current data segment to be backed up to the remote access type. After the current data segment to be backed up is processed, the terminal device continues to check the next frozen data segment to be backed up. If the current data segment to be backed up is remote access type data, the terminal device skips the current data segment to be backed up and checks the next frozen data segment to be backed up.
In other embodiments, depending on the checking period that the terminal device applies to the data segments to be backed up, the following may also occur: when the terminal device checks the local access type data segment A for the first time, the obtained first time window value is already greater than the sum of the high-frequency access time window value and the medium-frequency access time window value. In that case, data segment A can be determined directly to be a second target data segment, all deletable file data in the second target data segment is determined to be second file data and is deleted, and the data access type of data segment A is changed from the local access type directly to the remote access type.
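One pass of the periodic check in fig. 11 can be sketched as a loop over the frozen segments. This is an illustrative sketch only: the `Segment` class, the `periodic_check` function, and the action strings are hypothetical names, and file deletion is represented by the returned action label rather than by real I/O.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Segment:
    """Hypothetical record for one frozen data segment to be backed up."""
    name: str
    access_type: str        # "local", "hybrid", or "remote"
    last_access: datetime
    fully_backed_up: bool


def periodic_check(segments: List[Segment], now: datetime,
                   hot_window: timedelta,
                   warm_window: timedelta) -> List[Tuple[str, str]]:
    """Run one check pass and return (segment name, action) pairs."""
    actions = []
    for seg in segments:
        window = now - seg.last_access
        if seg.access_type == "local":
            if not seg.fully_backed_up:
                actions.append((seg.name, "skip: backup incomplete"))
            elif window > hot_window:
                seg.access_type = "hybrid"  # first target data segment
                actions.append((seg.name, "delete row and column files"))
            else:
                actions.append((seg.name, "skip: within high-frequency window"))
        elif seg.access_type == "hybrid":
            if window > hot_window + warm_window:
                seg.access_type = "remote"  # second target data segment
                actions.append((seg.name, "delete index files"))
            else:
                actions.append((seg.name, "skip: within medium-frequency window"))
        else:
            actions.append((seg.name, "skip: already remote"))
    return actions
```

Each segment moves at most one step per pass in this sketch, which matches the progressive local, hybrid, remote sequence described for fig. 11.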
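The variant in which a local access type segment moves directly to the remote access type can be expressed as a single transition function. This is a sketch under assumptions: the function name `next_access_type` and the string labels are hypothetical, and it presumes the segment has already been completely stored in the second storage object.

```python
from datetime import timedelta


def next_access_type(current: str, window: timedelta,
                     hot: timedelta, warm: timedelta) -> str:
    """Return the access type after one check, allowing a direct local-to-remote jump."""
    if current == "remote":
        return "remote"
    if window > hot + warm:
        # Second target data segment: all deletable file data is removed.
        return "remote"
    if current == "local" and window > hot:
        # First target data segment: only row and column files are removed.
        return "hybrid"
    return current
```

With assumed windows of three and two days, a local segment idle for six days goes straight to remote, one idle for four days becomes hybrid, and a hybrid segment idle for exactly five days stays hybrid because the comparison is strict.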
S1030, deleting the target file data of the target data segment;
specifically, according to the above, the terminal device may determine a first target data segment in the data segment to be backed up of the local access type according to the first time window value and the high frequency access time window value, and delete the first file data in the first target data segment from the first storage object; or the terminal equipment can determine a second target data segment in the data to be backed up of the hybrid access type according to the second time window value, the high-frequency access time window value and the medium-frequency access time window value, and delete the second file data in the second target data segment from the first storage object.
Referring to fig. 12, fig. 12 is a schematic diagram illustrating the storage condition, in the first storage object, of data segments to be backed up corresponding to the three data access types according to an embodiment of the present application. As shown in fig. 12, when the data segment to be backed up is of the local access type, the row memory file, the column memory file, the index file-dictionary, the index file-inverted list, and the metadata file of the data segment to be backed up are all stored in the first storage object. When the data segment to be backed up is of the hybrid access type, the row memory file and the column memory file are deleted from the first storage object, and index link information 1201 is generated in the first storage object, where the index link information comprises a row memory file link for the corresponding row memory file and a column memory file link for the corresponding column memory file; the index file-dictionary, the index file-inverted list, and the metadata file remain in the first storage object. When the data segment to be backed up is of the remote access type, the row memory file, the column memory file, the index file-dictionary, and the index file-inverted list are all deleted from the first storage object, and index link information 1202 is generated in the first storage object, where the index link information comprises a row memory file link for the corresponding row memory file, a column memory file link for the corresponding column memory file, an index file-dictionary link for the corresponding index file-dictionary, and an index file-inverted list link for the corresponding index file-inverted list; the metadata file is still kept in the first storage object.
As mentioned above, the hybrid access type and the remote access type are distinguished mainly by whether the deletable file data in the current data segment to be backed up has been completely deleted from the first storage object. In the embodiment of the invention, the terminal device must at least retain the ability to access the target file data stored in the second storage object, so the file data in a target data segment (that is, a Segment) can be divided into file data that can be deleted and file data that cannot be deleted. For example, the target data segment includes the metadata file of the Segment, an index file (which further includes a dictionary and an inverted list), a row memory file, and a column memory file. The metadata file is regarded as file data that cannot be deleted, while the index file, the row memory file, and the column memory file are regarded as file data that can be deleted. If a data segment to be backed up of the hybrid access type is regarded as an intermediate-frequency index and one of the remote access type as a low-frequency index, it follows that a hybrid access type data segment retains more file data in the first storage object than a remote access type data segment, which guarantees the access efficiency of the terminal device to intermediate-frequency index data.
Therefore, for a data segment to be backed up that becomes the hybrid access type, the first file data refers to the row memory file and the column memory file, and deleting the first file data refers to deleting all the row memory files and column memory files in the first target data segment. Similarly, for a data segment to be backed up that becomes the remote access type, the second file data refers to the index file, and deleting the second file data refers to further deleting the index file in the second target data segment on the premise that all the row memory files and column memory files in the second target data segment have already been deleted.
It should be noted that, in order to guarantee the remote access capability of the terminal device to the target file data in the second storage object, corresponding index link information should be generated in the first storage object for each deleted index file, row memory file, and column memory file. In some embodiments, the index link information may be stored in the first storage object as separate handles, each of which characterizes one piece of index link information. In other embodiments, to reduce overall system overhead, the index link information may be added to the metadata description file of the target data segment.
Through steps S1000-S1030, the embodiment of the present invention provides steps for transferring the target file data according to the time period policy parameter: the time window value between the latest access time of the data segment to be backed up and the current time is determined and compared with the preset high-frequency access time window value and the preset intermediate-frequency access time window value, and different target file data is deleted for data segments to be backed up of different data access types, so that the deletion of file data in the target data segment is determined by the access time of the index data.
As mentioned above, the time window value reflects how frequently the terminal device accesses the index data. Likewise, within a defined time range, the number of times the terminal device accesses the index data also reflects whether the index data is a high-frequency, medium-frequency, or low-frequency index. A method for transferring the target file data according to the statistical policy parameter is described below with reference to fig. 13.
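The three layouts of fig. 12 can be summarized as a lookup from access type to the files kept locally and the links generated. This is only a sketch: the file labels, the `_link` suffix, and the function `first_storage_layout` are hypothetical names chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical labels for the five files of one data segment.
ALL_FILES = ["row_store", "column_store", "index_dictionary",
             "index_inverted_list", "metadata"]

# Files removed from the first storage object for each data access type.
REMOVED = {
    "local": [],
    "hybrid": ["row_store", "column_store"],
    "remote": ["row_store", "column_store",
               "index_dictionary", "index_inverted_list"],
}


def first_storage_layout(access_type: str) -> Tuple[List[str], List[str]]:
    """Return (files kept in the first storage object, index links generated)."""
    removed = REMOVED[access_type]
    kept = [name for name in ALL_FILES if name not in removed]
    links = [name + "_link" for name in removed]
    return kept, links
```

In every case the metadata file stays in the first storage object, and each removed file is replaced by exactly one link.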
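The option of writing index link information into the metadata description file might look like the sketch below. Everything here is assumed for illustration: the `index_links` key, the `remote://` address form, the file names, and both function names are hypothetical, and JSON is used only as a convenient stand-in for whatever format the metadata description file actually has.

```python
import json
from typing import Dict, List


def add_index_links(metadata: Dict, deleted_files: List[str],
                    remote_prefix: str) -> Dict:
    """Return a copy of metadata with one link per file deleted locally."""
    updated = dict(metadata)
    links = dict(updated.get("index_links", {}))
    for name in deleted_files:
        # Each link records where the file now lives in the second storage object.
        links[name] = f"{remote_prefix}/{name}"
    updated["index_links"] = links
    return updated


def render_metadata(metadata: Dict) -> str:
    """Serialize the metadata description, here as JSON text."""
    return json.dumps(metadata, indent=2, sort_keys=True)
```

Keeping the links inside one metadata description avoids managing a separate handle per deleted file, at the cost of rewriting the metadata whenever a file is transferred.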
Referring to fig. 13, fig. 13 is a flowchart illustrating steps for transferring target file data according to statistical policy parameters according to an embodiment of the present invention, where the method is executed by the terminal device 701 in fig. 7, and the method specifically includes, but is not limited to, steps S1300-S1320:
S1300, acquiring each data segment to be backed up in a data segment list to be backed up, and recording access frequency statistical information of each data segment to be backed up;
specifically, similar to step S1000 described above, the terminal device may periodically check the to-be-backed up data segment list, obtain all the frozen to-be-backed up data segments, and record the access frequency statistics of each to-be-backed up data segment. The access frequency statistical information is used for representing the access times of the terminal equipment to the current data segment to be backed up in a preset recording time period.
S1310, screening target data segments from the data segments to be backed up according to access frequency statistical information of the data segments to be backed up in a preset recording time period, and a preset high-frequency access frequency threshold value and a preset intermediate-frequency access frequency threshold value;
specifically, in the parameter setting stage of the cluster node, a high-frequency access frequency threshold and an intermediate-frequency access frequency threshold may be preset, where the high-frequency access frequency threshold refers to a lowest access frequency value capable of determining that a current data segment to be backed up is a high-frequency index, and the intermediate-frequency access frequency threshold refers to a lowest access frequency value capable of determining that the current data segment to be backed up is an intermediate-frequency index. It will be appreciated that the high frequency access frequency threshold should be greater than the medium frequency access frequency threshold.
Referring to the above, when the data segment to be backed up has been completely stored in the second storage object, the terminal device may determine, from the access frequency statistical information of the data segment to be backed up, the accessed frequency value of the current data segment to be backed up within the preset recording time period. When the accessed frequency value is greater than or equal to the high-frequency access frequency threshold, the current data segment to be backed up is a high-frequency index, and the terminal device does not need to perform any subsequent operation on it in this check. When the accessed frequency value is less than the high-frequency access frequency threshold and greater than or equal to the medium-frequency access frequency threshold, the current data segment to be backed up is a medium-frequency index and is determined to be a first target data segment. When the accessed frequency value is less than the medium-frequency access frequency threshold, the current data segment to be backed up is a low-frequency index and is determined to be a second target data segment.
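The three-way comparison against the two thresholds can be sketched as follows. The function name `classify_by_frequency` and the returned labels are hypothetical, and the threshold values used in the comments are arbitrary examples rather than values from the embodiment.

```python
def classify_by_frequency(accessed_count: int, high_threshold: int,
                          medium_threshold: int) -> str:
    """Classify a fully backed-up segment by its access count in the recording period."""
    if high_threshold <= medium_threshold:
        raise ValueError("high-frequency threshold must exceed medium-frequency threshold")
    if accessed_count >= high_threshold:
        return "high"    # keep everything in the first storage object
    if accessed_count >= medium_threshold:
        return "medium"  # first target data segment
    return "low"         # second target data segment
```

For instance, with assumed thresholds of 100 and 10 accesses per recording period, a segment accessed 50 times is a first target data segment and one accessed 3 times is a second target data segment.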
S1320, deleting the target file data of the target data segment.
Specifically, after determining that the data segment to be backed up is the first target data segment or the second target data segment according to the accessed frequency value, the first file data (the first file data includes a row memory file and a column memory file) in the first target data segment may be deleted, or the second file data (the second file data includes an index file) may be deleted on the premise that all the row memory files and the column memory files have been deleted in the second target data segment, thereby freeing up the storage space of the first storage object. Similarly, after deleting the target file data, index link information is generated corresponding to the target file data, and the index link information can be stored in a form of independent handle or written into a metadata description file of the target data segment. And, for the data segment to be backed up, from which only the first file data is deleted or from which the first file data and the second file data are deleted, the data access type thereof is modified correspondingly. The specific storage conditions of the data segments to be backed up obtained after the above steps are all discussed in detail in fig. 12, and are not described herein again.
Through steps S1300-S1320, the embodiment of the present invention provides a method step of transferring target file data according to a statistical policy parameter, which mainly includes comparing an accessed frequency value with a preset high-frequency access frequency threshold and an intermediate-frequency access frequency threshold, determining that a current data segment to be backed up is a first target data segment or a second target data segment, and deleting the first file data in the first target data segment or deleting the second file data in the second target data segment from which the first file data has been deleted.
In summary, an embodiment of the present invention provides a data storage method, which in effect applies a hybrid storage mode of "first storage object + second storage object" to index data and is implemented by the terminal device 701, provided with the first storage object, in fig. 7. First, in a cluster node environment, each terminal device needs to be preconfigured with the address of the second storage object and the access authority credential of the second storage object, the storage mode of the cluster node is configured as the hybrid storage mode, and the backup policy parameters corresponding to the data segment list to be backed up and their specific reference values are preconfigured. The reference values include, but are not limited to, the high-frequency access time window value, the medium-frequency access time window value, the high-frequency access frequency threshold, and the medium-frequency access frequency threshold. After the configuration is finished, the terminal device acquires access record information and then constructs a data segment list to be backed up according to the access record information; the list may contain frozen data segments to be backed up. The terminal device may choose to back up the data segment to be backed up completely to the second storage object and then delete the target file data from the first storage object; alternatively, it may store in the second storage object only the target file data that is to be deleted from the first storage object.
Based on the preset time period policy parameters and the related reference values, the terminal device can progressively delete the target file data of the target data segments in the data segment list to be backed up according to the access time information recorded in the data segments to be backed up. Based on the preset statistical policy parameters and the related reference values, the terminal device progressively deletes the target file data of the target data segments in the data segment list to be backed up according to the access frequency statistical information of the data segments to be backed up within the preset recording time period. The method reduces the target file data of the target data segments stored in the first storage object and thus saves storage space in the first storage object. Because the target file data is stored in the second storage object, it is fully backed up and the integrity of the file data is maintained. In addition, the invention generates index link information for the target file data in the target data segment, so that the high access speed of the first storage object can be used to quickly locate the target file data stored in the second storage object, preserving access efficiency while reducing the storage cost of the first storage object. Experiments show that the storage cost of the first storage object can be reduced by about 50% after the data storage method provided by the embodiment of the invention is executed.
The following describes, assuming a cluster node environment, the specific procedure by which the terminal device of each node implements the data storage method provided by the embodiment of the present invention.
First, in the implementation of the computer programming code, the terminal device configures in advance an address of the second storage object and an access right credential of the second storage object.
Then, the storage mode "Index. Storage" of Index in the program code is configured as the hybrid storage mode "hybrid". In addition, the embodiment of the invention provides two data backup strategies: time period policy parameters and statistical policy parameters. When data is to be backed up according to the time period policy parameter, the terminal device configures the type parameter (type) under the backup policy "index.segment.retrieval.policy" as "time_period" and, under the same "index.segment.retrieval.policy", configures the high-frequency access time window value corresponding to the hot data "hot" as "3d" (d is a preset duration unit, for example, days) and the medium-frequency access time window value corresponding to the warm data "norm" as "2d".
After the basic parameter configuration is finished, the terminal device executes the data storage method of the embodiment of the invention. It first generates an original data segment (Segment) corresponding to the newly written access record information (Document) and performs a data segment merging operation (Segment Merge) on multiple original data segments that satisfy the data segment merging condition, obtaining a data segment to be backed up (a large Segment). When the data volume of the data segment to be backed up is greater than a preset first threshold, the current data segment to be backed up is frozen, and no new original data segments are merged into it.
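The time period configuration described above could be represented and parsed as in the sketch below. The nesting of the dictionary, the lowercase key "index.storage", and the helper `parse_days` are assumptions made for illustration; only the quoted values "hybrid", "time_period", "3d", and "2d" and the policy name come from the text.

```python
import re
from datetime import timedelta

# Hypothetical in-memory form of the settings; the real layout is not specified.
config = {
    "index.storage": "hybrid",
    "index.segment.retrieval.policy": {
        "type": "time_period",
        "hot": "3d",    # high-frequency access time window value
        "norm": "2d",   # medium-frequency access time window value
    },
}


def parse_days(value: str) -> timedelta:
    """Parse a duration such as "3d", assuming d means days."""
    match = re.fullmatch(r"(\d+)d", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(days=int(match.group(1)))


policy = config["index.segment.retrieval.policy"]
hot_window = parse_days(policy["hot"])
warm_window = parse_days(policy["norm"])
# A hybrid segment becomes a second target data segment only after this much idle time.
remote_after = hot_window + warm_window
```

Under these example values, a segment loses its row and column files after more than three idle days and its index files after more than five.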
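The merge-then-freeze behaviour can be sketched with segment sizes alone. This is a simplified illustration: it assumes every arriving original segment satisfies the merging condition and is merged in arrival order, which the text does not state, and the function name `merge_and_freeze` and the size units are hypothetical.

```python
from typing import List, Tuple


def merge_and_freeze(segment_sizes: List[int],
                     freeze_threshold: int) -> Tuple[List[int], int]:
    """Merge original segments in order; freeze once the merged size exceeds the threshold.

    Returns (sizes of frozen segments, size of the still-open merged segment).
    """
    frozen = []
    current = 0
    for size in segment_sizes:
        current += size  # merge the new original segment into the open one
        if current > freeze_threshold:
            frozen.append(current)  # frozen: no further segments are merged in
            current = 0
    return frozen, current
```

For sizes 40, 30, 50, 20, 90, 10 and a threshold of 100, the first three merge into a frozen segment of 120, the next two into one of 110, and the last remains open at 10.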
From the plurality of frozen data segments, a list (Index) of data segments to be backed up is constructed. In some embodiments, the terminal device may select to fully backup the data segment to be backed up in the data segment to be backed up list to the second storage object, and then periodically check each frozen data segment to be backed up in the first storage object, to identify a data access type corresponding to the data segment to be backed up.
When the data segment to be backed up is local access type data (local), firstly checking whether the current data segment to be backed up is completely stored in a second storage object, if not, skipping the current data segment to be backed up by the terminal equipment, and checking the next frozen data segment to be backed up; if yes, determining a first time window value according to the current moment and the latest access time of the current data segment to be backed up.
Judging whether the first time window value is larger than the high-frequency access time window value, if not, skipping the current data segment to be backed up by the terminal equipment, and checking the next frozen data segment to be backed up; if yes, the terminal equipment determines the current data segment to be backed up as a first target data segment, deletes first file data (the first file data is a row memory file and a column memory file in the target data segment) in the first target data segment from the first storage object, and updates the data access type of the current data segment to be backed up into a hybrid access type.
After the current data segment to be backed up is processed, the terminal device continues to check the next frozen data segment to be backed up. If the current data segment to be backed up is hybrid access type data, a second time window value is determined according to the current moment and the latest access time of the current data segment to be backed up.
Judging whether the second time window value is larger than the sum of the high-frequency access time window value and the medium-frequency access time window value, if not, skipping the current data segment to be backed up by the terminal equipment, and checking the next frozen data segment to be backed up; if yes, the terminal equipment determines that the current data segment to be backed up is a second target data segment, deletes second file data (the second file data is an index file in the target data segment) in the second target data segment from the first storage object, and updates the data access type of the current data segment to be backed up to a remote access type.
After the current data segment to be backed up is processed, the terminal device continues to check the next frozen data segment to be backed up. If the current data segment to be backed up is remote access type data, which indicates that the index files of the data segment in the first storage object have been deleted and stored in the second storage object, the terminal device skips the current data segment to be backed up and checks the next frozen data segment to be backed up, until all data segments to be backed up have been checked.
In other embodiments, when a statistical policy parameter is selected to backup data, in an implementation of computer programming code, the terminal device configures a type parameter (type) under a backup policy "index.
According to the above, the terminal device generates a to-be-backed-up data segment list including at least one frozen to-be-backed-up data segment. The terminal equipment periodically checks each frozen data segment to be backed up in the first storage object, and identifies the data access type corresponding to the data segment to be backed up.
When the data segment to be backed up is local access type data (local), firstly checking whether the current data segment to be backed up is completely stored in a second storage object, if not, skipping the current data segment to be backed up by the terminal equipment, and checking the next frozen data segment to be backed up; if yes, the terminal equipment determines an accessed frequency value in a preset recording time period according to the access frequency statistical information of the current data segment to be backed up.
When the accessed frequency value is larger than the high-frequency access frequency threshold, the terminal equipment skips the current data segment to be backed up, and checks the next frozen data segment to be backed up; when the accessed frequency value is smaller than or equal to the high-frequency access frequency threshold and is larger than the medium-frequency access frequency threshold, the terminal equipment determines that the current data segment to be backed up is a first target data segment, deletes the first file data in the first target data segment from the first storage object, and updates the data access type of the current data segment to be backed up to the hybrid access type.
After the current data segment to be backed up is processed, the terminal equipment continues to check the next frozen data segment to be backed up.
If the current data segment to be backed up is hybrid access type data and the accessed frequency value is less than or equal to the medium-frequency access frequency threshold, the terminal device determines that the current data segment to be backed up is a second target data segment, deletes the second file data in the second target data segment from the first storage object, and updates the data access type of the current data segment to be backed up to the remote access type.
After the current data segment to be backed up is processed, the terminal equipment continues to check the next frozen data segment to be backed up.
If the data access type of the current data segment to be backed up is a remote access type, which indicates that the index files of the data segment to be backed up in the first storage object are deleted and stored in the second storage object, the terminal device skips the current data segment to be backed up, and checks the next frozen data segment to be backed up.
It should be noted that, in the foregoing examples, the data storage method provided by the present invention is executed by the terminal device of each node in a cluster node environment. In other application scenarios, those skilled in the art may, based on the scheme of the present invention, use "terminal device + server device", a "server device cluster", or a "cloud device" as the execution subject. For example, when "terminal device + server device" is the execution subject, the related index data of the first storage object on the terminal device can be transferred to the server device according to the method of the present invention, thereby saving storage space on the terminal device. When a server device cluster is the execution subject, the related index data of the first storage object on one server device can be transferred to another server device according to the method of the invention, thereby saving storage space on the former server device. When a cloud device is the execution subject, the related index data of the first storage object on the terminal device can be transferred to the cloud device according to the method of the invention, thereby saving storage space on the terminal device. The present invention does not limit the subject to which the first storage object or the second storage object belongs.
Referring to fig. 14, fig. 14 is a schematic structural diagram of a data storage device according to an embodiment of the present invention, where the data storage device may be applied to a terminal device 701 shown in fig. 7, for example, the device may implement part or all of the functions of the terminal device 701 by hardware or a combination of hardware and software, so as to implement steps in the data storage method as described above. As shown in fig. 14, the data storage device 1400 may include:
a first module 1410 for obtaining access record information;
a second module 1420, configured to construct a list of data segments to be backed up according to the access record information; the data segment list to be backed up comprises at least one data segment to be backed up; recording at least one piece of access record information on the data segment to be backed up; the data segment to be backed up is stored in a first storage object;
a third module 1430 configured to delete the target file data of the target data segment in the data segment list to be backed up according to the access time information recorded in the data segment to be backed up; or deleting the target file data of the target data segment in the data segment list to be backed up according to the access frequency statistical information of the data segment to be backed up in the preset recording time period;
a fourth module 1440, configured to store the target file data to the second storage object, and generate index link information of the target file data in the target data segment; the index link information is used for representing the storage position of the target file data in the second storage object, and the access speed of the first storage object is greater than that of the second storage object.
In an embodiment, the third module comprises:
a first unit, configured to obtain each data segment to be backed up in the data segment to be backed up list;
a second unit, configured to identify a data access type of the data segment to be backed up; wherein the data access types comprise a local access type, a hybrid access type and a remote access type;
the third unit is used for screening out target data segments from the data segments to be backed up of the local access type according to the access time information recorded in the data segments to be backed up and a preset high-frequency access time window value; or screening out a target data segment from the data segment to be backed up of the mixed access type according to the access time information recorded in the data segment to be backed up and a preset intermediate frequency access time window value;
and a fourth unit, configured to delete the target file data of the target data segment.
In an embodiment, the third unit comprises:
a fifth unit, configured to subtract the latest access time of the data segment to be backed up from the current time when the data segment to be backed up of the local access type is completely stored in the second storage object, to obtain a first time window value of the data segment to be backed up;
and a sixth unit, configured to determine that the data segment to be backed up is a first target data segment when the first time window value is greater than the preset high-frequency access time window value.
In an embodiment, the third unit comprises:
a seventh unit, configured to subtract the latest access time of the data segment to be backed up of the local access type from the current time to obtain a second time window value of the data segment to be backed up;
and an eighth unit, configured to determine that the data segment to be backed up is a second target data segment when the second time window value is greater than a sum of the preset high-frequency access time window value and the intermediate-frequency access time window value.
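As a non-limiting sketch, the time window comparisons performed by the fifth to eighth units may be written as follows. The function names are hypothetical, and times are assumed to be expressed in seconds; the embodiments do not prescribe a unit.

```python
def is_first_target(now: float, last_access: float,
                    fully_backed_up: bool, high_window: float) -> bool:
    """Local-type segment: first time window value = current time - latest access time."""
    first_window = now - last_access
    return fully_backed_up and first_window > high_window


def is_second_target(now: float, last_access: float,
                     high_window: float, mid_window: float) -> bool:
    """Second time window value is compared with the sum of both preset window values."""
    second_window = now - last_access
    return second_window > high_window + mid_window
```

Under these assumptions, with a high-frequency access time window value of 7 days and an intermediate-frequency access time window value of 30 days, a segment last accessed 10 days ago qualifies as a first target data segment but not as a second target data segment.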
In an embodiment, the third module comprises:
a ninth unit, configured to obtain each data segment to be backed up in the data segment list to be backed up, and record access frequency statistics information of each data segment to be backed up;
a tenth unit, configured to screen out a target data segment from the data segment to be backed up according to the access frequency statistics information of the data segment to be backed up in the preset recording time period, and a preset high-frequency access frequency threshold value and a preset intermediate-frequency access frequency threshold value;
and an eleventh unit configured to delete target file data of the target data segment.
In an embodiment, the tenth unit includes:
a twelfth unit, configured to determine, when the data segment to be backed up has been completely stored in the second storage object, an accessed frequency value of the data segment to be backed up in a preset recording time period from access frequency statistics information of the data segment to be backed up;
and a thirteenth unit, configured to determine, when the accessed frequency value is smaller than the high frequency access threshold and the accessed frequency value is greater than or equal to the medium frequency access threshold, that the data segment to be backed up is a first target data segment.
In an embodiment, the tenth unit includes:
and a fourteenth unit, configured to determine, when the accessed frequency value is smaller than the intermediate frequency access threshold, that the data segment to be backed up is a second target data segment.
In an embodiment, the fourth unit and the eleventh unit comprise at least one of:
a fifteenth unit, configured to delete first file data in the first target data segment, and update a data access type of the first target data segment to a hybrid access type;
or,
a sixteenth unit, configured to delete the second file data in the second target data segment, and update the data access type of the second target data segment to a remote access type.
In an embodiment, the fourth module includes:
a seventeenth unit, configured to generate an independent handle in the target data segment, where the independent handle is used to characterize the index link information;
or,
an eighteenth unit, configured to add the index link information to the metadata description file of the target data segment.
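For illustration, the second alternative (adding the index link information to the metadata description file) may be sketched as follows. The JSON layout, the `index_links` field name, and the example location string are assumptions made for this sketch only; the embodiments do not specify the format of the metadata description file.

```python
import json
from typing import Optional


def add_index_link(metadata_json: str, file_name: str, remote_location: str) -> str:
    """Record where a file of the segment now lives in the second storage object."""
    meta = json.loads(metadata_json)
    meta.setdefault("index_links", {})[file_name] = remote_location
    return json.dumps(meta, sort_keys=True)


def resolve_index_link(metadata_json: str, file_name: str) -> Optional[str]:
    """Look up the storage position of a file; None means no link was recorded."""
    meta = json.loads(metadata_json)
    return meta.get("index_links", {}).get(file_name)
```

Because the metadata description file remains in the first storage object, the lookup itself stays fast even though the file data has moved to the slower second storage object.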
In an embodiment, the second module comprises:
a nineteenth unit, configured to generate an original data segment according to the acquired access record information when the acquired data amount of the access record information is greater than a first threshold value, or when an acquisition duration of the access record information exceeds a first time period;
a twentieth unit, configured to continuously perform a merging operation on the original data segments that meet a data segment merging condition, generate the data segment to be backed up, and perform a freezing operation on the data segment to be backed up until the data volume of the data segment to be backed up is greater than a second threshold, so that the merging operation of the data segment to be backed up is stopped;
and a twenty-first unit, configured to construct the data segment list to be backed up according to the generated at least one data segment to be backed up.
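The operation of the nineteenth to twenty-first units may be sketched as below. This is only one possible reading: the data segment merging condition is not detailed here, so the sketch assumes that every original data segment is eligible and is merged in arrival order, and it measures data volume as a record count. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DataSegment:
    records: list
    frozen: bool = False


def should_generate_segment(buffered: int, elapsed: float,
                            first_threshold: int, first_period: float) -> bool:
    """An original data segment is generated when either limit is exceeded."""
    return buffered > first_threshold or elapsed > first_period


def merge_and_freeze(originals: List[DataSegment], second_threshold: int
                     ) -> Tuple[List[DataSegment], Optional[DataSegment]]:
    """Merge original segments; freeze a merged segment once it exceeds second_threshold."""
    backup_list: List[DataSegment] = []
    current: Optional[DataSegment] = None
    for seg in originals:
        if current is None:
            current = DataSegment(list(seg.records))
        else:
            current.records.extend(seg.records)
        if len(current.records) > second_threshold:
            current.frozen = True  # frozen segments take no further part in merging
            backup_list.append(current)
            current = None
    return backup_list, current  # current is the still-open segment, if any
```

Only the frozen segments returned in `backup_list` would enter the data segment list to be backed up; the open segment continues to accept merges.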
In an embodiment, the data storage device further includes a fifth module, where the fifth module is configured to implement the parameter configuration step of the cluster node, and the fifth module includes at least one of:
a twenty-second unit, configured to configure an address of the second storage object and an access right credential of the second storage object;
or, a twenty-third unit, configured to configure a storage mode of the cluster node to be a hybrid storage mode;
or, a twenty-fourth unit, configured to configure a backup policy parameter of the to-be-backed up data segment list, where the backup policy parameter includes a time period policy parameter and a statistical policy parameter;
or, a twenty-fifth unit configured to configure a high-frequency access time window value and an intermediate-frequency access time window value;
or, a twenty-sixth unit configured to configure a high-frequency access frequency threshold and a medium-frequency access frequency threshold;
the time period policy parameter indicates that the cluster node deletes the target file data of a target data segment in the data segment list to be backed up according to the access time information recorded in the data segment to be backed up;
and the statistical policy parameter indicates that the cluster node deletes the target file data of the target data segment in the data segment list to be backed up according to the access frequency statistical information of the data segment to be backed up in the preset recording time period.
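The parameters handled by the fifth module may, purely as an example, be grouped into a single configuration object. The field names, default values, and validation rules below are assumptions for illustration and are not taken from the embodiments.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class BackupPolicy(Enum):
    TIME_PERIOD = "time_period"  # delete according to recorded access time information
    STATISTICAL = "statistical"  # delete according to access frequency statistics


@dataclass(frozen=True)
class NodeConfig:
    second_storage_address: str
    second_storage_credential: str
    storage_mode: str = "hybrid"
    backup_policy: BackupPolicy = BackupPolicy.TIME_PERIOD
    high_freq_window_seconds: float = 7 * 86400.0   # illustrative default
    mid_freq_window_seconds: float = 30 * 86400.0   # illustrative default
    high_freq_threshold: int = 100                  # illustrative default
    mid_freq_threshold: int = 10                    # illustrative default

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the configuration is usable."""
        errors = []
        if not self.second_storage_address:
            errors.append("address of the second storage object is required")
        if not self.second_storage_credential:
            errors.append("access right credential is required")
        if self.high_freq_window_seconds <= 0 or self.mid_freq_window_seconds <= 0:
            errors.append("time window values must be positive")
        if self.high_freq_threshold <= self.mid_freq_threshold:
            errors.append("high-frequency threshold must exceed medium-frequency threshold")
        return errors
```

The last check reflects the assumption that the medium-frequency band lies strictly below the high-frequency threshold, which the frequency-based screening requires in order to select any first target data segment.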
In summary, the data storage device provided by the embodiment of the invention first acquires access record information, then constructs a data segment list to be backed up according to the access record information, and then deletes the target file data of a target data segment in the data segment list to be backed up, either according to the access time information recorded in the data segment to be backed up or according to the access frequency statistical information of the data segment to be backed up in the preset recording time period. In this way, the device reduces the target file data stored in the first storage object and saves the storage space of the first storage object. Because the target file data is stored in the second storage object, the target file data is fully backed up and the integrity of the file data is maintained. In addition, the invention generates the index link information of the target file data in the target data segment, so that the target file data stored in the second storage object can be located quickly by exploiting the high access speed of the first storage object; access efficiency is thus preserved while the storage cost of the first storage object is reduced.
The embodiment of the invention also provides electronic equipment, which comprises a processor and a memory;
the memory stores a program;
the processor executes a program to perform the data storage method of the terminal device 701 shown in fig. 7.
The electronic device has a function of performing data processing operations on the first storage object, and may be, for example, a personal computer (PC), a mobile phone, a smart phone, a personal digital assistant (PDA), a wearable device, a Pocket PC (PPC), a tablet computer, or the like.
In the embodiment of the invention, the processor included in the electronic device has the following functions:
acquiring access record information;
constructing a data segment list to be backed up according to the access record information; the data segment list to be backed up comprises at least one data segment to be backed up; recording at least one piece of access record information on the data segment to be backed up; the data segment to be backed up is stored in a first storage object;
deleting target file data of target data segments in a data segment list to be backed up according to access time information recorded in the data segments to be backed up; or deleting the target file data of the target data segment in the data segment list to be backed up according to the access frequency statistical information of the data segment to be backed up in the preset recording time period;
storing the target file data to a second storage object, and generating index link information of the target file data in a target data segment; the index link information is used for representing the storage position of the target file data in the second storage object, and the access speed of the first storage object is greater than that of the second storage object.
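A minimal sketch of the last two functions, using in-memory dictionaries to stand in for the first and second storage objects, is given below. The function names and key layout are hypothetical. The sketch copies each file to the second storage object before deleting it from the first, in keeping with the requirement that a segment be completely stored in the second storage object before its file data is deleted.

```python
from typing import Dict, List


def offload_files(first_store: Dict[str, bytes], second_store: Dict[str, bytes],
                  index_links: Dict[str, str], segment_id: str,
                  file_names: List[str]) -> List[str]:
    """Move target file data to the second storage object and record index links."""
    moved = []
    for name in file_names:
        key = f"{segment_id}/{name}"
        if key not in first_store:
            continue  # already offloaded
        second_store[key] = first_store[key]  # store to the second storage object first
        index_links[key] = key                # storage position in the second storage object
        del first_store[key]                  # then free space in the first storage object
        moved.append(key)
    return moved


def read_file(first_store: Dict[str, bytes], second_store: Dict[str, bytes],
              index_links: Dict[str, str], key: str) -> bytes:
    """Serve from the first storage object when possible, else follow the index link."""
    if key in first_store:
        return first_store[key]
    return second_store[index_links[key]]
```

Reads of files that remain in the first storage object are unaffected, while reads of offloaded files are redirected through the index link.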
Embodiments of the present invention also disclose a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor, to cause the computer device to perform the data storage method of fig. 8 described above.
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, while the invention is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the functions and/or features may be integrated in a single physical device and/or software module or may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Accordingly, one of ordinary skill in the art can implement the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the invention, which is to be defined in the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium may even be paper or other suitable medium upon which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. Alternatively, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiment of the present invention has been described in detail, the present invention is not limited to the embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and these equivalent modifications or substitutions are included in the scope of the present invention as defined in the appended claims.

Claims (15)

1. A method of data storage, comprising:
acquiring access record information;
constructing a data segment list to be backed up according to the access record information; wherein the data segment list to be backed up comprises at least one data segment to be backed up; the data segment to be backed up records at least one piece of access record information; the data segment to be backed up is stored in a first storage object;
deleting target file data of a target data segment in the data segment list to be backed up according to the access time information recorded in the data segment to be backed up; or deleting the target file data of the target data segment in the data segment list to be backed up according to the access frequency statistical information of the data segment to be backed up in the preset recording time period;
storing the target file data to a second storage object, and generating index link information of the target file data in the target data segment; the index link information is used for representing the storage position of the target file data in the second storage object, and the access speed of the first storage object is greater than that of the second storage object.
2. The method of claim 1, wherein deleting the target file data of the target data segment in the data segment list to be backed up according to the access time information recorded in the data segment to be backed up comprises:
acquiring each data segment to be backed up in the data segment list to be backed up;
identifying the data access type of the data segment to be backed up; wherein the data access types comprise a local access type, a hybrid access type and a remote access type;
screening target data segments from the data segments to be backed up of the local access type according to the access time information recorded in the data segments to be backed up and a preset high-frequency access time window value; or screening out a target data segment from the data segment to be backed up of the mixed access type according to the access time information recorded in the data segment to be backed up and a preset intermediate frequency access time window value;
and deleting the target file data of the target data segment.
3. The method of claim 2, wherein the step of screening the target data segment from the data segment to be backed up of the local access type according to the access time information recorded in the data segment to be backed up and the preset high-frequency access time window value includes:
when the data segment to be backed up of the local access type is completely stored in the second storage object, subtracting the latest access time of the data segment to be backed up from the current time to obtain a first time window value of the data segment to be backed up;
and when the first time window value is larger than the preset high-frequency access time window value, determining the data segment to be backed up as a first target data segment.
4. A data storage method according to claim 3, wherein the screening the target data segment from the data segments to be backed up of the hybrid access type according to the access time information recorded in the data segments to be backed up and the preset intermediate frequency access time window value includes:
subtracting the latest access time of the data segment to be backed up of the local access type from the current time to obtain a second time window value of the data segment to be backed up;
and when the second time window value is larger than the sum of the preset high-frequency access time window value and the intermediate-frequency access time window value, determining the data segment to be backed up as a second target data segment.
5. The method for storing data according to claim 1, wherein deleting the target file data of the target data segment in the data segment list to be backed up according to the access frequency statistics of the data segment to be backed up in the preset recording time period includes:
acquiring each data segment to be backed up in the data segment list to be backed up, and recording access frequency statistical information of each data segment to be backed up;
screening target data segments from the data segments to be backed up according to the access frequency statistical information of the data segments to be backed up in the preset recording time period, and a preset high-frequency access frequency threshold value and a preset intermediate-frequency access frequency threshold value;
and deleting the target file data of the target data segment.
6. The method according to claim 5, wherein the step of screening the target data segment from the data segment to be backed up according to the access frequency statistics of the data segment to be backed up in the preset recording time period and the preset high-frequency access frequency threshold and the preset intermediate-frequency access frequency threshold includes:
when the data segment to be backed up is completely stored in the second storage object, determining an accessed frequency value of the data segment to be backed up in a preset recording time period from the access frequency statistical information of the data segment to be backed up;
and when the accessed frequency value is smaller than the high-frequency access threshold value and the accessed frequency value is larger than or equal to the medium-frequency access threshold value, determining the data segment to be backed up as a first target data segment.
7. The method of claim 6, wherein the selecting the target data segment from the data segments to be backed up according to the access frequency statistics of the data segments to be backed up in the preset recording time period and the preset high-frequency access frequency threshold and the preset medium-frequency access frequency threshold, further comprises:
and when the accessed frequency value is smaller than the medium frequency access threshold value, determining the data segment to be backed up as a second target data segment.
8. A data storage method according to claim 4 or 7, wherein said deleting the target file data of the target data segment comprises at least one of:
deleting the first file data in the first target data segment, and updating the data access type of the first target data segment to be a hybrid access type;
or,
deleting the second file data in the second target data segment.
9. The data storage method according to claim 1, wherein the generating index link information of the target file data in the target data segment includes:
generating an independent handle in the target data segment, wherein the independent handle is used for representing the index link information;
or,
adding the index link information to the metadata description file of the target data segment.
10. The method according to claim 1, wherein said constructing a list of data segments to be backed up based on said access record information comprises:
when the acquired data volume of the access record information is larger than a first threshold value, or when the acquired time length of the access record information exceeds a first time period, generating an original data segment according to the acquired access record information;
continuously carrying out merging operation on the original data segments meeting the data segment merging condition to generate the data segments to be backed up, and carrying out freezing operation on the data segments to be backed up until the data quantity of the data segments to be backed up is larger than a second threshold value so as to stop merging operation on the data segments to be backed up;
and constructing the data segment list to be backed up according to the generated at least one data segment to be backed up.
11. A data storage method according to claim 1, characterized in that the method further comprises a parameter configuration step of the cluster node, which step comprises at least one of:
configuring an address of the second storage object and an access right credential of the second storage object;
or, configuring the storage mode of the cluster node as a hybrid storage mode;
or configuring backup strategy parameters of the data segment list to be backed up, wherein the backup strategy parameters comprise time period strategy parameters and statistical strategy parameters;
or configuring a high-frequency access time window value and a medium-frequency access time window value;
or configuring a high-frequency access frequency threshold and a medium-frequency access frequency threshold;
the time period policy parameter indicates that the cluster node deletes the target file data of a target data segment in the data segment list to be backed up according to the access time information recorded in the data segment to be backed up;
and the statistical policy parameter indicates that the cluster node deletes the target file data of the target data segment in the data segment list to be backed up according to the access frequency statistical information of the data segment to be backed up in the preset recording time period.
12. A data storage device, comprising:
the first module is used for acquiring access record information;
the second module is used for constructing a data segment list to be backed up according to the access record information; wherein the data segment list to be backed up comprises at least one data segment to be backed up; the data segment to be backed up records at least one piece of access record information; the data segment to be backed up is stored in a first storage object;
the third module is used for deleting the target file data of the target data segment in the data segment list to be backed up according to the access time information recorded in the data segment to be backed up; or deleting the target file data of the target data segment in the data segment list to be backed up according to the access frequency statistical information of the data segment to be backed up in the preset recording time period;
a fourth module, configured to store the target file data to a second storage object, and generate index link information of the target file data in the target data segment; the index link information is used for representing the storage position of the target file data in the second storage object, and the access speed of the first storage object is greater than that of the second storage object.
13. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executing the program to implement the method of any one of claims 1 to 11.
14. A computer-readable storage medium, characterized in that the storage medium stores a program that is executed by a processor to implement the method of any one of claims 1 to 11.
15. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 11.
CN202210278287.9A 2022-03-21 2022-03-21 Data storage method, device, electronic equipment and computer readable storage medium Pending CN116820323A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210278287.9A CN116820323A (en) 2022-03-21 2022-03-21 Data storage method, device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210278287.9A CN116820323A (en) 2022-03-21 2022-03-21 Data storage method, device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116820323A true CN116820323A (en) 2023-09-29

Family

ID=88128000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210278287.9A Pending CN116820323A (en) 2022-03-21 2022-03-21 Data storage method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116820323A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117215972A (en) * 2023-11-09 2023-12-12 中央军委政治工作部军事人力资源保障中心 Cache layering method and system based on cloud native infrastructure

Similar Documents

Publication Publication Date Title
KR102564170B1 (en) Method and device for storing data object, and computer readable storage medium having a computer program using the same
CN104281533B (en) Method and device for data storage
JP4648723B2 (en) Method and apparatus for hierarchical storage management based on data value
US9355112B1 (en) Optimizing compression based on data activity
US9396290B2 (en) Hybrid data management system and method for managing large, varying datasets
US11093387B1 (en) Garbage collection based on transmission object models
US7636736B1 (en) Method and apparatus for creating and using a policy-based access/change log
DE102016013248A1 (en) Reference block accumulation in a reference quantity for deduplication in storage management
CN110018989B (en) Snapshot comparison method and device
CN104854582B (en) Method and system for storage-efficient, update-optimized transactional full-text index view maintenance
CN109240607B (en) File reading method and device
CN111400334B (en) Data processing method, data processing device, storage medium and electronic device
CN111611250A (en) Data storage device, data query method, data query device, server and storage medium
CN110502510B (en) Real-time analysis and duplicate removal method and system for WIFI terminal equipment trajectory data
CN111930305A (en) Data storage method and device, storage medium and electronic device
CN116820323A (en) Data storage method, device, electronic equipment and computer readable storage medium
US9020902B1 (en) Reducing head and tail duplication in stored data
EP3343395B1 (en) Data storage method and apparatus for mobile terminal
CN112711564B (en) Merging processing method and related equipment
CN114595286A (en) Data synchronization method and device, electronic equipment and storage medium
CN110955682A (en) Method and device for deleting cache data, data cache and reading cache data
CN116301656A (en) Data storage method, system and equipment based on log structure merging tree
CN114579061B (en) Data storage method, device, equipment and medium
CN111752941A (en) Data storage method, data access method, data storage device, data access device, server and storage medium
CN115878625A (en) Data processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40094534; Country of ref document: HK)