WO2020186931A1 - Data storage management method, device and computer-readable storage medium - Google Patents
Data storage management method, device and computer-readable storage medium
- Publication number
- WO2020186931A1 (application PCT/CN2020/074191, CN2020074191W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- storage unit
- data storage
- data
- data table
- access
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/122—File system administration, e.g. details of archiving or snapshots using management policies
- G06F16/125—File system administration, e.g. details of archiving or snapshots using management policies characterised by the use of retention policies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0644—Management of space entities, e.g. partitions, extents, pools
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0652—Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0653—Monitoring storage devices or systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0647—Migration mechanisms
Definitions
- the present disclosure relates to the field of data storage technology, and in particular to a data storage management method, device and computer-readable storage medium.
- Data can be stored in a big data cluster or data warehouse, and users can perform operations such as querying, adding, and deleting.
- staff may screen the data according to actual needs and delete some unnecessary data storage units.
- the data may be deleted due to space problems, causing development difficulties for the developer.
- the data storage units keep growing over time, and data queries become slow.
- a technical problem to be solved by the present disclosure is: how to effectively manage data storage and improve the efficiency of data storage management.
- a data storage management method including: obtaining the number of accesses of a data storage unit in a big data cluster within a preset time interval corresponding to the data storage unit, where the data storage unit includes a data table or a partition file; determining, according to the number of accesses, whether the data storage unit is a storage unit to be destroyed; and, in the case that the data storage unit is a storage unit to be destroyed, issuing a reminder to be destroyed.
- obtaining the number of accesses of a data storage unit in a big data cluster within a preset time interval corresponding to the data storage unit includes: periodically querying the last access time of the data storage unit; updating the record of the number of accesses according to changes in the last access time of the data storage unit; and determining the number of accesses from that record.
- obtaining the number of accesses of the data storage unit in the big data cluster within the preset time interval corresponding to the data storage unit includes: in the case that the data storage unit is a data table, obtaining, according to the time stage the data table is in, the number of accesses of the data table within the preset time interval corresponding to that time stage; wherein the data table is set with a life cycle, and the life cycle is divided into multiple time stages.
- obtaining the number of accesses of the data storage unit in the big data cluster within the preset time interval corresponding to the data storage unit includes: in the case that the data storage unit is a partition file, periodically obtaining the number of accesses of the partition file within the preset time interval corresponding to the current period.
- determining whether the data storage unit belongs to the storage unit to be destroyed according to the number of accesses includes: in the case that the data storage unit is a data table, obtaining the first access count threshold corresponding to the time stage the data table is in; when the number of accesses is less than or equal to the first access count threshold, determining that the data table is a storage unit to be destroyed; wherein the first access count thresholds corresponding to different time stages are the same or different.
- determining whether the data storage unit belongs to the storage unit to be destroyed according to the number of accesses includes: in the case that the data storage unit is a partition file, when the number of accesses is less than or equal to the second access count threshold, determining that the partition file is a storage unit to be destroyed.
- the method further includes: assigning the data table to different storage devices for storage according to the number of accesses of the data table in the preset time interval corresponding to the time period.
- allocating the data storage unit to different storage devices for storage includes: comparing the number of accesses of the data table in the preset time interval corresponding to the time period with multiple access count thresholds corresponding to the time period to determine the data popularity level of the data table; and, according to the data popularity level of the data table, assigning the data table to a storage device whose performance corresponds to that data popularity level.
- the method further includes: in the case that the data storage unit does not belong to the storage unit to be destroyed, and the life cycle of the data storage unit reaches the end time point, reconfiguring the life cycle of the data storage unit.
- a data storage management device including: an access count acquisition module, configured to acquire the number of accesses of a data storage unit in a big data cluster within a preset time interval corresponding to the data storage unit, where the data storage unit includes a data table or a partition file; a status determination module, configured to determine, according to the number of accesses, whether the data storage unit is a storage unit to be destroyed; and a reminder module, configured to send a reminder to be destroyed in the case that the data storage unit is a storage unit to be destroyed.
- the access count acquisition module is used to periodically query the last access time of the data storage unit, update the record of the number of accesses according to changes in the last access time of the data storage unit, and determine the number of accesses from that record.
- the access count acquisition module is used to obtain, when the data storage unit is a data table, the number of accesses of the data table in the preset time interval corresponding to the time stage the data table is in; wherein the data table is set with a life cycle, and the life cycle is divided into multiple time stages.
- the access count acquisition module is configured to periodically obtain the number of accesses of the partition file in the preset time interval corresponding to the current period when the data storage unit is a partition file.
- the status determination module is configured to obtain, when the data storage unit is a data table, the first access count threshold corresponding to the time period the data table is in; and, when the number of accesses is less than or equal to the first access count threshold, to determine that the data table is a storage unit to be destroyed; wherein the first access count thresholds corresponding to different time periods are the same or different.
- the status determination module is configured to determine that the partition file belongs to the storage unit to be destroyed when the number of accesses is less than or equal to the second access number threshold when the data storage unit is a partition file.
- the device further includes: a storage migration module, configured to allocate the data table to different storage devices for storage according to the number of accesses of the data table in the preset time interval corresponding to the time period.
- the storage migration module is configured to compare the number of accesses of the data table in the preset time interval corresponding to the time period with multiple access count thresholds corresponding to the time period to determine the data popularity level of the data table, and, according to the data popularity level of the data table, to allocate the data table to a storage device whose performance corresponds to that data popularity level.
- the device further includes: a reconfiguration module, configured to reconfigure the life cycle of the data storage unit when the data storage unit is not a storage unit to be destroyed and the life cycle of the data storage unit reaches its end time point.
- a data storage management device including: a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the data storage management method of any of the foregoing embodiments.
- a computer-readable storage medium on which a computer program is stored, where the program is executed by a processor to implement the data storage management method of any of the foregoing embodiments.
- the present disclosure automatically detects the number of accesses of the data storage unit in the big data cluster within the preset time interval corresponding to the data storage unit, and judges from the number of accesses whether the unit can be destroyed. If the data storage unit can be destroyed, a reminder to be destroyed is issued.
- the method of the present disclosure can automatically and effectively manage data storage in a big data cluster, destroy data storage units that are no longer needed in time, release storage space, and improve data query efficiency and data storage management efficiency.
- FIG. 1 shows a schematic flowchart of a data storage management method according to some embodiments of the present disclosure.
- FIG. 2 shows a schematic flowchart of data storage management methods according to other embodiments of the present disclosure.
- Fig. 3 shows a schematic structural diagram of a data storage management apparatus according to some embodiments of the present disclosure.
- FIG. 4 shows a schematic structural diagram of a data storage management device according to other embodiments of the present disclosure.
- FIG. 5 shows a schematic structural diagram of a data storage management device according to still other embodiments of the present disclosure.
- Fig. 6 shows a schematic structural diagram of a data storage management device according to still other embodiments of the present disclosure.
- the present disclosure provides a data storage management method, which is described below with reference to FIG. 1.
- FIG. 1 is a flowchart of some embodiments of the data storage management method of the present disclosure. As shown in Fig. 1, the method of this embodiment includes: steps S102 to S106.
- in step S102, the number of accesses of the data storage unit in the big data cluster within the preset time interval corresponding to the data storage unit is obtained.
- the last access time of the data storage unit is periodically queried; the record of the number of accesses is updated according to changes in the last access time of the data storage unit; and the number of accesses within the corresponding preset time interval is determined from that record.
- For example, when the data is stored in a Hive data warehouse, the Hive metastore database (for example, MySQL) can be polled once every preset period; if LAST_ACCESS_TIME (the last access time) in the TBLS table has changed, the number of accesses is updated according to the change in LAST_ACCESS_TIME. The number of accesses can be cleared at regular intervals and counted again.
- the last access time can also be viewed with the Hive command desc extended table_name, and the granularity of this check, that is, the length of the preset period, can be set according to the dfs.namenode.accesstime.precision parameter.
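- As a minimal sketch of the polling idea above, the following Python snippet counts an access whenever a polled last access time differs from the previously seen value. The class name, the unit name "orders", and the use of integer timestamps are illustrative assumptions; the actual metastore query is left out. Note that several accesses between two polls are counted only once under this scheme.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AccessCounter:
    """Counts accesses by watching for changes in a unit's last access time."""
    last_seen: Dict[str, int] = field(default_factory=dict)
    counts: Dict[str, int] = field(default_factory=dict)

    def observe(self, unit: str, last_access_time: int) -> None:
        """Record one polling result (assumed here to be epoch seconds)."""
        previous = self.last_seen.get(unit)
        if previous is not None and last_access_time != previous:
            # The timestamp moved since the previous poll: count one access.
            self.counts[unit] = self.counts.get(unit, 0) + 1
        self.last_seen[unit] = last_access_time
        self.counts.setdefault(unit, 0)

    def reset(self) -> None:
        """Clear the counts, e.g. at the end of a preset time interval."""
        self.counts = {unit: 0 for unit in self.counts}


counter = AccessCounter()
for polled_time in (1000, 1000, 1600, 2200):  # four successive polls
    counter.observe("orders", polled_time)
```

After the four polls, the timestamp has changed twice, so the recorded count for "orders" is 2; calling reset() starts a new counting interval.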
- when the data storage unit in the big data cluster or data warehouse is a data table, a life cycle can be set for the data table.
- the life cycle settings can be set when the data table is created.
- the life cycle of a data table can indicate the time period from creation to destruction of the data table.
- the length of the life cycle can be set according to actual business requirements.
- the life cycles of different data tables can differ; for example, the life cycle may be set to 3 years, 2 months, and so on.
- the life cycle can be divided into multiple time phases, for example, the life cycle is divided into a first time phase, a second time phase, a third time phase, a fourth time phase, and so on.
- the time lengths of the preset time intervals corresponding to different time periods may be the same or different, and the preset time intervals corresponding to different time periods may not overlap.
- Different time stages can be set according to the change stage of the data popularity of the data table.
- the data popularity of the data table can be divided into different data popularity levels.
- the data popularity levels include: online hot data, online warm data, offline cold data, data to be destroyed, and so on.
- the data popularity of the preset number of historical data tables corresponding to the business can be counted.
- the first time stage, corresponding to online hot data, can be set (for example, from the creation of the data table to the end of the third month); the second time stage, corresponding to online warm data, can be set (for example, from the fourth month to the end of the first year); and so on: the third time stage corresponds to offline cold data (for example, from the start of the second year after the creation of the data table to the end of the second year), and the fourth time stage corresponds to data to be destroyed (for example, from the start of the third year after the creation of the data table to the end of the life cycle).
- the time period can be set based on the statistical results of most data tables of the same business type.
- when the data storage unit is a data table, the number of accesses of the data table within the preset time interval corresponding to its current time period is obtained. For example, if the data table is currently in the third time stage, the number of accesses from the start of the second year after the data table's creation to the end of the second year is obtained.
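- The stage lookup described above can be sketched as follows. The boundaries (three months, one year, two years) follow the example stages in the text, but the constant and function names, and the choice to measure age in whole months, are assumptions made for illustration.

```python
from datetime import date

# Assumed boundaries in whole months since creation:
# stage 1 ends after 3 months, stage 2 after 12, stage 3 after 24;
# anything later is stage 4 (until the end of the life cycle).
STAGE_END_MONTHS = [(1, 3), (2, 12), (3, 24)]


def months_since(created: date, today: date) -> int:
    """Whole months elapsed between the creation date and today."""
    months = (today.year - created.year) * 12 + (today.month - created.month)
    if today.day < created.day:
        months -= 1  # the current month is not yet complete
    return max(months, 0)


def time_stage(created: date, today: date) -> int:
    """Return the 1-based time stage the data table is currently in."""
    elapsed = months_since(created, today)
    for stage, end_month in STAGE_END_MONTHS:
        if elapsed < end_month:
            return stage
    return 4
```

Once the stage is known, the counting interval and the thresholds for that stage can be looked up by stage number.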
- the data table may be divided into partition files for storage, that is, the data storage unit may be a partition file.
- in the case that the data storage unit is a partition file, the number of accesses of the partition file within the preset time interval corresponding to the current cycle is obtained periodically, where the current cycle serves as the time phase of the data storage unit. For example, the number of accesses of the partition file in the last two years, including the current cycle, is obtained once a month.
- Partition files can also follow the time-phase division used for data tables, with different time phases corresponding to different preset time intervals for counting accesses.
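- A trailing-window count of this kind can be sketched as below. It assumes that individual access timestamps are available for the partition file, which the text does not state; the function name and the 730-day window are illustrative only.

```python
from datetime import datetime, timedelta
from typing import Iterable


def accesses_in_window(access_times: Iterable[datetime], now: datetime,
                       window: timedelta) -> int:
    """Count access events that fall inside the interval (now - window, now]."""
    start = now - window
    return sum(1 for t in access_times if start < t <= now)


# Hypothetical access log for one partition file.
events = [datetime(2021, 6, 1), datetime(2022, 6, 1), datetime(2023, 12, 31)]
recent = accesses_in_window(events, datetime(2024, 1, 1), timedelta(days=730))
```

A scheduler would call this once per cycle (for example, monthly) with the current time, so the window slides forward with each run.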
- in step S104, it is determined, according to the number of accesses, whether the data storage unit is a storage unit to be destroyed.
- the first access count threshold corresponding to the time period the data table is in is obtained; in the case where the number of accesses is less than or equal to the first access count threshold, the data table is determined to be a storage unit to be destroyed.
- the first access count thresholds corresponding to different time periods may be the same or different.
- for example, the first access count threshold corresponding to the first time period is set to a negative number, so that the data table is never determined to be a storage unit to be destroyed during the first time period, while the first access count threshold for the third or fourth time period is set to 0, so that a data table whose number of accesses in the third or fourth time period is 0 is determined to be a storage unit to be destroyed.
- in the case where the data storage unit is a partition file, when the number of accesses is less than or equal to the second access count threshold, the partition file is determined to be a storage unit to be destroyed. For example, if the number of accesses of the partition file in two years is equal to 0, the partition file is a storage unit to be destroyed.
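- The threshold checks for data tables and partition files can be sketched as follows. The negative threshold for the first stage and the zero thresholds for the third and fourth stages follow the examples in the text; the value for the second stage and all names are assumptions for illustration.

```python
from typing import Mapping

# Per-stage first access count thresholds for data tables.
# Stage 2 is not specified in the text; -1 is an assumed placeholder.
TABLE_THRESHOLDS: Mapping[int, int] = {1: -1, 2: -1, 3: 0, 4: 0}

# Second access count threshold for partition files (0, as in the example).
PARTITION_THRESHOLD = 0


def table_to_be_destroyed(stage: int, access_count: int,
                          thresholds: Mapping[int, int] = TABLE_THRESHOLDS) -> bool:
    """A table is flagged when its count is at or below its stage's threshold."""
    return access_count <= thresholds[stage]


def partition_to_be_destroyed(access_count: int,
                              threshold: int = PARTITION_THRESHOLD) -> bool:
    """A partition file is flagged when its count is at or below the threshold."""
    return access_count <= threshold
```

Because an access count is never negative, a negative threshold guarantees that a table in that stage is never flagged, which is the effect the text describes for the first time period.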
- different statistics stages for counting accesses may also be set for the data storage unit, with different stages corresponding to different preset time intervals. For example, the fifth time period is the last three months from the current time, the sixth time period is the time period in the last year except the first time period, the seventh time period is the last two years, and so on.
- the preset time intervals corresponding to different visit count statistics stages can overlap.
- Different access count statistics stages may correspond to different third access count thresholds. In the case where the access count is less than or equal to the third access count threshold, it is determined that the data table belongs to the storage unit to be destroyed.
- when the number of accesses of the data storage unit in the preset time interval corresponding to the data storage unit is less than or equal to the fourth access count threshold, the data storage unit is determined to be a storage unit to be destroyed. That is, no matter which stage the data storage unit is in, as long as its number of accesses within that preset time interval (for example, the last two years) is less than or equal to the fourth access count threshold, it is determined to be a storage unit to be destroyed.
- in step S106, if the data storage unit is a storage unit to be destroyed, a reminder to be destroyed is issued.
- a reminder to be destroyed is issued so that the staff can know that the storage unit is to be destroyed, and the staff can reconfirm whether to destroy the storage unit to be destroyed according to business needs.
- the storage unit to be destroyed can be displayed on the operation interface, or a reminder to be destroyed can be sent to the staff in the form of mail, short message, etc.
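- in the case that the data storage unit is not a storage unit to be destroyed and the life cycle of the data storage unit reaches its end time point, the life cycle of the data storage unit is reconfigured. If the staff decides not to destroy a storage unit to be destroyed, the life cycle of that data storage unit is reconfigured.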
- the reconfigured life cycle can differ from the original life cycle. For example, each time a life cycle ends, the next life cycle of the data storage unit may be set to the previous length shortened by a certain step.
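- The step-wise shortening can be sketched as a single function. The lower bound of 30 days is an assumption added here so that the length cannot reach zero; the text does not specify one, and the function name is hypothetical.

```python
def next_life_cycle_days(previous_days: int, step_days: int,
                         minimum_days: int = 30) -> int:
    """Shorten the previous life cycle by one step, never below the minimum."""
    return max(previous_days - step_days, minimum_days)


# A three-year life cycle shortened by one year per renewal.
lengths = [1095]
for _ in range(4):
    lengths.append(next_life_cycle_days(lengths[-1], 365))
```

With these inputs the successive lengths are 1095, 730, 365, and then the assumed minimum of 30 days from the third renewal onward.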
- the number of accesses of the data storage unit in the big data cluster within the preset time interval corresponding to the data storage unit is automatically detected, and whether the data storage unit can be destroyed is determined according to the number of accesses. If the data storage unit can be destroyed, a reminder to be destroyed is issued.
- the method of the foregoing embodiment can automatically and effectively manage data storage in a big data cluster, destroy data storage units that are no longer needed in a timely manner, release storage space, and improve data query efficiency and data storage management efficiency.
- data storage units with different data popularity can be stored separately, which is described below with reference to FIG. 2.
- Fig. 2 is a flowchart of other embodiments of the data storage management method of the present disclosure. As shown in Fig. 2, the method of this embodiment includes: steps S202 to S204.
- in step S202, according to the time period in which the data table is located, the number of accesses of the data table in the preset time interval corresponding to the time period is obtained.
- in step S204, the data table is allocated to different storage devices for storage according to the number of accesses of the data table in the preset time interval corresponding to the time period.
- the data table is set with a life cycle and divided into different time stages.
- the number of accesses in the preset time interval corresponding to the time period of the data table is compared with multiple access count thresholds corresponding to the time period to determine the data popularity level of the data table; according to the data popularity level of the data table, the data table is assigned to a storage device whose performance corresponds to that data popularity level.
- One time period corresponds to multiple access count thresholds, and different access count thresholds correspond to different data popularity levels.
- for example, the thresholds corresponding to the first time period include 100, 50, 30, and so on. If the number of accesses of the data table in the first time period exceeds 100, the data popularity level of the data table is determined to be the highest level, which is online hot data. If the number of accesses of the data table in the first time period is less than 100 and greater than 50, the data popularity level of the data table is determined to be the second popularity level, which is online warm data, and so on.
- the threshold setting of the number of visits in different time periods can be different.
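- The comparison against multiple thresholds can be sketched as below, using the example thresholds 100, 50, and 30. The text does not say which level a count exactly equal to a threshold falls into; this sketch assumes a strict "greater than" comparison, and the level labels and function name are illustrative.

```python
from typing import Sequence

# Levels ordered from most to least popular; there is one more level than
# there are thresholds.
LEVELS = ["online hot", "online warm", "offline cold", "to be destroyed"]


def popularity_level(access_count: int,
                     thresholds: Sequence[int] = (100, 50, 30)) -> str:
    """Return the first level whose threshold the access count exceeds."""
    for level, threshold in zip(LEVELS, thresholds):
        if access_count > threshold:
            return level
    # At or below every threshold: the lowest level.
    return LEVELS[len(thresholds)]
```

Passing a different thresholds tuple per time period gives each period its own cut-offs while the level labels stay the same.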
- different time periods for counting accesses may also be set for the data storage unit. The number of accesses is counted separately for each time period, the number of accesses in the preset time interval corresponding to each time period is compared with the multiple access count thresholds corresponding to that time period, and the data popularity level of the data storage unit is determined; according to the data popularity level of the data storage unit, the data storage unit is allocated to a storage device whose performance corresponds to that data popularity level.
- the data storage unit is allocated to storage devices with different performance according to its data popularity level. As the data moves between different storage devices over its life cycle, data storage units with high access counts are served by high-performance storage devices, which improves the efficiency of data access and query and enhances the user experience.
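- One way to picture the allocation step is a lookup from popularity level to storage tier, followed by a list of moves for tables that are on the wrong tier. The tier names ("ssd-pool", "hdd-pool", "archive-pool") and the table names are hypothetical; the text only requires that device performance correspond to the popularity level.

```python
from typing import Dict, List, Tuple

# Assumed mapping from popularity level to a storage tier of matching performance.
TIER_FOR_LEVEL = {
    "online hot": "ssd-pool",
    "online warm": "hdd-pool",
    "offline cold": "archive-pool",
}


def plan_migrations(current_tier: Dict[str, str],
                    level: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (table, from_tier, to_tier) for each table that should move."""
    moves = []
    for table, tier in sorted(current_tier.items()):
        target = TIER_FOR_LEVEL.get(level.get(table, ""))
        if target is not None and target != tier:
            moves.append((table, tier, target))
    return moves


moves = plan_migrations(
    {"orders": "hdd-pool", "logs": "ssd-pool", "users": "ssd-pool"},
    {"orders": "online hot", "logs": "offline cold", "users": "online hot"},
)
```

Tables whose level has no tier in the mapping (for example, data to be destroyed) are left where they are in this sketch, since they are handled by the reminder flow rather than by migration.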
- the present disclosure also provides a data storage management device, which is described below with reference to FIG. 3.
- FIG. 3 is a structural diagram of some embodiments of the data storage management device of the present disclosure. As shown in FIG. 3, the device 30 of this embodiment includes: an access count acquisition module 302, a status determination module 304, and a reminder module 306.
- the access count acquisition module 302 is configured to obtain the number of accesses of the data storage unit in the big data cluster within a preset time interval corresponding to the data storage unit; the data storage unit includes: a data table or a partition file.
- the access count acquisition module 302 periodically queries the last access time of the data storage unit, updates the record of the number of accesses according to changes in the last access time of the data storage unit, and determines the number of accesses from that record.
- the access count acquisition module 302 is configured to obtain, when the data storage unit is a data table, the number of accesses of the data table in the preset time interval corresponding to the time stage the data table is in; wherein the data table is set with a life cycle, and the life cycle is divided into multiple time stages.
- the access count acquisition module 302 is configured to periodically obtain the number of accesses of the partition file in the preset time interval corresponding to the current cycle when the data storage unit is a partition file.
- the status determination module 304 is configured to determine whether the data storage unit belongs to the storage unit to be destroyed according to the number of accesses.
- the status determination module 304 is configured to obtain, when the data storage unit is a data table, the first access count threshold corresponding to the time period the data table is in; and, when the number of accesses is less than or equal to the first access count threshold, to determine that the data table is a storage unit to be destroyed; wherein the first access count thresholds corresponding to different time periods are the same or different.
- the status determination module 304 is configured to determine, when the data storage unit is a partition file and the number of accesses is less than or equal to the second access count threshold, that the partition file is a storage unit to be destroyed.
- the reminder module 306 is used to send a reminder to be destroyed when the data storage unit belongs to the storage unit to be destroyed.
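- The cooperation of the three modules of device 30 can be sketched as a small class that wires three callables together. The class, its field names, and the example inputs are illustrative assumptions and not the structure of an actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class StorageManagementDevice:
    """Wires together stand-ins for modules 302, 304, and 306."""
    get_access_count: Callable[[str], int]           # access count acquisition
    is_to_be_destroyed: Callable[[str, int], bool]   # status determination
    remind: Callable[[str], None]                    # reminder

    def run(self, units: Iterable[str]) -> List[str]:
        """Check every unit and send a reminder for each flagged one."""
        flagged = []
        for unit in units:
            count = self.get_access_count(unit)
            if self.is_to_be_destroyed(unit, count):
                self.remind(unit)
                flagged.append(unit)
        return flagged


counts = {"table_a": 0, "table_b": 5}  # hypothetical access counts
sent: List[str] = []                   # stands in for mail or short messages
device = StorageManagementDevice(counts.__getitem__,
                                 lambda unit, n: n <= 0,
                                 sent.append)
flagged = device.run(["table_a", "table_b"])
```

Here only "table_a" has an access count at or below the threshold of 0, so it is the only unit for which a reminder is recorded.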
- FIG. 4 is a structural diagram of some embodiments of the data storage management device of the present disclosure.
- The device 40 of this embodiment includes an access count obtaining module 402, a status determination module 404, and a reminder module 406, whose functions are the same as or similar to those of the access count obtaining module 302, the status determination module 304, and the reminder module 306, respectively.
- The device 40 further includes a storage migration module 408.
- The storage migration module 408 is configured to allocate the data table to different storage devices for storage according to the access count of the data table within the preset time interval corresponding to its time stage.
- In some embodiments, the storage migration module 408 is configured to compare the access count of the data table within the preset time interval corresponding to its time stage with the multiple access count thresholds corresponding to that time stage to determine the data popularity level of the data table, and to allocate the data table to a storage device whose performance corresponds to that data popularity level.
- In some embodiments, the device 40 further includes a reconfiguration module 410, configured to reconfigure the life cycle of the data storage unit when the data storage unit does not belong to the storage units to be destroyed and its life cycle has reached its end time point.
- The data storage management devices in the embodiments of the present disclosure can each be implemented by various computing devices or computer systems, as described below with reference to FIG. 5 and FIG. 6.
- FIG. 5 is a structural diagram of some embodiments of the data storage management device of the present disclosure.
- The device 50 of this embodiment includes a memory 510 and a processor 520 coupled to the memory 510; the processor 520 is configured to execute the data storage management method of any of the embodiments of the present disclosure based on instructions stored in the memory 510.
- The memory 510 may include, for example, a system memory, a fixed non-volatile storage medium, and the like.
- The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
- FIG. 6 is a structural diagram of other embodiments of the data storage management device of the present disclosure.
- The device 60 of this embodiment includes a memory 610 and a processor 620, which are similar to the memory 510 and the processor 520, respectively. It may also include an input/output interface 630, a network interface 640, a storage interface 650, and so on. The interfaces 630, 640, and 650, the memory 610, and the processor 620 may be connected via a bus 660, for example.
- The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen.
- The network interface 640 provides a connection interface for various networked devices; for example, it can connect to a database server or a cloud storage server.
- The storage interface 650 provides a connection interface for external storage devices such as SD cards and USB flash drives.
- Those skilled in the art should understand that the embodiments of the present disclosure may be provided as methods, systems, or computer program products. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
- These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
- These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps is executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
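The present disclosure relates to a data storage management method, a data storage management device, and a computer-readable storage medium, and pertains to the field of data storage technology. The method of the present disclosure includes: obtaining the access count of a data storage unit in a big data cluster within a corresponding preset time interval; determining, according to that access count, whether the data storage unit belongs to the storage units to be destroyed; and issuing a to-be-destroyed reminder when the data storage unit belongs to the storage units to be destroyed.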
Description
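Cross-reference to related applications
This application is based on and claims priority to CN application No. 201910197865.4, filed on March 15, 2019, the disclosure of which is hereby incorporated into this application in its entirety.
The present disclosure relates to the field of data storage technology, and in particular to a data storage management method, a data storage management device, and a computer-readable storage medium.
With the development of Internet technology, data is growing explosively. Data can be stored in a big data cluster or a data warehouse, where users can query, add, and delete it.
At present, as the volume of data keeps increasing, staff may screen the data according to actual needs and delete data storage units that are no longer needed.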
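Summary of the invention
The inventors found that having staff screen data manually is time-consuming and laborious and may lead to accidental deletion. When a data developer wants to build on some historical data, that data may already have been deleted to free up space, which makes development difficult. If, on the other hand, storage is not managed and data is never deleted, data storage units grow ever larger over time and queries become slow.
One technical problem to be solved by the present disclosure is how to manage data storage effectively and improve the efficiency of data storage management.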
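According to some embodiments of the present disclosure, a data storage management method is provided, including: obtaining the access count of a data storage unit in a big data cluster within a preset time interval corresponding to the data storage unit, where the data storage unit includes a data table or a partition file; determining, according to the access count, whether the data storage unit belongs to the storage units to be destroyed; and issuing a to-be-destroyed reminder when the data storage unit belongs to the storage units to be destroyed.
In some embodiments, obtaining the access count of the data storage unit in the big data cluster within the preset time interval corresponding to the data storage unit includes: periodically querying the last access time of the data storage unit; updating the access count record according to changes in the last access time of the data storage unit; and determining the access count from the access count record.
In some embodiments, obtaining the access count of the data storage unit in the big data cluster within the preset time interval corresponding to the data storage unit includes: when the data storage unit is a data table, obtaining, according to the time stage the data table is currently in, the access count of the data table within the preset time interval corresponding to that time stage; the data table is assigned a life cycle, and the life cycle is divided into multiple time stages.
In some embodiments, obtaining the access count of the data storage unit in the big data cluster within the preset time interval corresponding to the data storage unit includes: when the data storage unit is a partition file, periodically obtaining the access count of the partition file within the preset time interval corresponding to the current cycle.
In some embodiments, determining, according to the access count, whether the data storage unit belongs to the storage units to be destroyed includes: when the data storage unit is a data table, obtaining the first access count threshold corresponding to the time stage the data table is currently in; and when the access count is less than or equal to the first access count threshold, determining that the data table belongs to the storage units to be destroyed; the first access count thresholds corresponding to different time stages may be the same or different.
In some embodiments, determining, according to the access count, whether the data storage unit belongs to the storage units to be destroyed includes: when the data storage unit is a partition file and the access count is less than or equal to the second access count threshold, determining that the partition file belongs to the storage units to be destroyed.
In some embodiments, the method further includes: allocating the data table to different storage devices for storage according to the access count of the data table within the preset time interval corresponding to its time stage.
In some embodiments, allocating the data storage unit to different storage devices for storage includes: comparing the access count of the data table within the preset time interval corresponding to its time stage with the multiple access count thresholds corresponding to that time stage to determine the data popularity level of the data table; and allocating the data table, according to its data popularity level, to a storage device whose performance corresponds to that level.
In some embodiments, the method further includes: reconfiguring the life cycle of the data storage unit when the data storage unit does not belong to the storage units to be destroyed and its life cycle has reached its end time point.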
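According to other embodiments of the present disclosure, a data storage management device is provided, including: an access count obtaining module, configured to obtain the access count of a data storage unit in a big data cluster within a preset time interval corresponding to the data storage unit, where the data storage unit includes a data table or a partition file; a status determination module, configured to determine, according to the access count, whether the data storage unit belongs to the storage units to be destroyed; and a reminder module, configured to issue a to-be-destroyed reminder when the data storage unit belongs to the storage units to be destroyed.
In some embodiments, the access count obtaining module is configured to periodically query the last access time of the data storage unit, update the access count record according to changes in the last access time of the data storage unit, and determine the access count from the access count record.
In some embodiments, the access count obtaining module is configured to, when the data storage unit is a data table, obtain, according to the time stage the data table is currently in, the access count of the data table within the preset time interval corresponding to that time stage; the data table is assigned a life cycle, and the life cycle is divided into multiple time stages.
In some embodiments, the access count obtaining module is configured to, when the data storage unit is a partition file, periodically obtain the access count of the partition file within the preset time interval corresponding to the current cycle.
In some embodiments, the status determination module is configured to, when the data storage unit is a data table, obtain the first access count threshold corresponding to the time stage the data table is currently in, and determine that the data table belongs to the storage units to be destroyed when the access count is less than or equal to the first access count threshold; the first access count thresholds corresponding to different time stages may be the same or different.
In some embodiments, the status determination module is configured to, when the data storage unit is a partition file, determine that the partition file belongs to the storage units to be destroyed when the access count is less than or equal to the second access count threshold.
In some embodiments, the device further includes a storage migration module, configured to allocate the data table to different storage devices for storage according to the access count of the data table within the preset time interval corresponding to its time stage.
In some embodiments, the storage migration module is configured to compare the access count of the data table within the preset time interval corresponding to its time stage with the multiple access count thresholds corresponding to that time stage to determine the data popularity level of the data table, and to allocate the data table, according to its data popularity level, to a storage device whose performance corresponds to that level.
In some embodiments, the device further includes a reconfiguration module, configured to reconfigure the life cycle of the data storage unit when the data storage unit does not belong to the storage units to be destroyed and its life cycle has reached its end time point.
According to still other embodiments of the present disclosure, a data storage management device is provided, including: a memory; and a processor coupled to the memory, the processor being configured to execute the data storage management method of any of the foregoing embodiments based on instructions stored in the memory.
According to yet other embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, where the program, when executed by a processor, implements the data storage management method of any of the foregoing embodiments.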
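The present disclosure automatically detects the access count of a data storage unit in a big data cluster within the preset time interval corresponding to that data storage unit, judges from the access count whether the unit can be destroyed, and issues a to-be-destroyed reminder if it can. The method of the present disclosure can automatically and effectively manage data storage in a big data cluster, destroy data storage units that are no longer needed in a timely manner, free up storage space, and improve both data query efficiency and data storage management efficiency.
Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
The drawings described here provide a further understanding of the present disclosure and form part of this application. The illustrative embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not unduly limit it. In the drawings:
FIG. 1 is a schematic flowchart of a data storage management method according to some embodiments of the present disclosure.
FIG. 2 is a schematic flowchart of a data storage management method according to other embodiments of the present disclosure.
FIG. 3 is a schematic structural diagram of a data storage management device according to some embodiments of the present disclosure.
FIG. 4 is a schematic structural diagram of a data storage management device according to other embodiments of the present disclosure.
FIG. 5 is a schematic structural diagram of a data storage management device according to still other embodiments of the present disclosure.
FIG. 6 is a schematic structural diagram of a data storage management device according to yet other embodiments of the present disclosure.
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the present disclosure. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present disclosure or its application or use. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.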
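The present disclosure provides a data storage management method, described below with reference to FIG. 1.
FIG. 1 is a flowchart of some embodiments of the data storage management method of the present disclosure. As shown in FIG. 1, the method of this embodiment includes steps S102 to S106.
In step S102, the access count of a data storage unit in the big data cluster within the preset time interval corresponding to that data storage unit is obtained.
In some embodiments, the last access time of the data storage unit is queried periodically; the access count record is updated according to changes in the last access time; and the access count within the corresponding preset time interval is determined from that record. For example, when the data is stored in a Hive data warehouse, the database of the Hive metastore (for example, a MySQL database) can be polled once every preset period; if LAST_ACCESS_TIME (the last access time) in the TBLS table has changed, an access is recorded on the basis of that change. The access count can be reset to zero at regular intervals and counted afresh. The value can be viewed with the Hive command desc extended table_name, and the granularity of the check, that is, the length of the preset period, can be set through the dfs.namenode.accesstime.precision parameter.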
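The polling scheme above can be illustrated with a minimal sketch. It assumes the caller has already read the current LAST_ACCESS_TIME value of each table into a plain mapping; the `AccessCounter` class and its method names are hypothetical and not part of the disclosure. Because only a change in the timestamp is observable, several accesses between two polls are recorded as one.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping


@dataclass
class AccessCounter:
    """Counts accesses by watching a last-access timestamp change between polls."""

    last_seen: Dict[str, int] = field(default_factory=dict)
    counts: Dict[str, int] = field(default_factory=dict)

    def poll(self, snapshot: Mapping[str, int]) -> None:
        """Process one polling round.

        `snapshot` maps a table name to its current last-access timestamp
        (for example, the value read from LAST_ACCESS_TIME).
        """
        for name, timestamp in snapshot.items():
            previous = self.last_seen.get(name)
            if previous is not None and timestamp != previous:
                # The timestamp moved since the previous poll: record one access.
                self.counts[name] = self.counts.get(name, 0) + 1
            else:
                # First sighting or no change: make sure the table has an entry.
                self.counts.setdefault(name, 0)
            self.last_seen[name] = timestamp

    def reset(self) -> None:
        """Clear all counts at the end of a counting interval."""
        self.counts = {name: 0 for name in self.counts}
```

The first poll only establishes a baseline, so a table is never credited with an access that happened before monitoring began.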
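For example, the time stage the data storage unit is currently in is determined, and the access count within the preset time interval corresponding to that time stage is obtained. In some embodiments, the data storage units in the big data cluster or data warehouse are data tables, and a life cycle can be set for each data table. The life cycle can be set when the data table is created. The life cycle of a data table represents the period from its creation to its destruction; its length can be set according to actual business needs and can differ between data tables, for example 3 years or 2 months.
The life cycle can be divided into multiple time stages, for example a first, second, third, and fourth time stage. The preset time intervals corresponding to different time stages may have the same or different lengths and may be non-overlapping. The time stages can be set according to the stages through which the data popularity of a data table changes. For example, data popularity can be divided into data popularity levels such as online hot data, online warm data, offline cold data, and data to be destroyed.
The data popularity of a preset number of historical data tables of a given business can be analyzed statistically. The first time stage can be set according to the length and range of time for which these historical data tables remained online hot data (for example, from creation of the data table to the third month); the second time stage can be set according to the length and range of time for which they remained online warm data (for example, from the fourth month to one year after creation); and so on: the third time stage corresponds to offline cold data (for example, from the start to the end of the second year after creation), and the fourth time stage corresponds to data to be destroyed (for example, from the start of the third year after creation to the end of the life cycle). Not every data table passes through every data popularity level; the time stages can be set based on statistics over the majority of data tables of the same business type.
In some embodiments, when the data storage unit is a data table, the access count of the data table within the preset time interval corresponding to its current time stage is obtained. For example, if the data table is currently in the third time stage, the access count for the period from the start to the end of the second year after its creation is obtained.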
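As a rough illustration of how a table's current time stage might be looked up, the sketch below hard-codes the example boundaries given above (months 0 to 3, 3 to 12, 12 to 24, and 24 to the end of the life cycle). The 36-month life cycle and the names `STAGE_END_MONTHS`, `months_between`, and `current_stage` are assumptions made for this example only.

```python
from datetime import date
from typing import Optional

# (stage number, exclusive upper bound in whole months since creation).
# Boundaries follow the example in the text; a 3-year life cycle is assumed.
STAGE_END_MONTHS = [(1, 3), (2, 12), (3, 24), (4, 36)]


def months_between(start: date, end: date) -> int:
    """Whole months elapsed from `start` to `end`."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the current month is not yet complete
    return months


def current_stage(created: date, today: date) -> Optional[int]:
    """Return the time stage (1 to 4), or None once the life cycle has ended."""
    elapsed = months_between(created, today)
    for stage, end_month in STAGE_END_MONTHS:
        if elapsed < end_month:
            return stage
    return None
```

In practice the boundaries would be derived per business type from the statistics on historical tables rather than fixed in code.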
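In some embodiments, a data table can be divided into partition files for storage, so the data storage unit can be a partition file. When the data storage unit is a partition file, the access count of the partition file within the preset time interval corresponding to the current cycle is obtained periodically; the current cycle is the time stage the data storage unit is in. For example, once a month, the access count of the partition file over the most recent two years, including the current cycle, is obtained. Access counts for partition files can also be collected in the same way as for data tables, by dividing time into stages with each stage corresponding to a different preset time interval.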
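In step S104, whether the data storage unit belongs to the storage units to be destroyed is determined according to the access count.
Different data storage units use different time intervals for counting accesses, and the way of determining whether a unit belongs to the storage units to be destroyed can differ as well. In some embodiments, when the data storage unit is a data table, the first access count threshold corresponding to the time stage the data table is currently in is obtained; when the access count is less than or equal to the first access count threshold, the data table is determined to belong to the storage units to be destroyed. The first access count thresholds corresponding to different time stages may be the same or different. For example, the first access count threshold for the first time stage can be set to a negative number, so that a data table in the first time stage is never determined to be a storage unit to be destroyed; in the third or fourth time stage, the first access count threshold can be set to 0, so that a data table whose access count in that stage is 0 is determined to be a storage unit to be destroyed.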
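The per-stage threshold rule can be sketched in a few lines. The values for stages 1, 3, and 4 follow the example above; the value for stage 2 is not given in the text and is an assumption, as are the names `FIRST_THRESHOLDS` and `table_to_be_destroyed`.

```python
from typing import Mapping

# Hypothetical first access count thresholds per time stage.
# A negative threshold can never be met by a count >= 0, so tables in
# that stage are never flagged. The stage 2 value is an assumption.
FIRST_THRESHOLDS: Mapping[int, int] = {1: -1, 2: -1, 3: 0, 4: 0}


def table_to_be_destroyed(
    stage: int,
    access_count: int,
    thresholds: Mapping[int, int] = FIRST_THRESHOLDS,
) -> bool:
    """Return True when the count is at or below the threshold of the stage."""
    return access_count <= thresholds[stage]
```

Using a negative threshold instead of a special "exempt" flag keeps the comparison uniform across all stages.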
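In some embodiments, when the data storage unit is a partition file and the access count is less than or equal to the second access count threshold, the partition file is determined to belong to the storage units to be destroyed. For example, if the access count of a partition file over two years equals 0, the partition file belongs to the storage units to be destroyed.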
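For partition files, the check reduces to counting accesses in a trailing window and comparing against a single threshold. The sketch below assumes that individual access timestamps are available and approximates two years as 730 days; both are assumptions for illustration, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable

TWO_YEARS = timedelta(days=730)  # approximation of "the most recent two years"


def accesses_in_window(
    access_times: Iterable[datetime],
    now: datetime,
    window: timedelta = TWO_YEARS,
) -> int:
    """Count the accesses that fall inside the trailing window ending at `now`."""
    start = now - window
    return sum(1 for t in access_times if start <= t <= now)


def partition_to_be_destroyed(
    access_times: Iterable[datetime],
    now: datetime,
    second_threshold: int = 0,
    window: timedelta = TWO_YEARS,
) -> bool:
    """Flag the partition file when its windowed count is at or below the threshold."""
    return accesses_in_window(access_times, now, window) <= second_threshold
```

Running this check once per cycle (for example, monthly) gives the periodic evaluation described above.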
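In some embodiments, besides the above scheme of assigning a life cycle divided into time stages, a data storage unit can be given different time stages for counting accesses, each corresponding to a different preset time interval. For example, the fifth time stage is the most recent three months counting back from the current time, the sixth time stage is the most recent year excluding the first time stage, and the seventh time stage is the most recent two years. The preset time intervals of different counting stages may overlap. Different counting stages can correspond to different third access count thresholds; when the access count is less than or equal to the third access count threshold, the data table is determined to belong to the storage units to be destroyed.
In some embodiments, when the access count of a data storage unit within the preset time interval corresponding to that data storage unit is less than or equal to a fourth access count threshold, the data storage unit is determined to belong to the storage units to be destroyed. Regardless of which stage the data storage unit is in, as long as its access count within the corresponding preset time interval (for example, the most recent two years) is less than or equal to the fourth access count threshold, it is determined to belong to the storage units to be destroyed.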
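In step S106, a to-be-destroyed reminder is issued when the data storage unit belongs to the storage units to be destroyed.
The reminder lets staff know that there are storage units to be destroyed, and staff can then confirm, according to business needs, whether to actually destroy them. The storage units to be destroyed can be displayed in an operation interface, or the reminder can be sent to staff by email, text message, or similar means.
In some embodiments, the life cycle of the data storage unit is reconfigured when the data storage unit does not belong to the storage units to be destroyed and its life cycle has reached its end time point. If staff decide not to destroy a storage unit to be destroyed, the life cycle of that data storage unit is likewise reconfigured. The reconfigured life cycle can differ from the original one. For example, each time a life cycle of such a data storage unit ends, its length can be shortened by a certain step to give the length of the next life cycle.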
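The step-wise shortening of the life cycle can be expressed as a one-line rule. The lower bound `minimum_days` is an assumption added here so that the length never reaches zero; the text does not specify one, and the function name is hypothetical.

```python
def next_life_cycle_days(
    previous_days: int,
    step_days: int,
    minimum_days: int = 30,
) -> int:
    """Shorten the previous life cycle by a fixed step, never below a floor."""
    return max(previous_days - step_days, minimum_days)
```

For instance, a three-year life cycle (1095 days) shortened by a one-year step (365 days) yields a 730-day life cycle for the next round.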
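The method of the above embodiments automatically detects the access count of a data storage unit in the big data cluster within the preset time interval corresponding to that data storage unit, judges from the access count whether the unit can be destroyed, and issues a to-be-destroyed reminder if it can. The method can automatically and effectively manage data storage in a big data cluster, destroy data storage units that are no longer needed in a timely manner, free up storage space, and improve both data query efficiency and data storage management efficiency.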
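To further improve data query efficiency, data storage units of different data popularity can be stored separately, as described below with reference to FIG. 2.
FIG. 2 is a flowchart of other embodiments of the data storage management method of the present disclosure. As shown in FIG. 2, the method of this embodiment includes steps S202 to S204.
In step S202, the access count of the data table within the preset time interval corresponding to its current time stage is obtained.
In step S204, the data table is allocated to different storage devices for storage according to its access count within the preset time interval corresponding to that time stage.
For example, as in the foregoing embodiments, the data table is assigned a life cycle that is divided into different time stages. In some embodiments, the access count of the data table within the preset time interval corresponding to its time stage is compared with the multiple access count thresholds corresponding to that time stage to determine the data popularity level of the data table; the data table is then allocated, according to its data popularity level, to a storage device whose performance corresponds to that level.
One time stage corresponds to multiple access count thresholds, and different access count thresholds correspond to different data popularity levels. For example, the thresholds for the first time stage include 100, 50, 30, and so on. If the access count of the data table in the first time stage exceeds 100, its data popularity level is determined to be the highest level, online hot data. If the access count in the first time stage is less than 100 and greater than 50, its data popularity level is determined to be the second level, online warm data, and so on. The access count thresholds can be set differently for different time stages.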
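The threshold comparison can be sketched as follows, using the example thresholds 100, 50, and 30 for the first time stage. The text does not say how a count exactly equal to a threshold is classified; this sketch assumes it falls into the lower level. The names `POPULARITY_LEVELS` and `popularity_level` are hypothetical.

```python
from typing import Sequence

# Levels from highest to lowest popularity, as listed in the text.
POPULARITY_LEVELS = [
    "online hot data",
    "online warm data",
    "offline cold data",
    "data to be destroyed",
]


def popularity_level(
    access_count: int,
    thresholds: Sequence[int] = (100, 50, 30),
) -> str:
    """Map an access count to a level using descending thresholds.

    A count must strictly exceed a threshold to reach that level
    (an assumption; the text leaves the boundary case open).
    """
    for level, threshold in zip(POPULARITY_LEVELS, thresholds):
        if access_count > threshold:
            return level
    return POPULARITY_LEVELS[-1]
```

A different tuple of thresholds can be passed for each time stage, matching the statement that thresholds may differ between stages.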
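The higher the data popularity level of a data table, the better the performance (for example, higher processing efficiency and larger storage space) of the storage device (for example, a rack) to which it is allocated. Allocating data tables to different storage devices according to how they are accessed improves query and access efficiency for frequently accessed data tables and improves the user experience.
For example, as in the foregoing embodiments, a data storage unit can also be given different time stages for counting accesses. The access counts for the different time stages are collected separately, and the access count of the data storage unit within the preset time interval corresponding to each time stage is compared with the multiple access count thresholds corresponding to that time stage to determine the data popularity level of the data storage unit; the data storage unit is then allocated, according to its data popularity level, to a storage device whose performance corresponds to that level.
In the method of the above embodiments, data storage units are allocated to storage devices of different performance according to their data popularity levels, and data moves between storage devices over the course of its life cycle. Frequently accessed data storage units are thus handled by high-performance storage devices, which improves data access and query efficiency and improves the user experience.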
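One way to picture the migration step is as a plan that lists only the tables whose current device no longer matches their data popularity level. The device names (`fast-rack`, `standard-rack`, `archive-rack`) and the function `plan_migrations` are purely hypothetical; the disclosure only says that higher levels map to better-performing devices.

```python
from typing import Dict, Mapping

# Hypothetical mapping from data popularity level to a storage device class.
DEVICE_FOR_LEVEL: Mapping[str, str] = {
    "online hot data": "fast-rack",
    "online warm data": "standard-rack",
    "offline cold data": "archive-rack",
}


def plan_migrations(
    current_device: Mapping[str, str],
    levels: Mapping[str, str],
) -> Dict[str, str]:
    """Return {table: target device} for tables that need to move.

    Tables whose level has no device mapping (for example, data to be
    destroyed) are left where they are.
    """
    moves: Dict[str, str] = {}
    for table, level in levels.items():
        target = DEVICE_FOR_LEVEL.get(level)
        if target is not None and current_device.get(table) != target:
            moves[table] = target
    return moves
```

Computing a plan first, rather than moving data immediately, makes it easy to review or throttle migrations before they run.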
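The present disclosure also provides a data storage management device, described below with reference to FIG. 3.
FIG. 3 is a structural diagram of some embodiments of the data storage management device of the present disclosure. As shown in FIG. 3, the device 30 of this embodiment includes an access count obtaining module 302, a status determination module 304, and a reminder module 306.
The access count obtaining module 302 is configured to obtain the access count of a data storage unit in the big data cluster within the preset time interval corresponding to that data storage unit; the data storage unit includes a data table or a partition file.
In some embodiments, the access count obtaining module 302 is configured to periodically query the last access time of the data storage unit, update the access count record according to changes in the last access time of the data storage unit, and determine the access count from the access count record.
In some embodiments, the access count obtaining module 302 is configured to, when the data storage unit is a data table, obtain, according to the time stage the data table is currently in, the access count of the data table within the preset time interval corresponding to that time stage; the data table is assigned a life cycle, and the life cycle is divided into multiple time stages.
In some embodiments, the access count obtaining module 302 is configured to, when the data storage unit is a partition file, periodically obtain the access count of the partition file within the preset time interval corresponding to the current cycle.
The status determination module 304 is configured to determine, according to the access count, whether the data storage unit belongs to the storage units to be destroyed.
In some embodiments, the status determination module 304 is configured to, when the data storage unit is a data table, obtain the first access count threshold corresponding to the time stage the data table is currently in, and determine that the data table belongs to the storage units to be destroyed when the access count is less than or equal to the first access count threshold; the first access count thresholds corresponding to different time stages may be the same or different.
In some embodiments, the status determination module 304 is configured to, when the data storage unit is a partition file, determine that the partition file belongs to the storage units to be destroyed when the access count is less than or equal to the second access count threshold.
The reminder module 306 is configured to issue a to-be-destroyed reminder when the data storage unit belongs to the storage units to be destroyed.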
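How the three modules of device 30 cooperate can be sketched as a small pipeline in which each module is passed in as a callable. This is only an illustration of the data flow between modules 302, 304, and 306; the function `run_device` and its parameter names are hypothetical.

```python
from typing import Callable, Iterable, List


def run_device(
    units: Iterable[str],
    get_access_count: Callable[[str], int],          # role of module 302
    is_to_be_destroyed: Callable[[str, int], bool],  # role of module 304
    send_reminder: Callable[[str], None],            # role of module 306
) -> List[str]:
    """Run one management pass and return the units that triggered a reminder."""
    flagged: List[str] = []
    for unit in units:
        count = get_access_count(unit)
        if is_to_be_destroyed(unit, count):
            send_reminder(unit)
            flagged.append(unit)
    return flagged
```

Note that the pass only issues reminders; the decision to actually destroy a unit stays with staff, as described for step S106.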
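Other embodiments of the data storage management device of the present disclosure are described below with reference to FIG. 4.
FIG. 4 is a structural diagram of some embodiments of the data storage management device of the present disclosure. As shown in FIG. 4, the device 40 of this embodiment includes an access count obtaining module 402, a status determination module 404, and a reminder module 406, whose functions are the same as or similar to those of the access count obtaining module 302, the status determination module 304, and the reminder module 306, respectively; the device 40 further includes a storage migration module 408.
The storage migration module 408 is configured to allocate the data table to different storage devices for storage according to the access count of the data table within the preset time interval corresponding to its time stage.
In some embodiments, the storage migration module 408 is configured to compare the access count of the data table within the preset time interval corresponding to its time stage with the multiple access count thresholds corresponding to that time stage to determine the data popularity level of the data table, and to allocate the data table, according to its data popularity level, to a storage device whose performance corresponds to that level.
In some embodiments, the device 40 further includes a reconfiguration module 410, configured to reconfigure the life cycle of the data storage unit when the data storage unit does not belong to the storage units to be destroyed and its life cycle has reached its end time point.
The data storage management devices in the embodiments of the present disclosure can each be implemented by various computing devices or computer systems, as described below with reference to FIG. 5 and FIG. 6.
FIG. 5 is a structural diagram of some embodiments of the data storage management device of the present disclosure. As shown in FIG. 5, the device 50 of this embodiment includes a memory 510 and a processor 520 coupled to the memory 510; the processor 520 is configured to execute the data storage management method of any of the embodiments of the present disclosure based on instructions stored in the memory 510.
The memory 510 may include, for example, a system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
FIG. 6 is a structural diagram of other embodiments of the data storage management device of the present disclosure. As shown in FIG. 6, the device 60 of this embodiment includes a memory 610 and a processor 620, which are similar to the memory 510 and the processor 520, respectively. It may also include an input/output interface 630, a network interface 640, a storage interface 650, and so on. The interfaces 630, 640, and 650, the memory 610, and the processor 620 may be connected via a bus 660, for example. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 640 provides a connection interface for various networked devices; for example, it can connect to a database server or a cloud storage server. The storage interface 650 provides a connection interface for external storage devices such as SD cards and USB flash drives.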
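Those skilled in the art should understand that the embodiments of the present disclosure may be provided as methods, systems, or computer program products. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, equipment (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps is executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
The above are only preferred embodiments of the present disclosure and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.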
Claims (20)
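- A data storage management method, comprising: obtaining the access count of a data storage unit in a big data cluster within a preset time interval corresponding to the data storage unit, wherein the data storage unit includes a data table or a partition file; determining, according to the access count, whether the data storage unit belongs to the storage units to be destroyed; and issuing a to-be-destroyed reminder when the data storage unit belongs to the storage units to be destroyed.
- The data storage management method according to claim 1, wherein obtaining the access count of the data storage unit in the big data cluster within the preset time interval corresponding to the data storage unit comprises: periodically querying the last access time of the data storage unit; updating an access count record according to changes in the last access time of the data storage unit; and determining the access count from the access count record.
- The data storage management method according to claim 1, wherein obtaining the access count of the data storage unit in the big data cluster within the preset time interval corresponding to the data storage unit comprises: when the data storage unit is a data table, obtaining, according to the time stage the data table is in, the access count of the data table within the preset time interval corresponding to the time stage; wherein the data table is assigned a life cycle, and the life cycle is divided into multiple time stages.
- The data storage management method according to claim 1, wherein obtaining the access count of the data storage unit in the big data cluster within the preset time interval corresponding to the data storage unit comprises: when the data storage unit is a partition file, periodically obtaining the access count of the partition file within the preset time interval corresponding to the current cycle.
- The data storage management method according to claim 3, wherein determining, according to the access count, whether the data storage unit belongs to the storage units to be destroyed comprises: when the data storage unit is a data table, obtaining, according to the time stage the data table is in, the first access count threshold corresponding to the time stage; and when the access count is less than or equal to the first access count threshold, determining that the data table belongs to the storage units to be destroyed; wherein the first access count thresholds corresponding to different time stages are the same or different.
- The data storage management method according to claim 4, wherein determining, according to the access count, whether the data storage unit belongs to the storage units to be destroyed comprises: when the data storage unit is a partition file and the access count is less than or equal to the second access count threshold, determining that the partition file belongs to the storage units to be destroyed.
- The data storage management method according to claim 3, further comprising: allocating the data table to different storage devices for storage according to the access count of the data table within the preset time interval corresponding to the time stage.
- The data storage management method according to claim 7, wherein allocating the data storage unit to different storage devices for storage comprises: comparing the access count of the data table within the preset time interval corresponding to the time stage with multiple access count thresholds corresponding to the time stage to determine the data popularity level of the data table; and allocating the data table, according to its data popularity level, to a storage device whose performance corresponds to the data popularity level.
- The data storage management method according to any one of claims 1 to 8, further comprising: reconfiguring the life cycle of the data storage unit when the data storage unit does not belong to the storage units to be destroyed and the life cycle of the data storage unit has reached its end time point.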
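- A data storage management device, comprising: an access count obtaining module, configured to obtain the access count of a data storage unit in a big data cluster within a preset time interval corresponding to the data storage unit, wherein the data storage unit includes a data table or a partition file; a status determination module, configured to determine, according to the access count, whether the data storage unit belongs to the storage units to be destroyed; and a reminder module, configured to issue a to-be-destroyed reminder when the data storage unit belongs to the storage units to be destroyed.
- The data storage management device according to claim 10, wherein the access count obtaining module is configured to periodically query the last access time of the data storage unit, update an access count record according to changes in the last access time of the data storage unit, and determine the access count from the access count record.
- The data storage management device according to claim 10, wherein the access count obtaining module is configured to, when the data storage unit is a data table, obtain, according to the time stage the data table is in, the access count of the data table within the preset time interval corresponding to the time stage; wherein the data table is assigned a life cycle, and the life cycle is divided into multiple time stages.
- The data storage management device according to claim 10, wherein the access count obtaining module is configured to, when the data storage unit is a partition file, periodically obtain the access count of the partition file within the preset time interval corresponding to the current cycle.
- The data storage management device according to claim 12, wherein the status determination module is configured to, when the data storage unit is a data table, obtain, according to the time stage the data table is in, the first access count threshold corresponding to the time stage, and determine that the data table belongs to the storage units to be destroyed when the access count is less than or equal to the first access count threshold; wherein the first access count thresholds corresponding to different time stages are the same or different.
- The data storage management device according to claim 13, wherein the status determination module is configured to, when the data storage unit is a partition file, determine that the partition file belongs to the storage units to be destroyed when the access count is less than or equal to the second access count threshold.
- The data storage management device according to claim 12, further comprising: a storage migration module, configured to allocate the data table to different storage devices for storage according to the access count of the data table within the preset time interval corresponding to the time stage.
- The data storage management device according to claim 16, wherein the storage migration module is configured to compare the access count of the data table within the preset time interval corresponding to the time stage with multiple access count thresholds corresponding to the time stage to determine the data popularity level of the data table, and to allocate the data table, according to its data popularity level, to a storage device whose performance corresponds to the data popularity level.
- The data storage management device according to any one of claims 10 to 17, further comprising: a reconfiguration module, configured to reconfigure the life cycle of the data storage unit when the data storage unit does not belong to the storage units to be destroyed and the life cycle of the data storage unit has reached its end time point.
- A data storage management device, comprising: a memory; and a processor coupled to the memory, the processor being configured to execute the data storage management method according to any one of claims 1 to 9 based on instructions stored in the memory.
- A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.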
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/432,815 US11822788B2 (en) | 2019-03-15 | 2020-02-03 | Data storage management method and apparatus, and computer-readable storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910197865.4A CN111694505B (zh) | 2019-03-15 | 2019-03-15 | Data storage management method and apparatus, and computer-readable storage medium |
CN201910197865.4 | 2019-03-15 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020186931A1 (zh) | 2020-09-24 |
Family
ID=72475911
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/074191 WO2020186931A1 (zh) | Data storage management method and apparatus, and computer-readable storage medium | 2019-03-15 | 2020-02-03 |
Country Status (3)
Country | Link |
---|---|
US (1) | US11822788B2 (zh) |
CN (1) | CN111694505B (zh) |
WO (1) | WO2020186931A1 (zh) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
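CN112965665B (zh) * | 2021-03-09 | 2023-09-26 | 华泰证券股份有限公司 | GP database data storage method based on SAS and SSD |
CN113391764A (zh) * | 2021-06-09 | 2021-09-14 | 北京沃东天骏信息技术有限公司 | Information processing method and device, and storage medium |
CN114722243A (zh) * | 2022-04-15 | 2022-07-08 | 北京科杰科技有限公司 | Data table sorting method and device, electronic equipment, and storage medium |
CN115113827B (zh) * | 2022-08-24 | 2023-02-03 | 苏州浪潮智能科技有限公司 | Data destruction method and device, computer equipment, and storage medium |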
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100161780A1 (en) * | 2008-12-22 | 2010-06-24 | Electronics And Telecommunications Research Institute | Hot data management method based on hit counter |
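CN101777028A (zh) * | 2010-01-21 | 2010-07-14 | 北京北大众志微系统科技有限责任公司 | Method and device for implementing a hybrid secondary storage system |
CN103607312A (zh) * | 2013-11-29 | 2014-02-26 | 广州华多网络科技有限公司 | Data request processing method and system for a server system |
CN104715020A (zh) * | 2015-02-13 | 2015-06-17 | 腾讯科技(深圳)有限公司 | Method for deleting cached data, and server |
CN107168654A (zh) * | 2017-05-26 | 2017-09-15 | 华中科技大学 | Heterogeneous memory allocation method and system based on data object popularity |
CN107346321A (zh) * | 2016-05-06 | 2017-11-14 | 阿里巴巴集团控股有限公司 | Data warehouse management method and device |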
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2013277351A1 (en) * | 2012-06-18 | 2015-01-22 | Actifio, Inc. | Enhanced data management virtualization system |
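CN104778420B (zh) * | 2015-04-24 | 2018-07-03 | 广东电网有限责任公司信息中心 | Method for establishing a security management view of the full life cycle of unstructured data |
JP2019053415A (ja) * | 2017-09-13 | 2019-04-04 | 東芝メモリ株式会社 | Memory system, and control method and program therefor |
TW201926081A (zh) * | 2017-11-27 | 2019-07-01 | 財團法人資訊工業策進會 | Data transfer system and method |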
-
2019
- 2019-03-15 CN CN201910197865.4A patent/CN111694505B/zh active Active
-
2020
- 2020-02-03 US US17/432,815 patent/US11822788B2/en active Active
- 2020-02-03 WO PCT/CN2020/074191 patent/WO2020186931A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
US11822788B2 (en) | 2023-11-21 |
CN111694505A (zh) | 2020-09-22 |
US20220121372A1 (en) | 2022-04-21 |
CN111694505B (zh) | 2021-11-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20774511 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 020222) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20774511 Country of ref document: EP Kind code of ref document: A1 |