CN117112492B - Self-adaptive space-time big data distributed storage method and intelligent file system - Google Patents

Info

Publication number
CN117112492B
Authority
CN
China
Prior art keywords
space
time
data
virtual pool
access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311079515.0A
Other languages
Chinese (zh)
Other versions
CN117112492A (en)
Inventor
蒋湘涛
李建军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University of Forestry and Technology
Original Assignee
Central South University of Forestry and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University of Forestry and Technology
Priority to CN202311079515.0A
Publication of CN117112492A
Application granted
Publication of CN117112492B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/113 Details of archiving
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a self-adaptive space-time big data distributed storage method and an intelligent file system. The method comprises the following steps: establishing a space-time big data management framework; acquiring time characteristics and space characteristics of space-time data to be stored; placing the space-time data, according to its time characteristics, in the target virtual pool of the corresponding time window; dividing the space-time data into a plurality of sub-data according to its space characteristics and the number of storage nodes managed by the target virtual pool; distributing the sub-data to different storage nodes managed by the target virtual pool for distributed storage; learning access characteristics of a user to the stored space-time data; and adaptively evolving the space-time big data management framework according to the access characteristics, so as to improve the response efficiency of the evolved framework to user access requests. The invention thereby allows the storage rules for space-time data to evolve adaptively, improving the efficiency with which users access that data.

Description

Self-adaptive space-time big data distributed storage method and intelligent file system
Technical Field
The invention relates to the technical field of space-time big data storage, in particular to an intelligent file system and a self-adaptive space-time big data distributed storage method.
Background
Space-time data is data with both time and space dimensions. It carries three kinds of information (time, space, and thematic attributes) and is characterized by multiple sources, massive volume, and rapid updating. Storing space-time data therefore has the general characteristics of big data storage as well as attributes unique to space-time data.
Space-time data is managed and stored through a file system. Existing file systems store received space-time data in the storage space without differentiation, so the burden of storing and managing the data grows as more of it accumulates. As a result, when a user side needs to access data in such a file system, access efficiency is low.
Disclosure of Invention
The invention mainly aims to provide an intelligent file system and a self-adaptive space-time big data distributed storage method, and aims to solve the problem that in the existing file system, the efficiency of accessing space-time data by a user is low.
In order to achieve the above purpose, the invention provides a self-adaptive space-time big data distributed storage method, which comprises the following steps:
establishing a space-time big data management framework, wherein the framework generates virtual pools one after another as time passes, according to a set time window granularity, associates each virtual pool with a different time window, and has each virtual pool manage a plurality of storage nodes;
acquiring time characteristics and space characteristics of space-time data to be stored;
placing the space-time data, according to its time characteristics, in the target virtual pool of the corresponding time window;
dividing the space-time data into a plurality of sub-data according to its space characteristics and the number of storage nodes managed by the target virtual pool;
distributing the sub-data obtained by segmentation to different storage nodes managed by the target virtual pool for distributed storage;
learning access characteristics of a user to the stored space-time data;
and adaptively evolving the space-time big data management framework according to the access characteristics, so as to improve the response efficiency of the evolved framework to user access requests.
Preferably, the step of learning the access characteristic of the user to the stored spatiotemporal data comprises:
acquiring an access request sent by a user, wherein the access request comprises time characteristics, space characteristics and data types of space-time data requested to be accessed;
and determining the access frequency of the user to the spatiotemporal data managed in each virtual pool according to the access request.
Preferably, the step of adaptively evolving the space-time big data management framework according to the access characteristics includes at least one of the following steps:
evolving the time window granularity in the space-time big data management framework according to the access frequency of the space-time data managed in the virtual pools;
evolving the number of storage nodes managed by each virtual pool in the space-time big data management framework according to the access frequency of the space-time data managed in the virtual pools;
and evolving the correspondence between each virtual pool and the storage nodes in the space-time big data management framework according to the access frequency of the space-time data managed in the virtual pools.
Preferably, the step of evolving the time window granularity in the space-time big data management framework according to the access frequency of the space-time data managed in the virtual pools includes:
judging whether to trigger a virtual pool fusion instruction or a virtual pool segmentation instruction according to the access frequency of the space-time data managed in the virtual pool;
if the virtual pool fusion instruction is triggered, judging whether the adjacent virtual pools of the virtual pool triggering the virtual pool fusion instruction also trigger the virtual pool fusion instruction;
when the access frequency of the adjacent virtual pools also triggers the virtual pool fusion instruction, executing virtual pool fusion operation so that the time window scale of the fused virtual pools is the sum of the time window scales of all continuous virtual pools before fusion;
if the virtual pool segmentation instruction is triggered, segmenting the virtual pool that triggered the instruction into a plurality of virtual pools, and segmenting its time window into the same number of consecutive time windows, so that each virtual pool after segmentation corresponds to a different time window after segmentation.
Preferably, the step of evolving the number of storage nodes managed by each virtual pool in the space-time big data management framework according to the access frequency of the space-time data managed in the virtual pools includes:
judging whether to trigger a storage node quantity adjustment instruction according to the access frequency of the space-time data managed in the virtual pool;
if so, adjusting the number of storage nodes managed by the corresponding virtual pool according to the storage node quantity adjustment instruction.
Preferably, the step of evolving the correspondence between each virtual pool and the storage nodes in the space-time big data management framework according to the access frequency of the space-time data managed in the virtual pools includes:
judging whether to trigger a storage node adjustment instruction according to the access frequency of the space-time data managed in the virtual pool;
if so, adjusting, according to the performance parameters of each storage node, which storage nodes are managed by the virtual pool that triggered the storage node adjustment instruction.
Preferably, the method further comprises:
determining target space-time data which is requested to be accessed by a user according to an access request sent by the user;
determining an access virtual pool for managing the space-time data requested to be accessed according to the time characteristics and the space characteristics of the target space-time data;
acquiring each access storage node managed by the access virtual pool;
extracting, in parallel from all the access storage nodes, the target sub-data into which the target space-time data was segmented;
splicing all the target sub-data into the complete space-time data according to the segmentation mode;
and returning the complete space-time data to the user as feedback data.
Preferably, the method further comprises:
acquiring designated sub-data that the user has selected from the target space-time data;
locating, among the access storage nodes managed by the access virtual pool, the access storage node that stores the designated sub-data;
and returning the designated sub-data read from that access storage node to the user as feedback data.
Preferably, the method further comprises:
acquiring performance parameters of each idle storage node, and ranking the idle storage nodes by those parameters;
and assigning the storage nodes with the highest performance parameters in the ranking to the new virtual pool.
In order to achieve the above purpose, the invention also provides an intelligent file system that applies the self-adaptive space-time big data distributed storage method. The intelligent file system comprises a data management interface module and a storage module in communication connection; the data management interface module provides users with an access interface to the space-time data, through which it receives their access requests; the storage module comprises a plurality of storage nodes;
the data management interface module is used for: establishing a space-time big data management framework; acquiring time characteristics and space characteristics of space-time data to be stored; placing the space-time data, according to its time characteristics, in the target virtual pool of the corresponding time window; dividing the space-time data into a plurality of sub-data according to its space characteristics and the number of storage nodes managed by the target virtual pool; distributing the sub-data obtained by segmentation to different storage nodes managed by the target virtual pool; learning access characteristics of a user to the stored space-time data; and adaptively evolving the space-time big data management framework according to the access characteristics, so as to improve the response efficiency of the evolved framework to user access requests;
the storage module is used for: storing each piece of sub-data in a distributed manner on its allocated storage node.
In this technical scheme, a space-time big data management framework is established. As time passes, virtual pools with different time windows are generated in the framework one after another; the time window granularity of each virtual pool is initially determined in a preset manner and can evolve. Each virtual pool manages a plurality of storage nodes to realize distributed storage, and both the number of storage nodes and the correspondence between virtual pools and storage nodes can evolve. Space-time data to be stored is placed in the target virtual pool that matches its time characteristics; the target virtual pool segments the data according to its space characteristics, and each piece of sub-data is stored on one storage node of the target virtual pool, so that the data is distributed across the storage nodes managed by that pool. Under this framework, each virtual pool manages space-time data with different time characteristics. When a user accesses space-time data, the target virtual pool is determined from the time characteristics of the requested data, and the storage nodes managed by that pool return their sub-data in parallel; the sub-data are then spliced into the complete space-time data requested.
It is easy to understand that, under distributed storage, the more storage nodes a piece of space-time data is distributed across, the more storage nodes return sub-data in parallel when a user accesses it, and thus the faster the complete space-time data is assembled. The invention learns the access characteristics of users to the stored space-time data and adaptively evolves the space-time big data management framework accordingly, so that the evolved framework matches those access characteristics and responds to user access requests more efficiently.
Drawings
FIG. 1 is a schematic flow chart of a method for adaptively storing space-time big data in a distributed manner in a first embodiment of the present invention;
FIG. 2 is a schematic diagram of a framework of the intelligent file system of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements serve only to facilitate the description of the invention and have no specific meaning in themselves. Thus, "module", "component", and "unit" may be used interchangeably.
Referring to FIGS. 1 and 2, a first embodiment of the invention provides a self-adaptive space-time big data distributed storage method, which includes the following steps:
step S10, establishing a space-time big data management framework, wherein the framework generates virtual pools one after another as time passes, according to a set time window granularity, associates each virtual pool with a different time window, and has each virtual pool manage a plurality of storage nodes;
step S20, acquiring time characteristics and space characteristics of space-time data to be stored;
step S30, placing the space-time data, according to its time characteristics, in the target virtual pool of the corresponding time window;
step S40, dividing the space-time data into a plurality of sub-data according to its space characteristics and the number of storage nodes managed by the target virtual pool;
step S50, distributing the sub-data obtained by segmentation to different storage nodes managed by the target virtual pool for distributed storage;
step S60, learning access characteristics of a user to the stored space-time data;
step S70, adaptively evolving the space-time big data management framework according to the access characteristics, so as to improve the response efficiency of the evolved framework to user access requests.
Specifically, the time window granularity is the span of the time window of each virtual pool, for example 10 minutes, 30 minutes, 1 hour, or 2 hours. The time window granularities of different virtual pools may or may not be equal. In this method, a time window granularity is preset, and virtual pools are generated one after another as time passes, each forming a time window of the set granularity. For example, when the preset granularity is 30 minutes, the default time windows of the successively generated virtual pools are 00:01-00:30, 00:31-01:00, 01:01-01:30, and 01:31-02:00 on day xx of month xx of year xxxx.
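The mapping from an acquisition time to a time window can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `window_for`, the `epoch` reference point, and the half-open boundary convention [start, end) are all assumptions (the text above writes the windows as 00:01-00:30, 00:31-01:00, and so on).

```python
from datetime import datetime, timedelta

def window_for(ts: datetime, granularity: timedelta, epoch: datetime):
    """Return (index, start, end) of the time window that contains ts.

    Assumption: windows are half-open intervals [start, end) counted
    from a hypothetical reference point `epoch`.
    """
    index = (ts - epoch) // granularity  # timedelta // timedelta yields an int
    start = epoch + index * granularity
    return index, start, start + granularity

epoch = datetime(2023, 1, 1)
idx, start, end = window_for(
    datetime(2023, 1, 1, 0, 8), timedelta(minutes=30), epoch
)
print(idx, start.time(), end.time())
```

With a 30-minute granularity, data acquired at 00:08 falls in the first window of the day, which matches the example given for the 00:01-00:30 virtual pool.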
The space-time data in the invention may be, for example, satellite remote sensing data, whose time characteristic is the acquisition time of the data and whose space characteristic is the geographic area the data covers.
Each virtual pool is correspondingly managed with a plurality of storage nodes so as to carry out distributed storage through all the managed storage nodes.
For example, space-time data collected at 00:08 on day xx of month xx of year xxxx is placed in the virtual pool whose time window is 00:01-00:30 on that day. If that virtual pool manages 8 storage nodes, the data is segmented into 8 sub-data according to its space characteristics and each storage node stores 1 of them, so that the complete space-time data is distributed across the 8 storage nodes. When a user needs to access the data, the 8 storage nodes read out their sub-data in parallel, so the user obtains the query result faster. It is easy to understand that the more storage nodes a virtual pool manages, the smaller the sub-data each storage node has to read out, and thus the faster the response to the user's access request; conversely, the fewer storage nodes a virtual pool manages, the larger the sub-data each node has to read, and the slower the response.
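The 8-node example can be sketched in a few lines. This is a hedged illustration only: the patent does not specify how the spatial segmentation is done, so cutting a 2-D grid into contiguous row bands is an assumption, and `split_rows`, `store`, and `read` are hypothetical names. Plain Python lists stand in for storage nodes.

```python
from concurrent.futures import ThreadPoolExecutor

def split_rows(grid, n_nodes):
    """Cut a 2-D grid (list of rows) into n_nodes contiguous row bands."""
    base, extra = divmod(len(grid), n_nodes)
    bands, start = [], 0
    for i in range(n_nodes):
        size = base + (1 if i < extra else 0)  # spread any remainder rows
        bands.append(grid[start:start + size])
        start += size
    return bands

def store(grid, nodes):
    """Give each node (a list standing in for a storage node) one band."""
    for node, band in zip(nodes, split_rows(grid, len(nodes))):
        node.append(band)

def read(nodes, item=0):
    """Fetch one band from every node in parallel, then splice in order."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        bands = list(pool.map(lambda node: node[item], nodes))
    return [row for band in bands for row in band]

grid = [[r * 4 + c for c in range(4)] for r in range(16)]
nodes = [[] for _ in range(8)]
store(grid, nodes)
restored = read(nodes)
print(restored == grid)
```

`Executor.map` returns results in the order of its inputs, so the bands are spliced back in segmentation order regardless of which node answers first.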
In a second embodiment of the self-adaptive space-time big data distributed storage method, based on the first embodiment, step S60 includes:
step S61, an access request sent by a user is obtained, wherein the access request comprises time characteristics, space characteristics and data types of space-time data requested to be accessed;
step S62, determining, according to the access request, the frequency of access of the user to the spatiotemporal data managed in each virtual pool (which may be regarded as the frequency of access of the user to each virtual pool).
In this embodiment, the time characteristics, space characteristics, and data type of the space-time data requested by the user are determined from the access request, so that the virtual pool managing that data can be determined and the exact space-time data located within it.
Analyzing the access requests sent by users reveals their access preferences for the stored space-time data, for example which time windows hold frequently accessed data and which hold rarely accessed data.
Thus, by learning the access characteristics of users to the stored space-time data, the frequency with which users access each virtual pool can be determined.
Specifically, in this embodiment, access frequency intervals may be defined, and the access frequency of each virtual pool is analyzed to determine which interval it falls into. The invention can then further evolve the space-time big data management framework according to the access frequency of each virtual pool.
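Counting accesses per virtual pool and placing each count in an interval can be sketched as below. The interval bounds, the interval names, and the function names are hypothetical; the patent only states that access frequency intervals may be defined.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical interval bounds: accesses per observation period.
INTERVAL_BOUNDS = [1, 10, 100]
INTERVAL_NAMES = ["cold", "cool", "warm", "hot"]

def frequency_interval(count):
    """Map an access count to the name of the interval it falls into."""
    return INTERVAL_NAMES[bisect_right(INTERVAL_BOUNDS, count)]

def classify(requested_pool_ids, all_pool_ids):
    """Count accesses per virtual pool and classify every known pool."""
    counts = Counter(requested_pool_ids)  # pools never requested count as 0
    return {pool: frequency_interval(counts[pool]) for pool in all_pool_ids}

result = classify(["p1"] * 12 + ["p2"], ["p1", "p2", "p3"])
print(result)
```

Each access request is assumed to have already been resolved to the identifier of the virtual pool that manages the requested data.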
In a third embodiment of the self-adaptive space-time big data distributed storage method, based on the second embodiment, step S70 includes at least one of the following steps:
step S71, according to the access frequency of the space-time data managed in the virtual pool, evolving the granularity of a time window in the space-time big data management frame;
step S72, according to the access frequency of the space-time data managed in the virtual pools, evolving the number of storage nodes correspondingly managed by each virtual pool in the space-time big data management frame;
step S73, according to the access frequency of the space-time data managed in the virtual pools, the corresponding relation between each virtual pool and the storage node in the space-time big data management frame is evolved.
According to the invention, the space-time big data management framework can be adaptively evolved according to the access characteristics. The most important technical improvement of the invention is therefore that, once established, the framework keeps evolving according to the access characteristics, so that it adapts specifically to the way users access the data.
The evolution process requires no manual participation: the system computes the access characteristics itself and adaptively evolves the framework according to the result. The self-adaptive space-time big data distributed storage method is therefore a storage method that runs and adjusts continuously to adapt to users' access patterns.
The user does not participate in the evolution process and does not need to know the logic by which the system evolves. The only thing the user perceives is that, under this method, accessing space-time data is faster than ordinary access and the system responds faster and faster.
In this embodiment, each of steps S71 to S73 can realize evolution of the system. In a specific application, one or more of steps S71 to S73 may be selected to evolve the space-time big data management framework ("more" here means two or more). Each step contributes to the evolution on its own, and the greatest improvement in access rate is obtained when steps S71 to S73 are applied together.
Specifically, the evolution of the space-time big data management framework may be performed when the system is idle.
In a fourth embodiment of the self-adaptive space-time big data distributed storage method, based on the third embodiment, step S71 includes:
step S711, judging whether to trigger a virtual pool fusion instruction or a virtual pool segmentation instruction according to the access frequency of the space-time data managed in the virtual pool;
step S712, if the virtual pool fusion instruction is triggered, judging whether the adjacent virtual pool of the virtual pool triggering the virtual pool fusion instruction also triggers the virtual pool fusion instruction;
step S713, when the access frequency of the adjacent virtual pools also triggers the virtual pool fusion instruction, executing the virtual pool fusion operation so that the time window scale of the fused virtual pools is the sum of the time window scales of the continuous virtual pools before fusion;
step S714, if the virtual pool segmentation instruction is triggered, segmenting the virtual pool that triggered the instruction into a plurality of virtual pools, and segmenting its time window into the same number of consecutive time windows, so that each virtual pool after segmentation corresponds to a different time window after segmentation.
Evolving the time window granularity within the space-time big data management framework makes it possible to enlarge the time window granularity of virtual pools that are accessed infrequently (e.g., once a month or once every 2 months) and to reduce the time window granularity of virtual pools that are accessed frequently (e.g., once a day or once every 5 days).
In the invention, a mapping relation table between time window granularity and access frequency interval may be set. The access frequency interval of a virtual pool is determined from its actual access frequency, and the corresponding time window granularity is then read from the table. When the actual time window granularity of the virtual pool does not match the granularity that the table assigns to its access frequency interval, a virtual pool fusion instruction or a virtual pool segmentation instruction is triggered: if the actual granularity is smaller than the granularity in the table, the time window needs to be enlarged and a virtual pool fusion instruction is triggered; if it is larger, the time window needs to be reduced and a virtual pool segmentation instruction is triggered; if the two are equal, neither instruction is triggered.
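The comparison against the mapping relation table can be sketched as a small decision function. The table contents and the names `TARGET_GRANULARITY` and `decide` are hypothetical; only the three-way rule (smaller means fuse, larger means segment, equal means no action) comes from the text.

```python
from datetime import timedelta

# Hypothetical mapping relation table: access frequency interval -> granularity.
TARGET_GRANULARITY = {
    "cold": timedelta(hours=2),
    "warm": timedelta(minutes=30),
    "hot": timedelta(minutes=10),
}

def decide(interval, current):
    """Return the instruction triggered for a virtual pool, if any."""
    target = TARGET_GRANULARITY[interval]
    if current < target:
        return "fuse"   # window too small for this interval: enlarge it
    if current > target:
        return "split"  # window too large for this interval: reduce it
    return "none"

half_hour = timedelta(minutes=30)
print(decide("cold", half_hour), decide("hot", half_hour), decide("warm", half_hour))
```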
When the time window granularity of a virtual pool with a low access frequency is to be enlarged and its adjacent virtual pools also need to be enlarged, the adjacent virtual pools (i.e., virtual pools whose time windows are contiguous) are fused into one virtual pool according to the enlarged granularity required (for example, the granularity after fusion can be determined from the mapping table). The time window of the fused virtual pool is then the sum of the time windows of all the virtual pools before fusion.
As a result, one virtual pool can manage more space-time data, which helps to adaptively reduce the number of historical virtual pools in the space-time big data management framework.
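A minimal sketch of the fusion operation is given below, assuming each virtual pool is described only by a half-open time window and a flag saying whether it triggered a fusion instruction. The `VirtualPool` class and `fuse_adjacent` function are hypothetical names; the sketch fuses every contiguous run of flagged pools and does not cap the fused window at a granularity taken from the mapping table.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class VirtualPool:
    start: date          # inclusive start of the time window
    end: date            # exclusive end of the time window
    wants_fusion: bool   # True if this pool triggered a fusion instruction


def fuse_adjacent(pools: list[VirtualPool]) -> list[VirtualPool]:
    """Fuse runs of time-contiguous pools that all triggered fusion."""
    result: list[VirtualPool] = []
    for pool in sorted(pools, key=lambda p: p.start):
        last = result[-1] if result else None
        if (last is not None and last.wants_fusion and pool.wants_fusion
                and last.end == pool.start):
            # The fused window is the sum of the two contiguous windows.
            last.end = pool.end
        else:
            # Copy the pool so the caller's objects are not modified.
            result.append(VirtualPool(pool.start, pool.end, pool.wants_fusion))
    return result
```

For example, three monthly pools of which the first two are flagged would yield one two-month pool followed by the unchanged third pool.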
Furthermore, each virtual pool has a corresponding meta-information unit, which can be stored in the management node corresponding to the virtual pool or in the space-time big data management framework. When the number of virtual pools decreases, the number of meta-information units decreases accordingly, so that rarely used meta-information is merged automatically and the meta-information is consolidated without manual intervention.
A virtual pool with a high access frequency is split into several consecutive virtual pools according to the required time window granularity (for example, according to the mapping table), and the time windows of these virtual pools are likewise consecutive. This reduces the time window granularity of frequently accessed virtual pools: each virtual pool then manages less data, the storage nodes corresponding to each virtual pool store less data, and access requests are answered faster.
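The splitting of a time window into consecutive sub-windows can be illustrated with the short sketch below. The function name `split_window` and the choice of equal-length sub-windows are assumptions for the example; the patent only requires that the resulting windows be consecutive.

```python
from datetime import datetime


def split_window(start: datetime, end: datetime,
                 parts: int) -> list[tuple[datetime, datetime]]:
    """Split the half-open window [start, end) into `parts` consecutive windows."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    step = (end - start) / parts
    # Use `end` itself as the last bound so the windows cover the original exactly.
    bounds = [start + step * i for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))
```

Each returned pair would become the time window of one of the virtual pools produced by the split; the end of every window equals the start of the next, so no timestamp falls between pools.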
Further, in the present invention, a number of storage nodes is set for each virtual pool. After several adjacent virtual pools are fused into one, the fusion process can automatically reduce the storage nodes used by the pools before fusion to the number required by the single fused pool. For example, if each virtual pool is assigned 8 storage nodes when it is generated, 3 adjacent virtual pools use 24 storage nodes before fusion; after they are fused into 1 virtual pool, 8 storage nodes can store the data of the fused pool, which releases the other 16. The storage nodes no longer needed for space-time data with a low access frequency are thus released and can serve newly added virtual pools. (The storage nodes can be ranked by performance, and the highest-ranked or the lowest-ranked nodes can be released according to access requirements.) The correspondence between storage nodes and virtual pools is therefore also adaptive.
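The 24-to-8 example can be sketched as a simple ranking and partition. The function `release_after_fusion`, the performance score (higher meaning better) and the `release_best` switch are hypothetical; the patent leaves open how performance is scored and which end of the ranking is released.

```python
def release_after_fusion(node_perf: dict[str, float], keep: int,
                         release_best: bool = True) -> tuple[list[str], list[str]]:
    """Partition the nodes of the fused pools into (kept, released).

    node_perf maps a node id to an assumed performance score, higher is better.
    """
    ranked = sorted(node_perf, key=node_perf.get, reverse=True)  # best first
    surplus = len(ranked) - keep
    if surplus <= 0:
        return ranked, []
    if release_best:
        # Cold fused data keeps the weakest nodes; the best ones are freed.
        return ranked[surplus:], ranked[:surplus]
    # Otherwise keep the best nodes and free the weakest ones.
    return ranked[:keep], ranked[keep:]
```

With 24 nodes and `keep=8`, the function returns 8 kept nodes and 16 released nodes, matching the figures in the example above.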
It is easy to understand that newly stored space-time data is accessed frequently, and that users access stale data less and less as time passes. In a conventional storage system, data with a low access frequency has to be deleted or migrated manually. In the present invention, the evolution of the system automatically releases storage nodes from data that has not been accessed for a long time and uses them to store new data, thereby achieving adaptive data migration and storage node release without manual participation and with a higher degree of intelligence.
Based on the third or fourth embodiment of the adaptive space-time big data distributed storage method of the present invention, in a fifth embodiment of the adaptive space-time big data distributed storage method of the present invention, the step S72 includes:
step S721, judging, according to the access frequency of the space-time data managed in the virtual pool, whether to trigger a storage node number adjustment instruction;
if yes, executing step S722: adjusting the number of storage nodes managed by the corresponding virtual pool according to the storage node number adjustment instruction.
Specifically, the mapping table may assign a different number of storage nodes to each access frequency interval. If the number of storage nodes that the mapping table assigns to the virtual pool's actual access frequency interval differs from the number of storage nodes the virtual pool currently manages, the number of storage nodes managed by the virtual pool needs to be evolved.
Specifically, step S72 may evolve the number of storage nodes of a virtual pool formed by fusion, of a virtual pool formed by splitting, or of a virtual pool that has not been fused.
Specifically, the storage node number adjustment instruction includes a storage node number up instruction and a storage node number down instruction.
When the actual number of storage nodes of the virtual pool is below the number of storage nodes corresponding to its access frequency, a storage node number up instruction is triggered. For example, a virtual pool that originally manages 8 storage nodes can be adjusted to 10.
When the actual number of storage nodes of the virtual pool exceeds the number of storage nodes corresponding to its access frequency, a storage node number down instruction can be triggered. For example, a virtual pool that originally manages 8 storage nodes can be adjusted to 6.
Specifically, the storage node number adjustment instruction may be triggered multiple times. For example, as the access frequency of a virtual pool gradually decreases over time, its number of storage nodes may be decreased step by step until only 1 storage node corresponds to the virtual pool. Likewise, after the number of storage nodes of adjacent virtual pools has been reduced to 1, the adjacent virtual pools can be fused into 1 virtual pool, and the fused virtual pool is stored on only 1 storage node. This dynamic reduction saves a large number of storage nodes and releases more storage resources.
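The up and down instructions can be sketched as a lookup followed by a comparison. The table `NODE_COUNT_TABLE`, its thresholds and node counts, and the function names are assumptions chosen so that the 8-to-10 and 8-to-6 examples above fall out of the lookup.

```python
# Hypothetical table: (lower bound of accesses per 30 days, number of storage nodes).
NODE_COUNT_TABLE = [(30.0, 10), (4.0, 8), (1.0, 6), (0.0, 1)]


def target_node_count(accesses_per_30_days: float) -> int:
    """Return the node count the table assigns to an access frequency."""
    for lower_bound, count in NODE_COUNT_TABLE:
        if accesses_per_30_days >= lower_bound:
            return count
    return NODE_COUNT_TABLE[-1][1]


def node_count_instruction(current_nodes: int,
                           accesses_per_30_days: float) -> tuple[str, int]:
    """Return the instruction ("up", "down" or "none") and the resulting node count."""
    target = target_node_count(accesses_per_30_days)
    if current_nodes < target:
        return "up", target
    if current_nodes > target:
        return "down", target
    return "none", current_nodes
```

Calling the function repeatedly as the access frequency falls would walk a pool down through the table until it reaches a single storage node, which corresponds to the gradual reduction described above.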
Based on the third to fifth embodiments of the adaptive space-time big data distributed storage method of the present invention, in a sixth embodiment of the adaptive space-time big data distributed storage method of the present invention, the step S73 includes:
step S731, judging, according to the access frequency of the space-time data managed in the virtual pool, whether to trigger a storage node adjustment instruction;
if yes, executing step S732: adjusting, according to the performance parameters of each storage node, the storage nodes managed by the virtual pool that triggered the storage node adjustment instruction.
Specifically, in this embodiment, each storage node may be a single computer terminal or a combination of several computer terminals. The performance parameter of a storage node may in particular be its CPU load factor.
Further, when a virtual pool is accessed frequently, a faster response effectively improves the user's access experience. To raise the access rate, the present invention not only increases the number of storage nodes used for distributed storage but also takes the performance parameter of each storage node into account. Specifically, the rate at which each storage node reads out its stored sub-data is measured to determine the node's performance parameter; the performance parameter is preferably the CPU load factor, and the response performance of each storage node is evaluated from the magnitude of that factor.
Accordingly, well-performing storage nodes managed by a virtual pool with a low access frequency can be replaced by storage nodes with lower performance. In this way, according to the users' access characteristics, space-time data with a high access frequency is adaptively migrated to storage nodes that respond quickly, and space-time data with a low access frequency is adaptively migrated to storage nodes that respond slowly, which completes the adaptive migration of the space-time data and improves the users' access experience.
Further, in this embodiment, the performance parameters of all storage nodes may be obtained and the storage nodes ranked by them; likewise, the virtual pools may be ranked by the average access frequency of the space-time data each one manages. Based on the two rankings, the storage nodes with the best performance parameters are assigned to the virtual pools with the highest average access frequency, and the storage nodes with the poorest performance parameters are assigned to the virtual pools with the lowest average access frequency, thereby evolving the correspondence between virtual pools and storage nodes in the space-time big data management framework.
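The pairing of the two rankings can be sketched as below. It assumes that a lower CPU load factor means better response performance and that each pool's node count is already known; the function `assign_nodes` and its parameters are hypothetical.

```python
def assign_nodes(pool_freq: dict[str, float], pool_size: dict[str, int],
                 node_load: dict[str, float]) -> dict[str, list[str]]:
    """Give the best nodes to the most frequently accessed pools.

    pool_freq: pool id -> average access frequency of its data
    pool_size: pool id -> number of storage nodes the pool should manage
    node_load: node id -> CPU load factor (assumption: lower load is better)
    """
    nodes = sorted(node_load, key=node_load.get)                # best node first
    pools = sorted(pool_freq, key=pool_freq.get, reverse=True)  # hottest pool first
    if sum(pool_size[p] for p in pools) > len(nodes):
        raise ValueError("not enough storage nodes for all pools")
    assignment: dict[str, list[str]] = {}
    cursor = 0
    for pool in pools:
        count = pool_size[pool]
        assignment[pool] = nodes[cursor:cursor + count]
        cursor += count
    return assignment
```

A real system would also have to migrate the sub-data whenever a pool's node set changes; the sketch only computes the new correspondence.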
Based on the second embodiment of the adaptive space-time big data distributed storage method of the present invention, in a seventh embodiment of the adaptive space-time big data distributed storage method of the present invention, the method further includes:
step S80, determining, according to an access request sent by a user, the target space-time data that the user requests to access;
step S90, determining, according to the time characteristics and space characteristics of the target space-time data, the access virtual pool that manages the requested space-time data;
step S100, obtaining each access storage node managed by the access virtual pool;
step S110, extracting in parallel, from all the access storage nodes, the target sub-data into which the target space-time data was split;
step S120, splicing all the target sub-data into the complete space-time data according to the splitting mode;
and step S130, returning the complete space-time data to the user as feedback data.
Specifically, when the space-time data is stored, it is divided into a plurality of sub-data according to its spatial characteristics. Various methods can be used for this:
for example, the geographic area covered by the space-time data may be divided equally into as many pieces of sub-data as there are storage nodes, e.g., by grid partitioning.
For another example, the space-time data may be split according to the ecological region type of the geographic area; the ecological region types include forest, wetland, watershed, farmland and city, and may of course include other types.
Alternatively, the space-time data may be split according to the cities or the latitude and longitude regions corresponding to the geographic area.
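The first of these methods, equal grid partitioning, can be sketched as follows. The bounding-box representation `(min_lon, min_lat, max_lon, max_lat)` and the function name `grid_cells` are assumptions; the caller would pick `rows * cols` equal to the number of storage nodes managed by the target virtual pool.

```python
def grid_cells(min_lon: float, min_lat: float, max_lon: float, max_lat: float,
               rows: int, cols: int) -> list[tuple[float, float, float, float]]:
    """Divide a bounding box into rows * cols equal cells, listed row by row."""
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be at least 1")
    dlon = (max_lon - min_lon) / cols
    dlat = (max_lat - min_lat) / rows
    return [
        (min_lon + c * dlon, min_lat + r * dlat,
         min_lon + (c + 1) * dlon, min_lat + (r + 1) * dlat)
        for r in range(rows)
        for c in range(cols)
    ]
```

Each cell would then select the part of the space-time data that falls inside it, and the cell's position in the list could serve as the index used later to splice the pieces back together.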
After the space-time data has been split and stored in a distributed manner on the corresponding storage nodes, when a user needs to access it, the corresponding sub-data is read from each storage node, the pieces are spliced back into their original positions according to the splitting mode to form the complete space-time data, and the complete space-time data is finally returned to the user.
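The parallel read and splice of steps S110 and S120 can be sketched with Python's standard thread pool. Storage nodes are modeled here as in-memory dictionaries and node i is assumed to hold piece i; `read_sub_data` stands in for a network read and, like `fetch_and_splice`, is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def read_sub_data(node: dict[int, bytes], index: int) -> tuple[int, bytes]:
    """Stand-in for reading piece `index` from one storage node."""
    return index, node[index]


def fetch_and_splice(nodes: list[dict[int, bytes]]) -> bytes:
    """Read one piece from every node in parallel and rejoin them in split order."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as executor:
        futures = [executor.submit(read_sub_data, node, i)
                   for i, node in enumerate(nodes)]
        pieces = dict(future.result() for future in futures)
    # Splice by piece index, which records the original split order.
    return b"".join(pieces[i] for i in sorted(pieces))
```

Concatenating bytes is the simplest possible splicing mode; for gridded spatial data the splice would instead place each piece back into its cell.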
Based on the seventh embodiment of the adaptive space-time big data distributed storage method of the present invention, in the eighth embodiment of the adaptive space-time big data distributed storage method of the present invention, the method further includes:
step S140, acquiring the specified sub-data that the user selects from the target space-time data;
step S150, searching, among the access storage nodes managed by the access virtual pool, for the access storage node that stores the specified sub-data;
step S160, returning the specified sub-data read from that access storage node to the user as feedback data.
Furthermore, this embodiment also provides a fine-grained feedback method for data access. Specifically, because the present invention splits space-time data by spatial characteristics and stores the pieces on different storage nodes, a user who does not need the complete space-time data can select at least one piece of specified sub-data (for example, the forest data) as the query target according to the spatial splitting mode. The access storage node that stores the specified sub-data is then located among the access storage nodes managed by the access virtual pool, and the specified sub-data it returns is sent to the user as feedback data, which realizes fine-grained querying of the space-time data.
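A minimal sketch of the fine-grained query is shown below, assuming the access virtual pool keeps a placement record that maps each spatial key (for example an ecological region type) to the storage node holding it. The names `placement`, `nodes` and `query_sub_data` are hypothetical.

```python
def query_sub_data(placement: dict[str, str],
                   nodes: dict[str, dict[str, bytes]],
                   wanted: list[str]) -> dict[str, bytes]:
    """Return only the requested pieces, reading each from the node that stores it.

    placement: spatial key (e.g. "forest") -> id of the node storing that piece
    nodes:     node id -> the pieces stored on that node, keyed by spatial key
    """
    result: dict[str, bytes] = {}
    for key in wanted:
        node_id = placement[key]          # locate the node for this piece
        result[key] = nodes[node_id][key] # read only that piece
    return result
```

Only the nodes named in the placement record for the requested keys are touched, so a query for the forest data does not read the wetland or farmland pieces.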
Based on the first to eighth embodiments of the adaptive spatiotemporal big data distributed storage method of the present invention, in the ninth embodiment of the adaptive spatiotemporal big data distributed storage method of the present invention, the method further includes:
step S170, obtaining the performance parameters of each idle storage node and forming a performance parameter ranking of the storage nodes;
step S180, assigning the storage nodes with high performance parameters in the ranking to the newly generated virtual pool.
Stored data typically shows a high access frequency for new data and a low access frequency for old data. In the present invention, idle storage nodes are ranked by performance parameter and the high-performing storage nodes are assigned to the new virtual pool, so that a newly generated virtual pool is always stored on storage nodes with superior performance parameters and new data shows a good access rate while its access frequency is high. As new data gradually becomes old data, the continuous evolution of the space-time big data management framework gradually releases the high-performing storage nodes from storing space-time data with a low access frequency: such data is merged and migrated to low-performing storage nodes, while space-time data with a high access frequency is split and migrated to high-performing storage nodes. The space-time big data management framework thus keeps evolving as users' access characteristics change, and the adaptive evolution of space-time data storage proceeds without manual participation.
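Steps S170 and S180 amount to a ranking followed by a top-k selection, which can be sketched as follows. The performance score (higher meaning better) and the function name `allocate_for_new_pool` are assumptions for illustration.

```python
def allocate_for_new_pool(idle_perf: dict[str, float], count: int) -> list[str]:
    """Pick the `count` best idle nodes for a newly generated virtual pool.

    idle_perf maps an idle node id to an assumed performance score, higher is better.
    """
    if count > len(idle_perf):
        raise ValueError("not enough idle storage nodes")
    ranked = sorted(idle_perf, key=idle_perf.get, reverse=True)  # best first
    return ranked[:count]
```

The nodes that remain idle after the selection stay available for later pools or for cold data that is migrated off high-performing nodes.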
To achieve the above purpose, the present invention also provides an intelligent file system that applies the adaptive space-time big data distributed storage method described above. The intelligent file system comprises a data management interface module and a storage module that are communicatively connected; the data management interface module provides users with an access interface for space-time data and receives user access requests through that interface; the storage module comprises a plurality of storage nodes;
the data management interface module is used for: establishing a space-time big data management framework; acquiring the time characteristics and space characteristics of the space-time data to be stored; placing the space-time data, according to its time characteristics, in the target virtual pool corresponding to the matching time window; dividing the space-time data, according to its space characteristics, into a plurality of sub-data according to the number of storage nodes managed by the target virtual pool; distributing the pieces of sub-data to different storage nodes managed by the target virtual pool; learning users' access characteristics for the stored space-time data; and adaptively evolving the space-time big data management framework according to the access characteristics, so as to improve the response efficiency of the evolved framework to user access requests;
the storage module is used for: storing each piece of sub-data in a distributed manner on the allocated storage nodes.
From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is preferable. Based on this understanding, the technical solution of the present invention may, in essence or in part, be embodied as a software product stored in a computer-readable storage medium as described above (e.g., ROM/RAM, magnetic disk, optical disk), comprising instructions for causing a terminal device to perform the methods of the embodiments of the present invention.
In the description of the present specification, descriptions of terms "one embodiment," "another embodiment," "other embodiments," or "first embodiment through X-th embodiment," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, method steps or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The foregoing describes only preferred embodiments of the present invention and does not limit the patent scope of the invention. Any equivalent structure or equivalent process transformation made using the contents disclosed herein, or any direct or indirect application in other related technical fields, likewise falls within the scope of patent protection of the present invention.

Claims (6)

1. The self-adaptive space-time big data distributed storage method is characterized by comprising the following steps of:
establishing a space-time big data management framework, wherein, as time passes, the space-time big data management framework sequentially generates virtual pools according to a set time window granularity, associates each virtual pool with a different time window, and has each virtual pool manage a plurality of storage nodes;
acquiring time characteristics and space characteristics of space-time data to be stored;
according to the time characteristics of the space-time data, placing the space-time data in a target virtual pool corresponding to a time window;
dividing the space-time data into a plurality of sub-data according to the number of storage nodes managed by the target virtual pool according to the space characteristics of the space-time data;
distributing each piece of sub data obtained by segmentation to different storage nodes managed by a target virtual pool for distributed storage;
learning access characteristics of a user to the stored space-time data;
adaptively evolving the space-time big data management framework according to the access characteristics, so as to improve the response efficiency of the evolved space-time big data management framework to user access requests;
the step of adaptively evolving the space-time big data management framework according to the access characteristics comprises at least one of the following steps: evolving the time window granularity in the space-time big data management framework according to the access frequency of the space-time data managed in the virtual pools; evolving the number of storage nodes managed by each virtual pool in the space-time big data management framework according to the access frequency of the space-time data managed in the virtual pools; evolving the correspondence between each virtual pool and the storage nodes in the space-time big data management framework according to the access frequency of the space-time data managed in the virtual pools;
the step of evolving the time window granularity in the space-time big data management framework according to the access frequency of the space-time data managed in the virtual pools comprises: judging, according to the access frequency of the space-time data managed in a virtual pool, whether to trigger a virtual pool fusion instruction or a virtual pool segmentation instruction; if the virtual pool fusion instruction is triggered, judging whether the virtual pools adjacent to the virtual pool that triggered the virtual pool fusion instruction also trigger the virtual pool fusion instruction; when the access frequency of the adjacent virtual pools also triggers the virtual pool fusion instruction, executing a virtual pool fusion operation so that the time window scale of the fused virtual pool is the sum of the time window scales of all the consecutive virtual pools before fusion; if the virtual pool segmentation instruction is triggered, segmenting the virtual pool that triggered the virtual pool segmentation instruction into a plurality of virtual pools, segmenting the corresponding time window into a plurality of consecutive time windows according to the number of segmented virtual pools, and associating each segmented virtual pool with a different segmented time window;
the step of evolving the number of storage nodes managed by each virtual pool in the space-time big data management framework according to the access frequency of the space-time data managed in the virtual pools comprises: judging, according to the access frequency of the space-time data managed in a virtual pool, whether to trigger a storage node number adjustment instruction; if yes, adjusting the number of storage nodes managed by the corresponding virtual pool according to the storage node number adjustment instruction;
the step of evolving the correspondence between each virtual pool and the storage nodes in the space-time big data management framework according to the access frequency of the space-time data managed in the virtual pools comprises: judging, according to the access frequency of the space-time data managed in a virtual pool, whether to trigger a storage node adjustment instruction; if yes, adjusting, according to the performance parameters of each storage node, the storage nodes managed by the virtual pool that triggered the storage node adjustment instruction.
2. The adaptive spatiotemporal big data distributed storage method of claim 1, wherein the step of learning the user access characteristics to the stored spatiotemporal data comprises:
acquiring an access request sent by a user, wherein the access request comprises time characteristics, space characteristics and data types of space-time data requested to be accessed;
and determining the access frequency of the user to the spatiotemporal data managed in each virtual pool according to the access request.
3. The adaptive spatiotemporal big data distributed storage method of claim 2, further comprising:
determining target space-time data which is requested to be accessed by a user according to an access request sent by the user;
determining an access virtual pool for managing the space-time data requested to be accessed according to the time characteristics and the space characteristics of the target space-time data;
acquiring each access storage node managed by the access virtual pool;
extracting target sub-data segmented by the target space-time data in parallel from all access storage nodes;
splicing all target sub-data into complete space-time data according to a segmentation mode;
and returning the complete space-time data to the user as feedback data.
4. The adaptive spatiotemporal big data distributed storage method of claim 3, further comprising:
acquiring the specified sub-data that the user selects from the target space-time data;
searching, among the access storage nodes managed by the access virtual pool, for the access storage node that stores the specified sub-data;
and returning the specified sub-data read from that access storage node to the user as feedback data.
5. The adaptive spatiotemporal big data distributed storage method of any of claims 1 to 4, further comprising:
acquiring performance parameters of each idle storage node, and forming a performance parameter sequencing result of the storage nodes;
and corresponding the storage nodes with high performance parameters in the performance parameter sequencing result to the new virtual pool.
6. An intelligent file system, characterized in that an adaptive spatiotemporal big data distributed storage method according to any of claims 1 to 5 is applied; the intelligent file system comprises a data management interface module and a storage module which are in communication connection; the data management interface module is used for providing an access interface of the space-time data for the user so as to receive an access request of the user through the access interface; the storage module comprises a plurality of storage nodes;
the data management interface module is used for: establishing a space-time big data management framework; acquiring time characteristics and space characteristics of space-time data to be stored; according to the time characteristics of the space-time data, placing the space-time data in a target virtual pool corresponding to a time window; dividing the space-time data into a plurality of sub-data according to the number of storage nodes managed by the target virtual pool and according to the space characteristics of the space-time data; distributing each piece of sub-data obtained by the division to different storage nodes managed by the target virtual pool; learning access characteristics of a user to the stored space-time data; and adaptively evolving the space-time big data management framework according to the access characteristics, so as to improve the response efficiency of the evolved space-time big data management framework to user access requests;
the storage module is used for: storing each piece of sub-data in a distributed manner on the allocated storage nodes.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311079515.0A CN117112492B (en) 2023-08-25 2023-08-25 Self-adaptive space-time big data distributed storage method and intelligent file system

Publications (2)

Publication Number Publication Date
CN117112492A CN117112492A (en) 2023-11-24
CN117112492B true CN117112492B (en) 2024-03-12

Family

ID=88797829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311079515.0A Active CN117112492B (en) 2023-08-25 2023-08-25 Self-adaptive space-time big data distributed storage method and intelligent file system

Country Status (1)

Country Link
CN (1) CN117112492B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103118132A (en) * 2013-02-28 2013-05-22 浙江大学 Distributed caching system and method oriented to spatio-temporal data
WO2015096582A1 (en) * 2013-12-27 2015-07-02 华为技术有限公司 Index creation method, querying method, apparatus and device for spatial-temporal data
KR101852597B1 (en) * 2017-09-14 2018-04-27 주식회사 포스웨이브 Moving object big-data information storage systems and processing method using the same
CN109871418A (en) * 2019-01-04 2019-06-11 广州市城市规划勘测设计研究院 A kind of space index method and system of space-time data
CN110347680A (en) * 2019-06-21 2019-10-18 北京航空航天大学 A kind of space-time data indexing means towards high in the clouds environment
CN112328583A (en) * 2020-10-29 2021-02-05 北京东方耀阳信息技术有限公司 Spatio-temporal data management method
CN115827907A (en) * 2023-02-22 2023-03-21 中国科学院空天信息创新研究院 Cross-cloud multi-source data cube discovery and integration method based on distributed memory

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A new distributed storage scheme for cluster video server;Liao, XF et al.;JOURNAL OF SYSTEMS ARCHITECTURE;20050228;pp. 79-94 *
Research on distributed storage methods for large-scale spatio-temporal data;钟运琴;方金云;赵晓芳;;高技术通讯;20131215(12);pp. 1219-1229 *

Also Published As

Publication number Publication date
CN117112492A (en) 2023-11-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant