CN116627320A - Unstructured data storage, migration and identification method - Google Patents


Info

Publication number
CN116627320A
Authority
CN
China
Prior art keywords
unstructured data
data
storage area
unstructured
index
Prior art date
Legal status
Pending
Application number
CN202310374828.2A
Other languages
Chinese (zh)
Inventor
王燕蓉
闫丽飞
赵维伟
吕世雷
赵元朋
Current Assignee
State Grid Information and Telecommunication Co Ltd
Fujian Yirong Information Technology Co Ltd
Original Assignee
State Grid Information and Telecommunication Co Ltd
Fujian Yirong Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by State Grid Information and Telecommunication Co Ltd and Fujian Yirong Information Technology Co Ltd
Priority to CN202310374828.2A
Publication of CN116627320A
Legal status: Pending (current)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
            • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
              • G06F3/0601 Interfaces specially adapted for storage systems
                • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
                  • G06F3/0608 Saving storage space on storage systems
                • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
                  • G06F3/0638 Organizing or formatting or addressing of data
                    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
                  • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
                    • G06F3/0647 Migration mechanisms
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/90 Details of database functions independent of the retrieved data types
              • G06F16/901 Indexing; Data structures therefor; Storage structures
              • G06F16/903 Querying
              • G06F16/906 Clustering; Classification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
          • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of unstructured data processing and relates in particular to a method for storing, migrating and identifying unstructured data. The method comprises the following steps. S1: obtaining unstructured data, establishing an index tag for it, and storing the index tag in a guide partition. S2: analyzing the indexed unstructured data, determining its identical parts and differing parts, splitting it accordingly, and storing the split data in a storage area. S3: storing the identical parts of the split data in a redundant partition of the storage area and the differing parts in a total storage area of the storage area. By analyzing unstructured data and then classifying, merging and storing it, the invention improves the storage compression efficiency of the data and makes the data easier to identify and migrate.

Description

Unstructured data storage, migration and identification method
Technical Field
The invention belongs to the technical field of unstructured data processing and relates in particular to a method for storing, migrating and identifying unstructured data.
Background
In the information age, every industry accumulates massive amounts of data while conducting its business. With the spread of IT applications, traditional paper-based records are steadily giving way to electronic records stored on computers. Unstructured data such as pictures, images and video comes in many different formats, which makes it impractical to represent it in a two-dimensional table structure for compressed storage.
Existing unstructured data is generally stored in memory sequentially, with no relationships between items, so identification and retrieval take a long time and the data is hard to use. Unstructured data also contains a large amount of redundant information, so storing it wastes considerable storage space and the storage compression efficiency is low.
Disclosure of Invention
To remedy these shortcomings of the prior art, the invention analyzes unstructured data and then classifies, merges and stores it, which improves the storage compression efficiency of the data and makes the data easier to identify and migrate.
The technical solution adopted to solve this technical problem is as follows. The invention discloses a method for storing, migrating and identifying unstructured data, characterized in that the method comprises the following steps:
S1: obtaining unstructured data, establishing an index tag for the unstructured data, and storing the index tag in a guide partition;
S2: analyzing the unstructured data for which the index tag has been established, determining the identical parts and the differing parts of the unstructured data, splitting the data accordingly, and storing the split unstructured data in a storage area;
S3: storing the identical parts of the split unstructured data in a redundant partition of the storage area, and storing the differing parts in a total storage area of the storage area;
wherein only one copy of a part that is identical across different unstructured data items is stored in the redundant partition, mutually independent labels are assigned to the groups of identical-part data from different sources stored in the redundant partition, and a backup of the redundant partition is kept in the storage area.
Preferably, the index tag includes check information for verifying the integrity of the unstructured data; the check information includes, but is not limited to, the MD5, SHA1 and CRC32 values of the unstructured data, and the data must be checked against the check information recorded in the index tag whenever it is stored, consulted or used.
Preferably, the index tag established for the unstructured data in step S1 includes type information; the redundant partition and the total storage area are each divided into sub-partitions corresponding to the different types of unstructured data, and the data blocks obtained by analyzing and splitting the unstructured data are stored in the sub-partition matching their recorded type information.
Preferably, the index tag comprises a main tag and sub tags: the main tag holds the index tags of all currently stored unstructured data, a new sub tag is generated at a fixed time interval, and each sub tag holds the index tags of the unstructured data stored during its period.
Preferably, when the unstructured data is migrated, the index tag, the data in the redundant partition and the data in the total storage area are migrated in that order; the data blocks in the redundant partition and the total storage area are assigned migration numbers before they are migrated, and the migration-number data is migrated immediately after the index tag.
Preferably, the unstructured data is verified at fixed time intervals during migration.
Preferably, the unstructured data is divided into hot spot data and normal data according to its number of uses and time of use, and the hot spot data is transmitted first during migration.
Preferably, after migration, the hot spot data is stored in the same storage device or storage area.
The beneficial effects of the invention are as follows:
1. By analyzing and splitting the unstructured data to be stored, and storing only one copy of the parts shared by different unstructured data items, the method effectively reduces the redundancy in the stored data, reduces the size of the stored data, improves the storage compression efficiency of unstructured data and reduces wasted storage space.
2. By providing a main tag and sub tags, the method allows index tags to be cross-checked against each other, which reduces the chance that stored data becomes unusable because of an erroneous index tag; at the same time, the sub tags speed up searching for and using the stored data.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a block flow diagram of the identification method of the present invention.
Detailed Description
To make the technical means, creative features, objectives and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.
As shown in FIG. 1, the method for storing, migrating and identifying unstructured data according to the present invention comprises the following steps:
S1: obtaining unstructured data, establishing an index tag for the unstructured data, and storing the index tag in a guide partition;
S2: analyzing the unstructured data for which the index tag has been established, determining the identical parts and the differing parts of the unstructured data, splitting the data accordingly, and storing the split unstructured data in a storage area;
S3: storing the identical parts of the split unstructured data in a redundant partition of the storage area, and storing the differing parts in a total storage area of the storage area;
wherein only one copy of a part that is identical across different unstructured data items is stored in the redundant partition, mutually independent labels are assigned to the groups of identical-part data from different sources stored in the redundant partition, and a backup of the redundant partition is kept in the storage area;
When unstructured data is stored, an index tag is established for it so that the stored data can easily be searched, read and migrated. Without such tags the stored data would become disordered, which would hurt storage compression efficiency and make the data hard to consult and use. The indexed unstructured data is then analyzed and split, and the identical and differing parts of the resulting data blocks are determined. According to the differences between the data blocks, the identical parts are stored once in the redundant partition, and all of the differing parts of the split blocks are stored in the total storage area. This effectively reduces the redundancy in the stored unstructured data and its total size, which improves storage compression efficiency, reduces wasted storage space, and makes the data easier to identify, consult and migrate. When stored unstructured data is needed, its identical parts and differing parts are fetched from the redundant partition and the total storage area according to the index tag and combined in the order recorded in the index tag to recover the original data. The index tag also records the label of each identical part that the data uses in the redundant partition; because many groups of identical-part data are stored there, this prevents the wrong identical-part data from being fetched, which would otherwise disrupt normal use of the unstructured data.
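The patent does not say how the identical and differing parts are found. The following Python sketch assumes fixed-size blocks compared by content, a common deduplication technique swapped in here purely for illustration: a block that occurs in two or more items of a batch is treated as an identical part and stored once in the redundant partition under a SHA-256 label, every other block goes to the total storage area, and the index tag records the ordered list of (area, key) references used to reassemble the item. `BLOCK_SIZE`, `Store`, `store_batch` and `load` are hypothetical names; the backup of the redundant partition and the type sub-partitions are not modelled.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical block size; real systems would use much larger blocks


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split one unstructured item into fixed-size blocks (assumed splitting rule)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


@dataclass
class Store:
    guide: dict[str, list[tuple[str, str]]] = field(default_factory=dict)  # index tags
    redundant: dict[str, bytes] = field(default_factory=dict)  # label -> identical part
    total: dict[str, bytes] = field(default_factory=dict)      # key -> differing part

    def store_batch(self, items: dict[str, bytes]) -> None:
        # Count in how many different items each block appears.
        seen: Counter = Counter()
        for data in items.values():
            seen.update(set(split_blocks(data)))
        for item_id, data in items.items():
            tag = []
            for pos, block in enumerate(split_blocks(data)):
                if seen[block] > 1:
                    # Identical part: keep one copy under an independent label.
                    label = hashlib.sha256(block).hexdigest()
                    self.redundant.setdefault(label, block)
                    tag.append(("redundant", label))
                else:
                    # Differing part: stored per item and position.
                    key = f"{item_id}:{pos}"
                    self.total[key] = block
                    tag.append(("total", key))
            self.guide[item_id] = tag  # index tag kept in the guide partition

    def load(self, item_id: str) -> bytes:
        """Reassemble an item in the order recorded in its index tag."""
        parts = []
        for area, key in self.guide[item_id]:
            parts.append(self.redundant[key] if area == "redundant" else self.total[key])
        return b"".join(parts)
```

For example, storing `b"HEADbody1"` and `b"HEADbody2"` in one batch leaves the shared blocks `b"HEAD"` and `b"body"` once in `redundant`, and only the trailing `b"1"` and `b"2"` in `total`.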
As one embodiment of the present invention, the index tag includes check information for verifying the integrity of the unstructured data; the check information includes, but is not limited to, the MD5, SHA1 and CRC32 values of the unstructured data, and the data must be checked against the check information recorded in the index tag whenever it is stored, consulted or used;
When an index tag is established for unstructured data, the check information of the data is calculated at the same time and added to the index tag. When the data is later used, the identical-part and differing-part data fetched from the redundant partition and the total storage area are combined and then verified against the check information recorded in the index tag. This ensures that the data being used is complete and error-free, and avoids the situation in which the reassembled blocks fail to reproduce the original data, leaving the data altered, impossible to restore to its original state, and unusable.
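A minimal sketch of the check information, using Python's standard `hashlib` and `zlib` modules. Storing the three values as hex strings in a dictionary is an assumption; the patent only names MD5, SHA1 and CRC32. `make_check_info` and `verify` are hypothetical helper names, and `verify` would be called on the data reassembled from the redundant partition and the total storage area.

```python
import hashlib
import zlib


def make_check_info(data: bytes) -> dict[str, str]:
    """Compute the MD5, SHA1 and CRC32 values to record in the index tag."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        # Mask to an unsigned 32-bit value and format it as 8 hex digits.
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
    }


def verify(data: bytes, check_info: dict[str, str]) -> bool:
    """Return True only if every value recorded in the index tag matches the data."""
    if not check_info:
        return False  # nothing recorded means nothing can be confirmed
    current = make_check_info(data)
    return all(current.get(name) == value for name, value in check_info.items())


# Usage: record the values at store time, verify after reassembly.
tag = {"check": make_check_info(b"hello")}
print(verify(b"hello", tag["check"]))  # True
print(verify(b"hellO", tag["check"]))  # False
```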
As one embodiment of the invention, the index tag established for the unstructured data in step S1 includes type information; the redundant partition and the total storage area are each divided into sub-partitions corresponding to different types of unstructured data, and the data blocks obtained by analyzing and splitting the unstructured data are stored in the sub-partition matching their recorded type information;
When an index tag is established for unstructured data, the type of the data, such as picture, audio, video or text, is recorded in the index tag. The data is then analyzed and split into data blocks, which are stored in the corresponding sub-partitions of the redundant partition or the total storage area according to the type information in the index tag. Unstructured data of the same type therefore ends up in the same partition, which speeds up retrieval and reading of the stored data and shortens the response time when the data is used.
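A sketch of type-based sub-partitions, assuming the type information is one of the four kinds the text gives as examples and that a storage area can be modelled as one dictionary per type. `TypedArea`, `KNOWN_TYPES`, `put` and `get` are hypothetical names; the same class would be instantiated once for the redundant partition and once for the total storage area.

```python
# Hypothetical type labels, taken from the examples given in the text.
KNOWN_TYPES = {"picture", "audio", "video", "text"}


class TypedArea:
    """One storage area (redundant partition or total storage area) split by data type."""

    def __init__(self) -> None:
        # type information -> {block key -> block}
        self.sub: dict[str, dict[str, bytes]] = {}

    def put(self, type_info: str, key: str, block: bytes) -> None:
        """Store a split data block in the sub-partition named by its type information."""
        if type_info not in KNOWN_TYPES:
            raise ValueError(f"unknown type information: {type_info!r}")
        self.sub.setdefault(type_info, {})[key] = block

    def get(self, type_info: str, key: str) -> bytes:
        """Read a block; only the sub-partition named in the index tag is searched."""
        return self.sub[type_info][key]


redundant, total = TypedArea(), TypedArea()
total.put("video", "clip42:0", b"\x00\x01")
print(total.get("video", "clip42:0"))  # b'\x00\x01'
```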
As one embodiment of the present invention, the index tag comprises a main tag and sub tags: the main tag holds the index tags of all currently stored unstructured data, a new sub tag is generated at a fixed time interval, and each sub tag holds the index tags of the unstructured data stored during its period;
Providing a main tag and sub tags gives the index tags of the unstructured data a backup against which they can be cross-checked, so that interference with an index tag does not leave the stored data disordered, lost or unusable. Because the sub tags are created at fixed time intervals, an index tag is first looked up in a sub tag when the data is used. Each sub tag is much smaller than the main tag, so this avoids the longer retrieval time of searching the main tag directly. Since the sub tags are created in time order, they also make it easier to locate an index tag, which further speeds up retrieval and use of the data.
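A sketch of the main tag / sub tag arrangement, assuming a one-hour sub-tag period and that the caller knows roughly when an item was stored, so the lookup can go straight to the matching sub tag. Both tags keep their own copy of each index tag so that a mismatch between them can be detected; the patent does not say how a mismatch is resolved, so this sketch simply raises an error. `INTERVAL`, `TagIndex`, `add` and `lookup` are hypothetical names.

```python
from dataclasses import dataclass, field

INTERVAL = 3600  # hypothetical sub-tag period, in seconds


@dataclass
class TagIndex:
    main: dict[str, dict] = field(default_factory=dict)             # all index tags
    subs: dict[int, dict[str, dict]] = field(default_factory=dict)  # period -> tags

    def add(self, item_id: str, tag: dict, stored_at: float) -> None:
        period = int(stored_at // INTERVAL)
        # Separate copies, so one damaged tag can be compared against the other.
        self.main[item_id] = dict(tag)
        self.subs.setdefault(period, {})[item_id] = dict(tag)

    def lookup(self, item_id: str, stored_at: float) -> dict:
        """Search the small sub tag first; fall back to the main tag if absent."""
        period = int(stored_at // INTERVAL)
        sub_tag = self.subs.get(period, {}).get(item_id)
        main_tag = self.main.get(item_id)
        if sub_tag is None:
            if main_tag is None:
                raise KeyError(item_id)
            return main_tag
        if main_tag is not None and main_tag != sub_tag:
            raise ValueError(f"main and sub tag disagree for {item_id!r}")
        return sub_tag
```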
As one embodiment of the invention, when the unstructured data is migrated, the index tag, the data in the redundant partition and the data in the total storage area are migrated in that order; the data blocks in the redundant partition and the total storage area are assigned migration numbers before they are migrated, and the migration-number data is migrated immediately after the index tag;
When unstructured data is migrated, the data blocks in the redundant partition and the total storage area are numbered, so that at any point during migration it is known which blocks have been transmitted and which have not. Because every data block carries a migration number, if blocks are lost to interference during migration, the positions of the lost blocks can be determined from their migration numbers and those blocks retransmitted; migration can be interrupted and resumed at any time and still complete normally. Moreover, lost blocks are located directly from their migration numbers rather than by searching the index tag from the beginning, which saves the time and computing power such a search would cost, speeds up migration and ensures the integrity of the migrated data.
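The numbering and resumption logic can be sketched as follows. The sketch assumes the index tag and the table of migration numbers have already been sent, and models the network as a hypothetical `send` callback that returns `False` when a block is lost. Numbers are assigned to redundant-partition blocks first and total-storage-area blocks second, matching the stated migration order; `number_blocks`, `migrate` and `max_rounds` are illustrative names, not taken from the patent.

```python
from typing import Callable, Optional

Block = tuple[str, str]  # (area, key), e.g. ("redundant", label)


def number_blocks(redundant_keys: list[str], total_keys: list[str]) -> dict[int, Block]:
    """Assign migration numbers: redundant-partition blocks first, then total-area blocks."""
    order = [("redundant", k) for k in redundant_keys] + [("total", k) for k in total_keys]
    return dict(enumerate(order))


def migrate(numbers: dict[int, Block],
            send: Callable[[int, Block], bool],
            done: Optional[set[int]] = None,
            max_rounds: int = 5) -> set[int]:
    """Send every numbered block not yet in `done` and return the updated set.

    Lost blocks are identified purely by migration number, so no index-tag search
    is needed; passing the returned set back in resumes an interrupted migration.
    """
    done = set() if done is None else done
    for _ in range(max_rounds):
        pending = sorted(set(numbers) - done)
        if not pending:
            break
        for n in pending:
            if send(n, numbers[n]):
                done.add(n)
    return done
```

For example, a migration interrupted after block 0 (`partial = migrate(nums, lambda n, b: n == 0, max_rounds=1)`) can later be finished with `migrate(nums, send, done=partial)`, which only sends the blocks still missing.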
As one embodiment of the invention, the unstructured data is verified at fixed time intervals during migration;
Because interference during data migration cannot be controlled, data blocks may be lost at any point during transmission. Checking the migrated data against the check information and the migration numbers at fixed intervals during migration detects lost or erroneous data among the already-transmitted unstructured data in good time so that it can be retransmitted. This ensures the integrity of the migrated data, corrects errors quickly and improves migration efficiency.
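A sketch of periodic verification during migration. To keep it deterministic, it checks after every `check_every` transmitted blocks rather than on a wall-clock timer, which is a stand-in for the fixed time interval in the text. It uses CRC32 as the per-block check value and a hypothetical `transmit` callback that returns the bytes the receiver got; blocks whose check value does not match are dropped and re-queued by migration number. `migrate_with_checks` and its parameters are illustrative names.

```python
import zlib
from collections import deque
from typing import Callable


def migrate_with_checks(blocks: dict[int, bytes],
                        transmit: Callable[[bytes], bytes],
                        check_every: int = 2,
                        max_sends: int = 100) -> dict[int, bytes]:
    """Transmit numbered blocks, verifying received data every `check_every` sends."""
    expected = {n: zlib.crc32(b) for n, b in blocks.items()}  # check information
    queue = deque(sorted(blocks))          # migration numbers still to send
    received: dict[int, bytes] = {}
    unverified: list[int] = []
    sends = 0
    while queue or unverified:
        if queue:
            n = queue.popleft()
            received[n] = transmit(blocks[n])
            unverified.append(n)
            sends += 1
            if sends > max_sends:
                raise RuntimeError("too many retransmissions")
        # Periodic check, plus a final check once nothing is left to send.
        if len(unverified) >= check_every or not queue:
            for n in unverified:
                if zlib.crc32(received[n]) != expected[n]:
                    del received[n]        # corrupted or lost: resend by number
                    queue.append(n)
            unverified.clear()
    return received
```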
As one embodiment of the present invention, the unstructured data is divided into hot spot data and normal data according to its number of uses and time of use, and the hot spot data is transmitted first during migration;
Hot spot data and normal data are distinguished by how often and how recently the stored unstructured data is used in daily operation. Hot spot data has a high number of uses and a long usage time and is relatively important to users, so it is migrated first and reaches the target storage area before the normal data. This shortens the time for which hot spot data is tied up by migration, so users can keep using it normally. In addition, because the quality of the transmission link may fluctuate over time, migrating hot spot data early reduces the chance that it is damaged or lost in transit, ensuring that users can continue to use the stored data normally.
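The text does not give a rule for separating hot spot data from normal data. This sketch assumes an item is hot when it has been used at least `min_uses` times and was last used within `max_age` seconds, and it orders hot items by use count (then recency) before appending the normal items. `UsageStats`, `migration_order` and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UsageStats:
    item_id: str
    use_count: int    # number of uses
    last_used: float  # time of last use, in seconds


def migration_order(stats: list[UsageStats], now: float,
                    min_uses: int = 10,
                    max_age: float = 7 * 86400) -> tuple[list[str], set[str]]:
    """Return (item ids in migration order, set of hot spot ids)."""
    hot: list[UsageStats] = []
    normal: list[UsageStats] = []
    for s in stats:
        is_hot = s.use_count >= min_uses and now - s.last_used <= max_age
        (hot if is_hot else normal).append(s)
    # Most-used, then most recently used, hot spot data is migrated first.
    hot.sort(key=lambda s: (-s.use_count, -s.last_used))
    order = [s.item_id for s in hot] + [s.item_id for s in normal]
    return order, {s.item_id for s in hot}
```

With `now=1000.0` and `max_age=500.0`, an item used 40 times but last touched at time 0 stays in the normal group, while items used 50 and 20 times within the window are migrated first.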
As one embodiment of the invention, after migration the hot spot data is stored in the same storage device or storage area;
After the unstructured data has been migrated, the hot spot data is stored together in the same storage device or storage area, and the recorded storage location of the corresponding data is updated. This increases the access speed of the migrated hot spot data, reduces the latency users experience when using it and improves the user experience.
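A sketch of co-locating hot spot data after migration, assuming one target device is designated for hot spot data and normal data is spread round-robin over the remaining devices; capacity limits are ignored. The returned mapping stands for the new storage locations that would be written back into each item's index tag. `place_after_migration`, the `location` field and the device names are hypothetical.

```python
def place_after_migration(item_ids: list[str], hot: set[str],
                          devices: list[str]) -> dict[str, str]:
    """Map each migrated item to a device, putting all hot spot data on devices[0]."""
    if len(devices) < 2:
        raise ValueError("need one hot spot device and at least one other device")
    hot_device, others = devices[0], devices[1:]
    placement: dict[str, str] = {}
    i = 0
    for item in item_ids:
        if item in hot:
            placement[item] = hot_device
        else:
            placement[item] = others[i % len(others)]
            i += 1
    return placement


# Write the new storage location back into each (hypothetical) index tag.
index_tags: dict[str, dict] = {"a": {}, "b": {}, "c": {}, "d": {}}
new_places = place_after_migration(["b", "c", "a", "d"], {"b", "c"},
                                   ["ssd0", "hdd0", "hdd1"])
for item, device in new_places.items():
    index_tags[item]["location"] = device
```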
The specific working procedure is as follows:
When unstructured data is stored, an index tag is established for it so that the stored data can easily be searched, read and migrated. Without such tags the stored data would become disordered, which would hurt storage compression efficiency and make the data hard to consult and use. The indexed unstructured data is then analyzed and split, and the identical and differing parts of the resulting data blocks are determined. According to the differences between the data blocks, the identical parts are stored once in the redundant partition, and all of the differing parts of the split blocks are stored in the total storage area. This effectively reduces the redundancy in the stored unstructured data and its total size, which improves storage compression efficiency, reduces wasted storage space, and makes the data easier to identify, consult and migrate. When stored unstructured data is needed, its identical parts and differing parts are fetched from the redundant partition and the total storage area according to the index tag and combined in the order recorded in the index tag to recover the original data. The index tag also records the label of each identical part that the data uses in the redundant partition; because many groups of identical-part data are stored there, this prevents the wrong identical-part data from being fetched, which would otherwise disrupt normal use of the unstructured data.
When an index tag is established for unstructured data, the check information of the data is calculated at the same time and added to the index tag. When the data is later used, the identical-part and differing-part data fetched from the redundant partition and the total storage area are combined and then verified against the check information recorded in the index tag. This ensures that the data being used is complete and error-free, and avoids the situation in which the reassembled blocks fail to reproduce the original data, leaving the data altered, impossible to restore to its initial state, and unusable.
When an index tag is established for unstructured data, the type of the data, such as picture, audio, video or text, is recorded in the index tag. The data is then analyzed and split into data blocks, which are stored in the corresponding sub-partitions of the redundant partition or the total storage area according to the type information in the index tag. Unstructured data of the same type therefore ends up in the same partition, which speeds up retrieval and reading of the stored data and shortens the response time when the data is used.
Providing a main tag and sub tags gives the index tags of the unstructured data a backup against which they can be cross-checked, so that interference with an index tag does not leave the stored data disordered, lost or unusable. Because the sub tags are created at fixed time intervals, an index tag is first looked up in a sub tag when the data is used. Each sub tag is much smaller than the main tag, so this avoids the longer search time of searching the main tag directly. Since the sub tags are created in time order, they also make it easier to locate an index tag, which further speeds up searching for and using the data.
When unstructured data is migrated, the data blocks in the redundant partition and the total storage area are numbered, so that at any point during migration it is known which blocks have been transmitted and which have not. Because every data block carries a migration number, if blocks are lost to interference during migration, the positions of the lost blocks can be determined from their migration numbers and those blocks retransmitted; migration can be interrupted and resumed at any time and still complete normally. Moreover, lost blocks are located directly from their migration numbers without searching the index tag, which saves the time and computing power such a search would cost, speeds up migration and ensures the integrity of the migrated data.
Because interference during data migration cannot be controlled, data blocks may be lost at any point during transmission. Checking the migrated data against the check information and the migration numbers partway through the migration detects lost or erroneous data among the already-transmitted unstructured data in good time so that it can be retransmitted. This ensures the integrity of the migrated data, corrects errors quickly and improves migration efficiency.
Hot spot data and normal data are distinguished by how often and how recently the stored unstructured data is used in daily operation. Hot spot data has a high number of uses and a long usage time and is relatively important to users, so it is migrated first and reaches the target storage area before the normal data. This shortens the time for which hot spot data is tied up by migration, so users can keep using it normally. In addition, because the quality of the transmission link may fluctuate over time, migrating hot spot data early reduces the chance that it is damaged or lost in transit, ensuring that users can continue to use the stored data normally.
After the unstructured data has been migrated, the hot spot data is stored together in the same storage device or storage area, and the recorded storage location of the corresponding data is updated. This increases the access speed of the migrated hot spot data, reduces the latency users experience when using it and improves the user experience.
The foregoing has shown and described the basic principles, main features and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above, which, together with the description, merely illustrate its principles, and that various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (8)

1. A method for storing, migrating and identifying unstructured data, characterized in that the method comprises the following steps:
S1: obtaining unstructured data, establishing an index tag for the unstructured data, and storing the index tag in a guide partition;
S2: analyzing the unstructured data for which the index tag has been established, determining the identical parts and the differing parts of the unstructured data, splitting the data accordingly, and storing the split unstructured data in a storage area;
S3: storing the identical parts of the split unstructured data in a redundant partition of the storage area, and storing the differing parts in a total storage area of the storage area;
wherein only one copy of a part that is identical across different unstructured data items is stored in the redundant partition, mutually independent labels are assigned to the groups of identical-part data from different sources stored in the redundant partition, and a backup of the redundant partition is kept in the storage area.
2. The method for storing, migrating and identifying unstructured data according to claim 1, wherein the index tag comprises check information for verifying the integrity of the unstructured data, the check information including, but not limited to, the MD5, SHA1 and CRC32 values of the unstructured data, and whenever the unstructured data is stored, accessed or used it is verified against the check information recorded in its index tag.
3. The method for storing, migrating and identifying unstructured data according to claim 1, wherein the index tag established for the unstructured data in step S1 includes type information; the redundant partition and the total storage area are each divided into sub-areas corresponding to different types of unstructured data; and after the unstructured data is analyzed and split, each resulting data block retains its recorded type information and is stored in the corresponding sub-area.
4. The method for storing, migrating and identifying unstructured data according to claim 1, wherein the index tag comprises a main tag and sub tags: the main tag stores the index tags of all currently stored unstructured data, sub tags are generated at a fixed time interval, and each sub tag stores the index tags of the unstructured data stored during the corresponding time interval.
5. The method for storing, migrating and identifying unstructured data according to claim 2, wherein, when the unstructured data is migrated, the index tag, the data in the redundant partition and the data in the total storage area are migrated in that order; the migration numbering is performed before the data blocks in the redundant partition and the total storage area are migrated, and the migration-number data is migrated after the index tag.
6. The method for storing, migrating and identifying unstructured data according to claim 5, wherein the migrated unstructured data is verified at a fixed time interval during the migration of the unstructured data.
7. The method for storing, migrating and identifying unstructured data according to claim 5, wherein the unstructured data is divided into hot spot data and common data according to the number of times and the length of time it is used, and the hot spot data is transmitted preferentially during migration.
8. The method for storing, migrating and identifying unstructured data according to claim 7, wherein, after migration, the hot spot data is stored in the same storage device or storage area.
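Steps S2 and S3 of claim 1, together with the per-type sub-areas of claim 3, can be illustrated with a small deduplicating store. This is a sketch under assumptions, not the claimed method itself: the patent does not say how identical portions are detected, so fixed-size chunking with SHA-256 fingerprints is a swapped-in technique, a chunk counts as "common" when it occurs in more than one object of the batch, and `CHUNK_SIZE`, `IndexTag`, `StorageArea`, `store_batch`, `restore` and `backup_redundant` are all hypothetical names.

```python
import copy
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # hypothetical; tiny so the example is easy to follow

@dataclass
class IndexTag:
    object_id: str
    data_type: str  # type information (claim 3)
    chunks: list    # ordered (area, fingerprint) pairs needed to rebuild the object

@dataclass
class StorageArea:
    # Each partition is split into per-type sub-areas: type -> fingerprint -> bytes
    redundant: dict = field(default_factory=lambda: defaultdict(dict))
    total: dict = field(default_factory=lambda: defaultdict(dict))

def split_chunks(data: bytes) -> list:
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()

def store_batch(objects: dict, storage: StorageArea) -> dict:
    """objects maps object_id -> (data_type, bytes); returns the guide partition."""
    # Pass 1: record which objects contain each chunk
    owners = defaultdict(set)
    for obj_id, (_, data) in objects.items():
        for chunk in split_chunks(data):
            owners[fingerprint(chunk)].add(obj_id)
    # Pass 2: shared chunks go to the redundant partition, the rest to the total storage area
    guide = {}
    for obj_id, (dtype, data) in objects.items():
        layout = []
        for chunk in split_chunks(data):
            fp = fingerprint(chunk)
            area = "redundant" if len(owners[fp]) > 1 else "total"
            partition = storage.redundant if area == "redundant" else storage.total
            partition[dtype].setdefault(fp, chunk)  # keep only one copy per sub-area
            layout.append((area, fp))
        guide[obj_id] = IndexTag(obj_id, dtype, layout)
    return guide

def restore(tag: IndexTag, storage: StorageArea) -> bytes:
    parts = []
    for area, fp in tag.chunks:
        partition = storage.redundant if area == "redundant" else storage.total
        parts.append(partition[tag.data_type][fp])
    return b"".join(parts)

def backup_redundant(storage: StorageArea) -> dict:
    """Snapshot of the redundant partition, which claim 1 says is backed up."""
    return copy.deepcopy(storage.redundant)
```

With two documents that share a header and body but differ in their last four bytes, the two shared chunks land once in `redundant["doc"]` and each distinct tail lands in `total["doc"]`. Content-defined chunking would tolerate insertions better than fixed-size chunking; fixed-size is used only to keep the sketch short.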
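Claim 2 names MD5, SHA1 and CRC32 as check information recorded in the index tag. A minimal sketch of computing and checking those values with Python's standard `hashlib` and `zlib` modules follows. The index-tag layout (a dict with a `"check"` entry) and the function names `make_check_info` and `verify` are hypothetical; the claim only says which values are recorded and when they are checked.

```python
import hashlib
import zlib

def make_check_info(data: bytes) -> dict:
    """Compute the check values named in claim 2 for one data item."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        # zlib.crc32 already returns an unsigned value on Python 3; the mask is defensive
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
    }

def verify(data: bytes, index_tag: dict) -> bool:
    """Check data against whatever check values its (hypothetical) index tag records."""
    recorded = index_tag.get("check", {})
    actual = make_check_info(data)
    # An empty record, or any unknown or mismatching value, fails verification
    return bool(recorded) and all(actual.get(k) == v for k, v in recorded.items())
```

Because the claim says "including but not limited to", `verify` checks only the values actually present in the tag, so a tag that records just a CRC32 is still usable.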
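The main tag and time-interval sub tags of claim 4 can be modeled as two maps over the same index tags. In this sketch (an illustration, not the patent's data format) timestamps are integer seconds, the interval length is supplied by the caller, and `TagIndex`, `add`, `sub_tag` and `rebuild_main` are hypothetical names. Rebuilding the main tag from the sub tags is one plausible use of the redundancy, not something the claim states.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class TagIndex:
    interval_s: int  # fixed sub-tag interval length in seconds (caller's choice)
    main: dict = field(default_factory=dict)  # object_id -> index tag, for all stored data
    subs: dict = field(default_factory=lambda: defaultdict(dict))  # interval start -> {object_id: tag}

    def interval_start(self, ts: int) -> int:
        return ts - ts % self.interval_s

    def add(self, object_id: str, tag: dict, stored_at: int) -> None:
        """Record a tag in the main tag and in the sub tag of its time interval."""
        self.main[object_id] = tag
        self.subs[self.interval_start(stored_at)][object_id] = tag

    def sub_tag(self, ts: int) -> dict:
        """Index tags of data stored in the interval that contains ts."""
        return dict(self.subs.get(self.interval_start(ts), {}))

    def rebuild_main(self) -> dict:
        """Reconstruct the main tag from the sub tags, oldest interval first."""
        merged = {}
        for start in sorted(self.subs):
            merged.update(self.subs[start])
        return merged
```

A query for "what was stored between 01:00 and 02:00" then touches only one sub tag instead of scanning the full main tag.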
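The migration order of claim 5 and the periodic verification of claim 6 can be sketched as a plan followed by a copy loop. Assumptions, all labeled: "migration numbers" are read as sequence numbers assigned to every data block before any block moves, a plain dict stands in for the target storage area, and a SHA-256 digest of `repr(payload)` stands in for a real integrity check (since claim 5 depends on claim 2, the MD5/SHA1/CRC32 values in the index tag would be the natural choice in practice). `plan_migration`, `migrate` and `_digest` are hypothetical names.

```python
import hashlib
import time

def _digest(obj) -> str:
    # repr() keeps the sketch simple; a real system would hash the stored bytes
    return hashlib.sha256(repr(obj).encode()).hexdigest()

def plan_migration(index_tags: dict, redundant: dict, total: dict) -> list:
    """Order: index tags, migration numbers, redundant blocks, total-area blocks."""
    blocks = [("redundant", k) for k in redundant] + [("total", k) for k in total]
    # Migration numbers are assigned before any data block is moved
    numbering = {slot: n for n, slot in enumerate(blocks)}
    plan = [(("index", k), v) for k, v in index_tags.items()]
    plan.append((("numbering", None), numbering))
    plan += [(("redundant", k), redundant[k]) for k in redundant]
    plan += [(("total", k), total[k]) for k in total]
    return plan

def migrate(plan: list, target: dict, interval_s: float = 5.0, clock=time.monotonic) -> list:
    """Copy each step into target, verifying migrated items every interval_s seconds."""
    pending, failures = [], []

    def verify_pending():
        for slot, expected in pending:
            if _digest(target.get(slot)) != expected:
                failures.append(slot)
        pending.clear()

    last_check = clock()
    for slot, payload in plan:
        target[slot] = payload  # stand-in for the real transfer to the target storage area
        pending.append((slot, _digest(payload)))
        if clock() - last_check >= interval_s:
            verify_pending()
            last_check = clock()
    verify_pending()  # final check for anything migrated since the last interval
    return failures
```

`migrate` returns the slots that failed verification, so a caller could re-send just those blocks. Passing a custom `clock` makes the verification interval testable without waiting in real time.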
CN202310374828.2A 2023-04-10 2023-04-10 Unstructured data storage, migration and identification method Pending CN116627320A (en)

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN202310374828.2A | CN116627320A | 2023-04-10 | 2023-04-10 | Unstructured data storage, migration and identification method

Publications (1)

Publication Number | Publication Date
CN116627320A | 2023-08-22

Family

ID=87596260

Country Status (1)

Country | Publications
CN (1) | CN116627320A

Similar Documents

Publication Title
CN110531940B (en) Video file processing method and device
CN109710572B (en) HBase-based file fragmentation method
CN106874348B (en) File storage and index method and device and file reading method
CN104077380B (en) A kind of data de-duplication method, apparatus and system
CN105824846B (en) Data migration method and device
US11176110B2 (en) Data updating method and device for a distributed database system
CN113111129A (en) Data synchronization method, device, equipment and storage medium
CN113568582B (en) Data management method, device and storage equipment
CN107665219A (en) A kind of blog management method and device
CN110727406A (en) Data storage scheduling method and device
CN111367926A (en) Data processing method and device for distributed system
CN104965835B (en) A kind of file read/write method and device of distributed file system
CN114860745B (en) Database expansion method based on artificial intelligence and related equipment
CN109947730A (en) Metadata restoration methods, device, distributed file system and readable storage medium storing program for executing
CN113448946B (en) Data migration method and device and electronic equipment
WO2021238408A1 (en) Object storage platform, object aggregation method and apparatus, and server
CN110515958A (en) Data consistency method, apparatus, equipment and storage medium based on big data
CN113553325A (en) Synchronization method and system for aggregation objects in object storage system
CN112965939A (en) File merging method, device and equipment
CN109542860B (en) Service data management method based on HDFS and terminal equipment
CN108959527B (en) Method for reading and displaying interlocking log based on Windows file mapping technology
CN108021562B (en) Disk storage method and device applied to distributed file system and distributed file system
US20220187990A1 (en) Interrupted replicated write recognition
CN114491145A (en) Metadata design method based on stream storage

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination