CN117891653A - Method, device and system for transmitting HBase data and tape library data mutually - Google Patents


Info

Publication number
CN117891653A
Authority
CN
China
Prior art keywords
package
data
tar
tape library
index number
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311674689.1A
Other languages
Chinese (zh)
Inventor
霍建军
戚翯
刘鸿新
张会君
杨立峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Postal Savings Bank of China Ltd
Original Assignee
Postal Savings Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Postal Savings Bank of China Ltd filed Critical Postal Savings Bank of China Ltd
Priority to CN202311674689.1A priority Critical patent/CN117891653A/en
Publication of CN117891653A publication Critical patent/CN117891653A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, a device and a system for mutually transmitting HBase data and tape library data. The method for taking data offline from HBase to the tape library comprises the following steps: scanning a metadata table, and acquiring from the HBase cluster whole table, according to the transaction date, a file index number list of the data to be taken offline; traversing the list first by business system number and then by region code, downloading the data of the same region code into the same local directory, compressing the downloaded data into a tar package each time a preset size is reached, and recording the correspondence between file index numbers and tar packages, wherein the name elements of the tar package comprise a packing date, a business system number, a region code and a same-day packing sequence number; moving the tar package to a backup directory, and calling a tape library command on a schedule to back up the tar package to the tape library; and storing the correspondence between file index numbers and tar packages in the metadata database Elasticsearch, and updating the life cycle state of the metadata. The method reduces interaction with the tape library and improves the efficiency of exporting and recovering offline data.

Description

Method, device and system for transmitting HBase data and tape library data mutually
Technical Field
The application relates to the technical field of data storage, in particular to a method, a device and a system for mutually transmitting HBase data and tape library data.
Background
Unstructured data, as opposed to structured data, is data that cannot conveniently be represented in the two-dimensional logical tables of a database. It includes office documents in all formats, text, pictures, XML, HTML, reports of various kinds, images, and audio/video information. With the advent of the digital age, data has grown explosively, and more than 80% of it exists in unstructured form. The vast unstructured data generated during business operations and transaction handling contains a great deal of valuable information and knowledge. Enterprises therefore build big data platforms to store and manage massive data.
According to storage period and access frequency, data is classified into three types: hot data, warm data and cold data. Hot data is data that is accessed frequently and is critical to business and applications; it generally needs to be accessed and processed quickly and efficiently. Warm data is data with a moderate access frequency and a certain importance to business and applications; it does not need to be accessed and processed as quickly as hot data, but it must still be stored and accessed reliably for a certain period of time. Cold data is data that is accessed rarely, is less important to business and applications, and is typically stored for a long time; it is seldom active and may never be accessed at all, yet it must still be retained long term. For example, under the national accounting archive management measures, vouchers and account books must be kept for more than 30 years, and such data is rarely accessed cold data.
Hadoop is an open-source technical architecture that replaces traditional high-end storage solutions with clusters of many inexpensive PC servers and low-end arrays, and it is a mainstream architecture for building big data platforms. HBase stands for Hadoop Database and is a sub-project of the Apache Hadoop project. HBase is often used as a storage scheme for massive numbers of small files to hold warm data.
The tape library is a tape-based backup system which, like an autoloading tape drive, can provide functions such as continuous backup and automatic tape searching. Its storage capacity reaches the PB level (1 PB = 1 million GB). With the support of management software it can also provide intelligent recovery, real-time monitoring and statistics, and it is a main device for centralized network data backup. Its large storage capacity, low price and low energy consumption give the tape library an advantage in cold data storage scenarios.
To guarantee online storage and viewing of the warm data of the business systems, the cold data of each business system needs to be exported offline to the tape library at regular intervals. With existing tape library import backup, however, massive numbers of scattered small files must be read and stored one after another, so the tape is read and written frequently, the magnetic powder on the tape wears excessively, and the service life of each tape is seriously shortened. In addition, to keep the stored data safe, the tape library generally makes multiple copies, so the metadata stored for the massive small files becomes extremely large, and overall management and reading become increasingly difficult over time.
Disclosure of Invention
The embodiments of the application provide a method, a device and a system for mutually transmitting HBase data and tape library data, which reduce interaction with the tape library and make it efficient both to take data offline from HBase to the tape library and to recover data from the tape library to HBase.
According to a first aspect of the present application, a method of taking data offline from an HBase to a tape library is presented, the method being performed by a tape library archiving server, comprising the steps of:
scanning a metadata table, and acquiring from the HBase cluster whole table, according to the transaction date, a file index number list of the data to be taken offline;
traversing the file index number list first by business system number and then by region code, downloading the data to be taken offline of the same region code into the same local directory, compressing the downloaded data into a tar package each time a preset size is reached during downloading, and recording the correspondence between file index numbers and tar packages; the package name elements of the tar package comprise a packing date, a business system number, a region code and a same-day packing sequence number;
moving the obtained tar package to a backup directory, and calling a tape library command on a schedule to back up the tar package to the tape library;
and storing the correspondence between file index numbers and tar packages in the metadata database Elasticsearch, and updating the life cycle state of the metadata.
According to a second aspect of the present application, there is provided an apparatus for taking data offline from an HBase to a tape library, the apparatus being provided at a tape library archiving server, comprising:
the offline list obtaining unit is used for scanning the metadata table and acquiring from the HBase cluster whole table, according to the transaction date, a file index number list of the data to be taken offline;
the tar package compression unit is used for traversing the file index number list first by business system number and then by region code, downloading the file index number data of the same region code into the same local directory, compressing the downloaded data into a tar package each time a preset size is reached during downloading, and recording the correspondence between file index numbers and tar packages; the package name elements of the tar package comprise a packing date, a business system number, a region code and a same-day packing sequence number;
the tar package backup unit is used for moving the obtained tar package to a backup directory, and calling a tape library command on a schedule to back up the tar package to the tape library;
and the correspondence storage unit is used for storing the correspondence between file index numbers and tar packages in the metadata database Elasticsearch and updating the life cycle state of the metadata.
According to a third aspect of the present application, there is provided a method of recovering data from a tape library to an HBase, the method being performed by a tape library recovery server, comprising the steps of:
scanning the metadata table according to an offline retrieval request from a business system, to obtain a file index number list of the data to be recovered;
looking up, according to the file index number list, the correspondence between file index numbers and tar packages in the metadata database Elasticsearch, and obtaining the package names of the corresponding tar packages; the package name elements of the tar package comprise a packing date, a business system number, a region code and a same-day packing sequence number;
generating a recovery control file according to the package names of the tar packages, calling a tape library command to make the tape library recover the tar packages, and receiving the returned tape library recovery log;
parsing the tape library recovery log, and fetching the recovered tar packages from a specified location according to the parsed log;
decompressing and checking the recovered tar packages, extracting the data of the retrieved file index numbers, and uploading the data to the HBase cluster.
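The lookup and control-file steps above can be illustrated with a minimal sketch. It assumes the correspondence has already been fetched from the metadata database into a plain dictionary; the function names, the sample identifiers and the one-package-name-per-line control file layout are hypothetical, since the application does not specify the control file format.

```python
from collections import defaultdict


def plan_recovery(requested_rowkeys, rowkey_to_tar):
    """Group requested file index numbers by the tar package that holds them."""
    by_package = defaultdict(list)
    missing = []
    for rowkey in requested_rowkeys:
        tar_name = rowkey_to_tar.get(rowkey)
        if tar_name is None:
            missing.append(rowkey)  # no correspondence recorded for this file
        else:
            by_package[tar_name].append(rowkey)
    return dict(by_package), missing


def control_file_text(by_package):
    """Hypothetical control file layout: one tar package name per line."""
    return "\n".join(sorted(by_package)) + "\n"


# Illustrative data only; real values would come from the metadata database.
rowkey_to_tar = {
    "rk-001": "20231207_10000000001_11_1.tar",
    "rk-002": "20231207_10000000001_11_1.tar",
    "rk-003": "20231207_10000000001_12_1.tar",
}
plan, missing = plan_recovery(["rk-001", "rk-002", "rk-003", "rk-999"], rowkey_to_tar)
print(control_file_text(plan), end="")
print("unmapped:", missing)
```

Grouping by package name means each tar package is requested from the tape library only once, however many of its files are being retrieved.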
According to a fourth aspect of the present application, there is provided an apparatus for recovering data from a tape library to an HBase, the apparatus being applied to a tape library recovery server, comprising:
the to-be-recovered list acquisition unit is used for scanning the metadata table according to an offline retrieval request from a business system, to obtain a file index number list of the data to be recovered;
the correspondence lookup unit is used for looking up, according to the file index number list, the correspondence between file index numbers and tar packages in the metadata database Elasticsearch, and obtaining the package names of the corresponding tar packages; the package name elements of the tar package comprise a packing date, a business system number, a region code and a same-day packing sequence number;
the tar package recovery unit is used for generating a recovery control file according to the package names of the tar packages, calling a tape library command to make the tape library recover the tar packages, and receiving the returned tape library recovery log;
the tar package obtaining unit is used for parsing the tape library recovery log and fetching the recovered tar packages from a specified location according to the parsed log;
and the retrieved data uploading unit is used for decompressing and checking the recovered tar packages, extracting the data of the retrieved file index numbers, and uploading the data to the HBase cluster.
According to a fifth aspect of the present application, a system for mutual transmission of HBase data and tape library data is provided, the system comprising a tape library archiving server and a tape library recovery server, wherein:
the tape library archiving server is used for scanning the metadata table and acquiring from the HBase cluster whole table, according to the transaction date, a file index number list of the data to be taken offline; traversing the file index number list first by business system number and then by region code, downloading the file index number data of the same region code into the same local directory, compressing the downloaded data into a tar package each time a preset size is reached during downloading, and recording the correspondence between file index numbers and tar packages, wherein the package name elements of the tar package comprise a packing date, a business system number, a region code and a same-day packing sequence number; moving the obtained tar package to a backup directory, and calling a tape library command on a schedule to back up the tar package to the tape library; storing the correspondence between file index numbers and tar packages in the metadata database Elasticsearch, and updating the life cycle state of the metadata;
the tape library recovery server is used for scanning the metadata table according to an offline retrieval request from a business system, to obtain a file index number list of the data to be recovered; looking up, according to the file index number list, the correspondence between file index numbers and tar packages in the metadata database Elasticsearch, and obtaining the package names of the corresponding tar packages; generating a recovery control file according to the package names of the tar packages, calling a tape library command to make the tape library recover the tar packages, and receiving the returned tape library recovery log; parsing the tape library recovery log, and fetching the recovered tar packages from a specified location according to the parsed log; decompressing and checking the recovered tar packages, extracting the data of the retrieved file index numbers, and uploading the data to the HBase cluster.
Any one of the above technical schemes adopted in the embodiment of the application can achieve the following beneficial effects:
When data is taken offline from HBase to the tape library, the tape library archiving server first scans the metadata table and acquires, according to the transaction date, the file index number list of the data to be taken offline from the HBase cluster whole table, which meets the efficiency requirement of data export. It then traverses the file index number list first by business system number and then by region code, downloads the data to be taken offline of the same region code into the same local directory, compresses the downloaded data into a tar package each time a preset size is reached, and records the correspondence between file index numbers and tar packages, the package name elements of the tar package comprising a packing date, a business system number, a region code and a same-day packing sequence number. This reduces the number of read-write interactions with the tape library and ensures the timeliness of each backup. Moreover, by adopting a whole-data backup principle and combining the export and import service with the distributed download jobs of the sub-centers, a globally unique tar package naming specification is established, which optimizes the performance of importing offline packages into and exporting them from the tape library, effectively supports the subsequent search and recovery of offline packages, and improves the tape library's ability to locate offline packages quickly. Next, the obtained tar package is moved to a backup directory and a tape library command is called on a schedule to back up the tar package to the tape library, so that data is taken offline from HBase to the tape library efficiently. Finally, the correspondence between file index numbers and tar packages is stored in the metadata database Elasticsearch and the life cycle state of the metadata is updated. This adds the corresponding metadata management, so that when the tape library recovery server recovers data from the tape library to HBase, the tar package can be located quickly from the file index number; and when the tape library is asked to recover an offline package by the package name of the tar package, query time is reduced and offline data recovery becomes more efficient and convenient.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow chart of a method for offline data from HBase to a tape library according to an embodiment of the present application;
FIG. 2 illustrates a flow chart of taking data offline from an HBase to a tape library in an embodiment of the present application;
FIG. 3 illustrates a naming convention diagram of a tar package of an embodiment of the present application;
FIG. 4 is a schematic functional structure of an apparatus for offline data from HBase to a tape library according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for recovering data from a tape library to an HBase according to an embodiment of the present application;
FIG. 6 illustrates a flow chart of recovering data from a tape library to an HBase in an embodiment of the present application;
FIG. 7 is a schematic functional structure of an apparatus for recovering data from a tape library to an HBase according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a composition structure of a system for mutual transmission of HBase data and tape library data according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will be made in detail and with reference to the accompanying drawings in the embodiments of the present application, it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present application and in the foregoing figures, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
Example 1
According to a first aspect of the present application, referring to fig. 1 and 2, an embodiment of the present application proposes a method for offline data from HBase to a tape library, the method being performed by a tape library archiving server, comprising steps S11 to S14:
S11, scanning the metadata table, and acquiring a file index number list of the data to be offline from the HBase cluster whole table according to the transaction date.
The basic process of backing up offline data relies on the backup modes provided by HBase, for example the HBase cluster commands for whole-table copy (DistCp/CopyTable), the HBase cluster snapshot function (Snapshot), and the data export and import API provided by the HBase cluster (Export/Import). Every day, the data to be taken offline that matches the rules is automatically fetched from HBase cluster storage and exported, according to the offline data organization rules, to an offline data preparation area with one directory per business system. The corresponding tape library command is then called to package and back up the offline data, which ensures that the exported data is complete and that file metadata is managed effectively.
To start step S11, jobs are scheduled using an ETL (Extract, Transform, Load) scheduling policy, which triggers the offline export task. Metadata such as offline task numbers and offline packages can be stored in a NoSQL database (a general term for non-relational databases) or a relational database, and offline task management is integrated into the web interface.
The file index number list is the list of file index numbers of the files concerned. A file index number uniquely identifies a file within the system. For the HBase database, the file index number is the file's Rowkey, the unique number under which the file is stored in HBase; this number is also stored in the metadata table.
The system's source data is generated at the front ends of the sub-centers and is centrally archived in a single table at the national center. Because the daily data volume grows rapidly, it is especially important to download and export data quickly every day and to reduce the cache space used at the branches. In step S11, the metadata table is scanned and the file index number list of the data to be taken offline is acquired from the HBase cluster whole table according to the transaction date, which meets the efficiency requirement of data export.
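As an illustration of step S11, the following minimal sketch selects file index numbers from in-memory metadata rows by transaction date. The field names, the "online"/"offline" life cycle values and the rule "transaction date on or before a cutoff" are assumptions made for the example; the application only states that the list is obtained according to the transaction date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetadataRow:
    rowkey: str             # file index number (HBase Rowkey)
    system_no: str          # business system number
    region_code: str        # two-digit region code
    transaction_date: date
    lifecycle: str          # hypothetical states: "online" or "offline"


def select_for_offline(rows, cutoff):
    """Return the rowkeys of online rows whose transaction date is on or before cutoff."""
    return [
        row.rowkey
        for row in rows
        if row.lifecycle == "online" and row.transaction_date <= cutoff
    ]


# Illustrative rows only.
rows = [
    MetadataRow("rk-001", "10000000001", "11", date(2023, 1, 5), "online"),
    MetadataRow("rk-002", "10000000001", "12", date(2023, 6, 1), "online"),
    MetadataRow("rk-003", "10000000002", "11", date(2023, 1, 9), "offline"),
]
to_offline = select_for_offline(rows, cutoff=date(2023, 3, 31))
print(to_offline)
```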
S12, traversing the file index number list first by business system number and then by region code, downloading the data to be taken offline of the same region code into the same local directory, compressing the downloaded data into a tar package each time a preset size is reached during downloading, and recording the correspondence between file index numbers and tar packages; the package name elements of the tar package comprise a packing date, a business system number, a region code and a same-day packing sequence number.
In step S12, the data to be taken offline in the distributed cluster is scanned and downloaded, the downloaded files are compressed into offline packages of a preset size, the offline packages are moved to the backup directory, and the related information of each offline package is recorded. Specifically:
The tape library archiving server traverses the file index number list first by business system number and then by region code and finds the task list that this machine is to execute. After acquiring the data to be taken offline that corresponds to this machine according to the task list, it updates the state of the offline archive task table, reads the file index numbers in the task list in a loop, and inserts them into the offline archive file index table. It then downloads the files with multiple threads and updates the state of each file index number at the same time, so that downloaded and not-yet-downloaded files can be told apart. The file index number data of the same region code is placed in the same local directory after downloading, and each thread keeps the size of its folder at the preset size: as soon as the data reaches the preset size it is compressed into an offline package, that is, a tar package, which is then moved to the backup directory, and the correspondence between file index numbers and the tar package is recorded at the same time to make the process traceable. Packing many downloaded small files into a tar package of a preset size reduces the number of read-write interactions with the tape library.
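The size-triggered packing described above can be sketched as follows. This is a single-threaded simplification under stated assumptions: the function name and the name prefix are hypothetical, the archive is written as a plain uncompressed tar for brevity, and the size threshold is tiny so the example runs quickly (the embodiment described later uses 2G).

```python
import tarfile
import tempfile
from pathlib import Path


def pack_by_size(files, out_dir, prefix, limit_bytes):
    """Pack files into tar packages, closing a package once its payload reaches limit_bytes.

    Returns a mapping from file name (standing in for the file index number)
    to the name of the tar package that contains it.
    """
    out_dir = Path(out_dir)
    mapping = {}
    batch, batch_size, seq = [], 0, 1

    def flush():
        nonlocal batch, batch_size, seq
        if not batch:
            return
        tar_name = f"{prefix}_{seq}.tar"
        with tarfile.open(out_dir / tar_name, "w") as tar:
            for path in batch:
                tar.add(path, arcname=path.name)
                mapping[path.name] = tar_name  # record the correspondence
        batch, batch_size, seq = [], 0, seq + 1

    for path in map(Path, files):
        batch.append(path)
        batch_size += path.stat().st_size
        if batch_size >= limit_bytes:
            flush()
    flush()  # pack whatever remains below the threshold
    return mapping


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "region_11")
    out = Path(tmp, "backup")
    src.mkdir()
    out.mkdir()
    files = []
    for i in range(5):
        path = src / f"rk-{i:03d}"
        path.write_bytes(b"x" * 400)  # five dummy files of 400 bytes each
        files.append(path)
    mapping = pack_by_size(files, out, "20231207_10000000001_11", limit_bytes=1000)
    packages = sorted(p.name for p in out.iterdir())

print(packages)
```

With five 400-byte files and a 1000-byte threshold, the first package closes after the third file and the remaining two files form a second package.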
In step S12, taking into account the current state of the business systems and the deployment architecture, a whole-data backup principle is adopted and the export and import service is combined with the distributed download jobs of the sub-centers to establish a globally unique naming specification for tar packages. A tar package is named from four elements, namely the packing date, the business system number, the region code and the same-day packing sequence number, each of which carries a specific meaning. This optimizes the performance of importing offline packages into and exporting them from the tape library, effectively supports the subsequent search and recovery of offline packages, and improves the tape library's ability to locate offline packages quickly.
Specifically, referring to fig. 3, the package name of the tar package is composed of 4 elements: 8-digit packing date + 11-digit business system number + 2-digit region code + file packing sequence number, with the elements separated by underscores.
The numbering of each element is illustrated as follows:
(1) The first element is the packing date, representing the time when the offline package was generated.
The format is YYYY/MM/DD: the year, month and day are taken from the actual time at which the files were downloaded and packed.
(2) The second element is the business system number, indicating which business system the data belongs to.
The different business systems within the bank are distinguished from one another: a unique system code is planned uniformly for each business system, so that each business system has its own independent 11-digit system number.
(3) The third element is the region code, which identifies the sub-center in which the data was generated.
The system platform consists of 36 sub-center clusters plus 1 national cluster, and each cluster is assigned a center code that identifies the cluster in which the data was generated. The code is consistent with the nationally unified province codes and takes their first two digits, for example: Beijing-11, Tianjin-12, Hebei-13.
(4) The fourth element is the packing sequence number, indicating which offline package of the day this one is.
The packing sequence number starts from 1 for each region on each day and increases in order. This element has no limit on the number of digits, so business with large data volumes can proceed smoothly.
This tar package naming specification keeps each data download partitioned by region. The size of the exported data can be calculated accurately from the region code and the same-day packing sequence number, which makes subsequent data verification easier, while the packing date and the business system number make subsequent operation and maintenance management, and checking and recovery by the business systems, more convenient.
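The naming specification can be illustrated with a minimal sketch that builds and parses a package name. It assumes that the 8-digit packing date is written without separators (YYYYMMDD) in the file name and that the package carries a `.tar` suffix; the function names and the sample business system number are hypothetical.

```python
from datetime import date
from typing import NamedTuple


class TarName(NamedTuple):
    pack_date: date
    system_no: str
    region_code: str
    seq: int


def build_tar_name(pack_date, system_no, region_code, seq):
    """Join the four name elements with underscores."""
    if len(system_no) != 11:
        raise ValueError("business system number must have 11 characters")
    if len(region_code) != 2 or not region_code.isdigit():
        raise ValueError("region code must be two digits")
    if seq < 1:
        raise ValueError("packing sequence number starts from 1")
    return f"{pack_date:%Y%m%d}_{system_no}_{region_code}_{seq}.tar"


def parse_tar_name(name):
    """Split a package name back into its four elements."""
    stem = name[:-4] if name.endswith(".tar") else name
    day, system_no, region_code, seq = stem.split("_")
    pack_date = date(int(day[:4]), int(day[4:6]), int(day[6:8]))
    return TarName(pack_date, system_no, region_code, int(seq))


# Third package of the day for region 11 (Beijing) of a made-up business system.
name = build_tar_name(date(2023, 12, 7), "10000000001", "11", 3)
print(name)
print(parse_tar_name(name))
```

Because the sequence number is the last element and has no fixed width, it can grow beyond one digit without affecting how the other elements are parsed.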
Step S12 implements packing per sub-center. Threads can be configured flexibly, which guarantees the download speed while the file index numbers (the unique retrieval identifiers of files) are traversed and improves the overall efficiency of taking data offline. In addition, under the per-sub-center packing rule, offline packages are downloaded by region code and the file index information corresponding to each offline package is generated per directory. During subsequent data verification, each sub-center job therefore checks its own directory files against its offline packages one by one in sequence, and if a problem is found, the offline package is downloaded again for that single sub-center only, which guarantees the integrity of the data as a whole.
S13, moving the obtained tar package to a backup directory, and calling a tape library command on a schedule to back up the tar package to the tape library.
The tape library data import may combine virtual tape library tape replication with command forwarding, so that the virtual tape serves as a write cache for physical tape backup. The backup server first backs up the data to the virtual tape. After the backup completes, the virtual tape device, triggered by policy, imports the data in its memory into the physical tape, and once the import completes the backup data in the memory can be deleted according to policy. After the data has been copied to the physical tape, the virtual tape device forwards all of the backup server's read and write operations on the virtual tape directly to the physical tape, to keep the tape data consistent.
In step S13, after the obtained tar package has been moved to the backup directory, the tape library command is called on a schedule according to the offline export task and the tar package is backed up to the tape library, so that data is taken offline from HBase to the tape library efficiently. After step S13, the tape library monitoring command is additionally called to query whether the backup has completed. If it has, the states in the offline archive file index table and the offline archive task table are updated to "offline completed", so that the data already taken offline can be identified for deletion from HBase.
S14, storing the correspondence between file index numbers and tar packages in the metadata database Elasticsearch, and updating the life cycle state of the metadata.
In step S14, the corresponding metadata management is added, that is, the correspondence between file index numbers and the exported offline packages is stored in the metadata database. In addition, because warm data and cold data have different storage periods, the life cycle state of the metadata is updated once warm data has been converted into cold data for storage. Elasticsearch is a distributed open-source search and analytics engine developed on the basis of Apache Lucene. Lucene is an open-source search engine toolkit; Elasticsearch makes full use of Lucene and extends it, making storage, indexing and searching faster and easier and, most importantly, making everything flexible and scalable, as the word "elastic" in its name suggests. Moreover, application code does not have to be written in Java to work with Elasticsearch: an Elasticsearch cluster can be indexed, searched and managed entirely through HTTP requests carrying JSON. Therefore, by using the Elasticsearch database in step S14, the offline package can be located quickly from the file index number during data recovery, which improves the efficiency of recovering backup data.
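As a hedged illustration of how such a correspondence could be expressed as JSON for Elasticsearch, the sketch below builds a bulk indexing body and a lookup query without contacting any cluster. The index name, the field names and the "offline" life cycle value are hypothetical, and the lookup assumes that `file_index_no` is mapped as a keyword field so that the `terms` query matches exact values.

```python
import json

INDEX = "offline_file_mapping"  # hypothetical index name


def bulk_body(mapping, index=INDEX):
    """Build a newline-delimited bulk request body: one action line plus one document per file."""
    lines = []
    for rowkey, tar_name in mapping.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": rowkey}}))
        lines.append(json.dumps({
            "file_index_no": rowkey,
            "tar_package": tar_name,
            "lifecycle": "offline",  # hypothetical life cycle state value
        }))
    return "\n".join(lines) + "\n"


def lookup_query(rowkeys):
    """Build a search body that returns the tar package for each requested file index number."""
    return {
        "query": {"terms": {"file_index_no": list(rowkeys)}},
        "_source": ["file_index_no", "tar_package"],
    }


body = bulk_body({"rk-001": "20231207_10000000001_11_1.tar"})
query = lookup_query(["rk-001", "rk-002"])
print(body, end="")
print(json.dumps(query))
```

Using the file index number as the document `_id` keeps one document per file, so re-indexing the same file overwrites its earlier correspondence rather than duplicating it.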
According to some embodiments of the present application, the size of the tar package is about 2G.
During data downloading, the data is compressed into a tar package each time 2G has accumulated. Taking 2G of data to be taken offline as the quota for one compressed package meets the specification requirements of the offline tape library storage medium, reduces interaction with the tape library, and ensures the timeliness of each backup. The 2G file size also matches the data granularity of tape writes and reads and keeps the business data packages well filled, which improves efficiency.
According to some embodiments of the present application, the method of the embodiments of the present application further comprises:
during data download, counting the number of exported records of the data to be taken offline, and comparing that number with the number of rows obtained when the Export tool of HBase scans the file index number list, so as to ensure the integrity of the exported data.
To ensure the integrity of the exported data, on the one hand, the Export tool of HBase uses scan to perform a full table scan over the file index number list of the data to be taken offline, repeatedly calling next() internally until all blocks have been read, which yields the number of rows (rowkeys) in the table. On the other hand, the number of exported records of the data to be taken offline is counted during data download and compared with the row count obtained by Export through scan, which ensures that the number of exported records matches the number of rows scanned in the table.
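The numerical comparison can be sketched in a few lines. The function below is a hypothetical helper, not part of HBase: it assumes the scanned row count has already been obtained from the Export tool and that the download loop has counted the records it exported.

```python
def verify_export_count(exported_count, scanned_row_count):
    """Raise if the number of exported records differs from the scanned row count."""
    if exported_count != scanned_row_count:
        raise ValueError(
            f"export incomplete: exported {exported_count} records, "
            f"but the scan found {scanned_row_count} rows"
        )
    return True


exported = 0
for _record in ["F0001", "F0002", "F0003"]:  # stands in for the download loop
    exported += 1

print(verify_export_count(exported, scanned_row_count=3))
```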
According to some embodiments of the present application, the method of the embodiments of the present application further comprises: encrypting each tar package using the SM3 algorithm.
To ensure the integrity of the exported data, the embodiments of the present application also introduce the SM3 algorithm. SM3 is a cryptographic hash algorithm designed independently in China. In commercial cryptographic applications it is suitable for generating and verifying digital signatures, generating and verifying message authentication codes, and generating random numbers, and it can meet the security requirements of a variety of cryptographic applications. For a hash algorithm to be secure, the hash value it produces must not be too short: MD5 outputs a 128-bit hash value, which is short enough to weaken security, and SHA-1 outputs 160 bits, whereas SM3 outputs 256 bits. For any given message, SM3 generates the hash value simply and efficiently.
The SM3 algorithm has the following characteristics: (1) It is not feasible to calculate a message from any given hash value (i.e., the function is one way); (2) it is not feasible to modify the message without modifying the hash value; (3) It is not feasible to find two messages with the same hash value.
Based on these characteristics of the SM3 algorithm, SM3 encryption is applied to each offline package, which effectively ensures the integrity and security of the exported data.
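Because SM3 is a hash algorithm, one way to use it for integrity protection is to record a digest when a package is created and recompute it when the package is restored. The sketch below illustrates that idea under stated assumptions: Python's `hashlib` exposes SM3 only when the underlying OpenSSL build provides it, so the code falls back to SHA-256 (also a 256-bit hash) as a stand-in when SM3 is unavailable. The function names are hypothetical.

```python
import hashlib


def new_digest():
    """Return an SM3 hash object if available, otherwise SHA-256 as a stand-in."""
    try:
        return hashlib.new("sm3")
    except ValueError:
        # SM3 is not offered by this Python/OpenSSL build.
        return hashlib.new("sha256")


def package_digest(data: bytes) -> str:
    """Return the hexadecimal digest of the package contents."""
    h = new_digest()
    h.update(data)
    return h.hexdigest()


def verify_package(data: bytes, expected_hex: str) -> bool:
    """Return True if the contents still match the recorded digest."""
    return package_digest(data) == expected_hex


original = b"tar package bytes"
recorded = package_digest(original)
print(len(recorded) * 4)                          # digest length in bits
print(verify_package(original, recorded))         # unchanged contents
print(verify_package(original + b"x", recorded))  # modified contents
```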
In summary, the method for taking data offline from HBase to a tape library provided in the embodiments of the present application is executed by a tape library archiving server. First, the metadata table is scanned and the file index number list of the data to be taken offline is obtained from the full table of the HBase cluster according to the transaction date, which keeps data export efficient. Next, the file index number list is traversed by business system number and then by region code; the data to be taken offline with the same region code is downloaded and stored in the same local directory, the data is compressed into a tar package each time the preset size is reached during download, and the correspondence between file index numbers and tar packages is recorded. The name of each tar package includes the package date, the business system number, the region code and the package sequence number for the day. This reduces the number of read/write interactions with the tape library. In addition, following the principle of backing up data as a whole, and combining the export and import service with the distributed download jobs of each sub-center, a globally unique tar package naming specification is defined; this optimizes the performance of importing offline packages into and exporting them from the tape library, effectively supports subsequent retrieval and recovery of offline packages, and improves the ability to locate an offline package quickly in the tape library. Then, the resulting tar packages are moved to a backup directory, and a tape library command is invoked periodically to back them up to the tape library, so that data is taken offline from HBase to the tape library efficiently. Finally, the correspondence between file index numbers and tar packages is stored in the metadata database Elasticsearch and the lifecycle state of the metadata is updated; this added metadata management allows an offline package to be located quickly by file index number and improves the efficiency and convenience of offline data recovery.
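The naming specification can be illustrated with a small sketch that builds a package name from its four elements and parses it back. The application lists the elements but not their layout, so the underscore separator, the element order, the four-digit sequence number and the sample values are all assumptions made for this example.

```python
from typing import NamedTuple


class PackageName(NamedTuple):
    package_date: str  # e.g. "20231207"
    system_no: str     # business system number
    region_code: str
    seq_no: int        # sequence number of the package on that day


def format_name(p: PackageName) -> str:
    """Build the tar package file name (assumed layout)."""
    return f"{p.package_date}_{p.system_no}_{p.region_code}_{p.seq_no:04d}.tar"


def parse_name(name: str) -> PackageName:
    """Split a tar package file name back into its elements."""
    if not name.endswith(".tar"):
        raise ValueError(f"not a tar package name: {name}")
    date, system_no, region_code, seq = name[: -len(".tar")].split("_")
    return PackageName(date, system_no, region_code, int(seq))


name = format_name(PackageName("20231207", "SYS01", "1100", 1))
print(name)
print(parse_name(name))
```

Because the date, system number, region code and daily sequence number together identify one package, a name built this way is unique as long as no element contains the separator.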
Example 2
According to a second aspect of the present application, as shown in fig. 4, an embodiment of the present application further proposes an apparatus for taking data offline from HBase to a tape library, where the apparatus is disposed on a tape library archiving server and includes:
an offline list obtaining unit 41, configured to scan the metadata table and obtain the file index number list of the data to be taken offline from the full table of the HBase cluster according to the transaction date;
a tar package compression unit 42, configured to traverse the file index number list by business system number and then by region code, download and store the file index number data with the same region code in the same local directory, compress the data into a tar package each time a preset size is reached during data download, and record the correspondence between the file index number and the tar package, wherein the name elements of the tar package include a package date, a business system number, a region code and a package sequence number for the day;
a tar package backup unit 43, configured to move the obtained tar package to a backup directory and periodically invoke a tape library command to back up the tar package to the tape library;
and a correspondence storage unit 44, configured to store the correspondence between the file index number and the tar package in the metadata database Elasticsearch and update the lifecycle state of the metadata.
According to some embodiments of the present application, still referring to fig. 4, the apparatus of the embodiments of the present application further includes:
an export verification unit 45, configured to count the number of exported records of the data to be taken offline during data download, and compare that number with the number of rows obtained when the Export tool of HBase scans the file index number list, so as to ensure the integrity of the exported data.
According to some embodiments of the present application, still referring to fig. 4, the apparatus of the embodiments of the present application further includes:
a tar package encryption unit 46, configured to encrypt each tar package using the SM3 algorithm.
It can be understood that each unit in the apparatus of this embodiment 2 can correspondingly implement each step in the foregoing method embodiment 1, and the relevant explanation about the method of embodiment 1 is applicable to each unit in the apparatus of this embodiment 2, which is not described herein again. In addition, the names of the units described in the embodiments of the present application do not constitute limitations on the units themselves in some cases.
Example 3
According to a third aspect of the present application, referring to fig. 5 and 6, the embodiments of the present application further provide a method for recovering data from a tape library to an HBase, where the method is performed by a tape library recovery server, and includes the following steps:
S51, scanning the metadata table according to an offline retrieval application from a business system to obtain a file index number list of the data to be recovered.
When a business system needs to view historical offline files, it initiates an offline retrieval application. The tape library recovery server receives the application and inserts the retrieval task into the recovery task table and the offline retrieval file index number data table. The type of the offline retrieval application is determined from the conditions specified in it, and the recovery task is obtained according to those conditions. The specified conditions include: 1) file index number (rowkey); 2) transaction institution plus transaction date; 3) company account number; and so on. According to the recovery task, the metadata table is scanned to obtain the file index number list of the data to be recovered.
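Determining the type of a retrieval application from its specified conditions can be sketched as a simple dispatch. The dictionary keys and the returned labels are hypothetical; the application only names the three kinds of condition, and the order in which they are checked here is an assumption.

```python
def classify_retrieval(application: dict) -> str:
    """Return which kind of condition an offline retrieval application specifies."""
    if application.get("file_index_no"):
        return "by_file_index_no"
    if application.get("institution") and application.get("transaction_date"):
        return "by_institution_and_date"
    if application.get("company_account"):
        return "by_company_account"
    raise ValueError("no supported retrieval condition specified")


print(classify_retrieval({"file_index_no": "F0001"}))
print(classify_retrieval({"institution": "B001", "transaction_date": "20200101"}))
print(classify_retrieval({"company_account": "ACC-42"}))
```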
S52, looking up, according to the file index number list, the correspondence between file index numbers and tar packages in the metadata database Elasticsearch, and obtaining the package name of the corresponding tar package; the size of the tar package is about 2 GB, and its name elements include a package date, a business system number, a region code and a package sequence number for the day.
In embodiment 1 above, once the data has been backed up from HBase to the tape library, the correspondence between file index numbers and tar packages is stored in the metadata database Elasticsearch. In this step, that correspondence is looked up in Elasticsearch by file index number to obtain the package name of the corresponding tar package.
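Once the correspondence has been fetched, the requested file index numbers can be grouped by tar package so that each package is restored from the tape library only once. The sketch below assumes the lookup result is already available as a plain dictionary; the function name and sample values are illustrative.

```python
from collections import defaultdict


def group_by_package(file_index_nos, mapping):
    """Group requested file index numbers by the tar package that holds them.

    Returns (groups, missing), where missing lists numbers with no known package.
    """
    groups = defaultdict(list)
    missing = []
    for number in file_index_nos:
        package = mapping.get(number)
        if package is None:
            missing.append(number)
        else:
            groups[package].append(number)
    return dict(groups), missing


mapping = {
    "F0001": "20231207_SYS01_1100_0001.tar",
    "F0002": "20231207_SYS01_1100_0001.tar",
    "F0003": "20231207_SYS01_1200_0001.tar",
}
groups, missing = group_by_package(["F0001", "F0002", "F0003", "F9999"], mapping)
print(groups)
print(missing)
```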
S53, generating a recovery control file according to the package name of the tar package, invoking a tape library command to control the tape library to recover the tar package, and returning a tape library recovery log.
When the tape library recovery server requests the tape library to recover an offline package by the package name of the tar package, the offline packages are stored in the tape library in an orderly manner because the name elements of the tar package include the package date, the business system number, the region code and the package sequence number for the day; this reduces the burden on administrators, shortens query time, and improves the convenience of offline data recovery. It can be understood that offline packages are not limited to being backed up in a tape library: the approach can be extended to other storage media while keeping their management equally convenient, and effectively ensures the long-term preservation, integrity and security of the electronic data of each business system of the bank.
It should be noted that, to request recovery of an offline package by the package name of the tar package, the tape library recovery server first generates a recovery control file according to the package name and then invokes a tape library command to instruct the tape library to perform the recovery. After the tape library has restored the tar package file according to the control command, it generates a tape library recovery log and returns the log to the tape library recovery server.
S54, analyzing the recovery log of the tape library, and acquiring the recovered tar packet at a designated position according to the analyzed log.
The tape library recovery server analyzes the tape library recovery log according to the key words, acquires the recovery position of the tar package from the analyzed log, and further acquires the recovered tar package at the designated position.
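Keyword-based log parsing can be sketched with a regular expression. The log format of the tape library is not given in the application, so the `RESTORED ... TO ...` line layout, the paths and the function name below are purely hypothetical.

```python
import re

# Hypothetical log line layout: "RESTORED <package name> TO <path>"
LINE_RE = re.compile(r"^RESTORED\s+(?P<name>\S+)\s+TO\s+(?P<path>\S+)$")


def parse_recovery_log(text: str) -> dict:
    """Return {package name: recovery location} for every matching log line."""
    locations = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            locations[match.group("name")] = match.group("path")
    return locations


log = """\
INFO restore job started
RESTORED 20231207_SYS01_1100_0001.tar TO /restore/20231207_SYS01_1100_0001.tar
INFO restore job finished
"""
print(parse_recovery_log(log))
```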
S55, decompressing and verifying the recovered tar package, extracting the retrieved file index number data, and uploading the data to the HBase cluster.
The tape library recovery server decompresses and verifies the recovered tar package, moves it to a processing directory under the HBase cluster, extracts the retrieved file index number data from that directory and uploads the data to the HBase cluster, so that the server that initiated the offline retrieval application can view it. The tape library recovery server then updates the status of the recovery task, generates a recovery result XML file, and returns a data recovery result notification to the business system that initiated the offline retrieval application.
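Extracting only the retrieved files from a recovered package can be sketched with Python's standard `tarfile` module. The example works on an in-memory archive so that it is self-contained; the member names stand in for file index numbers, and `make_demo_tar` exists only to produce sample data. Uploading to HBase is outside the scope of the sketch.

```python
import io
import tarfile


def extract_requested(tar_bytes: bytes, wanted: set) -> dict:
    """Return {member name: content} for the requested members found in the archive."""
    found = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name in wanted:
                found[member.name] = tar.extractfile(member).read()
    return found


def make_demo_tar(files: dict) -> bytes:
    """Build a small in-memory tar archive for the demonstration."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


demo = make_demo_tar({"F0001": b"one", "F0002": b"two", "F0003": b"three"})
print(extract_requested(demo, {"F0001", "F0003"}))
```

Reading members into memory by name, rather than extracting the whole archive to disk, also avoids writing files that were not requested.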
Thus, for data recovery, the tape library recovery server obtains the package name of the tar package to be recovered according to the data retrieval application of the business system, and can quickly identify the sub-center to which the data belongs and find where the offline package is stored. This shortens the time spent searching for offline data, ensures that business systems and supervisory inspections can retrieve long-term historical data in a timely manner, and safeguards the data assets.
It will be appreciated that the method of embodiment 3 relies on the method of embodiment 1 described above. When the method of embodiment 1 is executed, the tape library archiving server compresses the data to be taken offline into a tar package, by region code, each time the preset size is reached, records the correspondence between file index numbers and tar packages, names the tar packages according to the specification, and stores the correspondence in the metadata database Elasticsearch. This reduces interaction with the tape library, optimizes the performance of importing offline packages into and exporting them from the tape library, effectively supports subsequent retrieval and recovery of offline packages, and improves the ability to locate an offline package quickly in the tape library, which in turn makes offline data recovery convenient when the tape library recovery server executes the method of embodiment 3. The explanation of the method of embodiment 1 also applies to embodiment 3 and is not repeated here.
Example 4
According to a fourth aspect of the present application, referring to fig. 7, an embodiment of the present application further proposes an apparatus for recovering data from a tape library to an HBase, where the apparatus is applied to a tape library recovery server, including:
a to-be-restored list acquiring unit 71, configured to scan the metadata table to acquire a file index number list of to-be-restored data according to an offline retrieval application of the service system;
a correspondence searching unit 72, configured to look up, according to the file index number list, the correspondence between file index numbers and tar packages in the metadata database Elasticsearch, and obtain the package name of the corresponding tar package, wherein the name elements of the tar package include a package date, a business system number, a region code and a package sequence number for the day;
a tar package recovery unit 73, configured to generate a recovery control file according to the package name of the tar package, invoke a tape library command to control the tape library to recover the tar package, and return a tape library recovery log;
a tar package obtaining unit 74, configured to parse the tape library recovery log and obtain the recovered tar package at the specified location according to the parsed log;
and a retrieved data uploading unit 75, configured to decompress and verify the recovered tar package, extract the retrieved file index number data, and upload the data to the HBase cluster.
It can be understood that each unit in the apparatus of this embodiment 4 can correspondingly implement each step in the foregoing method embodiment 3, and the relevant explanation about the method of embodiment 3 is applicable to each unit in the apparatus of this embodiment 4, which is not described herein again. In addition, the names of the units described in the embodiments of the present application do not constitute limitations on the units themselves in some cases.
Example 5
According to a fifth aspect of the present application, referring to fig. 8, an embodiment of the present application further proposes a system for mutual transmission of HBase data and tape library data, where the system includes a tape library archiving server and a tape library recovery server, where:
the tape library archiving server is configured to scan the metadata table and obtain the file index number list of the data to be taken offline from the full table of the HBase cluster according to the transaction date; traverse the file index number list by business system number and then by region code, download and store the file index number data with the same region code in the same local directory, compress the data into a tar package each time a preset size is reached during data download, and record the correspondence between the file index number and the tar package, wherein the name elements of the tar package include a package date, a business system number, a region code and a package sequence number for the day; move the obtained tar package to a backup directory, and periodically invoke a tape library command to back up the tar package to the tape library; and store the correspondence between the file index number and the tar package in the metadata database Elasticsearch, and update the lifecycle state of the metadata;
the tape library recovery server is configured to scan the metadata table according to an offline retrieval application from a business system to obtain a file index number list of the data to be recovered; look up, according to the file index number list, the correspondence between file index numbers and tar packages in the metadata database Elasticsearch, and obtain the package name of the corresponding tar package; generate a recovery control file according to the package name of the tar package, invoke a tape library command to control the tape library to recover the tar package, and return a tape library recovery log; parse the tape library recovery log, and obtain the recovered tar package at the specified location according to the parsed log; and decompress and verify the recovered tar package, extract the retrieved file index number data, and upload the data to the HBase cluster.
According to some embodiments of the present application, the tape library archiving server is further configured to count the number of exported records of the data to be taken offline during data download, and compare that number with the number of rows obtained when the Export tool of HBase scans the file index number list, so as to ensure the integrity of the exported data; and to encrypt each tar package using the SM3 algorithm.
It can be also understood that the tape library archiving server and the tape library restoring server in the system described in embodiment 5 can correspondingly implement the steps in the foregoing method embodiments 1 and 3, and the relevant explanation of the foregoing method embodiments is applicable to the tape library archiving server and the tape library restoring server in the system described in embodiment 5, which are not described herein again.
Finally, it should be noted that:
the embodiment numbers are merely for the purpose of description and do not represent the advantages or disadvantages of the embodiments. In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments. Embodiments of the present application may be implemented in hardware, software, firmware, or a combination thereof.
In the several embodiments provided in the present application, it should be understood that the disclosed technology content may be implemented in other manners. The system embodiments described above are merely exemplary, and for example, the division of the units may be a logic function division, and there may be another division manner when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with respect to each other may be an indirect coupling or communication connection via some interfaces, units or modules, which may be in electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods, systems and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units. Moreover, the integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application.

Claims (11)

1. A method of taking data offline from an HBase to a tape library, the method performed by a tape library archiving server, comprising the steps of:
scanning a metadata table, and obtaining a file index number list of the data to be taken offline from the full table of the HBase cluster according to the transaction date;
traversing the file index number list by business system number and then by region code, downloading and storing the data to be taken offline with the same region code in the same local directory, compressing the data into a tar package each time a preset size is reached during data download, and recording the correspondence between the file index number and the tar package; wherein the name elements of the tar package comprise a package date, a business system number, a region code and a package sequence number for the day;
moving the obtained tar package to a backup directory, and periodically invoking a tape library command to back up the tar package to the tape library;
and storing the correspondence between the file index number and the tar package in the metadata database Elasticsearch, and updating the lifecycle state of the metadata.
2. The method of claim 1, wherein the size of the tar package is about 2 GB.
3. The method according to claim 1, wherein the method further comprises:
during data download, counting the number of exported records of the data to be taken offline, and comparing that number with the number of rows obtained when the Export tool of HBase scans the file index number list, so as to ensure the integrity of the exported data.
4. The method according to claim 1, wherein the method further comprises:
encrypting each tar package using the SM3 algorithm.
5. An apparatus for taking data offline from an HBase to a tape library, said apparatus disposed at a tape library archiving server, comprising:
an offline list obtaining unit, configured to scan the metadata table and obtain the file index number list of the data to be taken offline from the full table of the HBase cluster according to the transaction date;
a tar package compression unit, configured to traverse the file index number list by business system number and then by region code, download and store the file index number data with the same region code in the same local directory, compress the data into a tar package each time a preset size is reached during data download, and record the correspondence between the file index number and the tar package; wherein the name elements of the tar package comprise a package date, a business system number, a region code and a package sequence number for the day;
a tar package backup unit, configured to move the obtained tar package to a backup directory and periodically invoke a tape library command to back up the tar package to the tape library;
and a correspondence storage unit, configured to store the correspondence between the file index number and the tar package in the metadata database Elasticsearch and update the lifecycle state of the metadata.
6. The apparatus of claim 5, wherein the apparatus further comprises:
an export verification unit, configured to count the number of exported records of the data to be taken offline during data download, and compare that number with the number of rows obtained when the Export tool of HBase scans the file index number list, so as to ensure the integrity of the exported data.
7. The apparatus of claim 5, wherein the apparatus further comprises:
a tar package encryption unit, configured to encrypt each tar package using the SM3 algorithm.
8. A method of recovering data from a tape library to an HBase, the method performed by a tape library recovery server, comprising the steps of:
scanning the metadata table according to an offline retrieval application from a business system to obtain a file index number list of the data to be recovered;
looking up, according to the file index number list, the correspondence between file index numbers and tar packages in the metadata database Elasticsearch, and obtaining the package name of the corresponding tar package; wherein the name elements of the tar package comprise a package date, a business system number, a region code and a package sequence number for the day;
generating a recovery control file according to the package name of the tar package, invoking a tape library command to control the tape library to recover the tar package, and returning a tape library recovery log;
parsing the tape library recovery log, and obtaining the recovered tar package at the specified location according to the parsed log;
and decompressing and verifying the recovered tar package, extracting the retrieved file index number data, and uploading the data to the HBase cluster.
9. An apparatus for recovering data from a tape library to an HBase, said apparatus being applied to a tape library recovery server, comprising:
a to-be-restored list acquiring unit, configured to scan the metadata table according to an offline retrieval application from a business system to obtain a file index number list of the data to be recovered;
a correspondence searching unit, configured to look up, according to the file index number list, the correspondence between file index numbers and tar packages in the metadata database Elasticsearch, and obtain the package name of the corresponding tar package; wherein the name elements of the tar package comprise a package date, a business system number, a region code and a package sequence number for the day;
a tar package recovery unit, configured to generate a recovery control file according to the package name of the tar package, invoke a tape library command to control the tape library to recover the tar package, and return a tape library recovery log;
a tar package obtaining unit, configured to parse the tape library recovery log and obtain the recovered tar package at the specified location according to the parsed log;
and a retrieved data uploading unit, configured to decompress and verify the recovered tar package, extract the retrieved file index number data, and upload the data to the HBase cluster.
10. A system for the mutual transmission of HBase data and tape library data, said system comprising a tape library archiving server and a tape library recovery server, wherein:
the tape library archiving server is configured to scan the metadata table and obtain the file index number list of the data to be taken offline from the full table of the HBase cluster according to the transaction date; traverse the file index number list by business system number and then by region code, download and store the file index number data with the same region code in the same local directory, compress the data into a tar package each time a preset size is reached during data download, and record the correspondence between the file index number and the tar package, wherein the name elements of the tar package comprise a package date, a business system number, a region code and a package sequence number for the day; move the obtained tar package to a backup directory, and periodically invoke a tape library command to back up the tar package to the tape library; and store the correspondence between the file index number and the tar package in the metadata database Elasticsearch, and update the lifecycle state of the metadata;
the tape library recovery server is configured to scan the metadata table according to an offline retrieval application from a business system to obtain a file index number list of the data to be recovered; look up, according to the file index number list, the correspondence between file index numbers and tar packages in the metadata database Elasticsearch, and obtain the package name of the corresponding tar package; generate a recovery control file according to the package name of the tar package, invoke a tape library command to control the tape library to recover the tar package, and return a tape library recovery log; parse the tape library recovery log, and obtain the recovered tar package at the specified location according to the parsed log; and decompress and verify the recovered tar package, extract the retrieved file index number data, and upload the data to the HBase cluster.
11. The system of claim 10, wherein the tape library archiving server is further configured to count the number of exported records of the data to be taken offline during data download, and compare that number with the number of rows obtained when the Export tool of HBase scans the file index number list, so as to ensure the integrity of the exported data; and to encrypt each tar package using the SM3 algorithm.
CN202311674689.1A 2023-12-07 2023-12-07 Method, device and system for transmitting HBase data and tape library data mutually Pending CN117891653A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311674689.1A CN117891653A (en) 2023-12-07 2023-12-07 Method, device and system for transmitting HBase data and tape library data mutually

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311674689.1A CN117891653A (en) 2023-12-07 2023-12-07 Method, device and system for transmitting HBase data and tape library data mutually

Publications (1)

Publication Number Publication Date
CN117891653A true CN117891653A (en) 2024-04-16

Family

ID=90641668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311674689.1A Pending CN117891653A (en) 2023-12-07 2023-12-07 Method, device and system for transmitting HBase data and tape library data mutually

Country Status (1)

Country Link
CN (1) CN117891653A (en)

Similar Documents

Publication Publication Date Title
US20230083789A1 (en) Remote single instance data management
US11016859B2 (en) De-duplication systems and methods for application-specific data
US8683228B2 (en) System and method for WORM data storage
US8219524B2 (en) Application-aware and remote single instance data management
US8631052B1 (en) Efficient content meta-data collection and trace generation from deduplicated storage
AU2001238269B2 (en) Hash file system and method for use in a commonality factoring system
US8909881B2 (en) Systems and methods for creating copies of data, such as archive copies
US8239348B1 (en) Method and apparatus for automatically archiving data items from backup storage
CN110879813A (en) Binary log analysis-based MySQL database increment synchronization implementation method
US8667032B1 (en) Efficient content meta-data collection and trace generation from deduplicated storage
US20040148306A1 (en) Hash file system and method for use in a commonality factoring system
CN104156278A (en) File version control system and file version control method
US11567902B2 (en) Systems and methods for document search and aggregation with reduced bandwidth and storage demand
US20200210377A1 (en) Content management system and method
US8843450B1 (en) Write capable exchange granular level recoveries
CN102722584A (en) Data storage system and method
US11748495B2 (en) Systems and methods for data usage monitoring in multi-tenancy enabled HADOOP clusters
CN112835918A (en) MySQL database increment synchronization implementation method
CN112800019A (en) Data backup method and system based on Hadoop distributed file system
CN117891653A (en) Method, device and system for transmitting HBase data and tape library data mutually
CN113157414A (en) Task processing method and device, nonvolatile storage medium and processor
CN112035471A (en) Transaction processing method and computer equipment
Kim et al. Digital forensics formats: seeking a digital preservation storage container format for web archiving
CN115269524B (en) Integrated system and method for end-to-end small file collection transmission and storage
US20240143789A1 (en) Encryption Key Management Using Content-Based Datasets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination