CN115934640A - Data storage method, system, electronic equipment and storage medium - Google Patents
- Publication number
- CN115934640A (CN202211704133.8A)
- Authority
- CN
- China
- Prior art keywords
- file
- data
- migrated
- client
- information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a data storage method, a system, an electronic device and a storage medium. When a client receives a ticket fed back by an authorization server, the client establishes a connection with an HDFS (Hadoop Distributed File System) system and sends a remote process access request to the HDFS system, so that when the HDFS system determines that a file to be migrated matching the file name carried in the remote process access request exists, the HDFS system obtains at least one piece of data node information corresponding to the file to be migrated and determines, for the target data node corresponding to each piece of data node information, the target link with the shortest distance between that target data node and the client. The client communicates with each target data node over the corresponding target link, acquires the data block corresponding to the file to be migrated from the target data node, and transmits the data block to a Blu-ray storage system connected with the HDFS system, until all data blocks related to the file to be migrated have been transmitted to the Blu-ray storage system. The invention can solve the problems that operation risks easily occur during data migration and that the corresponding operating cost increases.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data storage method, a data storage system, an electronic device, and a storage medium.
Background
As society develops and time passes, a great deal of information is generated, and this information is generally stored digitally on various devices. Because different data serve different application scenarios, their requirements for storage time, query speed, update speed and the like differ greatly. Data that needs to be queried or modified frequently can therefore be stored in an OLTP (Online Transaction Processing) database system, which provides fast query and update operations, while data that is not modified and is used for statistical analysis is stored in an OLAP (Online Analytical Processing) data warehouse or big data system.
However, the underlying storage media of most current data warehouses and big data systems are solid state disks, mechanical hard disks and the like. Their service life is generally short, so they are not suitable for long-term data storage: the stored data must be migrated to new storage devices each time the devices approach the end of their service life. Operation risks easily occur during such data migration, and the corresponding operating cost increases.
Disclosure of Invention
In view of this, the present invention provides a data storage method, a data storage system, an electronic device, and a storage medium, so as to solve the problems that, due to a limited storage life of an existing storage device, stored data needs to be migrated to a new storage device each time the validity period is approached, so that an operation risk is likely to occur in a data migration process, and a corresponding operation cost is increased.
A first aspect of the present invention discloses a data storage method, which is applied to a client and comprises the following steps:
when a ticket fed back by an authorization server is received, establishing a connection with an HDFS (Hadoop Distributed File System) system and sending a remote process access request to the HDFS system, so that the HDFS system, upon determining that a file to be migrated matching the name of the file to be migrated in the remote process access request exists, obtains at least one piece of data node information corresponding to the file to be migrated and determines a target link having the shortest distance between the client and the target data node corresponding to each piece of data node information; the ticket is fed back when the authorization server confirms, according to verification information sent by the client, that the client is a legitimate client; the verification information is generated by the client according to the authorization ticket that an authentication server returns in response to an authorization ticket request sent by the client;
communicating with each target data node over the corresponding target link, acquiring a data block corresponding to the file to be migrated from the target data node, and transmitting the data block to a Blu-ray storage system connected with the HDFS system, until all data blocks related to the file to be migrated have been transmitted to the Blu-ray storage system; the file to be migrated is data that does not need to be modified, has no real-time query requirement and has a long retention requirement.
A second aspect of the present invention discloses a data storage system, which is applied to a client, and includes:
a remote process access request sending unit, configured to establish a connection with an HDFS system when a ticket fed back by an authorization server is received, and to send a remote process access request to the HDFS system, so that the HDFS system, upon determining that a file to be migrated matching the name of the file to be migrated in the remote process access request exists, obtains at least one piece of data node information corresponding to the file to be migrated and determines a target link having the shortest distance between the client and the target data node corresponding to each piece of data node information; the ticket is fed back when the authorization server confirms, according to verification information sent by the client, that the client is a legitimate client; the verification information is generated by the client, through a verification information generation unit, according to the authorization ticket that an authentication server returns in response to an authorization ticket request sent by the client;
a transmission unit, configured to communicate with each target data node over the corresponding target link, acquire a data block corresponding to the file to be migrated from the target data node, and transmit the data block to a Blu-ray storage system connected with the HDFS system, until all data blocks related to the file to be migrated have been transmitted to the Blu-ray storage system; the file to be migrated is data that does not need to be modified, has no real-time query requirement and has a long retention requirement.
A third aspect of the present invention provides an electronic device comprising a processor and a memory, wherein the processor and the memory are connected through a communication bus; the processor is used for calling and executing the program stored in the memory; the memory is used for storing a program for implementing the data storage method as disclosed in the first aspect of the present invention.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon computer-executable instructions for performing the data storage method as disclosed in the first aspect of the present invention above.
The invention provides a data storage method, a data storage system, an electronic device and a storage medium. When a client receives a ticket fed back by an authorization server, the client establishes a connection with an HDFS (Hadoop Distributed File System) system and sends a remote process access request to the HDFS system, so that the HDFS system, upon determining that a file to be migrated matching the name of the file to be migrated in the remote process access request exists, obtains at least one piece of data node information corresponding to the file to be migrated and determines a target link having the shortest distance between the client and the target data node corresponding to each piece of data node information. The ticket is fed back when the authorization server confirms, according to verification information sent by the client, that the client is a legitimate client; the verification information is generated by the client according to the authorization ticket that an authentication server returns in response to an authorization ticket request sent by the client. For each target data node, the client communicates with the target data node over the corresponding target link, acquires a data block corresponding to the file to be migrated from the target data node, and transmits the data block to a Blu-ray storage system connected with the HDFS system, until all data blocks related to the file to be migrated have been transmitted to the Blu-ray storage system. The file to be migrated is data that does not need to be modified, has no real-time query requirement and has a long retention requirement.
According to the technical scheme provided by the invention, the connection between the HDFS system and the Blu-ray storage system is established in advance, and files to be migrated in the HDFS system that do not need to be modified, have no real-time query requirement and have a long retention requirement can then be migrated through the client into the Blu-ray storage system for storage. This solves the problems that, because existing storage devices have a limited service life, the stored data must be migrated to new storage devices each time the devices approach the end of that life, that operation risks easily occur during such migration, and that the corresponding operating cost increases.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is an architecture diagram of an HDFS system in a Hadoop cluster and a data warehouse interfacing with a Blu-ray storage system according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a data storage method according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of another data storage method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data storage system according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules, or units, and are not used for limiting the order or interdependence of the functions performed by the devices, modules, or units.
It is noted that the modifiers "a", "an", and "the" in the present disclosure are exemplary rather than limiting, and those skilled in the art will understand them to mean "one or more" unless the context clearly dictates otherwise.
Hadoop: an open source distributed computing and storage big data framework.
Greenplum: a data warehouse that provides data storage and querying.
S3: a simple object storage service that stores the files to be accessed as objects, keeps the objects in buckets, and provides URL-like network access addresses through which the stored information can be obtained.
HDFS: a distributed file system that is one of the applications in the big data framework. The namenode is the master node of the system and provides metadata about the stored files; a datanode is a worker node of the system and is the node that actually stores the file data blocks.
access_key, secret_key: keys used in an encryption algorithm.
Kerberos: an authentication protocol, in which the principal can be understood as a user name and the keytab can be understood as the password of the user.
Blu-ray storage system: a highly reliable online storage device that uses Blu-ray discs as the storage medium and comprises a manipulator, optical disc drives, disc cartridges and a management system. It is mainly used for online automatic recording, storage, query, backup and recovery of data. Because the Blu-ray storage system uses Blu-ray discs as the underlying medium, it has a long service life, low operating power consumption and resistance to electromagnetic interference, and is therefore well suited to retaining data for a long time.
Referring to Fig. 1, an architecture diagram of an HDFS system in a Hadoop cluster and a data warehouse interfacing with a Blu-ray storage system according to an embodiment of the present invention is shown.
Based on the architecture shown in Fig. 1, in which the HDFS system in the Hadoop cluster and the data warehouse interface with the Blu-ray storage system, an embodiment of the present invention provides a data storage method which, as shown in Fig. 2, specifically includes the following steps:
S201: the client sends an authorization ticket request to the authentication server, so that the authentication server obtains the corresponding authorization ticket according to the user name, encrypts the authorization ticket by using the password, and sends the resulting encrypted information to the client.
In the embodiment of the application, the data in the HDFS system in the Hadoop cluster is varied, and the services associated with the data and the ways the data is used differ, so the data must be preprocessed before it can be migrated to the Blu-ray storage system. The data can be classified according to its update, query and retention requirements: data that does not need to be updated or modified, has no real-time query requirement and has a long retention period is marked in the HDFS system as a file to be migrated, and the marked file to be migrated can then serve as an object to be migrated to the Blu-ray storage system.
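The marking rule described above can be sketched as a simple predicate. This is only an illustrative sketch: the `DataSet` fields, the function names and the `LONG_RETENTION_YEARS` threshold are hypothetical, since the source does not say how the three requirements are recorded or what counts as a "long" retention period.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DataSet:
    name: str
    needs_updates: bool         # is the data still updated or modified?
    needs_realtime_query: bool  # must queries be answered in real time?
    retention_years: int        # how long the data has to be kept


# Hypothetical threshold; the source gives no number for "long".
LONG_RETENTION_YEARS = 10


def is_migration_candidate(ds: DataSet) -> bool:
    """True when all three conditions named in the text hold."""
    return (
        not ds.needs_updates
        and not ds.needs_realtime_query
        and ds.retention_years >= LONG_RETENTION_YEARS
    )


def mark_files_to_migrate(datasets: Iterable[DataSet]) -> List[str]:
    """Return the names of the data sets that would be marked for migration."""
    return [ds.name for ds in datasets if is_migration_candidate(ds)]
```

A data set that fails any one of the three conditions stays in the HDFS system and is not marked.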
In the embodiment of the application, after the file to be migrated is marked in the HDFS system, the client can be further authenticated, so that the legality of the client connected with the HDFS system is ensured, the access of an illegal client is avoided, and the safety of information in the HDFS system is ensured.
Specifically, a host mapping may be added to the hosts file of the client in advance, a Kerberos authentication part may be added to the request sent from the client to the authentication server to generate the authorization ticket request, and the generated authorization ticket request may be sent to the authentication server. The Kerberos authentication part comprises the user name (principal) and the password (keytab).
After receiving the authorization ticket request sent by the client, the authentication server can obtain the corresponding authorization ticket according to the user name in the authorization ticket request, encrypt the obtained authorization ticket by using the password in the authorization ticket request to obtain corresponding encryption information, and finally feed back the obtained encryption information to the client.
S202: the client decrypts the encrypted information by using the password to obtain the authorization ticket, and generates verification information according to the authorization ticket.
In the specific process of executing step S202, when receiving the encrypted information fed back by the authentication server, the client may decrypt the encrypted information by using the password to obtain the authorization ticket, and generate corresponding verification information according to the obtained authorization ticket.
In the embodiment of the application, after the client generates the verification information according to the authorization ticket, the client can send the generated verification information to the authorization server (Ticket Granting Server). After receiving the verification information, the authorization server can use it to verify the legitimacy of the client. If the client is determined to be a legitimate client according to the verification information, the corresponding ticket can be fed back to the client; if the client is determined not to be a legitimate client, corresponding authentication failure information can be fed back to the client.
Specifically, the authorization server may determine whether the client ID in the ticket contained in the verification information is consistent with the client ID of the client. If the two are consistent, the client can be considered a legitimate client; if not, the client can be considered not to be a legitimate client.
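The exchange in steps S201 and S202 and the client ID check can be illustrated with a toy model. This is not real Kerberos and must not be used for security: the XOR keystream cipher, the ticket layout and all function names are assumptions made only to show the order of the steps (encrypt with the password, decrypt with the password, compare client IDs).

```python
import hashlib
import json


def _keystream(password: str, length: int) -> bytes:
    """Derive a repeatable byte stream from the password (toy construction)."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(f"{password}:{counter}".encode()).digest()
        counter += 1
    return stream[:length]


def toy_cipher(data: bytes, password: str) -> bytes:
    """XOR with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, _keystream(password, len(data))))


def authentication_server_issue(principal: str, password: str) -> bytes:
    """S201: obtain the authorization ticket for the user name and encrypt it."""
    ticket = {"client_id": principal}  # hypothetical ticket layout
    return toy_cipher(json.dumps(ticket).encode(), password)


def client_build_verification(encrypted: bytes, password: str) -> dict:
    """S202: decrypt the authorization ticket and wrap it as verification info."""
    ticket = json.loads(toy_cipher(encrypted, password).decode())
    return {"ticket": ticket}


def authorization_server_check(verification: dict, client_id: str) -> bool:
    """Legitimate only if the ID in the ticket matches the requesting client."""
    return verification["ticket"].get("client_id") == client_id
```

In a real deployment these steps would be handled by a Kerberos implementation rather than by hand-written code.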
S203: when receiving the ticket fed back by the authorization server, the client establishes a connection with the HDFS system and sends a remote process access request to the HDFS system, so that the HDFS system, upon determining that a file to be migrated matching the name of the file to be migrated in the remote process access request exists, obtains at least one piece of data node information corresponding to the file to be migrated and determines a target link having the shortest distance between the client and the target data node corresponding to each piece of data node information.
In the embodiment of the application, after the client sends the verification information to the authorization server, the client can detect in real time whether a ticket fed back by the authorization server has been received. When the client detects the ticket fed back by the authorization server, the client can, according to the ticket, connect to the master node (namenode) of the HDFS system through the FileSystem interface and send the remote process access request, where the remote process access request includes the name of the file to be migrated.
After receiving the remote process access request, the master node of the HDFS system can judge whether a file to be migrated matched with the name of the file to be migrated exists in the metadata of the master node; if the file to be migrated exists, acquiring at least one piece of data node information corresponding to each data block related to the file to be migrated from the metadata of the master node; and for the data node corresponding to each data node information, searching a target link with the shortest distance between the client and the target data node corresponding to the data node information, and returning the target link corresponding to the target data node corresponding to each data node information to the client.
It should be noted that, if no file to be migrated matching the name of the file to be migrated exists in the metadata of the master node, the HDFS system may output a prompt message indicating that no matching file to be migrated exists.
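The lookup performed by the master node can be sketched as follows. This is a minimal sketch under stated assumptions: the metadata layout, the distance table and the function name `locate_blocks` are hypothetical, and the sketch reads "shortest distance" as choosing, for each data block, the data node closest to the client among those holding the block, which is one possible interpretation of the text.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical metadata layout: file name -> list of blocks,
# where each block is the list of data nodes that hold it.
Metadata = Dict[str, List[List[str]]]


def locate_blocks(
    metadata: Metadata,
    distance_to_client: Dict[str, int],
    file_name: str,
) -> Optional[List[Tuple[int, str]]]:
    """Return (block index, nearest data node) pairs, or None if no file matches."""
    blocks = metadata.get(file_name)
    if blocks is None:
        # Corresponds to the prompt that no matching file to be migrated exists.
        return None
    return [
        (index, min(nodes, key=lambda node: distance_to_client[node]))
        for index, nodes in enumerate(blocks)
    ]
```

For example, with a block held by `dn1` (distance 4) and `dn2` (distance 2), the sketch returns `dn2` for that block.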
S204: the client communicates with each target data node over the corresponding target link, acquires the data block corresponding to the file to be migrated from the target data node, and transmits the data block to the Blu-ray storage system connected with the HDFS system, until all data blocks related to the file to be migrated have been transmitted to the Blu-ray storage system.
In the process of specifically executing step S204, when the client receives the target link for each target data node, the client may communicate with each target data node through the corresponding target link, so as to obtain the data block related to the file to be migrated from that target data node. After a data block related to the file to be migrated is acquired, it can be transmitted, through the target data node corresponding to the data block, to the Blu-ray storage system connected with the HDFS system, until the data blocks related to the file to be migrated on every target data node have been transmitted to the Blu-ray storage system.
Optionally, in this embodiment of the application, after acquiring a data block related to the file to be migrated from a target data node, the client may create a bucket corresponding to the file to be migrated on the Blu-ray storage system connected with the HDFS system, acquire the target URL corresponding to the bucket, and finally transmit the acquired data block, through the target data node corresponding to the data block, to the bucket created on the Blu-ray storage system.
It should be noted that, when the client successfully creates the bucket corresponding to the file to be migrated on the blu-ray storage system, the client correspondingly generates the target URL corresponding to the bucket.
It should be noted that each data block includes a corresponding check bit, which is used to verify the integrity of the data block. That is, after the data block is transmitted to the Blu-ray storage system connected with the HDFS system, a check value of the data block is calculated and compared with the check value carried in the check bit of the data block. If the two check values are consistent, the data block is considered to have been transmitted completely; if they are inconsistent, the data block is considered to have been transmitted incompletely, and corresponding error information is output.
Wherein the error information is used to indicate that the data block transmission fails.
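The integrity check can be illustrated with a short sketch. The source does not name the checksum algorithm, so CRC-32 from Python's standard `zlib` module is used here purely as a stand-in, and the dictionary layout of a block is hypothetical.

```python
import zlib


def make_block(payload: bytes) -> dict:
    """Attach a check value to a data block before it is transmitted."""
    return {"payload": payload, "check": zlib.crc32(payload)}


def is_complete(received: dict) -> bool:
    """Recompute the check value on arrival and compare it with the stored one."""
    return zlib.crc32(received["payload"]) == received["check"]


def verify_or_report(received: dict, block_id: int) -> str:
    """Return 'ok', or error information stating that the transmission failed."""
    if is_complete(received):
        return "ok"
    return f"error: transmission of data block {block_id} failed"
```

A block whose payload was altered in transit no longer matches its stored check value, so `verify_or_report` returns the error string instead of `"ok"`.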
In some embodiments, for each data block related to the file to be migrated, if the client determines that transmission of the data block has failed, the client may reselect one of the other data nodes on the HDFS system and communicate with it, so that the data block is retransmitted through the reselected data node to the bucket created on the Blu-ray storage system and the Blu-ray storage system stores the data block corresponding to the file to be migrated.
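The retransmission behaviour can be sketched as a loop over candidate data nodes. This is an illustrative sketch only: the `send` callback, its signature and the order in which nodes are tried are assumptions, since the source does not specify how the replacement data node is chosen.

```python
from typing import Callable, Iterable, Optional


def transfer_with_fallback(
    block_id: int,
    candidate_nodes: Iterable[str],
    send: Callable[[str, int], bool],
) -> Optional[str]:
    """Try each data node in turn until one transfer is verified as complete.

    `send(node, block_id)` is a hypothetical callback that transmits the block
    through `node` and returns True when the check value matched on arrival.
    Returns the node that succeeded, or None if every node failed.
    """
    for node in candidate_nodes:
        if send(node, block_id):
            return node
    return None
```

Passing the nearest data node first and the remaining data nodes afterwards reproduces the described behaviour of falling back to another data node only after a failed transmission.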
Further, in this embodiment of the present application, before the client creates the bucket corresponding to the file to be migrated on the Blu-ray storage system, an S3 client object may first be initialized.
Further, in this embodiment of the application, if the remote process access request includes the name of a directory of files to be migrated, then for each file name under that directory, the data blocks related to the corresponding file to be migrated may be transmitted to the Blu-ray storage system by executing steps S203 to S204, until every file to be migrated under the directory has been transmitted to the Blu-ray storage system.
S205: the client acquires first total data amount information of the file to be migrated from the HDFS system and acquires second total data amount information of the file to be migrated in the Blu-ray storage system.
In the process of specifically executing step S205, after the client has transmitted all data blocks related to the file to be migrated to the Blu-ray storage system, the client may obtain the first total data amount information of the file to be migrated by calling the HDFS content query interface corresponding to the file to be migrated on the HDFS system, and may query the second total data amount information of the file to be migrated from the Blu-ray storage system and a Greenplum database connected with the Blu-ray storage system.
It should be noted that the second total data amount information of the file to be migrated, which is queried from the Greenplum database, describes the file to be migrated as stored in the Blu-ray storage system.
It should be further noted that the first total data amount information includes information such as the number of data table rows and the column names of the file to be migrated as stored in the HDFS system; the second total data amount information includes information such as the number of data table rows and the column names of the file to be migrated as stored in the Blu-ray storage system.
S206: the client compares the first data total amount information with the second data total amount information; if the first total data amount information is consistent with the second total data amount information, executing step S207; if the first total data amount information is not consistent with the second total data amount information, step S208 is executed.
In the specific process of executing step S206, after acquiring the first total data amount information and the second total data amount information, the client may compare the number of data table rows in the first total data amount information with the number of data table rows in the second total data amount information, and compare the column names in the first total data amount information with the column names in the second total data amount information. If the numbers of data table rows are consistent and the column names are consistent, it can be determined that the migrated file transmitted to the Blu-ray storage system is complete.
If the numbers of data table rows are inconsistent and/or the column names are inconsistent, it can be determined that the migrated file transmitted to the Blu-ray storage system is incomplete, and error information indicating that the file currently transmitted to the Blu-ray storage system is incomplete can then be output.
S207: the client determines that the file to be migrated has been migrated completely.
S208: the client determines that the migration of the file to be migrated is incomplete and outputs corresponding error information.
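The comparison in steps S206 to S208 can be sketched as follows. The `TotalInfo` type and the wording of the error messages are hypothetical; the source only states that the number of data table rows and the column names are compared.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TotalInfo:
    row_count: int                 # number of data table rows
    column_names: Tuple[str, ...]  # column names of the data table


def check_migration(first: TotalInfo, second: TotalInfo) -> List[str]:
    """Compare HDFS-side and Blu-ray-side totals; an empty list means complete."""
    errors = []
    if first.row_count != second.row_count:
        errors.append(
            f"row count mismatch: {first.row_count} != {second.row_count}"
        )
    if first.column_names != second.column_names:
        errors.append("column name mismatch")
    return errors
```

An empty result corresponds to step S207 (migration complete); a non-empty result corresponds to step S208, and its entries would be output as the error information.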
The invention provides a data storage method. When a client receives a ticket fed back by an authorization server, the client establishes a connection with an HDFS (Hadoop Distributed File System) system and sends a remote process access request to the HDFS system, so that the HDFS system, upon determining that a file to be migrated matching the name of the file to be migrated in the remote process access request exists, obtains at least one piece of data node information corresponding to the file to be migrated and determines a target link having the shortest distance between the client and the target data node corresponding to each piece of data node information. The ticket is fed back when the authorization server confirms, according to verification information sent by the client, that the client is a legitimate client; the verification information is generated by the client according to the authorization ticket that an authentication server returns in response to an authorization ticket request sent by the client. For each target data node, the client communicates with the target data node over the corresponding target link, acquires a data block corresponding to the file to be migrated from the target data node, and transmits the data block to a Blu-ray storage system connected with the HDFS system, until all data blocks related to the file to be migrated have been transmitted to the Blu-ray storage system. The file to be migrated is data that does not need to be modified, has no real-time query requirement and has a long retention requirement.
According to the technical scheme provided by the invention, the connection between the HDFS system and the Blu-ray storage system is established in advance, so that files to be migrated in the HDFS system that do not need to be modified, have no real-time query requirement and have a long retention requirement can be migrated through the client into the Blu-ray storage system for storage. This solves the problems that, because existing storage devices have a limited service life, the stored data must be migrated to new storage devices each time the devices approach the end of that life, that operation risks easily occur during such migration, and that the corresponding operating cost increases.
Further, on the basis of the data storage method provided above, the embodiment of the present application further includes the following steps, as shown in Fig. 3:
S301: when the client receives a reverse migration request sent by a user, the client acquires corresponding reverse migration information according to the reverse migration request.
The reverse migration information comprises an object storage URL, an access key, and a secret key.
In the embodiment of the application, after the client establishes connection with the HDFS system according to the ticket fed back by the authorization server, whether a reverse migration request sent by a user exists can be detected in real time; when a reverse migration request sent by a user is detected, the reverse migration request sent by the user can be received, and corresponding reverse migration information is obtained according to the reverse migration request.
It should be noted that the reverse migration information includes the reverse migration file name in addition to the object storage URL, the access key, and the secret key.
It should be further noted that the object storage URL is a URL corresponding to the bucket corresponding to the reverse migration file name.
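The reverse migration information described above can be held in a small record that is validated before any further step runs. This is only an illustrative sketch: the field names, the dictionary-shaped request, and the function `parse_reverse_migration_request` are assumptions, and the second credential is named `secret_key` here purely for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class ReverseMigrationInfo:
    object_storage_url: str  # URL of the bucket that holds the reverse migration file
    access_key: str          # first credential supplied by the user
    secret_key: str          # second credential (name assumed for this sketch)
    file_name: str           # reverse migration file name


def parse_reverse_migration_request(request: dict) -> ReverseMigrationInfo:
    """Extract and validate the reverse migration information from a request."""
    required = ("object_storage_url", "access_key", "secret_key", "file_name")
    missing = [key for key in required if not request.get(key)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    parsed = urlparse(request["object_storage_url"])
    if not parsed.scheme or not parsed.netloc:
        raise ValueError("object storage URL must be absolute")
    return ReverseMigrationInfo(**{key: request[key] for key in required})


info = parse_reverse_migration_request({
    "object_storage_url": "https://bluray.example.invalid/archive-bucket",
    "access_key": "AK-demo",
    "secret_key": "SK-demo",
    "file_name": "logs-2020.parquet",
})
```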
S302: the client judges whether a corresponding reverse migration file exists in the blue-ray storage system or not according to the reverse migration information; if the reverse migration file exists in the blu-ray storage system, step S303 is executed.
In the specific execution process of step S302, after the client acquires the corresponding reverse migration information, a preset SQL statement may be further invoked to determine whether a reverse migration file matching the reverse migration file name in the reverse migration information exists in the Blu-ray storage system; if no reverse migration file matching the reverse migration file name in the reverse migration information exists in the Blu-ray storage system, error reporting information for prompting that the Blu-ray storage system does not have a reverse migration file matching the reverse migration file name can be output.
If a reverse migration file matching the reverse migration file name exists in the blu-ray storage system, step S303 may be performed.
S303: the client performs identity authentication on the user according to the access key and the secret key; if the user passes the authentication, step S304 is executed.
In the specific process of executing step S303, when the client determines that a reverse migration file matching the reverse migration file name exists in the Blu-ray storage system, it may further perform identity authentication on the user according to the access key and the secret key; if the authentication passes, step S304 may be performed.
If the identity authentication of the user according to the access key and the secret key does not pass, error reporting information for prompting that the identity authentication of the user did not pass can be output.
S304: the client obtains the corresponding user file directory read-write permission information and judges whether it indicates that reading and writing of the user file directory are allowed; if reading and writing are allowed, step S305 is executed.
In the specific process of executing step S304, the client may further obtain the read-write permission information of the user file directory related to the reverse migration file under the condition that the identity authentication of the user is determined to pass, and determine whether the read-write permission information of the user file directory is allowed to be read-written; if the user file directory read-write permission information is read-write permission, that is, the reverse migration file is read-write permission, step S305 may be further executed.
If the user file directory read-write permission information does not allow reading and writing, that is, the reverse migration file is not allowed to be read and written, error reporting information for prompting that the reverse migration file is not allowed to be read and written can be output.
S305: the client migrates the reverse migration file on the Blu-ray storage system to the HDFS system.
In the embodiment of the application, when the client determines that the reverse migration file is allowed to be read and written, it may, according to the reverse migration information, create an external table based on the object storage URL in the Greenplum database and create a read function based on the S3 interface in the Greenplum database, so as to read the data blocks of the reverse migration file from the bucket corresponding to the object storage URL in the Blu-ray storage system by means of the created read function.
After reading the reverse migration file, the client can connect to the HDFS system and judge whether a file directory corresponding to the reverse migration file exists in the metadata of the HDFS system. If it does not exist, corresponding error reporting information can be output. If it exists, a corresponding new folder can be created in the metadata, and the data node information fed back by the master node according to the created new folder is received; the read data blocks of the reverse migration file are then transmitted to the data nodes corresponding to the data node information until all data blocks related to the reverse migration file have been transmitted, and the data nodes store the corresponding data blocks.
It should be noted that the master node may feed back information of a plurality of data nodes according to the created new folder. Correspondingly, if the master node feeds back a plurality of data node information according to the created new folder, the client may transmit one or more data blocks of the read reverse migration file to a data node corresponding to any one of the data node information, and then transmit another one or more data blocks of the reverse migration file to a data node corresponding to another data node information until each data block related to the reverse migration file is transmitted to the HDFS system.
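When the master node feeds back several pieces of data node information, the client has to decide which data blocks go to which data node. The text leaves this choice open; the sketch below assumes a simple round-robin assignment, and the function name `assign_blocks` and the identifiers are hypothetical.

```python
from itertools import cycle


def assign_blocks(block_ids: list[str], data_nodes: list[str]) -> dict[str, list[str]]:
    """Spread the data blocks of a reverse migration file over the returned data nodes."""
    if not data_nodes:
        raise ValueError("the master node returned no data node information")
    plan = {node: [] for node in data_nodes}
    # Round-robin is an assumption; any assignment that covers every block would do.
    for block_id, node in zip(block_ids, cycle(data_nodes)):
        plan[node].append(block_id)
    return plan


plan = assign_blocks(["blk_0", "blk_1", "blk_2", "blk_3", "blk_4"], ["dn-a", "dn-b"])
```

Every block appears exactly once in the plan, which matches the requirement that transmission continues until each data block of the reverse migration file has reached the HDFS system.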
Further, in this embodiment of the present application, the HDFS system may also back up the reverse migration file in other data nodes, that is, the HDFS system completes copy replication of the reverse migration file between the data nodes, and feeds back corresponding successful backup information to the client after completing the backup.
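Steps S302 to S304 act as successive gates in front of the migration in step S305: the first check that fails stops the flow and yields error reporting information. A minimal sketch of that control flow follows, assuming each check is available as a callable; the function name and the message texts are illustrative only.

```python
from typing import Callable


def run_reverse_migration_checks(
    file_exists: Callable[[], bool],
    authenticate: Callable[[], bool],
    directory_read_write_allowed: Callable[[], bool],
) -> tuple[bool, str]:
    """Run the S302, S303 and S304 checks in order and stop at the first failure."""
    steps = [
        (file_exists, "S302: no matching reverse migration file in the Blu-ray storage system"),
        (authenticate, "S303: identity authentication of the user did not pass"),
        (directory_read_write_allowed, "S304: the reverse migration file is not allowed to be read and written"),
    ]
    for check, error in steps:
        if not check():
            return False, error
    return True, "S305: the reverse migration file may be migrated to the HDFS system"


ok, message = run_reverse_migration_checks(lambda: True, lambda: False, lambda: True)
```

Because the checks are evaluated lazily, the permission lookup of S304 is never performed for a user who fails authentication in S303.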
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The names of messages or information exchanged between devices in the disclosed embodiments are for illustrative purposes only and are not intended to limit the scope of such messages or information.
Although the operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
Computer program code for carrying out operations for the disclosed invention may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
The functions described above herein may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
Based on the data storage method provided by the above embodiment of the present invention, correspondingly, the embodiment of the present invention further discloses a data storage system, as shown in fig. 4, the data storage system is applied to a client, and the data storage system includes:
the remote process access request sending unit 41 is configured to, when receiving a ticket fed back by an authorization server, establish a connection with the HDFS system, and send a remote process access request to the HDFS system, so that when the HDFS system determines that a file to be migrated exists in the remote access request, the HDFS system obtains at least one piece of data node information corresponding to the file to be migrated, and determines a target link, where a target data node corresponding to each piece of data node information is the shortest in distance from a client; the ticket is fed back when the authorization server confirms that the client is a legal client according to the verification information sent by the client; the verification information is generated by the client based on an authentication server according to an authorization ticket request sent by the client;
the transmission unit 42 is configured to communicate with a corresponding target data node according to a corresponding target link, acquire a data block corresponding to the file to be migrated from the target data node, and transmit the data block to the blu-ray storage system connected to the HDFS system until all data blocks related to the file to be migrated are transmitted to the blu-ray storage system; the file to be migrated is data which does not need to be modified, has non-real-time query requirements and long storage time requirements.
The specific principle and the implementation process of each unit in the data storage system disclosed in the embodiment of the present invention are the same as those of the data storage method disclosed in fig. 2 in the embodiment of the present invention, and reference may be made to corresponding parts in the data storage method disclosed in fig. 2 in the embodiment of the present invention, and details are not repeated here.
The invention provides a data storage system. When a client receives a ticket fed back by an authorization server, it establishes a connection with an HDFS (Hadoop Distributed File System) system and sends a remote process access request to the HDFS system, so that when the HDFS system determines that a file matching the name of the file to be migrated carried in the remote access request exists, at least one piece of data node information corresponding to the file to be migrated is obtained, and a target link with the shortest distance between the target data node corresponding to each piece of data node information and the client is determined; the ticket is fed back when the authorization server confirms that the client is a legal client according to the verification information sent by the client; the verification information is generated by the client based on an authentication server according to an authorization ticket request sent by the client; for each target data node, the client communicates with the target data node according to the corresponding target link, acquires a data block corresponding to the file to be migrated from the target data node, and transmits the data block to a Blu-ray storage system connected with the HDFS system until all data blocks related to the file to be migrated are transmitted to the Blu-ray storage system; the file to be migrated is data which does not need to be modified, has non-real-time query requirements and has long storage time requirements.
According to the technical scheme provided by the invention, the connection between the HDFS system and the blue-ray storage system is established in advance, and then files to be migrated, which do not need to be modified and have non-real-time query requirements and long storage time requirements, in the HDFS system can be migrated into the blue-ray storage system for storage through the client, so that the problems that the stored data are required to be migrated into new storage equipment every time the storage equipment is close to the validity period due to the storage age of the existing storage equipment, the operation risk is easy to occur in the data migration process, and the corresponding operation cost is increased are solved.
Optionally, the verification information generating unit includes:
the authorization ticket request sending unit is used for sending an authorization ticket request to the authentication server, so that the authentication server obtains a corresponding authorization ticket (Ticket Granting Ticket) according to a user name, encrypts the authorization ticket by using a password, and sends the obtained encrypted information to the client; wherein the authorization ticket request includes the user name and the password;
and the verification information generating subunit is used for decrypting the encrypted information by using the password to obtain the authorization ticket, and generating the verification information according to the Ticket Granting Ticket.
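The exchange between the client and the authentication server can be illustrated with a toy round trip. This sketch is not Kerberos and must not be used for real security: the XOR keystream derived from the password with SHA-256 merely stands in for a proper cipher, and every function name, the JSON layout of the ticket, and the `requested_service` field are assumptions made for illustration.

```python
import hashlib
import json


def _keystream(password: str, length: int) -> bytes:
    """Derive a password-dependent byte stream (toy construction, not secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(password.encode() + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_with_password(data: bytes, password: str) -> bytes:
    """Toy symmetric cipher: applying it twice with the same password restores the data."""
    return bytes(a ^ b for a, b in zip(data, _keystream(password, len(data))))


def authentication_server_reply(user_name: str, password_on_record: str) -> bytes:
    """Server side: look up the authorization ticket by user name and encrypt it."""
    ticket = json.dumps({"user": user_name, "kind": "TGT"}).encode()
    return xor_with_password(ticket, password_on_record)


def client_build_verification_info(encrypted: bytes, password: str) -> dict:
    """Client side: decrypt the authorization ticket and wrap it as verification info."""
    ticket = json.loads(xor_with_password(encrypted, password).decode())
    return {"ticket": ticket, "requested_service": "hdfs"}


encrypted = authentication_server_reply("alice", "correct horse")
verification_info = client_build_verification_info(encrypted, "correct horse")
```

The verification information would then be sent to the authorization server, which feeds back the ticket used to connect to the HDFS system.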
Optionally, the transmission unit includes:
the storage bucket creating unit is used for creating a storage bucket corresponding to the file to be migrated on the blue-ray storage system and acquiring a target URL corresponding to the storage bucket;
and the transmission subunit is used for transmitting the data block to a storage bucket on the blue-ray storage system according to the target URL.
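The two subunits above can be sketched against an in-memory stand-in for the object storage interface of the Blu-ray storage system. The class `InMemoryObjectStore`, its methods, and the endpoint are hypothetical; a real deployment would call the storage system's own object storage API instead.

```python
class InMemoryObjectStore:
    """Stand-in for the object storage interface of the Blu-ray storage system."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint.rstrip("/")
        self.buckets: dict[str, dict[str, bytes]] = {}

    def create_bucket(self, name: str) -> str:
        """Create the bucket for a file to be migrated and return its target URL."""
        self.buckets.setdefault(name, {})
        return f"{self.endpoint}/{name}"

    def put_block(self, target_url: str, block_id: str, data: bytes) -> None:
        """Store one data block in the bucket addressed by the target URL."""
        bucket = target_url.rsplit("/", 1)[-1]
        self.buckets[bucket][block_id] = data


store = InMemoryObjectStore("https://bluray.example.invalid")
target_url = store.create_bucket("file-to-migrate")
store.put_block(target_url, "blk_0", b"first block")
```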
Optionally, the data storage system provided in the embodiment of the present invention further includes:
the first judgment unit is used for calculating a check value of the data block after the data block is transmitted to a blue-ray storage system connected with the HDFS system, and judging whether the check value is consistent with a check value of a check bit in the data block;
the first determining unit is used for determining that the data block is transmitted if the check value is consistent with the check value of the check bit in the data block;
and the first output unit is used for determining that the data block fails to be transmitted and outputting corresponding error reporting information if the check value is inconsistent with the check value of the check bit in the data block.
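The comparison performed by these units can be sketched as follows. The text does not name the checksum algorithm or the position of the check bit, so this example assumes a CRC-32 value stored in the last four bytes of each data block; `attach_check_value` and `verify_block` are hypothetical names.

```python
import zlib


def attach_check_value(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 check value to the payload (assumed block layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify_block(block: bytes) -> bool:
    """Recompute the check value and compare it with the one stored in the block."""
    if len(block) < 4:
        return False
    payload, stored = block[:-4], int.from_bytes(block[-4:], "big")
    return zlib.crc32(payload) == stored


good_block = attach_check_value(b"archived data")
corrupted_block = b"A" + good_block[1:]
```

A block for which `verify_block` returns `False` would be treated as a failed transmission and reported with error reporting information.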
Optionally, after executing the transmission unit, the data storage system provided in the embodiment of the present invention further includes:
the data total amount information acquisition unit is used for acquiring first data total amount information of the file to be migrated from the HDFS system, and acquiring second data total amount information of the file to be migrated from the Blu-ray storage system;
the comparison unit is used for comparing the first data total amount information with the second data total amount information;
the second determining unit is used for determining that the file to be migrated is completely migrated if the first total data amount information is consistent with the second total data amount information;
and the second output unit is used for determining that the file to be migrated is incomplete if the first data total information is inconsistent with the second data total information, and outputting corresponding error reporting information.
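The completeness check reduces to comparing two totals. The sketch below assumes the data total amount information is a byte count and that the Blu-ray side is obtained by summing the sizes of the stored blocks; the function name and message texts are illustrative.

```python
def compare_totals(first_total: int, stored_block_sizes: list[int]) -> tuple[bool, str]:
    """Compare the HDFS total with the total of the blocks stored on Blu-ray storage."""
    second_total = sum(stored_block_sizes)
    if first_total == second_total:
        return True, "migration complete"
    return False, (
        f"migration incomplete: HDFS reports {first_total} bytes, "
        f"Blu-ray storage reports {second_total} bytes"
    )


complete, report = compare_totals(300, [128, 128, 44])
incomplete, error_report = compare_totals(300, [128, 128])
```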
Optionally, the data storage system provided in the embodiment of the present invention further includes:
the reverse migration information acquisition unit is used for acquiring corresponding reverse migration information according to a reverse migration request when receiving the reverse migration request sent by a user; the reverse migration information comprises an object storage URL, an access key and a secret key;
the second judging unit is used for judging whether a corresponding reverse migration file exists in the blue-ray storage system or not according to the reverse migration information;
and the migration unit is used for migrating the reverse migration files on the blue-ray storage system to the HDFS system if the reverse migration files exist in the blue-ray storage system.
Optionally, before executing the migration unit, the data storage system provided in the embodiment of the present invention further includes:
the identity authentication unit is used for authenticating the identity of the user according to the access key and the secret key;
the user file directory read-write permission information acquisition unit is used for acquiring corresponding user file directory read-write permission information if the identity authentication of the user passes;
the migration unit is further configured to migrate the reverse migration file on the blu-ray storage system to the HDFS system if the user file directory read-write permission information is read-write permission.
An embodiment of the present application further provides an electronic device, which includes: the system comprises a processor and a memory, wherein the processor and the memory are connected through a communication bus; the processor is used for calling and executing the program stored in the memory; the memory is used for storing a program, and the program is used for realizing the data storage method.
Referring now to FIG. 5, a block diagram of an electronic device suitable for use in implementing the disclosed embodiments of the invention is shown. The electronic devices in the disclosed embodiments of the present invention may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the disclosed embodiments of the present invention.
As shown in fig. 5, the electronic device may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic apparatus are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program containing program code for performing the data storage method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the data storage method of the disclosed embodiment of the invention.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium, in which computer-executable instructions are stored, where the computer-executable instructions are used to execute a data storage method.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: when receiving a ticket fed back by an authorization server, establishing connection with an HDFS (Hadoop distributed File System), and sending a remote process access request to the HDFS, so that when the HDFS determines that a file to be migrated exists in the remote access request, wherein the file to be migrated is matched with the name of the file to be migrated, at least one piece of data node information corresponding to the file to be migrated is obtained, and a target link, which has the shortest distance between a target data node corresponding to each piece of data node information and the client, is determined; the ticket is fed back when the authorization server confirms that the client is a legal client according to the verification information sent by the client; the verification information is generated by the client based on an authentication server according to an authorization ticket request sent by the client; communicating with the corresponding target data node according to the corresponding target link, acquiring a data block corresponding to the file to be migrated from the target data node, and transmitting the data block to a blue-ray storage system connected with the HDFS system until all data blocks related to the file to be migrated are transmitted to the blue-ray storage system; the file to be migrated is data which is not required to be modified, has non-real-time query requirements and has long storage time requirements.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It should be noted that the computer readable medium mentioned above in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
Claims (10)
1. A data storage method is applied to a client, and the method comprises the following steps:
when receiving a ticket fed back by an authorization server, establishing connection with an HDFS (Hadoop distributed File System), and sending a remote process access request to the HDFS, so that when the HDFS determines that a file to be migrated exists in the remote access request, wherein the file to be migrated is matched with the name of the file to be migrated, at least one piece of data node information corresponding to the file to be migrated is obtained, and a target link, which has the shortest distance between a target data node corresponding to each piece of data node information and the client, is determined; the ticket is fed back when the authorization server confirms that the client is a legal client according to verification information sent by the client; the verification information is generated by the client based on an authorization ticket request sent by the client through an authentication server;
communicating with the corresponding target data node according to the corresponding target link, acquiring a data block corresponding to the file to be migrated from the target data node, and transmitting the data block to a blue-ray storage system connected with the HDFS system until all data blocks related to the file to be migrated are transmitted to the blue-ray storage system; the file to be migrated is data which is not required to be modified, has non-real-time query requirements and has long storage time requirements.
2. The method according to claim 1, wherein the generating the verification information based on the authentication server based on the encryption information of the authorization ticket request sent by the client comprises:
sending an authorization ticket request to an authentication server so that the authentication server acquires a corresponding authorization ticket according to a user name, encrypting the authorization ticket by using a password, and sending the obtained encryption information to the client; wherein the authorization ticket request comprises the user name and password;
and decrypting the encrypted information by using the password to obtain the authorization ticket (Ticket Granting Ticket), and generating the verification information according to the Ticket Granting Ticket.
3. The method of claim 1, wherein the transferring the data block to a Blu-ray storage system connected to the HDFS system comprises:
creating a storage bucket corresponding to the file to be migrated on the blue-ray storage system, and acquiring a target URL corresponding to the storage bucket;
and transmitting the data block to a bucket on the blue-ray storage system according to the target URL.
4. The method of claim 1, further comprising:
after the data block is transmitted to a blue-ray storage system connected with the HDFS system, calculating a check value of the data block, and judging whether the check value is consistent with a check value of a check bit in the data block;
if the check value is consistent with the check value of the check bit in the data block, determining that the data block is transmitted;
and if the check value is inconsistent with the check value of the check bit in the data block, determining that the data block fails to be transmitted, and outputting corresponding error reporting information.
5. The method of claim 1, wherein after all data blocks related to the file to be migrated are transferred to the Blu-ray storage system, the method further comprises:
acquiring first data total amount information of the file to be migrated from the HDFS system, and acquiring second data total amount information of the file to be migrated from the blue-ray storage system;
comparing the first data total amount information with the second data total amount information;
if the first total data amount information is consistent with the second total data amount information, determining that the file to be migrated is completely migrated;
and if the first total data amount information is inconsistent with the second total data amount information, determining that the file to be migrated is incomplete, and outputting corresponding error reporting information.
6. The method of claim 1, further comprising:
when a reverse migration request sent by a user is received, acquiring corresponding reverse migration information according to the reverse migration request; wherein the reverse migration information comprises an object storage URL, an access key and a secret key;
determining, according to the reverse migration information, whether a corresponding reverse migration file exists in the Blu-ray storage system; and
if the reverse migration file exists in the Blu-ray storage system, migrating the reverse migration file on the Blu-ray storage system to the HDFS system.
7. The method of claim 6, further comprising:
performing identity verification on the user according to the access key and the secret key;
if the user passes the identity verification, acquiring corresponding user file directory read-write permission information;
wherein migrating the reverse migration file on the Blu-ray storage system to the HDFS system comprises:
if the user file directory read-write permission information indicates read-write permission, migrating the reverse migration file on the Blu-ray storage system to the HDFS system.
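The reverse migration flow (identity verification, permission check, existence check, copy back to HDFS) can be sketched as below. This is an illustration under stated assumptions: the two keys are treated as an access key that identifies the user and a secret key that proves identity, all stores are plain dictionaries, and `ReverseMigrationInfo` and `ReverseMigrator` are hypothetical names. The claims do not fix the order of the authentication and existence checks; this sketch authenticates first.

```python
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class ReverseMigrationInfo:
    object_url: str  # object storage URL of the archived file
    access_key: str  # identifies the user (assumed role)
    secret_key: str  # proves the user's identity (assumed role)


class ReverseMigrator:
    """Hypothetical in-memory model of the reverse migration checks."""

    def __init__(self, credentials, permissions, bluray_objects):
        self.credentials = credentials        # access key -> secret key
        self.permissions = permissions        # access key -> "rw" or "r"
        self.bluray_objects = bluray_objects  # object URL -> file bytes

    def authenticate(self, info):
        expected = self.credentials.get(info.access_key)
        if expected is None:
            return False
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(expected.encode(), info.secret_key.encode())

    def migrate(self, info, hdfs_files):
        """Copy the archived file back into hdfs_files if every check passes."""
        if not self.authenticate(info):
            return "denied: identity verification failed"
        if self.permissions.get(info.access_key) != "rw":
            return "denied: read-write permission required"
        data = self.bluray_objects.get(info.object_url)
        if data is None:
            return "not found: no reverse migration file at this URL"
        file_name = info.object_url.rsplit("/", 1)[-1]
        hdfs_files[file_name] = data
        return "migrated"
```

A caller would build a `ReverseMigrationInfo` from the user's request and pass it to `migrate` together with a handle to the HDFS side, here modeled as a dictionary of file name to bytes.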
8. A data storage system, wherein the data storage system is applied to a client, the system comprising:
a remote process access request sending unit, configured to establish a connection with an HDFS system upon receiving a ticket fed back by an authorization server, and to send a remote process access request to the HDFS system, so that, when the HDFS system determines that a file to be migrated exists in the remote process access request and matches the name of the file to be migrated, at least one piece of data node information corresponding to the file to be migrated is obtained and, for each piece of data node information, a target link with the shortest distance between the corresponding target data node and the client is determined; wherein the ticket is fed back when the authorization server confirms, according to verification information sent by the client, that the client is a legitimate client; and the verification information is generated by the client, through a verification information generation unit and based on an authentication server, according to an authorization ticket request sent by the client; and
a transmission unit, configured to communicate with the corresponding target data node over the corresponding target link, acquire a data block corresponding to the file to be migrated from the target data node, and transmit the data block to a Blu-ray storage system connected with the HDFS system, until all data blocks related to the file to be migrated have been transmitted to the Blu-ray storage system; wherein the file to be migrated is data that does not need to be modified, does not require real-time querying, and must be stored for a long time.
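Choosing the target data node with the shortest distance to the client can be sketched as follows. The claim does not define the distance metric; this sketch assumes a tree-shaped network topology in which each node has a path such as `/datacenter/rack/node` and the distance between two nodes is the number of hops to their closest common ancestor, summed over both sides. The function names and the sample paths are hypothetical.

```python
from typing import Sequence


def topology_distance(a: str, b: str) -> int:
    """Hops from a and b up to their closest common ancestor, summed."""
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def pick_target_node(client: str, replicas: Sequence[str]) -> str:
    """Return the replica location closest to the client."""
    if not replicas:
        raise ValueError("no data node holds this block")
    return min(replicas, key=lambda node: topology_distance(client, node))


# Usage: the replica on the client's own rack is preferred.
client = "/dc1/rack1/client"
replicas = ["/dc1/rack2/dn3", "/dc1/rack1/dn1", "/dc2/rack9/dn7"]
nearest = pick_target_node(client, replicas)
```

Under this metric a node on the same rack is at distance 2, a node on another rack in the same data center at distance 4, and a node in another data center at distance 6, so the client reads each block over the shortest available link.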
9. An electronic device, comprising a processor and a memory connected through a communication bus; wherein the processor is configured to call and execute a program stored in the memory; and the memory is configured to store a program for implementing the data storage method of any one of claims 1-7.
10. A computer-readable storage medium having computer-executable instructions stored thereon for performing the data storage method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211704133.8A CN115934640A (en) | 2022-12-29 | 2022-12-29 | Data storage method, system, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211704133.8A CN115934640A (en) | 2022-12-29 | 2022-12-29 | Data storage method, system, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115934640A (en) | 2023-04-07 |
Family
ID=86550665
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211704133.8A Pending CN115934640A (en) | 2022-12-29 | 2022-12-29 | Data storage method, system, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115934640A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116828456A (en) * | 2023-08-24 | 2023-09-29 | 深圳市数组技术有限公司 | Encryption storage authentication system |
CN116828456B (en) * | 2023-08-24 | 2023-11-17 | 深圳市数组技术有限公司 | Encryption storage authentication system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11683187B2 (en) | User authentication with self-signed certificate and identity verification and migration | |
US9537918B2 (en) | File sharing with client side encryption | |
US20230014599A1 (en) | Data processing method and apparatus for blockchain system | |
US20160112397A1 (en) | Anomaly detection for access control events | |
US10270757B2 (en) | Managing exchanges of sensitive data | |
US20200213331A1 (en) | Data service system | |
CN109657492B (en) | Database management method, medium, and electronic device | |
CN113157648A (en) | Block chain based distributed data storage method, device, node and system | |
CN110611657A (en) | File stream processing method, device and system based on block chain | |
CN111199037B (en) | Login method, system and device | |
US8667281B1 (en) | Systems and methods for transferring authentication credentials | |
JP2022541835A (en) | Methods and apparatus, electronic devices, storage media and computer programs for processing data requests | |
CN111597567A (en) | Data processing method, data processing device, node equipment and storage medium | |
CN114615031A (en) | File storage method and device, electronic equipment and storage medium | |
CN115934640A (en) | Data storage method, system, electronic equipment and storage medium | |
CN112181983B (en) | Data processing method, device, equipment and medium | |
US10462113B1 (en) | Systems and methods for securing push authentications | |
CN108920971A (en) | The method of data encryption, the method for verification, the device of encryption and verification device | |
CN116346486A (en) | Combined login method, device, equipment and storage medium | |
CN110659476A (en) | Method and apparatus for resetting password | |
TWM591661U (en) | Digital Identity Management System | |
CN113626873B (en) | Authentication method, device, electronic equipment and computer readable medium | |
TWI727474B (en) | Digital identity management system and method | |
US11983710B2 (en) | Hash-based transaction tagging | |
US11316664B2 (en) | System for characterization and tracking of electronic data in a networked environment using cohesive information units |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||