CN109542860B - Service data management method based on HDFS and terminal equipment - Google Patents

Service data management method based on HDFS and terminal equipment

Info

Publication number
CN109542860B
Authority
CN
China
Prior art keywords
service
service data
data
folder
business
Legal status
Active
Application number
CN201811250917.1A
Other languages
Chinese (zh)
Other versions
CN109542860A (en)
Inventor
安栋
王斌
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201811250917.1A
Publication of CN109542860A
Application granted
Publication of CN109542860B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an HDFS-based service data management method and terminal equipment. A first folder is created in the HDFS, and first service data corresponding to a service is stored in the first folder. The identification part of each piece of service data comprises a first identifier and a second identifier: the first identifier is the globally unique identifier of the service in the HDFS, and the second identifier is the globally unique identifier of the service data in the HDFS. If the first service data is correct, a unique mapping between its second identifier and a valid flag is created in a preset form; if the first service data is wrong, a unique mapping between its second identifier and an invalid flag is created in the preset form. The service data used to run the service is then read from the HDFS according to the identification part of the service data in the first folder and the preset form. Because correct service data can be selected through the identification part and the preset form, system operation errors caused by wrong service data are avoided.

Description

Service data management method based on HDFS and terminal equipment
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a business data management method and terminal equipment based on an HDFS.
Background
With the rapid development of information technology and deepening informatization, the volume of generated data keeps growing. In the big data era, data volume grows faster than the capacity of storage media, so storage costs keep rising. This huge volume of data severely challenges the data processing techniques of traditional data management systems, and storing data more efficiently and stably has become a research hot spot in data processing and many other fields.
HDFS (Hadoop Distributed File System) is currently often used to store such data. HDFS is a reliable file system suited to storing large-scale data at low cost. Unlike other existing distributed file systems, its core design concept is an efficient write-once, read-many access pattern: once data has been written, HDFS does not support modifying or deleting that data in place. Some industries, such as the financial industry, must store for each service the service data required to run it. Because HDFS does not support modification or deletion of written data, erroneous service data that has been stored cannot be effectively identified, which causes system operation to fail.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a service data management method and a terminal device based on an HDFS, so as to solve the problem in the prior art that effective service data cannot be accurately identified in the service running process under the HDFS architecture.
A first aspect of an embodiment of the present invention provides a service data management method based on HDFS, including:
creating a first folder in a distributed file system HDFS;
storing first service data corresponding to a service into the first folder, wherein the service data comprises an identification part aiming at any service data in the first folder, the identification part of the service data comprises a first identification and a second identification, the first identification is a global unique identification of the service in an HDFS, and the second identification is a global unique identification of the service data in the HDFS;
judging whether the first service data is correct, if the first service data is correct, creating a unique mapping relation between a second identifier and an effective identifier in the first service data in a preset form, and if the first service data is incorrect, creating a unique mapping relation between the second identifier and an ineffective identifier in the first service data in the preset form, wherein the effective identifier is used for indicating that the first service data is correct data, and the ineffective identifier is used for indicating that the first service data is incorrect data;
and reading the service data for running the service in the HDFS according to the identification part of the service data in the first folder and the preset form.
A second aspect of embodiments of the present invention provides a computer-readable storage medium storing computer-readable instructions that when executed by a processor perform the steps of:
creating a first folder in a distributed file system HDFS;
storing first service data corresponding to a service into the first folder, wherein the service data comprises an identification part aiming at any service data in the first folder, the identification part of the service data comprises a first identification and a second identification, the first identification is a global unique identification of the service in an HDFS, and the second identification is a global unique identification of the service data in the HDFS;
judging whether the first service data is correct, if the first service data is correct, creating a unique mapping relation between a second identifier and an effective identifier in the first service data in a preset form, and if the first service data is incorrect, creating a unique mapping relation between the second identifier and an ineffective identifier in the first service data in the preset form, wherein the effective identifier is used for indicating that the first service data is correct data, and the ineffective identifier is used for indicating that the first service data is incorrect data;
and reading the service data for running the service in the HDFS according to the identification part of the service data in the first folder and the preset form.
A third aspect of an embodiment of the present invention provides a terminal device, including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer readable instructions:
creating a first folder in a distributed file system HDFS;
storing first service data corresponding to a service into the first folder, wherein the service data comprises an identification part aiming at any service data in the first folder, the identification part of the service data comprises a first identification and a second identification, the first identification is a global unique identification of the service in an HDFS, and the second identification is a global unique identification of the service data in the HDFS;
judging whether the first service data is correct, if the first service data is correct, creating a unique mapping relation between a second identifier and an effective identifier in the first service data in a preset form, and if the first service data is incorrect, creating a unique mapping relation between the second identifier and an ineffective identifier in the first service data in the preset form, wherein the effective identifier is used for indicating that the first service data is correct data, and the ineffective identifier is used for indicating that the first service data is incorrect data;
and reading the service data for running the service in the HDFS according to the identification part of the service data in the first folder and the preset form.
The invention provides an HDFS-based service data management method and terminal equipment. An identifier is created for each piece of service data written to the HDFS, and a valid flag or an invalid flag corresponding to that service data is recorded in a preset form to indicate whether the service data is correct. Correct service data can then be read through the identification part of the service data in the HDFS together with the preset form, so that system operation errors caused by wrong service data are avoided.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flowchart of an HDFS-based service data management method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another HDFS-based service data management method according to an embodiment of the present invention;
FIG. 3 is a flowchart of another HDFS-based service data management method according to an embodiment of the present invention;
FIG. 4 is a flowchart of another HDFS-based service data management method according to an embodiment of the present invention;
FIG. 5 is a block diagram of an HDFS-based service data storage device according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
The embodiment of the invention provides a business data management method based on an HDFS. Referring to fig. 1, the method includes:
S101, creating a first folder in the distributed file system HDFS.
The HDFS architecture follows a master-slave model and includes one name node (NameNode) and a plurality of data nodes (DataNodes). The name node performs file and directory operations on the file namespace, such as open, close, and rename, and it also determines the mapping of blocks to data nodes. The data nodes serve read and write requests from file system clients. The data nodes also create blocks and carry out block replication instructions from the name node, among other tasks.
In the embodiment of the invention, the client needs to write the service data corresponding to each service in the service system into the HDFS. Specifically, the client sends a file write request to the name node, and the request asks for a new first folder to be created. After receiving the request, the name node adds metadata information about the first folder to the metadata structure it maintains, and sends the client an instruction to create the first folder together with the metadata information of the first folder. The metadata information includes the path information of the data node on which the client is to create the first folder. After receiving the instruction, the client creates the first folder on the data node corresponding to the path information.
S102, storing first service data corresponding to a service into the first folder, wherein the service data comprises an identification part aiming at any service data in the first folder, the identification part of the service data comprises a first identification and a second identification, the first identification is a global unique identification of the service in an HDFS, and the second identification is a global unique identification of the service data in the HDFS.
The client writes the service data corresponding to each service in the service system into the first folder. In the embodiment of the invention, an identifier is created for the service data corresponding to any service in the service system. The identifier comprises two parts: a first identifier, which is the globally unique identifier of the service in the HDFS, and a second identifier, which is the globally unique identifier of the service data in the HDFS.
Optionally, a first identifier 32 bytes long and a second identifier 8 bytes long may be placed before the data part of the service data.
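The optional layout above can be sketched as a fixed 40-byte header followed by the data part. This is only an illustrative sketch: the byte order, the zero padding of the first identifier, the choice of an unsigned integer for the second identifier, and the names `HEADER`, `pack_record`, and `unpack_record` are assumptions, not part of the described method.

```python
import struct

# Assumed layout: 32-byte first identifier (service), 8-byte second
# identifier (service data, stored here as an unsigned 64-bit integer),
# followed by the data part. Big-endian byte order is an arbitrary choice.
HEADER = struct.Struct(">32sQ")


def pack_record(service_id: bytes, data_id: int, payload: bytes) -> bytes:
    """Prefix the data part with the two identifiers."""
    if len(service_id) > 32:
        raise ValueError("first identifier must fit in 32 bytes")
    # "32s" pads shorter identifiers with zero bytes.
    return HEADER.pack(service_id, data_id) + payload


def unpack_record(record: bytes):
    """Split a record back into (first identifier, second identifier, data)."""
    service_id, data_id = HEADER.unpack_from(record)
    return service_id.rstrip(b"\x00"), data_id, record[HEADER.size:]
```

With this layout a reader can recover both identifiers from the first 40 bytes of a record without parsing the data part.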
S103, judging whether the first service data is correct; if so, creating a unique mapping relation between the second identifier in the first service data and an effective identifier in a preset form, and if not, creating a unique mapping relation between the second identifier in the first service data and an ineffective identifier in the preset form, wherein the effective identifier is used for indicating that the first service data is correct data, and the ineffective identifier is used for indicating that the first service data is wrong data.
Optionally, for a given service, the service data initially written into the first folder for that service is the first service data of the embodiment of the present invention. Running the service with the first service data has two possible outcomes: the run succeeds or the run fails. If the run succeeds, the first service data corresponding to the service is correct data; if the run fails, the first service data corresponding to the service is wrong data.
To prevent wrong data from causing the service to fail, in the embodiment of the invention the client maintains a preset form. For any piece of service data, if the service is run with that service data and runs successfully in the service system, a unique mapping relation between the second identifier of the service data and a valid flag is created in the preset form, indicating that the service data is valid and that the service can run successfully with it. If the service is run with that service data and fails in the service system, a unique mapping relation between the second identifier of the service data and an invalid flag is created in the preset form, indicating that the service data is invalid and would cause the run to fail.
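As a minimal sketch of how the preset form might be maintained, the form can be modeled as a dictionary keyed by the second identifier. The names `Flag`, `record_outcome`, and the `run_service` callback are hypothetical; the text does not specify how the form is stored or how a run is judged to have succeeded.

```python
from enum import Enum
from typing import Callable, Dict


class Flag(Enum):
    VALID = "valid"      # the service ran successfully with this data
    INVALID = "invalid"  # the service failed when run with this data


def record_outcome(
    preset_form: Dict[int, Flag],
    data_id: int,
    payload: bytes,
    run_service: Callable[[bytes], bool],
) -> Flag:
    """Run the service once and map the second identifier to one flag."""
    if data_id in preset_form:
        # The mapping is unique: one flag per second identifier.
        raise ValueError(f"second identifier {data_id} is already mapped")
    flag = Flag.VALID if run_service(payload) else Flag.INVALID
    preset_form[data_id] = flag
    return flag
```

Because the service data itself is never rewritten, only the form changes when a run outcome is recorded.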
S104, according to the identification part of the business data in the first folder and the preset form, reading the business data for running the business in the HDFS.
In a service system, one or more services may need to be run many times, and each time a service is run, the service data corresponding to it must be read from the HDFS. At that point, all second identifiers that carry a valid flag are read from the preset form, and all valid service data is read from the first folder using those second identifiers. From the valid service data thus read, the service data corresponding to the service to be run is selected according to the first identifier of that service. The service data read in this way is therefore valid, which avoids system operation failures caused by invalid service data.
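The read path of step S104 can be sketched as a two-stage filter, first by valid flag and then by first identifier. The in-memory `Record` type, the list standing in for the first folder, and the boolean-valued form are simplifying assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    service_id: str  # first identifier: the service
    data_id: int     # second identifier: this piece of service data
    payload: bytes   # data part


def read_for_service(
    folder: List[Record], preset_form: Dict[int, bool], service_id: str
) -> List[Record]:
    """Return the valid service data for one service."""
    # Stage 1: second identifiers that carry a valid flag in the form.
    valid_ids = {data_id for data_id, is_valid in preset_form.items() if is_valid}
    valid_records = [r for r in folder if r.data_id in valid_ids]
    # Stage 2: keep only the records whose first identifier matches.
    return [r for r in valid_records if r.service_id == service_id]
```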
The embodiment of the invention provides an HDFS-based service data management method. In the method, an identifier is created for each piece of service data written to the HDFS, and a valid flag or an invalid flag corresponding to that service data is recorded in a preset form to indicate whether the service data is correct. Correct service data can then be read through the identification part of the service data in the HDFS together with the preset form, so that system operation errors caused by wrong service data are avoided.
With reference to FIG. 2, an embodiment of the present invention provides an HDFS-based service data storage method. The method builds on the embodiment shown in FIG. 1 and addresses the problem that service data stored in the HDFS cannot be modified or updated. The method includes:
S201, writing second service data corresponding to the service in the first folder, wherein the version number of the second service data is higher than that of the first service data.
In the embodiment of the invention, version number information is added to the identification part of every piece of service data; the higher the version number, the later the service data was written to the HDFS. Service data corresponding to a service may need to be changed in two cases: the service data written to the HDFS is invalid and must be modified, or the service itself has changed and the original service data must be updated accordingly. The HDFS storage system does not support updating data in place, but it does support appending data. Using this characteristic, the embodiment of the invention updates and modifies service data by appending new service data.
Optionally, for a given service, if the first service data was originally written to the HDFS and the service failed when run with it, a unique mapping relation between the second identifier of the first service data and an invalid flag is created in the preset form to indicate that the first service data is invalid. In this case the service data of the service needs to be modified.
If the first service data was originally written to the HDFS and the service ran successfully with it, a unique mapping relation between the second identifier of the first service data and a valid flag is created in the preset form to indicate that the first service data is valid. If the service is later updated, the corresponding service data also needs to be updated.
S202, if the second service data is correct, a unique mapping relation between a second identifier of the second service data and an effective identifier is created in the preset form.
The service is run with the second service data. If the run succeeds, the second service data is correct data, and a unique mapping relation between the second identifier of the second service data and a valid flag is created in the preset form to indicate that the second service data is correct and valid.
S203, reading the second identifiers carrying a valid flag from the preset form.
After the second service data has been written, running the service again requires reading the correct service data corresponding to the service from the HDFS storage system. To do so, the second identifiers carrying a valid flag are first read from the preset form.
Two scenarios arise. If the second service data is a modification of the first service data, then in the preset form the second identifier of the first service data carries an invalid flag and the second identifier of the second service data carries a valid flag.
If the second service data is an update of the first service data, then in the preset form the second identifiers of both the first service data and the second service data carry a valid flag.
S204, the business data corresponding to the second identifier carrying the effective mark is read from the first folder.
The service data corresponding to the second identifiers carrying a valid flag is then read from the first folder. All service data read in this step is either currently valid or was valid at some earlier time.
S205, according to the global unique identifier of the service in the HDFS, acquiring all service data corresponding to the service, and reading the service data with the highest version number from all service data corresponding to the service as service data for running the service.
For a service, the service corresponds to a globally unique identifier in the HDFS, and according to the unique identifier and the first identifier of the identifier part of all service data, all service data corresponding to the service can be read.
If the service data corresponding to the service has been modified once, this procedure reads only one piece of service data for the service, namely the second service data written as the modification. If the service data corresponding to the service has been updated once, it reads two pieces of service data for the service, namely the first service data and the second service data written as the update.
If only one piece of service data is read for the service, that piece is used as the service data for running the service. If more than one piece is read, the piece with the highest version number is used as the service data for running the service.
It should be noted that a service in the service system is run repeatedly, and its service data may correspondingly be modified or updated many times. With the method provided by the embodiment of the invention, several pieces of service data are therefore usually read for a service, and the piece with the highest version number is used as the service data for running the service.
The embodiment of the invention provides a business data management method based on an HDFS, which is realized by creating an identifier for each piece of business data written into an HDFS storage system and maintaining a preset form, when the data part of the business data corresponding to a certain business written into the HDFS needs to be modified or updated, the new business data corresponding to the business is written into the HDFS, and the version number of the new business data is higher than the version number of the business data needing to be modified or updated. When the service data corresponding to the service is required to be read in the HDFS, all the effective service data stored in the HDFS storage system can be obtained through a preset form, all the service data corresponding to the service can be obtained through the unique identification of the service in the HDFS, and then the service data with the highest version number is used as the service data for running the service according to the version number of each service data, so that the updating and the modification of the service data in the HDFS are realized.
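Steps S203 to S205 can be sketched as follows: keep the valid records of the service, then pick the one with the highest version number. The `Record` type, the list standing in for the first folder, and the boolean-valued form are illustrative assumptions rather than the storage format of the method.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    service_id: str  # first identifier
    data_id: int     # second identifier
    version: int     # higher means written later
    payload: bytes


def current_record(
    folder: List[Record], preset_form: Dict[int, bool], service_id: str
) -> Optional[Record]:
    """Pick the valid record with the highest version for one service."""
    candidates = [
        r
        for r in folder
        if r.service_id == service_id and preset_form.get(r.data_id, False)
    ]
    # Appending a higher-versioned record acts as a modification or update.
    return max(candidates, key=lambda r: r.version, default=None)
```

In the modification scenario only the second record is a candidate, and in the update scenario both are candidates and the higher version wins, so both scenarios resolve to the second service data.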
Further, in conjunction with fig. 3, an embodiment of the present invention further provides a service data management method based on an HDFS, where the method is used to solve the problem that service data stored in the HDFS cannot be deleted, and the method includes:
S301, writing third service data corresponding to the service in the first folder, wherein the version number of the third service data is a preset value.
In the embodiment of the invention, version number information is added to the identification part of every piece of service data. If all the service data corresponding to a service needs to be deleted, a new piece of service data corresponding to the service, referred to here as the third service data, is written to the HDFS system. The version number of the third service data is a preset value, such as -1. In the embodiment of the invention, the version number of newly written service data equals the preset value only when the service data corresponding to a service is to be deleted. When service data corresponding to the service is written for the first time, or is written to modify or update earlier service data, its version number is never the preset value.
S302, a unique mapping relation between the second identifier and the effective identifier of the third service data is created in the preset form.
The second identifier of the third service data has a unique mapping relation with the effective mark in the preset form, so that the third service data can be read when the service data corresponding to the service is read.
S303, reading a second identifier carrying an effective mark in the preset form and reading service data corresponding to the second identifier carrying the effective mark in the first folder.
For this step, refer to steps S203 and S204 in the above embodiment; the details are not repeated here.
S304, reading all service data corresponding to the service according to the global unique identifier of the service in the HDFS.
Through step S303 and step S304, several pieces of service data corresponding to the service can be read. For example, suppose the first service data corresponding to a service is written initially, the second service data is then written to modify or update it, and the service is later terminated, at which point the third service data is written in order to delete the service data corresponding to the service. If the second identifiers of the first service data and the second service data both have unique mapping relations with a valid flag in the preset form, then step S304 reads the first, second, and third service data corresponding to the service.
S305, if all the service data corresponding to the service includes service data whose version number is the preset value, judging that all the service data corresponding to the service has been deleted, and not reading any of the service data corresponding to the service.
When the service data read includes a piece whose version number is the preset value, the service has been terminated and none of its service data is read, which achieves the purpose of deleting all the service data corresponding to the service.
The embodiment of the invention provides a business data management method based on an HDFS, which is realized by creating an identifier for each piece of business data written into an HDFS storage system and maintaining a preset form, and when a data part of the business data corresponding to a certain business written into the HDFS needs to be deleted, the business data is realized by writing a new piece of business data corresponding to the business, and the version number of the new business data is a preset value. When the service data corresponding to the service is read in the HDFS, all the effective service data stored in the HDFS storage system can be obtained through a preset form, all the service data corresponding to the service can be obtained through the unique identification of the service in the HDFS, and when all the service data corresponding to the read service contains the service data with the version number of the preset value, the service is stopped, any service data is not read, and therefore the purpose of deleting all the service data corresponding to the service is achieved.
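The deletion scheme of steps S301 to S305 amounts to appending a marker record and checking for it at read time. The sketch below uses -1 as the preset value, as in the example in the text; the `Record` type, the function name, and the in-memory folder and form are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

DELETED_VERSION = -1  # preset value reserved for deletion markers


@dataclass(frozen=True)
class Record:
    service_id: str  # first identifier
    data_id: int     # second identifier
    version: int
    payload: bytes


def read_unless_deleted(
    folder: List[Record], preset_form: Dict[int, bool], service_id: str
) -> Optional[Record]:
    """Return the current record, or None if the service data was deleted."""
    candidates = [
        r
        for r in folder
        if r.service_id == service_id and preset_form.get(r.data_id, False)
    ]
    # A valid record carrying the preset version means "deleted":
    # nothing is returned, although the older records still exist in storage.
    if any(r.version == DELETED_VERSION for r in candidates):
        return None
    return max(candidates, key=lambda r: r.version, default=None)
```

The marker must itself be mapped to a valid flag in the form, as in step S302, or it would be filtered out before the check.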
Further, in conjunction with fig. 4, an embodiment of the present invention further provides a service data management method based on an HDFS, where the method is used to solve the problem that service data stored in the HDFS cannot be modified and updated, and the method includes:
S401, writing fourth service data corresponding to the service in the first folder.
For any service data in the first folder, the identification part of the service data further includes a timestamp recording when the service data was written.
S402, if the fourth service data is correct, creating a unique mapping relation between the second identifier of the fourth service data and the effective identifier in the preset form.
S403, reading the second identifiers carrying a valid flag from the preset form.
S404, the business data corresponding to the second identifier carrying the effective mark is read from the first folder.
S405, according to the global unique identifier of the service in the HDFS, acquiring all service data corresponding to the service, and according to the time stamp contained in each service data in all service data corresponding to the service, reading the latest written service data as the service data for running the service.
In this embodiment, the version number information carried by the service data in the embodiment corresponding to FIG. 2 is replaced by the timestamp at which the service data was written, which indicates the order in which service data was written. For all other implementation details, refer to the embodiment corresponding to FIG. 2; they are not repeated here.
The embodiment of the invention provides a business data management method based on an HDFS, which is realized by creating an identifier for each piece of business data written into an HDFS storage system and maintaining a preset form, when the data part of the business data corresponding to a certain business written into the HDFS needs to be modified or updated, the business data is written into a new piece of business data corresponding to the business, and the writing time of the new business data is later than the writing time of the business data needing to be modified or updated according to a timestamp carried by the business data. When the service data corresponding to the service is required to be read in the HDFS, all the effective service data stored in the HDFS storage system can be obtained through a preset form, all the service data corresponding to the service can be obtained through the unique identification of the service in the HDFS, and then the service data with the latest writing time is used as the service data for running the service according to the time stamp of each service data, so that the updating and the modification of the service data in the HDFS are realized.
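By way of non-limiting illustration, the timestamp-based selection of steps S403 to S405 might look as follows in Python. The names `TimedRecord` and `latest_valid`, and the use of seconds as the timestamp unit, are assumptions made for this sketch; the folder and the preset form are modeled as an in-memory list and set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedRecord:
    service_id: str    # first identifier
    record_id: str     # second identifier
    written_at: float  # timestamp of the write (hypothetical unit: seconds)
    data: bytes        # data portion


def latest_valid(records, valid_ids, service_id):
    """Pick the most recently written valid record for one service."""
    candidates = [r for r in records
                  if r.record_id in valid_ids and r.service_id == service_id]
    return max(candidates, key=lambda r: r.written_at, default=None)


log = [
    TimedRecord("svc-A", "r1", 100.0, b"old"),
    TimedRecord("svc-A", "r2", 200.0, b"new"),
    TimedRecord("svc-A", "r3", 300.0, b"corrupt"),  # judged incorrect
]
valid = {"r1", "r2"}  # r3 carries the invalid identifier in the form
chosen = latest_valid(log, valid, "svc-A")
```

Note that the newest record overall (`r3`) is skipped because it is not marked valid, so the update only takes effect once a correct record has been written.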
Further, in order to increase the reading speed of the service data, in the embodiment of the invention a second folder is created in the HDFS, and the service data in the first folder corresponding to second identifiers that carry the valid identifier is stored into the second folder at preset time intervals according to the preset form. When the service system runs, the service data for running the service is read from the second folder according to the identification portion of the service data in the second folder, which improves the efficiency of data reading.
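By way of non-limiting illustration, one pass of the copy into the second folder can be sketched as a filter over the first folder. The function name `sync_second_folder` and the dictionary layout are assumptions made for this sketch; scheduling the pass at the preset interval is left out.

```python
def sync_second_folder(first_folder, form):
    """Copy records marked valid in the preset form into a new second folder."""
    return {rid: rec for rid, rec in first_folder.items()
            if form.get(rid) == "valid"}


# record_id -> data portion, and record_id -> mark in the preset form
first_folder = {"r1": b"good", "r2": b"bad", "r3": b"good-too"}
form = {"r1": "valid", "r2": "invalid", "r3": "valid"}

# One synchronization pass; a real system would repeat this periodically.
second_folder = sync_second_folder(first_folder, form)
```

Because the second folder then holds only valid records, a reader working from it does not need to consult the preset form for each record.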
Further, in the embodiment of the present invention, the first folder includes a first description file and the second folder includes a second description file, each of which records the start position of each piece of service data. The service data in the first folder is read according to the first description file, or the service data in the second folder is read according to the second description file.
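By way of non-limiting illustration, a description file can be thought of as an index from record identifier to start position. In the Python sketch below an `io.BytesIO` buffer stands in for a data file and a dictionary stands in for the description file. Storing the length next to the start position is an assumption of this sketch; the text only mentions the start position.

```python
import io


def append_record(blob, index, record_id, payload):
    """Append payload to the blob and note its start position in the index."""
    blob.seek(0, io.SEEK_END)
    index[record_id] = (blob.tell(), len(payload))
    blob.write(payload)


def read_record(blob, index, record_id):
    """Seek straight to the recorded start position and read one record."""
    start, length = index[record_id]
    blob.seek(start)
    return blob.read(length)


blob = io.BytesIO()   # stands in for a data file in the folder
index = {}            # stands in for the description file
append_record(blob, index, "r1", b"alpha")
append_record(blob, index, "r2", b"beta-beta")
second = read_record(blob, index, "r2")
```

With the start position known, a reader can jump directly to one record instead of scanning the file from the beginning.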
Further, in the embodiment of the present invention, for any piece of service data in the first folder, the identification portion of the service data further includes a third identifier, which records the data length of the data portion of the service data. After the service data for running the service is read from the HDFS, whether the data portion of the service data is complete is determined according to the third identifier of the service data; if the data portion is complete, the service is run using the service data. This ensures that loss of part of the data portion is detected promptly.
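By way of non-limiting illustration, the completeness check reduces to comparing the recorded length with the length of the data actually read. The function name `is_complete` and the dictionary layout are assumptions made for this sketch.

```python
def is_complete(record):
    """Compare the length stored in the third identifier with the real length."""
    return len(record["data"]) == record["length"]


intact = {"length": 11, "data": b"hello world"}
truncated = {"length": 11, "data": b"hello"}   # part of the data was lost

usable = [r for r in (intact, truncated) if is_complete(r)]
```

A length comparison detects truncation only; it would not detect corruption that leaves the length unchanged.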
Further, in order to reduce the occupied amount of the storage space, in the embodiment of the present invention, the data portion of the service data is compressed and serialized data.
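By way of non-limiting illustration, the data portion could be serialized and then compressed before it is written. The text does not name a serialization format or a compression algorithm; JSON and zlib from the Python standard library are assumptions chosen for this sketch, as are the function names `encode` and `decode`.

```python
import json
import zlib


def encode(payload):
    """Serialize the payload to JSON bytes, then compress them."""
    return zlib.compress(json.dumps(payload, sort_keys=True).encode("utf-8"))


def decode(blob):
    """Reverse of encode: decompress, then parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


payload = {"rates": [0.01] * 500, "region": "SZ"}
blob = encode(payload)
raw_size = len(json.dumps(payload, sort_keys=True).encode("utf-8"))
```

The saving depends on how repetitive the data is; highly regular payloads such as the one above compress well, while already-compressed or random data may not shrink at all.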
Fig. 5 is a schematic diagram of a service data storage device based on HDFS according to an embodiment of the present invention, and in combination with fig. 5, the device includes: a first creation unit 51, a storage unit 52, a judgment unit 53, a second creation unit 54, and a reading unit 55;
a first creating unit 51 for creating a first folder in the distributed file system HDFS;
a storage unit 52, configured to store first service data corresponding to a service into the first folder, where, for any service data in the first folder, the service data includes an identification portion, and the identification portion of the service data includes a first identification and a second identification, where the first identification is a globally unique identification of the service in the HDFS, and the second identification is a globally unique identification of the service data in the HDFS;
a judging unit 53, configured to judge whether the first service data is correct;
a second creating unit 54, configured to: if the first service data is correct, create a unique mapping relationship between the second identifier in the first service data and a valid identifier in a preset form; and if the first service data is incorrect, create a unique mapping relationship between the second identifier in the first service data and an invalid identifier in the preset form, where the valid identifier indicates that the first service data is correct data, and the invalid identifier indicates that the first service data is incorrect data;
a reading unit 55, configured to read service data for running the service in the HDFS according to the identification portion of the service data in the first folder and the preset form.
Further, for any service data in the first folder, the identification portion of the service data further includes a version number of the service data, and when the data portion in the first service data is updated or modified, the storage unit 52 is further configured to write second service data corresponding to the service in the first folder, where the version number of the second service data is higher than the version number of the first service data;
if the second service data is correct, the second creating unit 54 is further configured to create a unique mapping relationship between the second identifier and the valid identifier of the second service data in the preset form;
the reading unit 55 is configured to: read, from the preset form, the second identifiers that carry a valid identifier; read, from the first folder, the service data corresponding to those second identifiers; and acquire all service data corresponding to the service according to the globally unique identifier of the service in the HDFS, and read the service data with the highest version number among them as the service data for running the service.
Further, when deleting all the service data corresponding to the service, the storage unit 52 is further configured to write third service data corresponding to the service in the first folder, where a version number of the third service data is a preset value;
the second creating unit 54 is further configured to create a unique mapping relationship between the second identifier and the valid identifier of the third service data in the preset form;
the reading unit 55 is configured to: read, from the preset form, the second identifiers that carry a valid identifier; read, from the first folder, the service data corresponding to those second identifiers; read all service data corresponding to the service according to the globally unique identifier of the service in the HDFS; and, if the service data corresponding to the service includes service data whose version number is the preset value, judge that all service data corresponding to the service has been deleted and read none of the service data corresponding to the service.
Further, for any service data in the first folder, the identification portion of the service data further includes a timestamp written into the service data, and when the data portion in the first service data is updated or modified, the storage unit 52 is configured to write fourth service data corresponding to the service in the first folder;
if the fourth service data is correct, the second creating unit 54 is further configured to create a unique mapping relationship between the second identifier of the fourth service data and the valid identifier in the preset form;
the reading unit 55 is configured to: read, from the preset form, the second identifiers that carry a valid identifier; read, from the first folder, the service data corresponding to those second identifiers; acquire all service data corresponding to the service according to the globally unique identifier of the service in the HDFS; and read the most recently written service data as the service data for running the service according to the timestamp contained in each piece of service data corresponding to the service.
Further, the first creating unit 51 is further configured to create a second folder in the HDFS;
the storage unit 52 is further configured to store, at preset time intervals and according to the preset form, the service data in the first folder corresponding to second identifiers that carry the valid identifier into the second folder;
the reading unit 55 is further configured to read, when the service system is running, service data for running the service in the second folder according to the identification portion of the service data in the second folder.
Further, the first folder includes a first description file, the second folder includes a second description file, the first description file or the second description file is used for recording a start position of each service data, and the reading unit 55 is used for reading the service data in the first folder according to the first description file;
or, according to the second description file, reading the business data in the second folder.
Further, for any piece of service data in the first folder, the identification portion of the service data further includes a third identifier, where the third identifier is used to record a data length of a data portion of the service data, and after the service data for running the service is read in the HDFS, the determining unit 53 is further configured to determine whether the data portion in the service data is complete according to the third identifier of the service data; and if the data part of the service data is complete, operating the service through the service data.
Further, the data portion of the service data is compressed and serialized data.
The embodiment of the invention provides an HDFS-based service data management device that creates identifiers for service data written into the HDFS and records, in a preset form, a valid identifier or an invalid identifier for each piece of service data to indicate whether that service data is correct. Correct service data is then read using the identification portion of the service data in the HDFS together with the preset form, which avoids system operation errors caused by incorrect service data.
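By way of non-limiting illustration, the cooperation of the storage, judging, second creating, and reading units can be sketched in Python with dictionaries standing in for the first folder and the preset form. The function names `store`, `read_valid`, and `non_empty` are assumptions made for this sketch, and the correctness check is a hypothetical placeholder because the text does not define how correctness is judged.

```python
def store(first_folder, form, record_id, service_id, data, is_correct):
    """Write a record, then mark it valid or invalid in the preset form."""
    first_folder[record_id] = {"service_id": service_id, "data": data}
    form[record_id] = "valid" if is_correct(data) else "invalid"


def read_valid(first_folder, form, service_id):
    """Return the data of every valid record that belongs to service_id."""
    return [rec["data"] for rid, rec in first_folder.items()
            if form.get(rid) == "valid" and rec["service_id"] == service_id]


def non_empty(data):
    # Hypothetical correctness check; the text does not define one.
    return len(data) > 0


first_folder, form = {}, {}
store(first_folder, form, "r1", "svc-A", b"config-v1", non_empty)
store(first_folder, form, "r2", "svc-A", b"", non_empty)   # judged incorrect
store(first_folder, form, "r3", "svc-B", b"other", non_empty)
```

The incorrect record stays in the folder, since written data is not rewritten in place; it is simply never returned to the reader because the form marks it invalid.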
Fig. 6 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 6, the terminal device 6 of this embodiment includes: a processor 60, a memory 61, and a computer program 62 stored in the memory 61 and executable on the processor 60, for example an HDFS-based service data storage program. The processor 60, when executing the computer program 62, implements the steps of the various embodiments of the HDFS-based service data management method described above, such as steps 101 through 104 shown in fig. 1, or steps 201 through 205 shown in fig. 2. Alternatively, the processor 60, when executing the computer program 62, performs the functions of the modules/units of the apparatus embodiments described above, such as the functions of the units 51 to 55 shown in fig. 5.
Illustratively, the computer program 62 may be partitioned into one or more modules/units that are stored in the memory 61 and executed by the processor 60 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program 62 in the terminal device 6.
The terminal device 6 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, a processor 60 and a memory 61. It will be appreciated by those skilled in the art that fig. 6 is merely an example of the terminal device 6 and does not limit it; the terminal device 6 may include more or fewer components than illustrated, combine certain components, or use different components. For example, the terminal device may further include an input-output device, a network access device, a bus, etc.
The processor 60 may be a central processing unit (CPU), or another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or any conventional processor.
The memory 61 may be an internal storage unit of the terminal device 6, such as a hard disk or a memory of the terminal device 6. The memory 61 may be an external storage device of the terminal device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal device 6. Further, the memory 61 may also include both an internal storage unit and an external storage device of the terminal device 6. The memory 61 is used for storing the computer program and other programs and data required by the terminal device. The memory 61 may also be used for temporarily storing data that has been output or is to be output.
The embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the steps of the business data management method based on the HDFS in any embodiment when being executed by a processor.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied, in essence or in whole or in part, in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (8)

1. A business data management method based on HDFS, the method comprising:
creating a first folder in a distributed file system HDFS;
storing first service data corresponding to a service into the first folder, wherein, for any service data in the first folder, the service data comprises an identification part, the identification part of the service data comprises a first identifier and a second identifier, the first identifier is a globally unique identifier of the service in the HDFS, and the second identifier is a globally unique identifier of the service data in the HDFS;
judging whether the first service data is correct; if the first service data is correct, creating a unique mapping relation between the second identifier in the first service data and a valid identifier in a preset form; and if the first service data is incorrect, creating a unique mapping relation between the second identifier in the first service data and an invalid identifier in the preset form, wherein the valid identifier is used for indicating that the first service data is correct data, and the invalid identifier is used for indicating that the first service data is incorrect data;
reading service data for running the service in the HDFS according to the identification part of the service data in the first folder and the preset form;
for any service data in the first folder, the identification part of the service data further includes a version number of the service data, and when the data part in the first service data is updated or modified, the method further includes:
writing second service data corresponding to the service into the first folder, wherein the version number of the second service data is higher than that of the first service data;
if the second service data is correct, creating a unique mapping relation between the second identifier of the second service data and the valid identifier in the preset form;
reading service data for running the service in the HDFS includes:
reading, from the preset form, the second identifiers carrying a valid identifier;
reading, from the first folder, the service data corresponding to the second identifiers carrying the valid identifier;
acquiring all service data corresponding to the service according to the global unique identifier of the service in the HDFS, and reading the service data with the highest version number from all service data corresponding to the service as service data for running the service;
when deleting all the service data corresponding to the service, the method further comprises the following steps:
writing third service data corresponding to the service into the first folder, wherein the version number of the third service data is a preset value;
creating a unique mapping relation between the second identifier of the third service data and the valid identifier in the preset form;
reading service data for running the service in the HDFS includes:
reading, from the preset form, the second identifiers carrying a valid identifier;
reading, from the first folder, the service data corresponding to the second identifiers carrying the valid identifier;
reading all service data corresponding to the service according to the global unique identifier of the service in the HDFS;
if the service data corresponding to the service includes service data whose version number is the preset value, judging that all the service data corresponding to the service has been deleted, and not reading any of the service data corresponding to the service.
2. The service data management method according to claim 1, wherein for any service data in the first folder, the identification portion of the service data further contains a timestamp recording when the service data was written, and when the data portion in the first service data is updated or modified, the method further comprises:
writing fourth service data corresponding to the service into the first folder;
if the fourth service data is correct, creating a unique mapping relation between the second identifier of the fourth service data and the valid identifier in the preset form;
the reading service data for running the service in the HDFS specifically includes:
reading, from the preset form, the second identifiers carrying a valid identifier;
reading, from the first folder, the service data corresponding to the second identifiers carrying the valid identifier;
acquiring all service data corresponding to the service according to the global unique identifier of the service in the HDFS;
and reading the latest written service data as service data for running the service according to the time stamp contained in each piece of service data in all the service data corresponding to the service.
3. The service data management method according to any one of claims 1-2, characterized in that the method further comprises:
creating a second folder in the HDFS;
storing, at preset time intervals and according to the preset form, the service data in the first folder corresponding to second identifiers carrying the valid identifier into the second folder;
and when the service system is operated, reading the service data for running the service in the second folder according to the identification part of the service data in the second folder.
4. The service data management method according to claim 3, wherein the first folder contains a first description file, the second folder contains a second description file, and the first description file or the second description file is used for recording a start position of each service data, and the method further comprises:
reading service data in the first folder according to the first description file;
or, according to the second description file, reading the service data in the second folder.
5. The service data management method according to any one of claims 1 to 2, wherein for any one piece of service data in the first folder, the identification portion of the service data further includes a third identification for recording a data length of a data portion of the service data, and after reading the service data for running the service in the HDFS, the method further includes:
judging whether the data part in the service data is complete or not according to the third identifier of the service data;
and if the data part of the service data is complete, running the service using the service data.
6. The service data management method according to claim 5, wherein the data portion of the service data is compressed and serialized data.
7. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 6.
8. A terminal device, characterized in that it comprises a memory and a processor, the memory storing a computer program executable on the processor, and the processor executing the computer program to carry out the steps of the method according to any one of claims 1 to 6.
CN201811250917.1A 2018-10-25 2018-10-25 Service data management method based on HDFS and terminal equipment Active CN109542860B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811250917.1A CN109542860B (en) 2018-10-25 2018-10-25 Service data management method based on HDFS and terminal equipment

Publications (2)

Publication Number Publication Date
CN109542860A CN109542860A (en) 2019-03-29
CN109542860B true CN109542860B (en) 2023-07-07

Family

ID=65845477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811250917.1A Active CN109542860B (en) 2018-10-25 2018-10-25 Service data management method based on HDFS and terminal equipment

Country Status (1)

Country Link
CN (1) CN109542860B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110895521A (en) * 2019-11-07 2020-03-20 浪潮电子信息产业股份有限公司 OSD and MON connection method, device, equipment and storage medium
CN112559469A (en) * 2020-10-16 2021-03-26 武汉中科通达高新技术股份有限公司 Data synchronization method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102779180A (en) * 2012-06-29 2012-11-14 华为技术有限公司 Operation processing method of data storage system and data storage system
US8832154B1 (en) * 2009-12-08 2014-09-09 Netapp, Inc. Object location service for network-based content repository
CN106354840A (en) * 2016-08-31 2017-01-25 北京小米移动软件有限公司 File processing method and device and distributed file system
WO2017107984A1 (en) * 2015-12-25 2017-06-29 中兴通讯股份有限公司 Data recovery method and device
CN108388675A (en) * 2018-03-26 2018-08-10 深圳市买买提信息科技有限公司 Circulation method and terminal device are drawn in a kind of identity
CN108563781A (en) * 2018-04-25 2018-09-21 广州绿源信息科技有限公司 Internet of Things big data processing method based on Hadoop and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9842126B2 (en) * 2012-04-20 2017-12-12 Cloudera, Inc. Automatic repair of corrupt HBases

Also Published As

Publication number Publication date
CN109542860A (en) 2019-03-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant