CN112835967A - Data processing method, device, equipment and medium based on distributed storage system - Google Patents

Publication number: CN112835967A
Authority: CN (China)
Prior art keywords: target, target data, data, database, class
Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Application number: CN201911165510.3A
Other languages: Chinese (zh)
Other versions: CN112835967B (en)
Inventors: 郭永强, 徐言林
Current Assignee: Zhejiang Uniview Technologies Co Ltd (the listed assignees may be inaccurate)
Original Assignee: Zhejiang Uniview Technologies Co Ltd
Application filed by: Zhejiang Uniview Technologies Co Ltd
Priority: CN201911165510.3A
Publications: CN112835967A (application), CN112835967B (granted)
Current status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

Embodiments of the invention disclose a data processing method, apparatus, device, and medium based on a distributed storage system. The method comprises: in response to a processing request for target data, determining a target storage device according to the target data identifier of the target data; matching the target data identifier against the mapping between data identifiers and write times in a first-class database of the target storage device to determine the write time of the target data; selecting a target second-class database from the second-class databases of the target storage device according to the write time of the target data; and reading the target data from the target second-class database and processing it. Because the target data is located by its write time, it can be read quickly from the database for processing, which shortens the data read cycle, avoids a large number of invalid reads, improves the data processing speed and performance of the distributed storage system, and optimizes the operating efficiency of the database.

Description

Data processing method, device, equipment and medium based on distributed storage system
Technical Field
Embodiments of the invention relate to the field of computer technology, and in particular to a data processing method, apparatus, device, and medium based on a distributed storage system.
Background
With the large-scale deployment of video surveillance in projects such as safe cities, a large number of back-end storage devices are needed to store the captured video data, and picture data in particular. A distributed storage system provides storage services externally through multiple storage devices, such as Object Storage Devices (OSDs), and offers easy expansion, no data loss on a single point of failure, and fast failure recovery. Distributed storage systems are therefore widely deployed in the video surveillance field.
In actual use, each storage device in the distributed storage system is provided with a database that stores the metadata of the data. In the related art, when data is written to a storage device, the storage pool ID, the placement group (PG) ID, the data identifier, and other information are combined into the key, for example KEY = POOL_ID + PG_ID + data identifier, and the metadata of the data is stored as the value, so that the database of the storage device holds key-value pairs. To distribute data evenly, data is grouped by placement group, and the keys are sorted in ascending order and written into Sorted Sequence Table (SST) files so that the database can be processed sequentially during later capacity expansion or failure recovery. However, when a database holds a large number of SST files, a data read operation may take a long time. In addition, because the keys are sorted in ascending key order rather than in the order in which the client issued the data, deleting expired data requires reading all data in the database, determining which records exceed the retention period, and then deleting those records. This may cause a large number of invalid reads and degrade the performance of the distributed storage system.
Therefore, how to improve the data processing performance in the distributed storage system is a problem that needs to be solved at present.
Disclosure of Invention
Embodiments of the present invention provide a data processing method, apparatus, device, and medium based on a distributed storage system, which are used to solve the problems of long time and low speed for reading or deleting data stored in the distributed storage system, so as to improve the data processing performance of the distributed storage system.
In a first aspect, an embodiment of the present invention provides a data processing method based on a distributed storage system, where the method includes:
in response to a processing request for target data, determining a target storage device according to a target data identifier of the target data;
matching the target data identifier against the mapping between data identifiers and write times in a first-class database of the target storage device to determine the write time of the target data;
selecting a target second-class database from the second-class databases of the target storage device according to the write time of the target data, wherein the number of second-class databases is an integer greater than 1;
and reading the target data from the target second-class database and processing the target data.
In a second aspect, an embodiment of the present invention further provides a data processing apparatus based on a distributed storage system, where the apparatus includes:
a first determining module, configured to determine, in response to a processing request for target data, a target storage device according to a target data identifier of the target data;
a second determining module, configured to match the target data identifier against the mapping between data identifiers and write times in a first-class database of the target storage device to determine the write time of the target data;
a selection module, configured to select a target second-class database from the second-class databases of the target storage device according to the write time of the target data, wherein the number of second-class databases is an integer greater than 1;
and a processing module, configured to read the target data from the target second-class database and process the target data.
In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the data processing method based on the distributed storage system according to any one of the embodiments of the present invention.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the data processing method based on the distributed storage system according to any one of the embodiments of the present invention.
The technical scheme disclosed by the embodiment of the invention has the following beneficial effects:
A target storage device is determined according to the target data identifier in a processing request for target data; the target data identifier is matched against the mapping between data identifiers and write times in the first-class database of the target storage device to determine the write time of the target data; a target second-class database is selected from the plurality of second-class databases of the target storage device according to that write time; and the target data is read from the target second-class database and processed. The target data is thus located by its write time and read quickly from the database for processing, which shortens the data read cycle, avoids a large number of invalid reads, improves the data processing speed and performance of the distributed storage system, optimizes the operating efficiency of the database, and improves the user experience.
Drawings
FIG. 1 is a schematic flow chart illustrating a process for storing target data in a distributed storage system according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a data processing method based on a distributed storage system according to an embodiment of the present invention;
FIG. 3 is a flow chart of a data reading process based on a distributed storage system according to an embodiment of the present invention;
FIG. 4 is a flow chart illustrating a data deletion process based on a distributed storage system according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a data processing apparatus based on a distributed storage system according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad invention. It should be further noted that, for convenience of description, only some structures, not all structures, relating to the embodiments of the present invention are shown in the drawings.
To address the problem in the related art of how to improve data processing performance in a distributed storage system, embodiments of the invention provide a data processing method based on a distributed storage system.
In the embodiments of the invention, the target storage device is determined according to the target data identifier of the target data; the target data identifier is matched against the mapping between data identifiers and write times in the first-class database of that target storage device to determine the write time of the target data; and a target second-class database is then selected from the plurality of second-class databases of the target storage device according to the write time, so that the target data can be read from the target second-class database and processed. The target data is thus located by its write time and read quickly from the database for processing, which shortens the data read cycle, avoids a large number of invalid reads, improves the data processing speed and performance of the distributed storage system, optimizes the operating efficiency of the database, and improves the user experience.
In order to clearly illustrate the process of reading and processing target data in the data processing method based on the distributed storage system according to the embodiment of the present invention, first, a process of storing the target data in the distributed storage system according to the embodiment of the present invention is described below.
Fig. 1 is a schematic flowchart of storing target data in a distributed storage system according to an embodiment of the present invention. The storage scheme can be executed by a data processing apparatus based on a distributed storage system; the apparatus may be implemented in software and/or hardware and may be integrated in a computer device, which may be any device with a data processing function. As shown in fig. 1, the method comprises the following steps:
S101, in response to a storage request for target data, determining a target storage device according to the target data to be written.
In this embodiment, the target Storage Device is specifically an Object Storage Device (OSD). The target data is specifically metadata.
In actual use, the distributed storage system can serve as the back end of a client, and a user can perform various operations on the client, such as data read, data delete, and data write operations.
For example, when a client receives a storage request (write request) for target data, it may send the target data and the identification information carried in the storage request to the distributed storage system, which then determines the target storage device using existing techniques that are not described in detail here.
S102, storing the mapping relation between the data identification and the writing time of the target data into a first type database of the target storage device.
The data identifier may refer to a data name, or any other information capable of uniquely determining the data identity, such as a number or a serial number. In this embodiment, the data identifier is preferably a data name.
Optionally, the data identifier of the target data may be used as the key and the write time of the target data as the value, and the pair is stored in the first-class database of the target storage device as a key-value pair, which facilitates later data queries or a restart of the distributed storage system. In this embodiment, the key and the value may also be encoded, and the encoded key-value pair is then stored in the first-class database.
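As an illustration of S102, the following is a minimal sketch that assumes a plain dictionary stands in for the first-class database and assumes a fixed-width big-endian encoding of the write time. The embodiment states only that the key and value may be encoded, not how, so the function names, the encoding, and the sample identifier are all hypothetical.

```python
import struct
from typing import Dict

# Hypothetical stand-in for the first-class database of one storage device.
first_class_db: Dict[bytes, bytes] = {}


def put_write_time(db: Dict[bytes, bytes], data_id: str, write_time: int) -> None:
    """Store data identifier -> write time as an encoded key-value pair."""
    key = data_id.encode("utf-8")
    # Assumed encoding: unsigned 64-bit big-endian seconds since the epoch.
    value = struct.pack(">Q", write_time)
    db[key] = value


def get_write_time(db: Dict[bytes, bytes], data_id: str) -> int:
    """Decode the write time recorded for a data identifier."""
    (write_time,) = struct.unpack(">Q", db[data_id.encode("utf-8")])
    return write_time


put_write_time(first_class_db, "camera01/frame-0001.meta", 1569888000)
```

Because the mapping is persisted, it can be reloaded after a restart, which is the condition the paragraph above refers to.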
It should be noted that, in addition to storing the mapping between the data identifier of the target data and its write time in the first-class database of the target storage device, embodiments of the invention may also record this mapping in the memory of the target storage device. When the target data is later read while the distributed storage system is operating normally, the write time is obtained directly from memory by the data identifier, and the subsequent processing operation is performed according to that write time, which improves the speed and efficiency of data processing. The processing operation may include a query operation, a read operation, a delete operation, and the like.
S103, selecting a target second-class database from second-class databases of the target storage device according to the writing time of the target data, and writing the target data into the target second-class database, wherein the number of the second-class databases is a positive integer greater than 1.
In the prior art, a distributed storage system usually combines the storage pool ID, the placement group ID, the data identifier, and other information into the key, for example KEY = POOL_ID + PG_ID + data identifier, and stores the metadata of the data as the value, so that the database of a storage device holds key-value pairs. To distribute data evenly, data is grouped by placement group, and the keys are sorted in ascending order and written into Sorted Sequence Table (SST) files. The problem is that when the database of a storage device holds a large number of SST files, a data read operation may take a long time. In addition, because the keys are sorted in ascending key order rather than in the order in which the client issued the data, any operation that needs data in write-time order must read all data in the database, determine from it the data to be processed, and only then process that data. For example, deleting expired data requires reading all data in the database, determining which records exceed the retention period, and then deleting those records. The result is a large number of invalid reads that degrade the performance of the distributed storage system.
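The cost described above can be made concrete with a small sketch. It assumes a dictionary in place of the storage device's database and, for brevity, stores only the write time as the value; the key strings and the function name are hypothetical.

```python
from typing import Dict, List, Tuple


def expired_keys_full_scan(db: Dict[str, int], cutoff: int) -> Tuple[List[str], int]:
    """Return the expired keys and the number of records that had to be read.

    Prior-art layout: key = POOL_ID + PG_ID + data identifier. The write time
    is not part of the key, so key order says nothing about the age of a record.
    """
    reads = 0
    expired = []
    for key in sorted(db):      # keys are held in ascending key order
        reads += 1              # every record is read to inspect its write time
        if db[key] < cutoff:
            expired.append(key)
    return expired, reads


db = {
    "pool1.pg3.img-a": 300,
    "pool1.pg3.img-b": 100,
    "pool1.pg7.img-c": 250,
    "pool2.pg1.img-d": 120,
}
expired, reads = expired_keys_full_scan(db, cutoff=150)
```

In this toy data set two records are expired, yet all four are read; the two extra reads are the invalid reads the paragraph describes.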
To solve the problem that a long data read cycle and a large number of invalid reads degrade the performance of the distributed storage system, embodiments of the invention store data according to its write time, so that target data can later be read directly and quickly by write time. This shortens the data read cycle, avoids a large number of invalid reads, and improves the performance of the distributed storage system.
The second-class databases of the target storage device may be created periodically according to a preset period, or a new one may be created when the capacity of the current database is detected to exceed a threshold; the creation method is not specifically limited here.
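The two creation triggers mentioned above might be sketched as follows. The embodiment presents them as alternatives; this sketch combines both only to show each in one place. The class names, the period, and the record-count threshold are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SecondClassDb:
    created_at: int
    records: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class PartitionManager:
    period: int        # assumed: seconds between periodic creations
    max_records: int   # assumed: capacity threshold, counted in records
    dbs: List[SecondClassDb] = field(default_factory=list)

    def current(self, now: int) -> SecondClassDb:
        """Return the newest database, creating one when either trigger fires."""
        newest = self.dbs[-1] if self.dbs else None
        if (
            newest is None
            or now - newest.created_at >= self.period
            or len(newest.records) >= self.max_records
        ):
            newest = SecondClassDb(created_at=now)
            self.dbs.append(newest)
        return newest


manager = PartitionManager(period=7 * 86400, max_records=2)
manager.current(0).records["a"] = b"m1"
manager.current(10).records["b"] = b"m2"   # same database, under both limits
manager.current(20).records["c"] = b"m3"   # capacity trigger creates a second one
manager.current(20 + 7 * 86400)            # period trigger creates a third one
```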
In this embodiment, the target storage device has a plurality of second-class databases, and each second-class database has a creation time. The write time of the target data is therefore matched against the creation time of each second-class database of the target storage device, the successfully matched second-class database is taken as the target second-class database, and the target data is written into it. Data is thus stored in write-time order, so that later reads and processing can fetch the relevant data directly by write time, which avoids the long read cycle and the large number of invalid reads caused by keeping a single database on the target storage device in the related art. A successful match means that the write time of the target data falls within the creation time period of one of the second-class databases.
It should be noted that the target data is written into the target second-class database as follows:
The storage pool ID, the placement group ID, the write time of the target data, the data identifier of the target data, and other related information are combined into the key in the form POOL_ID + PG_ID + write time + data identifier + related information; the target data, that is, the metadata, is used as the value; and the key-value pair is written into the target second-class database.
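A minimal sketch of such a composite key follows. The separator, the zero-padded field widths, and the function name are assumptions; the embodiment specifies only the order of the fields.

```python
def make_key(pool_id: int, pg_id: int, write_time: int, data_id: str) -> str:
    """Build POOL_ID + PG_ID + write time + data identifier as one sortable key."""
    # Zero-padded fields (assumed widths) keep string order equal to numeric order.
    return f"{pool_id:04d}.{pg_id:06d}.{write_time:012d}.{data_id}"


records = {}
records[make_key(1, 3, 1569974400, "img-b")] = b"metadata-b"
records[make_key(1, 3, 1569888000, "img-z")] = b"metadata-z"
records[make_key(1, 3, 1570060800, "img-a")] = b"metadata-a"

# Within one pool and placement group, ascending key order is write-time order.
ordered_ids = [key.rsplit(".", 1)[1] for key in sorted(records)]
```

Placing the write time ahead of the data identifier is what lets an ascending key sort double as a chronological sort inside each placement group.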
In the technical scheme of this embodiment, the mapping between the data identifier and the write time of the target data is stored in the first-class database of the target storage device, a target second-class database is then determined from the plurality of second-class databases of the target storage device according to the write time, and the target data is written into the target second-class database with write time + data identifier as its new data identifier. The order in which target data is stored in the second-class databases therefore stays as consistent as possible with the order in which it was written, which greatly reduces invalid reads when target data is read or expired target data is deleted. In addition, dividing the second-class databases by time period reduces the capacity of each database and thus narrows the range of each data read.
Further, after S101, the embodiment of the present invention further includes: if the type of the target data is a picture, setting the switch that controls reads of the second-class databases of the target storage device to the off state.
In the field of video surveillance, surveillance equipment captures both video and pictures. Because captured pictures are usually not renamed, the metadata of a newly captured picture is not yet present in the database of any storage device in the distributed storage system. For compatibility, however, the distributed storage system still queries the database of the storage device for that metadata, which wastes resources unnecessarily.
Therefore, after the target storage device is determined, when the target data type is determined to be a picture, the switch for reading the second type database of the target storage device is controlled to be in the off state, so that unnecessary reading operation is reduced, and the performance of the distributed storage system is improved.
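One way to picture the read switch is sketched below. This is an interpretation rather than the embodiment's implementation: the `Device` class, the counter, and the practice of setting the switch per storage request are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Device:
    second_class_read_enabled: bool = True
    second_class_reads: int = 0
    metadata: Dict[str, bytes] = field(default_factory=dict)

    def lookup_before_write(self, data_id: str) -> Optional[bytes]:
        """Compatibility lookup of existing metadata, skipped when the switch is off."""
        if not self.second_class_read_enabled:
            return None
        self.second_class_reads += 1
        return self.metadata.get(data_id)


def handle_store_request(device: Device, data_id: str, data_type: str) -> None:
    # A newly captured picture has no metadata stored yet, so the read is switched off.
    device.second_class_read_enabled = data_type != "picture"
    device.lookup_before_write(data_id)


device = Device()
handle_store_request(device, "img-1", "picture")   # no database read is issued
handle_store_request(device, "vid-1", "video")     # the lookup runs as usual
```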
As the above description shows, when storing target data, the embodiment of the present invention first stores the mapping between the data identifier of the target data and its write time in the first-class database of the target storage device, then selects a target second-class database from the plurality of second-class databases of the target storage device according to the write time, and writes the target data into the target second-class database with write time + data identifier as its new data identifier, which completes the storage operation. Building on the foregoing embodiments, the following describes in detail how the data processing method based on the distributed storage system reads target data and processes the data that has been read.
As shown in fig. 2, the method may specifically include:
S201, in response to a processing request for target data, determining a target storage device according to the target data identifier of the target data.
For example, when a client receives a processing request (a read request or a delete request) for target data, it may send the target data identifier carried in the processing request to the distributed storage system, so that the distributed storage system determines the target storage device based on the target data identifier.
S202, matching the target data identification with a mapping relation between the data identification and the writing time in a first type database of the target storage device, and determining the writing time of the target data.
Optionally, in the embodiment of the present invention, the writing time of the target data may be determined from a mapping relationship between the data identifier and the writing time in the first type of database of the target storage device according to the target data identifier.
It should be noted that the memory of the target storage device may also hold the mapping between the data identifier of the target data and its write time. When the distributed storage system is operating normally, this embodiment may therefore look up the mapping in the memory of the target storage device first, according to the target data identifier, to determine the write time of the target data.
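The memory-first lookup can be sketched as below, assuming dictionaries for both the in-memory mapping and the first-class database. Refilling the in-memory mapping after a fallback read is an added assumption, and the class and attribute names are hypothetical.

```python
from typing import Dict


class WriteTimeIndex:
    def __init__(self, first_class_db: Dict[str, int]):
        self.first_class_db = first_class_db   # persistent mapping (stand-in)
        self.memory: Dict[str, int] = {}       # in-memory copy, lost on restart
        self.db_lookups = 0

    def record(self, data_id: str, write_time: int) -> None:
        """Write path: keep the mapping in both the database and memory."""
        self.first_class_db[data_id] = write_time
        self.memory[data_id] = write_time

    def write_time_of(self, data_id: str) -> int:
        """Read path: prefer memory, fall back to the first-class database."""
        if data_id in self.memory:
            return self.memory[data_id]
        self.db_lookups += 1
        write_time = self.first_class_db[data_id]
        self.memory[data_id] = write_time      # assumption: refill memory on a miss
        return write_time


index = WriteTimeIndex({})
index.record("img-a", 100)
hit = index.write_time_of("img-a")             # served from memory

restarted = WriteTimeIndex(index.first_class_db)   # simulated restart: memory is empty
after_restart = restarted.write_time_of("img-a")   # served from the first-class database
```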
S203, selecting a target second-class database from second-class databases of the target storage device according to the writing time of the target data, wherein the number of the second-class databases is a positive integer larger than 1.
During specific implementation, the target second-class database is determined by matching the writing time of the target data with the creation time of the second-class database of the target storage device.
That is, if the creation time of any second-class database matches the write time of the target data, the second-class database corresponding to the creation time is determined as the target second-class database.
For example, if the write time of the target data X1 is 2019-10-1, when the creation time of the 3 rd second class database in the target storage device is 2019-10-1, the 3 rd second class database is determined as the target second class database.
For another example, if the write time of the target data X1 is 2019-10-12, when the creation time of the 11 th second-class database in the target storage device is 2019-10-10 and the creation time of the 12 th second-class database is 2019-10-17, the 11 th second-class database is determined as the target second-class database.
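The matching rule in the two examples above can be sketched with a binary search over the creation times. The sketch assumes that each second-class database covers the interval from its own creation time up to the creation time of the next one; the function name and the three sample dates (the first of which is invented for illustration) are hypothetical, and the indices are 0-based rather than the ordinal numbers used in the examples.

```python
from bisect import bisect_right
from datetime import date
from typing import List


def select_database(creation_dates: List[date], write_date: date) -> int:
    """Return the 0-based index of the database whose period covers write_date.

    creation_dates must be sorted in ascending order.
    """
    index = bisect_right(creation_dates, write_date) - 1
    if index < 0:
        raise LookupError("write time precedes the earliest second-class database")
    return index


creation_dates = [date(2019, 10, 3), date(2019, 10, 10), date(2019, 10, 17)]
# Written on 2019-10-12: falls between the databases created on 10-10 and 10-17.
chosen = select_database(creation_dates, date(2019, 10, 12))
# Written exactly on a creation date: that database is selected.
exact = select_database(creation_dates, date(2019, 10, 10))
```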
S204, reading the target data from the target second-class database, and processing the target data.
Optionally, after the target second-class database is determined, the target data may be read according to the target data identifier of the target data, and the target data is processed. In an embodiment of the present invention, processing the target data may include: batch read and delete processing, etc.
Further, because the target data that is read is metadata, this embodiment may also use the metadata to read the corresponding object data from the corresponding storage medium.
It can be understood that each second-class database of the target storage device stores far less data than the single database of a storage device in the existing scheme, so target data is read much more efficiently than in the existing scheme. When target data is read in batches, the data within each group of a second-class database is sorted by write time, so operations on the second-class database proceed sequentially, which improves the operating efficiency of the database.
In the technical scheme of this embodiment, the target storage device is determined according to the target data identifier in the processing request for the target data; the target data identifier is matched against the mapping between data identifiers and write times in the first-class database of the target storage device to determine the write time of the target data; a target second-class database is selected from the plurality of second-class databases of the target storage device according to that write time; and the target data is then read from the target second-class database and processed. The target data is thus located by its write time and read quickly from the database for processing, which shortens the data read cycle, avoids a large number of invalid reads, improves the data processing speed and performance of the distributed storage system, optimizes the operating efficiency of the database, and improves the user experience.
On the basis of the above embodiments, this embodiment provides a preferred implementation of the data processing method based on a distributed storage system, in which the target data is read from the target second-class database. Fig. 3 is a schematic flowchart of a data reading process based on a distributed storage system according to an embodiment of the present invention, where step S204 is refined into steps S304 and S305. As shown in fig. 3, the method includes the following steps:
S301, in response to a processing request for target data, determining a target storage device according to the target data identifier of the target data.
S302, matching the target data identification with a mapping relation between the data identification and the writing time in a first type database of the target storage device, and determining the writing time of the target data.
S303, selecting a target second-class database from second-class databases of the target storage device according to the writing time of the target data, wherein the number of the second-class databases is a positive integer larger than 1.
S304, obtaining a new target data identifier according to the target data identifier and the write-in time of the target data.
Obtaining the new target data identifier according to the target data identifier and the write time of the target data includes: prepending the write time of the target data to the target data identifier to obtain the new target data identifier.
S305, reading the target data from the target second-class database according to the new target data identification, and processing the target data.
According to the technical scheme of the embodiment of the invention, the new target data identifier is obtained by splicing the writing time of the target data before the target data identifier, so that the searching range can be reduced when the target data is read according to the new target data identifier, the data reading speed is increased, and the business process is optimized.
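As a rough illustration of why a time-prefixed identifier narrows the search, the sketch below builds the new identifier and looks it up in a store whose keys are kept sorted. The zero-padded key format, the `_` separator, and the `SortedStore` class are assumptions made for this example; the text only states that the writing time is added in front of the identifier.

```python
from bisect import bisect_left


def make_new_identifier(data_id: str, write_time: int) -> str:
    # Assumption: zero-pad the write time so string order matches time order.
    return f"{write_time:010d}_{data_id}"


class SortedStore:
    """Hypothetical second-class database that keeps its keys in sorted order."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, key, value):
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value          # overwrite an existing key
        else:
            self._keys.insert(i, key)        # keep keys sorted
            self._values.insert(i, value)

    def get(self, key):
        # Binary search: the time prefix places the key in a narrow region.
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None
```

With keys ordered by writing time, entries written close together are stored next to each other, so a lookup or a time-range scan touches only a small part of the key space.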
On the basis of the first embodiment, this embodiment of the present invention further provides a preferred implementation for processing the read target data: the target data written before the deletion time point is read from the target second-class database according to the deletion time point in the processing request, and a deletion operation is performed on the read target data. The target data to be deleted is thus read according to the deletion time point, which avoids generating a large number of invalid reads. Fig. 4 is a schematic flowchart of a data deletion process based on a distributed storage system according to an embodiment of the present invention, and as shown in fig. 4, the method includes the following steps:
S401, responding to the processing request of the target data, and determining the target storage device according to the target data identification of the target data.
S402, matching the target data identification with the mapping relation between the data identification and the writing time in the first type database of the target storage device, and determining the writing time of the target data.
S403, selecting a target second-class database from second-class databases of the target storage device according to the writing time of the target data, wherein the number of the second-class databases is a positive integer greater than 1.
S404, reading target data written before the deletion time point from the target second-class database according to the deletion time point in the processing request.
S405, writing a deletion instruction of the target data into the target second-class database to instruct the target second-class database to delete the target data.
The deleting time point may be set according to actual needs, and is not limited herein.
For example, if the deletion time point is 2019-9-30 and the target second-class database is the 3rd second-class database, all data (target data) written before 2019-9-30 is read from the 3rd second-class database, and a deletion instruction for all of the read data is written into the 3rd second-class database, so that the 3rd second-class database deletes the target data.
According to the technical scheme provided by the embodiment of the invention, the target data written before the deletion time point is read from the target second-class database according to the deletion time point in the processing request, and then the deletion instruction of the target data is written into the target second-class database so as to instruct the target second-class database to delete the target data. Therefore, the outdated data can be accurately acquired based on the deleting time point and deleted, so that a large amount of invalid reading is reduced, the reading speed of the outdated data is increased, the reading period is shortened, the outdated data deleting efficiency is improved, and the performance of the distributed storage system is improved.
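The deletion flow can be sketched as a range read followed by writing deletion markers. This is a hedged illustration only: representing keys as `(write_time, data_id)` tuples and modelling the "deletion instruction" as a `TOMBSTONE` marker are assumptions for the example, not details given in the text.

```python
from bisect import bisect_left

# Hypothetical marker standing in for a deletion instruction.
TOMBSTONE = object()


def read_written_before(sorted_keys, deletion_time):
    """Return the keys whose write time is earlier than deletion_time.

    sorted_keys is an ascending list of (write_time, data_id) tuples.
    """
    # (deletion_time,) sorts before every (deletion_time, data_id) tuple,
    # so the slice excludes data written exactly at the deletion time point.
    end = bisect_left(sorted_keys, (deletion_time,))
    return sorted_keys[:end]


def delete_before(db, deletion_time):
    """db maps (write_time, data_id) -> value; returns the keys marked deleted."""
    live_keys = sorted(k for k, v in db.items() if v is not TOMBSTONE)
    expired = read_written_before(live_keys, deletion_time)
    for key in expired:
        db[key] = TOMBSTONE  # write the deletion instruction for this entry
    return expired
```

Only entries before the deletion time point are touched; later entries are neither read nor rewritten.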
In order to achieve the above object, an embodiment of the present invention further provides a data processing apparatus based on a distributed storage system. Fig. 5 is a schematic structural diagram of a data processing apparatus based on a distributed storage system according to an embodiment of the present invention. As shown in fig. 5, the data processing apparatus based on the distributed storage system according to the embodiment of the present invention includes: a first determination module 510, a second determination module 520, a selection module 530, and a processing module 540.
The first determining module 510 is configured to, in response to a processing request for target data, determine a target storage device according to a target data identifier of the target data;
a second determining module 520, configured to match the target data identifier with a mapping relationship between data identifiers and write time in a first type of database of the target storage device, and determine write time of the target data;
a selecting module 530, configured to select a target second class database from second class databases of a target storage device according to write time of the target data, where the number of the second class databases is a positive integer greater than 1;
and the processing module 540 is configured to read the target data from the target second-class database, and process the target data.
As an optional implementation manner of the embodiment of the present invention, the selecting module 530 is specifically configured to: match the writing time of the target data with the creation time of the second-class databases of the target storage device to determine the target second-class database.
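One plausible reading of matching the writing time against database creation times is to pick the newest second-class database that already existed when the data was written. The sketch below assumes that rule and assumes the creation times are kept in ascending order; the function name `select_by_creation_time` is hypothetical.

```python
from bisect import bisect_right


def select_by_creation_time(creation_times, write_time):
    """Return the index of the newest database created at or before write_time.

    creation_times must be sorted in ascending order.
    """
    index = bisect_right(creation_times, write_time) - 1
    if index < 0:
        raise ValueError("write time precedes every database's creation time")
    return index
```

For example, with creation times `[100, 200, 300]`, a write time of 199 maps to index 0 and a write time of 200 maps to index 1.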
As an optional implementation manner of the embodiment of the present invention, the processing module 540 includes: a determination unit and a first reading unit, wherein:
the determining unit is used for obtaining a new target data identifier according to the target data identifier and the writing time of the target data;
and the first reading unit is used for reading the target data from the target second-class database according to the new target data identification.
As an optional implementation manner of the embodiment of the present invention, the determining unit is specifically configured to:
add the writing time of the target data to the front of the target data identifier to obtain a new target data identifier.
As an optional implementation manner of the embodiment of the present invention, the processing module 540 further includes: a second reading unit and a deletion unit.
The second reading unit is used for reading target data written before the deletion time point from the target second-class database according to the deletion time point in the processing request;
and the deleting unit is used for writing a deleting instruction of the target data into the target second-class database so as to instruct the target second-class database to delete the target data.
As an optional implementation manner of the embodiment of the present invention, the apparatus further includes: a third determining module, a storage module and a writing module, wherein:
the third determining module is used for responding to a storage request of the target data and determining the target storage equipment according to the target data to be written;
the storage module is used for storing the mapping relation between the data identification and the writing time of the target data into a first type database of the target storage equipment;
and the writing module is used for selecting a target second-class database from second-class databases of the target storage equipment according to the writing time of the target data and writing the target data into the target second-class database, wherein the number of the second-class databases is a positive integer greater than 1.
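The write path handled by the storage module and the writing module can be sketched as follows. The `Device` dataclass, the use of plain dicts as databases, and the creation-time selection rule are assumptions for illustration; the text does not prescribe them.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical target storage device."""
    creation_times: list                            # creation time of each second-class database, ascending
    first_db: dict = field(default_factory=dict)    # data identifier -> write time
    second_dbs: list = field(default_factory=list)  # one dict per second-class database

    def __post_init__(self):
        if not self.second_dbs:
            self.second_dbs = [{} for _ in self.creation_times]


def store_target(device, data_id, payload, write_time):
    # Choose the newest second-class database created at or before the write time.
    # This is done first so that nothing is recorded if no database qualifies.
    candidates = [i for i, t in enumerate(device.creation_times) if t <= write_time]
    if not candidates:
        raise ValueError("no second-class database exists for this write time")
    index = candidates[-1]
    # Record the identifier -> write time mapping in the first-class database.
    device.first_db[data_id] = write_time
    # Write the payload into the chosen second-class database.
    device.second_dbs[index][data_id] = payload
    return index
```

Storing the mapping at write time is what later allows a read or delete request to go straight to a single second-class database.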
As an optional implementation manner of the embodiment of the present invention, the apparatus further includes: and a control module.
The control module is configured to control a switch for reading the second-class database of the target storage device to be in a closed state if the type of the target data is a picture.
It should be noted that the foregoing explanation of the embodiment of the data processing method based on the distributed storage system is also applicable to the data processing apparatus based on the distributed storage system of the embodiment, and the implementation principle thereof is similar, and is not described herein again.
According to the technical scheme provided by the embodiment of the invention, the target storage equipment is determined according to the target data identification in the processing request of the target data, the target data identification is matched with the mapping relation between the data identification and the writing time in the first class database of the target storage equipment, the writing time of the target data is determined, the target second class database is selected from a plurality of second class databases of the target storage equipment according to the writing time of the target data, then the target data is read from the target second class database, and the target data is processed. Therefore, target data are quickly read from the database according to the data writing time to process the target data, so that the data reading period is shortened, a large amount of invalid reading is avoided, the data processing speed and the performance of the distributed storage system are improved, the operation efficiency of the database is optimized, and the user experience is improved.
In order to achieve the above object, an embodiment of the present invention further provides a computer device. Fig. 6 is a schematic structural diagram of a computer device provided by the present invention. FIG. 6 illustrates a block diagram of an exemplary computer device 600 suitable for use in implementing embodiments of the invention. The computer device provided by the embodiment of the present invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the data processing method based on the distributed storage system according to any one of the embodiments of the present invention.
The computer device 600 shown in fig. 6 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present invention. As shown in fig. 6, computer device 600 is in the form of a general purpose computing device. The components of computer device 600 may include, but are not limited to: one or more processors or processing units 610 (i.e., processors in embodiments of the invention), a system memory 620 (i.e., memory in embodiments of the invention), and a bus 18 that couples various system components including the system memory 620 and the processing unit 610.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computer device 600 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 600 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 620 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 600 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, and commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 620 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 620, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
The computer device 600 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with the computer device 600, and/or with any devices (e.g., network card, modem, etc.) that enable the computer device 600 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the computer device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) through the network adapter 20. As shown, the network adapter 20 communicates with the other modules of the computer device 600 via the bus 18. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the computer device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 610 executes various functional applications and data processing by running the program stored in the system memory 620, for example, implementing the data processing method based on the distributed storage system provided by the embodiment of the present invention, including:
responding to a processing request of target data, and determining target storage equipment according to a target data identifier of the target data;
matching the target data identification with a mapping relation between the data identification and writing time in a first type database of the target storage equipment to determine the writing time of the target data;
selecting a target second class database from second class databases of target storage equipment according to the writing time of the target data, wherein the number of the second class databases is a positive integer greater than 1;
and reading the target data from the target second-class database, and processing the target data.
It should be noted that the explanation of the foregoing embodiment of the data processing method based on the distributed storage system is also applicable to the computer device of the embodiment, and the implementation principle thereof is similar and will not be described herein again.
The computer device provided by the embodiment of the invention determines the target storage device according to the target data identifier in the processing request of the target data, matches the target data identifier with the mapping relation between the data identifier and the writing time in the first class database of the target storage device, determines the writing time of the target data, selects the target second class database from a plurality of second class databases of the target storage device according to the writing time of the target data, reads the target data from the target second class database, and processes the target data. Therefore, target data are quickly read from the database according to the data writing time to process the target data, so that the data reading period is shortened, a large amount of invalid reading is avoided, the data processing speed and the performance of the distributed storage system are improved, the operation efficiency of the database is optimized, and the user experience is improved.
In order to achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium. The computer-readable storage medium provided by the embodiment of the present invention stores thereon a computer program, which when executed by a processor implements a data processing method based on a distributed storage system as provided by the embodiment of the present invention, the method including:
responding to a processing request of target data, and determining target storage equipment according to a target data identifier of the target data;
matching the target data identification with a mapping relation between the data identification and writing time in a first type database of the target storage equipment to determine the writing time of the target data;
selecting a target second class database from second class databases of target storage equipment according to the writing time of the target data, wherein the number of the second class databases is a positive integer greater than 1;
and reading the target data from the target second-class database, and processing the target data.
Of course, the computer-readable storage medium provided in the embodiments of the present invention has computer-executable instructions that are not limited to the method operations described above, and may also perform related operations in the data processing method based on the distributed storage system provided in any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the embodiments of the present invention can be implemented by software and necessary general hardware, and certainly can be implemented by hardware, but the former is a better implementation in many cases. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device) perform the methods described in the embodiments of the present invention.
It should be noted that, in the above device embodiment, each included unit and each included module are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the embodiment of the invention.
It should be noted that the foregoing is only a preferred embodiment of the present invention and the technical principles applied. Those skilled in the art will appreciate that the embodiments of the present invention are not limited to the specific embodiments described herein, and that various obvious changes, adaptations, and substitutions are possible, without departing from the scope of the embodiments of the present invention. Therefore, although the embodiments of the present invention have been described in more detail through the above embodiments, the embodiments of the present invention are not limited to the above embodiments, and many other equivalent embodiments may be included without departing from the concept of the embodiments of the present invention, and the scope of the embodiments of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A data processing method based on a distributed storage system is characterized by comprising the following steps:
responding to a processing request of target data, and determining target storage equipment according to a target data identifier of the target data;
matching the target data identification with a mapping relation between the data identification and writing time in a first type database of the target storage equipment to determine the writing time of the target data;
selecting a target second class database from second class databases of target storage equipment according to the writing time of the target data, wherein the number of the second class databases is a positive integer greater than 1;
and reading the target data from the target second-class database, and processing the target data.
2. The method of claim 1, wherein selecting the target second class database from the second class databases of the target storage device based on the write time of the target data comprises:
and matching the writing time of the target data with the creation time of a second class database of the target storage equipment to determine the target second class database.
3. The method of claim 1, wherein reading the target data from the target second class database comprises:
obtaining a new target data identifier according to the target data identifier and the write-in time of the target data;
and reading the target data from the target second-class database according to the new target data identification.
4. The method of claim 3, wherein obtaining a new target data identifier according to the target data identifier and the write time of the target data comprises:
and adding the writing time of the target data to the front of the target data identifier to obtain a new target data identifier.
5. The method of claim 1, wherein reading the target data from the target second class database comprises:
reading target data written before the deletion time point from the target second-class database according to the deletion time point in the processing request;
processing the target data, including:
writing a deletion instruction of the target data into the target second-class database to instruct the target second-class database to delete the target data.
6. The method of claim 1, wherein prior to responding to the processing request for the target data, further comprising:
responding to a storage request for target data, and determining target storage equipment according to the target data to be written;
storing the mapping relation between the data identification and the writing time of the target data into a first type database of the target storage equipment;
and selecting a target second-class database from second-class databases of the target storage equipment according to the writing time of the target data, and writing the target data into the target second-class database, wherein the number of the second-class databases is a positive integer greater than 1.
7. The method of claim 6, wherein after determining the target storage device according to the target data to be written, further comprising:
and if the target data type is a picture, controlling a switch for reading the second type database of the target storage equipment to be in a closed state.
8. A data processing apparatus based on a distributed storage system, comprising:
the first determining module is used for responding to a processing request of target data and determining target storage equipment according to a target data identifier of the target data;
the second determining module is used for matching the target data identifier with a mapping relation between the data identifier and the writing time in a first type database of the target storage device to determine the writing time of the target data;
the selection module is used for selecting a target second class database from second class databases of the target storage equipment according to the writing time of the target data, wherein the number of the second class databases is a positive integer larger than 1;
and the processing module is used for reading the target data from the target second-class database and processing the target data.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the data processing method based on the distributed storage system according to any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method of data processing based on a distributed storage system according to any one of claims 1 to 7.
CN201911165510.3A 2019-11-25 2019-11-25 Data processing method, device, equipment and medium based on distributed storage system Active CN112835967B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911165510.3A CN112835967B (en) 2019-11-25 2019-11-25 Data processing method, device, equipment and medium based on distributed storage system

Publications (2)

Publication Number Publication Date
CN112835967A (en) 2021-05-25
CN112835967B (en) 2023-07-21

Family

ID=75922177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911165510.3A Active CN112835967B (en) 2019-11-25 2019-11-25 Data processing method, device, equipment and medium based on distributed storage system

Country Status (1)

Country Link
CN (1) CN112835967B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115718739A (en) * 2023-01-09 2023-02-28 苏州浪潮智能科技有限公司 Method, system, equipment and storage medium based on database back-end storage

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150019495A1 (en) * 2013-07-09 2015-01-15 Delphix Corp. Customizable storage system for virtual databases
US20150317212A1 (en) * 2014-05-05 2015-11-05 Oracle International Corporation Time-based checkpoint target for database media recovery
US20170094011A1 (en) * 2015-09-30 2017-03-30 International Business Machines Corporation Data consistency maintenance for sequential requests
US20170185296A1 (en) * 2015-12-23 2017-06-29 EMC IP Holding Company LLC Methods and apparatus for controlling data reading from a storage system
CN109189785A (en) * 2018-08-10 2019-01-11 平安科技(深圳)有限公司 Date storage method, device, computer equipment and storage medium
CN109299183A (en) * 2018-11-20 2019-02-01 北京锐安科技有限公司 A kind of data processing method, device, terminal device and storage medium
CN109947373A (en) * 2019-03-28 2019-06-28 北京大道云行科技有限公司 Data processing method and device
US10409511B1 (en) * 2018-06-30 2019-09-10 Western Digital Technologies, Inc. Multi-device storage system with distributed read/write processing
CN110471629A (en) * 2019-08-22 2019-11-19 中国工商银行股份有限公司 A kind of method, apparatus of dynamic capacity-expanding, storage medium, equipment and system

Also Published As

Publication number Publication date
CN112835967B (en) 2023-07-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant