CN111400298A - Data processing method and device and computer readable storage medium - Google Patents

Info

Publication number
CN111400298A
Authority
CN
China
Prior art keywords
data
expiration time
time
expiration
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010303369.5A
Other languages
Chinese (zh)
Inventor
郭子亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010303369.5A
Publication of CN111400298A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Abstract

The embodiment of the application discloses a data processing method, which comprises the following steps: acquiring a time management file, where the time management file records expiration time information corresponding to each data in a database, and the expiration time information is ordered by time; analyzing the ith expiration time information of the ith data based on the time order of the expiration time information to obtain the expiration time of the ith data, where i is an integer greater than or equal to 1 and less than or equal to N, N is the total number of data, and the 1st data is the data with the earliest expiration time in the database; and if the expiration time of the ith data is greater than or equal to the current time, determining the previous i data as expired data, and deleting the expired data from the database when the expiration deletion condition is met. The embodiment of the application also discloses a data processing device, electronic equipment and a computer readable storage medium.

Description

Data processing method and device and computer readable storage medium
Technical Field
The present application relates to the field of database management technologies, and in particular, to a data processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the continuous development of data services, data volume is growing explosively, and internet services usually store data in a database based on a distributed Key-Value (KV) storage system.
In practical applications, a large amount of data is stored in a database based on a distributed KV system. For some of this data, the developer expects it to remain in the database for only a certain period of time; once the data has been stored for longer than its retention time, the expired data needs to be deleted.
At present, processing expired data requires traversing the relevant information of every data in the database and determining one by one whether each data has expired, so the processing efficiency of deleting expired data is low.
Disclosure of Invention
The embodiment of the application provides a data processing method and device, electronic equipment and a computer readable storage medium, which can improve the processing efficiency of deleting expired data.
In a first aspect, an embodiment of the present application provides a data processing method, where the method includes:
acquiring a time management file; the time management file is used for recording expiration time information corresponding to each data in a database, and the expiration time information has time sequence;
analyzing the ith expiration time information of the ith data based on the time sequence of the expiration time information to obtain the expiration time of the ith data; wherein i is an integer greater than or equal to 1 and less than or equal to N; N is the total number of data; the 1st data is the data with the earliest expiration time in the database;
if the expiration time of the ith data is greater than or equal to the current time, determining the previous i data as expired data, and deleting the expired data from the database when the expiration deletion condition is met.
In a second aspect, an embodiment of the present application provides a data processing apparatus, where the apparatus includes:
an acquisition unit configured to acquire a time management file; the time management file is used for recording expiration time information corresponding to each data in a database, and the expiration time information has time sequence;
an analysis unit configured to analyze the ith expiration time information of the ith data based on the time sequence of the expiration time information to obtain the expiration time of the ith data; wherein i is an integer greater than or equal to 1 and less than or equal to N; N is the total number of data; the 1st data is the data with the earliest expiration time in the database;
and the processing unit is used for determining the previous i data as expired data if the expiration time of the ith data is greater than or equal to the current time, and deleting the expired data from the database when the expiration deletion condition is met.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor and a memory for storing a computer program capable of running on the processor;
wherein the processor is configured to execute the steps of the data processing method of the first aspect when running the computer program.
In a fourth aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the data processing method according to the first aspect.
According to the data processing method and device, the electronic equipment and the computer storage medium provided by the embodiments of the application, a time management file is first obtained; the time management file records expiration time information corresponding to each data in the database, and the expiration time information is ordered by time. Then, the ith expiration time information of the ith data is analyzed based on the time order of the expiration time information to obtain the expiration time of the ith data. If the expiration time of the ith data is greater than or equal to the current time, the previous i data are determined as expired data, and the expired data is deleted from the database when the expiration deletion condition is met. In this way, the data can be analyzed in the time order of the expiration time information to obtain the expiration time of each data; when the expiration time of the ith data is detected to be greater than or equal to the current time, analysis stops and the previous i data are directly determined as expired data to be deleted. That is, the expired data can be determined by analyzing only part of the data in the database rather than every data one by one, so the expired data can be located accurately and the processing efficiency of deleting expired data is improved.
Drawings
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of another data processing method according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of another data processing method according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
So that the manner in which the features and elements of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings.
It should be noted that the terms "first", "second", and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
In practical applications, RocksDB is widely used in internet services as a local persistent storage engine. A developer can store data that needs to be persisted in RocksDB, so that the data is not lost even if the electronic device on which RocksDB is installed is restarted. However, in specific scenarios, a developer may want some data to be retained only for a certain period of time; once the data has been retained in RocksDB for longer than that period, it needs to be deleted.
In the related art, a business logic function is usually used to traverse the relevant information of every data in the database and determine one by one whether each data has expired. That is, the Key space of the whole RocksDB is scanned, whether the data corresponding to each Key has expired is determined one by one, and if so, the data corresponding to that Key is deleted. As a result, deletion is inefficient and the specific expired data cannot be located accurately.
In order to solve the problems in the related art, an embodiment of the present application provides a data processing method. The execution subject of the data processing method may be the data processing apparatus provided in the embodiment of the present application, or an electronic device integrating the data processing apparatus, where the data processing apparatus may be implemented in hardware or software. The electronic device may be a server, an industrial computer, or the like; the embodiment of the present application does not limit the type of the electronic device.
Referring to fig. 1, fig. 1 is a schematic flow chart of a data processing method according to an embodiment of the present application, and as shown in fig. 1, the data processing method includes the following steps:
step 110, acquiring a time management file; the time management file is used for recording expiration time information corresponding to each data in the database, and the expiration time information has time sequence.
The database related to the embodiment of the application can be a database supporting persistent storage; it is to be understood that the data in the database is not lost after the database is started or restarted. However, as data grows, in order to ensure that the database can store data normally, the data in the database needs to be deleted.
In the embodiment provided by the application, the data processing device can manage and maintain the expiration time information of the data in the database through the time management file. It is understood that the time management file stores the expiration time information of each data in the database that needs to be deleted periodically.
In one possible embodiment, the expiration time information includes at least an expiration time of each data; the expiration time refers to the latest time until which the data remains stored in the database.
In another possible embodiment, the expiration time information may further include identification information of each data and/or a data type of the identification information of each data; the identification information can uniquely identify the data in the database, and the data type of the identification information is used for indicating the deleting mode of the target data.
In the embodiments provided in the present application, the expiration time may be determined by, but is not limited to, any one of the following:
1. When storing data, the operator can set a time-to-live value (Time To Live, TTL) for the stored data, so that the data processing device can determine the expiration time of the data from the TTL of the data and the time at which the data was stored.
2. When storing data, the operator can directly set the expiration time for the stored data. That is, the data processing apparatus may determine the expiration time of the data directly from the information configured by the operator.
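The two ways of determining the expiration time can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name `expiration_time` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def expiration_time(
    stored_at: datetime,
    ttl: Optional[timedelta] = None,
    explicit_expiry: Optional[datetime] = None,
) -> datetime:
    """Return the expiration time of one piece of data.

    Way 2: an expiration time configured directly by the operator is used as-is.
    Way 1: otherwise the expiration time is the storage time plus the TTL.
    """
    if explicit_expiry is not None:
        return explicit_expiry
    if ttl is None:
        raise ValueError("either ttl or explicit_expiry must be provided")
    return stored_at + ttl


# Data stored on 2020-01-18 with a 7-day TTL expires on 2020-01-25.
stored = datetime(2020, 1, 18, 12, 0, 0)
print(expiration_time(stored, ttl=timedelta(days=7)))
```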
In the embodiments provided in the present application, the expiration time information has a chronological property, that is, the expiration time information may be arranged in order of expiration time. For example, the expiration time information may be arranged from the earliest expiration time to the latest, or from the latest to the earliest; the arrangement manner of the expiration time information is not limited in this embodiment of the present application.
Step 120, analyzing the ith expiration time information of the ith data based on the time sequence of the expiration time information to obtain the expiration time of the ith data.
Wherein i is an integer greater than or equal to 1 and less than or equal to N; N is the total number of data; the 1st data is the data with the earliest expiration time in the database.
It can be understood that the data processing apparatus may analyze the expiration times of the data in the time management file from front to back according to the arrangement order of the expiration time information, and obtain the expiration time of each data through analysis.
It should be noted that the data processing apparatus does not continuously parse the data, but after the current data is parsed, it is determined whether to parse the next data according to the relationship between the expiration time of the current data and the current time, and the detailed analysis is specifically described below.
In the embodiment provided by the present application, the data processing apparatus may first analyze the 1st expiration time information corresponding to the 1st data to obtain the expiration time of the 1st data. Here, the 1st data may be the data whose expiration time is the earliest in the database.
Further, the data processing device compares the expiration time of the 1st data with the current time. When the data processing device determines that the expiration time of the 1st data is less than the current time, that is, earlier than the current time, it continues to analyze the next data in the arrangement order, namely the 2nd expiration time information corresponding to the 2nd data, to obtain the expiration time of the 2nd data. Similarly, when the expiration time of the 2nd data is determined to be less than the current time, the 3rd expiration time information corresponding to the 3rd data is analyzed to obtain the expiration time of the 3rd data.
It is understood that whenever the data processing apparatus detects that the expiration time of the current data is earlier than the current time, the expiration time of the next data in the order may well also be earlier than the current time, so the expiration time information of the next data needs to be parsed as well.
In a possible implementation manner, in order to increase the parsing efficiency of the expiration time information, a dedicated thread may be created to parse the expiration time information of the data.
Specifically, analyzing the expiration time information of the ith data to obtain the expiration time of the ith data includes:
starting a target thread, and analyzing the expiration time information of the ith data through the target thread to obtain the expiration time of the ith data.
In the embodiment provided by the present application, the target thread may be a coroutine (goroutine). A goroutine is a program execution object with lower overhead than a thread, is widely used in distributed systems, and supports concurrent execution. In the embodiment provided by the application, the data processing device can improve the parsing efficiency of the expiration time information through the concurrency of goroutines.
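As a rough analogue of parsing on a dedicated execution unit, the sketch below hands one entry to a worker thread. It uses a Python thread pool in place of a goroutine, and the entry layout (an 8-byte big-endian Unix timestamp followed by the identification information) is an assumption made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_expiration(raw: bytes) -> int:
    """Parse the expiration time from one entry.

    Assumed (hypothetical) layout: the first 8 bytes hold the expiration
    time as big-endian Unix seconds; the rest is identification information.
    """
    return int.from_bytes(raw[:8], "big")


def parse_in_worker(raw: bytes) -> int:
    """Parse one entry on a dedicated worker thread and wait for the result."""
    # A long-running service would keep one worker alive and reuse it;
    # a pool is created per call here only to keep the sketch short.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(parse_expiration, raw).result()
```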
Step 130, if the expiration time of the ith data is greater than or equal to the current time, determining the previous i data as expired data, and deleting the expired data from the database when the expiration deletion condition is met.
In the embodiment provided by the application, when detecting that the expiration time of the ith data is greater than or equal to the current time, the data processing device stops analyzing the expiration time information of the data next to the ith data, and directly determines that the ith data and the data arranged before the ith data are expired data.
It is to be understood that, in the time management file, the expiration time information of each data is arranged in chronological order. When the data processing apparatus detects that the expiration time of the ith data is greater than or equal to the current time, it may be assumed that the expiration times of all data arranged after the ith data are also greater than the current time, that is, later than the current time. Therefore, the data processing apparatus can determine the first i data as the expired data without analyzing the data after the ith data.
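The early-stopping scan can be sketched as below, assuming the time management file has already been loaded as a list of `(expiration time, identification information)` pairs sorted from earliest to latest; the name `find_expired` and the use of Unix seconds are hypothetical. The source's wording counts the ith entry itself among the expired data; this sketch instead stops before it, on the assumption that an entry whose expiration time has not yet passed should be kept.

```python
from typing import List, Tuple

# (expiration time in Unix seconds, identification information)
Entry = Tuple[int, str]


def find_expired(entries: List[Entry], now: int) -> List[str]:
    """Return the IDs of expired entries.

    `entries` must be sorted by expiration time, earliest first. The scan
    stops at the first entry whose expiration time is >= `now`, so the
    entries after it are never parsed.
    """
    expired = []
    for expires_at, data_id in entries:
        if expires_at >= now:
            break  # every later entry expires even later
        expired.append(data_id)
    return expired


entries = [(100, "a"), (200, "b"), (300, "c"), (400, "d")]
print(find_expired(entries, now=250))  # only "a" and "b" are examined as expired
```

The cost is proportional to the number of expired entries rather than to the total number of entries, which is the efficiency gain the text describes.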
Further, the data processing apparatus deletes the above-determined expired data from the database upon detecting that the expiration deletion condition is satisfied.
In the embodiments provided in the present application, deleting expired data from the database may be implemented by:
step 1301, acquiring identification information corresponding to each expired data from the time management file;
step 1302, based on the identification information of each expired data in the expired data, deleting the related data corresponding to the identification information from the database.
It is to be understood that the data processing apparatus acquires the identification information of each of the expired data after determining the expired data. Here, the identification information corresponding to the expired data may be obtained when the expiration time information of each data is parsed in step 120.
Further, the data processing device looks up the relevant information of the expired data in the database according to the identification information of the expired data, and deletes all of the relevant information found from the database.
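Steps 1301 and 1302 can be illustrated with a plain dictionary standing in for the database. This is a sketch under that assumption; `delete_expired` is a hypothetical name, and a real key-value store would use its own delete call.

```python
from typing import Dict, Iterable


def delete_expired(database: Dict[str, dict], expired_ids: Iterable[str]) -> int:
    """Delete the records whose IDs are listed and return how many were removed.

    `database` maps identification information to the full record, so the
    lookup by ID replaces any scan over the whole key space.
    """
    deleted = 0
    for data_id in expired_ids:
        if data_id in database:
            del database[data_id]
            deleted += 1
    return deleted


db = {"1": {"name": "Zhang San", "age": 15}, "2": {"name": "Li Si", "age": 16}}
print(delete_expired(db, ["1"]), db)
```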
Therefore, the data processing method provided by the embodiment of the application can analyze the data in the time order of the expiration time information to obtain the expiration time; when the expiration time of the ith data is detected to be greater than or equal to the current time, analysis stops and the previous i data are directly determined as expired data to be deleted. That is, the expired data can be determined by analyzing only part of the data in the database rather than every data one by one, so the expired data can be located accurately and the processing efficiency of deleting expired data is improved.
Based on the above embodiments, in the embodiments of the present application, before the time management file is acquired, step 101 to step 103 may also be performed. Referring to the schematic flow chart of the data processing method shown in fig. 2, on the basis of fig. 1, the data processing method provided in the embodiment of the present application may further include the following steps:
step 101, acquiring target data, and determining identification information of the target data, a data type of the identification information and an expiration time;
step 102, encoding at least one of identification information of target data and a data type of the identification information with the expiration time to obtain expiration time information corresponding to the target data;
step 103, storing expiration time information corresponding to the target data in a time management file based on the expiration time of the target data.
It is understood that steps 101 to 103 are a way of storing the target data in the time management file, and may also be understood as a method of constructing the time management file.
In a possible implementation manner, the data processing apparatus may encode the identification information and the expiration time of the target data to obtain expiration time information corresponding to the encoded target data.
In another possible implementation manner, the data processing apparatus may further encode the data type of the identification information of the target data together with the expiration time to obtain expiration time information corresponding to the encoded target data.
In another possible implementation manner, the data processing apparatus may further encode the identification information of the target data, the data type of the identification information of the target data, and the expiration time of the target data to obtain expiration time information corresponding to the encoded target data.
Further, the data processing device may store the expiration time information corresponding to the encoded target data in the time management file according to the order of the expiration times.
In the embodiment provided by the application, the target data can be data to be stored that is input by an operator, or data to be stored that comes from other third-party platforms. The source of the target data is not limited in the embodiments of the present application.
In addition, in practical applications, one data may have a plurality of different fields. Table 1 exemplarily shows related information of one data in the database. Wherein, each column in table 1 is a data field, which indicates different attributes of the data.
TABLE 1
Identification information | Student number | Name | Age | Sex | Expiration time
1 | 123456 | Zhang San | 15 | Male | 2020.01.25
In this embodiment, after acquiring one piece of data, the data processing apparatus may extract only the information of the two fields, i.e., the identification information and the expiration time of the piece of data, and encode the data of the two fields, i.e., the identification information and the expiration time, to obtain the expiration time information of the piece of data. In this way, when the data processing device analyzes the expiration time information of each piece of data, only the identification information and the expiration time are analyzed, so that the data analysis efficiency is improved, and the processing efficiency of deleting the expired data is improved.
Further, after obtaining the expiration time information of the data, saving the expiration time information of the data into a time management file. When the expiration time information of the data is stored, it is necessary to store the expiration time information in the order of arrangement of the data expiration times in the time management file.
In a possible implementation manner, step 103 may be implemented by means of steps 1031 and 1032. Referring to the flow chart of the data processing method shown in fig. 3, on the basis of fig. 2, step 103 includes the following steps:
step 1031, analyzing the expiration time information of each data in the time management file to obtain the expiration time corresponding to each data in the time management file;
step 1032, storing the target data between the Nth data and the Mth data based on the expiration time corresponding to the target data and the expiration time corresponding to each data in the time management file;
where the expiration time of the Nth data is less than the expiration time of the target data, and the expiration time of the Mth data is greater than the expiration time of the target data.
It can be understood that, when the target data is stored, the expiration time information of each data in the time management file may be analyzed to obtain the expiration time of each data. The expiration time information of the target data is then stored between the expiration time information of the Nth data and that of the Mth data, according to how the expiration time of the target data compares with the expiration time of each data in the time management file, such that the expiration time of the Nth data is less than that of the target data and the expiration time of the Mth data is greater than that of the target data. In this way, the expiration time information of each data in the time management file is guaranteed to remain in time order.
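Ordered insertion can be sketched with Python's standard `bisect` module, assuming the time management file is held in memory as a list of `(expiration time, identification information)` tuples. Note that `bisect` locates the slot by binary search, whereas steps 1031 and 1032 describe parsing each entry; the binary search is a substitution made here for brevity, and `insert_sorted` is a hypothetical name.

```python
import bisect
from typing import List, Tuple

# (expiration time in Unix seconds, identification information)
Entry = Tuple[int, str]


def insert_sorted(entries: List[Entry], expires_at: int, data_id: str) -> None:
    """Insert a new entry so that `entries` stays sorted by expiration time.

    The new entry lands between the last entry that expires earlier and
    the first entry that expires later (the Nth and Mth data in the text).
    """
    bisect.insort(entries, (expires_at, data_id))


entries = [(100, "a"), (300, "c")]
insert_sorted(entries, 200, "b")
print(entries)
```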
In one possible implementation, the storage format of the expiration time information is a Key-Value storage format. Specifically, the expiration time information may be key data in a key value pair.
Based on this, encoding the identification information and the expiration time of the target data in step 102 may be performed as follows:
encoding at least one of the identification information and the data type of the identification information together with the expiration time, and using the encoded result as key data of the expiration time information of the target data.
In one possible implementation, the data processing apparatus may encode the identification information and the expiration time of the target data, and use the encoded identification information and the expiration time of the target data as key data of the expiration time information of the target data.
In another possible implementation manner, the data processing apparatus may further encode the data type of the identification information of the target data together with the expiration time, and use the encoded data type and expiration time as key data of the expiration time information of the target data.
In yet another possible implementation manner, the data processing apparatus may further encode the identification information of the target data, the data type of the identification information of the target data, and the expiration time of the target data, and use the encoded identification information of the target data, the data type of the identification information, and the expiration time as key data of the expiration time information of the target data.
In addition, in order to reduce the encoding complexity of the target data, the Value data corresponding to the expiration time information of the target data may be set to a null value, or may be stored in only a few bytes, for example a single byte. Therefore, the data encoding efficiency is improved, and the processing efficiency of deleting the expired data is further improved.
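One possible key layout is sketched below. The source does not specify a byte format, so everything here is an assumption: an 8-byte big-endian expiration time, a 1-byte data type tag, then the identification information. Putting the fixed-width big-endian timestamp first makes byte-wise key order match expiration order, which suits stores that keep keys sorted. The names `encode_key` and `decode_key` are hypothetical.

```python
import struct
from typing import Tuple

EMPTY_VALUE = b""  # the value carries no information; only the key matters


def encode_key(expires_at: int, type_tag: int, data_id: bytes) -> bytes:
    """Encode expiration time, data type tag and ID into one key.

    ">QB" packs an unsigned 64-bit integer and one unsigned byte in
    big-endian order with no padding (9 bytes in total).
    """
    return struct.pack(">QB", expires_at, type_tag) + data_id


def decode_key(key: bytes) -> Tuple[int, int, bytes]:
    """Recover (expiration time, data type tag, ID) from an encoded key."""
    expires_at, type_tag = struct.unpack(">QB", key[:9])
    return expires_at, type_tag, key[9:]


store = {encode_key(1579910400, 1, b"1"): EMPTY_VALUE}
print(decode_key(next(iter(store))))
```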
Based on the foregoing embodiments, in the data processing method provided in the embodiments of the present application, there are various ways to determine that the expiration deletion condition is satisfied. Two of them, mode one and mode two, are described below.
Mode one
If the time interval between the current time and the last time expired data was deleted exceeds the preset interval duration, it is determined that the expiration deletion condition is met.
It will be appreciated that the data processing apparatus may delete expired data periodically, that is, at a fixed time interval. For example, the data processing apparatus may clear expired data every 5 hours. Specifically, the data processing device may record the time of each deletion of expired data and monitor in real time whether the interval between the current time and the last deletion exceeds the preset interval duration, where the preset interval duration may be the cleaning period.
If the time interval between the current time and the last deletion of expired data exceeds the preset interval duration, the expired data is deleted from the database. It should be noted that the preset interval duration may be preset by an operator of the database.
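Mode one reduces to a single comparison, sketched here with times as Unix seconds; the function name and parameters are hypothetical.

```python
def interval_elapsed(now: float, last_deletion: float, interval_seconds: float) -> bool:
    """Mode one: True once more than `interval_seconds` have passed
    since the last deletion of expired data."""
    return now - last_deletion > interval_seconds


FIVE_HOURS = 5 * 60 * 60
# Last deletion at t=0: not yet due after 4 hours, due after 6 hours.
print(interval_elapsed(4 * 3600, 0, FIVE_HOURS), interval_elapsed(6 * 3600, 0, FIVE_HOURS))
```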
Mode two
If the current time is the deletion time of the database, it is determined that the expiration deletion condition is met.
It is understood that the data processing apparatus may delete the expired data at a specific deletion time. For example, the expired data is deleted at 1:00 a.m.
It should be noted that the deletion time of the database may be configured by the database operator, or may be dynamically determined by the data processing apparatus according to the operating state of the data service.
For example, the data processing apparatus may monitor the peak and off-peak time periods of data service processing, and set the deletion time of the database within the off-peak time period.
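Mode two can be sketched as a time-of-day check. The source does not say how precisely "the current time is the deletion time" is matched; this sketch assumes minute granularity, and `is_deletion_time` is a hypothetical name.

```python
from datetime import datetime, time


def is_deletion_time(now: datetime, deletion_time: time) -> bool:
    """Mode two: True when `now` falls in the configured deletion minute."""
    return (now.hour, now.minute) == (deletion_time.hour, deletion_time.minute)


# Deletion configured for 1:00 a.m.
print(is_deletion_time(datetime(2020, 4, 17, 1, 0, 30), time(1, 0)))
```

A scheduler that polls less often than once a minute would miss the window under this assumption, so a real implementation would need a tolerance matched to its polling period.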
In one possible implementation, the data processing apparatus may receive deletion configuration information. The deletion configuration information includes the latest preset deletion time and/or the latest preset interval duration; the preset deletion time indicates the time at which expired data in the database is deleted, and the preset interval duration represents the interval between two successive deletions of expired data;
and the preset deletion time and/or the preset interval duration are updated based on the deletion configuration information.
It can be understood that, in the data processing method provided by the embodiment of the present application, the data processing apparatus may receive from the database operator the configuration of the deletion time and the deletion interval duration for expired data. The database operator can thus modify and control when expired data is deleted from the database. Therefore, the data processing device actively deletes the expired data of the database, the deletion time of the expired data is controllable, and the efficiency of deleting the expired data is improved.
In one possible implementation, the data processing apparatus may further receive deletion configuration information, where the deletion configuration information includes a data amount of each deletion of the expired data.
Correspondingly, when deleting the expired data, the data processing apparatus deletes it from the database according to the per-deletion data amount configured by the database operator.
It will be appreciated that the database operator may control the amount of data that is deleted each time the expired data is deleted. That is, the data processing apparatus may receive the database operator's configuration of the amount of data deleted each time. In this way, the data processing apparatus actively deletes the expired data of the database, the amount of expired data deleted each time is controllable, and the efficiency of deleting the expired data is improved.
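The per-deletion data amount can be illustrated with a batch-limited delete. In this sketch the database is modelled as a plain Go map and the function name `deleteBatch` is hypothetical; a non-positive batch size is treated as "no limit", which is an assumption of the sketch rather than something the text specifies.

```go
package main

import "fmt"

// deleteBatch removes at most batchSize expired identifiers from store and
// returns the identifiers that are left for the next deletion pass.
// A non-positive batchSize is treated as "no limit" (an assumption).
func deleteBatch(store map[string]string, expired []string, batchSize int) []string {
	if batchSize <= 0 || batchSize > len(expired) {
		batchSize = len(expired)
	}
	for _, id := range expired[:batchSize] {
		delete(store, id)
	}
	return expired[batchSize:]
}

func main() {
	store := map[string]string{"a": "1", "b": "2", "c": "3"}
	rest := deleteBatch(store, []string{"a", "b", "c"}, 2)
	// Remaining records in the store and identifiers still pending deletion.
	fmt.Println(len(store), len(rest))
}
```

Bounding each pass in this way keeps a single deletion run from competing with foreground traffic for too long.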
Based on the foregoing embodiments, an embodiment of the present application provides a data processing method, which includes the following steps:
Step 110, acquiring a time management file; the time management file is used for recording expiration time information corresponding to each data in the database, and the expiration time information has time sequence.
In a possible implementation manner, the database referred to in the embodiments of the present application is a RocksDB database. RocksDB is an embedded K-V storage engine, i.e., RocksDB can be used to store Keys and Values, which can be byte streams of any size.
In practical applications, RocksDB supports a column family mechanism. RocksDB logically separates data through column families, and a database operator can store the fields of one type of data in one column family, so as to achieve fast writing and querying. In addition, every RocksDB database is created with a column family named "default"; if an operation on a piece of data does not specify a column family, the default column family is used.
In the embodiment provided by the application, the data processing apparatus can store the data in the default column family. In addition, the data processing apparatus can add a new column family to the database, dedicated to storing the expiration time information of each data. The expiration time information comprises the identification information of each piece of data, the data type of the identification information, and the expiration time.
In practical applications, each column family has its own sstable storage files. Therefore, the sstable storage file corresponding to the newly added column family in the embodiment of the present application is the time management file in step 110.
In one possible implementation, the coding rule of the Key in the newly added column family is as follows:
Key = timestamp of the expiration time + data type of the key + key value;
where the timestamp indicates the expiration time of the data, the data type of the key indicates the deletion mode of the data, and the key value indicates the identification information of the data; the key value of each data is the same as the key value stored for that data in the default column family.
The key types generally include different data types such as character strings, hashes, lists, sets, and the like; in the embodiment of the application, the data deleting modes corresponding to the keys of different data types are different.
In addition, the Value in the newly added column family may store only one byte, or may be null, which is not limited in this embodiment of the present application.
In an embodiment provided by the present application, the expiration time information in the storage file corresponding to the newly added column family is sorted according to the expiration time of each data.
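One way to make the stored order follow the expiration time is to place a fixed-width, big-endian timestamp at the front of the key, since RocksDB's default comparator orders keys bytewise. The following is a sketch under that assumption; the exact byte layout (8-byte timestamp, 1-byte type tag, then the key value), the type tags, and the function name are hypothetical and are not specified by the text.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Hypothetical one-byte tags for the data type of the key.
const (
	typeString byte = 's'
	typeHash   byte = 'h'
)

// encodeExpireKey builds: 8-byte big-endian expiration timestamp,
// 1-byte data type, then the key value. Big-endian fixed-width encoding
// makes bytewise key order match chronological order.
func encodeExpireKey(expireAt uint64, dataType byte, key string) []byte {
	buf := make([]byte, 9+len(key))
	binary.BigEndian.PutUint64(buf[:8], expireAt)
	buf[8] = dataType
	copy(buf[9:], key)
	return buf
}

func main() {
	earlier := encodeExpireKey(1587081600, typeString, "user:1")
	later := encodeExpireKey(1587085200, typeHash, "cart:9")
	// The key with the earlier expiration time sorts first bytewise.
	fmt.Println(bytes.Compare(earlier, later) < 0)
}
```

With such a layout the value can stay empty or hold a single byte, as the text allows, because everything needed for deletion is carried in the key.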
Step 120, analyzing the ith expiration time information of the ith data based on the time sequence of the expiration time information to obtain the expiration time of the ith data.
Wherein i is an integer of 1 or more and N or less; N is the total number of data; the 1st data is the data with the earliest expiration time in the database.
In the embodiment provided by the application, the data processing apparatus may start one goroutine, and the goroutine parses the expiration time information of each data in the time management file to obtain the value of the expiration timestamp field (i.e., the expiration time) of each data.
It should be noted that the data processing apparatus may control the goroutine to analyze the expiration times of the data in the time management file from front to back according to the arrangement sequence of the expiration time information, and obtain the value (i.e., the expiration time) of the expiration timestamp field of each data through analysis.
It should be noted that the goroutine does not parse the data unconditionally one after another; after the current data is parsed, the data processing apparatus determines whether to parse the next data according to the relationship between the value of the expiration timestamp field (i.e., the expiration time) of the current data and the current timestamp (i.e., the current time).
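Parsing one piece of expiration time information amounts to splitting the stored key back into its fields. The sketch below assumes a hypothetical layout of an 8-byte big-endian timestamp, a 1-byte data type, and then the key value; the layout and the function name `decodeExpireKey` are illustrative assumptions only.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeExpireKey splits an expiration key into its expiration timestamp,
// data type tag, and key value. It returns an error if the input is too
// short to contain the fixed-width prefix.
func decodeExpireKey(raw []byte) (expireAt uint64, dataType byte, key string, err error) {
	if len(raw) < 9 {
		return 0, 0, "", errors.New("expiration key too short")
	}
	return binary.BigEndian.Uint64(raw[:8]), raw[8], string(raw[9:]), nil
}

func main() {
	raw := []byte{0, 0, 0, 0, 0, 0, 0, 100, 's', 'k', '1'}
	ts, typ, key, err := decodeExpireKey(raw)
	fmt.Printf("%d %c %s %v\n", ts, typ, key, err)
}
```

The decoded timestamp is what gets compared with the current timestamp to decide whether the next entry needs to be parsed at all.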
Step 130, if the expiration time of the ith data is greater than or equal to the current time, determining the first i data as expired data, and deleting the expired data from the database when the expiration deletion condition is met.
Here, the data processing apparatus needs to acquire the current time stamp before parsing the data in the time management file, and determine whether to parse the next data based on the current time stamp.
Specifically, when the data processing device detects that the value of the expiration timestamp field of the ith data is greater than or equal to the current timestamp, the data processing device stops analyzing the expiration time information of the next data of the ith data, and directly determines that the ith data and the data arranged before the ith data are expired data.
It is to be understood that, in the time management file, the expiration time information of each data is arranged in chronological order, and when the data processing apparatus detects that the expiration time of the ith data is equal to or greater than the current time, it may be default that the expiration times of the data arranged after the ith data are all greater than the current time, that is, the expiration time is later than the current time. Therefore, the data processing apparatus can determine the first i data as the expired data without analyzing the data subsequent to the i-th data.
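The early-stop scan can be sketched as a loop over entries that are already sorted by expiration time, run in a goroutine. The `entry` type and `scanExpired` function are hypothetical. Note that the sketch treats only the entries before the stopping point as expired, which is one conservative reading of the step, because the stopping entry itself has an expiration time that is not earlier than the current time.

```go
package main

import "fmt"

// entry is one parsed piece of expiration time information (hypothetical).
type entry struct {
	expireAt int64  // expiration time, Unix seconds
	id       string // identification information of the data
}

// scanExpired walks entries sorted ascending by expireAt and stops at the
// first entry whose expiration time is >= now. It returns the identifiers
// seen before the stop and the number of entries that were examined.
func scanExpired(entries []entry, now int64) (expired []string, parsed int) {
	for _, e := range entries {
		parsed++
		if e.expireAt >= now {
			break // all later entries expire even later, so stop here
		}
		expired = append(expired, e.id)
	}
	return expired, parsed
}

func main() {
	entries := []entry{{100, "a"}, {120, "b"}, {200, "c"}, {300, "d"}}
	done := make(chan []string)
	// Run the scan in its own goroutine, as the description suggests.
	go func() {
		ids, _ := scanExpired(entries, 150)
		done <- ids
	}()
	fmt.Println(<-done)
}
```

The cost of the scan is proportional to the number of expired entries plus one, not to the total number of entries in the database.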
It is to be understood that the data processing apparatus may acquire identification information of each of the expired data after determining the expired data. Here, the identification information corresponding to the expired data may be obtained when the expiration time information of each data is parsed in step 120.
Further, the data processing apparatus looks up the relevant information of the expired data in the database according to the identification information of the expired data, and deletes all of the relevant information found from the database.
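The text states that the deletion mode depends on the data type of the key but does not spell out the modes. The following sketch shows one hypothetical arrangement in which a string occupies a single record while a hash occupies one record per field, so deleting a hash means removing every record under its prefix. The store layout (`s|id`, `h|id|field`), the type tags, and the function name are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// deleteExpired removes every record belonging to the expired data id and
// returns how many records were removed. The store is modelled as a map
// with composite keys: "s|<id>" for strings, "h|<id>|<field>" for hashes.
func deleteExpired(store map[string]string, dataType byte, id string) int {
	switch dataType {
	case 's': // a string is a single record
		k := "s|" + id
		if _, ok := store[k]; ok {
			delete(store, k)
			return 1
		}
		return 0
	case 'h': // a hash spans one record per field
		prefix := "h|" + id + "|"
		n := 0
		for k := range store {
			if strings.HasPrefix(k, prefix) {
				delete(store, k) // deleting during range is safe in Go
				n++
			}
		}
		return n
	}
	return 0 // unknown type: nothing is deleted
}

func main() {
	store := map[string]string{
		"s|a": "1", "h|b|f1": "x", "h|b|f2": "y", "h|bb|f1": "z",
	}
	fmt.Println(deleteExpired(store, 'h', "b"), len(store))
}
```

Carrying the type in the expiration key lets the deletion routine pick the right mode without first reading the data itself.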
Therefore, the data processing method provided by the embodiment of the application can analyze the data according to the time sequence of the expiration time information to obtain the expiration time; when detecting that the expiration time of the ith data is greater than the current time, stopping analyzing the data, and directly determining the previous i data as expired data to delete; that is to say, only partial data in the database is analyzed, the expired data can be determined, and each data in the database does not need to be analyzed one by one, so that the expired data can be accurately positioned, and the processing efficiency when the expired data is deleted is improved.
Based on the foregoing embodiments, an embodiment of the present application provides a data processing apparatus, as shown in fig. 4, the data processing apparatus includes:
an acquisition unit 41 for acquiring a time management file; the time management file is used for recording expiration time information corresponding to each data in a database, and the expiration time information has time sequence;
the analyzing unit 42 is configured to analyze the ith expiration time information of the ith data based on the time sequence of the expiration time information to obtain an expiration time of the ith data; wherein i is an integer of 1 or more and N or less; N is the total number of data; the 1st data is the data with the earliest expiration time in the database;
and the processing unit 43 is configured to determine, if the expiration time of the ith data is greater than or equal to the current time, the first i data as expired data, and delete the expired data from the database when an expiration deletion condition is met.
In an embodiment provided by the present application, the expiration time information includes: identification information of each data and expiration time of each data.
In the embodiment provided by the present application, the obtaining unit 41 is further configured to obtain target data, and determine identification information of the target data, a data type of the identification information, and an expiration time; the type of the identification information is used for indicating a deleting mode of the target data;
the data processing apparatus further includes an encoding unit, wherein
the encoding unit is configured to encode at least one of the identification information of the target data and the data type of the identification information with an expiration time to obtain expiration time information corresponding to the target data;
the processing unit 43 is further configured to store expiration time information corresponding to the target data in the time management file based on an expiration time of the target data.
In the embodiment provided by the present application, the analyzing unit 42 is configured to analyze expiration time information of each data in the time management file to obtain an expiration time corresponding to each data in the time management file;
the processing unit 43 is further configured to store the target data between the nth data and the mth data based on an expiration time corresponding to the target data and an expiration time corresponding to each data in the time management file; the expiration time of the Nth data is less than the corresponding expiration time of the target data; and the expiration time of the Mth data is greater than the corresponding expiration time of the target data.
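Placing the target data between an entry with a smaller expiration time and one with a larger expiration time is an ordered insertion. A sorted key-value engine such as RocksDB maintains key order itself, so the position follows from the key; the sketch below only illustrates the position logic on a plain slice, and the function name `insertSorted` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// insertSorted inserts t into times (sorted ascending) so that the entry
// before it is not larger and the entry after it is strictly larger.
func insertSorted(times []int64, t int64) []int64 {
	// Binary search for the first element strictly greater than t.
	i := sort.Search(len(times), func(k int) bool { return times[k] > t })
	times = append(times, 0)     // grow by one slot
	copy(times[i+1:], times[i:]) // shift the tail right
	times[i] = t
	return times
}

func main() {
	times := []int64{100, 200, 300}
	fmt.Println(insertSorted(times, 250))
}
```

Keeping the file ordered at write time is what allows the later scan to stop at the first entry that has not yet expired.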
In an embodiment provided by the present application, a storage format of the expiration time information is a key-value pair storage format;
the encoding unit is specifically configured to encode at least one of the identification information of the target data and the data type of the identification information with an expiration time, and use the encoded identification information of the target data and the expiration time as key data of expiration time information of the target data.
In the embodiment provided by the present application, the parsing unit 42 is further configured to start a target thread, and parse expiration time information of the ith data through the target thread to obtain an expiration time of the ith data.
In the embodiment provided by the present application, the processing unit 43 is further configured to determine that an expiration deletion condition is met if a time interval between the current time and the last time of deleting the expired data exceeds a preset interval duration;
or, if the current time is the preset deleting time of the database, determine that the expiration deletion condition is met.
In the embodiment provided in the present application, the obtaining unit 41 is configured to receive deletion configuration information; the deleting configuration information includes: the latest preset deleting moment and/or the latest preset interval duration; the preset deleting moment indicates the moment of deleting the expired data in the database; the preset interval duration represents the interval duration between the moments of deleting the expired data twice;
the processing unit 43 is further configured to update a preset deleting time and/or a preset interval duration based on the configuration information.
In the embodiment provided by the present application, the obtaining unit 41 is further configured to receive deletion configuration information, where the deletion configuration information includes a data amount of the expired data deleted each time;
the processing unit 43 deletes the expired data from the database according to the data amount of each deletion of the expired data.
In an embodiment provided by the present application, the database is a RocksDB database.
The data processing apparatus provided by the embodiment of the application acquires a time management file, where the time management file is used for recording expiration time information corresponding to each data in the database, and the expiration time information has time sequence; then analyzes the ith expiration time information of the ith data based on the time sequence of the expiration time information to obtain the expiration time of the ith data; and, if the expiration time of the ith data is greater than or equal to the current time, determines the previous i data as expired data and deletes the expired data from the database when the expiration deletion condition is met. Therefore, the data can be analyzed according to the time sequence of the expiration time information to obtain the expiration time of the data; when detecting that the expiration time of the ith data is greater than the current time, the apparatus stops analyzing the data and directly determines the previous i data as expired data to delete. That is to say, the expired data can be determined by analyzing only part of the data in the database, and each data in the database does not need to be analyzed one by one, so that the expired data can be accurately located and the processing efficiency when the expired data is deleted is improved.
Based on the implementation of each unit in the data processing apparatus, in order to implement the data processing method provided in the embodiment of the present application, an embodiment of the present application further provides an electronic device, as shown in fig. 5, where the electronic device 50 includes: a processor 51 and a memory 52 configured to store computer programs capable of running on the processor,
wherein the processor 51 is configured to perform the method steps in the previous embodiments when running the computer program.
In practice, of course, the various components of the electronic device 50 are coupled together by a bus system 53, as shown in FIG. 5. It will be appreciated that the bus system 53 is used to enable communications among the components. The bus system 53 includes a power bus, a control bus, and a status signal bus in addition to the data bus. For clarity of illustration, however, the various buses are labeled as bus system 53 in fig. 5.
In an exemplary embodiment, the present application further provides a computer readable storage medium, such as the memory 52 including a computer program shown in fig. 5, where the computer program can be executed by the processor 51 of the electronic device 50 to implement the steps of the foregoing method. The computer-readable storage medium may be a memory such as a magnetic random access memory (FRAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Flash Memory, a magnetic surface memory, an optical disk, or a Compact Disc Read-Only Memory (CD-ROM).
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application.

Claims (12)

1. A method of data processing, the method comprising:
acquiring a time management file; the time management file is used for recording expiration time information corresponding to each data in a database, and the expiration time information has time sequence;
analyzing the ith expiration time information of the ith data based on the time sequence of the expiration time information to obtain the expiration time of the ith data; wherein i is an integer of 1 or more and N or less; N is the total number of data; the 1st data is the data with the earliest expiration time in the database;
if the expiration time of the ith data is greater than or equal to the current time, determining the previous i data as expired data, and deleting the expired data from the database when the expiration deletion condition is met.
2. The method of claim 1, wherein the expiration time information comprises: identification information of each data and an expiration time of each data.
3. The method according to claim 1 or 2, wherein before the acquiring the time management file, the method further comprises:
acquiring target data, and determining identification information of the target data, a data type of the identification information and an expiration time; the type of the identification information is used for indicating a deleting mode of the target data;
at least one of the identification information and the data type of the identification information is coded with the expiration time to obtain expiration time information corresponding to the target data;
and storing expiration time information corresponding to the target data in the time management file based on the expiration time of the target data.
4. The method according to claim 3, wherein the storing expiration time information corresponding to the target data in the time management file based on the expiration time of the target data comprises:
analyzing the expiration time information of each data in the time management file to obtain the expiration time corresponding to each data in the time management file;
storing the target data between the Nth data and the Mth data based on the expiration time corresponding to the target data and the expiration time corresponding to each data in the time management file; the expiration time of the Nth data is less than the corresponding expiration time of the target data; and the expiration time of the Mth data is greater than the corresponding expiration time of the target data.
5. The method of claim 3, wherein the expiration time information is stored in a key-value pair storage format;
the encoding at least one of the identification information and the data type of the identification information and the expiration time to obtain the expiration time information corresponding to the target data includes:
and encoding at least one of the identification information of the target data and the data type of the identification information with an expiration time, and using the encoded identification information and the expiration time of the target data as key data of the expiration time information of the target data.
6. The method according to claim 1 or 2, wherein the analyzing the expiration time information of the ith data to obtain the expiration time of the ith data comprises:
starting a target thread, and analyzing the expiration time information of the ith data through the target thread to obtain the expiration time of the ith data.
7. The method of claim 1 or 2, wherein the determining that a deletion expiration condition is met comprises:
if the time interval between the current time and the last time of deleting the expired data exceeds the preset interval duration, determining that the expiration deletion condition is met;
or,
if the current moment is the preset deleting moment of the database, determining that the expiration deletion condition is met.
8. The method of claim 7, further comprising:
receiving deletion configuration information; the deleting configuration information includes: the latest preset deleting moment and/or the latest preset interval duration; the preset deleting moment indicates the moment of deleting the expired data in the database; the preset interval duration represents the interval duration between the moments of deleting the expired data twice;
and updating the preset deleting moment and/or the preset interval duration based on the configuration information.
9. The method according to claim 1 or 2, characterized in that the method further comprises:
receiving deletion configuration information, wherein the deletion configuration information comprises the data volume of the expired data deleted each time;
the deleting the expired data from the database comprises:
and deleting the expired data from the database according to the data volume of each time of deleting the expired data.
10. A data processing apparatus, characterized in that the apparatus comprises:
an acquisition unit configured to acquire a time management file; the time management file is used for recording expiration time information corresponding to each data in a database, and the expiration time information has time sequence;
the analysis unit is used for analyzing the ith expiration time information of the ith data based on the time sequence of the expiration time information to obtain the expiration time of the ith data; wherein i is an integer of 1 or more and N or less; N is the total number of data; the 1st data is the data with the earliest expiration time in the database;
and the processing unit is used for determining the previous i data as expired data if the expiration time of the ith data is greater than or equal to the current time, and deleting the expired data from the database when the expiration deletion condition is met.
11. An electronic device comprising a processor and a memory for storing a computer program executable on the processor;
wherein the processor is adapted to perform the steps of the data processing method of any one of claims 1 to 9 when running the computer program.
12. A computer-readable storage medium, on which a computer program is stored which is executed by a processor for implementing the data processing method of any one of claims 1 to 9.
CN202010303369.5A 2020-04-17 2020-04-17 Data processing method and device and computer readable storage medium Pending CN111400298A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010303369.5A CN111400298A (en) 2020-04-17 2020-04-17 Data processing method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010303369.5A CN111400298A (en) 2020-04-17 2020-04-17 Data processing method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111400298A true CN111400298A (en) 2020-07-10

Family

ID=71435209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010303369.5A Pending CN111400298A (en) 2020-04-17 2020-04-17 Data processing method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111400298A (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination