CN115237905A - Data management method and device, electronic equipment and computer readable storage medium - Google Patents

Info

Publication number
CN115237905A
Authority
CN
China
Prior art keywords
index
data
original data
sampling
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210583990.0A
Other languages
Chinese (zh)
Inventor
蒋力
唐蠡
曾琳铖曦
吴海英
蒋宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Xiaofei Finance Co Ltd filed Critical Mashang Xiaofei Finance Co Ltd
Priority to CN202210583990.0A priority Critical patent/CN115237905A/en
Publication of CN115237905A publication Critical patent/CN115237905A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations

Abstract

The application discloses a data management method, which includes: determining original data and a first index corresponding to the original data; filtering the original data to obtain sampled data, and determining a second index corresponding to the sampled data; and, in response to the first index and the original data being deleted, determining, based on the first index, a third index associated with the second index, wherein the third index matches the first index, so that the sampled data corresponding to the third index is output when a data query is performed based on the first index. The application also discloses a data management apparatus, an electronic device, and a computer-readable storage medium. In this way, the sampled data fills the gap left by the deleted original data; the operation is simple and fast, and the time range of the retained monitoring data is extended without affecting normal querying and use of the data.

Description

Data management method and device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of data storage, and in particular, to a data management method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the advent of the big data era, monitoring relevant nodes and analyzing the monitored data has become an important means of diagnosing and solving problems. However, monitoring data is generally collected in real time and without interruption, so the amount of retained data is very large. Therefore, once monitoring data has been stored for a certain period without any related operation being performed on it, it is deleted on an ongoing basis to free storage space for new monitoring data.
However, once part of the monitoring data is deleted, it can no longer be viewed, which hampers data query. How to store and query monitoring data has therefore become an active research topic.
Disclosure of Invention
The present application mainly aims to provide a data management method, an apparatus, an electronic device, and a computer-readable storage medium, which can solve the technical problem of expanding the time range of reserved monitoring data without affecting the normal query and use of the data.
The application provides a data management method, which comprises the following steps: determining original data and a first index corresponding to the original data; filtering the original data to obtain sampling data, and determining a second index corresponding to the sampling data; and in response to the first index and the original data being deleted, determining a third index associated with the second index based on the first index, wherein the third index is matched with the first index, so as to output sampling data corresponding to the third index when data query is carried out based on the first index.
The application provides a data management apparatus, which includes: an acquisition unit configured to determine original data and a first index corresponding to the original data; and a data processing unit configured to filter the original data to obtain sampled data, determine a second index corresponding to the sampled data, and, in response to the first index and the original data being deleted, determine, based on the first index, a third index associated with the second index, wherein the third index matches the first index, so that the sampled data corresponding to the third index is output when a data query is performed based on the first index.
The present application provides an electronic device comprising a processor and a computer storage medium coupled to the processor, wherein the computer storage medium stores a computer program, and the processor is configured to execute the computer program to implement the data management method described above.
The present application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program that can be executed by a processor to implement the data management method described above.
The beneficial effects of the application are as follows. Unlike the prior art, the method first determines the original data and then filters and samples it to obtain corresponding sampled data, where the original data corresponds to a first index and the sampled data corresponds to a second index. After the original data and the first index are deleted, the associated third index is added to the second index as an alias. Because the third index matches the first index, both can be reached by the same query operation, which returns the corresponding data. Once the original data is deleted, simply adding an alias to the sampled data lets it stand in for the original data at query time, so that the sampled data fills the gap left by the deleted original data. The operation is simple and fast, does not consume excessive computing resources, extends the time range of the retained monitoring data, and does not affect normal querying and use of the data.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of a data management scheme;
FIG. 2 is a schematic diagram of another data management scheme;
FIG. 3 is a schematic diagram of another embodiment of a data management method provided in the present application;
FIG. 4 is a schematic flow chart diagram illustrating a first embodiment of the data management method of the present application;
FIG. 5 is a schematic flow chart diagram of a second embodiment of the data management method of the present application;
FIG. 6 is a schematic flow chart diagram of a third embodiment of the data management method of the present application;
FIG. 7 is a schematic structural diagram of an embodiment of a data management device according to the present application;
FIG. 8 is a schematic structural diagram of an embodiment of an electronic device according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
The terms "first", "second", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Prior to the introduction of the present application, a brief introduction will be made to the related art.
ES (Elasticsearch) is a Lucene-based search server. It is a distributed, highly scalable, near-real-time search and data analysis engine. It can scale out to hundreds of service nodes and supports PB-level structured or unstructured data, making large volumes of data easy to search, analyze, and explore. Taking full advantage of the horizontal scalability of Elasticsearch makes data more valuable in a production environment. ES locates data through its index. "reindex" is an ES command that copies the documents in one index into another index and is often used for data migration. "alias" is also an ES command; it sets an alias for an index, and ES automatically resolves the alias to the index name when searching, so as to obtain the corresponding data.
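As a rough illustration of the two commands mentioned above, the following sketch builds the JSON request bodies accepted by the Elasticsearch `_reindex` and `_aliases` endpoints. The helper names and the example index names are hypothetical, and no request is actually sent to a cluster.

```python
import json


def reindex_body(source_index: str, dest_index: str) -> dict:
    """Body for POST /_reindex: copy documents from one index into another."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def add_alias_body(index: str, alias: str) -> dict:
    """Body for POST /_aliases: make `alias` resolve to `index`."""
    return {"actions": [{"add": {"index": index, "alias": alias}}]}


# Hypothetical index names, chosen only for illustration.
reindex_json = json.dumps(reindex_body("index_20220303", "sample1h_index_20220303"))
alias_json = json.dumps(add_alias_body("sample1h_index_20220303", "index_20220303_sample1h"))
```

The bodies are kept as plain dictionaries so that they can be handed to whichever HTTP or Elasticsearch client a deployment already uses.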
At present, because storage space is limited, two technical solutions are generally adopted to extend the monitoring time range of the monitoring data as far as possible, that is, to keep monitoring data for as long as possible. One is to sample the monitoring data at the time it is stored and to retain the sampled data separately. The other is to build indexes for the stored monitoring data by period and, when monitoring data is due to be deleted, to process it by retaining the sampled part and deleting the non-sampled part, thereby reducing the amount of stored data.
As shown in FIG. 1, in the first technical solution, the monitoring data must be analyzed as it is acquired to decide whether it should be sampled and retained. This requires changes to the system architecture of the storage mechanism and increases the complexity of the technical architecture. When data is written, the original data is stored under the index of the original data, and the sampled data is stored separately under a sampled index, so that the sampled data is distinguished from the original data and is not affected when the original data is deleted. After the data is written and until the original data is deleted, part of the original data duplicates the sampled data, which increases the storage load of the storage device and wastes storage space.
As shown in FIG. 2, in the second technical solution, an index of the original data is created when the data is written into the storage device, and the indexes are then checked by time to determine which original data needs to be deleted. Once that original data is determined, it is sampled, the data that is not needed is deleted, and the retained sampled data is kept under the index of the original data. Because this scheme operates directly on the original index during deletion, it occupies a large amount of central processing unit (CPU) and input/output (IO) resources, so that stalls, query failures, slow queries, and similar problems may occur while the storage device is deleting monitoring data, which affects users.
To solve the above problems, the present application proposes a data management method. As shown in FIG. 3, in this technical solution the data corresponds to an index of original data after it is written into the storage device. In step 1, the original data is sampled to obtain sampled data, which is stored by means of the reindex command. In step 2, after the original data that has exceeded the retention time is deleted, an alias is added to the corresponding sampled data, so that when the original data is queried by index, the sampled data corresponding to the previously deleted original data is returned as well. Because sampling is performed before the original data is deleted and the alias is added to the sampled data with the alias command after the deletion, duplicated sampled data does not occupy storage space for long, and the alias can be added simply with the alias command, so the query range of the monitoring data is extended without affecting normal querying and use of the data. The implementation is described in the following embodiments.
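The two steps above can be sketched with a small in-memory stand-in for the storage device. The `TinyStore` class, its method names, and the `minute` field are all hypothetical; a real deployment would issue the corresponding reindex, delete, and alias commands against the search engine instead.

```python
class TinyStore:
    """In-memory stand-in for a store with indexes and aliases (hypothetical)."""

    def __init__(self):
        self.indexes = {}   # index name -> list of documents
        self.aliases = {}   # alias name -> index name

    def reindex(self, source, dest, keep):
        # Step 1: copy only the documents that pass the sampling filter.
        self.indexes[dest] = [doc for doc in self.indexes[source] if keep(doc)]

    def delete_index(self, name):
        # Step 2 (first half): drop the original data together with its index.
        del self.indexes[name]

    def add_alias(self, index, alias):
        # Step 2 (second half): expose the sampled index under a matching name.
        self.aliases[alias] = index

    def search(self, prefix):
        # Return the documents of every index or alias whose name starts with prefix.
        hits = []
        for name, docs in self.indexes.items():
            if name.startswith(prefix):
                hits.extend(docs)
        for alias, index in self.aliases.items():
            if alias.startswith(prefix):
                hits.extend(self.indexes[index])
        return hits


def demo():
    store = TinyStore()
    # Three hours of per-minute records stand in for one day of monitoring data.
    store.indexes["index_20220303"] = [{"minute": m} for m in range(180)]
    store.reindex("index_20220303", "sample1h_index_20220303",
                  keep=lambda doc: doc["minute"] % 60 == 0)
    store.delete_index("index_20220303")
    store.add_alias("sample1h_index_20220303", "index_20220303_sample1h")
    return store.search("index_")
```

After the original index is removed, the same prefix query that used to return the 180 original records returns only the sampled records reachable through the alias.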
Referring to fig. 4, fig. 4 is a schematic flowchart of a first embodiment of the data management method of the present application. The data management method can be executed by electronic equipment, and the electronic equipment can be terminal equipment, such as a mobile phone, a computer, intelligent interaction equipment and the like; alternatively, the electronic device may also be a server, such as a standalone physical server, or a cloud server performing cloud computing. The electronic device may be disposed in a storage device. The data management method shown in fig. 4 includes the steps of:
s11: the method comprises the steps of determining original data and a first index corresponding to the original data.
The raw data may be any piece of raw data stored in the storage device, and the raw data refers to the most real monitoring data that has not been subjected to sampling processing. In order to distinguish the original data stored in the storage device, an index is generated for each original data, and as the name suggests, the index has the functions of guiding and indicating.
Assume that the index corresponding to the original data is set as the first index. The first index may be determined by the target class index tag and the corresponding raw data storage time. The target class index may also be an index name prefix. For example, assuming that the target class index is marked as index and the storage time of the original data is 2022 years, 3 months and 2 days, the first index corresponding to the original data can be denoted as index _20220302.
S12: Filter the original data to obtain sampled data, and determine a second index corresponding to the sampled data.
After the original data is determined, if it is data that needs to be sampled, it is filtered to obtain sampled data. A corresponding second index is determined for the sampled data. The second index differs from the first index and additionally includes a specific identifier, which marks the data as sampled data so that the data can be told apart by index when searching. For example, with sample as the specific identifier, if the original data stored on March 2, 2022 is sampled and 1 hour of its data is retained, the second index, combined with the data time, may be written as sample1h_index_20220302.
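The naming rules for the first and second indexes can be sketched as two small helpers. The function names and the `granularity` parameter are assumptions made for illustration; the text only fixes the general pattern of tag, date, and specific identifier.

```python
from datetime import date


def first_index_name(prefix: str, stored_on: date) -> str:
    """Target class index tag plus storage date, e.g. index_20220302."""
    return f"{prefix}_{stored_on:%Y%m%d}"


def second_index_name(prefix: str, stored_on: date,
                      identifier: str = "sample", granularity: str = "1h") -> str:
    """Specific identifier (assumed to carry the granularity) placed before the first index name."""
    return f"{identifier}{granularity}_{first_index_name(prefix, stored_on)}"


original = first_index_name("index", date(2022, 3, 2))
sampled = second_index_name("index", date(2022, 3, 2))
```

Deriving the second name from the first keeps the two tied to the same storage date, which is what later allows an alias to be computed from either one.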
S13: In response to the first index and the original data being deleted, determine, based on the first index, a third index associated with the second index.
In general, the electronic device sets a data retention time for the monitoring data, and data that exceeds the retention time is deleted to free up storage space. When the original data is deleted, the corresponding first index is deleted as well. To extend the monitoring time range of the monitoring data, original data whose storage time is about to exceed the data retention time is sampled in advance, so that the sampled data can take the place of the original data after the latter is deleted. After original data that has been retained too long is deleted, an alias is added to the second index of the corresponding sampled data so that the data of that time period can still be found by a conventional query. The _aliases command can be used here. The command associates the third index with the second index, so that a query on the third index is redirected to the sampled data corresponding to the second index.
The third index includes the target class index tag and the specific identifier that marks sampled data. The third index matches the first index, so that the desired data can be found with the same query. There are many ways to make the third index match the first index, for example by giving both the same target class index tag as a prefix. For example, suppose the first index corresponding to the original data is index_20220303 and the index of the sampled data obtained from it is sample1h_index_20220303. After the original data is deleted, an alias that matches the first index is added to the second index; the third index may then be index_20220303_sample1h. Thus, a query on names beginning with index returns the first index, the third index, and the corresponding data. The target class index tag may also be placed elsewhere in the name, as long as a query that follows the corresponding rule for the first index also finds the third index.
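To make the matching rule concrete, the sketch below derives a third index name from a first index name and checks which names a prefix pattern would select. The helper name and the `index_*` pattern are assumptions; shell-style matching from the Python standard library stands in for the query rule of the search engine.

```python
from fnmatch import fnmatchcase


def third_index_name(first_index: str, identifier: str = "sample",
                     granularity: str = "1h") -> str:
    """Alias that keeps the first index name as its prefix (one possible rule)."""
    return f"{first_index}_{identifier}{granularity}"


alias = third_index_name("index_20220303")

# A live original index, the alias of a deleted one, and the sampled index itself.
names = ["index_20220304", alias, "sample1h_index_20220303"]
matched = [name for name in names if fnmatchcase(name, "index_*")]
```

The sampled index is not selected by the pattern under its own name, so its data becomes visible to the ordinary query only once the alias exists.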
According to the embodiment, the original data are determined firstly, then the original data are filtered and sampled to obtain corresponding sampled data, the original data correspond to the first index, and the sampled data correspond to the second index. And after the corresponding original data and the first index are deleted, adding the associated third index to the second index as an alias. The third index is matched with the first index, so that the first index and the third index can be obtained based on the same query operation, and corresponding data is obtained. When the original data is deleted, the original data can be replaced during data query only by adding the alias to the sampled data, so that the data vacancy of the deleted original data is filled up by the sampled data, the operation is simple, convenience and rapidness are realized, and the problems that a large amount of equipment performance is occupied and the use of a normal data query function is influenced are avoided while the monitoring time range of the monitoring data in the storage device is expanded.
Referring to fig. 5, fig. 5 is a flowchart illustrating a data management method according to a second embodiment of the present application. The data management method can be executed by an electronic device, and the electronic device can be a terminal device, such as a mobile phone, a computer, an intelligent interaction device and the like; alternatively, the electronic device may also be a server, such as a standalone physical server, or a cloud server performing cloud computing. The electronic device may be deployed in a storage system. Which comprises the following steps:
s21: the original data and a first index corresponding to the original data are determined.
The raw data may be any piece of raw data stored in the storage device, and the raw data is the most real monitoring data without being subjected to sampling processing. To distinguish data, a piece of original data corresponds to a unique data index. The first index may be determined by the target class index tag and the corresponding raw data storage time. The target class index may also be an index name prefix. For example, assuming that the target class index is marked as index and the storage time of the original data is 2022 years, 3 months and 2 days, the first index corresponding to the original data can be denoted as index _20220302.
S22: Determine whether the original data satisfies the to-be-deleted condition.
After the original data is determined, it must be sampled before it is deleted so that part of the related data of the corresponding time period can still be queried afterwards. That is, the original data is sampled only once it satisfies the to-be-deleted condition, so original data is not sampled indiscriminately and sampling resources are saved.
Accordingly, after the original data is determined in step S21, it is first determined whether the original data satisfies the to-be-deleted condition. The to-be-deleted condition includes one or more of the following: the storage duration of the original data exceeds a preset first duration, or a deletion command for the original data is received.
The preset first duration is a preset threshold on how long a piece of original data has been stored in the storage device, and it is shorter than the preset retention duration of the original data. For example, if the preset retention duration of the original data is ten days, counted from the moment the original data is stored in the storage device, the preset first duration may be set to nine days. When the storage duration of the original data exceeds the preset first duration, the original data is determined to be original data to be deleted. The original data to be deleted can be sampled during the interval between the preset first duration and the retention duration; this interval is the processing time reserved for the sampling process.
If a deletion command for the original data is received, the deletion may be deferred for the time being to leave processing time for the sampling process, and the original data is deleted after sampling is finished.
Setting a to-be-deleted condition for the original data reserves processing time for sampling it, and placing the sampling step before the deletion avoids both long-term occupation of storage space by the sampled data and the situation in which sampling is still unfinished when the data is deleted.
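The to-be-deleted condition can be expressed as a single predicate. This is a minimal sketch: the function name and parameters are hypothetical, and the nine-day default simply mirrors the example above.

```python
from datetime import datetime, timedelta

# Assumed default taken from the nine-day example in the text.
FIRST_DURATION = timedelta(days=9)


def is_pending_deletion(stored_at: datetime, now: datetime,
                        delete_requested: bool = False,
                        first_duration: timedelta = FIRST_DURATION) -> bool:
    """True when the storage duration exceeds the first duration or a delete command arrived."""
    return delete_requested or (now - stored_at) > first_duration


stored = datetime(2022, 3, 2)
too_young = is_pending_deletion(stored, datetime(2022, 3, 10))      # 8 days stored
old_enough = is_pending_deletion(stored, datetime(2022, 3, 11, 12))  # 9.5 days stored
```

Data for which the predicate holds enters the sampling stage; it is not deleted at this point.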
S23: Determine whether the original data is data that needs sampling processing.
After the to-be-deleted condition is satisfied, it is further determined whether the original data needs to be sampled. If it does not, the original data can simply be deleted later, once its storage duration exceeds its preset retention duration.
The sequence of the determinations of step S22 and step S23 is only schematically illustrated here, and this embodiment does not limit this.
S24: If the original data satisfies the to-be-deleted condition and is data that needs sampling processing, filter the original data to obtain sampled data, and determine a second index corresponding to the sampled data.
After the original data needing sampling is determined, the original data is filtered to obtain sampling data, and a second index is determined for the sampling data. The second index includes a target class index tag and a specific identification. The specific identifier is used to indicate the sampled data.
Regardless of whether the original data needs to be sampled or not, the electronic device regularly performs data deletion work to ensure that a part of free storage space can be used for storing new monitoring data.
S25: Acquire a plurality of indexes that include the target class index tag, and traverse the acquired indexes.
The first index includes the target class index tag, and the second index and the third index include the target class index tag and the specific identifier. An index can be looked up by its tag or identifier, so the first, second, and third indexes can all be found through the target class index tag. To find the original data that needs to be deleted, these indexes must be told apart.
S26: Determine whether the index includes the specific identifier.
Next, it is determined whether any of the indexes that include the target class index tag also include the specific identifier. Such indexes correspond to sampled data and must not be subjected to the subsequent deletion operation.
If the currently traversed index includes the specific identifier, it is determined to be a second index. If the storage device has not yet used any sampled data to fill the gap left by deleted original data, an index that includes the specific identifier is a second index; it corresponds to processed sampled data and must not be deleted.
Further, if the storage device has already filled the gap left by deleted original data with sampled data, the indexes for which this determination is yes are second indexes and the third indexes associated with them; they correspond to processed sampled data and likewise must not be deleted.
If the currently traversed index does not include the specific identifier, it is determined to be a first index; the first index corresponds to original data, and the original data that meets the conditions is deleted in the subsequent steps.
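The traversal in steps S25 and S26 amounts to partitioning index names by whether they carry the specific identifier. The sketch below assumes plain substring checks and the tag and identifier values used in the earlier examples; the function name is hypothetical.

```python
def classify_indexes(names, target_tag="index", identifier="sample"):
    """Split names carrying the target tag into original and sampled groups."""
    originals, sampled = [], []
    for name in names:
        if target_tag not in name:
            continue  # not part of the target class, ignore it
        if identifier in name:
            sampled.append(name)    # second or third index: must not be deleted
        else:
            originals.append(name)  # first index: candidate for deletion
    return originals, sampled


names = [
    "index_20220301",
    "sample1h_index_20220228",
    "index_20220228_sample1h",
    "logs_20220301",
]
originals, sampled = classify_indexes(names)
```

Only the names in `originals` go on to the storage-duration check of the next step.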
S27: Determine whether the storage duration of the original data exceeds a preset second duration.
The preset second duration is the threshold used to decide whether to delete the original data. The preset second duration may be a preset period running from when the original data is determined to be deleted until it is actually deleted, and it is longer than the preset first duration. For example, if the preset retention duration of the original data is ten days, the preset second duration is set to ten days. The preset first duration is shorter than the preset second duration so as to leave time for the data sampling process. When the storage duration of the original data exceeds the preset second duration, the data is deleted.
S28: Delete the original data.
The original data is deleted, and its first index is deleted with it. After the deletion, a third index is added so that the sampled data takes the place of the original data.
In another embodiment, if a deletion command for the original data is received, the original data is determined to need sampling, and once it has been sampled it is deleted directly, without determining whether its storage duration exceeds the preset second duration.
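The two deletion paths described in S27 and S28 can be combined in one decision function. This is a sketch under stated assumptions: the function name and parameters are hypothetical, and the ten-day default follows the example in the text.

```python
from datetime import timedelta

# Assumed default taken from the ten-day example in the text.
SECOND_DURATION = timedelta(days=10)


def should_delete_now(age: timedelta, sampling_done: bool,
                      delete_requested: bool = False,
                      second_duration: timedelta = SECOND_DURATION) -> bool:
    """Decide whether the original data may be deleted at this moment."""
    if delete_requested:
        # Explicit delete command: wait only for sampling, skip the age check.
        return sampling_done
    # Timed deletion: delete once the storage duration exceeds the second duration.
    return age > second_duration
```

For instance, data stored for two days is deleted immediately after sampling when a delete command is pending, whereas without such a command it is kept until it has been stored for more than ten days.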
S29: Determine, based on the first index, a third index associated with the second index.
After the original data whose sampling has been completed is deleted, a third index serving as an alias is added to the sampled data corresponding to the second index, so that the data of the corresponding time period can still be found in subsequent queries for the original data. The third index matches the first index, so that it can be found with the same query. There are many ways to match, for example by prefixing both with the same target class index tag index. For example, suppose the first index corresponding to the original data is index_20220303 and the index of the sampled data obtained from it is sample1h_index_20220303. After the original data is deleted, an alias that matches the first index is added to the second index; the third index may then be index_20220303_sample1h. Thus, a query on names beginning with index returns the first index and the third index together with the corresponding data. The target class index tag may also be placed elsewhere in the name, as long as the third index is returned whenever the first index is queried according to the corresponding rule.
Because the alias is added directly, a query for the original data corresponding to the first index also returns the sampled data, held under the second index, of the deleted original data: the third index serving as the alias is associated with the second index, so a query on the third index is redirected to the data of the second index. Adding an alias extends the monitoring time range visible to queries without changing the original indexing scheme or renaming the sampled data when the original data is deleted; the original data is simply deleted. The operation is simple and fast, does not occupy a large amount of computing resources, and does not affect normal querying and use of the data.
Referring to fig. 6, fig. 6 is a schematic flowchart of a data management method according to a third embodiment of the present application. This embodiment further elaborates the process of sampling the original data. The data management method can be executed by an electronic device, which may be a terminal device such as a mobile phone, a computer, or an intelligent interaction device; alternatively, the electronic device may be a server, such as a standalone physical server or a cloud server performing cloud computing. The electronic device may be deployed in a storage system. The method comprises the following steps:
S31: Determine an execution condition of a sampling algorithm for filtering the original data, and obtain a sampling amount according to the execution condition.
The original data may be filtered using a down-sampling algorithm; for example, the monitoring data of one day may be sampled at the 24 on-the-hour time points. The sampling time, data size, and so on may also be defined. For example, when sampling at the 24 on-the-hour time points with the sampling time set to one minute, and assuming that three million records are produced per minute, the amount of sampled data for the day is seventy-two million records.
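The arithmetic behind the sampling amount can be checked with a short sketch. The function name `sampling_amount` and its parameters are illustrative assumptions; the figures are those of the example above (24 sampled time points, one minute each, three million records per minute).

```python
def sampling_amount(points_per_day: int, records_per_minute: int,
                    minutes_per_point: int = 1) -> int:
    """Number of records selected per day by the down-sampling rule."""
    return points_per_day * minutes_per_point * records_per_minute


# 24 on-the-hour points, one minute each, three million records per minute.
amount = sampling_amount(24, 3_000_000)
```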
S32: Set the sampling rate based on the sampling amount and the system operating condition.
After the amount of sampled data is determined, the sampling rate is set according to the system operating condition. The system operating condition may be the operating condition of the storage system while it samples the monitoring data, and may also include other operating conditions of the storage system while the storage device is in use, such as the data input and output rate. The sampling rate covers both the rate at which data is queried and the rate at which sampled data is written. If the sampling amount is seventy-two million records and sampling is set to one thousand query-and-write operations per second, all of the sampled data and the corresponding indexes can be written in about twenty hours. If the preset second time length and the preset first time length of the original data differ by one day, it is sufficient to complete the sampling of the original data within that day, before the deletion. After the original data is deleted on the following day, the sampled data can take its place. One thousand query-and-write operations per second has little impact on the storage system as a whole, so normal data queries remain available.
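As a rough check of the timing, the sketch below computes how long the sampling takes at a given rate and the smallest rate that fits a given window. It uses the example figures of 72 million records (24 sampled minutes at three million records per minute) and 1,000 records per second; the helper names are hypothetical.

```python
import math


def hours_to_finish(sample_amount: int, records_per_second: int) -> float:
    """Hours needed to query and write all sampled records at a fixed rate."""
    if records_per_second <= 0:
        raise ValueError("records_per_second must be positive")
    return sample_amount / records_per_second / 3600


def min_rate_for_window(sample_amount: int, window_hours: float) -> int:
    """Smallest whole records-per-second rate that finishes within the window."""
    return math.ceil(sample_amount / (window_hours * 3600))


hours = hours_to_finish(72_000_000, 1000)         # 72,000 s, i.e. 20 hours
slowest = min_rate_for_window(72_000_000, 24)     # rate needed to finish in one day
```

With a one-day gap between the first and second preset time lengths, any rate at or above `slowest` completes the sampling before the original data is deleted.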
Further, different sampling write rates may be set for different usage periods of the storage device. For example, if the storage device is lightly used from 00:00 to 03:00 and has spare performance, but is heavily used from 09:00 to 12:00 with most of its performance occupied, a larger sampling write rate can be set from 00:00 to 03:00 and a smaller one from 09:00 to 12:00. This ensures normal data storage and use of the storage device while making the fullest use of the equipment's performance, so that the data sampling work is completed as soon as possible.
By determining the amount of sampled data and taking into account the performance and actual operating condition of the storage device, the sampling rate is arranged reasonably, so that the data sampling process can be completed as soon as possible while normal data storage and use of the storage device is ensured.
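A time-of-day rate schedule of this kind could be sketched as follows. The two periods come from the example above; the concrete rates of 5,000 and 500 records per second, the choice of 1,000 as the default, and the names `RATE_SCHEDULE` and `write_rate_at` are assumptions made for illustration.

```python
from datetime import time

# (period start, period end, records per second); the rates are illustrative values.
RATE_SCHEDULE = [
    (time(0, 0), time(3, 0), 5000),   # light usage: sample faster
    (time(9, 0), time(12, 0), 500),   # heavy usage: sample slower
]
DEFAULT_RATE = 1000                   # assumed rate outside the listed periods


def write_rate_at(t: time) -> int:
    """Return the sampling write rate that applies at time-of-day t."""
    for start, end, rate in RATE_SCHEDULE:
        if start <= t < end:          # periods are half-open: [start, end)
            return rate
    return DEFAULT_RATE
```

For example, `write_rate_at(time(1, 30))` falls in the light-usage period, while `write_rate_at(time(10, 0))` falls in the heavy-usage period.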
S33: Filter the original data based on the sampling rate to obtain the sampled data.
The original data that is to be deleted and needs sampling is processed at the set sampling rate. The sampling rate is typically set so that sampling can be completed within the interval given by the difference between the preset first time length and the preset second time length. In this way, redundant sampled data does not occupy storage space for most of the time, and sampling is not left unfinished when the data that needs to be sampled is deleted.
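The filtering step itself might look like the sketch below, which keeps only the records that fall in the first minute of each hour. The text does not say which minute of the hour is sampled, so that choice, the `(timestamp, payload)` record layout, and the function name are all assumptions.

```python
from datetime import datetime


def downsample_hourly(records, window_minutes: int = 1):
    """Keep records whose timestamp lies in the first window_minutes of an hour.

    Each record is a (timestamp, payload) tuple.
    """
    return [r for r in records if r[0].minute < window_minutes]


records = [
    (datetime(2022, 3, 3, 0, 0, 15), "cpu=0.41"),  # kept: minute 0 of hour 0
    (datetime(2022, 3, 3, 0, 30, 0), "cpu=0.57"),  # dropped
    (datetime(2022, 3, 3, 1, 0, 59), "cpu=0.44"),  # kept: minute 0 of hour 1
    (datetime(2022, 3, 3, 1, 1, 0), "cpu=0.90"),   # dropped
]
sampled = downsample_hourly(records)
```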
Fig. 7 is a schematic structural diagram of a data management device according to an embodiment of the present application.
The data management apparatus includes an acquisition unit 110 and a data processing unit 120.
The acquisition unit 110 is configured to query data and to determine the original data and the first index corresponding to the original data.
The data processing unit 120 is configured to perform the sampling step: it filters the original data to obtain the sampled data and determines the second index of the sampled data. When the first index and the corresponding original data are deleted, it determines, based on the first index, a third index associated with the second index, where the third index matches the first index, so that the sampled data corresponding to the third index is output when a data query is performed based on the first index.
In an embodiment, when filtering the original data to obtain the sampled data, the data processing unit 120 is further configured to determine whether the original data meets a condition to be deleted, where the condition to be deleted includes one or more of the following: the storage time length of the original data exceeds a preset first time length, or a deletion command for the original data has been received. The data processing unit 120 also determines whether the original data is data that needs to be sampled, and filters the original data to obtain the sampled data if the original data meets the condition to be deleted and needs to be sampled.
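The decision described here can be written as a small function. This is a sketch only: `RawIndexState` and `next_action` are hypothetical names, and the "delete" branch for data that does not need sampling follows claim 3 of this application.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RawIndexState:
    age: timedelta            # how long the original data has been stored
    delete_requested: bool    # whether a deletion command was received
    needs_sampling: bool      # whether this data should be down-sampled


def next_action(state: RawIndexState, first_duration: timedelta) -> str:
    """Return 'keep', 'sample', or 'delete' for one block of original data."""
    to_be_deleted = state.age > first_duration or state.delete_requested
    if not to_be_deleted:
        return "keep"
    # Data that meets the condition but needs no sampling is deleted directly.
    return "sample" if state.needs_sampling else "delete"
```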
In an embodiment, the data processing unit 120 is further configured to delete the first index and the original data corresponding to the first index before determining, based on the first index, the third index associated with the second index. Deleting the first index and the original data corresponding to the first index comprises: acquiring a plurality of indexes that include the target class index tag and traversing the acquired indexes in sequence; if the currently traversed index includes the specific identifier, determining that it is a second index and performing no deletion; and if the currently traversed index does not include the specific identifier, determining that it is a first index, and deleting the first index and the original data corresponding to the first index once the storage time length of that original data exceeds a preset second time length.
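The traversal can be sketched as below. The tag `index_` and the identifier `sample1h` are taken from the naming example earlier in the description; representing the indexes as a dictionary from name to storage age, and the function name `purge_expired`, are assumptions for illustration.

```python
from datetime import timedelta

TARGET_TAG = "index_"     # target class index tag (from the naming example)
SAMPLE_MARK = "sample1h"  # specific identifier that marks sampled data


def purge_expired(index_ages: dict, second_duration: timedelta) -> list:
    """Delete expired first indexes in place and return their names."""
    deleted = []
    for name in sorted(index_ages):
        if TARGET_TAG not in name:
            continue                  # not part of the target class
        if SAMPLE_MARK in name:
            continue                  # second index: sampled data, never deleted here
        if index_ages[name] > second_duration:
            deleted.append(name)      # first index past the second time length
    for name in deleted:
        del index_ages[name]
    return deleted


ages = {
    "index_20220301": timedelta(days=9),
    "index_20220305": timedelta(days=5),
    "sample1h_index_20220301": timedelta(days=9),
    "other_20220301": timedelta(days=30),
}
removed = purge_expired(ages, second_duration=timedelta(days=8))
```

Only the expired first index is removed; the sampled index of the same age survives because it carries the specific identifier.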
In an embodiment, the data processing unit 120 is further configured to, in the process of filtering the raw data to obtain the sampled data, determine an execution condition of a sampling algorithm for filtering the raw data, and obtain a sampling amount according to the execution condition; setting a sampling rate based on the sampling amount and the system operation condition; and filtering the original data based on the sampling rate to obtain sampled data.
In the above embodiment, the first index includes a target class index tag, and the second index and the third index each include a target class index tag and a specific identifier. The specific identifier is used to indicate the sampled data. The preset second time length is greater than the preset first time length. In the embodiments of the data management device, the same or corresponding specific technical features and steps of the method used may refer to the description of the above embodiments, and are not described herein again. The data processing unit 120 is capable of implementing any of the embodiments of the data management methods described above and possible combinations.
The data management device can implement the data management method described above. The original data is determined first and is then filtered and sampled to obtain the corresponding sampled data; the original data corresponds to the first index and the sampled data corresponds to the second index. After the original data and the first index are deleted, the associated third index is added to the second index as an alias. The third index matches the first index, so the first index and the third index, and hence the corresponding data, can be obtained by the same query operation. When the original data is deleted, adding the alias to the sampled data is all that is needed for the sampled data to stand in for the original data at query time, so the gap left by the deleted original data is filled by the sampled data. The operation is simple and fast, and it extends the monitoring time range of the monitoring data without occupying a large amount of computing capacity or affecting normal data queries.
As shown in fig. 8, fig. 8 is a schematic structural diagram of an embodiment of an electronic device according to the present application.
The electronic device includes a processor 210, a computer storage medium 220.
The processor 210 controls the operation of the electronic device and may also be referred to as a Central Processing Unit (CPU). The processor 210 may be an integrated circuit chip with signal processing capability. The processor 210 may also be a general purpose processor, a digital signal processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The computer storage medium 220 stores instructions and computer programs needed for the processor 210 to operate.
The processor 210 is configured to execute the instructions to implement the methods provided by any of the above embodiments and possible combinations of the aforementioned data management methods of the present application.
The present application further provides a computer-readable storage medium that stores a computer program. When executed, the computer program implements the method provided by any one of the above embodiments of the data management method of the present application, or any possible combination thereof.
The storage medium may include a medium capable of storing program instructions, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk. It may also be a server that stores the program instructions; the server may send the stored program instructions to other devices for execution, or may execute them itself.
In summary, according to the present application, the original data is determined first and is then filtered and sampled to obtain the corresponding sampled data, where the original data corresponds to the first index and the sampled data corresponds to the second index. After the original data and the first index are deleted, the associated third index is added to the second index as an alias. The third index matches the first index, so the first index and the third index, and hence the corresponding data, can be obtained by the same query operation. When the original data is deleted, adding the alias to the sampled data is all that is needed for the sampled data to stand in for the original data at query time, so the gap left by the deleted original data is filled by the sampled data. The operation is simple and fast, and it extends the monitoring time range of the monitoring data without occupying a large amount of computing capacity or affecting normal data queries. Furthermore, the data sampling process is scheduled shortly before the original data is deleted, which avoids the sampled data occupying storage space for a long time. Compared with conventional technical solutions for extending the monitoring time range of monitoring data, this addresses the repeated storage of the same data and reduces the storage pressure on the storage device.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the device embodiments described above are merely illustrative: the division into modules or units is only a logical division, and other divisions are possible in practice. For instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit described above is implemented in the form of a software functional unit and sold or used as a separate product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only an example of the present application and is not intended to limit its scope. All equivalent structures or equivalent processes made using the contents of the specification and drawings of the present application, or applied directly or indirectly in other related technical fields, are likewise included in the scope of the present application.

Claims (10)

1. A method for managing data, the method comprising:
determining original data and a first index corresponding to the original data;
filtering the original data to obtain sampling data, and determining a second index corresponding to the sampling data;
in response to the first index and the original data being deleted, determining a third index associated with the second index based on the first index, the third index matching the first index, so as to output the sample data corresponding to the third index when performing data query based on the first index.
2. The method of claim 1, wherein the filtering the raw data to obtain sampled data comprises:
judging whether the original data meet a condition to be deleted; the conditions to be deleted include any one or more of the following conditions:
the storage time length of the original data exceeds a preset first time length;
or receiving a deleting command for the original data;
and if the original data meet the condition to be deleted, executing the step of filtering the original data to obtain the sampling data.
3. The method of claim 2, further comprising:
judging whether the original data is data needing sampling processing or not;
if so, filtering the original data to obtain the sampling data;
and if not, deleting the original data and the first index corresponding to the original data.
4. The method of claim 1, wherein the first index comprises a target class index tag, and wherein the second index and the third index each comprise the target class index tag and a specific identifier;
and the data corresponding to an index comprising the specific identifier is sampled data.
5. The method of claim 4, wherein prior to the determining a third index associated with the second index based on the first index, the method further comprises:
deleting the first index and the original data corresponding to the first index;
the deleting the first index and the original data corresponding to the first index includes:
acquiring a plurality of indexes including the target class index tag, and traversing the acquired indexes in sequence;
if the currently traversed index comprises the specific identifier, determining the currently traversed index as the second index, and not executing deletion operation;
if the index of the current traversal does not include the specific identifier, determining that the index of the current traversal is the first index, and deleting the first index and the original data corresponding to the first index.
6. The method of claim 5, wherein the deleting the first index and the original data corresponding to the first index comprises:
judging whether the storage time length of the original data exceeds a preset second time length, wherein the preset second time length is greater than the preset first time length;
and if so, deleting the original data and the corresponding first index.
7. The method of claim 1, wherein the filtering the raw data to obtain sampled data comprises:
determining an execution condition of a sampling algorithm for filtering the original data, and obtaining a sampling amount according to the execution condition;
setting a sampling rate based on the sampling amount and the system operating condition;
and filtering the original data based on the sampling rate to obtain the sampling data.
8. A data management apparatus, characterized in that the data management apparatus comprises:
an acquisition unit, configured to determine original data and a first index corresponding to the original data;
the data processing unit is used for filtering the original data to obtain sampling data, determining a second index corresponding to the sampling data, and in response to the first index and the original data being deleted, determining a third index associated with the second index based on the first index, wherein the third index is matched with the first index, so that when data query is performed based on the first index, the sampling data corresponding to the third index is output.
9. An electronic device, comprising a processor and a computer storage medium coupled to the processor, wherein a computer program is stored in the computer storage medium, and wherein the processor is configured to execute the computer program to implement the method according to any of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
CN202210583990.0A 2022-05-25 2022-05-25 Data management method and device, electronic equipment and computer readable storage medium Pending CN115237905A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210583990.0A CN115237905A (en) 2022-05-25 2022-05-25 Data management method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN115237905A true CN115237905A (en) 2022-10-25

Family

ID=83667862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210583990.0A Pending CN115237905A (en) 2022-05-25 2022-05-25 Data management method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115237905A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination