CN110321348B - Data processing method and device and computer equipment - Google Patents


Publication number
CN110321348B
Authority
CN
China
Prior art keywords
data
target storage
instance
determining
cooling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910480279.0A
Other languages
Chinese (zh)
Other versions
CN110321348A (en)
Inventor
王懂道
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910480279.0A
Publication of CN110321348A
Application granted
Publication of CN110321348B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/23 Updating
    • G06F16/2358 Change logging, detection, and notification
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a data processing method, a data processing device, and computer equipment. The data processing method comprises the following steps: acquiring a target storage instance; determining portrait information of the data records in the target storage instance; determining feature data corresponding to the target storage instance based on the portrait information of the data records in the target storage instance, the feature data characterizing the data cooling suitability of the target storage instance; and inputting the feature data into a cooling detection model to perform cooling detection, obtaining a detection result indicating whether the target storage instance is suitable for data cooling. The invention thereby judges comprehensively and accurately whether a storage instance can undergo data cooling, so that subsequent data cooling based on the judgment result can significantly reduce storage cost while ensuring data access quality.

Description

Data processing method and device and computer equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data processing method, apparatus, and computer device.
Background
With the development of internet technology, service data is growing rapidly, and the volume of data stored and accessed is huge, so the data in a database must be managed effectively. For example, some data in the database is accessed only occasionally rather than frequently. If such data is stored mixed together with the data that computing nodes need to access frequently, storage space is wasted, storage cost rises, and data query efficiency drops sharply.
In the related art, an effective way to manage data in a database is to separate cold data, which is accessed only occasionally, from hot data, which computing nodes need to access frequently, and to store the two in different storage media. The process of transferring data from a high-performance, expensive storage medium to a low-performance, inexpensive storage medium is known as data cooling. Effective data cooling can reduce storage cost while preserving data query efficiency.
Therefore, a reliable technical solution is needed to accurately determine whether the data in the database is suitable for data cooling.
Disclosure of Invention
In order to solve the problems in the prior art, the embodiments of the invention provide a data processing method, a data processing apparatus, and computer equipment. The technical solution is as follows:
In one aspect, a data processing method is provided, the method including:
acquiring a target storage instance;
determining portrait information of the data records in the target storage instance;
determining feature data corresponding to the target storage instance based on the portrait information of the data records in the target storage instance; the feature data characterizes the data cooling suitability of the target storage instance;
and inputting the feature data into a cooling detection model to perform cooling detection, obtaining a detection result indicating whether the target storage instance is suitable for data cooling.
In another aspect, there is provided a data processing apparatus, the apparatus comprising:
the first acquisition module is used for acquiring a target storage instance;
the first determining module is used for determining the portrait information of the data record in the target storage instance;
the second determining module is used for determining the characteristic data corresponding to the target storage instance based on the portrait information of the data record in the target storage instance; the characteristic data characterizes the data cooling suitability of the target storage instance;
and the cooling detection module is used for inputting the characteristic data into a cooling detection model to carry out cooling detection so as to obtain a detection result of whether the target storage instance is suitable for data cooling.
Optionally, the feature data determined by the second determining module includes at least one of:
the non-accessed data proportion, the non-accessed data heat change rate, the data cooling benefit, the data expiration proportion, and the instance access density.
Optionally, the first determining module includes:
the third determining module is used for determining the last access time of the data record according to the read-write time stamp information corresponding to the data record;
a fourth determining module, configured to determine a data size of the data record and an expiration time of the data record;
and a fifth determining module, configured to use the last access time, the data size and the expiration time as portrait information of the data record.
Optionally, the second determining module includes:
a first time difference value determining module, configured to determine a first time difference value between a last access time of the data record in the target storage instance and a current time;
the first judging module is used for judging whether the first time difference value meets a preset time condition or not;
the non-accessed data determining module is used for determining the data record as non-accessed data when the first time difference value meets the preset time condition;
a first number determination module configured to determine a first number of the non-accessed data in the target storage instance and a total number of data records in the target storage instance;
a first calculation module, configured to divide the first number by the total number of data records in the target storage instance to obtain the non-accessed data proportion;
and a sixth determining module, configured to use the ratio of the non-accessed data as feature data corresponding to the target storage instance.
Optionally, the second determining module includes:
the second acquisition module is used for acquiring the non-accessed data proportions within a preset historical time period;
a seventh determining module, configured to determine a maximum non-accessed data proportion and a minimum non-accessed data proportion among the non-accessed data proportions;
the second calculation module is used for determining the difference value between the maximum non-accessed data proportion and the minimum non-accessed data proportion to obtain the non-accessed data heat change rate;
and an eighth determining module, configured to take the non-accessed data heat change rate as feature data of the target storage instance.
Optionally, the second determining module includes:
a ninth determining module, configured to determine, according to a data size of each data record in the target storage instance, a total data capacity corresponding to the target storage instance;
a total saved value determination module, configured to determine a total saved value of a hot data storage medium based on the total data capacity, the non-accessed data proportion, a total storage amount of the hot data storage medium group, and a device cost of the hot data storage medium group;
a total increased value determination module, configured to determine a total increased value of a cold data storage medium according to the transparent access amount of the target storage instance, the upper support limit of the cold data storage medium group, the total data capacity, the non-accessed data proportion, and the device cost of the cold data storage medium group;
a third calculation module, configured to determine a difference between the total saved value of the hot data storage medium and the total increased value of the cold data storage medium, to obtain the data cooling benefit;
and a tenth determining module, configured to take the data cooling benefit as characteristic data of the target storage instance.
Optionally, the second determining module includes:
a second time difference value determining module, configured to determine a second time difference value between an expiration time of the data record and a current time in the target storage instance;
the second judging module is used for judging whether the second time difference value is smaller than a preset time threshold value or not;
the expiration data determining module is used for determining that the data record is expiration data when the second time difference value is smaller than a preset time threshold value;
a second number determination module for determining a second number of the expiration data in the target storage instance and a total number of data records in the target storage instance;
a fourth calculation module, configured to divide the second number by the total number of data records in the target storage instance to obtain the data expiration proportion;
and an eleventh determining module, configured to use the data expiration proportion as feature data corresponding to the target storage instance.
Optionally, the second determining module includes:
a twelfth determining module, configured to determine, according to a data size of each data record in the target storage instance, a total data capacity corresponding to the target storage instance;
a per-second access amount determining module, configured to determine the access amount per second corresponding to the target storage instance according to a third number, namely the number of data records in the target storage instance accessed in unit time;
a fifth calculation module, configured to divide the access amount per second by the total data capacity to obtain an instance access density;
and a thirteenth determining module, configured to use the instance access density as characteristic data of the target storage instance.
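The data expiration proportion and the instance access density described by the modules above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the use of epoch seconds, the treatment of records with no expiration time (`None`), and the way the per-second access amount is derived from an access count over a window are all assumptions made for the example.

```python
from typing import Iterable, Optional


def data_expiration_proportion(
    expiration_times: Iterable[Optional[float]],
    now: float,
    threshold_seconds: float,
) -> float:
    """Share of records whose expiration time is closer than the threshold.

    expiration_times: epoch seconds per record; None means "never expires"
    (an assumption; the source does not discuss such records).
    """
    total = 0
    second_number = 0  # count of expiring records
    for expiration in expiration_times:
        total += 1
        # Second time difference: expiration time minus current time.
        if expiration is not None and expiration - now < threshold_seconds:
            second_number += 1
    return second_number / total if total else 0.0


def instance_access_density(
    accessed_count: int,
    window_seconds: float,
    data_sizes: Iterable[int],
) -> float:
    """Accesses per second divided by the total data capacity (in bytes)."""
    total_capacity = sum(data_sizes)
    if window_seconds <= 0 or total_capacity <= 0:
        raise ValueError("window and total capacity must be positive")
    accesses_per_second = accessed_count / window_seconds
    return accesses_per_second / total_capacity
```

A record is counted as expiring when its expiration time minus the current time is below the threshold, which also counts records that have already expired but not yet been deleted; whether that matches the intended behaviour is not stated in the source.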
Optionally, the apparatus further includes:
the third acquisition module is used for acquiring a sample storage instance;
the extraction module is used for extracting sample characteristic data corresponding to the sample storage instance;
the label determining module is used for determining labels corresponding to the sample storage instances according to the sample feature data and the corresponding feature data thresholds; wherein the label corresponding to a positive sample storage instance is "suitable for data cooling", and the label corresponding to a negative sample storage instance is "unsuitable for data cooling";
the training module is used for carrying out data cooling detection training with a preset machine learning model based on the sample feature data and the corresponding labels, and adjusting the model parameters of the preset machine learning model during the training until the label output by the preset machine learning model matches the input label; and taking the machine learning model corresponding to the current model parameters as the cooling detection model.
In another aspect, a computer device is provided that includes a processor and a memory having at least one instruction, at least one program, a set of codes, or a set of instructions stored therein, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the data processing method described above.
In another aspect, a computer readable storage medium having stored therein at least one instruction, at least one program, code set, or instruction set loaded and executed by a processor to implement a data processing method as described above is provided.
According to the embodiments of the invention, a target storage instance is acquired and the portrait information of the data records in the target storage instance is determined; based on that portrait information, feature data corresponding to the target storage instance and characterizing its data cooling suitability is determined; and the feature data is input into the cooling detection model for cooling detection, yielding a detection result indicating whether the target storage instance is suitable for data cooling. Whether a storage instance in the database can undergo data cooling is thus judged comprehensively and accurately, so that subsequent data cooling based on the judgment result can significantly reduce storage cost while ensuring data access quality.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for determining feature data corresponding to the target storage instance based on portrait information of a data record in the target storage instance according to an embodiment of the present invention;
FIG. 3A is a schematic diagram of a preset time condition according to an embodiment of the present invention;
FIG. 3B is another schematic diagram of a preset time condition provided by an embodiment of the present invention;
FIG. 4 is a flowchart of another method for determining feature data corresponding to the target storage instance based on the portrait information of the data record in the target storage instance according to an embodiment of the present invention;
FIG. 5 is a flowchart of another method for determining feature data corresponding to the target storage instance based on the portrait information of the data record in the target storage instance according to an embodiment of the present invention;
FIG. 6 is a flowchart of another method for determining feature data corresponding to the target storage instance based on the portrait information of the data record in the target storage instance according to an embodiment of the present invention;
FIG. 7 is a flowchart of another method for determining feature data corresponding to the target storage instance based on the portrait information of the data record in the target storage instance according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The data processing method provided by the embodiments of the invention can be applied to a distributed database system, which may be a relational database system such as MySQL or Oracle, or a non-relational database such as Redis, Memcached, or MongoDB.
Referring to FIG. 1, a flowchart of a data processing method according to an embodiment of the present invention is shown. It should be noted that this specification provides the method steps as described in the embodiments or flowcharts, but more or fewer steps may be included based on conventional or non-inventive effort. The order of steps recited in the embodiments is only one of many possible execution orders and does not represent the only one. In actual system or product execution, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel processor or multi-threaded processing environment). As shown in FIG. 1, the method may include:
S101, acquiring a target storage instance.
In the embodiment of the present specification, the target storage instance may be a data table of related services stored in one storage node of the distributed database system, where the data table includes a plurality of data records, and each data record may include a plurality of fields. In practical application, the distributed database system also stores a read-write time stamp of each data record, a data size of the data record and an expiration time. The read-write time stamp comprises a read time stamp and a write time stamp; the expiration time of a data record may be understood as a set lifetime of the data record, which may be automatically deleted when this set lifetime is reached. Specifically, fields can be respectively allocated for the read-write time stamp, the data size and the expiration time in a plurality of fields of the data record to carry out corresponding records.
In practical applications, the storage instance of the relevant service in one storage node of the distributed database system may include tens of millions of data records. To improve the efficiency of data processing, in this embodiment a preset proportion of data records may be extracted from the storage instance of the relevant service, and the extracted data records may be used as the target storage instance. The preset proportion can be set according to actual requirements. For example, with a preset proportion of 1%, a storage instance of 10 million data records yields a target storage instance of only 100,000 data records. It will be appreciated that the extracted data records should be representative of the storage instance as a whole, so a corresponding extraction rule may be set, for example random extraction.
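The extraction of a preset proportion of data records can be sketched as below. This is only an illustrative sketch using uniform random extraction, one of the rules the text mentions as an example; the function name, the `seed` parameter, and the choice to keep at least one record are assumptions, not part of the described method.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_target_instance(
    records: Sequence[T],
    proportion: float = 0.01,
    seed: Optional[int] = None,
) -> List[T]:
    """Randomly extract a preset proportion of records as the target instance."""
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in (0, 1]")
    if not records:
        return []
    # Keep at least one record so that a small instance still yields a sample.
    k = max(1, round(len(records) * proportion))
    rng = random.Random(seed)  # a fixed seed makes the extraction repeatable
    return rng.sample(records, k)
```

Sampling without replacement keeps each extracted record distinct, so counts computed on the sample (for example, the number of non-accessed records) scale back to the full storage instance by the same proportion.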
S103, determining the portrait information of the data records in the target storage instance.
In the present description, the portrait information of a data record may be used to describe the characteristics of the data record. Specifically, determining the portrait information of the data record in the target storage instance may include the following steps:
Determining the last access time of the data record according to the read-write timestamp information corresponding to the data record. Specifically, the read operation and the write operation closest to the current time can be identified from the read timestamps and write timestamps of the data record; the timestamp of that read operation is compared with the timestamp of that write operation, and whichever is closer to the current time is taken as the last access time of the data record.
Determining a data size of the data record and an expiration time of the data record.
Taking the last access time, the data size, and the expiration time as the portrait information of the data record.
In practical applications, the portrait information of the data record is not limited to the last access time, the data size, and the expiration time, and other data that can describe the characteristics of the data record may be set as the portrait information of the data record according to practical requirements.
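The three portrait steps above can be sketched as follows. This is a hedged illustration only: the `Portrait` class, the `build_portrait` function, the use of epoch seconds, and the handling of a record that has never been read or never been written (`None`) are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Portrait:
    last_access_time: float           # epoch seconds of the most recent read or write
    data_size: int                    # size of the record in bytes
    expiration_time: Optional[float]  # epoch seconds; None if no lifetime is set


def build_portrait(
    latest_read_ts: Optional[float],
    latest_write_ts: Optional[float],
    data_size: int,
    expiration_time: Optional[float],
) -> Portrait:
    """Combine the latest read/write timestamps, size, and expiry into a portrait."""
    candidates = [ts for ts in (latest_read_ts, latest_write_ts) if ts is not None]
    if not candidates:
        raise ValueError("record has neither a read nor a write timestamp")
    # The operation closest to the current time is the one with the larger timestamp.
    return Portrait(max(candidates), data_size, expiration_time)
```

For example, `build_portrait(1700000500.0, 1700000000.0, 128, None)` takes the read timestamp as the last access time because the read happened after the write.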
S105, determining feature data corresponding to the target storage instance based on the portrait information of the data record in the target storage instance.
In the embodiment of the specification, the characteristic data corresponding to the target storage instance characterizes the data cooling suitability of the target storage instance, that is, whether the target storage instance is suitable for data cooling processing can be determined through analysis of the characteristic data. Data cooling refers to a process of transferring data from a high-performance expensive storage medium, which may be a memory, to a low-performance inexpensive storage medium, which may be a solid state disk.
It should be noted that, in this embodiment, the initial storage location of the target storage instance to be detected is a high-performance, expensive storage medium, which is generally used to store hot data that computing nodes need to access frequently; a low-performance, inexpensive storage medium is used to store cold data that is accessed only occasionally.
In the embodiment of the present disclosure, the feature data may include one feature data or may include a plurality of different feature data, which may be adjusted according to the accuracy requirement for the cooling detection in practical applications. The detection result may include that the target storage instance is suitable for data cooling or that the target storage instance is unsuitable for data cooling.
In some embodiments, the feature data corresponding to the target storage instance includes at least one of: the non-accessed data proportion, the non-accessed data heat change rate, the data cooling benefit, the data expiration proportion, and the instance access density.
In a specific embodiment, the feature data may include a non-accessed data proportion. Accordingly, determining the feature data corresponding to the target storage instance based on the portrait information of the data records in the target storage instance may adopt the method shown in FIG. 2, which may include:
S201, determining a first time difference value between the last access time of the data record in the target storage instance and the current time.
Specifically, a difference value between the last access time and the current time corresponding to each data record in the target storage instance is calculated, and a first time difference value is obtained.
S203, judging whether the first time difference value meets a preset time condition, and executing step S205 when the judgment result is yes.
In this embodiment, the preset time condition may be set according to the definition of cold data in the practical application. The condition may be that the first time difference exceeds a preset time threshold, or that it falls within a preset time interval. For example, the preset time condition may be set to more than 7 days or more than 3 days; it may also be set to between 1 day and 3 days, or between 3 days and 7 days.
S205, determining that the data record is not accessed data.
Specifically, when the preset time condition is that a preset time threshold is exceeded, a data record whose first time difference exceeds the preset time threshold may be determined as non-accessed data. When the preset time condition is a preset time interval, a data record whose first time difference falls within that interval may be determined as non-accessed data. It can be seen that non-accessed data in the embodiments of this specification is always defined relative to a specific preset time condition.
In the time axis shown in FIG. 3A, the preset time condition is more than 7 days. When the last access time of the data record is at position A, the first time difference between the last access time and the current time exceeds 7 days, and the data record can be determined as non-accessed data. When the last access time of the data record is at position B, the first time difference does not exceed 7 days, i.e. the preset time condition is not met, and the data record is not non-accessed data.
In the time axis shown in FIG. 3B, the preset time condition is more than 3 days but within 7 days. When the last access time of the data record is at position A, the first time difference between the last access time and the current time exceeds 7 days, i.e. the preset time condition is not met, and the data record is not non-accessed data. When the last access time of the data record is at position B, the first time difference is more than 3 days but within 7 days, i.e. the preset time condition is met, and the data record can be determined as non-accessed data.
S207, determining a first quantity of the non-accessed data in the target storage instance and a total quantity of data records in the target storage instance.
Specifically, the first number is obtained by counting the number of all the above-mentioned non-accessed data in the target storage instance, and the total number is obtained by counting the number of all the data records in the target storage instance.
S209, dividing the first number by the total number of data records in the target storage instance to obtain the non-accessed data proportion.
S211, taking the ratio of the non-accessed data as the characteristic data corresponding to the target storage instance.
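Steps S201 to S211 can be sketched as below, covering both forms of the preset time condition (a threshold, or an interval). This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, times are epoch seconds, and treating the interval as open at the lower bound and closed at the upper bound is a choice made for the example because the text does not specify boundary handling.

```python
from typing import Iterable, Optional

DAY = 86400.0  # seconds per day


def is_non_accessed(
    last_access_time: float,
    now: float,
    min_age: float,
    max_age: Optional[float] = None,
) -> bool:
    """Check the preset time condition for one record."""
    age = now - last_access_time  # the "first time difference"
    if max_age is None:
        return age > min_age          # threshold form, e.g. more than 7 days
    return min_age < age <= max_age   # interval form, e.g. 3 to 7 days


def non_accessed_proportion(
    last_access_times: Iterable[float],
    now: float,
    min_age: float,
    max_age: Optional[float] = None,
) -> float:
    """First number (non-accessed records) divided by the total record count."""
    total = 0
    first_number = 0
    for last_access in last_access_times:
        total += 1
        if is_non_accessed(last_access, now, min_age, max_age):
            first_number += 1
    return first_number / total if total else 0.0
```

With records last accessed 10, 5, 1, and 8 days ago, the "more than 7 days" condition gives a proportion of 0.5, while the "3 to 7 days" condition gives 0.25, which mirrors how the same record falls on different sides of the condition in FIG. 3A and FIG. 3B.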
In another specific embodiment, the feature data may further include a non-accessed data heat change rate. Accordingly, determining the feature data corresponding to the target storage instance based on the portrait information of the data records in the target storage instance may adopt the method shown in FIG. 4, which may include:
S401, acquiring the non-accessed data proportions within a preset historical time period.
In this embodiment, the preset historical time period may be set to one month, two months, or the like according to actual demands. Because the non-accessed data proportion depends on the current time at which it is calculated, a preset historical time period may correspond to multiple non-accessed data proportions; accordingly, these multiple non-accessed data proportions are acquired.
S403, determining the maximum non-access data proportion and the minimum non-access data proportion in the non-access data proportion.
Specifically, a maximum non-access data proportion and a minimum non-access data proportion are selected from a plurality of non-access data proportions corresponding to a preset historical time period.
S405, determining a difference value between the maximum non-access data proportion and the minimum non-access data proportion, and obtaining the non-access data heat change rate.
Specifically, the difference between the maximum non-accessed data proportion and the minimum non-accessed data proportion is calculated and used as the non-accessed data heat change rate. Generally, the larger the non-accessed data heat change rate, the more pronounced the periodic activity of the storage instance. When the rate exceeds a certain value, the corresponding storage instance is not suitable for data cooling, because cooling it would easily cause quality problems when a large amount of cold data is accessed during an active period.
S407, taking the non-accessed data heat change rate as the characteristic data of the target storage instance.
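Steps S401 to S407 can be illustrated with a minimal Python sketch. The function name and the sample proportions are hypothetical and chosen only for illustration; the computation itself (maximum minus minimum) follows the description above.

```python
from typing import Sequence


def heat_change_rate(proportions: Sequence[float]) -> float:
    """Return the spread (max - min) of the non-accessed data proportions
    observed over the preset historical time period (steps S403-S405)."""
    if not proportions:
        raise ValueError("at least one non-accessed data proportion is required")
    return max(proportions) - min(proportions)


# Hypothetical proportions sampled at different times within the period.
history = [0.62, 0.58, 0.71, 0.65]
rate = heat_change_rate(history)
```

A storage instance whose proportions barely move yields a rate near zero, while a strongly periodic instance yields a large rate.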
In another specific embodiment, the feature data may include a data cooling benefit, and accordingly, the determining, based on the portrait information of the data record in the target storage instance, the feature data corresponding to the target storage instance may use a method shown in fig. 5, and the method may include:
S501, determining the total data capacity corresponding to the target storage instance according to the data size of each data record in the target storage instance.
Specifically, the total data capacity corresponding to the target storage instance can be obtained by adding the data sizes corresponding to all the data records in the target storage instance.
S503, determining a total savings value of the hot data storage medium according to the total data capacity, the non-accessed data proportion, the total storage capacity of the hot data storage medium group, and the device cost of the hot data storage medium group.
In practical applications, each storage node of a distributed database system generally comprises a master device and a slave device, where the slave device synchronously backs up the data in the master device and may be one device or a plurality of devices. In this embodiment of the present disclosure, the hot data storage medium group is the high-performance, expensive storage medium in the device group formed by the master device and the slave device corresponding to a storage node, such as the memory of the device group. The device cost of the hot data storage medium group may be determined according to the specific composition of device cost in practical applications; specifically, it may include purchase cost, maintenance cost, site leasing cost for housing the equipment, and the like.
Specifically, the total savings value of the hot data storage medium may be calculated according to the following formula:
total savings value of the hot data storage medium = non-accessed data proportion × total data capacity / total storage capacity of the hot data storage medium group × device cost of the hot data storage medium group.
S505, determining an increased total value of the cold data storage medium according to the transparent access quantity of the target storage instance, the support upper limit of the cold data storage medium group, the total data capacity, the non-access data proportion and the equipment cost of the cold data storage medium group.
In the embodiment of the present specification, the transparent access amount refers to the total number of accesses that cannot be found in the hot data storage medium and that can be transparent to the cold data storage medium, and may be determined based on the product of the empty Query proportion of the target storage instance and the access amount Per Second (english acronym: QPS). The empty inquiry refers to searching for an inexistent data record, the corresponding access request generally returns an inexistent error prompt, and the empty inquiry proportion can be determined according to the ratio of the inexistent error quantity returned by the access in unit time to the total access quantity in unit time; the amount of accesses per second may be determined based on the number of data records in the target storage instance accessed per unit time.
The upper support limit for a group of cold data storage media refers to the maximum amount of access per second that can be allowed by the group of cold data storage media. The cold data storage medium group is a low-performance and low-cost storage medium in a device group formed by a master device and a slave device corresponding to the storage node, such as a solid state disk corresponding to the device group. The equipment cost of the cold data storage medium group may be determined according to a specific constitution of the equipment cost in practical use, and specifically, the equipment cost of the cold data storage medium group may include purchase cost, maintenance cost, and site lease cost of the equipment to be placed, etc.
Specifically, the total incremental value of the cold data storage medium may be calculated using the following formula:
total increase value of the cold data storage medium = max(empty query proportion × access amount per second / upper support limit of the cold data storage medium group, non-accessed data proportion × total data capacity / total storage capacity of the cold data storage medium group) × device cost of the cold data storage medium group.
S507, determining a difference value between the total saving value of the hot data storage medium and the total increasing value of the cold data storage medium, and obtaining the data cooling benefit.
Generally, the larger the data cooling benefit of the target storage instance, the more cost is saved by cooling. If the data cooling benefit is negative, cooling the data of the target storage instance cannot save cost, and the instance is not suitable for data cooling.
S509, taking the data cooling benefit as characteristic data of the target storage instance.
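Steps S501 to S509 can be sketched in Python as follows. All function and parameter names are hypothetical. The sketch also assumes that the larger of the two cold-side ratios (load share and capacity share) is multiplied by the device cost of the cold data storage medium group; this reading is inferred from the inputs listed in step S505 rather than stated verbatim in the text.

```python
def hot_savings(non_accessed_ratio: float, total_capacity: float,
                hot_total_storage: float, hot_device_cost: float) -> float:
    """Total savings value of the hot data storage medium (step S503)."""
    return non_accessed_ratio * total_capacity / hot_total_storage * hot_device_cost


def cold_increase(empty_query_ratio: float, qps: float, cold_qps_limit: float,
                  non_accessed_ratio: float, total_capacity: float,
                  cold_total_storage: float, cold_device_cost: float) -> float:
    """Total increase value of the cold data storage medium (step S505).

    Assumption: the max of the two shares is scaled by the cold device cost.
    """
    load_share = empty_query_ratio * qps / cold_qps_limit  # transparent access load
    capacity_share = non_accessed_ratio * total_capacity / cold_total_storage
    return max(load_share, capacity_share) * cold_device_cost


def cooling_benefit(savings: float, increase: float) -> float:
    """Data cooling benefit (step S507); a negative value means cooling costs more."""
    return savings - increase


# Hypothetical figures: capacities in G, costs in arbitrary currency units.
saved = hot_savings(0.6, 100.0, 200.0, 1000.0)
added = cold_increase(0.1, 5000.0, 10000.0, 0.6, 100.0, 1000.0, 500.0)
benefit = cooling_benefit(saved, added)
```

Taking the maximum of the two shares reflects that the cold medium must be provisioned for whichever is the tighter constraint, access load or capacity.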
In another specific embodiment, the feature data may include a data expiration proportion, and accordingly, the determining, based on the portrait information of the data record in the target storage instance, the feature data corresponding to the target storage instance may use a method shown in fig. 6, and the method may include:
S601, determining a second time difference value between the expiration time of a data record in the target storage instance and the current time.
In this embodiment of the present disclosure, the second time difference between the expiration time and the current time of the data record indicates the remaining lifetime of the data record.
S603, judging whether the second time difference value is smaller than a preset time threshold value, and executing step S605 when the judgment result is yes.
In the embodiment of the present specification, the preset time threshold may be set according to actual requirements, for example, may be set to 2 days, 3 days, or 7 days, or the like. In practical applications, the preset time threshold is generally set in combination with a corresponding preset time condition when the unaccessed data is determined. For example, if the preset time condition is that the preset time threshold is exceeded, the preset time threshold herein may be set as the preset time threshold in the preset time condition. If the preset time condition is within the preset time interval, the preset time threshold may be set as the preset time interval in the preset time condition.
S605, determining that the data record is expired data.
In this embodiment of the present disclosure, if the second time difference is smaller than the preset time threshold, the remaining lifetime of the data record cannot meet the requirement, and the data record is determined to be expired data. Cooling expired data is effectively an ineffective data transfer: the remaining lifetime of such data is very short, for example only 1 day, after which the data record is automatically deleted. Such data is clearly not suitable for data cooling, and cooling it may reduce the service quality.
S607, determining a second amount of the expiration data in the target storage instance and a total amount of data records in the target storage instance.
S609, dividing the second number by the total number of data records in the target storage instance to obtain the data expiration proportion.
Specifically, a ratio of the second number of expired data to the total number of data records in the target storage instance is calculated, and the ratio is taken as the data expiration ratio.
S611, taking the data expiration proportion as the characteristic data corresponding to the target storage instance.
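Steps S601 to S611 can be sketched in Python as follows. The function name, the argument names, and the sample dates are hypothetical; the sketch also assumes that every data record carries an expiration time.

```python
from datetime import datetime, timedelta
from typing import Iterable


def expiration_proportion(expire_times: Iterable[datetime], now: datetime,
                          threshold: timedelta) -> float:
    """Share of records whose remaining lifetime is below the threshold."""
    times = list(expire_times)
    if not times:
        return 0.0  # assumption: an empty instance has no expired data
    # S601-S605: a record counts as expired data when (expire_time - now) < threshold.
    expiring = sum(1 for t in times if t - now < threshold)
    # S607-S609: second number divided by the total number of data records.
    return expiring / len(times)


now = datetime(2019, 6, 1)
expire_times = [now + timedelta(days=d) for d in (1, 10, 2, 30)]
ratio = expiration_proportion(expire_times, now, timedelta(days=3))
```

With a 3-day threshold, the records expiring in 1 and 2 days count as expired data, so two of the four records are counted.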
In another specific embodiment, the feature data may include an instance access density, and accordingly, the determining, based on the portrait information of the data record in the target storage instance, the feature data corresponding to the target storage instance may employ a method shown in fig. 7, and the method may include:
S701, determining the total data capacity corresponding to the target storage instance according to the data size of each data record in the target storage instance.
S703, determining the access amount per second corresponding to the target storage instance according to the third number of data records in the target storage instance accessed per unit time.
S705, dividing the access per second by the total data capacity to obtain an instance access density.
Specifically, when calculating the instance access density, the total data capacity may be converted into gigabytes (G), so that the resulting value is the instance access density per G. Generally, the higher the instance access density, the higher the access amount per unit of storage capacity.
S707, taking the instance access density as the feature data of the target storage instance.
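Steps S701 to S707 can be sketched in Python as follows. The names are hypothetical, and the sketch assumes that record sizes are given in bytes and that one G equals 2^30 bytes.

```python
from typing import Iterable

BYTES_PER_G = 1024 ** 3  # assumption: 1 G = 2**30 bytes


def instance_access_density(accessed_records: int, window_seconds: float,
                            record_sizes_bytes: Iterable[int]) -> float:
    """Access amount per second divided by total data capacity in G."""
    qps = accessed_records / window_seconds            # S703
    total_g = sum(record_sizes_bytes) / BYTES_PER_G    # S701, converted to G
    if total_g == 0:
        raise ValueError("total data capacity must be positive")
    return qps / total_g                               # S705


# Hypothetical instance: 6000 record accesses in 60 seconds over 2 G of data.
density = instance_access_density(6000, 60.0, [BYTES_PER_G, BYTES_PER_G])
```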
It should be noted that, the feature data corresponding to the target storage instance is not limited to the above-mentioned ratio of non-accessed data, the change rate of heat of the non-accessed data, the cooling gain of the data, the expiration ratio of the data and the access density of the instance, and may also include the data involved in the calculation process of the above-mentioned feature data, or other data obtained by combining the above-mentioned feature data.
S107, inputting the characteristic data into a cooling detection model for cooling detection, and obtaining a detection result of whether the target storage instance is suitable for data cooling.
In practical applications, before the feature data is input into the cooling detection model, the feature data may be normalized so that each item of feature data is mapped into the interval (0, 1). Specific normalization methods may include min-max normalization, Z-score normalization, and other normalization methods.
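As an illustration of the normalization step, the following is a minimal Python sketch of min-max normalization applied to one feature across several storage instances. The function name and the sample values are hypothetical. Note that min-max normalization maps the smallest and largest values exactly to 0 and 1, so the endpoints of the interval are reached.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical instance access densities of three storage instances.
normalized = min_max_normalize([2.0, 4.0, 6.0])
```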
In the embodiment of the present disclosure, the cooling detection model is a machine learning model that is trained in advance and can detect whether the storage instance is suitable for data cooling. Specifically, the characteristic data can be input into a cooling detection model, and the detection result of the corresponding target storage instance is output through analysis processing of the characteristic data by the cooling detection model.
It should be appreciated that the detection results for the target storage instance include that the target storage instance is suitable for data cooling or that the target storage instance is unsuitable for data cooling.
It should be understood that when the detection result of the cooling detection is that the target storage instance is suitable for data cooling, a corresponding data cooling processing strategy may be invoked according to the detection result to perform data cooling processing on the target storage instance.
In some embodiments, the data processing method may further include a training step of the cooling detection model, and the training step may specifically include:
acquiring sample storage instances; extracting the sample feature data corresponding to each sample storage instance; determining the label corresponding to each sample storage instance according to the sample feature data and the corresponding feature data thresholds, wherein the label corresponding to a positive sample storage instance is "suitable for data cooling" and the label corresponding to a negative sample storage instance is "unsuitable for data cooling"; performing data cooling detection training with a preset machine learning model based on the sample feature data and the corresponding labels, and adjusting the model parameters of the preset machine learning model during the training until the labels output by the preset machine learning model match the input labels; and taking the machine learning model corresponding to the current model parameters as the cooling detection model in the embodiments of the present specification.
The number of sample storage instances may be set according to actual requirements, for example, 300 storage instances may be set, or more or less.
The sample feature data may be determined based on the portrait information of the data record in the sample storage instance, and the specific determination method may refer to the content of step S105 in the method embodiment shown in fig. 1, which is not described herein.
The feature data threshold is the threshold corresponding to each item of feature data and can be adjusted according to different user requirements. For example, when the sample feature data includes the non-accessed data proportion, if the user has a high requirement on service quality, a higher non-accessed data proportion threshold may be set, for example 80%; if the user has no special requirement on service quality, a relatively low non-accessed data proportion threshold may be set, for example 50%.
When determining the label corresponding to the sample storage instance according to the sample characteristic data and the corresponding characteristic data threshold value, comparing the sample characteristic data with the corresponding characteristic data threshold value, if the sample characteristic data exceeds the characteristic data threshold value, considering that the corresponding sample storage instance is suitable for data cooling, wherein the sample storage instance is a positive sample storage instance, and labeling the sample storage instance with the label of the positive sample storage instance, such as label 1; if the sample characteristic data does not exceed the characteristic data threshold value, the corresponding sample storage instance is considered unsuitable for data cooling, the sample storage instance is a negative sample storage instance, and the sample storage instance is labeled with a label of the negative sample storage instance, such as a label 0. The sample characteristic data and the corresponding labels form training data of the cooling detection model.
In practical applications, the sample feature data may include a plurality of items, for example the non-accessed data proportion, the non-accessed data heat change rate, the data cooling benefit, the data expiration proportion, and the instance access density, each of which corresponds to its own feature data threshold. When distinguishing positive and negative sample storage instances, a decision criterion may be set according to the detection accuracy required of the cooling detection model. The criterion may be: when N or more items of sample feature data satisfy the required relation to their corresponding feature data thresholds, the corresponding sample storage instance is determined to be a positive sample storage instance; otherwise, it is determined to be a negative sample storage instance. Here N is an integer that can be set according to the detection accuracy requirement of the cooling detection model. For example, when higher detection accuracy is required, N may be set to the total number of sample feature data items; when there is no particular requirement on detection accuracy, N may be set to half the number of sample feature data items, or the like.
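The labeling criterion can be sketched in Python as follows. The feature names, the threshold values, and the direction of each comparison are hypothetical choices for illustration (only the 50% proportion threshold and the rule that a negative cooling benefit is unsuitable are taken from the text). The sketch reads the criterion as "at least N requirements satisfied", which is an interpretation.

```python
from typing import Callable, Dict

Requirement = Callable[[float], bool]

# Hypothetical per-feature requirements relative to their thresholds.
REQUIREMENTS: Dict[str, Requirement] = {
    "non_accessed_proportion": lambda v: v >= 0.5,
    "heat_change_rate": lambda v: v <= 0.2,
    "cooling_benefit": lambda v: v > 0.0,
    "expiration_proportion": lambda v: v <= 0.1,
    "access_density": lambda v: v <= 100.0,
}


def label_sample(features: Dict[str, float],
                 requirements: Dict[str, Requirement],
                 n_required: int) -> int:
    """Return 1 (positive, suitable for cooling) or 0 (negative)."""
    satisfied = sum(1 for name, check in requirements.items()
                    if check(features[name]))
    return 1 if satisfied >= n_required else 0


sample = {
    "non_accessed_proportion": 0.7,
    "heat_change_rate": 0.1,
    "cooling_benefit": 120.0,
    "expiration_proportion": 0.3,  # fails its requirement
    "access_density": 40.0,
}
strict_label = label_sample(sample, REQUIREMENTS, n_required=5)
lenient_label = label_sample(sample, REQUIREMENTS, n_required=3)
```

The same sample storage instance receives different labels under different values of N, which is how the user's accuracy requirement shapes the training data.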
Therefore, different positive and negative sample storage examples can be obtained for the same sample storage example through the characteristic data threshold value and the sample characteristic data set according to the user requirement, so that the cooling detection model trained based on the different positive and negative sample storage examples can be more in line with the self business requirement of the user, and the cooling detection of the data of the storage example based on the cooling detection model is more flexible and personalized.
Before the data cooling detection training is performed with the preset machine learning model based on the sample feature data and the corresponding labels, the training data may be divided into a training set and a test set. The data in the training set is used to train the machine learning model, and the data in the test set is used to test the accuracy of the trained machine learning model. The specific division ratio may be set according to actual requirements; in a specific embodiment, 20% of the training data may be selected as the training set and the remaining training data used as the test set.
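The division into a training set and a test set can be sketched in Python as follows. The function name, the shuffling, and the fixed seed are assumptions made for illustration; the text only specifies the ratio.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_training_data(samples: Sequence[T], train_fraction: float = 0.2,
                        seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the samples and split them into (training set, test set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-ins for 300 labeled sample storage instances.
train_set, test_set = split_training_data(range(300), train_fraction=0.2)
```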
In this embodiment of the present disclosure, the preset machine learning model may be a binary classification machine learning model. The binary classification model may use a logistic regression algorithm, a random forest algorithm, or another classification algorithm such as a support vector machine or a decision tree. Accordingly, the model parameters of the preset machine learning model may include the regularization coefficient of logistic regression, the number of subtrees of the random forest, the minimum number of samples per leaf node, and so on.
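As one possible illustration of a binary classification model, the following Python sketch trains a logistic regression classifier with an L2 regularization coefficient by batch gradient descent. It is a toy implementation under stated assumptions, not the patented training procedure: the learning rate, epoch count, regularization value, and the one-feature toy data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows: Sequence[Sequence[float]], labels: Sequence[int],
                              l2: float = 0.01, lr: float = 0.5,
                              epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        # The L2 penalty is applied to the weights only, not to the bias.
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return 1 (suitable for cooling) when the predicted probability is >= 0.5."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Toy data: one normalized feature per sample, label 1 for high values.
rows = [[0.1], [0.2], [0.8], [0.9]]
labels = [0, 0, 1, 1]
weights, bias = train_logistic_regression(rows, labels)
predictions = [predict(weights, bias, x) for x in rows]
```

In practice a library implementation would normally be preferred; the sketch only shows where the regularization coefficient enters the parameter update.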
In some embodiments, the method further includes a step of testing the trained cooling detection model with the test set. Specifically, the sample feature data in the test set may be input into the trained cooling detection model, which outputs the corresponding cooling detection result, that is, whether the corresponding sample storage instance is suitable for data cooling. The values of the following four parameters are then determined from the labels of the sample storage instances corresponding to the sample feature data and the cooling detection results: TP (True Positive, the number of positive instances detected as positive), FN (False Negative, the number of positive instances detected as negative), FP (False Positive, the number of negative instances detected as positive), and TN (True Negative, the number of negative instances detected as negative). The accuracy and recall of the cooling detection model are then computed from the values of these four parameters, which together form the confusion matrix.
Accuracy measures the proportion of all tested sample storage instances that are detected correctly: accuracy = (TP + TN) / (TP + FN + FP + TN). Recall indicates the proportion of positive sample storage instances that are detected correctly among all positive sample storage instances: recall = TP / (TP + FN).
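The accuracy and recall computation can be sketched in Python as follows. The function names and the sample labels and predictions are hypothetical; label 1 denotes a positive sample storage instance and label 0 a negative one.

```python
from typing import Sequence, Tuple


def confusion_counts(labels: Sequence[int],
                     predictions: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN) for binary labels and predictions."""
    tp = fn = fp = tn = 0
    for y, p in zip(labels, predictions):
        if y == 1 and p == 1:
            tp += 1
        elif y == 1:
            fn += 1  # positive instance detected as negative
        elif p == 1:
            fp += 1  # negative instance detected as positive
        else:
            tn += 1
    return tp, fn, fp, tn


def accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    return (tp + tn) / (tp + fn + fp + tn)


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)


labels = [1, 1, 1, 0, 0, 0, 1, 0]
predictions = [1, 0, 1, 0, 1, 0, 1, 0]
tp, fn, fp, tn = confusion_counts(labels, predictions)
```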
In a specific embodiment, 300 sample storage examples are selected, wherein 20% of the sample storage examples are used as training sets, 80% of the sample storage examples are used as test sets, sample characteristic data comprise unvisited data proportion, unvisited data heat change rate, data cooling gain, data expiration proportion and example access density, and the accuracy of a cooling detection model obtained by training by using a logistic regression algorithm as a classification algorithm of a machine learning model reaches 0.92, and the recall rate reaches 0.91. Therefore, the cooling detection model in the embodiment of the invention not only can be adjusted in a personalized way according to actual needs, but also has high cooling detection accuracy and recall rate, and is very suitable for the data cooling detection of a storage example.
According to the above technical scheme, a target storage instance is acquired and the portrait information of the data records in the target storage instance is determined; feature data that corresponds to the target storage instance and characterizes its data cooling suitability is determined based on the portrait information of the data records; and the feature data is input into a cooling detection model for cooling detection, yielding a detection result of whether the target storage instance is suitable for data cooling. Whether a storage instance in the database can undergo data cooling is thereby judged comprehensively and accurately, and subsequent data cooling based on the judgment result can significantly reduce the storage cost while ensuring the access quality of the data.
In addition, by setting the time parameters used in determining the feature data, the embodiments of the invention make it possible to customize how long data must remain unaccessed before it is sunk (i.e., transferred). This avoids frequent movement of data between the hot data storage medium and the cold data storage medium, keeps the access quality controllable, and provides great flexibility.
The embodiment of the present invention also provides a data processing apparatus corresponding to the data processing method provided in the above embodiments, and since the data processing apparatus provided in the embodiment of the present invention corresponds to the data processing method provided in the above embodiments, implementation of the foregoing data processing method is also applicable to the data processing apparatus provided in the embodiment, and will not be described in detail in the embodiment.
Referring to fig. 8, a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention is shown, and as shown in fig. 8, the apparatus may include:
a first obtaining module 810, configured to obtain a target storage instance;
a first determining module 820, configured to determine portrait information of the data record in the target storage instance;
a second determining module 830, configured to determine feature data corresponding to the target storage instance based on the portrait information of the data record in the target storage instance; the characteristic data characterizes the data cooling suitability of the target storage instance;
a cooling detection module 840, configured to input the feature data into a cooling detection model to perform cooling detection, so as to obtain a detection result of whether the target storage instance is suitable for data cooling.
Optionally, the feature data determined by the second determining module 830 includes at least one of the following:
the non-accessed data proportion, the non-accessed data heat change rate, the data cooling benefit, the data expiration proportion, and the instance access density.
Optionally, the first determining module 820 includes:
the third determining module is used for determining the last access time of the data record according to the read-write time stamp information corresponding to the data record;
a fourth determining module, configured to determine a data size of the data record, and an expiration time of the data record;
and a fifth determining module, configured to use the last access time, the data size and the expiration time as portrait information of the data record.
In a specific embodiment, the second determining module 830 may include:
a first time difference value determining module, configured to determine a first time difference value between a last access time of the data record in the target storage instance and a current time;
The first judging module is used for judging whether the first time difference value meets a preset time condition or not;
a non-accessed data determining module, configured to determine that the data record is non-accessed data when the first time difference value meets the preset time condition;
a first number determination module configured to determine a first number of the non-accessed data in the target storage instance and a total number of data records in the target storage instance;
a first calculation module, configured to divide the first number by the total number of data records in the target storage instance to obtain the non-accessed data proportion;
and a sixth determining module, configured to use the ratio of the non-accessed data as feature data corresponding to the target storage instance.
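The portrait information and the non-accessed data proportion computed by the modules above can be sketched in Python as follows. The class and field names are hypothetical, and the sketch assumes the preset time condition is "the time since the last access exceeds a threshold", which is one of the conditions mentioned in the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass
class RecordPortrait:
    """Portrait information of one data record (hypothetical field names)."""
    last_access: datetime
    size_bytes: int
    expire_time: datetime


def non_accessed_proportion(records: Sequence[RecordPortrait], now: datetime,
                            idle_threshold: timedelta) -> float:
    """First number (non-accessed records) divided by the total record count."""
    if not records:
        return 0.0  # assumption: an empty instance has no non-accessed data
    idle = sum(1 for r in records if now - r.last_access > idle_threshold)
    return idle / len(records)


now = datetime(2019, 6, 10)
records = [
    RecordPortrait(now - timedelta(days=d), 1024, now + timedelta(days=30))
    for d in (1, 8, 15, 2)
]
proportion = non_accessed_proportion(records, now, timedelta(days=7))
```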
In another specific embodiment, the second determining module 830 may include:
the second acquisition module is used for acquiring the ratio of the unaccessed data in a preset historical time period;
a seventh determining module, configured to determine a maximum unaccessed data proportion and a minimum unaccessed data proportion in the unaccessed data proportions;
the second calculation module is used for determining the difference value between the maximum unaccessed data proportion and the minimum unaccessed data proportion to obtain the unaccessed data heat change rate;
and an eighth determining module, configured to take the non-accessed data heat change rate as feature data of the target storage instance.
In another specific embodiment, the second determining module 830 may include:
a ninth determining module, configured to determine, according to a data size of each data record in the target storage instance, a total data capacity corresponding to the target storage instance;
a total savings value determining module, configured to determine a total savings value of the hot data storage medium according to the total data capacity, the non-accessed data proportion, the total storage capacity of the hot data storage medium group, and the device cost of the hot data storage medium group;
an incremental total value determination module configured to determine an incremental total value of a cold data storage medium according to the transparent access amount of the target storage instance, the upper support limit of the cold data storage medium group, the total data capacity, the non-accessed data proportion, and the equipment cost of the cold data storage medium group;
a third calculation module, configured to determine a difference between the total saved value of the hot data storage medium and the total increased value of the cold data storage medium, to obtain the data cooling benefit;
and a tenth determining module, configured to take the data cooling benefit as characteristic data of the target storage instance.
In another specific embodiment, the second determining module 830 may include:
a second time difference value determining module, configured to determine a second time difference value between an expiration time of the data record and a current time in the target storage instance;
the second judging module is used for judging whether the second time difference value is smaller than a preset time threshold value or not;
the expiration data determining module is used for determining that the data record is expiration data when the second time difference value is smaller than a preset time threshold value;
a second number determination module for determining a second number of the expiration data in the target storage instance and a total number of data records in the target storage instance;
a fourth calculation module, configured to divide the second number by the total number of data records in the target storage instance to obtain the data expiration proportion;
and an eleventh determining module, configured to use the data expiration proportion as feature data corresponding to the target storage instance.
In another specific embodiment, the second determining module 830 may include:
a twelfth determining module, configured to determine, according to a data size of each data record in the target storage instance, a total data capacity corresponding to the target storage instance;
a per-second access amount determining module, configured to determine the access amount per second corresponding to the target storage instance according to the third number of data records in the target storage instance accessed per unit time;
a fifth calculation module, configured to divide the access amount per second by the total data capacity to obtain an instance access density;
and a thirteenth determining module, configured to use the instance access density as characteristic data of the target storage instance.
In another specific embodiment, the apparatus may further include:
the third acquisition module is used for acquiring a sample storage instance;
the extraction module is used for extracting sample characteristic data corresponding to the sample storage instance;
the label determining module is used for determining labels corresponding to the sample storage examples according to the sample characteristic data and the corresponding characteristic data threshold values; wherein, the label corresponding to the positive sample storage instance is suitable for data cooling; labels corresponding to negative sample storage instances are unsuitable for data cooling;
a training module, configured to perform data cooling detection training with a preset machine learning model based on the sample feature data and the corresponding labels, and to adjust the model parameters of the preset machine learning model during the training until the labels output by the preset machine learning model match the input labels; and to take the machine learning model corresponding to the current model parameters as the cooling detection model.
It should be noted that, in the apparatus provided in the foregoing embodiment, when implementing the functions thereof, only the division of the foregoing functional modules is used as an example, in practical application, the foregoing functional allocation may be implemented by different functional modules, that is, the internal structure of the device is divided into different functional modules, so as to implement all or part of the functions described above. In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the apparatus and the method embodiments are detailed in the method embodiments and are not repeated herein.
With the data processing apparatus, a target storage instance is acquired and the portrait information of the data records in the target storage instance is determined; feature data that corresponds to the target storage instance and characterizes its data cooling suitability is determined based on the portrait information of the data records; and the feature data is input into the cooling detection model for cooling detection, yielding a detection result of whether the target storage instance is suitable for data cooling. Whether a storage instance in the database can undergo data cooling is thereby judged comprehensively and accurately, and subsequent data cooling based on the judgment result can significantly reduce the storage cost while ensuring the access quality of the data.
The embodiment of the invention provides a computer device, which comprises a processor and a memory, wherein at least one instruction, at least one section of program, code set or instruction set is stored in the memory, and the at least one instruction, the at least one section of program, the code set or instruction set is loaded and executed by the processor to realize the data processing method provided by the embodiment of the method.
The memory may be used to store software programs and modules that the processor executes to perform various functional applications and data processing by executing the software programs and modules stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, application programs required for functions, and the like; the storage data area may store data created according to the use of the device, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide access to the memory by the processor.
The method embodiments provided by the embodiments of the present invention may be executed in a computer terminal, a server, or a similar computing device; that is, the computer device may include a computer terminal, a server, or a similar computing device. Taking execution on a server as an example, Fig. 9 is a block diagram of the hardware structure of a server for the data processing method according to an embodiment of the present invention. As shown in Fig. 9, the server 900 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 910 (the processor 910 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 930 for storing data, and one or more storage media 920 (e.g., one or more mass storage devices) storing application programs 923 or data 922. The memory 930 and the storage medium 920 may provide transitory or persistent storage. The program stored on the storage medium 920 may include one or more modules, each of which may include a series of instruction operations on the server. Further, the central processing unit 910 may be configured to communicate with the storage medium 920 and to execute, on the server 900, the series of instruction operations in the storage medium 920. The server 900 may also include one or more power supplies 960, one or more wired or wireless network interfaces 950, one or more input/output interfaces 940, and/or one or more operating systems 921, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The input/output interface 940 may be used to receive or transmit data via a network. Specific examples of the network may include a wireless network provided by a communication provider of the server 900. In one example, the input/output interface 940 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station so as to communicate with the Internet. In another example, the input/output interface 940 may be a Radio Frequency (RF) module used to communicate with the Internet wirelessly.
It will be appreciated by those skilled in the art that the configuration shown in Fig. 9 is merely illustrative and does not limit the configuration of the electronic device. For example, the server 900 may include more or fewer components than shown in Fig. 9, or have a configuration different from that shown in Fig. 9.
Embodiments of the present invention also provide a computer-readable storage medium that may be disposed in a server to store at least one instruction, at least one program, a code set, or an instruction set related to implementing the data processing method in the method embodiments; the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the data processing method provided in the method embodiments described above.
Optionally, in this embodiment, the storage medium may be located in at least one of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.
It should be noted that the order of the embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments. The foregoing describes specific embodiments of this specification; other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in this specification are described progressively: identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the others. In particular, because the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for the relevant details, refer to the corresponding description of the method embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention to the precise form disclosed; any modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within its scope of protection.

Claims (11)

1. A method of data processing, the method comprising:
acquiring a target storage instance; the target storage instance comprises a plurality of data records;
determining the portrait information of each data record in the target storage instance;
determining the total data capacity corresponding to the target storage instance according to the data size in the portrait information corresponding to each data record in the target storage instance;
determining a first product of the unaccessed data proportion of the target storage instance and the total data capacity; determining a first ratio of the first product to the total storage capacity of a group of hot data storage media; obtaining a total saving value of the hot data storage media based on the product of the first ratio and the device cost of the group of hot data storage media;
determining a second ratio of the transparent access volume of the target storage instance to the upper limit supported by a group of cold data storage media; determining a third ratio of the first product to the total storage capacity of the group of cold data storage media; obtaining a total increase value of the cold data storage media based on the product of a target ratio and the device cost of the group of cold data storage media, the target ratio being the larger of the second ratio and the third ratio;
determining the difference between the total saving value of the hot data storage media and the total increase value of the cold data storage media to obtain the data cooling benefit corresponding to the target storage instance;
and inputting the data cooling benefit as characteristic data into a cooling detection model for cooling detection, to obtain a detection result indicating whether the target storage instance is suitable for data cooling.
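The benefit calculation in claim 1 can be sketched as follows. This is a minimal illustration and not the patented implementation: the `MediaGroup` type, the function and parameter names, and the numbers in the example are all assumptions introduced here, and the units of capacity, cost, and access volume are left to the caller as long as they are used consistently.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MediaGroup:
    """A group of storage media; the field names are illustrative assumptions."""
    total_capacity: float       # total storage capacity of the group
    device_cost: float          # device cost of the group
    access_limit: float = 0.0   # upper limit of supported accesses (cold group only)


def data_cooling_benefit(
    record_sizes: Iterable[float],
    unaccessed_proportion: float,
    transparent_access: float,
    hot: MediaGroup,
    cold: MediaGroup,
) -> float:
    """Return the hot-media saving minus the cold-media increase, as in claim 1."""
    # Total data capacity: sum of the data sizes taken from the record portraits.
    total_data_capacity = sum(record_sizes)
    # First product: amount of data that would be moved off the hot media.
    first_product = unaccessed_proportion * total_data_capacity

    # Saving on hot media: share of a hot group freed, times that group's cost.
    first_ratio = first_product / hot.total_capacity
    hot_saving = first_ratio * hot.device_cost

    # Increase on cold media: bounded by access volume or capacity, whichever is larger.
    second_ratio = transparent_access / cold.access_limit
    third_ratio = first_product / cold.total_capacity
    target_ratio = max(second_ratio, third_ratio)
    cold_increase = target_ratio * cold.device_cost

    return hot_saving - cold_increase


# Hypothetical numbers chosen only to make the arithmetic easy to follow.
hot_group = MediaGroup(total_capacity=400.0, device_cost=8000.0)
cold_group = MediaGroup(total_capacity=800.0, device_cost=2000.0, access_limit=1000.0)
benefit = data_cooling_benefit([40.0, 60.0], 0.5, 250.0, hot_group, cold_group)
print(benefit)  # 500.0
```

In this example the access ratio (0.25) exceeds the capacity ratio (0.0625), so the cold-media increase is driven by access volume rather than by the amount of data moved.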
2. The data processing method of claim 1, wherein the characteristic data further comprises at least one of:
an unaccessed data proportion, an unaccessed data heat change rate, a data expiration proportion, and an instance access density.
3. The data processing method of claim 2, wherein determining the portrait information of the data record in the target storage instance comprises:
determining the last access time of the data record according to the read-write timestamp information corresponding to the data record;
determining a data size of the data record and an expiration time of the data record;
and taking the last access time, the data size, and the expiration time as the portrait information of the data record.
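A record portrait as described in claim 3 can be modelled as a small immutable structure. The sketch below is illustrative only: the names `RecordPortrait` and `build_portrait` are invented here, and taking the most recent read or write timestamp as the last access time is an assumption, since the claim does not say how the last access time is derived from the timestamp information.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RecordPortrait:
    """Portrait information of one data record (field names are assumptions)."""
    last_access_time: float   # e.g. seconds since the epoch
    data_size: int            # e.g. bytes
    expiration_time: float    # e.g. seconds since the epoch


def build_portrait(
    read_write_timestamps: Iterable[float],
    data_size: int,
    expiration_time: float,
) -> RecordPortrait:
    """Build a portrait from a record's read-write timestamps, size, and expiry."""
    timestamps = list(read_write_timestamps)
    if not timestamps:
        raise ValueError("at least one read or write timestamp is required")
    # Assumption: the last access is the most recent read or write.
    return RecordPortrait(
        last_access_time=max(timestamps),
        data_size=data_size,
        expiration_time=expiration_time,
    )


portrait = build_portrait([1000.0, 1500.0, 1200.0], data_size=2048, expiration_time=9000.0)
print(portrait.last_access_time)  # 1500.0
```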
4. The data processing method of claim 3, wherein when the characteristic data further includes the unaccessed data proportion, the method further comprises:
determining a first time difference between the last access time of the data record in the target storage instance and the current time;
judging whether the first time difference meets a preset time condition;
when the judgment result is yes, determining that the data record is unaccessed data;
determining a first number of the unaccessed data in the target storage instance and a total number of data records in the target storage instance;
dividing the first number by the total number of data records in the target storage instance to obtain the unaccessed data proportion.
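The proportion defined in claim 4 might be computed as in the sketch below. The claim leaves the preset time condition open; the version here assumes the condition is "idle for at least `idle_threshold`", and the function and parameter names are hypothetical.

```python
from typing import Iterable


def unaccessed_data_proportion(
    last_access_times: Iterable[float],
    now: float,
    idle_threshold: float,
) -> float:
    """Fraction of records whose idle time meets the assumed time condition."""
    times = list(last_access_times)
    if not times:
        raise ValueError("the storage instance has no data records")
    # First time difference: current time minus last access time.
    # Assumed condition: the record counts as unaccessed when that
    # difference is at least idle_threshold.
    first_number = sum(1 for t in times if now - t >= idle_threshold)
    return first_number / len(times)


# Idle times are 100, 50, 10, and 0; two of them reach the threshold of 30.
print(unaccessed_data_proportion([0.0, 50.0, 90.0, 100.0], now=100.0, idle_threshold=30.0))  # 0.5
```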
5. The data processing method of claim 4, wherein when the characteristic data further includes the unaccessed data heat change rate, the method further comprises:
acquiring the unaccessed data proportions within a preset historical time period;
determining a maximum unaccessed data proportion and a minimum unaccessed data proportion among the unaccessed data proportions;
and determining the difference between the maximum unaccessed data proportion and the minimum unaccessed data proportion to obtain the unaccessed data heat change rate.
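Claim 5 reduces to the range (maximum minus minimum) of the proportions observed over the historical period. A minimal sketch, with a hypothetical function name and sample values:

```python
from typing import Iterable


def unaccessed_heat_change_rate(historical_proportions: Iterable[float]) -> float:
    """Maximum minus minimum unaccessed data proportion over the history."""
    values = list(historical_proportions)
    if not values:
        raise ValueError("no historical proportions were provided")
    return max(values) - min(values)


# Hypothetical proportions sampled at three points in the historical period.
print(unaccessed_heat_change_rate([0.25, 0.5, 0.75]))  # 0.5
```

A small value suggests that the share of cold data in the instance is stable over time, while a large value suggests that access patterns are still shifting.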
6. The data processing method of claim 3, wherein when the characteristic data further includes the data expiration proportion, the method further comprises:
determining a second time difference between the expiration time of the data record in the target storage instance and the current time;
judging whether the second time difference is smaller than a preset time threshold;
when the judgment result is yes, determining that the data record is expired data;
determining a second number of the expired data in the target storage instance and a total number of data records in the target storage instance;
dividing the second number by the total number of data records in the target storage instance to obtain the data expiration proportion.
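The expiration proportion of claim 6 can be sketched in the same way. Here the second time difference is assumed to be the expiration time minus the current time, so records that expire soon (or have already expired) fall below the threshold; the names are hypothetical.

```python
from typing import Iterable


def data_expiration_proportion(
    expiration_times: Iterable[float],
    now: float,
    time_threshold: float,
) -> float:
    """Fraction of records whose remaining lifetime is below the threshold."""
    times = list(expiration_times)
    if not times:
        raise ValueError("the storage instance has no data records")
    # Assumption: second time difference = expiration time - current time.
    second_number = sum(1 for t in times if t - now < time_threshold)
    return second_number / len(times)


# Remaining lifetimes are 5, 10, 100, and 200; two are below the threshold of 20.
print(data_expiration_proportion([105.0, 110.0, 200.0, 300.0], now=100.0, time_threshold=20.0))  # 0.5
```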
7. The data processing method of claim 3, wherein when the characteristic data further includes the instance access density, the method further comprises:
determining the total data capacity corresponding to the target storage instance according to the data size of each data record in the target storage instance;
determining the access amount per second corresponding to the target storage instance according to a third number of data records in the target storage instance accessed in unit time;
dividing the access amount per second by the total data capacity to obtain the instance access density.
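The instance access density of claim 7 is accesses per second divided by total data capacity. In the sketch below, deriving the per-second figure by dividing the accessed-record count by the length of the observation window is an assumption, as are the function and parameter names.

```python
from typing import Iterable


def instance_access_density(
    record_sizes: Iterable[float],
    accessed_record_count: int,
    window_seconds: float,
) -> float:
    """Accesses per second divided by total data capacity."""
    total_data_capacity = sum(record_sizes)
    if total_data_capacity <= 0 or window_seconds <= 0:
        raise ValueError("capacity and window length must be positive")
    # Assumption: the third number is counted over a window of window_seconds.
    access_per_second = accessed_record_count / window_seconds
    return access_per_second / total_data_capacity


# 1024 accesses in 2 seconds over 512 units of data gives a density of 1.0.
print(instance_access_density([256.0, 256.0], accessed_record_count=1024, window_seconds=2.0))  # 1.0
```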
8. The data processing method of claim 1, wherein the method further comprises:
acquiring a sample storage instance;
extracting sample characteristic data corresponding to the sample storage instance, the sample characteristic data comprising a data cooling benefit;
determining a label corresponding to the sample storage instance according to the sample characteristic data and the corresponding characteristic data threshold, wherein the label of a positive sample storage instance is "suitable for data cooling" and the label of a negative sample storage instance is "unsuitable for data cooling";
based on the sample characteristic data and the corresponding labels, performing data cooling detection training by using a preset machine learning model, and adjusting model parameters of the preset machine learning model during the data cooling detection training until the labels output by the preset machine learning model match the input labels;
and taking the machine learning model corresponding to the current model parameters as the cooling detection model.
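The training loop of claim 8 can be illustrated with a toy example. The claim does not name the preset machine learning model, so the sketch below substitutes a simple perceptron purely for illustration; the threshold rule, the single-feature samples, and every name in the code are assumptions rather than the patent's own method.

```python
from typing import List, Sequence, Tuple


def label_sample(cooling_benefit: float, benefit_threshold: float) -> int:
    """Return 1 (suitable for data cooling) or 0 (unsuitable), by threshold."""
    return 1 if cooling_benefit > benefit_threshold else 0


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> int:
    """Linear decision rule of the stand-in perceptron model."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    learning_rate: float = 1.0,
    max_epochs: int = 1000,
) -> Tuple[List[float], float]:
    """Adjust parameters until every output label matches its input label."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # outputs match the labels: stop training
            break
    return weights, bias


# Hypothetical sample instances, each described by one (scaled) cooling benefit.
samples = [[-2.0], [-1.0], [1.0], [2.0]]
labels = [label_sample(s[0], benefit_threshold=0.0) for s in samples]
weights, bias = train(samples, labels)
predictions = [predict(weights, bias, s) for s in samples]
print(labels, predictions)  # [0, 0, 1, 1] [0, 0, 1, 1]
```

A perceptron only reaches the stopping condition when the labelled samples are linearly separable; a real system would more likely use a richer model and a tolerance on training error rather than an exact match.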
9. A data processing apparatus, the apparatus comprising:
the first acquisition module is used for acquiring a target storage instance;
the first determining module is used for determining the portrait information of the data record in the target storage instance;
the second determining module is used for determining the total data capacity corresponding to the target storage instance according to the data size in the portrait information corresponding to each data record in the target storage instance; determining a first product of the unaccessed data proportion of the target storage instance and the total data capacity; determining a first ratio of the first product to the total storage capacity of a group of hot data storage media; obtaining a total saving value of the hot data storage media based on the product of the first ratio and the device cost of the group of hot data storage media; determining a second ratio of the transparent access volume of the target storage instance to the upper limit supported by a group of cold data storage media; determining a third ratio of the first product to the total storage capacity of the group of cold data storage media; obtaining a total increase value of the cold data storage media based on the product of a target ratio and the device cost of the group of cold data storage media, the target ratio being the larger of the second ratio and the third ratio; determining the difference between the total saving value of the hot data storage media and the total increase value of the cold data storage media to obtain the data cooling benefit corresponding to the target storage instance;
and the cooling detection module is used for inputting the data cooling benefit as characteristic data into a cooling detection model for cooling detection, to obtain a detection result indicating whether the target storage instance is suitable for data cooling.
10. A computer device comprising a processor and a memory having stored therein at least one instruction, at least one program, code set or instruction set, the at least one instruction, the at least one program, code set or instruction set being loaded and executed by the processor to implement the data processing method of any of claims 1-8.
11. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by a processor to implement the data processing method of any one of claims 1-8.
CN201910480279.0A 2019-06-04 2019-06-04 Data processing method and device and computer equipment Active CN110321348B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910480279.0A CN110321348B (en) 2019-06-04 2019-06-04 Data processing method and device and computer equipment

Publications (2)

Publication Number Publication Date
CN110321348A (en) 2019-10-11
CN110321348B (en) 2024-01-09

Family

ID=68119573

Country Status (1)

Country Link
CN (1) CN110321348B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant