CN115480713A - Method, device and medium for determining cold and hot data - Google Patents

Method, device and medium for determining cold and hot data

Info

Publication number
CN115480713A
Authority
CN
China
Prior art keywords
time
service request
data
state switching
request data
Prior art date
Legal status
Pending
Application number
CN202211293452.4A
Other languages
Chinese (zh)
Inventor
雷皓鑫
方浩
Current Assignee
Jinan Inspur Data Technology Co Ltd
Original Assignee
Jinan Inspur Data Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Jinan Inspur Data Technology Co Ltd
Priority to CN202211293452.4A
Publication of CN115480713A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a method, a device and a medium for determining cold and hot data, applicable to the technical field of data storage. The method comprises: obtaining service request data from the service layer; adding decay time tags to the service request data according to a current decay time matrix, the decay times being optimized by an iterative algorithm, so as to determine the number of state switches of the service request data; when a preset condition is met, determining the cold and hot data of the service request data according to the mean number of state switches; and when the preset condition is not met, taking the next iteration as the new current iteration. Compared with existing approaches that determine cold and hot data by a simple time threshold or the LRU algorithm, the method allocates storage resources according to the decay times that different services require at the service level, so that the determined cold and hot data reflect the individual request patterns of different service departments and the processing efficiency of the system is improved.

Description

Method, device and medium for determining cold and hot data
Technical Field
The present invention relates to the field of data storage technologies, and in particular, to a method, an apparatus, and a medium for determining cold and hot data.
Background
In a distributed storage system, storage is often divided into a cache tier composed of fast, expensive solid state disks (SSD) and random access memory (RAM), and a persistent storage tier composed of slow, inexpensive hard disk drives (HDD). The cache tier is either too small to hold all of the data, or storing all of the data in it would be prohibitively expensive, so it is desirable to keep only hot data in the cache and either reduce the space occupied by cold data in the cache through compression or evict cold data to slower storage devices.
Generally, data are classified as hot or cold either by a simple time threshold (for example, a fixed time period is selected as the threshold) or by the simple Least Recently Used (LRU) method, without considering the characteristics of the service layer.
Therefore, a solution is needed that allocates storage resources more reasonably and improves the processing efficiency of the storage system.
Disclosure of Invention
The object of the invention is to provide a method, a device and a medium for determining cold and hot data, which allocate storage resources reasonably according to the decay times that different services require at the service level, so that the determined cold and hot data reflect the individual requests of different service departments and the processing efficiency of the system is improved.
To solve the above technical problem, the present invention provides a method for determining cold and hot data, which is applied to a server and includes:
acquiring the current decay time matrix for the current iteration;
acquiring service request data from a client within a preset time, and adding a decay time tag to the service request data according to the current decay time matrix to determine the number of state switches of the service request data, wherein the decay time tag is obtained by adjusting the decay time according to the data state of the service request data, and the data state is either an active state or an inactive state;
determining the mean number of state switches from the numbers of state switches;
when the current iteration or the mean number of state switches meets a preset condition, determining the cold and hot data of the service request data according to the mean number of state switches;
and when the preset condition is not met, taking the next iteration as the new current iteration and returning to the step of acquiring the current decay time matrix, until the preset condition is met.
Preferably, the current decay time matrix is determined as follows:
determining the multiplication coefficient matrix of the current iteration according to the relation between the numbers of state switches and the mean number of state switches in the previous iteration, wherein the multiplication coefficient matrix of the first iteration is an initialized multiplication coefficient matrix;
acquiring service request data from the client, wherein the service request data include different data corresponding to the requesting service departments;
classifying the service request data with a classification algorithm to obtain a probability distribution matrix, wherein each row of the probability distribution matrix corresponds to one type of data and each column to a different requesting service department;
weighting the probability distribution matrix with a probability function to obtain the corresponding weight coefficients;
and multiplying the weight coefficients by the multiplication coefficient matrix to obtain the current decay time matrix, wherein the decay time matrix of the first iteration is obtained by multiplying the initialized multiplication coefficient matrix by the weight coefficients.
Preferably, the multiplication coefficient matrix of the current iteration is determined as follows:
obtaining the multiplication coefficient matrix of the previous iteration;
when the number of state switches is greater than the mean number of state switches, adding a first coefficient to the multiplication coefficient matrix of the previous iteration to obtain the multiplication coefficient matrix of the current iteration;
when the number of state switches equals the mean, using the multiplication coefficient matrix of the previous iteration unchanged as that of the current iteration;
and when the number of state switches is smaller than the mean, multiplying the multiplication coefficient matrix of the previous iteration by a second coefficient to obtain the multiplication coefficient matrix of the current iteration.
Preferably, weighting the probability distribution matrix with a probability function to obtain the corresponding weight coefficients includes:
weighting the probability distribution matrix with a normal distribution probability function to obtain the corresponding weight coefficients.
Preferably, acquiring the service request data of the client within the preset time and adding decay time tags to the service request data according to the current decay time matrix to determine the number of state switches includes:
acquiring the previous service request data and the current service request data, and determining the first service type of the previous request, the second service type of the current request, and the time interval between their acquisition;
looking up the corresponding first decay time and second decay time in the current decay time matrix according to the first and second service types respectively, and marking the service request data as active;
when the time interval is smaller than the first decay time, adjusting the second decay time corresponding to the service request data to determine the current final decay time;
when the time interval is greater than or equal to the first decay time, keeping the second decay time unchanged as the current final decay time, and marking the service request data as inactive;
and recording each change in the data state marked on the service request data as a state switch, and counting the number of state switches of the service request data within the preset time.
Preferably, when the time interval is smaller than the first decay time, adjusting the second decay time corresponding to the service request data includes:
determining whether the first decay time is greater than the second decay time;
if so, setting the second decay time to the first decay time, which becomes the current final decay time;
if not, keeping the second decay time unchanged as the current final decay time.
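The tagging and counting rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the decay time matrix is flattened into a hypothetical dictionary `decay_time` keyed by service type, requests are a hypothetical `Request` record, the first request in the window is simply marked active, the first decay time is looked up directly in the matrix as the text describes, and a state switch is assumed to be counted whenever a request's state differs from the previous request's state.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Request:
    """Hypothetical request record: arrival time (seconds) and service type."""
    time: float
    service_type: str


def tag_and_count(requests: List[Request],
                  decay_time: Dict[str, float]) -> Tuple[List[Tuple[str, float]], int]:
    """Return a (state, final decay time) tag per request and the switch count."""
    tags: List[Tuple[str, float]] = []
    switches = 0
    prev = None
    for req in requests:
        t2 = decay_time[req.service_type]          # second decay time (current request)
        if prev is None:
            state, final = "active", t2            # first request: marked active
        else:
            t0 = req.time - prev.time              # interval since the previous request
            t1 = decay_time[prev.service_type]     # first decay time (previous request)
            if t0 < t1:
                # Arrived before the previous decay time ran out: stay active and
                # raise t2 to t1 when t1 is larger, i.e. keep max(t1, t2).
                state, final = "active", max(t1, t2)
            else:
                # Previous decay time ran out: keep t2 and mark inactive.
                state, final = "inactive", t2
        if tags and tags[-1][0] != state:
            switches += 1                          # state differs from the last request
        tags.append((state, final))
        prev = req
    return tags, switches
```

For example, requests of types a, b, a, a arriving at 0, 5, 50 and 55 s with hypothetical decay times {a: 10, b: 30} are tagged active, active, inactive, active with final decay times 10, 30, 10, 10, giving 2 state switches.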
Preferably, determining the mean number of state switches from the numbers of state switches includes:
determining the number of state switches for each service type corresponding to the service request data;
and summing these numbers of state switches and dividing by the number of service types to obtain the mean number of state switches.
Preferably, the preset condition is that the current iteration reaches the total number of iterations, or that the number of service types whose number of state switches is smaller than the mean is below a threshold set relative to the total number of service types; when the current iteration or the mean number of state switches meets the preset condition, determining the cold and hot data of the service request data according to the mean includes:
determining the service request data whose number of state switches is smaller than the mean as hot data;
and determining the service request data whose number of state switches is greater than or equal to the mean as cold data.
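A minimal sketch of the mean, the preset condition and the hot/cold split, assuming the per-service-type switch counts are already available as a dictionary. The source only says the threshold is set relative to the total number of service types, so `fraction` is a hypothetical parameter: the sketch takes the threshold to be `fraction` times the number of types.

```python
from typing import Dict


def mean_switches(counts: Dict[str, int]) -> float:
    """Mean number of state switches: sum over service types / number of types."""
    return sum(counts.values()) / len(counts)


def preset_condition_met(iteration: int, max_iterations: int,
                         counts: Dict[str, int], fraction: float) -> bool:
    """Stop when the iteration cap is reached, or when fewer than
    fraction * (number of types) service types fall below the mean."""
    mu = mean_switches(counts)
    below = sum(1 for n in counts.values() if n < mu)
    return iteration >= max_iterations or below < fraction * len(counts)


def classify(counts: Dict[str, int]) -> Dict[str, str]:
    """Fewer switches than the mean -> hot; equal or more -> cold."""
    mu = mean_switches(counts)
    return {t: ("hot" if n < mu else "cold") for t, n in counts.items()}
```

With hypothetical counts {a: 2, b: 6, c: 4} the mean is 4.0, so type a is classified as hot while b and c (at or above the mean) are classified as cold.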
To solve the above technical problem, the present invention further provides a device for determining cold and hot data, which is applied to a server and includes:
an acquisition module, configured to acquire the current decay time matrix for the current iteration;
a first determining module, configured to acquire service request data from a client within a preset time and add a decay time tag to the service request data according to the current decay time matrix to determine the number of state switches, wherein the decay time tag is obtained by adjusting the decay time according to the data state of the service request data, and the data state is either active or inactive;
a second determining module, configured to determine the mean number of state switches from the numbers of state switches;
a third determining module, configured to determine the cold and hot data of the service request data according to the mean number of state switches when the current iteration or the mean meets the preset condition;
and a return module, configured to take the next iteration as the new current iteration when the preset condition is not met, and return to the step of acquiring the current decay time matrix until the preset condition is met.
To solve the above technical problem, the present invention further provides a device for determining cold and hot data, including:
a memory for storing a computer program;
and a processor, configured to implement the steps of the above method for determining cold and hot data when executing the computer program.
To solve the above technical problem, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method for determining cold and hot data.
The invention provides a method for determining cold and hot data, applied to a server, comprising: acquiring the current decay time matrix for the current iteration; acquiring service request data from a client within a preset time and adding decay time tags according to the current decay time matrix to determine the number of state switches, wherein the decay time tag is obtained by adjusting the decay time according to the data state (active or inactive) of the service request data; determining the mean number of state switches; when the current iteration or the mean meets a preset condition, determining the cold and hot data of the service request data according to the mean; and otherwise taking the next iteration as the new current iteration and returning to the step of acquiring the current decay time matrix until the preset condition is met. The method obtains service request data at the service level, optimizes the decay times through an iterative algorithm to obtain the number of state switches of the service request data, and uses the mean number of state switches as the threshold. Compared with existing approaches that determine cold and hot data by a simple time threshold or the LRU algorithm, it allocates storage resources according to the decay times required by different services at the service level, so that the determined cold and hot data reflect the individual requests of different service departments and the processing efficiency of the system is improved.
In addition, the invention also provides a device and a medium for determining cold and hot data, which have the same beneficial effects as the method.
Drawings
To illustrate the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a flowchart of a method for determining cold and hot data according to an embodiment of the present invention;
Fig. 2 is a flowchart of another method for determining cold and hot data according to an embodiment of the present invention;
Fig. 3 is a structural diagram of a device for determining cold and hot data according to an embodiment of the present invention;
Fig. 4 is a block diagram of another device for determining cold and hot data according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative work fall within the protection scope of the present invention.
The core of the invention is to provide a method, a device and a medium for determining cold and hot data that allocate storage resources reasonably according to the decay times required by different services at the service level, so that the determined cold and hot data reflect the individual requests of different service departments and the processing efficiency of the system is improved.
In order that those skilled in the art will better understand the disclosure, reference will now be made in detail to the embodiments of the disclosure as illustrated in the accompanying drawings.
It should be noted that the method for determining cold and hot data provided by the present invention mainly addresses the efficient use of storage media of different speeds in distributed storage. Data in the fast storage medium are identified as cold or hot, and cold data are evicted to inexpensive storage devices, so that the cluster uses its storage resources more efficiently and cluster storage efficiency is improved.
Cold data in a storage system generally refers to data that have not been accessed for a long time, and hot data to data that are accessed frequently. These characteristics show that whether data are cold or hot depends on the service requests of the upper layer; since the service requests of different departments differ markedly, personalized cold and hot data are defined according to these differences.
Fig. 1 is a flowchart of a method for determining cold and hot data according to an embodiment of the present invention. The method is applied to a server and, as shown in Fig. 1, includes:
S11: acquiring the current decay time matrix for the current iteration;
S12: acquiring service request data from the client within the preset time, and adding a decay time tag to the service request data according to the current decay time matrix to determine the number of state switches of the service request data;
the decay time tag is obtained by adjusting the decay time according to the data state of the service request data, and the data state is either active or inactive;
S13: determining the mean number of state switches from the numbers of state switches;
S14: determining whether the current iteration or the mean number of state switches meets a preset condition; if so, proceeding to step S15, otherwise proceeding to step S16;
S15: determining the cold and hot data of the service request data according to the mean number of state switches;
S16: taking the next iteration as the new current iteration and returning to step S11 until the preset condition is met.
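Steps S11 to S16 form an outer optimization loop. The skeleton below is a hedged sketch rather than the claimed implementation: `get_decay_matrix` (S11) and `count_switches` (S12) are hypothetical callables standing in for the procedures described later in this description, the default cap of 100 iterations mirrors the example initialization N = 100, and the stop test uses the preset condition stated in the claims with a hypothetical `stop_fraction` threshold.

```python
from typing import Callable, Dict

Counts = Dict[str, int]          # service type -> number of state switches
DecayMatrix = Dict[str, float]   # service type -> decay time (simplified)


def determine_hot_cold(
    get_decay_matrix: Callable[[int, Counts], DecayMatrix],  # S11 (hypothetical)
    count_switches: Callable[[DecayMatrix], Counts],         # S12 (hypothetical)
    max_iterations: int = 100,
    stop_fraction: float = 0.5,
) -> Dict[str, str]:
    # The source initializes every n_i to 0; an empty dict stands in here.
    counts: Counts = {}
    iteration = 1
    while True:
        decay = get_decay_matrix(iteration, counts)       # S11: uses previous counts
        counts = count_switches(decay)                     # S12: must be non-empty
        mu = sum(counts.values()) / len(counts)            # S13: mean over types
        below = sum(1 for n in counts.values() if n < mu)
        # S14: iteration cap reached, or too few types below the mean
        if iteration >= max_iterations or below < stop_fraction * len(counts):
            # S15: below the mean -> hot, otherwise cold
            return {t: ("hot" if n < mu else "cold") for t, n in counts.items()}
        iteration += 1                                      # S16: back to S11
```

Passing the previous iteration's counts into `get_decay_matrix` reflects that the multiplication coefficients of each iteration depend on the switch counts of the one before.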
Specifically, the current decay time matrix of the current iteration is acquired first; because the decay time matrix differs from one iteration to the next, the matrix corresponding to the current iteration must be used. As a preferred embodiment, the current decay time matrix is determined as follows:
determining the multiplication coefficient matrix of the current iteration according to the relation between the numbers of state switches and the mean number of state switches in the previous iteration, wherein the multiplication coefficient matrix of the first iteration is an initialized multiplication coefficient matrix;
acquiring service request data from the client, wherein the service request data include different data corresponding to the requesting service departments;
classifying the service request data with a classification algorithm to obtain a probability distribution matrix, wherein each row of the probability distribution matrix corresponds to one type of data and each column to a different requesting service department;
weighting the probability distribution matrix with a probability function to obtain the corresponding weight coefficients;
and multiplying the weight coefficients by the multiplication coefficient matrix to obtain the current decay time matrix, wherein the decay time matrix of the first iteration is obtained by multiplying the initialized multiplication coefficient matrix by the weight coefficients.
Because each iteration of the decay time matrix depends on the parameters of the previous iteration, the multiplication coefficient matrix of the current iteration is determined from the relation between the numbers of state switches and the mean number of state switches in the previous iteration. The multiplication coefficient matrix is updated at every iteration according to this relation; accordingly, determining the multiplication coefficient matrix of the current iteration specifically includes:
obtaining the multiplication coefficient matrix of the previous iteration;
when the number of state switches is greater than the mean number of state switches, adding a first coefficient to the multiplication coefficient matrix of the previous iteration to obtain the multiplication coefficient matrix of the current iteration;
when the number of state switches equals the mean, using the multiplication coefficient matrix of the previous iteration unchanged as that of the current iteration;
and when the number of state switches is smaller than the mean, multiplying the multiplication coefficient matrix of the previous iteration by the second coefficient to obtain the multiplication coefficient matrix of the current iteration.
Specifically, the updated multiplication coefficient matrix K is obtained from the following relation:
$$K=\begin{cases}k_i + c_1, & n_i > \mu\\ k_i, & n_i = \mu\\ k_i \times c_2, & n_i < \mu\end{cases}$$
where k_i is the multiplication coefficient matrix obtained in the previous iteration, n_i is the number of state switches, μ is the mean number of state switches, and c_1 and c_2 denote the first and second coefficients; the multiplication coefficient matrix of the current iteration is selected according to how n_i compares with μ.
Both the number of state switches and the mean number of state switches are the values from the previous iteration.
The first coefficient and the second coefficient may be set according to actual conditions; they are not limited here and may be equal or different.
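The update rule can be written as a short function. This sketch assumes, as a simplification, that the multiplication coefficient matrix holds one entry per service type (stored in a dictionary) and that `c1` and `c2` are the first and second coefficients; the example values 0.5 and 0.8 are hypothetical, since the source leaves both coefficients to be set according to actual conditions.

```python
from typing import Dict


def update_multipliers(k_prev: Dict[str, float], counts: Dict[str, int],
                       mu: float, c1: float, c2: float) -> Dict[str, float]:
    """Per-service-type multiplier update using the previous iteration's values."""
    k_new: Dict[str, float] = {}
    for t, k in k_prev.items():
        n = counts[t]            # number of state switches in the previous iteration
        if n > mu:
            k_new[t] = k + c1    # more switches than the mean: add the first coefficient
        elif n < mu:
            k_new[t] = k * c2    # fewer switches than the mean: multiply by the second
        else:
            k_new[t] = k         # equal to the mean: keep unchanged
    return k_new
```

With all multipliers starting at 1.0, counts {a: 5, b: 1, c: 3}, μ = 3, c1 = 0.5 and c2 = 0.8, the updated multipliers are 1.5, 0.8 and 1.0.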
Service request data are acquired from the client, the service request data including different data corresponding to the requesting service departments. Existing processes for determining cold and hot data partition large amounts of raw data without considering the characteristics of different service requirements; here, by contrast, the data are divided according to the requests corresponding to different service requirements at the service layer, and the access frequency of the underlying data is counted according to the service requests of the client's different departments. For example, there may be n_q types of service request. It should be noted that a classification model needs to be trained for the types of data that may appear in the services of different departments.
Accordingly, the classification algorithm may be a traditional classification algorithm or a deep-learning neural network classification model that yields the probability that the data belong to each class. The resulting probability distribution matrix P_A is given below; each row corresponds to one type of data and each column to a different requesting service department, that is, each row gives the probabilities that one type of data belongs to the different services:
[Formula image: probability distribution matrix P_A]
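The source obtains P_A from a trained classifier. As a stand-in that needs no model, the sketch below estimates each row empirically from hypothetical (data type, department) observation pairs: entry (j, k) is the fraction of type-j requests issued by department k, so each non-empty row sums to 1. This substitutes simple frequency counting for the classification step and is for illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def probability_matrix(observations: Iterable[Tuple[str, str]],
                       data_types: List[str],
                       departments: List[str]) -> List[List[float]]:
    """Rows: data types; columns: departments; entries: empirical probabilities."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for data_type, dept in observations:
        counts[data_type][dept] += 1
    matrix: List[List[float]] = []
    for dt in data_types:
        total = sum(counts[dt].values())
        if total == 0:
            matrix.append([0.0] * len(departments))   # type never observed
        else:
            # Counter returns 0 for departments that never requested this type
            matrix.append([counts[dt][d] / total for d in departments])
    return matrix
```

For instance, if "log" data were requested twice by "ops" and once by "dev", the "log" row is [2/3, 1/3] for the columns (ops, dev).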
The probability distribution matrix is weighted by a probability function to obtain the corresponding weight coefficients. The probability function is not limited here; it may be a normal distribution function or another probability distribution function, as long as it expresses in functional form the probability of each value of a discrete variable. As a preferred embodiment, weighting the probability distribution matrix with a probability function to obtain the corresponding weight coefficients includes:
weighting the probability distribution matrix with a normal distribution probability function to obtain the corresponding weight coefficients.
Specifically, the normal distribution probability function is set with μ = 0.5 and σ = 0.17, and the weight coefficient B_A obtained from it is given by:
[Formula image: weight coefficient B_A as a function of P_A]
where P_A is the probability distribution matrix.
The weight coefficients lie in the range (0, 1). Multiplying the weight coefficient by the multiplication coefficient matrix K_i gives the current decay time matrix T_Ai:
$$T_{Ai} = K_i \times B_A$$
In addition, the decay time matrix of the first iteration is obtained by multiplying the initialized multiplication coefficient matrix by the weight coefficient:
$$T_A = K \times B_A$$
[Formula image: expanded form of T_A]
where T_A is the initialized decay time matrix, K is the initialized multiplication coefficient matrix, and B_A is the weight coefficient.
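The weighting and multiplication can be sketched as below. The exact formula for B_A survives only as an image, so the kernel here is an assumption: it uses the unnormalized Gaussian exp(-(p - μ)² / (2σ²)) with the stated μ = 0.5 and σ = 0.17, which keeps every weight in (0, 1] and so matches the stated range (the normalized density with σ = 0.17 would peak at about 2.35). The sketch also assumes that × denotes an element-wise product of same-shaped matrices; any K values passed in are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

MU, SIGMA = 0.5, 0.17   # normal-distribution parameters given in the source


def weight(p: float, mu: float = MU, sigma: float = SIGMA) -> float:
    # ASSUMPTION: unnormalized Gaussian kernel, so the weight lies in (0, 1]
    return math.exp(-((p - mu) ** 2) / (2 * sigma ** 2))


def weight_matrix(P: Matrix) -> Matrix:
    """B_A: apply the weight function to every entry of P_A."""
    return [[weight(p) for p in row] for row in P]


def decay_matrix(K: Matrix, B: Matrix) -> Matrix:
    """T = K x B, ASSUMED to be an element-wise product of same-shaped matrices."""
    return [[k * b for k, b in zip(k_row, b_row)] for k_row, b_row in zip(K, B)]
```

Under this kernel a probability of 0.5 receives the full weight 1, while probabilities of 0 or 1 are weighted down to about 0.013, so the decay time of a data type shrinks the more clearly it belongs (or does not belong) to a single department.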
For example, in the first iteration the multiplication coefficient matrix is initialized as:
[Formula image: initialized multiplication coefficient matrix K]
The decay time matrix T_A is then initialized, the maximum number of optimization iterations is set to N = 100, the current iteration is set to i = 1, and the number of state switches of each type of data is set to n_i = 0.
Using the current decay time matrix obtained in step S11, the service request data of the client within the preset time are acquired. The same data are requested repeatedly within the preset time; for each request, the state of the data is switched according to the computed decay time and the number of state switches is recorded, so that the mean number of state switches over the different types of data can be calculated.
Through the iterative algorithm, each item of service request data is looked up in the current decay time matrix by request type and data type to obtain its decay time, so that a decay time is obtained for every receipt of the same service request data within the preset time. Note that receipts of the same data may carry the same or different service types, so the decay time corresponding to each receipt may be the same or different. The data state of the service request data is determined by comparing the interval between receipts with the corresponding decay time, and the decay time is adjusted according to the decay times involved and the state changes in order to determine the number of state switches.
The decay time tag is added according to the service type of the service request data to determine its number of state switches. The data state is either active or inactive: data are active when the same data are received again before the decay time of the currently received data has expired, and inactive when the decay time of the currently received data expires before the same data are received again.
As an embodiment, acquiring service request data of the client within the preset time, and adding a decay time tag to the service request data according to the current decay time matrix to determine its number of state switches, includes:
acquiring the previous and the current service request data, and determining the first service type of the previous request, the second service type of the current request, and the time interval between the two acquisitions;
looking up the corresponding first decay time and second decay time in the current decay time matrix according to the first and second service types respectively, and marking the service request data as active;
when the time interval is smaller than the first decay time, adjusting the second decay time corresponding to the service request data to determine the current final decay time;
when the time interval is greater than or equal to the first decay time, keeping the second decay time unchanged as the current final decay time, and marking the service request data as inactive;
and recording each change of the data state marked on the service request data as a state switch, and counting the number of state switches of the service request data within the preset time.
Specifically, the previous and current service request data are acquired, and the corresponding first and second service types and the interval t0 between the two receptions are determined. The first decay time t1 and the second decay time t2 are looked up in the current decay time matrix according to the first and second service types, and the service request data is marked as active. The first and second service types may be the same or different; if they are the same, the first and second decay times are equal.
It is then judged whether the interval t0 is smaller than the first decay time. If so, the same kind of data was received again before the first decay time expired; the data has remained active throughout, so its state does not change. If t0 is greater than or equal to the first decay time, the same kind of data was received again only after the first decay time expired: the data was active while the first decay time ran and is marked inactive once it expires, which is one switch; when the same kind of data is then received again, it changes from inactive back to active, which is a second switch. Every state change is recorded so that the number of state switches of the same kind of data within the preset time can be counted: whenever the data state changes, the switch count n_i of that type is incremented by one.
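The counting rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the arrivals of one kind of data are assumed to be time-sorted `(timestamp, service_type)` pairs, `decay_of` is a hypothetical lookup into the current decay time matrix, the decay timer is assumed to restart at each arrival with the current final decay time, and the decay time is adjusted with the t3 = max(t1, t2) rule of the embodiment that follows. The optional `window_end` handling of a trailing expiration is also an assumption.

```python
from collections.abc import Callable, Sequence
from typing import Optional


def count_state_switches(
    arrivals: Sequence[tuple[float, str]],
    decay_of: Callable[[str], float],
    window_end: Optional[float] = None,
) -> int:
    """Count active/inactive switches n_i for one kind of data.

    arrivals: (timestamp, service_type) pairs sorted by timestamp.
    decay_of: hypothetical lookup of a service type's decay time.
    window_end: optional end of the preset time window (assumption).
    """
    if not arrivals:
        return 0
    switches = 0
    last_time, first_type = arrivals[0]
    current_decay = decay_of(first_type)  # first request: marked active
    for time, service_type in arrivals[1:]:
        interval = time - last_time          # t0
        new_decay = decay_of(service_type)   # t2
        if interval < current_decay:         # t0 < t1: still active, no switch
            current_decay = max(current_decay, new_decay)  # t3
        else:
            # Expired: active -> inactive, then inactive -> active on arrival.
            switches += 2
            current_decay = new_decay        # t3 = t2, kept unchanged
        last_time = time
    # Assumption: an expiry before the window closes counts as one more switch.
    if window_end is not None and window_end - last_time >= current_decay:
        switches += 1
    return switches


decay = {"query": 5.0, "report": 2.0}
arrivals = [(0.0, "query"), (3.0, "report"), (10.0, "report"), (11.0, "query")]
print(count_state_switches(arrivals, decay.__getitem__))  # 2
```

In this example only the gap from 3.0 to 10.0 exceeds the decay time then in force (5.0), so it contributes the two switches.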
The mean number of state switches is then determined from the state switch counts. Different service types of the same kind of data generally have different switch counts, and the mean is obtained by aggregating the switch counts of all service types counted within the preset time. As an embodiment, determining the mean number of state switches according to the state switch counts includes:
determining the number of state switches for each of the service types corresponding to the service request data;
and summing these numbers of state switches and dividing the sum by the number of service types to obtain the mean number of state switches.
Specifically, the mean number of state switches over all types is given by:

$$\mu = \frac{1}{n_q}\sum_{i=1}^{n_q} n_i$$

where n_i is the number of data state switches of the i-th type, n_q is the total number of service types, and μ is the mean number of state switches.
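As a worked illustration of the formula, the sketch below computes μ from per-type switch counts. The dictionary keyed by service type is an assumed representation, and the counts are made up for the example.

```python
from collections.abc import Mapping


def mean_switch_count(switch_counts: Mapping[str, int]) -> float:
    """Return mu = (n_1 + ... + n_q) / n_q over all service types."""
    n_q = len(switch_counts)  # total number of service types
    if n_q == 0:
        raise ValueError("at least one service type is required")
    return sum(switch_counts.values()) / n_q


# Three service types with 2, 4 and 9 switches give mu = 15 / 3 = 5.0.
print(mean_switch_count({"a": 2, "b": 4, "c": 9}))
```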
Next, it is judged whether the current iteration count or the mean number of state switches satisfies the preset condition. If so, the mean number of state switches of the current iteration is used as the final threshold for determining cold and hot data; if not, the iterative calculation continues until the preset condition is satisfied, at which point the iteration terminates.
Correspondingly, the preset condition is defined on the current iteration count and on the mean number of state switches, and the two are in an OR relationship: the condition is satisfied as soon as either parameter meets its criterion.
The method for determining cold and hot data provided by the embodiment of the invention is applied to a server and comprises the following steps: acquiring the current decay time matrix for the current iteration; acquiring service request data of a client within a preset time, and adding a decay time tag to the service request data according to the current decay time matrix to determine its number of state switches, where the decay time tag is obtained by adjusting the decay time according to the data state of the service request data, and the data state is either active or inactive; determining the mean number of state switches from the state switch counts; when the current iteration count or the mean number of state switches satisfies a preset condition, determining the cold and hot data of the service request data according to the mean; and when neither satisfies the preset condition, taking the next iteration as the new current iteration and returning to the step of acquiring the current decay time matrix until the preset condition is satisfied. The method obtains service request data from the service layer, optimizes the decay times through an iterative algorithm to obtain the state switch counts of the service request data, and uses the mean of these counts as the threshold. Compared with existing approaches that determine cold and hot data with a simple time threshold or the LRU algorithm, the method allocates storage resources according to the decay times that the different services of the service layer require, so that the resulting classification reflects the individual request patterns of different service departments and the processing efficiency of the system improves.
On the basis of the foregoing embodiment, when the time interval is smaller than the first decay time, adjusting the second decay time corresponding to the service request data includes:
judging whether the first decay time is greater than the second decay time;
if so, setting the second decay time to the first decay time as the current final decay time;
if not, keeping the second decay time unchanged as the current final decay time.
Specifically, when the time interval is smaller than the first decay time, the decay time is adjusted in order to find a better number of state switches. In this embodiment, the adjustment strategy is determined by comparing the first decay time with the second decay time: when the first decay time is greater than the second, the second decay time is replaced by the first decay time as the current final decay time; when the first decay time is less than or equal to the second, the second decay time is kept unchanged. Alternatively, the adjustment strategy may combine the first and second decay times into the final decay time, or take their average as the final decay time; the strategy may be set according to the actual situation and is not limited here.
Note that the decay time is adjusted only by comparing the service request data of two consecutive acquisitions.
For example, suppose the upper layer runs the related services during a period T_total. Decay time tags are attached to the data according to the upper-layer service requests, and the number of state switches is calculated from the decay times as follows:
1. After a service request is received for the first time, the decay time t1 is looked up according to the request type and the data type, and the data is marked as active;
2. When the service request is received again after an interval t0, the decay time t2 is looked up according to the request type and the data type. If t0 < t1, the data decay time is adjusted as follows:

$$t_3 = \begin{cases} t_1, & t_1 > t_2 \\ t_2, & t_1 \le t_2 \end{cases}$$

where t3 is the final decay time.
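When t0 < t1 the rule reduces to t3 = max(t1, t2); otherwise t2 is kept. A small sketch of this selection, with hypothetical function and parameter names:

```python
def final_decay_time(t0: float, t1: float, t2: float) -> float:
    """Return the final decay time t3 for the current request.

    t0: interval since the previous request of the same kind
    t1: first decay time (previous request)
    t2: second decay time (current request, looked up in the matrix)
    """
    if t0 < t1 and t1 > t2:
        # Still active and the previous decay time is longer: inherit t1.
        return t1
    # Otherwise the looked-up decay time t2 is kept unchanged.
    return t2
```

The text also permits other strategies, such as averaging; replacing `return t1` with `return (t1 + t2) / 2` would be one such variant.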
The decay time adjustment strategy provided by the embodiment of the invention makes it easier to find a better number of state switches in subsequent iterations.
On the basis of the above embodiment, the preset condition is that the current iteration count reaches the total number of iterations, or that the number of service types whose state switch count is smaller than the mean is less than a threshold proportion of the total number of service types. When the current iteration count or the mean number of state switches satisfies the preset condition, determining the cold and hot data of the service request data according to the mean includes:
determining the service request data whose number of state switches is smaller than the mean number of state switches as hot data;
and determining the service request data whose number of state switches is greater than or equal to the mean number of state switches as cold data.
Note that the two parameters in the preset condition are in an OR relationship: as soon as either one meets its criterion, the preset condition is satisfied, the iteration ends, and the mean number of state switches of the current iteration is used as the threshold for separating cold and hot data.
Here, the second criterion requires that the number of service types with n_i < μ be smaller than a threshold proportion of the total number of service types. For example, if 2 service types satisfy n_i < μ, there are 20 types in total, and the threshold is 20%, then since 2 < 20 × 20% = 4, the iterative algorithm can be exited. The threshold proportion may be set according to actual conditions and is not specifically limited.
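The termination check can be sketched as follows. The function and the `ratio` parameter name are hypothetical; the 20% default comes from the example above, and the counts are invented so that exactly 2 of 20 types fall below μ.

```python
from collections.abc import Mapping


def should_stop(
    iteration: int,
    max_iterations: int,
    switch_counts: Mapping[str, int],
    ratio: float = 0.2,
) -> bool:
    """Preset condition: iteration budget reached OR few types below the mean."""
    if not switch_counts:
        raise ValueError("at least one service type is required")
    mu = sum(switch_counts.values()) / len(switch_counts)
    below_mean = sum(1 for n in switch_counts.values() if n < mu)
    return iteration >= max_iterations or below_mean < ratio * len(switch_counts)


# 2 types with 0 switches and 18 types with 10 switches: mu = 9, 2 types below it.
counts = {f"type{k}": 0 for k in range(2)} | {f"type{k}": 10 for k in range(2, 20)}
print(should_stop(iteration=1, max_iterations=100, switch_counts=counts))  # True: 2 < 4
```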
According to the switch count of each type of data, data whose switch count is smaller than the mean is classified as hot data, and data whose switch count is greater than or equal to the mean is classified as cold data. Compared with existing approaches based on a simple time threshold or the LRU algorithm, using the mean switch count as the screening threshold allows the resulting cold and hot data to reflect the individual request patterns of different service departments.
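A minimal sketch of this classification, assuming the switch counts are kept in a dictionary keyed by data type (the names and numbers are illustrative only):

```python
from collections.abc import Mapping


def classify_hot_cold(switch_counts: Mapping[str, int]) -> tuple[set[str], set[str]]:
    """Split data types into (hot, cold) with the mean switch count as threshold."""
    if not switch_counts:
        raise ValueError("at least one data type is required")
    mu = sum(switch_counts.values()) / len(switch_counts)
    hot = {name for name, n in switch_counts.items() if n < mu}    # below the mean
    cold = {name for name, n in switch_counts.items() if n >= mu}  # at or above it
    return hot, cold


hot, cold = classify_hot_cold({"orders": 1, "logs": 9, "users": 2})  # mu = 4.0
print(sorted(hot), sorted(cold))  # ['orders', 'users'] ['logs']
```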
Fig. 2 is a flowchart of another method for determining cold and hot data according to an embodiment of the present invention. As shown in Fig. 2, the method includes:
S21: obtaining a probability distribution matrix through a classification algorithm;
S22: initializing parameters;
S23: running the upper-layer services;
S24: updating the data decay time;
S25: counting the number of data state switches;
S26: optimizing the multiplication coefficient matrix;
S27: judging whether the current iteration satisfies the iteration termination condition; if so, proceeding to step S28; if not, returning to step S23;
S28: classifying the data as cold or hot using the obtained threshold.
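The flow of Fig. 2 can be summarized as an iterative loop. The sketch below is an illustration built on several assumptions, not the patented implementation: the weight coefficients (which, per claim 2, come from the probability distribution matrix of S21) are passed in ready-made as one value per data category; the decay time is taken as the element-wise product of weight and multiplication coefficient; `run_services` is a hypothetical hook that runs the upper-layer services with the given decay times and returns per-category switch counts (S23 to S25); the initial coefficients are all 1.0; and the coefficient update follows claim 3 with placeholder values for the first (additive) and second (multiplicative) coefficients.

```python
from collections.abc import Callable, Mapping


def update_coefficients(coeffs: Mapping[str, float], counts: Mapping[str, int],
                        mu: float, first_coeff: float, second_coeff: float) -> dict[str, float]:
    """S26: add first_coeff above the mean, multiply by second_coeff below it."""
    updated = {}
    for key, c in coeffs.items():
        if counts[key] > mu:
            updated[key] = c + first_coeff
        elif counts[key] < mu:
            updated[key] = c * second_coeff
        else:
            updated[key] = c  # equal to the mean: unchanged
    return updated


def find_threshold(
    weights: Mapping[str, float],
    run_services: Callable[[dict[str, float]], dict[str, int]],
    max_iterations: int = 100,
    first_coeff: float = 0.5,   # placeholder value
    second_coeff: float = 0.9,  # placeholder value
    ratio: float = 0.2,
) -> tuple[float, dict[str, int]]:
    """Iterate S23-S27 and return (mean switch count, last switch counts)."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    coeffs = {key: 1.0 for key in weights}  # S22: assumed initial coefficients
    iteration = 1
    while True:
        decay = {key: weights[key] * coeffs[key] for key in weights}  # S24
        counts = run_services(decay)  # S23 + S25
        mu = sum(counts.values()) / len(counts)
        coeffs = update_coefficients(coeffs, counts, mu, first_coeff, second_coeff)  # S26
        below_mean = sum(1 for n in counts.values() if n < mu)
        if iteration >= max_iterations or below_mean < ratio * len(counts):  # S27
            return mu, counts  # S28 uses mu as the cold/hot threshold
        iteration += 1


def fake_services(decay: dict[str, float]) -> dict[str, int]:
    # Hypothetical stand-in: longer decay times give fewer switches.
    return {key: int(10 / d) for key, d in decay.items()}


mu, counts = find_threshold({"a": 1.0, "b": 2.0, "c": 5.0}, fake_services, max_iterations=10)
print(mu, counts)
```

Passing `run_services` as a parameter keeps the loop independent of how requests are actually replayed and counted.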
For the other method for determining cold and hot data provided by the present invention, refer to the method embodiment above; the details are not repeated here, and it has the same beneficial effects as the method for determining cold and hot data.
Building on the detailed description of the embodiments of the method for determining cold and hot data, the present invention further discloses a corresponding device for determining cold and hot data. Fig. 3 is a structural diagram of a device for determining cold and hot data according to an embodiment of the present invention. As shown in Fig. 3, the device for determining cold and hot data includes:
an obtaining module 11, configured to obtain the current decay time matrix for the current iteration;
a first determining module 12, configured to acquire service request data of a client within a preset time and to add a decay time tag to the service request data according to the current decay time matrix to determine its number of state switches, where the decay time tag is obtained by adjusting the decay time according to the data state of the service request data, and the data state is either active or inactive;
a second determining module 13, configured to determine the mean number of state switches from the state switch counts;
a third determining module 14, configured to determine the cold and hot data of the service request data according to the mean number of state switches when the current iteration count or the mean satisfies a preset condition;
and a returning module 15, configured to, when neither the current iteration count nor the mean satisfies the preset condition, take the next iteration as the new current iteration and return to the step of obtaining the current decay time matrix until the preset condition is satisfied.
Since the apparatus embodiment corresponds to the method embodiment, refer to the description of the method embodiment for details; they are not repeated here, and the device has the same beneficial effects as the method for determining cold and hot data.
Fig. 4 is a structural diagram of another device for determining cold and hot data according to an embodiment of the present invention. As shown in Fig. 4, the device includes:
a memory 21 for storing a computer program;
and a processor 22 for implementing the steps of the method for determining cold and hot data when executing the computer program.
The device for determining the hot and cold data provided by the embodiment may include, but is not limited to, a smart phone, a tablet computer, a notebook computer, or a desktop computer.
The processor 22 may include one or more processing cores, such as a 4-core or an 8-core processor. The processor 22 may be implemented in hardware as at least one of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), and a Programmable Logic Array (PLA). The processor 22 may also include a main processor and a coprocessor: the main processor, also called a Central Processing Unit (CPU), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 22 may be integrated with a Graphics Processing Unit (GPU), which renders and draws the content to be shown on the display screen. In some embodiments, the processor 22 may also include an Artificial Intelligence (AI) processor for computational operations related to machine learning.
The memory 21 may include one or more computer-readable storage media, which may be non-transitory. The memory 21 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, the memory 21 stores at least a computer program 211 which, after being loaded and executed by the processor 22, implements the relevant steps of the method for determining cold and hot data disclosed in any of the foregoing embodiments. The resources stored in the memory 21 may also include an operating system 212, data 213, and the like, stored either transiently or permanently. The operating system 212 may include Windows, Unix, Linux, and the like. The data 213 may include, but is not limited to, data related to determining cold and hot data.
In some embodiments, the device for determining the hot and cold data may further include a display 23, an input/output interface 24, a communication interface 25, a power supply 26, and a communication bus 27.
Those skilled in the art will appreciate that the structure shown in Fig. 4 does not limit the device for determining cold and hot data, which may include more or fewer components than those shown.
The processor 22 calls the instructions stored in the memory 21 to implement the method for determining the hot and cold data provided by any of the above embodiments.
For details of the device for determining cold and hot data provided by the present invention, refer to the method embodiment above; they are not repeated here, and the device has the same beneficial effects as the method for determining cold and hot data.
Further, the present invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by the processor 22, implements the steps of the method for determining cold and hot data as described above.
It is understood that, if the method in the above embodiments is implemented as software functional units and sold or used as a stand-alone product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or all or part of it, may be embodied as a software product stored in a storage medium that performs all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
For details of the computer-readable storage medium provided by the present invention, refer to the method embodiments above; they are not repeated here, and the medium has the same beneficial effects as the method for determining cold and hot data.
The method, device, and medium for determining cold and hot data according to the present invention have been described in detail above. The embodiments in this specification are described progressively: each focuses on its differences from the others, and the same or similar parts can be cross-referenced. Since the device disclosed in the embodiments corresponds to the disclosed method, it is described relatively briefly; for relevant details, refer to the description of the method. It should be noted that those skilled in the art can make various improvements and modifications to the present invention without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present invention.
It should also be noted that, in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

Claims (11)

1. A method for determining cold and hot data, applied to a server, comprising the following steps:
acquiring a current decay time matrix for the current iteration;
acquiring service request data of a client within a preset time, and adding a decay time tag to the service request data according to the current decay time matrix to determine the number of state switches of the service request data, wherein the decay time tag is obtained by adjusting the decay time according to the data state of the service request data, and the data state comprises an active state and an inactive state;
determining a mean number of state switches according to the numbers of state switches;
when the current iteration count or the mean number of state switches satisfies a preset condition, determining cold and hot data of the service request data according to the mean number of state switches;
and when the current iteration count or the mean number of state switches does not satisfy the preset condition, taking the next iteration as the new current iteration, and returning to the step of acquiring the current decay time matrix for the current iteration until the preset condition is satisfied.
2. The method for determining cold and hot data according to claim 1, wherein determining the current decay time matrix comprises:
determining a multiplication coefficient matrix for the current iteration according to the relation between the mean number of state switches of the previous iteration and the numbers of state switches, wherein the multiplication coefficient matrix of the first iteration is an initialized multiplication coefficient matrix;
acquiring each piece of service request data of the client, wherein the service request data comprises different data corresponding to each service request department;
classifying the service request data based on a classification algorithm to obtain a probability distribution matrix, wherein each row of the probability distribution matrix corresponds to one type of data and each column to a different service request department;
performing weight processing on the probability distribution matrix through a probability function to obtain corresponding weight coefficients;
and multiplying the weight coefficients by the multiplication coefficient matrix to obtain the current decay time matrix, wherein the decay time matrix of the first iteration is obtained by multiplying the initialized multiplication coefficient matrix by the weight coefficients.
3. The method for determining cold and hot data according to claim 2, wherein determining the multiplication coefficient matrix for the current iteration specifically comprises:
obtaining the multiplication coefficient matrix of the previous iteration;
when the number of state switches is greater than the mean number of state switches, adding a first coefficient to the multiplication coefficient matrix of the previous iteration to obtain the multiplication coefficient matrix of the current iteration;
when the number of state switches is equal to the mean number of state switches, using the multiplication coefficient matrix of the previous iteration as the multiplication coefficient matrix of the current iteration;
and when the number of state switches is smaller than the mean number of state switches, multiplying the multiplication coefficient matrix of the previous iteration by a second coefficient to obtain the multiplication coefficient matrix of the current iteration.
4. The method for determining cold and hot data according to claim 3, wherein performing weight processing on the probability distribution matrix through the probability function to obtain the corresponding weight coefficients comprises:
performing weight processing on the probability distribution matrix through a normal distribution probability function to obtain the corresponding weight coefficients.
5. The method for determining cold and hot data according to claim 4, wherein acquiring service request data of a client within a preset time and adding a decay time tag to the service request data according to the current decay time matrix to determine the number of state switches of the service request data comprises:
acquiring the previous service request data and the current service request data, and determining a first service type corresponding to the previously acquired service request data, a second service type corresponding to the currently acquired service request data, and the time interval between the two acquisitions;
looking up a corresponding first decay time and second decay time in the current decay time matrix according to the first service type and the second service type respectively, and marking the service request data as the active state;
when the time interval is smaller than the first decay time, adjusting the second decay time corresponding to the service request data to determine a current final decay time;
when the time interval is greater than or equal to the first decay time, keeping the second decay time unchanged as the current final decay time, and marking the service request data as the inactive state;
and recording each change of the data state marked on the service request data as a state switch, and counting the number of state switches of the service request data within the preset time.
6. The method of claim 5, wherein when the time interval is smaller than the first decay time, adjusting the second decay time corresponding to the service request data comprises:
judging whether the first decay time is larger than the second decay time;
if so, adjusting the second decay time to the first decay time as the current final decay time;
if not, the second decay time is kept unchanged to be used as the current final decay time.
7. The method for determining cold and hot data according to claim 5, wherein determining the mean number of state switches according to the numbers of state switches comprises:
determining the number of state switches for each of the service types corresponding to the service request data;
and summing the numbers of state switches and dividing the sum by the number of service types to obtain the mean number of state switches.
8. The method according to any one of claims 1 to 7, wherein the preset condition is that the current iteration count reaches the total number of iterations, or that the number of service types whose number of state switches is smaller than the mean number of state switches is less than a threshold of the total number of service types; and when the current iteration count or the mean number of state switches satisfies the preset condition, determining the cold and hot data of the service request data according to the mean number of state switches comprises:
determining the service request data whose number of state switches is smaller than the mean number of state switches as hot data;
and determining the service request data whose number of state switches is greater than or equal to the mean number of state switches as cold data.
9. A device for determining cold and hot data, applied to a server side, comprising:
an acquisition module, configured to acquire a current decay time matrix for the current iteration;
a first determining module, configured to acquire service request data of a client within a preset time, and to add a decay time tag to the service request data according to the current decay time matrix to determine the number of state switches of the service request data, wherein the decay time tag is obtained by adjusting the decay time according to the data state of the service request data, and the data state comprises an active state and an inactive state;
a second determining module, configured to determine the mean number of state switches according to the numbers of state switches;
a third determining module, configured to determine cold and hot data of the service request data according to the mean number of state switches when the current iteration count or the mean number of state switches satisfies a preset condition;
and a return module, configured to, when the current iteration count or the mean number of state switches does not satisfy the preset condition, take the next iteration as the new current iteration and return to the step of acquiring the current decay time matrix for the current iteration until the preset condition is satisfied.
10. A device for determining cold and hot data, comprising:
a memory for storing a computer program;
a processor, configured to implement the steps of the method for determining cold and hot data according to any one of claims 1 to 8 when executing the computer program.
11. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the method for determining cold data and hot data according to any one of claims 1 to 8.
CN202211293452.4A 2022-10-21 2022-10-21 Method, device and medium for determining cold and hot data Pending CN115480713A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211293452.4A CN115480713A (en) 2022-10-21 2022-10-21 Method, device and medium for determining cold and hot data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211293452.4A CN115480713A (en) 2022-10-21 2022-10-21 Method, device and medium for determining cold and hot data

Publications (1)

Publication Number Publication Date
CN115480713A true CN115480713A (en) 2022-12-16

Family

ID=84395349

Country Status (1)

Country Link
CN (1) CN115480713A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116301667A (en) * 2023-05-24 2023-06-23 山东浪潮科学研究院有限公司 Database system, data access method, device, equipment and storage medium
CN116301667B (en) * 2023-05-24 2023-09-01 山东浪潮科学研究院有限公司 Database system, data access method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109002358B (en) Mobile terminal software self-adaptive optimization scheduling method based on deep reinforcement learning
EP3907609A1 (en) Improved quantum ant colony algorithm-based spark platform task scheduling method
CN105205014B (en) A kind of date storage method and device
CN102801792A (en) Statistical-prediction-based automatic cloud CDN (Content Delivery Network) resource automatic deployment method
JP2012118987A (en) Computer implementation method, computer program, and system for memory usage query governor (memory usage query governor)
CN112866136B (en) Service data processing method and device
CN111338801B (en) Subtree migration method and device for realizing metadata load balance
CN106802772A (en) The method of data record, device and solid state hard disc
CN108491255B (en) Self-service MapReduce data optimal distribution method and system
WO2014138234A1 (en) Demand determination for data blocks
CN114064284B (en) Cloud server resource allocation method and device, electronic equipment and medium
US20190332531A1 (en) Storage management method, electronic device and computer program product
CN115480713A (en) Method, device and medium for determining cold and hot data
CN110347477B (en) Service self-adaptive deployment method and device in cloud environment
CN116700634B (en) Garbage recycling method and device for distributed storage system and distributed storage system
CN112073327B (en) Anti-congestion software distribution method, device and storage medium
CN105407383A (en) Multi-version video-on-demand streaming media server cluster resource prediction method
CN110162272B (en) Memory computing cache management method and device
CN110865798B (en) Thread pool optimization method and system
CN115499513A (en) Data request processing method and device, computer equipment and storage medium
CN112269721B (en) Method, system, equipment and readable storage medium for performance data statistics
CN115203072A (en) File pre-reading cache allocation method and device based on access heat
CN114968073A (en) Data prefetching method, equipment and system
CN113434286A (en) Energy efficiency optimization method suitable for mobile application processor
CN113296934A (en) Method and device for scheduling process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination