CN117493400A - Data processing method and device and electronic equipment - Google Patents

Publication number
CN117493400A
Authority
CN
China
Prior art keywords
data
warm
area
memory
hot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410001639.5A
Other languages
Chinese (zh)
Other versions
CN117493400B (en)
Inventor
蒋忠强
王宝晗
时家幸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Suzhou Software Technology Co Ltd
Priority to CN202410001639.5A
Publication of CN117493400A
Application granted
Publication of CN117493400B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method, a data processing apparatus, and an electronic device, and relates to the technical field of databases. The method comprises: dividing a memory space into a hot zone and a warm zone; distinguishing data based on tag information to obtain warm-zone data and hot-zone data, where the tag information includes an indication field that indicates the zone in which the data currently resides; obtaining the amount of memory currently used in the memory space; and, based on the amount of memory currently used, eliminating hot-zone data (demoting it to the warm zone) and/or evicting warm-zone data. The method layers the memory space and applies a different elimination algorithm in each zone: depending on the amount of memory used, hot-zone data is demoted to the warm zone and warm-zone data is evicted, and a zone change is made by modifying the tag information. This effectively addresses limitations that arise in some scenarios, for example cold data entering the non-eviction zone because of periodic cyclic data access, and hot data being evicted because of bursty data access or other causes.

Description

Data processing method and device and electronic equipment
Technical Field
The present disclosure relates to the field of database technologies, and in particular, to a data processing method, a data processing device, and an electronic device.
Background
Conventional cache elimination algorithms fall into three types: LRU (Least Recently Used), LFU (Least Frequently Used), and Random (random selection). Each considers only a single factor and is prone to misjudgment in certain special cases, so each has limitations. For example, with periodically accessed data the LRU algorithm lets some cold data enter the non-eviction zone, and a burst of accesses to cold data can cause hot data to be evicted. Because conventional cache elimination algorithms consider only a single factor, they have limitations in some scenarios.
Disclosure of Invention
In view of this, the present application provides a data processing method, apparatus, and electronic device, whose main purpose is to alleviate the limitations that conventional cache elimination algorithms exhibit in some scenarios because they consider only a single factor.
In a first aspect, the present application provides a data processing method, including:
dividing the memory space into a hot zone and a warm zone;
distinguishing the data based on tag information to obtain warm-zone data and hot-zone data, the tag information including an indication field that indicates the zone in which the data currently resides;
obtaining the amount of memory currently used in the memory space;
and eliminating the hot-zone data and/or evicting the warm-zone data based on the amount of memory currently used.
Optionally, eliminating the hot-zone data and/or evicting the warm-zone data based on the amount of memory currently used includes: comparing the amount of memory currently used with a first memory threshold; when the amount of memory currently used is greater than the first memory threshold, modifying the indication field of the hot-zone data and eliminating the hot-zone data to the warm zone; and when the amount of memory currently used is greater than a second memory threshold, evicting the warm-zone data; wherein the second memory threshold is greater than the first memory threshold.
Optionally, the data processing method further includes: when newly added data is received, storing the newly added data in the warm zone; and when warm-zone data is queried or updated, adding the queried or updated warm-zone data to the hot zone.
Optionally, storing the newly added data in the warm zone when newly added data is received includes: creating tag information for the newly added data; and setting the indication field in the tag information and storing the newly added data in the warm zone.
Optionally, adding the queried or updated warm-zone data to the hot zone includes: modifying the indication field of the queried or updated warm-zone data and adding that data to the hot zone.
Optionally, eliminating the hot-zone data and/or evicting the warm-zone data includes: eliminating the hot-zone data based on an LRU algorithm, and evicting the warm-zone data based on an LSTM algorithm model.
Optionally, evicting the warm-zone data based on the LSTM algorithm model includes: calculating, with the LSTM algorithm, the next access time interval of each key-value pair in the warm-zone data; sorting all key-value pairs by next access time interval; and evicting the key-value pair with the largest next access time interval.
In a second aspect, the present application provides a data processing apparatus comprising:
a dividing unit configured to divide the memory space into a hot zone and a warm zone;
a processing unit configured to distinguish the data based on tag information to obtain warm-zone data and hot-zone data, the tag information including an indication field that indicates the zone in which the data currently resides;
an obtaining unit configured to obtain the amount of memory currently used in the memory space;
and an eviction unit configured to eliminate the hot-zone data and/or evict the warm-zone data based on the amount of memory currently used.
In a third aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data processing method of the first aspect.
In a fourth aspect, the present application provides an electronic device, including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, where the processor implements the data processing method according to the first aspect when executing the computer program.
With the above technical solution, the data processing method, apparatus, and electronic device provided by the application first divide the memory space into a hot zone and a warm zone and distinguish the data based on tag information to obtain warm-zone data and hot-zone data. The tag information includes an indication field that indicates the zone in which the data currently resides. The amount of memory currently used in the memory space is then obtained, and based on it the hot-zone data is eliminated and/or the warm-zone data is evicted. The method layers the memory space and applies a different elimination algorithm in each zone: depending on the amount of memory used, hot-zone data is demoted to the warm zone and warm-zone data is evicted, and a zone change is made by modifying the tag information. This effectively addresses limitations that arise in some scenarios, in particular cold data entering the non-eviction zone because of periodic cyclic data access, and hot data being evicted because of bursty data access or other unknown causes.
The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the application can be understood more clearly and implemented according to the specification, and so that the above and other objects, features, and advantages of the application are more apparent, specific embodiments of the application are set out below.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 shows a flowchart of a data processing method according to an embodiment of the present application;
FIG. 2 shows a flowchart of another data processing method according to an embodiment of the present application;
FIG. 3 shows a block diagram of an LSTM-based cache elimination model according to an embodiment of the present application;
FIG. 4 shows a flowchart of an elimination algorithm based on the LSTM model according to an embodiment of the present application;
FIG. 5 shows a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
In order that the above objects, features and advantages of the present application may be more clearly understood, a further description of the aspects of the present application will be provided below. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
The data processing method provided in this embodiment is applied to a data processing apparatus or electronic device, which may be installed or integrated in some server devices or systems, and may perform any of the data processing methods described below when running.
To alleviate the limitations that conventional cache elimination algorithms exhibit in some scenarios because they consider only a single factor, this embodiment provides a data processing method which, as shown in FIG. 1, includes:
S101, dividing the memory space into a hot zone and a warm zone.
In this embodiment, the memory space is first partitioned into a hot zone and a warm zone. The two zones store data with different deletion priorities.
S102, distinguishing the data based on the tag information to obtain warm-zone data and hot-zone data.
The tag information includes an indication field that indicates the zone in which the data currently resides. The tag information may be a structure defined for the data. In the Redis database, data usually takes the form of key-value pairs, and each value is defined as a redisObject structure; adding a one-bit field to this structure indicates the zone in which the data resides. For example, a value of 0 indicates that the data is in the warm zone, and a value of 1 indicates the hot zone. Part of an existing byte can also be reused for this field, so that no extra memory is needed.
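The one-bit indication field can be illustrated with a minimal Python sketch. It assumes the bit is carved out of an existing one-byte flags field; the names `ZONE_BIT`, `set_zone`, and `get_zone` are hypothetical and do not correspond to any real Redis structure or API.

```python
# Hypothetical stand-in for a cached object's header byte; not the real Redis struct.
WARM, HOT = 0, 1
ZONE_BIT = 0x80  # assumption: reuse the top bit of an existing one-byte flags field


def set_zone(flags: int, zone: int) -> int:
    """Return the flags byte with the zone bit set to WARM (0) or HOT (1)."""
    if zone == HOT:
        return flags | ZONE_BIT
    return flags & ~ZONE_BIT & 0xFF


def get_zone(flags: int) -> int:
    """Read the zone bit back out of the flags byte."""
    return HOT if flags & ZONE_BIT else WARM


flags = 0x05                   # unrelated low bits that are already in use
flags = set_zone(flags, HOT)   # promote: only the zone bit changes
print(get_zone(flags), flags & 0x7F)
```

Because only one bit changes, moving data between zones in this sketch never copies or reallocates the value, which is consistent with the text's point that no extra memory is needed.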
Redis is a high-performance, non-relational key-value database. It supports data persistence: in-memory data can be saved to disk and reloaded on restart. It can be used to relieve pressure on a backing database, because querying memory is more efficient than querying the database.
S103, obtaining the current used memory quantity of the memory space.
S104, based on the current memory usage, the hot zone data is eliminated and/or the warm zone data is evicted.
Redis is currently the most widely used in-memory database, and as a cache system it plays a vital role in the performance of the overall storage system. Frequent Redis cache misses can cause a series of problems such as cache avalanche and cache penetration, so the Redis cache hit rate needs to be improved.
The three conventional cache elimination algorithms, LRU, LFU, and Random, are designed to improve the cache hit rate. LRU selects, from all keys, the key-value pair least recently used within a certain period for elimination; LFU selects the least frequently used key-value pair; Random selects a key-value pair at random. The elimination algorithms available in Redis consider only a single factor, which keeps them simple and gives them good performance. However, because each considers only a single factor, each is prone to misjudgment in certain special cases. For example, with periodically accessed data the LRU algorithm lets some cold data enter the non-eviction zone, and a burst of accesses to cold data can cause hot data to be evicted. Conventional cache elimination algorithms therefore have limitations in some scenarios.
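The weakness of LRU under periodic cyclic access can be seen in a small simulation. The sketch below is a textbook LRU cache written for illustration only (the function name `lru_hits` is hypothetical, and this is not Redis's sampled implementation): when a loop touches one more key than the cache can hold, every access misses.

```python
from collections import OrderedDict


def lru_hits(accesses, capacity):
    """Count hits for a plain LRU cache of the given capacity."""
    cache = OrderedDict()
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # drop the least recently used key
            cache[key] = True
    return hits


# Cyclic scan over 4 keys with room for only 3: LRU always evicts the key needed next.
cyclic = [0, 1, 2, 3] * 5
print(lru_hits(cyclic, capacity=3))
print(lru_hits(cyclic, capacity=4))
```

With capacity 3 the cache never hits, while one extra slot turns every access after the first cycle into a hit; this is the kind of single-factor misjudgment the text describes.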
In this embodiment, the memory space is first divided into a hot zone and a warm zone, and the data is distinguished based on the tag information to obtain warm-zone data and hot-zone data. The tag information includes an indication field that indicates the zone in which the data currently resides. The amount of memory currently used in the memory space is then obtained, and based on it the hot-zone data is eliminated and/or the warm-zone data is evicted. The embodiment layers the memory space and applies a different elimination algorithm in each zone: depending on the amount of memory used, hot-zone data is demoted to the warm zone and warm-zone data is evicted, and a zone change is made by modifying the tag information. This effectively addresses limitations that arise in some scenarios, in particular cold data entering the non-eviction zone because of periodic cyclic data access, and hot data being evicted because of bursty data access or other unknown causes.
Optionally, eliminating the hot-zone data and/or evicting the warm-zone data based on the amount of memory currently used includes: comparing the amount of memory currently used with a first memory threshold; when the amount of memory currently used is greater than the first memory threshold, modifying the indication field of the hot-zone data and eliminating the hot-zone data to the warm zone; and when the amount of memory currently used is greater than a second memory threshold, evicting the warm-zone data; wherein the second memory threshold is greater than the first memory threshold.
This embodiment provides two distinct stages of data processing. The first memory threshold may be the memory threshold for hot-to-warm elimination, and the second memory threshold may be the upper usage limit of the memory space. First, the amount of memory used in the memory space is compared with the hot-to-warm elimination threshold; if it is greater, hot-zone data starts to be eliminated to the warm zone. In addition, when the amount of memory used exceeds the upper usage limit of the memory space, warm-zone data starts to be evicted from the cache. This two-stage approach prevents hot data from being evicted because of bursty data access or other unknown causes.
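The two-threshold decision can be sketched as a small pure function. The function name `plan_actions` and the action labels are hypothetical; the only rules taken from the text are that each threshold triggers its own stage and that the second threshold is greater than the first.

```python
def plan_actions(used: int, first_threshold: int, second_threshold: int) -> list:
    """Return the actions triggered by the current used-memory amount."""
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must be greater than the first")
    actions = []
    if used > first_threshold:
        actions.append("demote_hot_to_warm")   # flip the indication field 1 -> 0
    if used > second_threshold:
        actions.append("evict_warm")           # remove warm-zone data from memory
    return actions


print(plan_actions(used=80, first_threshold=70, second_threshold=90))
print(plan_actions(used=95, first_threshold=70, second_threshold=90))
```

Between the two thresholds only demotion runs, so nothing leaves memory until the upper usage limit is actually exceeded.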
Optionally, the data processing method further includes: when newly added data is received, storing the newly added data in the warm zone; and when warm-zone data is queried or updated, adding the queried or updated warm-zone data to the hot zone.
In this embodiment, controlling the zone changes of newly added and modified data prevents periodic cyclic data access from disrupting the cache.
Optionally, storing the newly added data in the warm zone when newly added data is received includes: creating tag information for the newly added data; and setting the indication field in the tag information and storing the newly added data in the warm zone.
In this embodiment, a zone change for cached data is made by modifying the indication field in the corresponding tag information. Specifically, for newly added data the indication field is assigned 0, meaning the data is stored in the warm zone: newly added data is not placed directly in the hot zone but goes into the warm zone first. This prevents periodic cyclic data access from disrupting the cache.
Optionally, adding the queried or updated warm-zone data to the hot zone includes: modifying the indication field of the queried or updated warm-zone data and adding that data to the hot zone.
In this embodiment, on a query or update operation the indication field of the data is changed from 0 to 1, indicating entry into the hot zone. Controlling the zone changes of modified data prevents periodic cyclic data access from disrupting the cache.
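The insertion and promotion rules can be sketched with a toy cache. The class `ZonedCache` and its methods are hypothetical, written only to illustrate the 0-to-1 transition of the indication field; it omits memory accounting and eviction entirely.

```python
WARM, HOT = 0, 1


class ZonedCache:
    """Toy cache: each entry is [zone, value]; zone plays the role of the indication field."""

    def __init__(self):
        self.entries = {}

    def put(self, key, value):
        if key in self.entries:
            self.entries[key] = [HOT, value]    # update of existing data -> hot zone
        else:
            self.entries[key] = [WARM, value]   # newly added data -> warm zone

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        entry[0] = HOT                          # queried data -> hot zone
        return entry[1]

    def zone(self, key):
        return self.entries[key][0]


cache = ZonedCache()
cache.put("k1", "v1")
print(cache.zone("k1"))   # still warm right after insertion
cache.get("k1")
print(cache.zone("k1"))   # hot after the first query
```

In this sketch a key that is written once and never read again stays in the warm zone, so a one-off scan cannot push established hot data out.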
Optionally, eliminating the hot-zone data and/or evicting the warm-zone data includes: eliminating the hot-zone data based on an LRU algorithm, and evicting the warm-zone data based on an LSTM algorithm model.
In this embodiment, hot-to-warm elimination uses a sufficiently fast LRU algorithm based on random sampling, and warm-zone eviction is performed by an LSTM (Long Short-Term Memory) model. The model jointly considers the access time, access frequency, data characteristics, and time-series characteristics of KV data in Redis, so the scheme can effectively improve the cache hit rate.
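A sampling-based LRU demotion step might look like the following sketch. The function name, the two dictionaries, and the `sample_size` parameter are assumptions made for illustration; the text only states that the LRU algorithm is based on random sampling.

```python
import random


def demote_by_sampled_lru(last_access, zones, sample_size, rng):
    """Sample hot keys and demote the one that has been idle the longest."""
    hot_keys = [k for k, z in zones.items() if z == 1]
    if not hot_keys:
        return None
    sample = rng.sample(hot_keys, min(sample_size, len(hot_keys)))
    victim = min(sample, key=lambda k: last_access[k])  # oldest access time in the sample
    zones[victim] = 0                                   # hot (1) -> warm (0)
    return victim


zones = {"a": 1, "b": 1, "c": 0}
last_access = {"a": 10, "b": 5, "c": 1}
print(demote_by_sampled_lru(last_access, zones, sample_size=5, rng=random.Random(0)))
```

Sampling avoids maintaining a global recency list: each demotion inspects only `sample_size` keys, trading exactness for speed.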
LSTM is a special type of RNN model that mitigates the vanishing-gradient problem when training on long sequences, and it performs better on sequence problems than ordinary RNNs and CNNs.
Optionally, evicting the warm-zone data based on the LSTM algorithm model includes: calculating, with the LSTM algorithm, the next access time interval of each key-value pair in the warm-zone data; sorting all key-value pairs by next access time interval; and evicting the key-value pair with the largest next access time interval.
In this embodiment, when warm-zone data is evicted, the LSTM algorithm first predicts the next access time interval of each key-value pair in the warm-zone data. The key-value pair with the largest next access time interval is then evicted. The largest next access time interval means the lowest probability of being accessed soon, so evicting that KV data yields a higher cache hit rate.
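The selection step can be sketched independently of the model. In the snippet below the predicted intervals are hard-coded placeholders standing in for LSTM output, and the function name `evict_farthest` is hypothetical.

```python
def evict_farthest(predicted_interval: dict, count: int = 1) -> list:
    """Return the `count` keys whose predicted next-access interval is largest."""
    ranked = sorted(predicted_interval, key=predicted_interval.get, reverse=True)
    return ranked[:count]


# Placeholder predictions (arbitrary time units); a real system would obtain them from the model.
predictions = {"k1": 3.0, "k2": 42.0, "k3": 7.5}
print(evict_farthest(predictions))
print(evict_farthest(predictions, count=2))
```

When only a single victim is needed, `max(predicted_interval, key=predicted_interval.get)` gives the same key without a full sort.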
Further, FIG. 2 shows a flowchart of another data processing method according to an embodiment of the present application. The method includes the following steps:
S201, randomly selecting N KV pairs in the memory space.
KV pair data refers to the key-value pair data (KV data) stored in the Redis database. The N randomly selected KV pairs may be part of the data in the memory space or all of it.
S202, dividing the N KV pairs into hot-zone data and warm-zone data according to the tag information.
Hot-to-warm elimination in the hot zone uses the LRU algorithm, and eviction in the warm zone uses the LSTM.
Specifically, the tag information may be a structure defined for the data and includes an indication field that indicates the zone in which the data currently resides. KV data can be distinguished by this field: for example, a value of 0 indicates that the data is in the warm zone, and a value of 1 indicates the hot zone. Part of an existing byte can also be reused for this field, so that no extra memory is needed.
S203, determining whether the current memory usage is greater than the first memory threshold. If so, proceed to S2041; if not, proceed to S2042.
S2041, eliminating the hot-zone data to the warm zone.
The first memory threshold may be the memory threshold for hot-to-warm elimination. The amount of memory used in the memory space is compared with this threshold; if it is greater, hot-zone data starts to be eliminated to the warm zone.
S2042, no processing is performed.
S205, determining whether the current memory usage is greater than the second memory threshold. If so, proceed to S2061; if not, proceed to S2062.
The second memory threshold may be the upper usage limit of the memory space; when the amount of memory used exceeds this limit, warm-zone data starts to be evicted from the cache.
S2061, predicting the next access time interval of each key-value pair in the warm-zone data, and evicting the KV data with the largest next access time interval.
S2062, no processing is performed.
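Steps S201 to S2062 can be put together in one illustrative pass. This is a minimal sketch under several assumptions: the store layout, the function name `run_pass`, and the `predict` callback (a stand-in for the LSTM) are hypothetical, one key is handled per stage, and a key demoted in a pass is not considered for eviction in the same pass.

```python
import random


def run_pass(store, used, t1, t2, predict, n, rng):
    """One pass over the flow: sample, split by tag, then demote and/or evict.

    store maps key -> {"zone": 0 or 1, "last_access": number}.
    """
    sample = rng.sample(sorted(store), min(n, len(store)))       # S201
    hot = [k for k in sample if store[k]["zone"] == 1]           # S202
    warm = [k for k in sample if store[k]["zone"] == 0]
    demoted = evicted = None
    if used > t1 and hot:                                        # S203 -> S2041
        demoted = min(hot, key=lambda k: store[k]["last_access"])
        store[demoted]["zone"] = 0
    if used > t2 and warm:                                       # S205 -> S2061
        evicted = max(warm, key=predict)
        del store[evicted]
    return demoted, evicted


store = {
    "a": {"zone": 1, "last_access": 1},
    "b": {"zone": 1, "last_access": 9},
    "c": {"zone": 0, "last_access": 2},
    "d": {"zone": 0, "last_access": 3},
}
fake_prediction = {"c": 5.0, "d": 50.0}   # placeholder for model output
print(run_pass(store, 95, 70, 90, fake_prediction.get, n=10, rng=random.Random(0)))
```

Passing the used-memory amount and both thresholds as arguments keeps the sketch independent of how memory is actually measured.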
Optionally, when newly added data is received, the newly added data is stored in the warm zone; when warm-zone data is queried or updated, the queried or updated warm-zone data is added to the hot zone.
For newly added data, its hot flag is first set to 0, indicating that the data enters the warm zone. The data is not placed directly in the hot zone but goes into the warm zone first. On a query or update operation, the flag is changed from 0 to 1, indicating entry into the hot zone. Controlling the zone changes of newly added and modified data prevents periodic cyclic data access from disrupting the cache. In addition, data elimination proceeds in two stages: first, hot-zone data is eliminated to the warm zone; second, warm-zone data is actually evicted. This two-stage approach prevents hot data from being evicted because of bursty data access or other unknown causes.
In this embodiment, the data is divided into a warm zone and a hot zone based on the tag information. Hot-to-warm elimination uses a sufficiently fast LRU algorithm based on random sampling, and warm-zone eviction is performed by an LSTM model. The model jointly considers the access time, access frequency, data characteristics, and time-series characteristics of KV data in Redis, so the scheme can effectively improve the cache hit rate.
The LSTM model is described next. Among cache replacement policies there is an optimal replacement algorithm (OPTimal replacement, OPT). Assuming the future access sequence of the data is known, each time data must be eliminated OPT calculates, for every item currently in the cache, the interval between the current time and that item's next access, and eliminates the item with the largest interval. Because the item that will be accessed furthest in the future is the one eliminated, the cache achieves the highest possible hit rate. The data processing method proposed in this embodiment can be understood as a concrete implementation of the OPT idea: the LSTM predicts the next access time, and the data that will be accessed furthest in the future is eliminated, which improves the hit rate.
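OPT itself is easy to simulate when the whole access trace is available, which is exactly what a real cache lacks. The sketch below is an illustrative offline simulation (the function name `opt_hits` is hypothetical), run on a cyclic trace on which plain LRU with the same capacity would miss every access.

```python
def opt_hits(accesses, capacity):
    """Count hits for the optimal (OPT) policy, which knows all future accesses."""
    cache, hits = set(), 0
    for i, key in enumerate(accesses):
        if key in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            def next_use(k):
                # Position of the next access to k; infinite if k is never used again.
                try:
                    return accesses.index(k, i + 1)
                except ValueError:
                    return float("inf")
            cache.remove(max(cache, key=next_use))   # evict the furthest-in-future key
        cache.add(key)
    return hits


print(opt_hits([0, 1, 2, 3] * 5, capacity=3))
```

The gap between this oracle and a recency-only policy is the headroom a predictor of next access intervals is trying to recover.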
Specifically, since the training data includes time-series data, an LSTM is chosen as the main body of the model, as it handles time-series prediction well. The model first uses a CNN to extract features from the input data and then uses the LSTM for prediction. The specific structure is shown in FIG. 3.
Step one: initialize the features. The results are shown as input1 to input4 in FIG. 3.
Timepre is represented by a 24-bit integer; Timecur by a 24-bit integer; (Timecur - Timepre) by a 24-bit integer; freq by an 8-bit integer; value_size by a 24-bit integer, with values above 16M represented as 16M; value_type by 4 bits; and value_encoding by 4 bits.
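The bit widths listed in step one can be illustrated with a small encoding helper. The function name `encode_features` is hypothetical, and two choices in it are assumptions rather than statements from the text: timestamps wrap modulo 2^24, and `freq` saturates at 255. Only the cap on `value_size` is described in the text.

```python
MAX24 = (1 << 24) - 1   # largest 24-bit value, roughly 16M


def encode_features(time_pre, time_cur, freq, value_size, value_type, value_encoding):
    """Fit each raw feature into the bit width listed in step one."""
    return {
        "time_pre": time_pre & MAX24,            # assumption: 24-bit clock wraps around
        "time_cur": time_cur & MAX24,
        "delta": (time_cur - time_pre) & MAX24,  # wrap-safe difference
        "freq": min(freq, 0xFF),                 # assumption: saturate at 8 bits
        "value_size": min(value_size, MAX24),    # sizes above ~16M are capped
        "value_type": value_type & 0xF,
        "value_encoding": value_encoding & 0xF,
    }


print(encode_features(100, 160, 300, 20_000_000, 1, 2))
```

Taken together, the seven fields occupy 24 + 24 + 24 + 8 + 24 + 4 + 4 = 112 bits per sample.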
Step two: perform feature extraction with the CNN.
The CNN takes the data from step one as input, performs preliminary feature extraction, and outputs a 100-bit feature vector.
Step three: feed the features into the LSTM network for time-series training.
LSTM is a special type of RNN model that mitigates the vanishing-gradient problem when training on long sequences and performs better on sequence problems than ordinary RNNs and CNNs. This matches the requirements here, so the LSTM is used as the candidate model.
Loss function: the training label is the next access time interval and the prediction is a real number, so MAE (mean absolute error) is used as the loss function.
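For reference, MAE is the average of the absolute differences between predicted and actual values. A plain-Python sketch follows; the sample numbers are made up for illustration, and a real training loop would use the equivalent loss from its deep-learning framework.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual intervals."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up predicted and observed next-access intervals.
print(mean_absolute_error([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```

Compared with squared error, MAE penalizes every unit of error equally, so a few very long access intervals influence training less strongly.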
Step four: output the prediction result, which represents the next access time interval of KEY1.
The elimination algorithm is shown in FIG. 4. First, the LSTM is used to calculate the next access time interval for each KEY. Second, the next access time intervals of the batch of candidate KEYs are sorted. Finally, the KEY with the largest value is selected as the object to be eliminated and is evicted from memory.
It should be noted that, in this embodiment, the time-series processing is based on all accesses to KEY1, and the final training data has a two-dimensional structure. The first dimension is the sequence of accesses to the KEY collected in time order and represents the access-interval characteristics of a fixed KEY; the second dimension is the sequence of other KEYs accessed after that KEY and represents the correlation between different KEYs.
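One way to picture the per-KEY time dimension is a sliding window over a key's access intervals, with the following interval as the label. This sketch covers only that first dimension; the function name `build_samples` and the `window` parameter are hypothetical, and the cross-KEY dimension is not modeled.

```python
def build_samples(access_times, window):
    """Turn one key's sorted access times into (interval window, next interval) pairs."""
    intervals = [b - a for a, b in zip(access_times, access_times[1:])]
    return [
        (intervals[i:i + window], intervals[i + window])
        for i in range(len(intervals) - window)
    ]


# Made-up access timestamps for a single key.
for features, label in build_samples([0, 5, 15, 18, 30], window=2):
    print(features, label)
```

Each pair gives the model a short history of intervals and asks it to predict the next one, which matches the training label described for the loss function.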
In this embodiment, an LSTM-based redis data elimination model is provided. As a specific implementation of the OPT concept, the model comprehensively considers redis data access time characteristics, frequency characteristics, data characteristics, and time-series characteristics, and finally predicts the time interval until the next access, thereby effectively improving the data cache hit rate. In addition, the redis data is divided into a warm area and a hot area by means of the tag, so that, without occupying additional memory, two problems are effectively solved: cold data entering a non-eliminated area because of periodic cyclic data access, and hot data being eliminated because of sudden data access or other unknown reasons.
Further, as a specific implementation of the method shown in fig. 1 to fig. 4, the present embodiment provides a data processing apparatus, as shown in fig. 5, including: a dividing unit 51, a processing unit 52, an acquiring unit 53, and an eviction unit 54.
A dividing unit 51 configured to divide the memory space into a hot zone and a warm zone;
a processing unit 52 configured to distinguish the data based on the tag information to obtain warm zone data and hot zone data; the label information comprises an indication field used for indicating the area where the current data is located;
an obtaining unit 53 configured to obtain a current used memory amount of the memory space;
an eviction unit 54 is configured to eliminate the hot zone data and/or evict the warm zone data based on the current amount of memory used.
In a specific application scenario, the eviction unit 54 is specifically configured to compare the currently used memory amount with a first memory threshold; modifying an indication field of the hot zone data and eliminating the hot zone data to a warm zone under the condition that the current used memory quantity is larger than a first memory threshold value; under the condition that the current used memory amount is larger than a second memory threshold value, expelling the warm area data; wherein the second memory threshold is greater than the first memory threshold.
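A minimal sketch of the two-threshold decision described above follows. The function name `plan_actions` and the action labels are hypothetical; the only facts taken from the text are that hot zone data is eliminated to the warm zone above the first threshold, that warm zone data is evicted above the second threshold, and that the second threshold is greater than the first.

```python
def plan_actions(used_memory, first_threshold, second_threshold):
    """Return the actions triggered by the currently used memory amount."""
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must be greater than the first")
    actions = []
    if used_memory > first_threshold:
        # Hot zone data is demoted by rewriting its indication field.
        actions.append("demote_hot_to_warm")
    if used_memory > second_threshold:
        # Warm zone data is evicted from memory.
        actions.append("evict_warm")
    return actions
```

With thresholds of 60 and 80 (in any consistent unit), a usage of 70 triggers only demotion, while a usage of 90 triggers both demotion and eviction.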
In a specific application scenario, the processing unit 52 is specifically further configured to store the newly added data to the warm zone if the newly added data is received; in the event that queried or updated warm zone data exists, the queried or updated warm zone data is added to the hot zone.
In a specific application scenario, the processing unit 52 is specifically further configured to create tag information for the newly added data; and modifying the indication field in the tag information, and storing the newly added data into a warm area.
In a specific application scenario, the processing unit 52 is specifically further configured to modify the indication field of the queried or updated warm zone data and add the queried or updated warm zone data to the hot zone.
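The tag-based zoning handled by the processing unit can be illustrated with the following sketch. The class `TaggedStore`, its method names, and the dictionary layout are assumptions for this example; in particular, treating an update of a missing key as newly added data is a choice made here, not something the embodiment specifies.

```python
WARM, HOT = "warm", "hot"


class TaggedStore:
    """Each entry carries an indication field naming the zone it is in."""

    def __init__(self):
        self._entries = {}  # key -> {"zone": WARM or HOT, "value": ...}

    def add(self, key, value):
        """Newly added data gets tag information and is stored in the warm zone."""
        self._entries[key] = {"zone": WARM, "value": value}

    def get(self, key):
        """Querying an entry promotes it by modifying only its indication field."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        entry["zone"] = HOT
        return entry["value"]

    def update(self, key, value):
        """Updating an existing entry also promotes it to the hot zone."""
        entry = self._entries.get(key)
        if entry is None:
            self.add(key, value)  # assumption: treat as newly added data
            return
        entry["value"] = value
        entry["zone"] = HOT

    def zone_of(self, key):
        entry = self._entries.get(key)
        return None if entry is None else entry["zone"]
```

Because the zone is a field on the entry rather than a separate container, moving data between zones does not copy the value, which matches the stated goal of zoning without occupying additional memory.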
In a specific application scenario, the eviction unit 54 is specifically further configured to eliminate the hot zone data based on the LRU algorithm, and to evict the warm zone data based on the LSTM algorithm model.
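The LRU-based elimination of hot zone data could look like the sketch below, which uses Python's `collections.OrderedDict` to keep keys in recency order. The class `HotZone` and its method names are hypothetical, and the sketch tracks recency in a separate structure for clarity rather than reproducing how redis samples keys internally.

```python
from collections import OrderedDict


class HotZone:
    """Keeps hot keys ordered from least to most recently used."""

    def __init__(self):
        self._items = OrderedDict()

    def touch(self, key, value):
        """Insert or refresh a key, marking it as most recently used."""
        self._items[key] = value
        self._items.move_to_end(key)

    def demote_lru(self):
        """Pop the least recently used entry so the caller can retag it as warm."""
        if not self._items:
            return None
        return self._items.popitem(last=False)
```

A caller would invoke `demote_lru()` repeatedly while the used memory exceeds the first memory threshold, changing the indication field of each returned entry to the warm zone.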
In a specific application scenario, the eviction unit 54 is specifically further configured to calculate, using the LSTM algorithm, the next access time interval of the data corresponding to each key in the warm zone data; and to sort all keys by the size of that next access time interval and evict the data of the key with the largest next access time interval.
It should be noted that, for other corresponding descriptions of each functional unit related to the data processing apparatus provided in this embodiment, reference may be made to corresponding descriptions in fig. 1 to fig. 4, and no further description is given here.
Based on the above-described methods shown in fig. 1 to 4, correspondingly, the present embodiment further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the above-described methods shown in fig. 1 to 4.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and includes several instructions for causing a computer device (may be a personal computer, a server, or a network device, etc.) to perform the method of each implementation scenario of the present application.
Based on the method shown in fig. 1 to 4 and the virtual device embodiment shown in fig. 5, in order to achieve the above object, the embodiment of the present application further provides an electronic device, which may be configured on a computer side or the like, where the device includes a storage medium and a processor; a storage medium storing a computer program; a processor for executing a computer program to implement the method as described above and shown in fig. 1 to 4.
Based on the method shown in fig. 1 to 4 and the virtual device embodiment shown in fig. 5, in order to achieve the above object, the embodiment of the present application further provides a chip, including one or more interface circuits and one or more processors; the interface circuit is configured to receive a signal from a memory of an electronic device and to send the signal to the processor, the signal including computer instructions stored in the memory; the computer instructions, when executed by the processor, cause the electronic device to perform the methods described above and illustrated in fig. 1-4.
Optionally, the entity device may further include a user interface, a network interface, a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WI-FI module, and so on. The user interface may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), etc., and the optional user interface may also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), etc.
It will be appreciated by those skilled in the art that the above-described physical device structure provided in this embodiment is not limited to this physical device, and may include more or fewer components, or may combine certain components, or may be a different arrangement of components.
The storage medium may also include an operating system, a network communication module. The operating system is a program that manages the physical device hardware and software resources described above, supporting the execution of information handling programs and other software and/or programs. The network communication module is used for realizing communication among all components in the storage medium and communication with other hardware and software in the information processing entity equipment.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus a necessary general hardware platform, or may be implemented by hardware. By applying the scheme of this embodiment, the memory space is divided into a hot zone and a warm zone, and the data are distinguished based on the tag information to obtain warm zone data and hot zone data. The tag information includes an indication field for indicating the area where the current data is located. The current used memory amount of the memory space is then obtained, and based on it the hot zone data is eliminated and/or the warm zone data is evicted. In this embodiment, the memory space is divided into layers and a different elimination algorithm is adopted in each of the two spaces: according to the used memory amount, hot zone data is demoted to the warm zone and warm zone data is evicted, and data is moved between the zones by modifying the tag information. This effectively addresses the limitations that arise in some scenarios, in particular cold data entering a non-eliminated area because of periodic cyclic data access, and hot data being eliminated because of bursty data access or other unknown reasons.
It should be noted that in this document, relational terms such as "first" and "second" and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is merely a specific embodiment of the application to enable one skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of data processing, comprising:
dividing the memory space into a hot area and a warm area;
based on the label information, distinguishing the data to obtain warm area data and hot area data; the label information comprises an indication field used for indicating the area where the current data is located;
acquiring the current used memory quantity of the memory space;
and eliminating the hot zone data and/or expelling the warm zone data based on the current used memory amount.
2. The method of claim 1, wherein the eliminating the hot zone data and/or the evicting the warm zone data based on the current amount of memory used comprises:
comparing the current used memory amount with a first memory threshold;
modifying an indication field of the hot zone data and eliminating the hot zone data to a warm zone under the condition that the current used memory quantity is larger than a first memory threshold value;
under the condition that the current used memory amount is larger than a second memory threshold value, expelling the warm area data;
wherein the second memory threshold is greater than the first memory threshold.
3. The method according to claim 1, wherein the method further comprises:
in the case that newly added data is received, storing the newly added data into a warm area;
in the event that queried or updated warm zone data exists, the queried or updated warm zone data is added to the hot zone.
4. A method according to claim 3, wherein, in the event that new data is received, storing the new data to a warm zone comprises:
creating tag information for the newly added data;
and modifying the indication field in the tag information, and storing the newly added data into a warm area.
5. The method of claim 3, wherein the adding the queried or updated warm zone data to a hot zone in the presence of the queried or updated warm zone data comprises:
modifying the indication field of the queried or updated warm zone data, and adding the queried or updated warm zone data to a hot zone.
6. The method of any one of claims 1 to 5, wherein the elimination of the hot zone data and/or the eviction of the warm zone data comprises:
eliminating the hot zone data based on an LRU algorithm, and evicting the warm zone data based on an LSTM algorithm model.
7. The method of claim 6, wherein the evicting the warm region data based on the LSTM algorithm model comprises:
calculating, using the LSTM algorithm, the next access time interval of the data corresponding to each key in the warm zone data;
and sorting all keys by the size of the next access time interval of their data, and evicting the data corresponding to the key with the largest next access time interval.
8. A data processing apparatus, comprising:
the dividing unit is configured to divide the memory space into a hot area and a warm area;
the processing unit is configured to distinguish the data based on the label information to obtain warm area data and hot area data; the label information comprises an indication field used for indicating the area where the current data is located;
an obtaining unit configured to obtain a current used memory amount of the memory space;
and the eviction unit is configured to eliminate the hot zone data and/or evict the warm zone data based on the current used memory amount.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1 to 7.
10. An electronic device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 7 when executing the computer program.
CN202410001639.5A 2024-01-02 2024-01-02 Data processing method and device and electronic equipment Active CN117493400B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410001639.5A CN117493400B (en) 2024-01-02 2024-01-02 Data processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410001639.5A CN117493400B (en) 2024-01-02 2024-01-02 Data processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN117493400A true CN117493400A (en) 2024-02-02
CN117493400B CN117493400B (en) 2024-04-09

Family

ID=89673060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410001639.5A Active CN117493400B (en) 2024-01-02 2024-01-02 Data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN117493400B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763103A (en) * 2018-05-24 2018-11-06 郑州云海信息技术有限公司 A kind of EMS memory management process, device, system and computer readable storage medium
CN111159066A (en) * 2020-01-07 2020-05-15 杭州电子科技大学 Dynamically-adjusted cache data management and elimination method
CN111737170A (en) * 2020-05-28 2020-10-02 苏州浪潮智能科技有限公司 Cache data management method, system, terminal and storage medium
CN111966293A (en) * 2020-08-18 2020-11-20 北京明略昭辉科技有限公司 Cold and hot data analysis method and system
WO2022233272A1 (en) * 2021-05-06 2022-11-10 北京奥星贝斯科技有限公司 Method and apparatus for eliminating cache memory block, and electronic device
CN115563029A (en) * 2022-09-23 2023-01-03 建信金融科技有限责任公司 Caching method and device based on two-layer caching structure
WO2023029971A1 (en) * 2021-08-30 2023-03-09 阿里巴巴(中国)有限公司 Heterogeneous memory-based data migration method
CN116186085A (en) * 2023-01-31 2023-05-30 安徽大学 Key value storage system and method based on cache gradient cold and hot data layering mechanism
CN116578593A (en) * 2023-04-20 2023-08-11 深圳市联影高端医疗装备创新研究院 Data caching method, system, device, computer equipment and storage medium
CN116881252A (en) * 2023-07-07 2023-10-13 西安交通大学 Key value storage method and system based on LSM tree
CN117111834A (en) * 2023-06-02 2023-11-24 阿里巴巴(中国)有限公司 Memory and computing system including memory
CN117130792A (en) * 2023-10-26 2023-11-28 腾讯科技(深圳)有限公司 Processing method, device, equipment and storage medium for cache object

Also Published As

Publication number Publication date
CN117493400B (en) 2024-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant