CN114817425B - Method, device and equipment for classifying cold and hot data and readable storage medium - Google Patents

Method, device and equipment for classifying cold and hot data and readable storage medium

Publication number
CN114817425B
Authority
CN
China
Prior art keywords
data
sub
database
parameter
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210740213.2A
Other languages
Chinese (zh)
Other versions
CN114817425A (en
Inventor
李小军
杨柳
吴壮壮
张学刚
任双宏
刘恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Jiaoda Big Data Technology Co ltd
Southwest Jiaotong University
China Railway Jinan Engineering Design Institute Co Ltd
Original Assignee
Chengdu Jiaoda Big Data Technology Co ltd
Southwest Jiaotong University
China Railway Jinan Engineering Design Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Jiaoda Big Data Technology Co ltd, Southwest Jiaotong University, China Railway Jinan Engineering Design Institute Co Ltd filed Critical Chengdu Jiaoda Big Data Technology Co ltd
Priority to CN202210740213.2A priority Critical patent/CN114817425B/en
Publication of CN114817425A publication Critical patent/CN114817425A/en
Application granted granted Critical
Publication of CN114817425B publication Critical patent/CN114817425B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a device, equipment and a readable storage medium for classifying cold and hot data, and relates to the technical field of data storage. The method gathers statistics on the access characteristics of the warehousing data and on the value of the warehousing data, and calculates the temperature of the warehousing data at the current moment with a cold and hot data classification model based on Newton's law of cooling improved with data value. The warehousing data can therefore be divided by its temperature value at any moment into hot data, warm data and cold data in real time, which provides a solid foundation for tiered storage of data based on its cold and hot characteristics: data with high value and high access frequency can be placed on high-speed storage devices, and data with low value and low access frequency on low-speed storage devices, effectively reducing the cost of storing mass data in a data management terminal.

Description

Method, device and equipment for classifying cold and hot data and readable storage medium
Technical Field
The invention relates to the technical field of data storage, in particular to a method, a device, equipment and a readable storage medium for cold and hot data classification.
Background
In traffic structure monitoring scenarios, data accumulates over long periods. The conventional solution is to keep expanding the storage capacity of the machines or to add higher-performance storage devices, but this approach wastes storage resources and increases energy consumption. To address this, many experts and scholars have classified and monitored the cold and hot characteristics of data according to access frequency and stored the data in tiers based on those characteristics. However, most existing cold and hot data determination models consider only the access characteristics of the data when calculating its temperature, and do not consider the value of the data.
Disclosure of Invention
The present invention aims to provide a method, a device, equipment and a readable storage medium for cold and hot data classification, so as to address the above problems. In order to achieve this purpose, the technical scheme adopted by the invention is as follows:
in a first aspect, the present application provides a method for hot and cold data classification, comprising:
acquiring at least two pieces of data stream information, wherein each piece of data stream information comprises warehousing data, a temperature parameter corresponding to the warehousing data, and a first moment.
Judging whether the warehousing data is accessed, if so, acquiring a second moment and the temperature parameter, data value increment, data temperature increment and data value weight corresponding to the warehousing data at the first moment; updating the temperature parameter corresponding to the warehousing data at the second moment according to the temperature parameter, the data value increment, the data temperature increment and the data value weight; and the second moment is a time parameter when the warehousing data is accessed.
And classifying the cold and hot data of the data flow information based on the updated temperature parameter corresponding to the warehousing data.
In a second aspect, the present application further provides a device for cold and hot data classification, including an obtaining module, a determining module and a classifying module, wherein:
an acquisition module: configured to obtain at least two pieces of data stream information, wherein each piece of data stream information comprises warehousing data, a temperature parameter corresponding to the warehousing data, and a first moment.
A judgment module: configured to judge whether the warehousing data is accessed and, if the warehousing data is accessed, to acquire a second moment and the temperature parameter, data value increment, data temperature increment and data value weight corresponding to the warehousing data at the first moment; and to update the temperature parameter corresponding to the warehousing data at the second moment according to the temperature parameter, the data value increment, the data temperature increment and the data value weight, wherein the second moment is a time parameter when the warehousing data is accessed.
A classification module: configured to classify the data stream information as cold or hot data based on the updated temperature parameter corresponding to the warehousing data.
In a third aspect, the present application further provides a device for cold and hot data classification, including:
a memory for storing a computer program;
a processor for implementing the steps of the method for hot and cold data classification when executing the computer program.
In a fourth aspect, the present application further provides a readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described method for hot and cold data classification.
The invention has the beneficial effects that:
The method gathers statistics on the access characteristics of the warehousing data (such as access time and access frequency) and on the value of the warehousing data, and calculates the temperature of the warehousing data at the current moment with a cold and hot data classification model based on Newton's law of cooling improved with data value. The warehousing data can therefore be divided by its temperature value (that is, its degree of activity) at any moment into hot data, warm data and cold data in real time, which provides a solid foundation for tiered storage of data based on its cold and hot characteristics: data with high value and high access frequency can be placed on high-speed storage devices, and data with low value and low access frequency on low-speed storage devices, effectively reducing the cost of storing mass data in a data management terminal.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flow chart illustrating a method for hot and cold data classification according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a device for hot and cold data classification according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of equipment for cold and hot data classification according to an embodiment of the present invention.
In the figures: 700. device for cold and hot data classification; 710. acquisition module; 711. first processing unit; 712. second processing unit; 7121. first subunit; 7122. second subunit; 7123. third subunit; 7124. fourth subunit; 720. judgment module; 721. acquisition subunit; 722. first calculation unit; 723. second calculation unit; 724. third calculation unit; 730. classification module; 800. equipment for cold and hot data classification; 801. processor; 802. memory; 803. multimedia component; 804. I/O interface; 805. communication component.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
First, an application scenario to which the present application is applicable and a system architecture in the application scenario are described. The present application can be applied to a scenario where a data management terminal stores data based on the cold and hot attributes of the data, and the present application will be described in detail by taking monitoring data of a traffic structure as an example.
Example 1:
Referring to FIG. 1, which is a schematic flow chart of the method for classifying cold and hot data according to this embodiment, the application provides a method for classifying cold and hot data, comprising steps S1, S2 and S3, wherein:
Step S1, obtaining at least two pieces of data stream information, where each piece of data stream information includes warehousing data, a temperature parameter corresponding to the warehousing data, and a first moment.
It can be understood that, in this step, the data management terminal acquires the warehousing data in real time through tools such as sensors and monitoring devices, and the warehousing data, its corresponding temperature parameter and the first moment together constitute the data stream information. The initial value of the first moment is the time at which the data management terminal detects the warehousing data; the warehousing data is the experimental data detected by the sensor or the monitoring device; and the temperature parameter of warehousing data that has just been transmitted to the data management terminal is a user-defined temperature input by the client.
In another exemplary embodiment, step S1 further includes step S101 and step S102, wherein:
Step S101, judging, for each piece of data stream information, whether it satisfies a preset classification condition, and if so, obtaining at least one sub-database, wherein a sub-database is a data set of the same attribute obtained by clustering and dividing the data stream information; the classification condition is information used to group data with the same attribute.
It is understood that, in this step, the data stream information is classified according to the classification condition and the warehousing data contained in the data stream information. In this embodiment the classification condition is the type of the warehousing data. Assuming that, in a traffic structure scenario, the classification condition comprises two types, cable force and wind direction, the data stream information is divided into two sub-databases, one for each type.
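The grouping in step S101 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `StreamRecord` class, its field names and the `split_into_sub_databases` helper are hypothetical, and the classification condition is assumed to be a simple set of data types.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StreamRecord:
    """Hypothetical container for one piece of data stream information."""
    kind: str           # type of the warehousing data, e.g. "cable_force"
    value: float        # the warehousing data itself
    temperature: float  # temperature parameter of the warehousing data
    first_time: float   # first moment (timestamp)


def split_into_sub_databases(records, classification_conditions):
    """Group records whose type matches one of the classification conditions."""
    sub_databases = defaultdict(list)
    for rec in records:
        if rec.kind in classification_conditions:
            sub_databases[rec.kind].append(rec)
    # Keep each sub-database in time order, since later steps work on time series.
    for recs in sub_databases.values():
        recs.sort(key=lambda r: r.first_time)
    return dict(sub_databases)


records = [
    StreamRecord("cable_force", 1520.0, 50.0, 2.0),
    StreamRecord("wind_direction", 182.0, 50.0, 1.0),
    StreamRecord("cable_force", 1518.5, 50.0, 1.0),
    StreamRecord("humidity", 0.61, 50.0, 1.0),  # matches no condition, so it is skipped
]
subs = split_into_sub_databases(records, {"cable_force", "wind_direction"})
```

Here `subs` holds one list per matching type, mirroring the two sub-databases of the cable force and wind direction example.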
Step S102, judging, based on the first moment, whether missing data exists in each sub-database; if no data is missing, removing abnormal data from each sub-database and updating it to the sub-database after removal; and if data is missing, completing each sub-database and updating it to the completed sub-database.
It can be understood that, in this step, whether the cable force data and the wind direction data are missing for some period of time is judged separately, in time order, according to the first moment. If data is missing, the sub-database is completed and updated; missing values can be filled with fixed values such as zero, the mean, or the value adjacent to the missing point, or with algorithms such as linear interpolation or Lagrange interpolation. If no data is missing, the abnormal values in the sub-database are removed and the sub-database is updated; abnormal values can be removed by setting upper and lower limits of a removal threshold.
In order to prevent the abnormal monitoring data from interfering with the prediction of the subsequent time series data, it is important to select a correct abnormal data processing method according to the specific situation of the data set tested in real time. Thus, the above step S102 includes step S1021, step S1022, step S1023, and step S1024, in which:
step S1021, a subset is obtained, wherein the subset is a set of at least ten continuous data stream information intercepted from the sub-database based on the first time.
It is understood that, in this step, at least ten consecutive data stream information are selected from the wind direction sub-database based on the time series to form a sub-set. The cable force sub-database intercepts the subset in the manner described above.
Step S1022, calculating rejection conditions: respectively calculating the arithmetic mean value and the standard deviation value of the warehousing data in the subset based on the subset; and calculating according to the arithmetic mean and the standard deviation value to obtain a first condition.
It is understood that, in this step, the arithmetic mean and the standard deviation of all the warehousing data in the subset are calculated, and the first condition is obtained from them. Assuming that, in this embodiment, abnormal data is eliminated based on the Pauta criterion (the 3σ rule), the first condition is calculated as shown in formula (1):
$$\bar{x} - 3\sigma \le x_i \le \bar{x} + 3\sigma \qquad (1)$$
wherein: $\bar{x}$ is the arithmetic mean of the warehousing data, $x_i$ is the warehousing data, and $\sigma$ is the standard deviation of the warehousing data.
Step S1023, removing abnormal data from the sub-database according to the first condition to obtain the removed sub-database.
It can be understood that, in this step, the warehousing data exceeding the range of the first condition is abnormal data, and the abnormal data is marked and then removed to obtain the removed sub-database.
Step S1024, judging whether the removed sub-database is in normal distribution or not, and if the removed sub-database is not in normal distribution, calculating the removal condition again according to the removed sub-database until the removed sub-database is in normal distribution.
It can be understood that, in this step, it is determined whether the remaining data obey normal distribution based on the time series according to the removed sub-database, if not, the removal condition is recalculated according to the remaining warehousing data, and the removal is performed according to the updated removal condition until the removed sub-database obeys normal distribution.
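Steps S1021 to S1024 can be sketched as an iterative 3σ rejection loop. The sketch below rests on stated assumptions: the first condition is taken to be the usual mean ± 3 standard deviations range, the sample standard deviation is used, and "no further values rejected" stands in for the normal-distribution check as the stopping rule, since the text does not say which normality test is applied. The function name and the `max_rounds` guard are hypothetical.

```python
import statistics


def reject_outliers_3sigma(values, max_rounds=10):
    """Repeatedly drop values outside mean +/- 3*std until a round drops nothing."""
    kept = list(values)
    for _ in range(max_rounds):
        if len(kept) < 2:
            break  # the standard deviation is undefined for fewer than two values
        mean = statistics.fmean(kept)
        std = statistics.stdev(kept)  # sample standard deviation (an assumption)
        lower, upper = mean - 3 * std, mean + 3 * std
        filtered = [v for v in kept if lower <= v <= upper]
        if len(filtered) == len(kept):
            break  # stand-in stopping rule: nothing left to reject
        kept = filtered
    return kept


# Twelve plausible readings and one spike.
subset = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 9.9, 10.0, 10.2, 1000.0]
cleaned = reject_outliers_3sigma(subset)
```

The spike is removed in the first round; the second round recomputes the mean and standard deviation on the remaining readings and rejects nothing, so the loop ends.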
Due to various uncertain factors, such as disconnection of monitoring equipment, failure of a sensor, interruption of network transmission, replacement of the monitoring equipment and the like, missing data conditions of different degrees exist in monitoring time sequence data, and the integrity of the monitoring data and the accuracy of evaluation of a monitoring object are further influenced. Thus, step S102 further includes step S1025, step S1026, step S1027, and step S1028, wherein:
Step S1025, acquiring the missing segment information in the sub-database based on the first moment.
It is understood that, in this step, the positions in the time series where data is missing are obtained.
Step S1026, acquiring data parameters of the head end and the tail end of the missing segment information according to the missing segment information, wherein the data parameters comprise the warehousing data and the first moment.
It can be understood that, in this step, according to the missing segment information, the warehousing data at the head end and the tail end of the missing segment information and the first moment corresponding to the warehousing data are respectively obtained.
Step S1027, calculating interpolation coefficients according to the data parameters at the head end and the tail end of the missing segment information.
It can be understood that, in this step, the data parameters at the head end and the tail end of the missing segment information are denoted $(a_1, b_1)$ and $(a_2, b_2)$ respectively, and the interpolation coefficient is calculated from these two data parameters according to formula (2):
$$\alpha = \frac{a - a_1}{a_2 - a_1} \qquad (2)$$
wherein: $\alpha$ is the interpolation coefficient; $a$ is the first moment corresponding to any point inside the missing segment information; $a_1$ is the first moment corresponding to the head end of the missing segment information; and $a_2$ is the first moment corresponding to the tail end of the missing segment information.
Step S1028, filling the missing segment information according to the data parameters and the interpolation coefficients.
It is understood that, in this step, the data parameter and the interpolation coefficient fill the missing segment information according to formula (3), where formula (3) is as follows:
$$y = (1 - \alpha)\,y_1 + \alpha\,y_2 \qquad (3)$$
wherein: $y$ is the warehousing data corresponding to any first moment inside the missing segment information; $\alpha$ is the interpolation coefficient; $y_1$ is the warehousing data corresponding to the first moment at the head end of the missing segment information; and $y_2$ is the warehousing data corresponding to the first moment at the tail end of the missing segment information.
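Steps S1025 to S1028 can be sketched as below, assuming that formulas (2) and (3) describe ordinary linear interpolation between the head-end and tail-end points of the missing segment. The function names and the `(time, value)` tuple layout are hypothetical choices for illustration.

```python
def interpolation_coefficient(a, a1, a2):
    """Fraction of the way from the head-end time a1 to the tail-end time a2."""
    if a2 == a1:
        raise ValueError("head and tail of the missing segment share the same time")
    return (a - a1) / (a2 - a1)


def fill_missing_segment(head, tail, missing_times):
    """Linearly interpolate values for the times that fall inside the gap.

    head and tail are (time, value) pairs bounding the missing segment.
    """
    a1, y1 = head
    a2, y2 = tail
    filled = []
    for a in missing_times:
        alpha = interpolation_coefficient(a, a1, a2)
        filled.append((a, (1 - alpha) * y1 + alpha * y2))
    return filled


# A gap between t=0 (value 10) and t=4 (value 18), with three missing samples.
filled = fill_missing_segment(head=(0.0, 10.0), tail=(4.0, 18.0),
                              missing_times=[1.0, 2.0, 3.0])
```

Linear interpolation suits short gaps in slowly varying monitoring data; for longer gaps, the other completion methods mentioned in the text, such as Lagrange interpolation, may be more appropriate.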
Step S2, judging whether the warehousing data is accessed, if the warehousing data is accessed, acquiring a second moment and the temperature parameter, the data value increment, the data temperature increment and the data value weight corresponding to the warehousing data at the first moment; updating the temperature parameter corresponding to the warehousing data at the second moment according to the temperature parameter, the data value increment, the data temperature increment and the data value weight; and the second moment is a time parameter when the warehousing data is accessed.
It can be understood that, in this step, while transmitting data to the data management terminal, the client performs custom input of the temperature parameter, the data value increment, the data temperature increment, and the data value weight, when the warehoused data is accessed, the custom parameter corresponding to the warehoused data is found, an updated temperature parameter is calculated according to the custom parameter, and the updated temperature parameter corresponds to the second moment when the warehoused data is accessed. The access to the warehousing data comprises operations of modification, query and the like.
In this embodiment, step S2 further includes step S201, step S202, and step S203, where:
step S201, calculating according to the data value increment, the data temperature increment and the data value weight to obtain a data temperature change value.
It is understood that, in this step, the data temperature change value is calculated according to equation (4), where equation (4) is as follows:
(Formula (4) appears as an image in the original publication and is not reproduced here.)
wherein: $T_Z$ is the data temperature change value, calculated from the data value weight, the data value increment $r$ (a custom constant) and the data temperature increment (a custom constant).
Step S202, calculating the updated temperature parameter according to the temperature parameter, the data temperature change value and the Newton attenuation coefficient.
It is understood that, in this step, the temperature parameter of the warehousing data is updated according to the formula (5), and the formula (5) is as follows:
$$T(t_n) = T(t_{n-1})\,e^{-c\,(t_n - t_{n-1})} + T_Z \qquad (5)$$
wherein: $T(t_n)$ is the temperature parameter of the warehousing data when it is accessed at moment $t_n$; $T(t_{n-1})$ is the temperature parameter of the warehousing data at moment $t_{n-1}$, before the access; $e$ is the natural constant; $c$ is the Newton attenuation coefficient; and $T_Z$ is the data temperature change value.
Step S203, based on the second time, establishing a mapping relation between the warehousing data, the updated temperature parameter and the second time.
It can be understood that, in this step, a mapping relationship between the second time, the warehousing data and the updated temperature parameter is established according to the time when the warehousing data is accessed and the updated temperature parameter, so that the corresponding temperature parameter is updated iteratively in the later period when the warehousing data is accessed again.
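Steps S201 to S203 can be sketched as follows. Two forms are assumptions, not taken from the text: the data temperature change value is assumed to be additive (weight times value increment, plus temperature increment), and the update is assumed to be the usual Newton cooling form, exponential decay of the previous temperature over the elapsed time plus the change value. All function names, the `temperature_index` dictionary and the sample numbers are hypothetical.

```python
import math


def temperature_change(value_weight, value_increment, temperature_increment):
    """Data temperature change value T_Z. ASSUMED additive form, for illustration only."""
    return value_weight * value_increment + temperature_increment


def updated_temperature(prev_temperature, prev_time, access_time,
                        decay_coefficient, change):
    """ASSUMED Newton-cooling update: decay since the last record, then add T_Z."""
    elapsed = access_time - prev_time
    if elapsed < 0:
        raise ValueError("access time precedes the previously recorded time")
    return prev_temperature * math.exp(-decay_coefficient * elapsed) + change


# Step S203: mapping from a data identifier to (updated temperature, second moment).
temperature_index = {}


def record_access(data_id, prev_temperature, prev_time, access_time,
                  decay_coefficient, change):
    """Update the temperature on access and remember it for the next access."""
    new_temperature = updated_temperature(
        prev_temperature, prev_time, access_time, decay_coefficient, change)
    temperature_index[data_id] = (new_temperature, access_time)
    return new_temperature


t_z = temperature_change(value_weight=0.8, value_increment=5.0,
                         temperature_increment=1.0)
new_t = record_access("cable_force/0001", prev_temperature=50.0, prev_time=0.0,
                      access_time=10.0, decay_coefficient=0.1, change=t_z)
```

Under these assumptions, data that is not accessed cools exponentially, while each access adds an increment that grows with the data value weight.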
The method for determining the value weight in the above steps further includes step S2011, step S2012, step S2013 and step S2014, where:
step S2011, a first parameter, a second parameter and research set information are obtained, wherein the first parameter is the number of users accessing the sub-database; the second parameter is an evaluation value of the application value of the data corresponding to the sub-database; the research set information is a data set formed by the sub-databases with different attributes.
It can be understood that, in this step, the more users access a piece of data, the higher its value; moreover, when a piece of data is accessed by many users, changes to it and its access performance affect more users. The first parameter is therefore obtained from the access records of the data management terminal. The second parameter of the sub-database, input by the client, is also obtained from the data management terminal; it is determined by the user through manual evaluation of how the data is used. Depending on the research problem, the sub-databases related to that problem are retrieved from the data management terminal to form the research set information.
Step S2012, calculating according to the first parameter and the second parameter to obtain a first data value, where the first data value is a sum of the first parameter and the second parameter corresponding to the sub-database.
Step S2013, calculating according to the first data value and the research set information to obtain a second data value, wherein the second data value is the sum of the first data values of all the sub-databases in the research set information.
Step S2014, calculating according to the first data value and the second data value, and obtaining a data value weight corresponding to the sub-database.
It is understood that, in this step, the data value weight corresponding to the sub-database is determined according to formula (6), where formula (6) is as follows:
$$\omega_i = \frac{V_i}{\sum_{j=1}^{n} V_j} \qquad (6)$$
wherein: $\omega_i$ is the data value weight corresponding to the $i$-th sub-database; $V_i$ is the first data value of the $i$-th sub-database; $i$ indexes the type of warehousing data corresponding to the sub-database; and $n$ is the total number of sub-databases. With formula (6), the data value weight of each kind of warehousing data can be re-determined according to the key factors of each research problem, which better matches the actual research conditions and improves the accuracy of the final result.
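Steps S2011 to S2014 can be sketched as below. The first and second data values follow the text (a per-sub-database sum, then the total over the research set); taking the weight as the ratio of the two is an assumption about formula (6). The function names and sample figures are hypothetical.

```python
def first_data_value(user_count, evaluation_value):
    """First data value: sum of the first parameter and the second parameter."""
    return user_count + evaluation_value


def data_value_weights(research_set):
    """Map each sub-database name to its data value weight.

    research_set maps a sub-database name to (user_count, evaluation_value).
    The weight is ASSUMED to be the first data value divided by the second
    data value, i.e. by the sum of first data values over the research set.
    """
    first_values = {name: first_data_value(users, score)
                    for name, (users, score) in research_set.items()}
    second_value = sum(first_values.values())
    if second_value == 0:
        raise ValueError("total data value of the research set is zero")
    return {name: value / second_value for name, value in first_values.items()}


weights = data_value_weights({
    "cable_force": (12, 8),    # 12 accessing users, evaluation value 8
    "wind_direction": (3, 2),  # 3 accessing users, evaluation value 2
})
```

Because the weights are normalized over the research set, they sum to one, and changing the set of sub-databases for a different research problem changes every weight.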
Step S3, classifying the data stream information as cold or hot data based on the updated temperature parameter corresponding to the warehousing data.
It can be understood that, in this step, the data stream information is classified as cold or hot data based on the updated temperature parameter corresponding to the warehousing data. After the classification is completed, the second moment becomes the new first moment, and the updated data stream information, consisting of the updated first moment, the warehousing data and the updated temperature parameter, is stored in the data management terminal.
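Step S3 can be sketched as a threshold classification followed by rolling the second moment into the first moment. The text does not give the temperature thresholds, so the two threshold values below are purely illustrative assumptions, as are the function names and the dictionary layout of the stored record.

```python
def classify_by_temperature(temperature, hot_threshold=60.0, cold_threshold=20.0):
    """Return "hot", "warm" or "cold". The default thresholds are ASSUMED values."""
    if temperature >= hot_threshold:
        return "hot"
    if temperature > cold_threshold:
        return "warm"
    return "cold"


def finalize_stream_record(value, updated_temperature, second_time):
    """Build the updated data stream information after classification.

    The second moment (time of access) becomes the new first moment, so the
    next access decays the temperature from this point in time.
    """
    return {
        "value": value,
        "temperature": updated_temperature,
        "first_time": second_time,
        "tier": classify_by_temperature(updated_temperature),
    }


updated = finalize_stream_record(value=1520.0, updated_temperature=75.0,
                                 second_time=10.0)
```

A storage layer could then route records by the `tier` field, for example hot data to high-speed devices and cold data to low-speed devices, as the text describes.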
Example 2:
Referring to FIG. 2, FIG. 2 is a schematic structural diagram of the cold and hot data classification device 700 according to this embodiment. The device 700 includes an obtaining module 710, a determining module 720 and a classifying module 730, wherein:
the obtaining module 710: configured to obtain at least two pieces of data stream information, wherein each piece of data stream information comprises warehousing data, a temperature parameter corresponding to the warehousing data, and a first moment.
Preferably, the obtaining module 710 further includes a first processing unit 711 and a second processing unit 712, where:
the first processing unit 711: configured to judge, for each piece of data stream information, whether it satisfies a preset classification condition, and if so, to obtain at least one sub-database, wherein a sub-database is a data set of the same attribute obtained by clustering and dividing the data stream information; the classification condition is information used to group data with the same attribute.
The second processing unit 712: configured to judge, based on the first moment, whether missing data exists in each sub-database; if no data is missing, to remove abnormal data from each sub-database and update it to the sub-database after removal; and if data is missing, to complete each sub-database and update it to the completed sub-database.
Preferably, the second processing unit 712 includes a first sub-unit 7121, a second sub-unit 7122, a third sub-unit 7123 and a fourth sub-unit 7124, wherein:
first subunit 7121: and the database management module is used for acquiring the missing segment information in the sub database based on the first time.
Second subunit 7122: and the data parameter acquisition unit is used for acquiring data parameters of the head end and the tail end of the missing segment information according to the missing segment information, wherein the data parameters comprise the warehousing data and the first moment.
Third subunit 7123: and the interpolation coefficient is obtained by calculating the data parameters at the head end and the tail end of the missing segment information.
Fourth subunit 7124: and the interpolation coefficient is used for filling the missing segment information according to the data parameters and the interpolation coefficient.
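The four subunits above can be illustrated with a short sketch. It assumes, purely for illustration, that the warehousing data is numeric, that records are expected at a fixed spacing `step` of the first moment, and that the interpolation is linear, so the interpolation coefficient is the slope between the head and tail records. None of these assumptions, nor the function names, come from the text.

```python
def find_missing_segments(samples, step):
    """Return (head, tail) record pairs that bracket each missing segment.

    samples: list of (first_moment, value) tuples sorted by first_moment.
    step:    expected spacing between consecutive first moments (assumption).
    """
    segments = []
    for head, tail in zip(samples, samples[1:]):
        if tail[0] - head[0] > step:
            segments.append((head, tail))
    return segments


def fill_segment(head, tail, step):
    """Fill one missing segment by linear interpolation (assumed technique)."""
    (t0, v0), (t1, v1) = head, tail
    slope = (v1 - v0) / (t1 - t0)  # interpolation coefficient from head and tail
    n_missing = round((t1 - t0) / step) - 1
    filled = []
    for k in range(1, n_missing + 1):
        t = t0 + k * step
        filled.append((t, v0 + slope * (t - t0)))
    return filled


def complete_sub_database(samples, step):
    """Return the sub-database with every missing segment filled in."""
    completed = list(samples)
    for head, tail in find_missing_segments(samples, step):
        completed.extend(fill_segment(head, tail, step))
    return sorted(completed)


# Records at moments 2 and 3 are missing and get interpolated.
sub_database = [(0, 10.0), (1, 12.0), (4, 18.0), (5, 20.0)]
completed = complete_sub_database(sub_database, step=1)
```

If the warehousing data is not numeric, or the spacing is irregular, a different completion rule would be needed; the sketch only shows how head-end and tail-end parameters can drive the fill.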
The judging module 720 is configured to judge whether the warehousing data is accessed, and if so, to acquire a second moment and the temperature parameter, data value increment, data temperature increment and data value weight corresponding to the warehousing data at the first moment; and to update the temperature parameter corresponding to the warehousing data at the second moment according to the temperature parameter, the data value increment, the data temperature increment and the data value weight, wherein the second moment is a time parameter indicating when the warehousing data is accessed.
Preferably, the judging module 720 further includes an acquisition subunit 721, a first calculation unit 722, a second calculation unit 723 and a third calculation unit 724, wherein:
The acquisition subunit 721 is configured to acquire a first parameter, a second parameter and research set information, wherein the first parameter is the number of users accessing the sub-database; the second parameter is an evaluation value of the application value of the data corresponding to the sub-database; and the research set information is a data set formed by the sub-databases with different attributes.
The first calculation unit 722 is configured to calculate a first data value according to the first parameter and the second parameter, wherein the first data value is the sum of the first parameter and the second parameter corresponding to the sub-database.
The second calculation unit 723 is configured to calculate a second data value according to the first data value and the research set information, wherein the second data value is the sum of the first data values of all the sub-databases in the research set information.
The third calculation unit 724 is configured to calculate the data value weight corresponding to the sub-database according to the first data value and the second data value.
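The calculation units above can be sketched in a few lines. The first and second data values follow the definitions given in the text. The final step is an assumption: the text only states that the weight is calculated from the first and second data values, and the sketch takes it to be their ratio. The function name and the sub-database names are hypothetical.

```python
def data_value_weights(research_set):
    """Compute a data value weight for each sub-database.

    research_set: dict mapping sub-database name to
                  (first_parameter, second_parameter), i.e. the number of
                  accessing users and the application value evaluation.
    """
    # First data value: sum of the two parameters of each sub-database.
    first_values = {
        name: users + evaluation
        for name, (users, evaluation) in research_set.items()
    }
    # Second data value: sum of the first data values over the research set.
    second_value = sum(first_values.values())
    if second_value == 0:
        raise ValueError("second data value is zero; weights are undefined")
    # ASSUMPTION: weight = first data value / second data value.
    return {name: value / second_value for name, value in first_values.items()}


weights = data_value_weights({"track": (30, 10), "signal": (50, 10)})
```

Under the ratio assumption the weights of all sub-databases in the research set sum to 1, which makes them directly comparable across sub-databases.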
The classification module 730 is configured to classify the data stream information as cold or hot data based on the updated temperature parameter corresponding to the warehousing data.
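How the judging module 720 feeds the classification module 730 can be sketched as below. The combining rule is a placeholder assumption: this section lists the inputs of the update (temperature parameter at the first moment, data value increment, data temperature increment and data value weight) but does not restate the formula, so the additive rule in `update_temperature` is illustrative only and should be replaced by the formula of the method embodiment. Leaving the temperature unchanged when the data is not accessed is likewise an assumption of the sketch.

```python
def update_temperature(temperature, value_increment, temperature_increment, value_weight):
    """Return the temperature parameter at the second moment.

    ASSUMPTION (placeholder rule, not the patent's formula): the data
    temperature increment is added directly and the data value increment
    is scaled by the data value weight.
    """
    return temperature + temperature_increment + value_weight * value_increment


def refresh_on_access(accessed, temperature, value_increment, temperature_increment, value_weight):
    """Update the temperature only when the warehousing data was accessed."""
    if not accessed:
        # Assumption: records that were not accessed are left untouched here.
        return temperature
    return update_temperature(temperature, value_increment, temperature_increment, value_weight)


new_temperature = refresh_on_access(
    accessed=True,
    temperature=40.0,
    value_increment=20.0,
    temperature_increment=5.0,
    value_weight=0.5,
)
```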
It should be noted that, regarding the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated herein.
Example 3:
Corresponding to the above method embodiment, this embodiment also provides an apparatus 800 for cold and hot data classification; the apparatus 800 described below and the method for cold and hot data classification described above may be read with reference to each other.
Fig. 3 is a block diagram illustrating an apparatus 800 for cold and hot data classification according to an exemplary embodiment. As shown in Fig. 3, the apparatus 800 may include a processor 801 and a memory 802. The apparatus 800 may further include one or more of a multimedia component 803, an I/O interface 804, and a communication component 805.
The processor 801 is configured to control the overall operation of the apparatus 800 so as to complete all or part of the steps of the above method for cold and hot data classification. The memory 802 is used to store various types of data to support the operation of the apparatus 800; such data may include, for example, instructions for any application or method operating on the apparatus 800, as well as application-related data such as contact data, messages sent or received, pictures, audio, and video. The memory 802 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk.
The multimedia component 803 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals; the received audio signals may further be stored in the memory 802 or transmitted through the communication component 805. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 804 provides an interface between the processor 801 and other interface modules, such as a keyboard, a mouse, or buttons; the buttons may be virtual or physical. The communication component 805 is used for wired or wireless communication between the apparatus 800 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 805 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, for performing the above method for cold and hot data classification.
In another exemplary embodiment, a computer readable storage medium comprising program instructions is also provided, wherein the program instructions, when executed by a processor, implement the steps of the above method for cold and hot data classification. For example, the computer readable storage medium may be the above memory 802 comprising program instructions that are executable by the processor 801 of the apparatus 800 to perform the above method for cold and hot data classification.
Example 4:
Corresponding to the above method embodiment, this embodiment also provides a readable storage medium; the readable storage medium described below and the method for cold and hot data classification described above may be read with reference to each other.
A readable storage medium has a computer program stored thereon, and the computer program, when executed by a processor, implements the steps of the method for cold and hot data classification of the above method embodiment.
The readable storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other readable storage medium capable of storing program code.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for cold and hot data classification, comprising:
acquiring at least two pieces of data stream information, wherein each piece of data stream information comprises warehousing data, a temperature parameter corresponding to the warehousing data and a first moment;
judging whether the warehousing data is accessed, and if so, acquiring a second moment and the temperature parameter, data value increment, data temperature increment and data value weight corresponding to the warehousing data at the first moment; updating the temperature parameter corresponding to the warehousing data at the second moment according to the temperature parameter, the data value increment, the data temperature increment and the data value weight; wherein the second moment is a time parameter indicating when the warehousing data is accessed;
and classifying the data stream information as cold or hot data based on the updated temperature parameter corresponding to the warehousing data.
2. The method for cold and hot data classification according to claim 1, wherein all the warehousing data is preprocessed before judging whether the warehousing data is accessed, and the preprocessing comprises:
judging, for each piece of data stream information, whether it satisfies a preset classification condition, and if so, obtaining at least one sub-database, wherein each sub-database is a data set of data with the same attribute obtained by clustering and dividing the data stream information; the classification condition is information used for dividing data with the same attribute;
judging, based on the first moment, whether missing data exists in each sub-database; if no missing data exists, performing removal processing on each sub-database and updating each sub-database to the sub-database after removal; and if missing data exists, completing each sub-database and updating each sub-database to the completed sub-database.
3. The method for cold and hot data classification according to claim 2, wherein completing the sub-database comprises:
acquiring missing segment information in the sub-database based on the first moment;
acquiring, according to the missing segment information, data parameters at the head end and the tail end of the missing segment information, wherein the data parameters comprise the warehousing data and the first moment;
calculating an interpolation coefficient according to the data parameters at the head end and the tail end of the missing segment information;
and filling the missing segment information according to the data parameters and the interpolation coefficient.
4. The method for cold and hot data classification according to claim 2, wherein the method of determining the data value weight comprises:
acquiring a first parameter, a second parameter and research set information, wherein the first parameter is the number of users accessing the sub-database; the second parameter is an evaluation value of the application value of the data corresponding to the sub-database; and the research set information is a data set formed by the sub-databases with different attributes;
calculating a first data value according to the first parameter and the second parameter, wherein the first data value is the sum of the first parameter and the second parameter corresponding to the sub-database;
calculating a second data value according to the first data value and the research set information, wherein the second data value is the sum of the first data values of all the sub-databases in the research set information;
and calculating the data value weight corresponding to the sub-database according to the first data value and the second data value.
5. A device for cold and hot data classification, comprising:
an obtaining module, configured to acquire at least two pieces of data stream information, wherein each piece of data stream information comprises warehousing data, a temperature parameter corresponding to the warehousing data and a first moment;
a judging module, configured to judge whether the warehousing data is accessed, and if so, to acquire a second moment and the temperature parameter, data value increment, data temperature increment and data value weight corresponding to the warehousing data at the first moment; and to update the temperature parameter corresponding to the warehousing data at the second moment according to the temperature parameter, the data value increment, the data temperature increment and the data value weight; wherein the second moment is a time parameter indicating when the warehousing data is accessed;
a classification module, configured to classify the data stream information as cold or hot data based on the updated temperature parameter corresponding to the warehousing data.
6. The device for cold and hot data classification according to claim 5, wherein the obtaining module further comprises:
a first processing unit, configured to judge, for each piece of data stream information, whether it satisfies a preset classification condition, and if so, to obtain at least one sub-database, wherein each sub-database is a data set of data with the same attribute obtained by clustering and dividing the data stream information; the classification condition is information used for dividing data with the same attribute;
a second processing unit, configured to judge, based on the first moment, whether missing data exists in each sub-database; if no missing data exists, to perform removal processing on each sub-database and update each sub-database to the sub-database after removal; and if missing data exists, to complete each sub-database and update each sub-database to the completed sub-database.
7. The device for cold and hot data classification according to claim 6, wherein the second processing unit comprises:
a first subunit, configured to acquire missing segment information in the sub-database based on the first moment;
a second subunit, configured to acquire, according to the missing segment information, data parameters at the head end and the tail end of the missing segment information, wherein the data parameters comprise the warehousing data and the first moment;
a third subunit, configured to calculate an interpolation coefficient according to the data parameters at the head end and the tail end of the missing segment information;
a fourth subunit, configured to fill the missing segment information according to the data parameters and the interpolation coefficient.
8. The device for cold and hot data classification according to claim 6, wherein the judging module further comprises:
an acquisition subunit, configured to acquire a first parameter, a second parameter and research set information, wherein the first parameter is the number of users accessing the sub-database; the second parameter is an evaluation value of the application value of the data corresponding to the sub-database; and the research set information is a data set formed by the sub-databases with different attributes;
a first calculation unit, configured to calculate a first data value according to the first parameter and the second parameter, wherein the first data value is the sum of the first parameter and the second parameter corresponding to the sub-database;
a second calculation unit, configured to calculate a second data value according to the first data value and the research set information, wherein the second data value is the sum of the first data values of all the sub-databases in the research set information;
a third calculation unit, configured to calculate the data value weight corresponding to the sub-database according to the first data value and the second data value.
9. An apparatus for cold and hot data classification, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the method for cold and hot data classification according to any one of claims 1 to 4 when executing the computer program.
10. A readable storage medium, wherein the readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the method for cold and hot data classification according to any one of claims 1 to 4.
CN202210740213.2A 2022-06-28 2022-06-28 Method, device and equipment for classifying cold and hot data and readable storage medium Active CN114817425B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210740213.2A CN114817425B (en) 2022-06-28 2022-06-28 Method, device and equipment for classifying cold and hot data and readable storage medium

Publications (2)

Publication Number Publication Date
CN114817425A CN114817425A (en) 2022-07-29
CN114817425B true CN114817425B (en) 2022-09-02

Family

ID=82522454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210740213.2A Active CN114817425B (en) 2022-06-28 2022-06-28 Method, device and equipment for classifying cold and hot data and readable storage medium

Country Status (1)

Country Link
CN (1) CN114817425B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant