CN109918448A - A kind of cloud storage data classification method based on user behavior - Google Patents
- Publication number
- CN109918448A (application number CN201910166893.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the field of cloud storage technology and specifically relates to a cloud storage data tiering method based on user behavior. The method mainly comprises the following. The cloud platform records users' operations on data; the operations include at least upload, download, transfer (share), collect and delete, and the statistics include at least the time and the number of occurrences of each operation. From these statistics, the value of a data file at a given moment is calculated: within a target time period, a value is assigned to the data according to its operation counts. The current value of the data is then estimated from its values in the different time slices of a period, using an estimation method. All stored data is sorted by current value; the top 10% of the sorted set is defined as hot data, the next 10% (from 10% to 20%) as general data, and the remainder as cold data. Finally, the data is moved to the storage location that corresponds to its class.
Description
Technical field
The invention belongs to the field of cloud storage technology and specifically relates to a cloud storage data tiering method based on user behavior.
Background
Data grows geometrically with the number of users and with time. More than 95% of the data on a cloud storage platform is unstructured data that is hard to manage, such as documents, photos, audio files and video files. More than 80% of this data is never accessed again after it is created, yet the cloud storage system must still keep it as users require. The price of a storage medium depends on performance factors such as access speed and service life. Data tiering can therefore be used: data with a low access rate (cold data) is stored on media with slower access, and data with a high access rate (hot data) is stored on media with faster access, which reduces the construction and management cost of the cloud storage system.
In a cloud storage system, storage media with different price-performance ratios form a tiered hardware storage system, and a software algorithm grades the data and stores it on the medium of the corresponding tier. This meets users' demand for storing massive amounts of data and also lowers the total cost without affecting read and write efficiency. The concrete technique used to tier data is one of the core technologies in building a cloud storage system.
Existing data tiering methods fall into two classes: cache replacement methods based on data access rate, and methods based on a data value criterion. Cache replacement methods based on access rate have a considerable impact on the performance of the cloud storage platform when applied to large-scale data. Value-criterion methods must calculate the value of all data and then sort it on every run, and they consider only the order of access times, not the value of the data in different time slices.
Summary of the invention
In view of the above problems, the present invention provides a cloud storage data tiering method based on user behavior.
The technical solution of the invention is as follows:
A cloud storage data tiering method based on user behavior, comprising the following steps:
S1: The cloud platform records users' operations on data. The operations include at least upload, download, transfer, collect and delete, and the statistics include at least the time and the number of occurrences of each operation.
S2: From the statistics of step S1, calculate the value of a data file at a given moment: within a target time period, assign a value to the data according to its operation counts. The value is defined as a measure that reflects the importance of the data.
S3: Estimate the current value of the data: from the values of the data in the different time slices of a period, estimate the current value using an estimation method.
S4: Sort all stored data by current value, and define the top 10% of the sorted set as hot data, the next 10% (from 10% to 20%) as general data, and the remainder as cold data.
S5: Move the data to the corresponding storage location according to its class. Specifically, hot data stays on high-speed storage media such as solid-state drives, general data is moved to high-speed SATA hard disks, and cold data is moved to low-speed storage media such as low-speed SATA hard disks or nearline storage. Data is moved in batches by multiple threads. After a piece of data has been moved to its designated location, the data at the original location is deleted and the corresponding path record in the database is updated.
Because the tiering calculation consumes considerable CPU and the data movement consumes considerable storage I/O, both are run in the early morning each day, when user activity is low, so that normal use is not affected.
This technical solution focuses on analyzing user behavior on the cloud storage platform and tiers data according to its characteristics, so that frequently accessed "hot data" is stored on high-speed devices and rarely accessed "cold data" is stored on low-speed storage devices. This keeps the efficiency of users' data operations on the cloud storage unaffected while greatly reducing the hardware investment in the cloud storage system.
Specific embodiment
To aid understanding, the solution of the invention is described in more detail below.
The method of the invention comprises the following steps:
S1: Analyze user operations on the cloud storage platform. The main operations are upload (update), download, transfer (share), collect and delete.
Upload (update): Through the web page provided by the cloud storage platform, the user uploads data from a local disk to the platform. When the user initiates an upload request, the platform uses a checksum to determine whether the data is already stored. If it is, the platform returns the data's storage address to the user directly. If it is not, the platform stores the data on the high-speed storage medium. Once the upload completes, the platform adds (or updates) a database record that describes the storage path, file size, file type, time and other metadata, and returns the record information to the user.
Download: The user selects the data to download on the cloud storage platform page. The platform reads the file's storage information from the database, fetches the data from the corresponding storage medium according to the storage path, and returns it to the user. After the download completes, information such as the downloaded file, the download rate and the download time is recorded in the database.
Transfer (share): The user selects the data to share and creates a share link on the page. The platform adds a share record to the database that describes the link, the share time and other information about the shared data. If another user accesses the link to obtain the data, the related information, such as the link used and the access time, is also recorded in the database.
Collect: The user selects data to collect, which may be the user's own data or another user's data. For another user's data, the data need not be copied; only a user data record is added to the database. For the same data file, the more users have collected it, the more likely it is to be accessed.
Delete: The same data may be collected or used by several users, while the cloud storage platform keeps only one copy. Therefore, when a user initiates a delete operation, the data file on the platform is not actually deleted; only a corresponding record is made in the database. Later, once a tiering calculation has completed and it is determined that no user still uses the file, the file is erased from the storage medium during data migration.
Of the upload (update), download, transfer (share), collect and delete operations, only upload (update) and download involve reading or writing files on the cloud storage platform; the other operations only write records to the database.
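The bookkeeping described in S1 can be sketched as follows. This is a minimal illustration, not the platform's actual implementation: the `MetadataStore` and `FileRecord` classes, the in-memory dictionaries standing in for the database, and the `/ssd/` path prefix are all assumptions made for the example. MD5 is used as the checksum because the text names it for computing the unique digest.

```python
import hashlib
import time
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical metadata row for one stored file."""
    path: str
    size: int
    users: set = field(default_factory=set)  # users who currently hold the file


class MetadataStore:
    """In-memory stand-in for the platform database (an assumption)."""

    def __init__(self):
        self.files = {}  # MD5 digest -> FileRecord
        self.log = []    # (timestamp, user, operation, digest)

    def _record(self, user, op, digest):
        self.log.append((time.time(), user, op, digest))

    def upload(self, user, content: bytes) -> str:
        digest = hashlib.md5(content).hexdigest()
        if digest not in self.files:
            # First copy: place it on the high-speed tier (path is illustrative).
            self.files[digest] = FileRecord(path=f"/ssd/{digest}", size=len(content))
        self.files[digest].users.add(user)
        self._record(user, "upload", digest)  # every upload is logged, even duplicates
        return self.files[digest].path

    def collect(self, user, digest):
        # No data is copied; only a user reference is added.
        self.files[digest].users.add(user)
        self._record(user, "collect", digest)

    def delete(self, user, digest):
        # Logical delete: drop the user's reference but keep the stored file.
        self.files[digest].users.discard(user)
        self._record(user, "delete", digest)

    def unreferenced(self):
        """Digests that no user holds; candidates for erasure during migration."""
        return [d for d, rec in self.files.items() if not rec.users]
```

Two users uploading the same bytes receive the same path and produce a single stored record, while both uploads still appear in the log. A file shows up in `unreferenced()` only after every user that held it has deleted it.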
S2: Calculate the value V_f of a data file over a period of time:
V_f = U_f + D_f + S_f + C_f    (Formula 1)
(1) U_f: the "write frequency" indicator of the data. The same user or different users may upload the same data several times. To identify each piece of data uniquely, the cloud storage platform computes a unique digest of the data with the MD5 algorithm, which ensures that the same data file is stored only once and avoids wasting resources on repeated writes. Every upload is nevertheless recorded in the database, and U_f is the total number of uploads of the data in the period, counted from the database.
(2) D_f: the "read frequency" indicator of the data, that is, the total number of downloads of the data file in the period. The more downloads, the greater the value.
(3) S_f: the total number of transfers (shares) of the data file in the period. The more shares, the greater the likely value.
(4) C_f: the total number of times the data file was collected in the period. The more collections, the greater the likely value.
Deletions are not counted here. When a user deletes data, that data no longer has any value to that user. If the data has been deleted by all users, it can be removed when data movement is performed.
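Formula 1 amounts to counting the logged operations per file within the period. A minimal sketch follows, assuming the log is available as `(timestamp, operation, file_id)` tuples; that record layout, the operation names and the function name `file_values` are assumptions for illustration.

```python
from collections import Counter

# Operations that contribute to V_f; deletions are deliberately left out.
COUNTED_OPS = {"upload", "download", "share", "collect"}


def file_values(log, start, end):
    """Return {file_id: V_f} for records with start <= timestamp < end.

    `log` is an iterable of (timestamp, operation, file_id) tuples.
    """
    values = Counter()
    for ts, op, file_id in log:
        if start <= ts < end and op in COUNTED_OPS:
            # Each counted operation adds 1 to U_f, D_f, S_f or C_f,
            # and therefore 1 to their sum V_f.
            values[file_id] += 1
    return dict(values)


example_log = [
    (1, "upload", "a"),
    (2, "download", "a"),
    (3, "share", "a"),
    (4, "delete", "a"),    # ignored
    (5, "collect", "b"),
    (99, "download", "b"),  # outside the period below
]
print(file_values(example_log, start=0, end=10))
```

Because all four terms carry equal weight in Formula 1, a single counter per file is enough; separate counters would only be needed if the terms were weighted differently.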
S3: Estimate the current value of a data file using exponential smoothing.
S31: Divide the user operation log records stored in the cloud storage platform database into time slices t_0, t_1, …, t_n; each time slice contains a large number of log records.
S32: Calculate the estimated value EV_f of the data file at the current time using Formula 2.
In Formula 2, t denotes the time series and takes values from 0 to n; t_n is the current time; V_f(t_n) is the value of the file at time t_n; EV_f(t_{n-1}) is the estimated value of the file at time t_{n-1}; and EV_f(t_n) is the estimated value of the file at time t_n. The variable α is the smoothing coefficient, with a range from 0 to 1. It sets how quickly the weight of earlier values falls off relative to recent values: the closer α is to 1, the greater the weight of the most recent value; conversely, the closer α is to 0, the smaller that weight. t_last is the time at which data file f last appeared, and the associated term represents the total value from t_0 to t_last. The exponent t_n - t_last + 1 is the interval from the last appearance of data file f to the current time.
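Steps S31 and S32 can be illustrated with a short sketch. The exact expression of Formula 2 is not reproduced in this text, so the code uses the textbook form of simple exponential smoothing, EV_f(t_n) = α·V_f(t_n) + (1 - α)·EV_f(t_{n-1}), as an assumption rather than as the patent's own formula. The function names, the `(timestamp, operation, file_id)` log layout and the choice to seed the estimate with the first slice are likewise assumptions.

```python
def values_per_slice(log, file_id, t0, width, n_slices):
    """Count the value-bearing operations on `file_id` in each time slice.

    `log` holds (timestamp, operation, file_id) tuples; slice i covers
    [t0 + i * width, t0 + (i + 1) * width).
    """
    values = [0] * n_slices
    for ts, op, fid in log:
        index = int((ts - t0) // width)
        if fid == file_id and op != "delete" and 0 <= index < n_slices:
            values[index] += 1
    return values


def estimate_current_value(slice_values, alpha=0.5):
    """Simple exponential smoothing over V_f(t_0), ..., V_f(t_n)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if not slice_values:
        return 0.0
    estimate = float(slice_values[0])  # assumption: seed with the first slice
    for value in slice_values[1:]:
        estimate = alpha * value + (1.0 - alpha) * estimate
    return estimate


log = [(0, "upload", "a"), (1, "download", "a"), (25, "download", "a"),
       (5, "delete", "a"), (3, "download", "b")]
per_slice = values_per_slice(log, "a", t0=0, width=10, n_slices=3)
print(per_slice, estimate_current_value(per_slice, alpha=0.5))
```

Under this form, a file that is idle for k consecutive slices has its estimate multiplied by (1 - α)^k, which is in the same spirit as the decay exponent described in the text, although the exact correspondence is not confirmed here.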
S4: Sort the data by the calculated current value. The top 10% of the sorted set is defined as hot data, the next 10% (from 10% to 20%) as general data, and the rest as cold data.
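The ranking and cut-offs of S4 can be sketched as below. The function name `assign_tiers`, the tier labels, and the rounding rule (cut-off positions are rounded down, so very small sets may have no hot files) are assumptions; the 10% and 20% boundaries come from the text.

```python
def assign_tiers(current_values, hot_percent=10, general_percent=10):
    """Map each file id to "hot", "general" or "cold" by rank of current value.

    `current_values` is {file_id: estimated current value}.
    """
    # Highest value first; ties keep their original insertion order.
    ranked = sorted(current_values, key=current_values.get, reverse=True)
    total = len(ranked)
    hot_end = total * hot_percent // 100
    general_end = total * (hot_percent + general_percent) // 100

    tiers = {}
    for position, file_id in enumerate(ranked):
        if position < hot_end:
            tiers[file_id] = "hot"
        elif position < general_end:
            tiers[file_id] = "general"
        else:
            tiers[file_id] = "cold"
    return tiers


values = {f"f{i}": 100 - i for i in range(10)}
print(assign_tiers(values))
```

With ten files, one lands in the hot tier, one in the general tier and eight in the cold tier. Integer arithmetic is used for the cut-offs to avoid floating-point surprises at the boundaries.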
S5: Move the data to the corresponding storage location according to the calculated result. Hot data stays on high-speed storage media such as solid-state drives, general data is moved to high-speed SATA hard disks, and cold data is moved to low-speed storage media such as low-speed SATA hard disks or nearline storage. Data is moved in batches by multiple threads. After a piece of data has been moved to its designated location, the data at the original location is deleted and the corresponding path record in the database is updated.
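The move step of S5 can be sketched with a thread pool. This is an illustration under stated assumptions: tiers are modelled as plain directories, the database path records are modelled as a dictionary named `path_index`, and the function names are hypothetical. A real system would also need error handling and retry logic, which are omitted here.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def move_file(source: Path, tier_dir: Path) -> Path:
    """Copy a file to the target tier, then delete the original."""
    if source.parent == tier_dir:
        return source  # already on the right tier
    tier_dir.mkdir(parents=True, exist_ok=True)
    target = tier_dir / source.name
    shutil.copy2(source, target)  # write the new copy first
    source.unlink()               # then remove the data at the original location
    return target


def migrate(plan, path_index, workers=4):
    """Move files in parallel and update their path records.

    plan: {file_id: target directory}
    path_index: {file_id: current path}, updated in place.
    """
    def job(item):
        file_id, tier_dir = item
        return file_id, move_file(Path(path_index[file_id]), Path(tier_dir))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for file_id, new_path in pool.map(job, plan.items()):
            path_index[file_id] = str(new_path)  # stands in for the database update
    return path_index
```

Copying before deleting means an interrupted move leaves the original in place rather than losing data. Threads suit this work because it is dominated by storage I/O rather than CPU.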
Claims (3)
1. A cloud storage data tiering method based on user behavior, characterized by comprising the following steps:
S1: the cloud platform records users' operations on data, the operations including at least upload, download, transfer, collect and delete, and the statistics including at least the time and the number of occurrences of each operation;
S2: from the statistics of step S1, calculating the value of a data file: within a target time period, a value is assigned to the data according to its operation counts, the value being defined as a measure that reflects the importance of the data;
S3: estimating the current value of the data: from the values of the data in the different time slices of a period, the current value is estimated using an estimation method;
S4: sorting all stored data by current value, and defining the top 10% of the sorted set as hot data, the next 10% (from 10% to 20%) as general data, and the remainder as cold data;
S5: moving the data to the corresponding storage location according to its class.
2. The cloud storage data tiering method based on user behavior according to claim 1, characterized in that the value of a data file in step S2 is calculated as the value V_f of the data file over a period of time using the following formula:
V_f = U_f + D_f + S_f + C_f
where U_f is the number of uploads of the data in the given period, D_f is the number of downloads of the data in the given period, S_f is the number of transfers (shares) of the data in the given period, and C_f is the number of times the data was collected in the given period.
3. The cloud storage data tiering method based on user behavior according to claim 2, characterized in that step S3 estimates the current value of the data using exponential smoothing, specifically:
S31: dividing the user operation log records stored in the cloud storage platform database into time slices t_0, t_1, …, t_n, where the subscript n is the number of slices, and calculating the value of the data in each time slice using the method of step S2;
S32: estimating the current value EV_f of the data using the following formula:
where t denotes the time series and takes values from 0 to n; t_n is the current time; V_f(t_n) is the value of the file at time t_n; EV_f(t_{n-1}) is the estimated value of the file at time t_{n-1}; EV_f(t_n) is the estimated value of the file at time t_n; the variable α is the smoothing coefficient, with a range from 0 to 1, which sets how quickly the weight of earlier values falls off relative to recent values, that is, the closer α is to 1, the greater the weight of the most recent value, and conversely, the closer α is to 0, the smaller that weight; t_last is the time at which data file f last appeared, and the associated term represents the total value from t_0 to t_last; and the exponent t_n - t_last + 1 is the interval from the last appearance of data file f to the current time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910166893.XA CN109918448A (en) | 2019-03-06 | 2019-03-06 | A kind of cloud storage data classification method based on user behavior |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109918448A true CN109918448A (en) | 2019-06-21 |
Family
ID=66963495
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910166893.XA Pending CN109918448A (en) | 2019-03-06 | 2019-03-06 | A kind of cloud storage data classification method based on user behavior |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109918448A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103106050A (en) * | 2013-02-22 | 2013-05-15 | 浪潮电子信息产业股份有限公司 | Method for achieving layered storage and copy of data of storage system |
US20160119429A1 (en) * | 2014-09-22 | 2016-04-28 | International Business Machines Corporation | Multi-service cloud storage decision optimization process |
CN105653591B (en) * | 2015-12-22 | 2019-02-05 | 浙江中控研究院有限公司 | A kind of industrial real-time data classification storage and moving method |
CN106055272A (en) * | 2016-05-20 | 2016-10-26 | 乐视控股(北京)有限公司 | Selection method and apparatus of storage medium |
CN109377363A (en) * | 2018-09-26 | 2019-02-22 | 电子科技大学 | A kind of internet of things data transaction construction and its transaction security method based on block chain |
Non-Patent Citations (1)
Title |
---|
刘维: "基于蓝光存储的异构云存储平台设计与研究", 《中国优秀硕士学位论文全文数据库》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110472004A (en) * | 2019-08-23 | 2019-11-19 | 国网山东省电力公司电力科学研究院 | A kind of method and system of scientific and technological information data multilevel cache management |
CN113360553A (en) * | 2020-03-03 | 2021-09-07 | 中国移动通信集团贵州有限公司 | Data cold and hot degree evaluation method and server |
CN111475316A (en) * | 2020-04-14 | 2020-07-31 | 中国人民解放军战略支援部队信息工程大学 | Persistence operation method, device, equipment and system for mimicry construction cloud service system |
CN111475316B (en) * | 2020-04-14 | 2023-01-24 | 中国人民解放军战略支援部队信息工程大学 | Persistence operation method, device, equipment and system for mimicry construction cloud service system |
CN111565144A (en) * | 2020-04-26 | 2020-08-21 | 广州数源畅联科技有限公司 | Data layered storage management method for instant communication tool |
CN113869359A (en) * | 2021-08-18 | 2021-12-31 | 北京工业大学 | Modular neural network-based prediction method for nitrogen oxides in urban solid waste incineration process |
CN113869359B (en) * | 2021-08-18 | 2024-05-28 | 北京工业大学 | Method for predicting nitrogen oxides in urban solid waste incineration process based on modularized neural network |
CN113721854A (en) * | 2021-08-31 | 2021-11-30 | 中国建设银行股份有限公司 | Data storage method and device |
CN114817200A (en) * | 2022-05-06 | 2022-07-29 | 安徽森江人力资源服务有限公司 | Document data cloud management method and system based on Internet of things and storage medium |
CN114817200B (en) * | 2022-05-06 | 2024-04-05 | 新疆利丰智能科技股份有限公司 | Internet of things-based document data cloud management method, system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20190621 |