CN109918448A - A cloud storage data classification method based on user behavior - Google Patents

Info

Publication number
CN109918448A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910166893.XA
Other languages
Chinese (zh)
Inventor
颜凯
董茜
张力
张明
吴涵莹
李婷蔚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHENGDU RESEARCH INSTITUTE OF UESTC
University of Electronic Science and Technology of China
Original Assignee
CHENGDU RESEARCH INSTITUTE OF UESTC
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CHENGDU RESEARCH INSTITUTE OF UESTC and University of Electronic Science and Technology of China
Priority to CN201910166893.XA
Publication of CN109918448A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of cloud storage technology and specifically relates to a cloud storage data classification (tiering) method based on user behavior. The method mainly comprises the following steps. The cloud platform collects statistics on the operations users perform on data; the operations include at least upload, download, transfer (sharing), collection, and deletion, and the statistics include at least the time and number of occurrences of each operation. From these statistics, the value of a data file in a given period is calculated: within the target time period, the data is assigned a value according to the number of operations. The current value of the data is then estimated by combining its values in the successive time slices of a period using an estimation method. All stored data is sorted by current value; the top 10% of the sorted set is defined as hot data, the next 10% (from 10% to 20%) as ordinary data, and the remainder as cold data. Finally, the data is moved to the storage location that corresponds to its class.

Description

A cloud storage data classification method based on user behavior
Technical field
The invention belongs to the field of cloud storage technology and specifically relates to a cloud storage data classification (tiering) method based on user behavior.
Background art
Data volume grows geometrically with the number of users and with time. More than 95% of the data on a cloud storage platform is hard-to-manage unstructured data, such as documents, photos, audio files, and video files. More than 80% of this data is never accessed again after it is created, yet the cloud storage system must still retain it as users require. The price of a storage medium depends on performance factors such as access rate and service life. Data tiering can therefore be used: data with a low access rate (cold data) is stored on media with slower access, and data with a high access rate (hot data) is stored on media with faster access, which reduces the construction and management cost of the cloud storage system.
In a cloud storage system, storage media with different cost-performance ratios form a hardware hierarchy of storage tiers, and a software algorithm classifies the data and stores each class on the medium of the corresponding tier. This meets users' demand for storing massive amounts of data and reduces total cost without affecting read/write efficiency. The concrete implementation of data tiering is one of the core technologies in building a cloud storage system.
Existing data tiering methods fall into two classes: cache replacement methods based on data access rate, and methods based on a data value criterion. Cache replacement methods based on access rate have a considerable performance impact on the cloud storage platform when the data set is large. Methods based on a data value criterion must compute the value of all data and then sort it on every run, and they consider only the recency of access, not the value of the data across different time slices.
Summary of the invention
In view of the above problems, the present invention provides a cloud storage data classification method based on user behavior.
The technical solution of the present invention is:
A cloud storage data classification method based on user behavior, comprising the following steps:
S1: The cloud platform collects statistics on the operations users perform on data. The operations include at least upload, download, transfer (sharing), collection, and deletion; the statistics include at least the time and number of occurrences of each operation.
S2: From the statistics of step S1, calculate the value of a data file in a given period: within the target time period, assign the data a value according to the number of operations. The value is defined as a measure of the importance of the data.
S3: Estimate the current value of the data: combine the values of the data in the successive time slices of a period using an estimation method.
S4: Sort all stored data by current value. The top 10% of the sorted set is defined as hot data, the next 10% (from 10% to 20%) as ordinary data, and the remainder as cold data.
S5: Move the data to the storage location that corresponds to its class. Specifically, hot data stays on high-speed storage media such as solid-state drives, ordinary data is moved to high-speed SATA hard disks, and cold data is moved to low-speed storage media such as low-speed SATA hard disks or nearline storage. Data is moved in batches by multiple threads. Once a file has been moved to its target location, the data at the original location is deleted and the corresponding path record in the database is updated.
Because the tiering calculation consumes considerable CPU and data movement consumes considerable storage I/O, both are scheduled for the early-morning hours, when few users are active, so that normal use is not affected.
This technical solution centers on analyzing user behavior on the cloud storage platform and tiers the data according to its characteristics, so that frequently accessed hot data is stored on high-speed devices and rarely accessed cold data on low-speed storage devices. This keeps users' data operations on the cloud storage efficient while greatly reducing hardware investment in the cloud storage system.
Specific embodiment
To aid understanding, the solution of the present invention is described in more detail below.
The method comprises the following steps:
S1: Analyze the operations users perform on the cloud storage platform, which are mainly upload (update), download, transfer (sharing), collection, and deletion.
Upload (update): Through the web page provided by the cloud storage platform, the user uploads data from the local computer's disk to the platform. When the user initiates an upload request, the platform uses a checksum to determine whether the data is already stored. If it is, the platform returns the data's storage address to the user directly. If it is not, the platform stores the data on the high-speed storage medium. After the upload completes, the platform adds (or updates) a database record that describes the data, including its storage path, file size, file type, and timestamp, and returns the record information to the user.
Download: The user selects the data to download on the platform's page. The platform obtains the file's storage information from the database, reads the data from the corresponding storage medium according to the storage path, and returns it to the user. After the download completes, information such as the downloaded file, download rate, and download time is recorded in the database.
Transfer (sharing): The user selects the data to share and creates a share link on the page. The platform adds a share record to the database that describes the shared data, including the link and the share time. If other users access the link and obtain the data, the related information, such as the link used and the time of access, is also recorded in the database.
Collection: The user selects data to add to their collection; the data may be the user's own or another user's. For another user's data, no copy is made; only a user-data record is added to the database. The more users collect the same data file, the higher the probability that it will be accessed.
Deletion: The same data may be collected or used by several users, but the platform keeps only one copy. When a user initiates a delete operation, the data file is therefore not actually deleted from the platform; only a corresponding record is made in the database. Later, once a tiering calculation finds that no user is using the file, it is erased from the storage medium during data migration.
Of the upload (update), download, transfer (sharing), collection, and deletion operations, only upload (update) and download involve reading or writing the file itself; the others only operate on records in the database.
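The upload and deletion behavior described above can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: the class `CloudStore`, its in-memory dictionaries, and the method names are all hypothetical, and MD5 is used as the checksum only because the description names it later for identifying files.

```python
import hashlib
import time


class CloudStore:
    """Toy in-memory stand-in for the platform's storage and database (hypothetical)."""

    def __init__(self):
        self.blobs = {}    # digest -> file bytes (one physical copy per file)
        self.owners = {}   # digest -> set of users that still reference the file
        self.log = []      # operation log: (timestamp, user, operation, digest)

    def _record(self, user, op, digest):
        self.log.append((time.time(), user, op, digest))

    def upload(self, user, data):
        digest = hashlib.md5(data).hexdigest()
        if digest not in self.blobs:           # not stored yet: write the file once
            self.blobs[digest] = data
        self.owners.setdefault(digest, set()).add(user)
        self._record(user, "upload", digest)   # every upload is logged, even a repeat
        return digest

    def collect(self, user, digest):
        # Collecting another user's file adds a record, never a copy.
        self.owners[digest].add(user)
        self._record(user, "collect", digest)

    def delete(self, user, digest):
        # Logical delete: only the database record changes.
        self.owners[digest].discard(user)
        self._record(user, "delete", digest)

    def purge_unreferenced(self):
        # Run during migration: erase files that no user references any more.
        dead = [d for d, users in self.owners.items() if not users]
        for d in dead:
            del self.blobs[d]
            del self.owners[d]
        return dead
```

Two users uploading identical bytes end up sharing one stored copy, and the copy is erased only after both have deleted it and `purge_unreferenced` runs.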
S2: Calculate the value V_f of a data file over a period of time:
V_f = U_f + D_f + S_f + C_f    (Formula 1)
(1) U_f: the "write frequency" indicator of the data. The same user or different users may upload the same data several times. To identify each piece of data uniquely, the platform computes a digest of the data with the MD5 algorithm, which ensures that the same data file is stored only once and avoids wasting resources on repeated writes. Every upload is nevertheless recorded in the database, and U_f is the total number of uploads of the data recorded in the database for the period.
(2) D_f: the "read frequency" indicator of the data, i.e. the total number of downloads of the data file in the period. The more downloads, the greater its value.
(3) S_f: the total number of transfers (shares) of the data file in the period. The more shares, the greater its likely value.
(4) C_f: the total number of times the data file was collected in the period. The more collections, the greater its likely value.
Users' delete operations are not counted here. A delete operation means that the data no longer has any value to that user; if the data has been deleted by all users, it can be deleted when data movement is performed.
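Formula 1 amounts to counting the logged upload, download, share, and collection operations for each file in a period and ignoring deletions. A minimal sketch follows, assuming a hypothetical log of `(timestamp, file_id, operation)` tuples; the operation names and the function `file_values` are illustrative, not taken from the source.

```python
from collections import Counter

# Operations that contribute to V_f; deletions are deliberately excluded.
COUNTED_OPS = {"upload", "download", "share", "collect"}


def file_values(log, start, end):
    """Return {file_id: V_f} for log entries with start <= timestamp < end."""
    values = Counter()
    for timestamp, file_id, op in log:
        if start <= timestamp < end and op in COUNTED_OPS:
            values[file_id] += 1   # U_f, D_f, S_f and C_f all carry equal weight
    return dict(values)
```

Because the four counts are simply summed, one counter per file is enough; separate counters would be needed only if the terms were weighted differently.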
S3: Estimate the current value of a data file using exponential smoothing.
S31: Divide the user-operation log records stored in the cloud storage platform's database into time slices t_0, t_1, …, t_n. Each time slice contains a large number of log records.
S32: Calculate the estimated current value EV_f of the data file:
[Formula 2: equation not reproduced in this text]
In Formula 2, t denotes the time index, which runs from 0 to n, and t_n denotes the current time. The formula uses the file's value at time t_n and the estimated file value EV_f(t_{n-1}) at time t_{n-1}, and yields the estimated file value EV_f(t_n) at time t_n. The variable α is the smoothing coefficient, with a value between 0 and 1. It represents the rate at which the weight of recent values falls off relative to earlier ones: the closer α is to 1, the larger the weight of the most recent value; conversely, the closer α is to 0, the smaller that weight. t_last denotes the last time at which data file f appeared, and the formula also uses the file's total value from t_0 to t_last. The exponent t_n - t_last + 1 is the interval from the last appearance of data file f to the current time.
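The description names exponential smoothing, but the formula itself does not appear in this text. The sketch below therefore uses the textbook recurrence EV(t_n) = α·V(t_n) + (1 - α)·EV(t_{n-1}), which is an assumption and may differ from the patent's Formula 2, in particular in how it treats the t_last term. The log layout and both function names are hypothetical.

```python
def per_slice_values(log, file_id, t0, slice_len, n_slices):
    """Count value-bearing operations on one file in each time slice.

    `log` holds (timestamp, file_id, operation) tuples; deletions are ignored.
    """
    values = [0] * n_slices
    for timestamp, fid, op in log:
        idx = int((timestamp - t0) // slice_len)
        if fid == file_id and op != "delete" and 0 <= idx < n_slices:
            values[idx] += 1
    return values


def estimate_current_value(slice_values, alpha=0.5):
    """Textbook simple exponential smoothing over per-slice values.

    alpha near 1 weights the most recent slice heavily; alpha near 0 weights
    the history heavily. This is NOT the patent's Formula 2.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if not slice_values:
        return 0.0
    estimate = float(slice_values[0])
    for value in slice_values[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate
```

With this recurrence, a file that does not appear in a slice contributes a value of 0 there, so its estimate shrinks by a factor of (1 - α) for every idle slice.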
S4: Sort the data by the calculated current value. The top 10% of the sorted set is defined as hot data, the next 10% (from 10% to 20%) as ordinary data, and the remainder as cold data.
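The ranking step can be sketched as follows. The 10% and 20% cut-offs come from the text; rounding the cut-offs up so that a small collection still has at least one hot file is an assumption of this sketch, as is the function name `classify`.

```python
def classify(values, hot_pct=10, ordinary_pct=10):
    """Map {file_id: current value} to {file_id: "hot" | "ordinary" | "cold"}."""
    ranked = sorted(values, key=values.get, reverse=True)   # highest value first
    n = len(ranked)
    # Integer ceiling division avoids floating-point surprises at the cut-offs.
    hot_end = (n * hot_pct + 99) // 100
    ordinary_end = (n * (hot_pct + ordinary_pct) + 99) // 100
    tiers = {}
    for rank, file_id in enumerate(ranked):
        if rank < hot_end:
            tiers[file_id] = "hot"
        elif rank < ordinary_end:
            tiers[file_id] = "ordinary"
        else:
            tiers[file_id] = "cold"
    return tiers
```

The description does not say how files with equal values are ordered; here they keep their insertion order because Python's sort is stable.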
S5: Move the data to the corresponding storage location according to the calculated result. Hot data stays on high-speed storage media such as solid-state drives, ordinary data is moved to high-speed SATA hard disks, and cold data is moved to low-speed storage media such as low-speed SATA hard disks or nearline storage. Data is moved in batches by multiple threads. Once a file has been moved to its target location, the data at the original location is deleted and the corresponding path record in the database is updated.
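The move step (copy to the target, delete the original, update the path record) could look like the sketch below. It assumes each tier is a directory and that the database's path records are a plain dictionary; `move_file`, `migrate`, and their parameters are hypothetical names chosen for illustration.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def move_file(file_id, dest_dir, paths):
    """Copy one file to its tier directory, delete the original, update its record."""
    src = Path(paths[file_id])
    dest_dir = Path(dest_dir)
    if src.parent.resolve() == dest_dir.resolve():
        return str(src)                  # already on the right tier
    dest = dest_dir / src.name
    shutil.copy2(src, dest)              # 1. write the data at the target location
    src.unlink()                         # 2. delete the data at the original location
    paths[file_id] = str(dest)           # 3. update the path record
    return str(dest)


def migrate(tiers, tier_dirs, paths, workers=4):
    """Move every file to the directory of its tier using a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(move_file, file_id, tier_dirs[tier], paths)
            for file_id, tier in tiers.items()
        ]
        for future in futures:
            future.result()              # re-raise any error from a worker thread
```

Copying before deleting means an interrupted move leaves the original file in place, at the cost of temporarily holding two copies.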

Claims (3)

1. A cloud storage data classification method based on user behavior, characterized by comprising the following steps:
S1: the cloud platform collects statistics on the operations users perform on data, the operations including at least upload, download, transfer, collection, and deletion, and the statistics including at least the time and number of occurrences of each operation;
S2: from the statistics of step S1, calculating the value of a data file: within the target time period, assigning the data a value according to the number of operations, the value being defined as a measure of the importance of the data;
S3: estimating the current value of the data: combining the values of the data in the successive time slices of a period using an estimation method;
S4: sorting all stored data by current value, and defining the top 10% of the sorted set as hot data, the next 10% (from 10% to 20%) as ordinary data, and the remainder as cold data;
S5: moving the data to the storage location that corresponds to its class.
2. The cloud storage data classification method based on user behavior according to claim 1, characterized in that in step S2 the value of the data file is calculated as the value V_f of the data file over a period of time, using the following formula:
V_f = U_f + D_f + S_f + C_f
where U_f is the number of uploads of the data in the given period, D_f is the number of downloads of the data in the given period, S_f is the number of transfers (shares) of the data in the given period, and C_f is the number of times the data was collected in the given period.
3. The cloud storage data classification method based on user behavior according to claim 2, characterized in that step S3 estimates the current value of a data file using exponential smoothing, specifically:
S31: dividing the user-operation log records stored in the cloud storage platform database into time slices t_0, t_1, …, t_n, where the subscript n is the number of slices, and calculating the value of the data in each time slice using the method of step S2;
S32: estimating the current value EV_f of the data using the following formula:
[equation not reproduced in this text]
where t denotes the time index, which runs from 0 to n; t_n denotes the current time; the formula uses the file's value at time t_n; EV_f(t_{n-1}) is the estimated file value at time t_{n-1}; EV_f(t_n) is the estimated file value at time t_n; the variable α is the smoothing coefficient, with a value between 0 and 1, representing the rate at which the weight of recent values falls off relative to earlier ones, i.e. the closer α is to 1, the larger the weight of the most recent value, and conversely, the closer α is to 0, the smaller that weight; t_last denotes the last time at which data file f appeared; the formula also uses the file's total value from t_0 to t_last; and the exponent t_n - t_last + 1 is the interval from the last appearance of data file f to the current time.
CN201910166893.XA 2019-03-06 2019-03-06 A cloud storage data classification method based on user behavior Pending CN109918448A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910166893.XA CN109918448A (en) 2019-03-06 2019-03-06 A cloud storage data classification method based on user behavior

Publications (1)

Publication Number Publication Date
CN109918448A true CN109918448A (en) 2019-06-21

Family

ID=66963495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910166893.XA Pending CN109918448A (en) 2019-03-06 2019-03-06 A kind of cloud storage data classification method based on user behavior

Country Status (1)

Country Link
CN (1) CN109918448A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190621