CN108462605A - Data prediction method and device - Google Patents

Data prediction method and device

Info

Publication number
CN108462605A
Authority
CN
China
Prior art keywords
prediction
target
prediction model
sequence
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810120980.7A
Other languages
Chinese (zh)
Other versions
CN108462605B (en)
Inventor
乔学明
王贻亮
张媛
杨军洲
刘乘麟
荣以平
朱伟义
刘宁
傅忠传
朱东杰
林艳
孟平
王超
孙海峰
姜婷
汤耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Weihai Power Supply Co of State Grid Shandong Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Weihai Power Supply Co of State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Weihai Power Supply Co of State Grid Shandong Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN201810120980.7A priority Critical patent/CN108462605B/en
Publication of CN108462605A publication Critical patent/CN108462605A/en
Application granted granted Critical
Publication of CN108462605B publication Critical patent/CN108462605B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching

Abstract

The present invention provides a data prediction method and device, relating to the technical field of data processing. The method includes: obtaining a file access log to obtain a target sequence; processing the target sequence to obtain a target training sample and a target test sample; inputting the target training sample and the target test sample into a prediction model, and adjusting the prediction model according to the prediction result obtained from the prediction model to obtain a target prediction model; analyzing user access requests based on the target prediction model to obtain a prediction data set, and caching the prediction data set based on the data volume of the prediction data set. The present invention solves the prior-art technical problem of low data reading efficiency of a distributed storage system when a user accesses data, and achieves the technical effect of improving the data reading efficiency of the distributed storage system.

Description

Data prediction method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a data prediction method and device.
Background technology
In a general file system, the metadata management of blocks is determined by the storage application, and each file system has its own unique data distribution mode and maintains on-disk metadata structures. To address the metadata management problems caused by storing massive numbers of small files, object storage makes certain optimizations over traditional file systems. Object storage uses a flat naming scheme, which reduces the index overhead of multi-level directory structures. At the same time, because part of the metadata is managed in a decentralized way, object storage reduces the load on the master server and, to a certain extent, alleviates the metadata access bottleneck of distributed systems. However, in terms of data reading and cache management, the reading efficiency of object storage for massive small files is still unsatisfactory.
No effective solution to the above problems has yet been proposed.
Summary of the invention
In view of this, the purpose of the present invention is to provide a data prediction method and device, so as to alleviate the prior-art technical problem that the data reading efficiency of a distributed storage system is relatively low when a user accesses data.
In a first aspect, an embodiment of the present invention provides a data prediction method, the method including: obtaining a file access log to obtain a target sequence, where the target sequence contains the access information of accessing users; processing the target sequence to obtain a target training sample and a target test sample; inputting the target training sample and the target test sample into a prediction model, and adjusting the prediction model according to the prediction result obtained from the prediction model to obtain a target prediction model; analyzing user access requests based on the target prediction model to obtain a prediction data set, and caching the prediction data set based on the data volume of the prediction data set, where the prediction data set is used to characterize the set of data that the user will access at the next moment.
Further, processing the target sequence to obtain the target training sample and the target test sample includes: classifying the target sequence to obtain multiple sub-target sequences, where each sub-target sequence contains the access records of a user under different access requests; and splitting the multiple sub-target sequences according to a preset ratio to obtain the target training sample and the target test sample.
Further, classifying the target sequence to obtain the multiple sub-target sequences includes: classifying the target sequence according to user information to obtain intermediate sub-target sequences, where an intermediate sub-target sequence contains multiple access records of the same user; and reclassifying the intermediate sub-target sequences based on the first time interval between any two consecutive access records among the multiple access records to obtain the sub-target sequences.
Further, reclassifying the intermediate sub-target sequences based on the first time interval between any two consecutive access records among the multiple access records includes: classifying any two consecutive access records whose first time interval is less than a first preset time interval as access records under the same access request.
Further, obtaining the file access log to obtain the target sequence includes: querying the creation time of the file access log; calculating a second time interval between the current time and the creation time; and if the second time interval is greater than a second preset time interval, storing the file access log to a compute node to obtain the target sequence, where the compute node is the node in the file storage system used to process the file access log.
Further, inputting the target training sample and the target test sample into the prediction model and adjusting the prediction model according to the prediction result obtained from the prediction model to obtain the target prediction model includes: constructing the prediction model; inputting the target training sample into the prediction model to train the prediction model; inputting the target test sample into the trained prediction model to obtain a prediction result; and adjusting, based on the prediction result, the parameters of the trained prediction model to obtain the target prediction model.
Further, analyzing user access requests based on the target prediction model to obtain the prediction data set and caching the prediction data set based on the data volume of the prediction data set includes: obtaining, according to a user's access request, the prediction data set containing the prediction data through the prediction model; judging whether the data volume of the prediction data set exceeds a preset data volume; if the judgment result is yes, not caching the prediction data set to the proxy node, where the proxy node is the node in the file storage system used to store the file access log and the prediction data set; and if the judgment result is no, caching the prediction data set to the proxy node.
In a second aspect, an embodiment of the present invention provides a data prediction device, the device including: an acquisition device, a processing device, a calibration device and a prediction device, where the acquisition device is configured to obtain a file access log to obtain a target sequence, the target sequence containing the access information of accessing users; the processing device is configured to process the target sequence to obtain a target training sample and a target test sample; the calibration device is configured to input the target training sample and the target test sample into a prediction model, and to adjust the prediction model according to the prediction result obtained from the prediction model to obtain a target prediction model; and the prediction device is configured to analyze user access requests based on the target prediction model to obtain prediction data, and to cache the prediction data set based on the data volume of the prediction data set, where the prediction data characterizes the set of data that the user will access at the next moment.
Further, the processing device is also configured to: classify the target sequence to obtain multiple sub-target sequences, where each sub-target sequence contains the access records of a user under different access requests; and split the multiple sub-target sequences according to a preset ratio to obtain the target training sample and the target test sample.
Further, the processing device is also configured to: classify the target sequence according to user information to obtain intermediate sub-target sequences, where an intermediate sub-target sequence contains multiple access records of the same user; and reclassify the intermediate sub-target sequences based on the time interval between any two consecutive access records among the multiple access records to obtain the sub-target sequences.
In an embodiment of the present invention, first, a file access log is obtained to obtain a target sequence; then the target sequence is processed to obtain a target training sample and a target test sample; next, the target training sample and the target test sample are input into a prediction model, and the prediction model is adjusted according to the prediction result obtained from the prediction model to obtain a target prediction model; finally, user access requests are analyzed based on the target prediction model to obtain a prediction data set, and the prediction data set is cached based on the data volume of the prediction data set.
In the embodiment of the present invention, user access requests are analyzed by the target prediction model to obtain the prediction data set, so that the data the user may access in the future can be cached. This reduces the number of system input/output operations, improves the overall reading efficiency of the distributed storage system, solves the prior-art technical problem of low data reading efficiency of a distributed storage system when a user accesses data, and achieves the technical effect of improving the data reading efficiency of the distributed storage system.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood by implementing the present invention. The objectives and other advantages of the present invention are realized and obtained by the structures particularly pointed out in the description, the claims and the accompanying drawings.
To make the above objectives, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below in conjunction with the accompanying drawings.
Description of the drawings
In order to more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention; for those of ordinary skill in the art, other drawings may also be obtained from these drawings without creative effort.
Fig. 1 is a flow chart of a data prediction method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of another data prediction method provided by an embodiment of the present invention;
Fig. 3 is a detailed flow chart of a data prediction method provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of a data prediction device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Embodiment one:
According to an embodiment of the present invention, an embodiment of a data prediction method is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system such as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the order herein.
Fig. 1 shows a data prediction method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: obtain a file access log to obtain a target sequence, where the target sequence contains the access information of accessing users.
Step S104: process the target sequence to obtain a target training sample and a target test sample.
Step S106: input the target training sample and the target test sample into a prediction model, and adjust the prediction model according to the prediction result obtained from the prediction model to obtain a target prediction model.
Step S108: analyze user access requests based on the target prediction model to obtain a prediction data set, and cache the prediction data set based on the data volume of the prediction data set, where the prediction data set is used to characterize the set of data that the user will access at the next moment.
In the embodiment of the present invention, user access requests are analyzed by the target prediction model to obtain the prediction data set, so that the data the user may access in the future can be cached. This reduces the number of system input/output operations, improves the overall reading efficiency of the distributed storage system, solves the prior-art technical problem of low data reading efficiency of a distributed storage system when a user accesses data, and achieves the technical effect of improving the data reading efficiency of the distributed storage system.
It should be noted that, in the embodiment of the present invention, the prediction model used is an LSTM-RNN model, and the file access log is the log, stored in the distributed storage system, that records user access behavior data.
In an embodiment of the present invention, as shown in Fig. 2, step S102 of obtaining the file access log to obtain the target sequence includes:
Step S1021: query the creation time of the file access log.
Step S1022: calculate the second time interval between the current time and the creation time.
Step S1023: if the second time interval is greater than the second preset time interval, store the file access log to a compute node to obtain the target sequence, where the compute node is the node in the file storage system used to process the file access log.
In an embodiment of the present invention, the file access log of the distributed storage system is stored on the proxy node in the form of a text file. The creation time of the file access log on the proxy node can be queried, and the second time interval between the current time and the creation time is then calculated. The second preset time interval may be, for example, 2 hours; it is set by the model tester and is not specifically limited in the embodiments of the present invention.
If the second time interval is greater than the second preset time interval, the file access log is stored to the compute node to obtain the target sequence, and at the same time a backup of the file access log, named with the current timestamp, is kept on the proxy node.
If the second time interval is less than the second preset time interval, no operation is performed on the file access log. In an embodiment of the present invention, the target sequence is obtained by querying and analyzing the file access log on the proxy node of the distributed storage system.
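A minimal sketch of this log-collection step is given below (Python). It assumes the log is a plain text file on the proxy node, uses the 2-hour threshold named above, and relies on a hypothetical ship_to_compute_node() helper for the actual transfer; none of these names come from the patent itself.

import os
import shutil
import time

SECOND_PRESET_INTERVAL = 2 * 60 * 60  # e.g. 2 hours, set by the model tester (assumption)

def collect_access_log(log_path, ship_to_compute_node):
    """Ship the file access log to a compute node once it is old enough."""
    creation_time = os.path.getctime(log_path)            # creation time of the log
    second_interval = time.time() - creation_time         # current time minus creation time
    if second_interval <= SECOND_PRESET_INTERVAL:
        return False                                       # too fresh: leave the log untouched
    ship_to_compute_node(log_path)                         # hand the log to the compute node
    backup_name = "%s.%d" % (log_path, int(time.time()))   # back it up on the proxy node, named by timestamp
    shutil.copy(log_path, backup_name)
    return True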
In an embodiment of the present invention, as shown in Fig. 2, step S104 of processing the target sequence to obtain the target training sample and the target test sample includes:
Step S1041: classify the target sequence to obtain multiple sub-target sequences, where each sub-target sequence contains the access records of a user under different access requests.
Step S1042: split the multiple sub-target sequences according to a preset ratio to obtain the target training sample and the target test sample.
In an embodiment of the present invention, the multiple sub-target sequences are first obtained by classifying the target sequence; the multiple sub-target sequences are then split according to the preset ratio to obtain the target training sample and the target test sample.
For example, if the target sequence contains 1000 target subsequences, 700 of them form the target training sample and the remaining 300 form the target test sample. The preset ratio is set by the user and is not specifically limited in the embodiments of the present invention.
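As a small illustration of the split just described, the following sketch divides a list of sub-target sequences by a user-chosen preset ratio (the function name and the 0.7 default are assumptions, not taken from the patent).

def split_samples(sub_sequences, preset_ratio=0.7):
    """Split sub-target sequences into a training sample and a test sample."""
    cut = int(len(sub_sequences) * preset_ratio)    # e.g. 1000 sequences at ratio 0.7 -> 700 / 300
    target_training_sample = sub_sequences[:cut]
    target_test_sample = sub_sequences[cut:]
    return target_training_sample, target_test_sample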
Optionally, as shown in Fig. 3, step S1041 of classifying the target sequence to obtain the multiple sub-target sequences further includes:
Step S21: classify the target sequence according to user information to obtain intermediate sub-target sequences, where an intermediate sub-target sequence contains multiple access records of the same user.
Step S22: reclassify the intermediate sub-target sequences based on the time interval between any two consecutive access records among the multiple access records to obtain the sub-target sequences.
In an embodiment of the present invention, the target sequence is first read line by line, and within each line every two consecutive pieces of access information are separated with a space; the creation time of each piece of access information is then obtained to form the time column, the source of each piece of access information is obtained to form the user information column, and the accessed object name of each piece of access information is obtained to form the accessed object column.
The time column, the user information column and the accessed object column are stored in array a0. Since the access information of the target sequence is time-ordered, the data in array a0 is also time-ordered.
Then the target sequence is classified according to the user information column and the time column in array a0 to obtain multiple intermediate sub-target sequences, where each piece of access information in each intermediate sub-target sequence occupies one line.
Finally, according to the accessed object column in array a0, the accessed object name of each access request in each intermediate sub-target sequence is appended after the corresponding access request, with a space separating the accessed object name from the access request. The intermediate sub-target sequences are then reclassified based on the first time interval between consecutive access records in each intermediate sub-target sequence to obtain the sub-target sequences. That is, if the first time interval between the access request in line i and the access request in line i-1 is greater than the first preset time interval, the two access requests are separated into two access requests, where the first preset time interval may be, for example, 5 seconds; it is set by the user and is not specifically limited in the embodiments of the present invention.
Optionally, as shown in Fig. 3, step S22 of reclassifying the intermediate sub-target sequences based on the first time interval between any two consecutive access records among the multiple access records further includes:
Step S221: classify any two consecutive access records whose first time interval is less than the first preset time interval as access records under the same access request.
In an embodiment of the present invention, if the first time interval between the access request in line i and the access request in line i-1 is less than the first preset time interval, the two access requests are treated as one access request.
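The grouping and reclassification just described can be sketched as follows (Python). The (timestamp, user, accessed object) record format and the 5-second threshold are assumptions drawn from the description above, not an authoritative implementation.

from collections import defaultdict

FIRST_PRESET_INTERVAL = 5.0  # seconds, set by the user (assumption)

def build_sub_target_sequences(records):
    """records: time-ordered (timestamp, user, accessed_object) tuples parsed from the log."""
    # Step S21: classify by user information to obtain intermediate sub-target sequences.
    intermediate = defaultdict(list)
    for timestamp, user, obj in records:
        intermediate[user].append((timestamp, obj))
    # Steps S22/S221: reclassify each intermediate sequence by the gap between consecutive records.
    sub_target_sequences = []
    for user, accesses in intermediate.items():
        current = [accesses[0]]
        for prev, cur in zip(accesses, accesses[1:]):
            if cur[0] - prev[0] < FIRST_PRESET_INTERVAL:
                current.append(cur)                      # same access request
            else:
                sub_target_sequences.append((user, current))
                current = [cur]                          # gap too large: start a new access request
        sub_target_sequences.append((user, current))
    return sub_target_sequences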
In an embodiment of the present invention, as shown in Fig. 2, step S106 of inputting the target training sample and the target test sample into the prediction model and adjusting the prediction model according to the prediction result obtained from the prediction model to obtain the target prediction model further includes:
Step S1061: construct the prediction model.
Step S1062: input the target training sample into the prediction model to train the prediction model.
Step S1063: input the target test sample into the trained prediction model to obtain the prediction result of the prediction model.
Step S1064: based on the prediction result of the prediction model, adjust the parameters of the trained prediction model to maximize the prediction accuracy of the prediction model and obtain the target prediction model.
In an embodiment of the present invention, the prediction model is first constructed, and its input gate formula, forget gate formula, output gate formula and cell activation vector formula are configured as follows:
f_t = σ(W_f · [C_{t-1}, h_{t-1}, x_t] + b_f)
C′_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
i_t = σ(W_i · [C_{t-1}, h_{t-1}, x_t] + b_i)
C_t = f_t * C_{t-1} + (1 - f_t) * C′_t
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
where f_t denotes the forget gate, i_t denotes the input gate, o_t denotes the output gate, C_t denotes the cell activation vector of neuron t, W_f denotes the forget gate weight matrix, W_i denotes the input gate weight matrix, W_o denotes the output gate weight matrix, W_C denotes the weight matrix between the hidden-layer neurons and the activation vector, C′_t denotes the candidate update value, h_t denotes the output vector of neuron t, x_t denotes the input vector of neuron t, b_f denotes the forget gate bias, b_i denotes the input gate bias, b_o denotes the output gate bias, b_C denotes the cell activation vector bias, and tanh and σ are activation functions.
The σ activation function is calculated as σ(x) = 1 / (1 + e^(-x)).
The tanh activation function is calculated as tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)).
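For concreteness, a small NumPy sketch of a single step of the cell defined by the formulas above is shown below; it is only an illustration of the stated equations, and the weight shapes, parameter packing and function names are assumptions rather than part of the patent. As in the formulas, the cell state uses the coupled update f_t * C_{t-1} + (1 - f_t) * C′_t, so i_t is computed but does not enter the update.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, params):
    """One step of the cell described by the formulas above."""
    Wf, bf, WC, bC, Wi, bi, Wo, bo = params
    f_t = sigmoid(Wf @ np.concatenate([C_prev, h_prev, x_t]) + bf)   # forget gate
    C_cand = np.tanh(WC @ np.concatenate([h_prev, x_t]) + bC)        # candidate update C'_t
    i_t = sigmoid(Wi @ np.concatenate([C_prev, h_prev, x_t]) + bi)   # input gate (unused in the coupled update)
    C_t = f_t * C_prev + (1.0 - f_t) * C_cand                        # coupled cell-state update
    o_t = sigmoid(Wo @ np.concatenate([h_prev, x_t]) + bo)           # output gate
    h_t = o_t * np.tanh(C_t)                                         # output vector of the neuron
    return h_t, C_t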
The hidden neuron number parameter of the prediction model is initially set to 2000, the learning rate is initially set to 0.001, and the initial hidden state is set to 0.
Then the target training sample is input into the prediction model, the loss value of the prediction model is calculated during training, training is terminated when the loss value stabilizes and no longer decreases, and the trained prediction model structure is obtained.
Finally, the target test sample is input into the trained prediction model, and the difference between the file prediction result and the actual situation is calculated to obtain the prediction accuracy, so that the network parameters of the prediction model can be adjusted to maximize the prediction accuracy of the prediction model and obtain the target prediction model.
In an embodiment of the present invention, through training, testing and parameter adjustment of the prediction model, the prediction model with the highest prediction accuracy is obtained; this most accurate model is taken as the target prediction model and is subsequently used to predict users' accesses to data in the distributed storage system.
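A training-and-evaluation sketch under one possible framework is shown below. This is an assumption for illustration: the patent does not name a framework, and a stock Keras LSTM layer implements the standard gate equations rather than the coupled variant above. The 2000 hidden units and 0.001 learning rate follow the description, while num_files, the data arrays and the early-stopping settings are hypothetical.

import tensorflow as tf

def build_and_train(x_train, y_train, x_test, y_test, num_files):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(num_files, 64),                  # file IDs -> dense vectors
        tf.keras.layers.LSTM(2000),                                # hidden neuron number per the description
        tf.keras.layers.Dense(num_files, activation="softmax"),    # probability of each file being accessed next
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # Stop training once the loss stabilizes and no longer decreases.
    early_stop = tf.keras.callbacks.EarlyStopping(monitor="loss", min_delta=1e-4, patience=3)
    model.fit(x_train, y_train, epochs=50, callbacks=[early_stop])
    # Accuracy on the target test sample guides further parameter adjustment.
    test_loss, test_accuracy = model.evaluate(x_test, y_test)
    return model, test_accuracy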
In an embodiment of the present invention, as shown in Fig. 2, step S108 of analyzing user access requests based on the target prediction model to obtain the prediction data set and caching the prediction data set based on the data volume of the prediction data set includes:
Step S1081: obtain, according to a user's access request, the prediction data set containing the prediction data through the prediction model.
Step S1082: judge whether the data volume of the prediction data set exceeds the preset data volume.
Step S1083: if the judgment result is yes, do not cache the prediction data set to the proxy node, where the proxy node is the node in the file storage system used to store the file access log and the prediction data set.
Step S1084: if the judgment result is no, cache the prediction data set to the proxy node.
In an embodiment of the present invention, the target prediction model is deployed on the proxy node of the distributed storage system. When the target prediction model receives a user request, it outputs the prediction data set and judges the size of the data volume that the prediction data set contains.
If the data volume contained in the prediction data set exceeds the preset data volume, the prediction data set is not cached; if the data volume contained in the prediction data set is less than the preset data volume, the prediction data set is cached to the proxy node. The preset data volume is set by the user and is not specifically limited in the embodiments of the present invention.
By caching the prediction data set to the proxy node, the embodiment of the present invention reduces the number of I/O operations of the distributed storage system and improves its data reading efficiency.
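The caching decision at the proxy node can be sketched as follows (Python). The byte-count measure of data volume, the 64 MB threshold and the cache_to_proxy_node() helper are assumptions for illustration only.

PRESET_DATA_VOLUME = 64 * 1024 * 1024  # e.g. 64 MB, set by the user (assumption)

def handle_prediction(prediction_data_set, cache_to_proxy_node):
    """Cache the predicted data set only if its data volume does not exceed the preset limit."""
    data_volume = sum(len(block) for block in prediction_data_set)   # total bytes of predicted data
    if data_volume > PRESET_DATA_VOLUME:
        return False                           # too large: do not cache to the proxy node
    cache_to_proxy_node(prediction_data_set)   # small enough: cache ahead of the expected accesses
    return True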
Embodiment two:
An embodiment of the present invention further provides a data prediction device. The data prediction device is used to execute the data prediction method provided by the above content of the embodiments of the present invention; the data prediction device provided by the embodiment of the present invention is specifically introduced below.
Fig. 4 is a schematic diagram of a data prediction device according to an embodiment of the present invention. As shown in Fig. 4, the data prediction device mainly includes: an acquisition device 10, a processing device 20, a calibration device 30 and a prediction device 40, where:
the acquisition device 10 is configured to obtain a file access log to obtain a target sequence, where the target sequence contains the access information of accessing users;
the processing device 20 is configured to process the target sequence to obtain a target training sample and a target test sample;
the calibration device 30 is configured to input the target training sample and the target test sample into a prediction model, and to adjust the prediction model according to the prediction result obtained from the prediction model to obtain a target prediction model;
the prediction device 40 is configured to analyze user access requests based on the target prediction model to obtain prediction data, and to cache the prediction data set based on the data volume of the prediction data set, where the prediction data characterizes the set of data that the user will access at the next moment.
In the embodiment of the present invention, user access requests are analyzed by the target prediction model to obtain the prediction data set, so that the data the user may access in the future can be cached. This reduces the number of system input/output operations, improves the overall reading efficiency of the distributed storage system, solves the prior-art technical problem of low data reading efficiency of a distributed storage system when a user accesses data, and achieves the technical effect of improving the data reading efficiency of the distributed storage system.
In addition, in the description of the embodiments of the present invention, unless otherwise clearly specified and limited, the terms "mounted", "connected" and "coupled" shall be understood in a broad sense; for example, a connection may be a fixed connection, a detachable connection or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection, an indirect connection through an intermediary, or an internal connection between two elements. For those of ordinary skill in the art, the specific meanings of the above terms in the present invention can be understood according to the specific situation.
In the description of the present invention, it should be noted that the orientation or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner" and "outer" are based on the orientation or positional relationships shown in the drawings, are only used for the convenience of describing the present invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; therefore, they should not be understood as limiting the present invention. In addition, the terms "first", "second" and "third" are used for descriptive purposes only and should not be understood as indicating or implying relative importance.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided by this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of the units is only a logical functional division, and there may be other division manners in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
Finally, it should be noted that the above embodiments are only specific implementations of the present invention, used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person skilled in the art may still modify the technical solutions recorded in the foregoing embodiments, or easily conceive of changes, or make equivalent replacements of some of the technical features within the technical scope disclosed by the present invention; these modifications, changes or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A data prediction method, characterized by comprising:
obtaining a file access log to obtain a target sequence, wherein the target sequence contains the access information of accessing users;
processing the target sequence to obtain a target training sample and a target test sample;
inputting the target training sample and the target test sample into a prediction model, and adjusting the prediction model according to the prediction result obtained from the prediction model to obtain a target prediction model;
analyzing user access requests based on the target prediction model to obtain a prediction data set, and caching the prediction data set based on the data volume of the prediction data set, wherein the prediction data set is used to characterize the set of data that the user will access at the next moment.
2. The method according to claim 1, characterized in that processing the target sequence to obtain the target training sample data and the target test sample data comprises:
classifying the target sequence to obtain multiple sub-target sequences, wherein each sub-target sequence contains the access records of a user under different access requests;
splitting the multiple sub-target sequences according to a preset ratio to obtain the target training sample and the target test sample.
3. The method according to claim 2, characterized in that classifying the target sequence to obtain the multiple sub-target sequences comprises:
classifying the target sequence according to user information to obtain intermediate sub-target sequences, wherein an intermediate sub-target sequence contains multiple access records of the same user;
reclassifying the intermediate sub-target sequences based on the first time interval between any two consecutive access records among the multiple access records to obtain the sub-target sequences.
4. The method according to claim 3, characterized in that reclassifying the intermediate sub-target sequences based on the first time interval between any two consecutive access records among the multiple access records comprises:
classifying any two consecutive access records corresponding to a first time interval as access records under the same access request, wherein the first time interval is a time interval less than a first preset time interval.
5. The method according to claim 1, characterized in that obtaining the file access log to obtain the target sequence comprises:
querying the creation time of the file access log;
calculating a second time interval between the current time and the creation time;
if the second time interval is greater than a second preset time interval, storing the file access log to a compute node to obtain the target sequence, wherein the compute node is the node in the file storage system used to process the file access log.
6. The method according to claim 1, characterized in that inputting the target training sample and the target test sample into the prediction model and adjusting the prediction model according to the prediction result obtained from the prediction model to obtain the target prediction model comprises:
constructing the prediction model;
inputting the target training sample into the prediction model to train the prediction model;
inputting the target test sample into the trained prediction model to obtain the prediction result of the prediction model;
adjusting, based on the prediction result of the prediction model, the parameters of the trained prediction model to maximize the prediction accuracy of the prediction model and obtain the target prediction model.
7. The method according to claim 1, characterized in that analyzing user access requests based on the target prediction model to obtain the prediction data set and caching the prediction data set based on the data volume of the prediction data set comprises:
obtaining, according to a user's access request, the prediction data set containing the prediction data through the prediction model;
judging whether the data volume of the prediction data set exceeds a preset data volume;
if the judgment result is yes, not caching the prediction data set to a proxy node, wherein the proxy node is the node in the file storage system used to store the file access log and the prediction data set;
if the judgment result is no, caching the prediction data set to the proxy node.
8. A data prediction device, characterized in that the device comprises: an acquisition device, a processing device, a calibration device and a prediction device, wherein:
the acquisition device is configured to obtain a file access log to obtain a target sequence, wherein the target sequence contains the access information of accessing users;
the processing device is configured to process the target sequence to obtain a target training sample and a target test sample;
the calibration device is configured to input the target training sample and the target test sample into a prediction model, and to adjust the prediction model according to the prediction result obtained from the prediction model to obtain a target prediction model;
the prediction device is configured to analyze user access requests based on the target prediction model to obtain prediction data, and to cache the prediction data set based on the data volume of the prediction data set, wherein the prediction data characterizes the set of data that the user will access at the next moment.
9. The device according to claim 8, characterized in that the processing device is further configured to:
classify the target sequence to obtain multiple sub-target sequences, wherein each sub-target sequence contains the access records of a user under different access requests;
split the multiple sub-target sequences according to a preset ratio to obtain the target training sample and the target test sample.
10. The device according to claim 9, characterized in that the processing device is further configured to:
classify the target sequence according to user information to obtain intermediate sub-target sequences, wherein an intermediate sub-target sequence contains multiple access records of the same user;
reclassify the intermediate sub-target sequences based on the time interval between any two consecutive access records among the multiple access records to obtain the sub-target sequences.
CN201810120980.7A 2018-02-06 2018-02-06 Data prediction method and device Active CN108462605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810120980.7A CN108462605B (en) 2018-02-06 2018-02-06 Data prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810120980.7A CN108462605B (en) 2018-02-06 2018-02-06 Data prediction method and device

Publications (2)

Publication Number Publication Date
CN108462605A true CN108462605A (en) 2018-08-28
CN108462605B CN108462605B (en) 2022-03-15

Family

ID=63239787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810120980.7A Active CN108462605B (en) 2018-02-06 2018-02-06 Data prediction method and device

Country Status (1)

Country Link
CN (1) CN108462605B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109831801A (en) * 2019-01-04 2019-05-31 东南大学 The node B cache algorithm of user's behavior prediction based on deep learning neural network
CN110968272A (en) * 2019-12-16 2020-04-07 华中科技大学 Time sequence prediction-based method and system for optimizing storage performance of mass small files
CN111192170A (en) * 2019-12-25 2020-05-22 平安国际智慧城市科技股份有限公司 Topic pushing method, device, equipment and computer readable storage medium
WO2020177366A1 (en) * 2019-03-07 2020-09-10 平安科技(深圳)有限公司 Data processing method and apparatus based on time sequence data, and computer device
CN111830192A (en) * 2020-06-02 2020-10-27 合肥通用机械研究院有限公司 Air-mixed fuel gas combustion performance test system and test method thereof
CN111858469A (en) * 2020-07-24 2020-10-30 成都成信高科信息技术有限公司 Self-adaptive hierarchical storage method based on time sliding window
CN111970718A (en) * 2020-07-22 2020-11-20 西北工业大学 Deep learning-based power distribution method in energy collection untrusted relay network
CN113850929A (en) * 2021-09-18 2021-12-28 广州文远知行科技有限公司 Display method, device, equipment and medium for processing marked data stream
CN117370272A (en) * 2023-10-25 2024-01-09 浙江星汉信息技术股份有限公司 File management method, device, equipment and storage medium based on file heat

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140173070A1 (en) * 2012-12-13 2014-06-19 Microsoft Corporation Updating of digital content buffering order
CN106454437A (en) * 2015-08-12 2017-02-22 中国移动通信集团设计院有限公司 Streaming media service rate prediction method and device
CN107292388A (en) * 2017-06-27 2017-10-24 郑州云海信息技术有限公司 A kind of Forecasting Methodology and system of the hot spot data based on neutral net

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140173070A1 (en) * 2012-12-13 2014-06-19 Microsoft Corporation Updating of digital content buffering order
CN106454437A (en) * 2015-08-12 2017-02-22 中国移动通信集团设计院有限公司 Streaming media service rate prediction method and device
CN107292388A (en) * 2017-06-27 2017-10-24 郑州云海信息技术有限公司 A kind of Forecasting Methodology and system of the hot spot data based on neutral net

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109831801A (en) * 2019-01-04 2019-05-31 东南大学 The node B cache algorithm of user's behavior prediction based on deep learning neural network
WO2020177366A1 (en) * 2019-03-07 2020-09-10 平安科技(深圳)有限公司 Data processing method and apparatus based on time sequence data, and computer device
CN110968272A (en) * 2019-12-16 2020-04-07 华中科技大学 Time sequence prediction-based method and system for optimizing storage performance of mass small files
CN111192170A (en) * 2019-12-25 2020-05-22 平安国际智慧城市科技股份有限公司 Topic pushing method, device, equipment and computer readable storage medium
CN111192170B (en) * 2019-12-25 2023-05-30 平安国际智慧城市科技股份有限公司 Question pushing method, device, equipment and computer readable storage medium
CN111830192B (en) * 2020-06-02 2022-05-31 合肥通用机械研究院有限公司 Air-mixed fuel gas combustion performance test system and test method thereof
CN111830192A (en) * 2020-06-02 2020-10-27 合肥通用机械研究院有限公司 Air-mixed fuel gas combustion performance test system and test method thereof
CN111970718A (en) * 2020-07-22 2020-11-20 西北工业大学 Deep learning-based power distribution method in energy collection untrusted relay network
CN111970718B (en) * 2020-07-22 2022-03-11 西北工业大学 Deep learning-based power distribution method in energy collection untrusted relay network
CN111858469A (en) * 2020-07-24 2020-10-30 成都成信高科信息技术有限公司 Self-adaptive hierarchical storage method based on time sliding window
CN111858469B (en) * 2020-07-24 2024-01-26 成都成信高科信息技术有限公司 Self-adaptive hierarchical storage method based on time sliding window
CN113850929A (en) * 2021-09-18 2021-12-28 广州文远知行科技有限公司 Display method, device, equipment and medium for processing marked data stream
CN117370272A (en) * 2023-10-25 2024-01-09 浙江星汉信息技术股份有限公司 File management method, device, equipment and storage medium based on file heat

Also Published As

Publication number Publication date
CN108462605B (en) 2022-03-15

Similar Documents

Publication Publication Date Title
CN108462605A (en) A kind of prediction technique and device of data
EP3299972B1 (en) Efficient query processing using histograms in a columnar database
US7756881B2 (en) Partitioning of data mining training set
CN105279240B (en) The metadata forecasting method and system of client origin information association perception
US20110258176A1 (en) Minimizing Visibility of Stale Content in Web Searching Including Revising Web Crawl Intervals of Documents
CN107862173A (en) A kind of lead compound virtual screening method and device
Koskela et al. Web cache optimization with nonlinear model using object features
Tashkova et al. Parameter estimation with bio-inspired meta-heuristic optimization: modeling the dynamics of endocytosis
CN108446340A (en) A kind of user's hot spot data access prediction technique towards mass small documents
CN103559300B (en) The querying method and inquiry unit of data
CN109634924A (en) File system parameter automated tuning method and system based on machine learning
Malensek et al. Fast, ad hoc query evaluations over multidimensional geospatial datasets
CN112668688B (en) Intrusion detection method, system, equipment and readable storage medium
CN107229575A (en) The appraisal procedure and device of caching performance
CN110347706A (en) For handling method, Database Systems and the computer readable storage medium of inquiry
CN109154933A (en) Distributed data base system and distribution and the method for accessing data
WO2018194565A1 (en) Monitoring the thermal health of an electronic device
US9679036B2 (en) Pattern mining based on occupancy
CN109074313A (en) Caching and method
CN108241864A (en) Server performance Forecasting Methodology based on multivariable grouping
Rammer et al. Alleviating i/o inefficiencies to enable effective model training over voluminous, high-dimensional datasets
CN114580791B (en) Method and device for identifying working state of bulking machine, computer equipment and storage medium
CN116453209A (en) Model training method, behavior classification method, electronic device, and storage medium
Nimishan et al. An approach to improve the performance of web proxy cache replacement using machine learning techniques
CN114881343A (en) Short-term load prediction method and device of power system based on feature selection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant