CN101882155A

CN101882155A - Statistical method and device of file prediction accuracy

Info

Publication number: CN101882155A
Application number: CN 201010205803
Authority: CN
Inventors: 程旭; 何俊; 管雪涛
Original assignee: BEIDA ZHONGZHI MICROSYSTEM SCIENCE AND TECHNOLOGY Co Ltd BEIJING
Current assignee: Beijing Zhongzhi Core Technology Co Ltd
Priority date: 2010-06-22
Filing date: 2010-06-22
Publication date: 2010-11-10
Anticipated expiration: 2030-06-22
Also published as: CN101882155B

Abstract

The invention provides a statistical method and device for file prediction accuracy. The method comprises: configuring a first statistic chain and a second statistic chain, wherein the two chains have the same statistical period and their start times differ by half of the statistical period; and controlling each chain, from its own start time, to count candidate prediction results during the first half of the statistical period and, during the second half, to continue counting candidate prediction results while calculating and outputting the accuracy of the candidate prediction from the results counted so far in the period. The method minimizes the effect of occasional fluctuations in the prediction and also ensures continuous output of the performance evaluation.

Description

Statistical method and device for file prediction accuracy

Technical field

The present invention relates to file systems, file prefetching algorithms, and methods for modeling and quantitatively analyzing file access behavior.

Background art

In computer storage systems, large-capacity storage devices such as tapes and disks are constrained by their mechanical characteristics, so there is limited room to improve their data transfer rates. As a result, the access-speed gap between high-speed devices (such as main memory) and low-speed devices (such as disks and tapes) keeps widening.

In general, disk access speed largely limits improvements in overall computer performance. Caching has improved this situation: once the data required by the system has been loaded from disk into memory, it can stay resident there for a long time, and subsequent accesses to the same data are served directly from memory without going back to the disk.

Caching, however, is a passive way of accelerating disk file access. Whatever the caching technique, the first access to a piece of data must always wait for the disk operation to complete. If many files are accessed and each is accessed only briefly, caching can hardly improve the system's data access speed.

For this reason, file prefetching is widely used as an active way of accelerating file access. Because data access exhibits temporal and spatial locality, the file accessed after a given file is predictable to some extent. Prefetching has a cost of its own, however: it adds to the system's data processing load, and if prefetches fail often, system performance can suffer seriously.

The prefetch accuracy therefore becomes an important criterion for deciding whether to prefetch. Prefetching is considered beneficial to performance only when its accuracy reaches a certain threshold. The accuracy is assessed mainly from recent historical statistics of the file predictions.

Fig. 1 shows one way of dividing file prefetching into statistical periods (also called measurement periods). A file prefetching algorithm can provide multiple candidate predictions. For one candidate prediction P_AB, the prediction results of P_AB in the sequence of candidate predictions are divided into successive statistical periods, each containing 2N candidate predictions of P_AB. Each statistical period is further divided into a warm-up phase and an in-use phase, each covering half of the period: the warm-up phase only counts the prediction results of P_AB and does not provide its prediction accuracy; the in-use phase continues the counting begun in the warm-up phase and additionally provides the probability that P_AB is correct.
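The single-period scheme of Fig. 1 can be sketched as follows. This is a minimal illustration rather than the prior-art implementation: the function name `single_chain_outputs` and the parameter `half` (standing for N) are hypothetical, and the period is assumed to be measured in numbers of prediction results, as the 2N results per period suggest.

```python
from typing import List, Optional


def single_chain_outputs(results: List[bool], half: int) -> List[Optional[float]]:
    """Accuracy reported after each result by one chain with period 2 * half.

    `results` holds True for a correct prediction of P_AB, False otherwise.
    None marks a point at which no accuracy is available (warm-up phase).
    """
    outputs: List[Optional[float]] = []
    total = correct = 0
    for hit in results:
        if total == 2 * half:      # previous period is complete: start a new one
            total = correct = 0
        total += 1
        correct += int(hit)
        # Warm-up (first half): count only. In-use (second half): also report.
        outputs.append(correct / total if total > half else None)
    return outputs
```

With N = 2 and eight results, the output alternates between two results with no accuracy and two results with an accuracy, which is the gap discussed in the next paragraph.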

Through in-depth study, the applicant observed that with the period division and processing of Fig. 1, no statistic for the probability that P_AB is correct is available during the first half of a period, and the statistic provided in the second half is computed over the whole period. Because file access has temporal locality, statistics gathered over too long a time cannot effectively reflect the current accuracy of a prediction; statistics gathered over too short a time, on the other hand, cannot effectively reflect its sustained accuracy. Designing an effective statistical method is therefore the problem the present invention sets out to solve.

Summary of the invention

The technical problem to be solved by the present invention is to provide a statistical method and device for file prediction accuracy that both minimize the effect of occasional fluctuations in the prediction and ensure continuous output of the performance evaluation.

To solve the above technical problem, the present invention proposes a statistical method for file prediction accuracy, comprising:

configuring a first statistic chain and a second statistic chain, wherein the first statistic chain and the second statistic chain have the same statistical period, and their start times differ by half of the statistical period;

controlling the first statistic chain and the second statistic chain, each from its own start time, to count candidate prediction results during the first half of the statistical period and, during the second half of the statistical period, to continue counting candidate prediction results and to calculate and output the accuracy of the candidate prediction from the results counted so far in the period.

Further, the above statistical method may also have the following features:

within the statistical period, the first half is a warm-up phase and the second half is an in-use phase;

when the first statistic chain is in the warm-up phase, the second statistic chain is in the in-use phase; or, when the first statistic chain is in the in-use phase, the second statistic chain is in the warm-up phase.

The candidate prediction is a prediction, computed by a preset file prediction algorithm, of the file that will be accessed after a given file.

To solve the above technical problem, the present invention also proposes a statistical processing device for file prediction accuracy, comprising a configuration module and a statistical processing module, wherein:

the configuration module is configured to set up a first statistic chain and a second statistic chain, wherein the first statistic chain and the second statistic chain have the same statistical period, and their start times differ by half of the statistical period;

the statistical processing module is configured to control the first statistic chain and the second statistic chain set up by the configuration module, each from its own start time, to count candidate prediction results during the first half of the statistical period and, during the second half of the statistical period, to continue counting candidate prediction results and to calculate and output the accuracy of the candidate prediction from the results counted so far in the period.

Further, the above device may also have the following features:

in the statistical period set by the configuration module, the first half is a warm-up phase and the second half is an in-use phase;

the statistical processing module controls the chains such that when the first statistic chain is in the warm-up phase, the second statistic chain is in the in-use phase; or, when the first statistic chain is in the in-use phase, the second statistic chain is in the warm-up phase.

Further, the above device may also have the following feature:

Compared with the prior art, the statistical method and device for file prediction accuracy provided by the present invention have the following advantages:

First, all statistics on a candidate prediction are accumulated over a span of history, which minimizes the effect of occasional fluctuations in the prediction;

Second, the prediction accuracy of a candidate prediction is always estimated from the M most recent prediction results (N ≤ M ≤ 2N), which ensures that the estimate reflects "current" performance;

Third, the accuracy of the candidate prediction can be output at any time, which ensures continuous output of the performance evaluation.

Based on these three points, the technical solution of the present invention can fully exploit the file access behavior and patterns of the current system.

Description of drawings

Fig. 1 is a schematic diagram of the division and processing of file prefetching statistical periods in the prior art;

Fig. 2 is a flowchart of a method for collecting file prediction accuracy statistics according to an embodiment of the present invention;

Fig. 3A and Fig. 3B are schematic diagrams of a specific implementation of the method for collecting file prediction accuracy statistics according to an embodiment of the present invention;

Fig. 4 is a block diagram of a device for collecting file prediction accuracy statistics according to an embodiment of the present invention.

Detailed description of the embodiments

Fig. 2 shows a method for collecting file prediction accuracy statistics according to an embodiment of the present invention, comprising the following steps:

Step S201: configure a first statistic chain and a second statistic chain, wherein the first statistic chain and the second statistic chain have the same statistical period, and their start times differ by half of the statistical period;

Step S202: control the first statistic chain and the second statistic chain, each from its own start time, to count candidate prediction results during the first half of the statistical period and, during the second half of the statistical period, to continue counting candidate prediction results and to calculate and output the accuracy of the candidate prediction from the results counted so far in the period.

Candidate prediction results comprise correct candidate predictions and incorrect candidate predictions. The accuracy of a candidate prediction is the percentage of all predictions made by a given file prediction within a period of time that were correct.

The first half of the statistical period is the warm-up phase, and the second half is the in-use phase.

Starting from its own start time, each of the first and second statistic chains performs warm-up processing during the first half of the statistical period and in-use processing during the second half. Because the embodiment uses two statistic chains whose start times differ by half of the statistical period, when the first statistic chain is in the warm-up phase the second is in the in-use phase, and when the first is in the in-use phase the second is in the warm-up phase, as shown in Fig. 3. With this technical solution, one statistic chain is able to output the accuracy of the candidate prediction at any given time, so the current accuracy of the candidate prediction can be output in real time.
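The alternation of the two statistic chains can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the embodiment's implementation: the names `StatChain`, `DualChainAccuracy`, `half` (N) and `delay` are hypothetical, and time is assumed to be measured in numbers of prediction results, so that a period is 2N results and the offset between the chains is N results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatChain:
    """One statistic chain: counters that restart every 2 * half results."""
    half: int          # N, the number of results per half period
    delay: int = 0     # results to skip before this chain starts
    total: int = 0     # results counted in the current period
    correct: int = 0   # correct results counted in the current period

    def record(self, hit: bool) -> Optional[float]:
        """Count one result; return the accuracy only in the in-use phase."""
        if self.delay > 0:                 # chain has not started yet
            self.delay -= 1
            return None
        if self.total == 2 * self.half:    # previous period is over: restart
            self.total = self.correct = 0
        self.total += 1
        self.correct += int(hit)
        if self.total <= self.half:        # warm-up phase: count only
            return None
        return self.correct / self.total   # in-use phase: count and report


class DualChainAccuracy:
    """Two chains with the same period, started half a period apart."""

    def __init__(self, half: int) -> None:
        self.chains = [StatChain(half), StatChain(half, delay=half)]

    def record(self, hit: bool) -> Optional[float]:
        """Feed one prediction result to both chains and return the accuracy
        from whichever chain is currently in its in-use phase."""
        outputs = [chain.record(hit) for chain in self.chains]
        reported = [value for value in outputs if value is not None]
        return reported[0] if reported else None
```

In this sketch no accuracy is available for the first N results after startup, because the first chain is still warming up and the second has not yet started; from then on exactly one chain reports after every result. How the embodiment handles the startup interval is not stated in the text, so that behaviour is an assumption of the sketch.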

Warm-up processing consists of counting candidate prediction results. The statistics of the warm-up phase are accumulated over a span of history (half a statistical period), which minimizes the effect of occasional fluctuations in the prediction; the warm-up phase serves mainly to remove fluctuations that may appear in the statistics.

In-use processing consists of counting candidate prediction results and, in addition, calculating the accuracy of the candidate prediction. The in-use phase is the half of the statistical period that, building on the warm-up, is also responsible for output. It can output statistics that are both stable and up to date: because the in-use phase lasts as long as the warm-up phase, the data are stable while still reflecting recent history.

Because both statistic chains perform warm-up processing and in-use processing within each statistical period, the accuracy they output is stable, free of the fluctuations that might otherwise appear, and available in real time. This ensures that the estimate reflects "current" performance and that the performance evaluation is output continuously.

A candidate prediction is a prediction, made by a specific file prediction algorithm (such as Last Successor, Static Successor, or Recent Popularity), of the file that will be accessed after a given file. A candidate prediction is usually written P_AB; it predicts that the file accessed after file A will be file B.
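To make the notion of a candidate prediction result concrete, the sketch below derives correct/incorrect results from an access trace using the Last Successor idea, in which the predicted successor of a file is whichever file followed it the last time it was accessed. The function name `last_successor_results` and the representation of a trace as a sequence of file names are assumptions made for illustration; the source does not specify how the predictor is implemented.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def last_successor_results(trace: Iterable[str]) -> List[Tuple[str, str, bool]]:
    """Return (file A, predicted successor B, hit) for every checkable prediction.

    A prediction P_AB is checkable once file A has been followed by some file
    at least once before; `hit` is True when the next access really is B.
    """
    last_successor: Dict[str, str] = {}
    results: List[Tuple[str, str, bool]] = []
    previous: Optional[str] = None
    for current in trace:
        if previous is not None:
            predicted = last_successor.get(previous)
            if predicted is not None:
                results.append((previous, predicted, predicted == current))
            last_successor[previous] = current   # remember the newest successor
        previous = current
    return results
```

The boolean in each tuple is the kind of per-prediction result that a statistic chain counts; in practice the results would be grouped per candidate prediction P_AB before being counted.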

Because each statistical period contains 2N candidate predictions of P_AB, and the start times of the two statistic chains differ by half of the statistical period, the first and second statistic chains are offset from each other by N results of P_AB.
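The offset of N results determines how many recent results M stand behind each reported accuracy. The arithmetic can be checked with the small sketch below; `reporting_window` and its parameters are hypothetical names, results are numbered from 1, and the accuracy is assumed to be reported immediately after a result has been counted.

```python
from typing import Optional, Tuple


def reporting_window(g: int, half: int) -> Optional[Tuple[int, int]]:
    """For the g-th result (1-based), return (chain number, M) or None.

    M is the number of results the reporting chain has counted in its
    current period, including the g-th result itself.
    """
    period = 2 * half
    for chain, start in ((1, 0), (2, half)):   # chain 2 starts N results later
        if g <= start:
            continue                            # this chain has not started yet
        m = (g - start - 1) % period + 1        # position inside the period
        if m > half:                            # second half: in-use phase
            return chain, m
    return None
```

With N = 3, results 4 to 6 are reported by chain 1 with M = 4, 5, 6, results 7 to 9 by chain 2 with M = 4, 5, 6, and so on. Under this counting convention M runs from N + 1 to 2N, which lies within the range N ≤ M ≤ 2N given among the advantages; reporting just before the newest result is counted would instead give N to 2N − 1.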

The method provided by the embodiment takes full account of the temporal locality of data access and of the "jitter" that may appear in the statistics, and uses alternating statistics to estimate efficiently the accuracy of a candidate prediction while a prefetching algorithm is running. The invention is not limited to estimating the accuracy of file prefetching algorithms; it can also be used in other statistical modeling processes with similar requirements.

To implement the above method, the embodiment of the present invention also provides a statistical device for file prediction accuracy, comprising a configuration module and a statistical processing module, wherein:

The configuration module is configured to set up a first statistic chain and a second statistic chain, wherein the first statistic chain and the second statistic chain have the same statistical period, and their start times differ by half of the statistical period.

Further, in the statistical period set by the configuration module, the first half is a warm-up phase and the second half is an in-use phase. The statistical processing module controls the chains such that when the first statistic chain is in the warm-up phase, the second statistic chain is in the in-use phase; or, when the first statistic chain is in the in-use phase, the second statistic chain is in the warm-up phase.

Further, the candidate prediction is a prediction, computed by a preset file prediction algorithm, of the file that will be accessed after a given file.
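One possible way to split the device into its two modules is sketched below. All names (`ChainConfig`, `ConfigurationModule`, `StatisticalProcessingModule`) are hypothetical, and the processing module here keeps a bounded window of recent results instead of per-chain counters; that is a design choice of the sketch which yields the same accuracy values, not something the source prescribes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple


@dataclass(frozen=True)
class ChainConfig:
    period: int   # results per statistical period (2N)
    start: int    # number of results that pass before this chain starts


class ConfigurationModule:
    """Sets up two chains with equal periods, offset by half a period."""

    def configure(self, half: int) -> Tuple[ChainConfig, ChainConfig]:
        period = 2 * half
        return ChainConfig(period, 0), ChainConfig(period, half)


class StatisticalProcessingModule:
    """Counts results and reports the accuracy of the chain that is in use."""

    def __init__(self, configs: Tuple[ChainConfig, ChainConfig]) -> None:
        self.configs = configs
        self.seen = 0
        # Only the most recent period of results is ever needed.
        self.recent: Deque[bool] = deque(maxlen=configs[0].period)

    def record(self, hit: bool) -> Optional[float]:
        self.recent.append(hit)
        self.seen += 1
        for cfg in self.configs:
            if self.seen <= cfg.start:
                continue                               # chain not started yet
            m = (self.seen - cfg.start - 1) % cfg.period + 1
            if m > cfg.period // 2:                    # chain is in use
                window = list(self.recent)[-m:]        # its results this period
                return sum(window) / m
        return None                                    # every chain is warming up
```

A caller would obtain the configuration once, for example `StatisticalProcessingModule(ConfigurationModule().configure(2))`, and then call `record` with each prediction result as it becomes known.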

Of course, the present invention may have various other embodiments. Those skilled in the art may make corresponding changes and variations according to the present invention without departing from its spirit and essence, and all such changes and variations shall fall within the protection scope of the appended claims.

Claims

1. A statistical method for file prediction accuracy, comprising:

2. The method of claim 1, characterized in that:

3. The method of claim 1, characterized in that:

4. A statistical device for file prediction accuracy, characterized by comprising a configuration module and a statistical processing module, wherein:

5. The device of claim 4, characterized in that:

6. The device of claim 4, characterized in that: