CN103995869B - Data-caching method based on Apriori algorithm

Info

Publication number: CN103995869B
Application number: CN201410214776.3A
Authority: CN
Inventors: 张莉; 郭昆; 杨乐游
Original assignee: Northeastern University China
Current assignee: Northeastern University China
Priority date: 2014-05-20
Filing date: 2014-05-20
Publication date: 2017-02-22
Anticipated expiration: 2034-05-20
Also published as: CN103995869A

Abstract

The invention discloses a data-caching method based on the Apriori algorithm. A query log is established on disk for the condition attributes in user queries, and the query frequency of each data block is computed; the data blocks with high query frequency form a frequent data block set. The query frequency of each condition attribute in the frequent data block set is then computed, and the condition attributes with high query frequency form a frequent condition attribute set. The Apriori algorithm is applied to this set, with query frequency mapped to support, to obtain a frequent condition attribute group set. The data corresponding to the frequent condition attribute group set are cached in memory, and an index is established for the frequent condition attributes. The method markedly improves data query efficiency in frequently queried regions; caching condition attribute groups rather than single condition attributes yields higher query efficiency and reduces the retrieval pressure on the database.

Description

A data-caching method based on the Apriori algorithm

Technical field

The invention belongs to the technical field of data query, and in particular relates to a data-caching method based on the Apriori algorithm.

Background art

In recent years, with the rapid development of the Internet and especially the rise of social networking applications such as microblogs and WeChat, data volume has grown explosively; in 2011 humanity formally entered the ZB era. We now live in the era of big data. However, big data has been characterized since its birth by low value density and wide variety, which means that querying massive data faces many problems. When the data scale is modest, traditional relational databases perform well, are highly stable, and have stood the test of time. Once the data volume reaches a certain scale, however, the efficiency of a relational database becomes intolerably low. In short, relational databases cannot meet the big-data era's demands for highly concurrent reads and writes, for efficient storage of and access to massive data, and for high scalability and high availability.

These problems gave rise to a new technology, NoSQL. NoSQL means "Not only SQL" and is a broad term for non-relational data stores. It broke the long-standing dominance of relational databases and ACID theory. NoSQL data stores do not require a fixed table structure and usually have no join operations, and in big-data access they offer performance advantages that relational databases cannot match. However, many mainstream NoSQL databases currently implement their data caching mechanism with the LIRS algorithm, which cannot effectively gather statistics on data queried frequently over a long period and therefore cannot adopt a targeted strategy for caching the data to be queried.

Summary of the invention

In view of the shortcomings of the prior art, the invention provides a data-caching method based on the Apriori algorithm.

The technical scheme of the invention is as follows:

A data-caching method based on the Apriori algorithm comprises the following steps:

Step 1: Record on disk the condition attributes in users' query statements over T days, and establish T query logs, one per day, i.e. the users' query content.

Step 2: Compute the query frequency of each data block in the query logs, and according to the resulting query frequencies select the data blocks with high query frequency to form a frequent data block set.

Step 2.1: Determine the query count of the data in each data block in the T query logs.

Step 2.2: Normalize the query counts of the data in each data block: a recent-log ratio is set to distinguish recent log data from history log data. When the query count of history log data in a data block exceeds the history-log query-count upper threshold, the query count is set to that threshold; when the query count of recent log data in a data block exceeds the recent-log query-count upper threshold, the query count is set to that threshold.

Step 2.3: Weight the normalized query counts of the data in each data block: for each data block, compute the weighted sum of its normalized query counts over the T query logs and take the average, which gives the query frequency of the data block.

Step 2.4: Select the data blocks with high query frequency, i.e. the frequent data blocks, according to the query frequency of each data block; the frequent data blocks form the frequent data block set.

Step 3: The condition attributes of the frequent data blocks form a condition attribute set.

Step 4: Compute the query frequency of each condition attribute in the condition attribute set, and according to the resulting query frequencies select the condition attributes with high query frequency to form a frequent condition attribute set.

Step 4.1: Determine the query count of each condition attribute of the frequent data blocks in the T query logs.

Step 4.2: Normalize the query count of each condition attribute: recent-log condition attributes and history-log condition attributes are distinguished according to the recent-log ratio. When the query count of a history-log condition attribute exceeds the history-log condition attribute query-count upper threshold, the query count is set to that threshold; when the query count of a recent-log condition attribute exceeds the recent-log condition attribute query-count upper threshold, the query count is set to that threshold.

Step 4.3: Weight the normalized condition attribute query counts: for each condition attribute, compute the weighted sum of its normalized query counts over the T days and take the average, which gives the query frequency of the condition attribute.

Step 4.4: According to the query frequency of each condition attribute, select the condition attributes with high query frequency, i.e. the frequent condition attributes; the frequent condition attributes form the frequent condition attribute set.

Step 5: Obtain the frequent condition attribute group set from the frequent condition attribute set using the Apriori algorithm: the query frequency of a condition attribute is mapped to support in the Apriori algorithm, and the frequent itemsets produced by the Apriori algorithm are the frequent condition attribute group set.

Step 6: Cache the data corresponding to the frequent condition attribute group set in memory, and establish an index on the frequent condition attributes in the frequent condition attribute set, completing the data caching.

Step 7: When a client needs to query data, the query is executed according to the condition attributes of the data to be queried. If all condition attributes of the query are frequent condition attributes cached in memory, the query result is obtained directly. If only part of the condition attributes of the query are frequent condition attributes cached in memory, the index on those frequent condition attributes is used to query the disk database for the data satisfying that part of the condition attributes, completing the query. If none of the condition attributes of the query are in the frequent condition attribute set cached in memory, the data blocks are loaded from disk and queried.

The beneficial effect of the invention is that it proposes a new data-caching method which, in combination with a NoSQL database, opens a buffer for frequent data columns in node memory. The method markedly improves data query efficiency in frequent regions; queries on other regions are not processed in any way and are therefore unaffected. Caching condition attribute groups gives higher query efficiency than caching single condition attributes. A cache whose condition attribute groups contain a moderate number of condition attributes sacrifices part of the full cache hit rate, but it does a better job of reducing intermediate records: it shrinks the intermediate result sets produced in memory by partial condition attribute hits and locates data quickly through the frequent condition attribute index, thereby relieving the retrieval pressure on the database and achieving higher query efficiency.

Brief description of the drawings

Fig. 1 is a diagram of the HBase data table partitioning process in the running environment of the embodiment of the invention;

Fig. 2 is a diagram of the improved data query process in the embodiment of the invention;

Fig. 3 is a flow chart of the data-caching method based on the Apriori algorithm in the embodiment of the invention;

Fig. 4 is a diagram of how the different condition attribute hit cases of a query are handled in the embodiment of the invention;

Fig. 5 is a comparison of query efficiency under different cache modes in the embodiment of the invention;

Fig. 6 is a comparison of condition attribute hit cases under different cache modes in the embodiment of the invention.

Detailed description of the embodiment

The embodiment of the invention is described in detail below with reference to the drawings.

In this embodiment, the data to be queried and the user query behavior were simulated in a Hadoop-HBase environment using Sina Weibo user data. T = 7 is used, and the simulation data is divided into 7 equal parts to simulate the query logs of different days.

HBase is a column-oriented NoSQL database that runs on the HDFS file system as part of the Hadoop project. For data reading, HBase stores data by column; compared with row storage, this reduces the reading of redundant data and improves read efficiency, making data retrieval faster and more effective. For storage, HBase splits a large data table into several data regions, i.e. data blocks; each region stores a number of records of the table in order, and the complete table can be recovered by merging the relevant regions. The HBase table splitting process is shown in Fig. 1.

A region in HBase corresponds to the data block concept. The data-caching method of this embodiment screens out the frequently queried data regions, i.e. the frequent data blocks, according to the query situation, and caches the frequent condition attribute data of the several regions with the highest query frequency in a memory buffer. When data in a data region is accessed, different query operations are performed depending on how the condition attributes of the query hit the memory cache. In the HBase environment, the data query process is shown in Fig. 2. The client sends a query request to the region server, and the region server either returns the query result or queries further according to the request. If all condition attributes of the query are frequent condition attributes cached in memory, the query result is obtained directly. If only part of the condition attributes of the query are frequent condition attributes cached in memory, the index on those condition attributes is used to query the disk database for the data satisfying that part of the condition attributes, completing the query. If none of the condition attributes of the query are in the frequent condition attribute set cached in memory, the data blocks are loaded from disk and queried. The storage nodes in the Hadoop layer are responsible for loading disk data and executing query operations.

The data-caching method based on the Apriori algorithm of this embodiment is shown in Fig. 3 and comprises the following steps:

Step 1: Record on disk, one log per day, the condition attributes in users' query statements over T days, and establish T query logs, i.e. the users' query content.

In this embodiment, the condition attributes are items of personal information of Sina Weibo users: age, sex, location, registration date, and online time. In the implementation, 7 query logs are created, representing the users' query records for the most recent 7 days.
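
The per-day query logs of Step 1 can be pictured with a minimal Python sketch. The log layout (one list of queries per day, each query a tuple of attribute names) and the names `daily_queries` and `count_attributes` are assumptions made for illustration; the patent does not specify a log format.

```python
from collections import Counter

# Hypothetical log format: one list per day; each entry holds the condition
# attributes that appeared in one user query.
daily_queries = [
    [("age",), ("location", "sex")],             # day t = 0
    [("location",), ("age", "sex"), ("age",)],   # day t = 1
]


def count_attributes(day_queries):
    """Count, for one day, how many queries mention each condition attribute."""
    counts = Counter()
    for attrs in day_queries:
        counts.update(set(attrs))  # an attribute counts once per query
    return counts


# One Counter per day; logs[t][attr] plays the role of Count(t) for attr.
logs = [count_attributes(day) for day in daily_queries]
print(logs[1]["age"])  # 2
```

Keeping one counter per day preserves the time index t that the later normalization and weighting steps rely on.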

Step 2: Compute the query frequency of each data block in the query logs, and according to the resulting query frequencies select the data blocks with high query frequency, i.e. the frequent data block set block_fd. Suppose there are 3 data blocks, block₁, block₂, and block₃; the query frequency of block₁ is computed as follows:

Step 2.1: Determine the query count of the data in each data block in the 7 query logs.

The query count Count(t) of block₁ over the 7 days is obtained from the query logs. According to the statistics, the query counts Count(t) related to block₁ for t = 0, 1, 2, 3, 4, 5, 6 are 1350, 1433, 1236, 1546, 1354, 1029, 1175 respectively.

Step 2.2: Normalize the query counts of the data in each data block: a recent-log ratio is set to distinguish recent log data from history log data. When the query count of history log data in a data block exceeds the history-log query-count upper threshold, the query count is set to that threshold; when the query count of recent log data in a data block exceeds the recent-log query-count upper threshold, the query count is set to that threshold.

The query count Count(t) of the data in each data block is normalized as follows. A recent-log ratio q_rec is set to distinguish recent log data from history log data; q_rec is set according to the user's actual needs, with 0 < q_rec < 1, and this embodiment takes q_rec = 0.3. When t < (1 − q_rec) × T, i.e. for the first 5 days, the query log data is history log data; when t ≥ (1 − q_rec) × T, i.e. for the most recent 2 days, the query log data is recent log data. For the history log data in a data block, a history-log query-count upper threshold Max_his is set; in general Max_his should be 1.5 times the average query count of all records, and here Max_his = 1400. When the query count of the data block exceeds this threshold, the query count is set to the threshold. For the recent log data, a recent-log query-count upper threshold Max_rec is set; Max_rec should be 2 times the average query count of all records, and here Max_rec = 1700. When the query count of the data block exceeds this threshold, the query count is set to the threshold. The query count Count(t) is normalized according to normalization formula (1):

$$Count_{std}(t) = \begin{cases} \min\bigl(Count(t),\, Max_{his}\bigr), & t < (1 - q_{rec}) \times T \\ \min\bigl(Count(t),\, Max_{rec}\bigr), & t \ge (1 - q_{rec}) \times T \end{cases} \qquad (1)$$

In Step 2.1, the query counts at t = 1 and t = 3 exceed the history-log query-count upper threshold, so Count(1) = Count(3) = Max_his = 1400, and Count_std1(t) is 1350, 1400, 1236, 1400, 1354, 1029, 1175.
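
The clamping of Step 2.2 can be checked with a short Python sketch. It assumes, consistent with the worked example (first 5 days history, last 2 days recent), that the most recent round(q_rec × T) days count as recent; the function name `normalize_counts` and that rounding rule are illustrative assumptions, not taken from the patent.

```python
def normalize_counts(counts, q_rec, max_his, max_rec):
    """Clamp per-day query counts to the history or recent upper threshold."""
    T = len(counts)
    n_recent = round(q_rec * T)       # assumption: 0.3 * 7 -> 2 recent days
    first_recent = T - n_recent       # days with t >= first_recent are recent
    return [
        min(c, max_his) if t < first_recent else min(c, max_rec)
        for t, c in enumerate(counts)
    ]


# Query counts of block1 for t = 0..6, as given in Step 2.1.
block1_counts = [1350, 1433, 1236, 1546, 1354, 1029, 1175]
block1_std = normalize_counts(block1_counts, q_rec=0.3, max_his=1400, max_rec=1700)
print(block1_std)  # [1350, 1400, 1236, 1400, 1354, 1029, 1175]
```

Only the two history days above 1400 are changed; the recent days stay below their threshold of 1700.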

Normalizing the query counts avoids, to some extent, an inflated query frequency caused by an abnormally high query count on individual days.

Step 2.3: Weight the normalized data block query counts Count_std1(t): for each data block, compute the weighted sum of its normalized query counts over the 7 query logs and take the average, which gives the query frequency FD_block of the data block:

$$FD_{block} = \frac{1}{T} \sum_{t=0}^{T-1} W(t) \cdot Count_{std1}(t), \qquad T = 7$$

where Count_std1(t) is the normalized query count of the data in the data block and W(t) is the weighting function, which is an increasing function.

This embodiment uses the first-quadrant portion of a monotonically increasing linear function as the weighting function, i.e. W(t) = t + 1, where 0 ≤ t ≤ 6.

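The query frequency of block₁ is computed as:

$$FD_{block_1} = \frac{1}{7} \sum_{t=0}^{6} (t + 1) \cdot Count_{std1}(t) = \frac{34627}{7} \approx 4946.71$$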
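
The weighted average of Step 2.3 can be reproduced with the following Python sketch, using W(t) = t + 1 and the normalized counts from Step 2.2. The function name `query_frequency` is an illustrative assumption. The same function applied to the normalized age counts of Step 4 gives the age frequency quoted later in the embodiment, which serves as a check on the formula.

```python
def query_frequency(std_counts, weight=lambda t: t + 1):
    """Average of the weighted normalized query counts over all T days."""
    T = len(std_counts)
    return sum(weight(t) * c for t, c in enumerate(std_counts)) / T


# Normalized query counts of block1 (Step 2.2) and of the age attribute (Step 4).
block1_std = [1350, 1400, 1236, 1400, 1354, 1029, 1175]
age_std = [130, 135, 125, 140, 110, 115, 120]

fd_block1 = query_frequency(block1_std)   # 34627 / 7
fd_age = query_frequency(age_std)         # 3415 / 7
print(round(fd_block1, 2), round(fd_age, 4))
```

Because the weight grows with t, a query made on the most recent day contributes seven times as much as one made on the oldest day.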

Step 2.4: Select the data blocks with high query frequency, i.e. the frequent data blocks, according to the query frequency of each data block; the frequent data blocks form the frequent data block set. Here the query frequency of block₂ is 5973.13648 and that of block₃ is 5294.65. The data blocks are sorted by query frequency and their sizes are summed in that order; the high-frequency data blocks whose cumulative size is within 1 GB are selected, and block₂ is a frequent data block.

Step 3: The condition attributes of the frequent data blocks form a condition attribute set. In this embodiment, the user's age, sex, location, registration date, and online time form the condition attribute set.

Step 4: Compute the query frequency of each condition attribute in the condition attribute set, and according to the resulting query frequencies select the condition attributes with high query frequency to form a frequent condition attribute set. Taking the age condition attribute as an example, the query frequency of a condition attribute is computed as follows:

Step 4.1: Determine the query count of each condition attribute of the frequent data blocks in the 7 query logs. In this embodiment, the query counts related to the age condition attribute for t = 0, 1, 2, 3, 4, 5, 6 are 130, 135, 125, 160, 110, 115, 120 respectively.

Step 4.2: Normalize the condition attribute query counts. According to the recent-log ratio q_rec = 0.3, the condition attributes in the query logs of the first 5 days are history-log condition attributes, and those in the query logs of the most recent 2 days are recent-log condition attributes. For history-log condition attributes, a history-log condition attribute query-count upper threshold Max_his is set, with Max_his = 140; when the query count of the condition attribute exceeds this threshold, the query count is set to the threshold. For recent-log condition attributes, a recent-log condition attribute query-count upper threshold Max_rec is set, with Max_rec = 150; when the query count of the condition attribute exceeds this threshold, the query count is set to the threshold. The condition attribute query count Count(t) is normalized according to normalization formula (1). In Step 4.1, Count(3) = 160 at t = 3, which exceeds the history-log query-count upper threshold, so Count(3) = Max_his = 140. Count_std2(t) is 130, 135, 125, 140, 110, 115, 120.

Step 4.3: Weight the normalized condition attribute query counts: for each condition attribute, compute the weighted sum of its normalized query counts over the 7 days and take the average, which gives the query frequency FD_sa of the condition attribute:

$$FD_{sa} = \frac{1}{T} \sum_{t=0}^{T-1} W(t) \cdot Count_{std2}(t), \qquad T = 7$$

As before, the first-quadrant portion of a monotonically increasing linear function is used as the weighting function, i.e. W(t) = t + 1, where 0 ≤ t ≤ 6. The query frequency of the age condition attribute is computed as:

$$FD_{sa}(\text{age}) = \frac{1}{7} \sum_{t=0}^{6} (t + 1) \cdot Count_{std2}(t) = \frac{3415}{7} \approx 487.8571$$

Step 4.4: According to the query frequency of each condition attribute, select the condition attributes with high query frequency, i.e. the frequent condition attributes; the frequent condition attributes form the frequent condition attribute set. Here the query frequency of age is 487.8571, that of sex is 539.2857143, that of location is 632.1428571, that of registration date is 217.1429, and that of online time is 103.4923.

Step 5.1: Let A₁ = ∅ and let k be the current maximum length of a frequent condition attribute group; for k = 1, A₁ denotes the set of frequent condition attribute groups of length 1.

Step 5.2: Collect the query frequency of each condition attribute in the frequent condition attribute set. The frequencies of the age, sex, location, registration date, and online time condition attributes are 487.8571, 539.2857, 632.1428, 217.1429, and 103.4923 respectively. Set the minimum frequency threshold min_fd = 175 and put all condition attributes whose frequency is greater than or equal to the threshold, namely age, sex, location, and registration date, into A₁, obtaining the frequent condition attribute group set A₁ of length 1.

Step 5.3: Sort the elements of A₁ in dictionary order by condition attribute name and perform a natural join to obtain the candidate set C₂ of frequent condition attribute groups of length 2, where C₂ contains location-registration date, location-age, location-sex, age-sex, age-registration date, and sex-registration date.

Step 5.4: Let A₂ = ∅. Query each condition attribute group in C₂, retrieve the whole frequent condition attribute set, and count the query frequency of each condition attribute group in C₂. The frequencies of the location-registration date, location-age, location-sex, age-sex, age-registration date, and sex-registration date groups are 202.14285, 339.2857, 401.4285, 321.4285, 98.4957, and 135.671 respectively. The condition attribute groups whose frequency is greater than or equal to the minimum frequency threshold, namely location-registration date, location-age, location-sex, and age-sex, are put into the frequent condition attribute group set A₂ of length 2.

Step 5.5: Sort the elements of A₂ in dictionary order by condition attribute name and perform a natural join to obtain the candidate set C₃ of frequent condition attribute groups of length 3, where C₃ contains the location-sex-age condition attribute group.

Step 5.6: Let A₃ = ∅. Query each condition attribute group in C₃, retrieve the whole frequent condition attribute set, and count the query frequency of each condition attribute group in C₃. The frequency of the location-sex-age condition attribute group is 183.5714286; being greater than or equal to the minimum frequency threshold, this group is put into the frequent condition attribute group set A₃ of length 3.

Step 5.7: Sort the elements of A₃ in dictionary order by condition attribute name and perform a natural join to obtain the candidate set C₄ of frequent condition attribute groups of length 4, where C₄ = ∅.

Step 5.8: Obtain the set A of frequent query condition attribute groups of every length in the query logs, where A = ∪_k A_k = A₁ ∪ A₂ ∪ … ∪ A_k:

The frequent condition attributes of length 1 are age, sex, location, and registration date, with frequencies 487.8571, 539.2857, 632.1428, and 217.1429 respectively.

The frequent condition attribute groups of length 2 are location-registration date, location-age, location-sex, and age-sex, with frequencies 202.14285, 339.2857, 401.4285, and 321.4285 respectively.

The frequent condition attribute group of length 3 is location-sex-age, with frequency 183.5714286.
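
The level-wise search of Steps 5.1 to 5.8 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the query frequencies quoted in the embodiment are placed in a lookup table `fd` instead of being counted from query logs, and the names `fd`, `MIN_FD`, and `apriori` are assumptions. Candidate generation uses the standard Apriori join and prune steps.

```python
from itertools import combinations

MIN_FD = 175  # minimum frequency threshold min_fd from Step 5.2

# Query frequencies quoted in the embodiment, keyed by attribute group.
fd = {
    frozenset(["age"]): 487.8571,
    frozenset(["sex"]): 539.2857,
    frozenset(["location"]): 632.1428,
    frozenset(["registration_date"]): 217.1429,
    frozenset(["online_time"]): 103.4923,
    frozenset(["location", "registration_date"]): 202.14285,
    frozenset(["location", "age"]): 339.2857,
    frozenset(["location", "sex"]): 401.4285,
    frozenset(["age", "sex"]): 321.4285,
    frozenset(["age", "registration_date"]): 98.4957,
    frozenset(["sex", "registration_date"]): 135.671,
    frozenset(["location", "sex", "age"]): 183.5714286,
}


def apriori(fd, min_fd):
    """Return [A1, A2, ...]: the frequent attribute groups of each length."""
    level = sorted((g for g in fd if len(g) == 1 and fd[g] >= min_fd), key=sorted)
    result, k = [], 1
    while level:
        result.append(level)
        frequent = set(level)
        # Join step: unite pairs of frequent k-groups into (k+1)-candidates.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k))
        }
        # Keep candidates whose frequency (used as support) meets the threshold.
        level = sorted((c for c in candidates if fd.get(c, 0.0) >= min_fd), key=sorted)
        k += 1
    return result


A = apriori(fd, MIN_FD)
for groups in A:
    print([sorted(g) for g in groups])
```

With these figures the search stops after length 3, because a single frequent group of length 3 cannot be joined into any candidate of length 4.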

Only 3 columns of condition attribute data are cached in memory, and there are 3 cache modes. The 1st cache mode caches the age, sex, and location data. The 2nd cache mode caches the location-age and location-sex data; because the location condition attribute is repeated, it takes no additional memory. The 3rd cache mode caches the location-sex-age data.
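
Caching the columns of a frequent condition attribute group and indexing them can be illustrated with the Python sketch below. The patent does not specify the index structure; the inverted index (attribute value to row ids), the sample rows, and the names `rows`, `cached_attrs`, and `build_cache` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical user records; only the cached columns are kept in memory.
rows = [
    {"id": 1, "age": 25, "sex": "F", "location": "Beijing", "registration_date": "2012-03-01"},
    {"id": 2, "age": 31, "sex": "M", "location": "Shenyang", "registration_date": "2011-07-15"},
    {"id": 3, "age": 25, "sex": "M", "location": "Beijing", "registration_date": "2013-01-20"},
]
cached_attrs = ("location", "sex", "age")  # the length-3 frequent group


def build_cache(rows, cached_attrs):
    """Copy the cached columns into memory and index each one by value."""
    cache = {r["id"]: {a: r[a] for a in cached_attrs} for r in rows}
    index = {a: defaultdict(set) for a in cached_attrs}
    for row_id, cols in cache.items():
        for attr, value in cols.items():
            index[attr][value].add(row_id)
    return cache, index


cache, index = build_cache(rows, cached_attrs)
# A query on cached attributes only is answered by intersecting index entries.
hits = index["location"]["Beijing"] & index["sex"]["M"]
print(hits)  # {3}
```

A set-valued index keeps a multi-attribute lookup to a set intersection, which is one simple way to realize the quick location of data by frequent condition attribute that the method describes.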

Step 7: When a client needs to query data, the query is executed according to the condition attributes of the data to be queried. If all condition attributes of the query are frequent condition attributes cached in memory, the query result is obtained directly. If only part of the condition attributes of the query are frequent condition attributes cached in memory, the index on those frequent condition attributes is used to query the disk database for the data satisfying that part of the condition attributes, completing the query. If none of the condition attributes of the query are in the frequent condition attribute set cached in memory, i.e. a miss, the data blocks are loaded from disk and queried.

In actual data queries there are 3 possible hit cases, as shown in Fig. 4.

When the user queries on the date-of-birth condition attribute, which is not cached in memory, none of the condition attributes of the query are in the memory cache, so the data blocks are loaded from disk and queried.

When the user queries on the location-date of birth condition attribute group, only part of the condition attributes of the query are in the memory cache, so the index on the location condition attribute is used to query the disk database for the data satisfying the location condition attribute, completing the query.

When the user queries on the location condition attribute, all condition attributes of the query are in the memory cache, so the relevant data is retrieved directly from memory and the result is returned.
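
The three hit cases can be expressed as a small Python classification function. The function name `classify_query`, the attribute identifiers, and the returned labels are illustrative assumptions; the sketch only decides which path a query takes and assumes the query names at least one condition attribute.

```python
def classify_query(query_attrs, cached_attrs):
    """Classify a query as a full hit, a partial hit, or a miss."""
    query = set(query_attrs)
    hit = query & set(cached_attrs)
    if hit == query:
        return "full"      # answer directly from the memory cache
    if hit:
        return "partial"   # use the index on the cached attributes, then disk
    return "miss"          # load data blocks from disk and query them


cached = {"location", "sex", "age"}
print(classify_query(["date_of_birth"], cached))              # miss
print(classify_query(["location", "date_of_birth"], cached))  # partial
print(classify_query(["location"], cached))                   # full
```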

The average query efficiency under different cache modes is compared in Fig. 5. Before this method is applied, a normal SQL SELECT statement takes about 1500 milliseconds on average.

The data-caching method of this embodiment markedly improves data query efficiency in frequent regions; queries on other regions are not processed in any way and are therefore unaffected. Caching condition attribute groups of two or three attributes gives higher query efficiency than caching single condition attributes. This is because single-attribute queries are relatively infrequent in practice, so the full cache hit rate is poor; moreover, compared with caching for multi-attribute queries, a single-attribute cache cannot eliminate irrelevant records well, so the filtered record set is large, which imposes a heavy time cost on the subsequent index lookup in the database.

The query hit cases are compared in Fig. 6. Although the full hit rate of the two-attribute group cache falls well short of that of the three-attribute group cache, its partial hit rate is as high as 63.93%. A cache whose condition attribute groups contain a moderate number of condition attributes sacrifices part of the full cache hit rate, but it does a better job of reducing intermediate records: it shrinks the intermediate result sets produced in memory by partial condition attribute hits and locates data quickly through the frequent condition attribute index, thereby relieving the retrieval pressure on the database and achieving higher query efficiency. This is why the average query efficiency of the two-attribute group cache is slightly higher than that of the three-attribute group cache.

Claims

1. A data-caching method based on the Apriori algorithm, characterized by comprising the following steps:

Step 1: recording on disk the condition attributes in users' query statements over T days, and establishing T query logs, one per day, i.e. the users' query content;

Step 2: computing the query frequency of each data block in the query logs, and according to the resulting query frequencies selecting the data blocks with high query frequency to form a frequent data block set;

Step 3: forming a condition attribute set from the condition attributes of the frequent data blocks;

Step 4: computing the query frequency of each condition attribute in the condition attribute set, and according to the resulting query frequencies selecting the condition attributes with high query frequency to form a frequent condition attribute set;

Step 5: obtaining a frequent condition attribute group set from the frequent condition attribute set using the Apriori algorithm, wherein the query frequency of a condition attribute is mapped to support in the Apriori algorithm, and the frequent itemsets produced by the Apriori algorithm are the frequent condition attribute group set;

Step 6: caching the data corresponding to the frequent condition attribute group set in memory, and establishing an index on the frequent condition attributes in the frequent condition attribute set, completing the data caching;

Step 7: when a client needs to query data, executing the query according to the condition attributes of the data to be queried: if all condition attributes of the query are frequent condition attributes cached in memory, the query result is obtained directly; if only part of the condition attributes of the query are frequent condition attributes cached in memory, the index on those frequent condition attributes is used to query the disk database for the data satisfying that part of the condition attributes, completing the query; if none of the condition attributes of the query are in the frequent condition attribute set cached in memory, the data blocks are loaded from disk and queried.

2. The data-caching method based on the Apriori algorithm according to claim 1, characterized in that step 2 is specifically performed as follows:

Step 2.1: determining the query count of the data in each data block in the T query logs;

Step 2.2: normalizing the query counts of the data in each data block: a recent-log ratio is set to distinguish recent log data from history log data; when the query count of history log data in a data block exceeds the history-log query-count upper threshold, the query count is set to that threshold; when the query count of recent log data in a data block exceeds the recent-log query-count upper threshold, the query count is set to that threshold;

Step 2.3: weighting the normalized query counts of the data in each data block: for each data block, the weighted sum of its normalized query counts over the T query logs is computed and averaged, which gives the query frequency of the data block;

Step 2.4: selecting the data blocks with high query frequency, i.e. the frequent data blocks, according to the query frequency of each data block, the frequent data blocks forming the frequent data block set.

3. The data-caching method based on the Apriori algorithm according to claim 1, characterized in that step 4 is specifically performed as follows:

Step 4.1: determining the query count of each condition attribute of the frequent data blocks in the T query logs;

Step 4.2: normalizing the query count of each condition attribute: recent-log condition attributes and history-log condition attributes are distinguished according to the recent-log ratio; when the query count of a history-log condition attribute exceeds the history-log condition attribute query-count upper threshold, the query count is set to that threshold; when the query count of a recent-log condition attribute exceeds the recent-log condition attribute query-count upper threshold, the query count is set to that threshold;

Step 4.3: weighting the normalized condition attribute query counts: for each condition attribute, the weighted sum of its normalized query counts over the T days is computed and averaged, which gives the query frequency of the condition attribute;

Step 4.4: according to the query frequency of each condition attribute, selecting the condition attributes with high query frequency, i.e. the frequent condition attributes, the frequent condition attributes forming the frequent condition attribute set.