CN107798014A - A frequent itemset data mining method that takes local samples into account - Google Patents

A frequent itemset data mining method that takes local samples into account

Info

Publication number
CN107798014A
Authority
CN
China
Prior art keywords
sample
item
frequent
data
apriori
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610802933.1A
Other languages
Chinese (zh)
Inventor
柴明亮
高冰
宋宝宇
李连成
刘宝权
张岩
宋君
王靖震
杨东晓
费静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Angang Steel Co Ltd
Original Assignee
Angang Steel Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Angang Steel Co Ltd
Priority to CN201610802933.1A
Publication of CN107798014A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a frequent itemset data mining method that takes local samples into account. It relies on two selection principles. Under the competition principle, itemsets are ranked by support from high to low and accepted or rejected according to a cut-off count. Under the total principle, the itemsets of each sample are accepted or rejected according to a percentage. The method performs four steps in sequence: generation of the frequent 1-itemsets of the overall data sample, calculation of the average 1-itemset threshold of each local data sample, generation of the frequent k-itemsets of the overall data sample, and calculation of the average k-itemset threshold of each local data sample. The resulting LS-Apriori algorithm is a frequent itemset data mining algorithm based on the Apriori property. It follows the basic idea of the Apriori algorithm and, depending on how the average support of a local sample compares with the average support of the overall sample, applies either the competition principle or the total principle to find frequent itemsets. Local sample data are thereby taken into account within Apriori, which effectively addresses the inability of traditional Apriori to account for local optima.

Description

A frequent itemset data mining method that takes local samples into account
Technical field
The invention belongs to the field of data mining methods, and more particularly relates to a frequent itemset data mining method that takes local samples into account.
Background
The Apriori algorithm divides the discovery of association rules into two steps. The first step iteratively retrieves all frequent itemsets in the transaction database, that is, the itemsets whose support is not less than a user-set threshold. The second step uses the frequent itemsets to construct rules that satisfy the user's minimum confidence. Mining and identifying all frequent itemsets is the core of the algorithm and accounts for most of the total computation. Apriori relies on the property that every subset of a frequent itemset is itself frequent: it constructs larger itemsets from the known frequent itemsets, calls them candidate frequent itemsets, and then computes support only for these candidates. Because Apriori uses a manually set threshold chosen from human experience, the threshold may not match the actual data being mined. Recent research has concentrated on making the manually set threshold match the actual data, while very little work has studied how Apriori can take local sample data into account. In practical applications, Apriori can find the global frequent itemsets but cannot reflect the frequent itemsets of local samples, and this situation is increasingly common.
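The frequent-itemset step described above can be sketched as follows. This is a minimal illustration of classic Apriori with a fixed minimum support count, not the method of this patent; the function names `apriori_gen` and `apriori`, the `min_count` parameter and the toy transactions are all assumptions made for the example.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Join frequent (k-1)-itemsets into candidate k-itemsets, then prune."""
    prev = set(prev_frequent)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue
            # Apriori property: every (k-1)-subset of a frequent
            # k-itemset must itself be frequent.
            if all(frozenset(s) in prev for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


def apriori(transactions, min_count):
    """Return {itemset: support count} for itemsets with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while candidates:
        # Support is computed only for the current candidates.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        candidates = apriori_gen(level.keys(), k)
    return frequent


# Toy data (hypothetical, not taken from the patent's tables).
toy = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
result = apriori(toy, min_count=2)
```

Here the single manually chosen `min_count` plays the role of the user-set threshold that the background section criticizes.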
Summary of the invention
The present invention provides LS-Apriori, a frequent itemset data mining algorithm based on the Apriori property. Its purpose is to take local sample data fully into account and to address the inability of traditional Apriori to account for local optima.
To this end, the technical solution adopted by the invention is:
A frequent itemset data mining method that takes local samples into account, being the LS-Apriori algorithm, a frequent itemset data mining algorithm based on the Apriori property. Competition principle: itemsets are ranked by support from high to low and accepted or rejected according to a cut-off count. Total principle: the itemsets of each sample are accepted or rejected according to a percentage. The specific steps are:
(1) Generation of the frequent 1-itemsets of the overall data sample: the data samples are recombined; over the overall data sample, the support of the candidate 1-itemsets C_1 and the average support ZS_1 are calculated; the frequent 1-itemsets L_1 are determined, and their number is recorded as M_1.
(2) Calculation of the average 1-itemset threshold of a local data sample: the average 1-itemset support JS_1 is calculated over the local data sample. If JS_1 ≥ ZS_1, the frequent itemsets are redetermined according to the competition principle. If JS_1 < ZS_1, the local sample's average is below the overall sample's average, which indicates that this local sample is a weak-support sample; to take it into account, the frequent itemsets are redetermined according to the total principle, with the total set to M_1/2.
(3) Generation of the frequent k-itemsets of the overall data sample: the data samples are recombined; in step k, the candidate k-itemsets C_k are generated by Apriori_gen from the frequent (k-1)-itemsets L_{k-1} of step k-1; over the overall data sample, the support of the candidate k-itemsets C_k and the average support ZS_k are calculated; the frequent k-itemsets L_k are determined, and their number is recorded as M_k.
(4) Calculation of the average k-itemset threshold of a local data sample: the average k-itemset support JS_k is calculated over the local data sample. If JS_k ≥ ZS_k, the frequent k-itemsets are redetermined according to the competition principle. If JS_k < ZS_k, the frequent k-itemsets are redetermined according to the total principle, with the total set to M_k/2.
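The decision made in steps (2) and (4) can be sketched roughly as below. The text does not spell out how the cut-off count is applied or how M_k/2 is rounded, so everything here is an assumption for illustration: the helper names `choose_principle`, `half_quota` and `top_by_support` are hypothetical, the quota is rounded up, and ties in support are broken alphabetically.

```python
def choose_principle(js_k, zs_k):
    """Competition if the local average support reaches the overall one."""
    return "competition" if js_k >= zs_k else "total"


def half_quota(m_k):
    """Total used under the total principle, M_k / 2 (assumed rounded up)."""
    return (m_k + 1) // 2


def top_by_support(supports, quota):
    """Rank itemsets by support, high to low, and keep the first `quota`.

    Ties are broken alphabetically by itemset label (an assumption).
    """
    ranked = sorted(supports.items(), key=lambda kv: (-kv[1], kv[0]))
    return [itemset for itemset, _ in ranked[:quota]]


# Hypothetical support counts for four itemsets of one local sample.
toy_supports = {"A": 5, "B": 3, "C": 5, "D": 1}
kept = top_by_support(toy_supports, quota=2)

stronger = choose_principle(js_k=0.6, zs_k=0.5)  # local average above overall
weaker = choose_principle(js_k=0.4, zs_k=0.5)    # weak-support local sample
```

The sketch covers only the choice of principle and a ranked cut-off; how a percentage is distributed across several samples under the total principle is left open, as it is in the text.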
The beneficial effects of the invention are:
The invention proposes LS-Apriori, a new frequent itemset data mining algorithm based on the Apriori property. The algorithm applies the basic idea of Apriori and, depending on how the average support of a local sample compares with the average support of the overall sample, uses either the competition principle or the total principle to find frequent itemsets. Local sample data are thereby taken into account within Apriori, which effectively addresses the inability of traditional Apriori to account for local optima.
Brief description of the drawings
Fig. 1 shows the process by which the LS-Apriori algorithm finds frequent itemsets;
Fig. 2 is the flow chart of the LS-Apriori algorithm.
Detailed description
To illustrate the effectiveness of the LS-Apriori algorithm, the invention takes a classic example of frequent itemset discovery with the Apriori algorithm. The transaction databases are shown in Tables 1 to 4; each sample database contains 9 transactions.
Table 1: Itemsets of sample 1
Table 2: Itemsets of sample 2
TID     Item ID list
T100    I2, I5
T200    I1, I4
T300    I1, I3, I5
T400    I1, I2, I5
T500    I2, I3, I5
T600    I1, I3
T700    I2, I4
T800    I1, I3, I4
T900    I1, I2, I4
Table 3: Itemsets of sample 3
TID     Item ID list
T100    I1, I5
T200    I2, I5
T300    I2, I3, I5
T400    I1, I3, I4
T500    I1, I2, I5
T600    I4, I5
T700    I2, I3
T800    I1, I2, I3, I4
T900    I1, I2
Table 4: Itemsets of sample 4
TID     Item ID list
T100    I2, I3, I4
T200    I2, I5
T300    I2, I3, I4
T400    I1, I3, I5
T500    I1, I2, I4
T600    I3, I5
T700    I2, I4
T800    I1, I2, I3, I5
T900    I1, I5
The support count in Table 1 is the product of the support and the total number of transactions. The LS-Apriori algorithm is applied to the data of Tables 1 to 4 to find frequent itemsets, following the flow shown in Fig. 2. Fig. 1 shows the process by which LS-Apriori finds the frequent itemsets. Each sample has 5 candidate 1-itemsets, and the average support counts calculated for the local samples and the overall sample are given in Table 5.
Table 5: Average support counts of the local samples and the overall sample
Sample             S1       S2      S3      S4      S
Average support    100/36   92/36   92/36   96/36   95/36
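The Table 5 values for S2, S3 and S4 can be reproduced from Tables 2 to 4 with the short check below. The normalization is an inference from the numbers rather than something the text states: the figures match the sum of the 1-itemset support counts in a sample divided by its 9 transactions, written over a common denominator of 36. The function name `average_support_count` is hypothetical, and S1 is omitted because the rows of Table 1 are not reproduced in the text.

```python
from fractions import Fraction

# Item ID lists copied from Tables 2 to 4 (one string per transaction).
samples = {
    "S2": ["I2,I5", "I1,I4", "I1,I3,I5", "I1,I2,I5", "I2,I3,I5",
           "I1,I3", "I2,I4", "I1,I3,I4", "I1,I2,I4"],
    "S3": ["I1,I5", "I2,I5", "I2,I3,I5", "I1,I3,I4", "I1,I2,I5",
           "I4,I5", "I2,I3", "I1,I2,I3,I4", "I1,I2"],
    "S4": ["I2,I3,I4", "I2,I5", "I2,I3,I4", "I1,I3,I5", "I1,I2,I4",
           "I3,I5", "I2,I4", "I1,I2,I3,I5", "I1,I5"],
}


def average_support_count(transactions):
    """Sum of 1-itemset support counts divided by the transaction count.

    Each item occurrence adds one to that item's support count, so the
    sum over all items equals the total number of item occurrences.
    """
    occurrences = sum(len(row.split(",")) for row in transactions)
    return Fraction(occurrences, len(transactions))


averages = {name: average_support_count(rows) for name, rows in samples.items()}
```

Under this reading, S2 and S3 each contain 23 item occurrences (23/9 = 92/36) and S4 contains 24 (24/9 = 96/36), in agreement with Table 5.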
According to the rules of the LS-Apriori algorithm, the average support counts of samples S1 and S4 are greater than that of the overall sample S, so the competition principle is used for them. The average support counts of samples S2 and S3 are less than that of the overall sample S, so the total principle is used for them. The standard Apriori algorithm finds 11 frequent 1-itemsets. To round this number, 12 1-itemsets are finally determined: 6 found by the competition principle and 6 by the total principle. Because only S2 is one itemset short under standard Apriori, the added itemset is assigned to sample S2. There are 12 candidate 2-itemsets in total, and standard Apriori with an equal threshold finds 8 frequent itemsets. Samples S2 to S4 use the total principle and should have 4 frequent itemsets. S1 uses the competition principle, but because S1 has only 3 itemsets, samples S2 to S4 are finally determined to have 5 frequent itemsets.
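The 1-itemset part of this example can be checked with a few lines of arithmetic. The comparison uses the Table 5 values directly; reading "to round" as rounding 11 up to the next even number, so that it splits into two equal halves, is an assumption, as are the variable names.

```python
from fractions import Fraction

# Average support counts from Table 5.
overall = Fraction(95, 36)
table5 = {
    "S1": Fraction(100, 36),
    "S2": Fraction(92, 36),
    "S3": Fraction(92, 36),
    "S4": Fraction(96, 36),
}

# Competition principle when the local value reaches the overall value,
# total principle otherwise.
principle = {
    name: "competition" if value >= overall else "total"
    for name, value in table5.items()
}

# 11 frequent 1-itemsets from standard Apriori, split into two halves
# (assumed rounding up), one half per principle.
found_by_apriori = 11
per_principle = (found_by_apriori + 1) // 2
final_total = 2 * per_principle
```

With these inputs, S1 and S4 fall under the competition principle and S2 and S3 under the total principle, and the rounded split is 6 + 6 = 12, as stated in the example.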

Claims (1)

1. A frequent itemset data mining method that takes local samples into account, being the LS-Apriori algorithm, a frequent itemset data mining algorithm based on the Apriori property, characterized in that: under the competition principle, itemsets are ranked by support from high to low and accepted or rejected according to a cut-off count; under the total principle, the itemsets of each sample are accepted or rejected according to a percentage; the specific steps are:
(1) Generation of the frequent 1-itemsets of the overall data sample: the data samples are recombined; over the overall data sample, the support of the candidate 1-itemsets C_1 and the average support ZS_1 are calculated; the frequent 1-itemsets L_1 are determined, and their number is recorded as M_1;
(2) Calculation of the average 1-itemset threshold of a local data sample: the average 1-itemset support JS_1 is calculated over the local data sample; if JS_1 ≥ ZS_1, the frequent itemsets are redetermined according to the competition principle; if JS_1 < ZS_1, the local sample's average is below the overall sample's average, which indicates that this local sample is a weak-support sample, and to take it into account the frequent itemsets are redetermined according to the total principle, with the total set to M_1/2;
(3) Generation of the frequent k-itemsets of the overall data sample: the data samples are recombined; in step k, the candidate k-itemsets C_k are generated by Apriori_gen from the frequent (k-1)-itemsets L_{k-1} of step k-1; over the overall data sample, the support of the candidate k-itemsets C_k and the average support ZS_k are calculated; the frequent k-itemsets L_k are determined, and their number is recorded as M_k;
(4) Calculation of the average k-itemset threshold of a local data sample: the average k-itemset support JS_k is calculated over the local data sample; if JS_k ≥ ZS_k, the frequent k-itemsets are redetermined according to the competition principle; if JS_k < ZS_k, the frequent k-itemsets are redetermined according to the total principle, with the total set to M_k/2.
CN201610802933.1A 2016-09-06 2016-09-06 A kind of frequent item set data digging method for taking into account fractional sample Pending CN107798014A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610802933.1A CN107798014A (en) 2016-09-06 2016-09-06 A kind of frequent item set data digging method for taking into account fractional sample

Publications (1)

Publication Number Publication Date
CN107798014A true CN107798014A (en) 2018-03-13

Family

ID=61530402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610802933.1A Pending CN107798014A (en) 2016-09-06 2016-09-06 A kind of frequent item set data digging method for taking into account fractional sample

Country Status (1)

Country Link
CN (1) CN107798014A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109669967A (en) * 2018-12-13 2019-04-23 深圳市信义科技有限公司 A kind of space-time data association analysis method based on big data technology
CN109669967B (en) * 2018-12-13 2022-04-15 深圳市信义科技有限公司 Big data technology-based spatio-temporal data correlation analysis method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180313