CN113568942A - Data set frequent item set mining availability evaluation method - Google Patents

Publication number
CN113568942A
Authority
CN
China
Prior art keywords
mis
item
score
fis
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110579345.7A
Other languages
Chinese (zh)
Inventor
吴卓超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Normal University
Original Assignee
Nanjing Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Normal University
Priority to CN202110579345.7A
Publication of CN113568942A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for evaluating the frequent item set mining availability of a data set, comprising the following steps: (1) Let C = {I1, I2, …, In} be a set of items. Given transactional data sets D1 and D2, in which each transaction T is a non-empty item set such that
$$T \subseteq C,$$
mine D1 and D2 with the Apriori algorithm to obtain the maximal frequent item sets, recorded as FIS1 and FIS2. (2) Match any item set MIS1 of FIS1 with any item set MIS2 of FIS2 using an item set matching algorithm F to obtain a table of paired item sets, Pairs, which consists of pairs <MIS1, MIS2, score1>, where score1 denotes the item similarity of MIS1 and MIS2 computed during matching. (3) For each pair <MIS1, MIS2, score1> in Pairs, compute the support similarity score2 of MIS1 and MIS2, then compute their composite similarity score and update the pair to <MIS1, MIS2, score>. (4) Sum the composite similarity scores of all pairs in Pairs and divide by the number of pairs in Pairs to obtain the similarity SCORE of D1 and D2, which lies in [0, 1].

Description

Data set frequent item set mining availability evaluation method
Technical Field
The invention relates to a method for evaluating the frequent item set mining availability of a data set, that is, how usable a data set is for frequent item set mining analysis.
Background
Frequent item set mining has been widely studied, but evaluating how usable a data set is for frequent item set mining is still at an early stage. No research is dedicated to this kind of availability evaluation, and the evaluation indexes currently used in frequent item set mining analysis are precision, relative error (RE) and the like.
However, precision is measured mainly from the item similarity of the frequent item sets, while RE uses the median of the support similarity to represent the support similarity between frequent item sets. The two indexes are relatively independent, and each gives only a one-sided comparison. The similarity of frequent item sets is inseparable from both item similarity and support similarity, and two separate indexes cannot compare frequent item sets on a unified scale, so the availability of a data set for frequent item set mining analysis cannot be quantified.
Disclosure of Invention
The invention aims to provide a method for evaluating the frequent item set mining availability of a data set. The method combines item similarity and support similarity into a new measurement index, SCORE, which reflects the similarity of two data sets and thereby quantifies the availability of a data set for frequent item set mining analysis. The composite similarity between the data sets is computed and used as the availability measure: the higher the similarity, the better the availability.
The technical scheme adopted by the invention is a method for evaluating the frequent item set mining availability of a data set, comprising the following steps:
Step (1): Given data sets D1 and D2, mine each with the Apriori algorithm to obtain the maximal frequent item sets, recorded as FIS1 and FIS2, where l1 and l2 are the largest cardinalities of the item sets in FIS1 and FIS2, respectively;
Step (2): Pair each item set I1 of FIS1 with an item set I2 of FIS2 to obtain a pairing result pair <I1, I2, score1>, which is added to Pairs, where score1 denotes the item similarity of I1 and I2;
(a) For I1 in FIS1 and I2 in FIS2, if I1 and I2 are identical, match them and set score1 = 1; then set k = 1;
(b) For I1 in FIS1 and I2 in FIS2, compute the distance dis between I1 and I2; if dis equals k, add I2 to the candidate matching set of I1 and add I1 to the candidate matching set of I2;
(c) For each item set I1 in FIS1, if its candidate matching set PList is empty, skip the item set; otherwise select the best item set I2 in PList, set
$$\mathrm{score}_1 = \frac{|I_1 \cap I_2|}{\mathrm{MAX}(|I_1|, |I_2|)}$$
and match them;
(d) Increment k (k++). If k is less than MAX(l1, l2), return to step (b); if k equals MAX(l1, l2), match the first n item sets of FIS1 with the first n item sets of FIS2 one by one, setting score1 = 0.1 for each match, where n = MIN(|FIS1|, |FIS2|);
(e) Match the item sets remaining in FIS1 and FIS2 with the empty set and set score1 = 0.
Step (3): For each pair <I1, I2, score1> in Pairs, compute the support similarity score2 of I1 and I2, derive the similarity score of I1 and I2, and update the pair to <I1, I2, score>;
Step (4): Sum the scores of all pairs in Pairs and divide by the number of pairs in Pairs to obtain the similarity SCORE of D1 and D2, which lies in [0, 1].
The quantities score1, score2 and score used in step (3) are defined as follows:
Definition (item similarity score1): the item-based similarity of item sets I1 and I2 is recorded as score1 and calculated as follows:
if I1 and I2 are identical, score1 = 1;
if I1 and I2 are different,
$$\mathrm{score}_1 = \frac{|I_1 \cap I_2|}{\mathrm{MAX}(|I_1|, |I_2|)};$$
if one of I1 and I2 is the empty set, score1 = 0;
Definition (support similarity score2): the support-based similarity of the paired item sets I1 and I2 is recorded as score2 and calculated as follows:
for I1 and I2 of a pair in Pairs, where I1 has support s1 and I2 has support s2,
$$\mathrm{score}_2 = \frac{\mathrm{MIN}(s_1, s_2)}{\mathrm{MAX}(s_1, s_2)}$$
Definition (item set similarity score): the similarity of item sets I1 and I2 is recorded as score. It is based mainly on the item similarity score1 and is further refined by the support similarity score2. It is calculated as: score = score1 * score2
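The three definitions can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, and it assumes that the item similarity of two different non-empty item sets is |I1 ∩ I2| / MAX(|I1|, |I2|) and that the support similarity is MIN(s1, s2) / MAX(s1, s2), the forms that reproduce the values in Example 1. The empty set is assumed to have support 0.

```python
from typing import FrozenSet

ItemSet = FrozenSet[str]


def item_similarity(i1: ItemSet, i2: ItemSet) -> float:
    """score1: item-based similarity of two item sets."""
    if not i1 or not i2:
        return 0.0  # one side is the empty set
    if i1 == i2:
        return 1.0  # identical item sets
    # Assumed form: shared items over the size of the larger set.
    return len(i1 & i2) / max(len(i1), len(i2))


def support_similarity(s1: float, s2: float) -> float:
    """score2: support-based similarity (assumed form MIN / MAX)."""
    larger = max(s1, s2)
    return min(s1, s2) / larger if larger > 0 else 0.0


def composite_similarity(score1: float, score2: float) -> float:
    """score = score1 * score2."""
    return score1 * score2


# Pair <{a,b,c}, {a,b,c,d}> from Example 1, with supports 4 and 3.
score1 = item_similarity(frozenset("abc"), frozenset("abcd"))
score2 = support_similarity(4, 3)
print(score1, score2, composite_similarity(score1, score2))  # 0.75 0.75 0.5625
```

The score of 0.1 used in step (d) is assigned directly by the matching procedure and is not produced by `item_similarity`.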
The matching operation used in steps (2)(a), (c), (d) and (e) adds <I1, I2, score1> to Pairs, with score1 set according to the case at hand, and deletes I1 and I2 from FIS1 and FIS2, respectively.
The distance dis in step (2)(b) is defined as follows:
Definition (item set distance dis): the item set distance is the number of non-coincident items between two item sets, recorded as dis and calculated as follows:
dis = MAX(l1, l2) - |I1 ∩ I2|
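A short sketch of the distance computation on the item sets of Example 1 may help. It is illustrative only: the helper name `dis` mirrors the text, and the function uses the sizes of the two item sets being compared, as in claim 4. In Example 1 this coincides with MAX(l1, l2), because every item set in FIS1 has 3 items and every item set in FIS2 has 4.

```python
from typing import FrozenSet


def dis(i1: FrozenSet[str], i2: FrozenSet[str]) -> int:
    """Item set distance: size of the larger set minus the number of shared items."""
    return max(len(i1), len(i2)) - len(i1 & i2)


fis1 = [frozenset(s) for s in ("abc", "acd", "bde", "ade", "bdf")]
fis2 = [frozenset(s) for s in ("abcd", "bcde", "adef", "bdgh")]

# Distance of every FIS1 item set (rows) to every FIS2 item set (columns).
distances = [[dis(a, b) for b in fis2] for a in fis1]
for a, row in zip(fis1, distances):
    print("".join(sorted(a)), row)
```

Each row lists the distances from one item set of FIS1 to the four item sets of FIS2; entries equal to the current k are the ones that step (b) turns into candidate matches.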
The matching algorithm in step (2) is a heuristic that preferentially matches item sets that are close to each other. The parameter k controls the heuristic: in the current round, only pairs of item sets whose distance equals k are considered for matching, so iterating over k matches the closest item sets first. The nearest-match principle turns an unordered search into an ordered matching, and the results of each round are reused instead of being recomputed. When searching at distance k, all item sets at distance k-1 have already been excluded, and item sets at distance k+1 are outside the search range. This effectively shrinks the search space and avoids the repeated computation of comparing each item set with all the remaining item sets in every match.
In the algorithm of step (2), the candidate matching set PList stores the item sets whose distance from the current item set I equals k. Storing every item set at distance k from the current item set I yields the candidate matching set, so that the best match can be selected from it while the nearest-match principle is preserved. The selection works as follows: if PList contains exactly one item set I2, pair with it; if PList contains several item sets, pair with the item set I2 in PList whose own candidate matching set has the smallest cardinality, which ensures that each selected match has the least influence on the matching of the other item sets.
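Steps (a) to (e), together with the minimal-influence selection, can be sketched as below. This is a hedged illustration under stated assumptions, not the authoritative algorithm: `match_item_sets` and its variable names are hypothetical, the distance uses the sizes of the two compared sets, score1 is assumed to be |I1 ∩ I2| / MAX(|I1|, |I2|), and ties between equally good candidates are broken by the order of FIS2, which the text does not specify.

```python
from typing import FrozenSet, List, Tuple

ItemSet = FrozenSet[str]
Pair = Tuple[ItemSet, ItemSet, float]  # <I1, I2, score1>


def dis(a: ItemSet, b: ItemSet) -> int:
    return max(len(a), len(b)) - len(a & b)


def match_item_sets(fis1: List[ItemSet], fis2: List[ItemSet]) -> List[Pair]:
    rest1, rest2 = list(fis1), list(fis2)  # unmatched item sets, in input order
    pairs: List[Pair] = []
    if not rest1 and not rest2:
        return pairs
    k_max = max(len(s) for s in rest1 + rest2)  # MAX(l1, l2)

    # (a) Identical item sets are matched first with score1 = 1.
    for s in [s for s in rest1 if s in rest2]:
        pairs.append((s, s, 1.0))
        rest1.remove(s)
        rest2.remove(s)

    # (b)-(d) Rounds k = 1 .. MAX(l1, l2) - 1.
    for k in range(1, k_max):
        # (b) Candidate matching sets at distance exactly k.
        plist1 = {a: [b for b in rest2 if dis(a, b) == k] for a in rest1}
        plist2 = {b: [a for a in rest1 if dis(a, b) == k] for b in rest2}
        # (c) Match each item set of FIS1 that still has a free candidate.
        for a in list(rest1):
            plist = [b for b in plist1[a] if b in rest2]  # drop matched sets
            if not plist:
                continue
            # Minimal influence: candidate with the fewest free candidates of its own.
            best = min(plist, key=lambda b: sum(x in rest1 for x in plist2[b]))
            pairs.append((a, best, len(a & best) / max(len(a), len(best))))
            rest1.remove(a)
            rest2.remove(best)

    # (d) k == MAX(l1, l2): pair leftovers one by one with score1 = 0.1.
    n = min(len(rest1), len(rest2))
    pairs += [(a, b, 0.1) for a, b in zip(rest1[:n], rest2[:n])]

    # (e) Whatever remains is matched with the empty set, score1 = 0.
    empty: ItemSet = frozenset()
    pairs += [(a, empty, 0.0) for a in rest1[n:]]
    pairs += [(empty, b, 0.0) for b in rest2[n:]]
    return pairs


fis1 = [frozenset(s) for s in ("abc", "acd", "bde", "ade", "bdf")]
fis2 = [frozenset(s) for s in ("abcd", "bcde", "adef", "bdgh")]
for i1, i2, score1 in match_item_sets(fis1, fis2):
    print(sorted(i1), sorted(i2), score1)
```

Rebuilding the candidate lists in every round keeps the sketch short; an implementation concerned with efficiency would more likely compute each pairwise distance once and bucket the pairs by distance.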
Compared with the prior art, the invention has the following technical effects. Item similarity and support similarity are combined into a new measurement index, SCORE, which reflects the similarity of two data sets. Data sets before and after processing by a privacy-preserving frequent item set publishing algorithm can therefore be compared, and the availability of the privacy-preserving publication can be quantified. In addition, item set matching uses a nearest-principle heuristic, which makes the algorithm select the closest item sets first and avoids comparing each item set with all the remaining item sets in every match. This removes a large amount of repeated computation, reduces the search space, and improves the running efficiency of the algorithm while preserving the quality of the matching.
Drawings
FIG. 1 is a process flow diagram of the present invention.
Detailed Description
The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example 1: Referring to FIG. 1, the invention is a method for evaluating the availability of privacy-preserving frequent item set publication, comprising the following steps:
Step one: Given data sets D1 and D2, mine each with the Apriori algorithm to obtain the maximal frequent item sets, recorded as FIS1 and FIS2, where l1 and l2 are the largest cardinalities of the item sets in FIS1 and FIS2, respectively;
In this example, the support threshold is set to 3, and mining yields FIS1 = {{a,b,c}:4, {a,c,d}:4, {b,d,e}:3, {a,d,e}:3, {b,d,f}:3}
and FIS2 = {{a,b,c,d}:3, {b,c,d,e}:4, {a,d,e,f}:3, {b,d,g,h}:3}, so l1 = 3 and l2 = 4;
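The transactions behind FIS1 and FIS2 are not given in the example, so the following sketch uses hypothetical transactions to show what "maximal frequent item sets with a support threshold of 3" means. It is a brute-force level-wise enumeration written for clarity, not an optimized Apriori implementation, and the function name is an assumption.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

ItemSet = FrozenSet[str]


def maximal_frequent_item_sets(
    transactions: List[Set[str]], min_support: int
) -> Dict[ItemSet, int]:
    """Return each maximal frequent item set with its support count."""
    items = sorted(set().union(*transactions))
    frequent: Dict[ItemSet, int] = {}
    for size in range(1, len(items) + 1):
        level: Dict[ItemSet, int] = {}
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                level[candidate] = support
        if not level:
            break  # no frequent set of this size, so none of any larger size
        frequent.update(level)
    # Keep only item sets that have no frequent proper superset.
    return {
        s: sup
        for s, sup in frequent.items()
        if not any(s < other for other in frequent)
    }


# Hypothetical transactions, not taken from the patent.
transactions = [{"a", "b", "c"}] * 3 + [{"a", "d"}] * 3
for item_set, support in maximal_frequent_item_sets(transactions, 3).items():
    print(sorted(item_set), support)
```

With these transactions the frequent item sets are a, b, c, d, ab, ac, bc, ad and abc, of which only {a,b,c} and {a,d} have no frequent superset.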
Step two: mixing FIS1Item set I of1And FIS2Item set I of2Pairing to obtain pairing result pair<I1,I2, score1>And added to Pairs, where score1Is represented by1、I2The item similarity of (2) and the pairing specific steps are as follows:
(a) for FIS1I of (A)1,FIS2I of (A)2If I is1、I2Are completely the same, then theMatch, set score11, k is 1, there are no Pairs of identical sets of items in this example, current Pairs is { };
(b) for FIS1I of (A)1,FIS2I of (A)2Calculating I1、I2If dis is equal to k, will I2Joining to the current I1In the candidate matching set of (2), will I1Is added to I2In the candidate matching set of (3);
(c) for FIS1Item set I of1If the candidate matching set PList is empty, the current item set is directly skipped, otherwise, the optimal item set is selected in PList and set
Figure RE-GDA0003241809070000051
Matching is carried out;
(d) k + +, if k is less than MAX (l)1,l2) Returning to step (b), if k is equal to MAX (l)1,l2) The FIS is1First n terms of and FIS2The first n items are matched one by one, and score is set in the matching process10.1, n is MIN (| FIS)1|, |FIS2|).
In this example, when k = 1, step (b) yields the following candidate matching sets for FIS1 and FIS2:
TABLE 1. Candidate matching sets for FIS1 (k = 1)
Frequent item set I | Candidate matching set
{a,b,c} | {a,b,c,d}
{a,c,d} | {a,b,c,d}
{b,d,e} | {b,c,d,e}
{a,d,e} | {a,d,e,f}
{b,d,f} | (empty)
TABLE 2. Candidate matching sets for FIS2 (k = 1)
Frequent item set I | Candidate matching set
{a,b,c,d} | {a,b,c}, {a,c,d}
{b,c,d,e} | {b,d,e}
{a,d,e,f} | {a,d,e}
{b,d,g,h} | (empty)
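The two tables for k = 1 can be reproduced with a few lines of Python. This is only a sketch of step (b): the function name `candidate_sets` is hypothetical, and the distance is computed from the sizes of the two compared item sets.

```python
from typing import Dict, FrozenSet, List, Tuple

ItemSet = FrozenSet[str]


def dis(a: ItemSet, b: ItemSet) -> int:
    return max(len(a), len(b)) - len(a & b)


def candidate_sets(
    fis1: List[ItemSet], fis2: List[ItemSet], k: int
) -> Tuple[Dict[ItemSet, List[ItemSet]], Dict[ItemSet, List[ItemSet]]]:
    """Step (b): build both candidate matching tables for distance k."""
    plist1: Dict[ItemSet, List[ItemSet]] = {a: [] for a in fis1}
    plist2: Dict[ItemSet, List[ItemSet]] = {b: [] for b in fis2}
    for a in fis1:
        for b in fis2:
            if dis(a, b) == k:
                plist1[a].append(b)
                plist2[b].append(a)
    return plist1, plist2


def show(s: ItemSet) -> str:
    return "{" + ",".join(sorted(s)) + "}"


fis1 = [frozenset(s) for s in ("abc", "acd", "bde", "ade", "bdf")]
fis2 = [frozenset(s) for s in ("abcd", "bcde", "adef", "bdgh")]
table1, table2 = candidate_sets(fis1, fis2, k=1)
for table in (table1, table2):
    for item_set, plist in table.items():
        print(show(item_set), "|", ", ".join(show(c) for c in plist) or "(empty)")
```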
In step (c), for the frequent item sets in FIS1: {a,b,c} is matched with {a,b,c,d}, giving score1 = 0.75, and <{a,b,c},{a,b,c,d},0.75> is added to Pairs; {b,d,e} is matched with {b,c,d,e}, giving score1 = 0.75, and the pair is added to Pairs; {a,d,e} is matched with {a,d,e,f}, giving score1 = 0.75, and the pair is added to Pairs. {a,c,d} is skipped because its only candidate, {a,b,c,d}, has already been matched. The matched item sets are then deleted from FIS1 and FIS2, leaving FIS1 = {{a,c,d},{b,d,f}} and FIS2 = {{b,d,g,h}};
In step (d), k becomes 2 and the process returns to step (b);
When k = 2, step (b) yields the following candidate matching sets for FIS1 and FIS2:
TABLE 3. Candidate matching sets for FIS1 (k = 2)
Frequent item set I | Candidate matching set
{a,c,d} | (empty)
{b,d,f} | {b,d,g,h}
TABLE 4. Candidate matching sets for FIS2 (k = 2)
Frequent item set I | Candidate matching set
{b,d,g,h} | {b,d,f}
In step (c), {b,d,f} in FIS1 is matched with {b,d,g,h}, giving score1 = 0.5, and <{b,d,f},{b,d,g,h},0.5> is added to Pairs; the matched item sets are then deleted from FIS1 and FIS2, leaving FIS1 = {{a,c,d}} and FIS2 = {};
In step (d), k becomes 3 and the process returns to step (b):
TABLE 5. Candidate matching sets for FIS1 (k = 3)
Frequent item set I | Candidate matching set
{a,c,d} | (empty)
TABLE 6. Candidate matching sets for FIS2 (k = 3)
Frequent item set I | Candidate matching set
(no rows, since FIS2 is empty)
After step (c), FIS1 = {{a,c,d}} and FIS2 = {};
In step (d), k becomes 4, which equals MAX(l1, l2), so the first n = MIN(|FIS1|, |FIS2|) = 0 item sets of FIS1 and FIS2 are matched one by one, that is, nothing is matched.
(e) Match the item sets remaining in FIS1 and FIS2 with the empty set and set score1 = 0.
Step (e) adds <{a,c,d},{},0> to Pairs, so that finally Pairs = {<{a,b,c},{a,b,c,d},0.75>, <{b,d,e},{b,c,d,e},0.75>, <{a,d,e},{a,d,e,f},0.75>, <{b,d,f},{b,d,g,h},0.5>, <{a,c,d},{},0>};
Step three: For each pair <I1, I2, score1> in Pairs, compute the support similarity score2 of I1 and I2, derive the similarity score of I1 and I2, and update the pair to <I1, I2, score>;
Through step three: for <{a,b,c},{a,b,c,d},0.75>, score2 = 0.75 and the final score is 0.5625; for <{b,d,e},{b,c,d,e},0.75>, score2 = 0.75 and the final score is 0.5625; for <{a,d,e},{a,d,e,f},0.75>, score2 = 1 and the final score is 0.75; for <{b,d,f},{b,d,g,h},0.5>, score2 = 1 and the final score is 0.5; for <{a,c,d},{},0>, score2 = 0 and the final score is 0. This finally gives Pairs = {<{a,b,c},{a,b,c,d},0.5625>, <{b,d,e},{b,c,d,e},0.5625>, <{a,d,e},{a,d,e,f},0.75>, <{b,d,f},{b,d,g,h},0.5>, <{a,c,d},{},0>};
Step four: Sum the scores of all pairs in Pairs and divide by the number of pairs in Pairs to obtain the similarity SCORE of D1 and D2, which lies in [0, 1].
$$\mathrm{SCORE} = \frac{0.5625 + 0.5625 + 0.75 + 0.5 + 0}{5} = 0.475$$
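Steps three and four can be checked on the example with the sketch below. It is illustrative only: `dataset_similarity` is a hypothetical name, the support similarity is assumed to be MIN(s1, s2) / MAX(s1, s2), the empty set is assumed to have support 0, and an empty Pairs table is assumed to give a SCORE of 0.

```python
from typing import Dict, FrozenSet, List, Tuple

ItemSet = FrozenSet[str]
Pair = Tuple[ItemSet, ItemSet, float]  # <I1, I2, score1>


def dataset_similarity(
    pairs: List[Pair], support1: Dict[ItemSet, int], support2: Dict[ItemSet, int]
) -> float:
    """Average of score1 * score2 over all pairs (steps three and four)."""
    if not pairs:
        return 0.0  # assumption: nothing to compare
    total = 0.0
    for i1, i2, score1 in pairs:
        s1 = support1.get(i1, 0)  # the empty set has no entry, so support 0
        s2 = support2.get(i2, 0)
        larger = max(s1, s2)
        score2 = min(s1, s2) / larger if larger > 0 else 0.0
        total += score1 * score2
    return total / len(pairs)


fs = frozenset
support1 = {fs("abc"): 4, fs("acd"): 4, fs("bde"): 3, fs("ade"): 3, fs("bdf"): 3}
support2 = {fs("abcd"): 3, fs("bcde"): 4, fs("adef"): 3, fs("bdgh"): 3}
pairs = [
    (fs("abc"), fs("abcd"), 0.75),
    (fs("bde"), fs("bcde"), 0.75),
    (fs("ade"), fs("adef"), 0.75),
    (fs("bdf"), fs("bdgh"), 0.5),
    (fs("acd"), fs(), 0.0),
]
print(dataset_similarity(pairs, support1, support2))  # 0.475
```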
It should be noted that the above embodiments are only preferred embodiments of the present invention and are not intended to limit its scope; all equivalent replacements or substitutions made on the basis of the above technical solutions fall within the scope of the present invention.

Claims (7)

1. A method for evaluating the frequent item set mining availability of a data set, characterized by comprising the following steps:
Step (1): let C = {I1, I2, …, In} be a set of items; given transactional data sets D1 and D2, in which each transaction T is a non-empty item set such that
$$T \subseteq C,$$
mine D1 and D2 with the Apriori algorithm to obtain the maximal frequent item sets, recorded as FIS1 and FIS2. Definition (maximal frequent item set MIS): a maximal frequent item set MIS is an item set that is itself frequent but has no frequent superset,
Figure FDA0003085459380000012
FIS1 and FIS2 each contain several MIS together with their support information; l1 and l2 denote the maximum values of |MIS1| and |MIS2| over FIS1 and FIS2, respectively; here and below, MIS1 and MIS2 denote an item set from FIS1 and from FIS2, respectively;
Step (2): match any item set MIS1 of FIS1 with any item set MIS2 of FIS2 using an item set matching algorithm F to obtain a table of paired item sets, Pairs, which consists of several pairs <MIS1, MIS2, score1>, where score1 denotes the item similarity of MIS1 and MIS2 computed during matching;
Step (3): for every pair <MIS1, MIS2, score1> in Pairs, compute the support similarity score2 of MIS1 and MIS2, then compute their composite similarity score and update the pair to <MIS1, MIS2, score>;
Step (4): sum the composite similarity scores of all pairs in Pairs and divide by the number of pairs in Pairs to obtain the similarity SCORE of D1 and D2, which lies in [0, 1].
2. The method for evaluating the frequent item set mining availability of a data set according to claim 1, characterized in that the item set matching algorithm F in step (2) is as follows:
(a) set score1 = 1, add the item sets common to FIS1 and FIS2 to Pairs in the form <MIS1, MIS2, score1>, delete the matched MIS1 and MIS2 from FIS1 and FIS2 respectively, and set k = 1;
(b) initialize the candidate matching set of every item set in FIS1 and FIS2 to the empty set; for any item set MIS1 of FIS1 and any item set MIS2 of FIS2, compute the distance dis between MIS1 and MIS2, and if dis equals k, add MIS2 to the candidate matching set of MIS1 and add MIS1 to the candidate matching set of MIS2;
(c) for any item set MIS1 of FIS1, if its candidate matching set PList is empty, skip the item set; otherwise select an item set MIS2 in PList according to the minimum-influence matching strategy, compute
$$\mathrm{score}_1 = \frac{|MIS_1 \cap MIS_2|}{\mathrm{MAX}(|MIS_1|, |MIS_2|)},$$
add <MIS1, MIS2, score1> to Pairs, and delete MIS1 and MIS2 from FIS1 and FIS2, respectively;
(d) increment k (k++); if k is less than MAX(l1, l2), return to step (b); if k equals MAX(l1, l2), match the first n item sets of FIS1 with the first n item sets of FIS2 one by one, where n = MIN(|FIS1|, |FIS2|), setting score1 = 0.1 for each match and adding it to Pairs, and finally delete the matched item sets from FIS1 and FIS2;
(e) set score1 = 0, match the item sets remaining in FIS1 and FIS2 with the empty set, and add the results to Pairs.
3. The method for evaluating the frequent item set mining availability of a data set according to claim 1, characterized in that the item similarity score1, the support similarity score2 and the composite similarity score in steps (2) and (3) are defined as follows:
Definition (item similarity score1): the item-based similarity of item sets MIS1 and MIS2 is recorded as score1 and calculated as follows:
if MIS1 and MIS2 are identical, score1 = 1;
if MIS1 and MIS2 are different and neither is the empty set,
$$\mathrm{score}_1 = \frac{|MIS_1 \cap MIS_2|}{\mathrm{MAX}(|MIS_1|, |MIS_2|)};$$
if one of MIS1 and MIS2 is the empty set, score1 = 0;
Definition (support similarity score2): the support-based similarity of the paired item sets MIS1 and MIS2 is recorded as score2 and calculated as follows: for a pair <MIS1, MIS2, score1> in Pairs, where MIS1 has support s1 and MIS2 has support s2,
$$\mathrm{score}_2 = \frac{\mathrm{MIN}(s_1, s_2)}{\mathrm{MAX}(s_1, s_2)}$$
Definition (composite similarity score): the composite similarity of item sets MIS1 and MIS2 is recorded as score. It is based mainly on the item similarity score1 and is further refined by the support similarity score2. It is calculated as: score = score1 * score2
4. The method for evaluating the frequent item set mining availability of a data set according to claim 2, characterized in that, in the item set matching algorithm F, the distance dis in step (b) is defined as follows:
Definition (item set distance dis): the item set distance is the number of non-coincident items between two item sets, recorded as dis and calculated as follows:
dis = MAX(|MIS1|, |MIS2|) - |MIS1 ∩ MIS2|.
5. The method for evaluating the frequent item set mining availability of a data set according to claim 2, characterized in that the item set matching algorithm F matches item sets with a nearest-principle heuristic whose rule is to preferentially match two item sets that are close to each other. The parameter k controls the heuristic: in the current round, only pairs of item sets whose distance equals k are considered for matching, and iterating over k matches the closest item sets first; that is, when matching at distance k, all pairs of item sets at a distance smaller than k have already been matched. The nearest-match principle turns an unordered search into an ordered matching, and the results of each round are reused instead of being recomputed. When searching at distance k, all item sets at distance k-1 have already been excluded, and item sets at distance k+1 are outside the search range, which effectively shrinks the search space and avoids the repeated computation of comparing each item set with all the remaining item sets in every match.
6. The method for evaluating the frequent item set mining availability of a data set according to claim 2, characterized in that, in step (c), the candidate matching set is as follows: the candidate matching set stores the item sets whose distance from the current item set MIS equals k; storing every item set at distance k from the current item set MIS yields the candidate matching set, so that the minimum-influence matching strategy applied within the candidate matching set ensures a good matching result while the nearest-match principle is preserved. In each iteration, matched item sets are deleted from the candidate matching sets, so that the item sets still available to each item set are tracked in real time and repeated matching is avoided.
7. The method for evaluating the frequent item set mining availability of a data set according to claim 2, characterized in that the minimum-influence matching strategy in step (c) is as follows: if the PList of the current MIS1 contains exactly one item set MIS2, pair with it; if it contains several item sets, select from PList the item set MIS2 whose own candidate matching set contains the fewest matchable item sets and pair with it, thereby ensuring that each selected match has the least influence on the matching of the other item sets.
CN202110579345.7A 2021-05-26 2021-05-26 Data set frequent item set mining availability evaluation method Pending CN113568942A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110579345.7A (CN113568942A) | 2021-05-26 | 2021-05-26 | Data set frequent item set mining availability evaluation method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110579345.7A (CN113568942A) | 2021-05-26 | 2021-05-26 | Data set frequent item set mining availability evaluation method

Publications (1)

Publication Number | Publication Date
CN113568942A | 2021-10-29

Family

ID=78161591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110579345.7A Pending CN113568942A (en) 2021-05-26 2021-05-26 Data set frequent item set mining availability evaluation method

Country Status (1)

Country Link
CN (1) CN113568942A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105320756A (en) * 2015-10-15 2016-02-10 江苏省邮电规划设计院有限责任公司 Improved Apriori algorithm based method for mining database association rule
US20180107654A1 (en) * 2016-10-18 2018-04-19 Samsung Sds Co., Ltd. Method and apparatus for managing synonymous items based on similarity analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
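ZHANG Ya (张娅): "Research on frequent item set mining of big data based on K-means clustering" (基于 K 均值聚类的大数据频繁项集挖掘研究), Computer Simulation (计算机仿真), vol. 37, no. 8, pages 457-461 *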

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115481929A (en) * 2022-10-17 2022-12-16 四川大学华西医院 Method and device for evaluating effectiveness of reconstruction measures, terminal equipment and storage medium
CN115481929B (en) * 2022-10-17 2023-11-24 四川大学华西医院 Reconstruction measure effectiveness evaluation method and device, terminal equipment and storage medium

Similar Documents

Publication Publication Date Title
JP5391633B2 (en) Term recommendation to define the ontology space
JP5391634B2 (en) Selecting tags for a document through paragraph analysis
US8032507B1 (en) Similarity-based searching
CN110879864B (en) Context recommendation method based on graph neural network and attention mechanism
CN110110094A (en) Across a network personage's correlating method based on social networks knowledge mapping
Guan et al. Comparison and evaluation of Chinese research performance in the field of bioinformatics
CN103258025B (en) Generate the method for co-occurrence keyword, the method that association search word is provided and system
US9442991B2 (en) Ascribing actionable attributes to data that describes a personal identity
CN104731886B (en) A kind of processing method and system of mass small documents
US8352496B2 (en) Entity name matching
CN106991141B (en) Association rule mining method based on deep pruning strategy
CN110442618B (en) Convolutional neural network review expert recommendation method fusing expert information association relation
CN107145519B (en) Image retrieval and annotation method based on hypergraph
EP2788897B1 (en) Optimally ranked nearest neighbor fuzzy full text search
CN113568942A (en) Data set frequent item set mining availability evaluation method
CN106528790A (en) Method and device for selecting support point in metric space
CN108062355B (en) Query term expansion method based on pseudo feedback and TF-IDF
CN104871152A (en) Providing organized content
CN116362236A (en) Target word mining method and device and storage medium
CN110825965A (en) Improved collaborative filtering recommendation method based on trust mechanism and time weighting
CN115204967A (en) Recommendation method integrating implicit feedback of long-term and short-term interest representation of user
KR20200051300A (en) Data clustering apparatus and method based on range query using cf tree
US7716203B2 (en) Method and system for tracking, evaluating and ranking results of multiple matching engines
JP5416659B2 (en) Information storage search device, information storage method, and information storage program
CN113420214B (en) Electronic transaction object recommendation method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination