CN105653627A - Bloom filter-based data classification method - Google Patents

Info

Publication number
CN105653627A
Authority
CN
China
Prior art keywords
bloom filter
data classification
content
classification method
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510995650.9A
Other languages
Chinese (zh)
Inventor
曹志富
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Yi Fang Softcom Ltd
Original Assignee
Hunan Yi Fang Softcom Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-12-28
Filing date
2015-12-28
Publication date
2016-06-08
Application filed by Hunan Yi Fang Softcom Ltd
Priority to CN201510995650.9A
Publication of CN105653627A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/284: Relational databases
    • G06F16/285: Clustering or classification

Abstract

The invention relates to the technical field of big data classification, and in particular to a Bloom filter-based data classification method. The method comprises the following steps: S101, Bloom filter selection: generating a corresponding Bloom filter from the user attributes obtained through Hadoop offline analysis; S102, filter judgment key assembly: assembling the Bloom filter judgment key according to the content creator; S103, content classification containment judgment: using the generated Bloom filter judgment key, judging whether the designated Bloom filter contains it, and entering step S104 if it does or step S105 if it does not; S104, content classification: assigning the content to the established category and attaching the corresponding tag; and S105, proceeding to classification by the next attribute. With this method, content created by users can be classified effectively according to user attributes in the real-time processing stage; unlike offline analysis with tools such as Hadoop, the method operates in real time.

Description

A data classification method based on a Bloom filter
Technical field
The present invention relates to the technical field of big data classification, and in particular to a data classification method based on a Bloom filter.
Background
In the era of user-generated content (UGC), the content that users create each day can be measured in petabytes, while a user's identity information and attributes on the Internet are largely determined when they are first created. As data volumes keep growing, classifying content quickly and effectively according to the attributes of the user who produced it has become a problem.
Chinese invention patent application CN102253991A discloses a URL storage method comprising: step S11, classifying URLs according to a predetermined classification rule; step S12, generating a separate Bloom filter for storing each type of URL; and step S13, storing each URL in the corresponding Bloom filter according to its type. That invention provides efficient URL lookup during web page filtering and thereby improves network performance, but it cannot classify content quickly and effectively in the real-time processing stage of UGC.
Summary of the invention
The technical problem to be solved by the present invention is to provide a data classification method that makes the judgment in the real-time processing stage of UGC and classifies content quickly and effectively.
To solve the above technical problem, the data classification method based on a Bloom filter of the present invention comprises the following steps:
Step S101: Bloom filter selection: generate a corresponding Bloom filter from the user attributes obtained by Hadoop offline analysis;
Step S102: filter judgment key assembly: assemble the Bloom filter judgment key according to the content creator;
Step S103: content classification containment judgment: using the generated Bloom filter judgment key, judge whether the designated Bloom filter contains it; if so, go to step S104; if not, go to step S105;
Step S104: content classification: assign the content to the established category and attach the corresponding tag;
Step S105: proceed to classification by the next attribute.
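As a rough illustration of steps S101 and S103, the sketch below builds one Bloom filter per user attribute from offline results and then tests a key against it. The patent does not specify the filter implementation, the row layout, or the key format, so the `BloomFilter` class, the `(user_id, attribute, category)` rows, the `user_id:category` key, and the bit and hash counts are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array probed at k hash-derived positions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def build_filters(offline_rows, num_bits=1 << 16, num_hashes=5):
    """S101 (assumed layout): one filter per attribute from (user_id, attribute, category) rows."""
    filters = {}
    for user_id, attribute, category in offline_rows:
        if attribute not in filters:
            filters[attribute] = BloomFilter(num_bits, num_hashes)
        filters[attribute].add(f"{user_id}:{category}")
    return filters


# Hypothetical offline output; a real job would export this from Hadoop.
rows = [
    ("u1", "bot", "bot"),
    ("u2", "follower_count", "high"),
    ("u1", "label", "photography"),
]
filters = build_filters(rows)

# S103: an inserted key is always reported as present (no false negatives).
print("u1:bot" in filters["bot"])
```

A Bloom filter can report a key as present when it was never added, so a positive answer in S103 is probabilistic, while a negative answer is definite.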
Further, the user attributes in step S101 include labels, social follower count, and whether the account is a bot.
Further, the designated Bloom filter in step S103 is the generated Bloom filter selected according to the user attribute by which the content is being classified.
Further, the generated Bloom filter judgment key in step S103 is a judgment key generated from the user and the classification category.
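The text says only that the judgment key is generated from the user and the classification category; it does not give a format. One possible encoding, shown here purely as an assumption, length-prefixes the user ID so that different (user, category) pairs cannot collapse into the same string. The function name `assemble_key` is hypothetical.

```python
def assemble_key(user_id: str, category: str) -> str:
    """S102 (assumed format): combine the content creator and a candidate category."""
    # The length prefix keeps ("a:b", "c") and ("a", "b:c") distinct,
    # which a plain "user:category" join would not.
    return f"{len(user_id)}:{user_id}:{category}"


print(assemble_key("u123", "high_follower"))  # 4:u123:high_follower
```

A plain `user_id:category` string is sufficient when user IDs can never contain the separator; the same format must be used both when the filter is built and when it is queried.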
With the above method, the data classification method based on a Bloom filter of the present invention effectively classifies user-created content according to user attributes in the real-time processing stage; unlike offline analysis with tools such as Hadoop, it operates in real time.
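How reliably the filter classifies depends on how it is sized, which the patent does not discuss. The sketch below applies the standard Bloom filter sizing formulas, m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, for n expected keys and a target false-positive rate p. The function name and the example figures are assumptions chosen for illustration.

```python
import math


def bloom_parameters(expected_items: int, false_positive_rate: float) -> tuple[int, int]:
    """Return (num_bits, num_hashes) from the standard Bloom filter sizing formulas."""
    num_bits = math.ceil(
        -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    num_hashes = max(1, round(num_bits / expected_items * math.log(2)))
    return num_bits, num_hashes


# Hypothetical workload: one million user/category keys, 1% false positives.
num_bits, num_hashes = bloom_parameters(1_000_000, 0.01)
print(num_bits, num_hashes)
```

At these figures the filter needs a little under 10 bits per key, roughly 1.2 MB in total, which is small enough to hold in memory on a real-time processing node.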
Brief description of the drawings
The present invention is described in further detail below with reference to the drawing and a specific embodiment.
Fig. 1 is a flowchart of the data classification method based on a Bloom filter of the present invention.
Detailed description
As shown in Fig. 1, the data classification method based on a Bloom filter of the present invention comprises the following steps:
Step S101: Bloom filter selection: generate a corresponding Bloom filter from the user attributes obtained by Hadoop offline analysis. In this embodiment, the user attributes include labels, social follower count, and whether the account is a bot.
Step S102: filter judgment key assembly: assemble the Bloom filter judgment key according to the content creator.
Step S103: content classification containment judgment: using the generated Bloom filter judgment key, judge whether the designated Bloom filter contains it; if so, go to step S104; if not, go to step S105. The designated Bloom filter here is the generated Bloom filter selected according to the user attribute by which the content is being classified, and the generated Bloom filter is the one generated in step S101. The generated Bloom filter judgment key is a judgment key generated from the user and the classification category.
Step S104: content classification: assign the content to the established category and attach the corresponding tag.
Step S105: proceed to classification by the next attribute.
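The control flow of steps S102 to S105 can be sketched as a loop over attributes. This is one reading of the flowchart, not the patented implementation: the function name, the `creator:category` key format, and the per-attribute category lists are assumptions, and plain Python sets stand in for the Bloom filters so the example stays short. Any object that supports `in` could be substituted.

```python
from typing import Container, Mapping, Sequence


def classify_content(
    creator_id: str,
    filters: Mapping[str, Container[str]],
    categories: Mapping[str, Sequence[str]],
) -> list[str]:
    """Return the tags to attach to content created by creator_id."""
    tags = []
    for attribute, membership in filters.items():   # S105: move on to the next attribute
        for category in categories.get(attribute, ()):
            key = f"{creator_id}:{category}"        # S102: assemble the judgment key
            if key in membership:                   # S103: containment judgment
                tags.append(category)               # S104: attach the tag
    return tags


# Sets stand in for the per-attribute Bloom filters built offline (assumption).
filters = {
    "bot": {"u1:bot"},
    "follower_count": {"u1:high", "u2:low"},
}
categories = {"bot": ["bot"], "follower_count": ["high", "low"]}

print(classify_content("u1", filters, categories))  # ['bot', 'high']
print(classify_content("u3", filters, categories))  # []
```

Each piece of content costs one membership test per candidate category, with no lookup against the offline store, which is what lets the check run in the real-time path.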
Although a specific embodiment of the present invention has been described above, those skilled in the art should understand that it is given only by way of example. Various changes or modifications may be made to this embodiment without departing from the principle and essence of the invention, and the scope of protection of the invention is defined only by the appended claims.

Claims (4)

1. A data classification method based on a Bloom filter, characterized in that it comprises the following steps:
Step S101: Bloom filter selection: generate a corresponding Bloom filter from the user attributes obtained by Hadoop offline analysis;
Step S102: filter judgment key assembly: assemble the Bloom filter judgment key according to the content creator;
Step S103: content classification containment judgment: using the generated Bloom filter judgment key, judge whether the designated Bloom filter contains it; if so, go to step S104; if not, go to step S105;
Step S104: content classification: assign the content to the established category and attach the corresponding tag;
Step S105: proceed to classification by the next attribute.
2. The data classification method based on a Bloom filter according to claim 1, characterized in that the user attributes in step S101 include labels, social follower count, and whether the account is a bot.
3. The data classification method based on a Bloom filter according to claim 1, characterized in that the designated Bloom filter in step S103 is the generated Bloom filter selected according to the user attribute by which the content is being classified.
4. The data classification method based on a Bloom filter according to claim 1, characterized in that the generated Bloom filter judgment key in step S103 is a judgment key generated from the user and the classification category.
CN201510995650.9A 2015-12-28 2015-12-28 Bloom filter-based data classification method Pending CN105653627A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510995650.9A CN105653627A (en) 2015-12-28 2015-12-28 Bloom filter-based data classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510995650.9A CN105653627A (en) 2015-12-28 2015-12-28 Bloom filter-based data classification method

Publications (1)

Publication Number Publication Date
CN105653627A true CN105653627A (en) 2016-06-08

Family

ID=56477723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510995650.9A Pending CN105653627A (en) 2015-12-28 2015-12-28 Bloom filter-based data classification method

Country Status (1)

Country Link
CN (1) CN105653627A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11741258B2 | 2021-04-16 | 2023-08-29 | International Business Machines Corporation | Dynamic data dissemination under declarative data subject constraints

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20070150522A1 * | 2003-10-07 | 2007-06-28 | International Business Machines Corporation | Method, system, and program for processing a file request
CN102253991A * | 2011-05-25 | 2011-11-23 | 北京星网锐捷网络技术有限公司 | Uniform resource locator (URL) storage method, web filtering method, device and system
CN102930035A * | 2011-11-10 | 2013-02-13 | 微软公司 | Driving content items from multiple different content sources
CN104462096A * | 2013-09-13 | 2015-03-25 | 北大方正集团有限公司 | Public opinion monitoring and analysis method and device
CN104794193A * | 2015-04-17 | 2015-07-22 | 南京大学 | Webpage increment capture method for valid link acquisition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
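张华 (Zhang Hua) et al.: "Bloom Filter Technology and Applications" (Bloom Filter 技术及应用), Journal of Fuyang Normal University (Natural Science Edition) (《阜阳师范学院学报(自然科学版)》) *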

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160608