CN105653627A - Bloom filter-based data classification method - Google Patents
- Publication number
- CN105653627A (application number CN201510995650.9A)
- Authority
- CN
- China
- Prior art keywords
- bloom filter
- data classification
- content
- classification method
- key
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Abstract
The invention relates to the technical field of big data classification, and in particular to a Bloom filter-based data classification method. The method comprises the following steps: S101, Bloom filter selection: generating the corresponding Bloom filter according to user attributes obtained through Hadoop offline analysis; S102, judgment key assembly: assembling a Bloom filter judgment key according to the content creator; S103, containment judgment for content classification: using the generated judgment key, judging whether the specified Bloom filter contains it, proceeding to step S104 if it does and to step S105 if it does not; S104, content classification: assigning the content to the designated category and attaching the corresponding tag; and S105, proceeding to classification by the next attribute. With this method, content created by users can be classified effectively according to user attributes in the real-time processing stage; compared with offline analysis such as Hadoop, the method operates in real time.
Description
Technical field
The present invention relates to the technical field of big data classification, and in particular to a data classification method based on a Bloom filter.
Background
In the era of user-generated content (UGC), the content that users create every day can be measured in petabytes, while a user's identity information and attributes on the internet are largely fixed from the moment they are first established. As data volumes keep growing, classifying content quickly and effectively according to the attributes of the user who produced it becomes a problem.
Chinese invention patent application CN102253991A discloses a URL storage method comprising: step S11, classifying URLs according to predetermined classification rules; step S12, generating a separate Bloom filter for storing each type of URL; and step S13, storing each URL in the corresponding Bloom filter according to its type. That invention provides efficient URL lookup when performing web page filtering and thereby improves network performance; however, it cannot classify content quickly and effectively in the real-time processing stage of UGC.
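For readers unfamiliar with the data structure, the following is a minimal sketch of a Bloom filter, with one filter per URL type in the spirit of steps S11 to S13. Neither application specifies an implementation; the class name `BloomFilter`, the SHA-256 double-hashing scheme, the sizes, and the type names `news` and `video` are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive the bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        # Set every bit position derived from the key.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        # True only if every derived bit is set; unrelated keys may collide.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


# One filter per URL type (hypothetical type names and URLs).
filters = {"news": BloomFilter(1024, 4), "video": BloomFilter(1024, 4)}
filters["news"].add("http://example.com/a")
filters["video"].add("http://example.com/b")
print("http://example.com/a" in filters["news"])
```

A key that was added is always reported as present. A key that was never added is usually reported as absent, but may occasionally be reported as present, which is the price paid for the compact, fixed-size representation.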
Summary of the invention
The technical problem to be solved by the present invention is to provide a data classification method that performs judgment in the real-time processing stage of UGC and classifies content quickly and effectively.
To solve the above technical problem, the Bloom filter-based data classification method of the present invention comprises the following steps:
Step S101: Bloom filter selection. Generate the corresponding Bloom filter according to the user attributes obtained by Hadoop offline analysis;
Step S102: judgment key assembly. Assemble the Bloom filter judgment key according to the content creator;
Step S103: containment judgment for content classification. Using the generated Bloom filter judgment key, judge whether the specified Bloom filter contains the key; if it does, proceed to step S104; if not, proceed to step S105;
Step S104: content classification. Assign the content to the designated category and attach the corresponding tag;
Step S105: proceed to classification by the next attribute.
Further, the user attributes in step S101 include tags, social follower count, and robot (bot) status.
Further, the specified Bloom filter in step S103 is the generated Bloom filter selected according to the user attribute used for content classification.
Further, the generated Bloom filter judgment key in step S103 is a judgment key generated from the user and the classification category.
With the above method, the Bloom filter-based data classification method of the present invention effectively classifies user-created content according to user attributes in the real-time processing stage and, compared with offline analysis such as Hadoop, operates in real time.
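The control flow of steps S101 to S105 can be sketched roughly as follows. This is an illustration under stated assumptions, not the applicant's implementation: the function names `assemble_key` and `classify`, the `category:creator` key layout, and the attribute names are hypothetical, and plain Python sets stand in for real Bloom filters so the example stays short. The source also does not say whether processing continues with further attributes after step S104; this sketch assumes it does.

```python
from typing import Container, Dict, List


def assemble_key(creator_id: str, category: str) -> str:
    # S102: build the judgment key from the content creator and the category.
    # The "category:creator" layout is an assumption for illustration.
    return f"{category}:{creator_id}"


def classify(creator_id: str, filters: Dict[str, Container[str]]) -> List[str]:
    """Return the tags for one piece of content, given its creator."""
    tags = []
    for category, bloom in filters.items():       # S101: filter chosen per attribute
        key = assemble_key(creator_id, category)  # S102: assemble the judgment key
        if key in bloom:                          # S103: containment judgment
            tags.append(category)                 # S104: classify and attach the tag
        # S105: continue with the next attribute
    return tags


# Stand-in filters: plain sets replace real Bloom filters in this sketch.
filters = {
    "bot": {"bot:u42"},
    "high_followers": {"high_followers:u7", "high_followers:u42"},
}
print(classify("u42", filters))  # ['bot', 'high_followers']
print(classify("u7", filters))   # ['high_followers']
print(classify("u1", filters))   # []
```

Because the only per-item work is a key assembly and a membership test for each attribute, the check can run inline as content arrives, while the filters themselves are rebuilt offline.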
Brief description of the drawings
The present invention is described in further detail below with reference to the drawing and a specific embodiment.
Fig. 1 is a flowchart of the Bloom filter-based data classification method of the present invention.
Detailed description of the embodiment
As shown in Fig. 1, the Bloom filter-based data classification method of the present invention comprises the following steps:
Step S101: Bloom filter selection. Generate the corresponding Bloom filter according to the user attributes obtained by Hadoop offline analysis. In this embodiment, the user attributes include tags, social follower count, and robot (bot) status.
Step S102: judgment key assembly. Assemble the Bloom filter judgment key according to the content creator.
Step S103: containment judgment for content classification. Using the generated Bloom filter judgment key, judge whether the specified Bloom filter contains the key; if it does, proceed to step S104; if not, proceed to step S105. Here, the specified Bloom filter is the generated Bloom filter selected according to the user attribute used for content classification, where the generated Bloom filter is the one generated in step S101. The generated Bloom filter judgment key is a judgment key generated from the user and the classification category.
Step S104: content classification. Assign the content to the designated category and attach the corresponding tag.
Step S105: proceed to classification by the next attribute.
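The source does not say how the filters generated in step S101 are sized. As a hedged illustration, the standard Bloom filter sizing formulas give the number of bits m and hash functions k for an expected number of keys n and a target false-positive rate p. The function names and the example figures below are assumptions; a false positive in step S103 would cause a tag to be attached in step S104 to content whose creator does not actually have the attribute.

```python
import math
from typing import Tuple


def bloom_parameters(expected_keys: int, false_positive_rate: float) -> Tuple[int, int]:
    """Return (num_bits, num_hashes) from the standard Bloom filter sizing formulas."""
    if expected_keys <= 0 or not 0.0 < false_positive_rate < 1.0:
        raise ValueError("expected_keys must be positive and the rate must lie in (0, 1)")
    # m = -n * ln(p) / (ln 2)^2
    num_bits = math.ceil(-expected_keys * math.log(false_positive_rate) / math.log(2) ** 2)
    # k = (m / n) * ln 2, with at least one hash function
    num_hashes = max(1, round(num_bits / expected_keys * math.log(2)))
    return num_bits, num_hashes


def estimated_false_positive_rate(num_bits: int, num_hashes: int, stored_keys: int) -> float:
    # p ~= (1 - e^(-k * n / m))^k
    return (1.0 - math.exp(-num_hashes * stored_keys / num_bits)) ** num_hashes


# Hypothetical figures: one million user keys, 1% target false-positive rate.
bits, hashes = bloom_parameters(1_000_000, 0.01)
print(bits, hashes, estimated_false_positive_rate(bits, hashes, 1_000_000))
```

For these example figures the filter needs roughly 9.6 bits per key and 7 hash functions, which is about 1.2 MB for a million keys and small enough to hold in memory in a real-time processing stage.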
Although a specific embodiment of the present invention has been described above, those skilled in the art should understand that it is given only by way of example; various changes or modifications may be made to this embodiment without departing from the principle and essence of the invention, and the scope of protection of the present invention is defined only by the appended claims.
Claims (4)
1. A data classification method based on a Bloom filter, characterized in that it comprises the following steps:
Step S101: Bloom filter selection. Generate the corresponding Bloom filter according to the user attributes obtained by Hadoop offline analysis;
Step S102: judgment key assembly. Assemble the Bloom filter judgment key according to the content creator;
Step S103: containment judgment for content classification. Using the generated Bloom filter judgment key, judge whether the specified Bloom filter contains the key; if it does, proceed to step S104; if not, proceed to step S105;
Step S104: content classification. Assign the content to the designated category and attach the corresponding tag;
Step S105: proceed to classification by the next attribute.
2. The data classification method based on a Bloom filter according to claim 1, characterized in that the user attributes in step S101 include tags, social follower count, and robot (bot) status.
3. The data classification method based on a Bloom filter according to claim 1, characterized in that the specified Bloom filter in step S103 is the generated Bloom filter selected according to the user attribute used for content classification.
4. The data classification method based on a Bloom filter according to claim 1, characterized in that the generated Bloom filter judgment key in step S103 is a judgment key generated from the user and the classification category.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510995650.9A CN105653627A (en) | 2015-12-28 | 2015-12-28 | Bloom filter-based data classification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510995650.9A CN105653627A (en) | 2015-12-28 | 2015-12-28 | Bloom filter-based data classification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105653627A (en) | 2016-06-08 |
Family
ID=56477723
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510995650.9A Pending CN105653627A (en) | 2015-12-28 | 2015-12-28 | Bloom filter-based data classification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105653627A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11741258B2 (en) | 2021-04-16 | 2023-08-29 | International Business Machines Corporation | Dynamic data dissemination under declarative data subject constraints |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070150522A1 (en) * | 2003-10-07 | 2007-06-28 | International Business Machines Corporation | Method, system, and program for processing a file request |
CN102253991A (en) * | 2011-05-25 | 2011-11-23 | 北京星网锐捷网络技术有限公司 | Uniform resource locator (URL) storage method, web filtering method, device and system |
CN102930035A (en) * | 2011-11-10 | 2013-02-13 | 微软公司 | Driving content items from multiple different content sources |
CN104462096A (en) * | 2013-09-13 | 2015-03-25 | 北大方正集团有限公司 | Public opinion monitoring and analysis method and device |
CN104794193A (en) * | 2015-04-17 | 2015-07-22 | 南京大学 | Webpage increment capture method for valid link acquisition |
Non-Patent Citations (1)
Title |
---|
张华 et al.: "Bloom Filter Technology and Applications" (Bloom Filter 技术及应用), 《阜阳师范学院学报(自然科学版)》 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105247507B (en) | Method, system and storage medium for the influence power score for determining brand | |
Younis | Sentiment analysis and text mining for social media microblogs using open source tools: an empirical study | |
US10467664B2 (en) | Method for detecting spam reviews written on websites | |
JP5805188B2 (en) | Method and apparatus for sorting query results | |
CN109508383A (en) | The construction method and device of knowledge mapping | |
CN104536973B (en) | The method and browser client of picture recognition | |
CN109597936B (en) | New user screening system and method | |
EP2973038A1 (en) | Classifying resources using a deep network | |
CN104778208A (en) | Method and system for optimally grasping search engine SEO (search engine optimization) website data | |
Stone et al. | Extracting consumer preference from user-generated content sources using classification | |
CN105069077A (en) | Search method and device | |
US20130238375A1 (en) | Evaluating email information and aggregating evaluation results | |
CN104462396B (en) | Character string processing method and device | |
US11789946B2 (en) | Answer facts from structured content | |
CN104021125A (en) | Search engine sorting method and system and search engine | |
US9355166B2 (en) | Clustering signifiers in a semantics graph | |
KR20210063874A (en) | A method and an apparatus for analyzing marketing information based on knowledge graphs | |
CN105096023A (en) | System and method for pushing data relevant to working standard | |
CN104978406A (en) | User behavior analysis method of Internet platform | |
CN111126071B (en) | Method and device for determining questioning text data and method for processing customer service group data | |
CN111611484A (en) | Stock recommendation method and system based on article attribute identification | |
CN106933798B (en) | Information analysis method and device | |
Zhang et al. | CRUC: Cold-start recommendations using collaborative filtering in internet of things | |
US20130117316A1 (en) | Method and system for modeling data | |
KR20220074574A (en) | A method and an apparatus for analyzing real-time chat content of live stream |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20160608 |