CN102693236A - Bad information filtering method based on content understanding - Google Patents
Bad information filtering method based on content understanding
- Publication number
- CN102693236A
- Authority
- CN
- China
- Prior art keywords
- content
- bad information
- information
- text
- filter method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Transfer Between Computers (AREA)
Abstract
The invention relates to a bad information filtering method based on content understanding. The method comprises the following steps: first, preprocessing the content of a network information source and extracting the explicit and implicit features that reflect the content or help to distinguish it, so that bad information content is effectively expressed through feature items; matching a bad information template against the content to be processed according to matching rules and methods; filtering the information source according to the matching result; and finally returning the processed result to the user of the Web page. The method can thus filter bad information out of network information accurately and effectively, based on the context of the text content and the various features of the image content, so as to provide the user with a clean network environment, and it has broad application prospects.
Description
Technical field
The present invention relates to an information filtering method, and in particular to a bad information filtering method based on content understanding.
Background art
With the development of Internet technology, content of widely varying quality has proliferated rapidly in recent years. Network information security problems have become increasingly prominent and have seriously damaged the social climate, so the demand from society and individuals for information filtering grows stronger by the day. However, the bad information filtering software and systems in use today suffer from missed detections (false negatives) and false alarms (false positives), and they filter slowly. The content-analysis-based method proposed by the present invention not only filters bad information accurately and effectively, providing the user with a clean network environment, but also filters quickly, and it has broad application prospects.
Summary of the invention
The object of the present invention is to solve the above problems in the prior art by providing a bad information filtering method based on content understanding.
The object of the present invention is achieved through the following technical scheme:
A bad information filtering method based on content understanding, comprising the following steps:
Step 1: preprocess the content of the network information source and extract from it the explicit and implicit features that reflect the content or help to distinguish it, so that bad information content is effectively expressed through feature items;
Step 2: match the bad information template against the content to be processed according to matching rules and methods;
Step 3: filter the information source according to the matching result;
Step 4: return the processed result to the user of the Web page.
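The four steps above can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the `Template` class, the whitespace tokenizer, the weighted-sum score, and the `[blocked: ...]` placeholder are all assumptions introduced here, since the source does not specify how features, templates, or matching rules are represented.

```python
from dataclasses import dataclass
from typing import Dict, List

Features = Dict[str, float]


@dataclass
class Template:
    """Hypothetical bad-information template: a label plus weighted feature items."""
    label: str
    weights: Features
    threshold: float


def extract_features(content: str) -> Features:
    """Step 1 (assumed form): lowercase, split on whitespace, count terms."""
    features: Features = {}
    for token in content.lower().split():
        token = token.strip(".,;:!?\"'()")
        if token:
            features[token] = features.get(token, 0.0) + 1.0
    return features


def match(features: Features, templates: List[Template]) -> List[str]:
    """Step 2: return labels of templates whose weighted score reaches the threshold."""
    hits = []
    for t in templates:
        score = sum(w * features.get(term, 0.0) for term, w in t.weights.items())
        if score >= t.threshold:
            hits.append(t.label)
    return hits


def filter_source(content: str, templates: List[Template]) -> str:
    """Steps 3 and 4: replace matching content with a notice, pass the rest through."""
    hits = match(extract_features(content), templates)
    if hits:
        return "[blocked: " + ", ".join(hits) + "]"
    return content
```

For example, with `Template("spam", {"free": 1.0, "prize": 2.0}, 3.0)`, a page containing both "free" and "prize" would be blocked, while a page that only mentions "free" would be returned unchanged.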
In the above bad information filtering method based on content understanding, the network information source comprises text content and image content.
Further, in the above method, text content is filtered according to its context and text elements: bad information is found by analyzing and understanding the semantics of the text.
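As a rough illustration of why context matters for text filtering, the sketch below scores a sensitive term higher or lower depending on the words around it. The window size, the 0.5 adjustments, and the three word sets are hypothetical choices made for this example; the source does not describe a concrete semantic analysis procedure, and a real system would likely use a far richer language model.

```python
import re
from typing import List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def context_score(
    text: str,
    sensitive: Set[str],
    aggravating: Set[str],
    mitigating: Set[str],
    window: int = 3,
) -> float:
    """Score each sensitive term, adjusted by the words within `window` tokens of it."""
    tokens = tokenize(text)
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in sensitive:
            continue
        context = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
        hit = 1.0
        hit += sum(0.5 for c in context if c in aggravating)  # context makes it worse
        hit -= sum(0.5 for c in context if c in mitigating)   # context makes it benign
        score += max(hit, 0.0)
    return score
```

A plain keyword filter would treat "shoot the photo" and "shoot with a gun" identically; here the first scores lower than the second because of the neighboring words.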
Further, in the above method, image content is filtered by similarity matching between images, using as indices the color, texture, shape, and contour of the image, the spatial relationships among these features, and the image semantics.
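A minimal sketch of similarity matching on one of these indices, color, is shown below. It compares normalized color histograms by histogram intersection, which is a common technique swapped in here for illustration; the source does not name a specific similarity measure. Texture, shape, contour, spatial relationships, and semantics are not covered, and the bin count and the 0.8 threshold are assumptions.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]          # (r, g, b), each 0..255
Histogram = Dict[Tuple[int, int, int], float]


def color_histogram(pixels: List[Pixel], bins: int = 4) -> Histogram:
    """Quantize each channel into `bins` levels and return normalized bin frequencies."""
    step = 256 // bins
    hist: Histogram = {}
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        hist[key] = hist.get(key, 0.0) + 1.0
    total = float(len(pixels)) or 1.0  # avoid division by zero for empty images
    return {k: v / total for k, v in hist.items()}


def histogram_intersection(h1: Histogram, h2: Histogram) -> float:
    """Similarity in [0, 1]: the overlap of two normalized histograms."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


def is_similar(pixels: List[Pixel], template_pixels: List[Pixel],
               threshold: float = 0.8) -> bool:
    """Treat an image as matching a template when the color overlap reaches the threshold."""
    similarity = histogram_intersection(
        color_histogram(pixels), color_histogram(template_pixels)
    )
    return similarity >= threshold
```

Color alone is a weak signal, which is presumably why the method combines it with the other indices before deciding to filter an image.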
Further, in the above method, the bad information of Step 2 comprises obscene and pornographic content, reactionary and violent content, and junk information.
Still further, in the above method, the preprocessing removes irrelevant information from the network information source, retains the useful information, separates and quantifies the features that describe it, and then extracts the explicit and implicit information that reflects the content or helps to distinguish its characteristics, so that bad information can be effectively expressed through feature items.
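One way such preprocessing could look for a Web page is sketched below, using Python's standard `html.parser` module. The choice of what counts as "irrelevant" (script and style content), of term counts as explicit features, and of link density and image count as implicit features are all assumptions made for illustration; the source does not enumerate the features it extracts.

```python
from html.parser import HTMLParser
from typing import Dict, List


class _TextExtractor(HTMLParser):
    """Collects visible text and counts links and images, skipping script/style."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self.links = 0
        self.images = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            self.links += 1
        elif tag == "img":
            self.images += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def preprocess(html: str) -> Dict[str, Dict[str, float]]:
    """Strip markup, then quantify explicit (term counts) and implicit (page-level) features."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.chunks).lower().split()
    explicit: Dict[str, float] = {}
    for w in words:
        w = w.strip(".,;:!?\"'()")
        if w:
            explicit[w] = explicit.get(w, 0.0) + 1.0
    n = float(len(words)) or 1.0  # avoid division by zero on empty pages
    implicit = {"link_density": parser.links / n, "image_count": float(parser.images)}
    return {"explicit": explicit, "implicit": implicit}
```

The resulting dictionaries are the kind of quantified feature items that a later matching step could compare against a template.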
The advantages of the technical scheme of the present invention are mainly as follows: bad information in network information can be filtered accurately and effectively according to the context of the text content and the various features of the image content, providing the user with a clean network environment, and the application prospects are broad.
The objects, advantages, and features of the present invention are explained through the following non-limiting description of a preferred embodiment. The embodiment is only a typical example of applying the technical scheme of the present invention; all technical schemes formed by equivalent replacement or equivalent transformation fall within the scope of protection claimed by the present invention.
Embodiment
A bad information filtering method based on content understanding is characterized by comprising the following steps. First, the content of the network information source is preprocessed, and the explicit and implicit features that reflect the content or help to distinguish it are extracted, so that bad information content is effectively expressed through feature items. Specifically, the network information source comprises text content and image content.
Next, the bad information template is matched against the content to be processed according to matching rules and methods. Specifically, the bad information comprises obscene and pornographic content, reactionary and violent content, and junk information.
Then, the information source is filtered according to the matching result. Finally, the processed result is returned to the user of the Web page.
In the actual implementation of the present invention, text content is filtered according to its context and text elements: bad information is found by analyzing and understanding the semantics of the text. Image content, meanwhile, is filtered by similarity matching between images, using as indices the color, texture, shape, and contour of the image, the spatial relationships among these features, and the image semantics. In addition, to achieve a better filtering effect, the preprocessing removes irrelevant information from the network information source, retains the useful information, separates and quantifies the features that describe it, and then extracts the explicit and implicit information that reflects the content or helps to distinguish its characteristics, so that bad information can be effectively expressed through feature items.
As the above description shows, with the present invention, bad information in network information can be filtered accurately and effectively according to the context of the text content and the various features of the image content, providing the user with a clean network environment, and the application prospects are broad.
Claims (6)
1. A bad information filtering method based on content understanding, characterized by comprising the following steps:
Step 1: preprocess the content of the network information source and extract from it the explicit and implicit features that reflect the content or help to distinguish it, so that bad information content is effectively expressed through feature items;
Step 2: match the bad information template against the content to be processed according to matching rules and methods;
Step 3: filter the information source according to the matching result;
Step 4: return the processed result to the user of the Web page.
2. The bad information filtering method based on content understanding according to claim 1, characterized in that the network information source comprises text content and image content.
3. The bad information filtering method based on content understanding according to claim 2, characterized in that text content is filtered according to its context and text elements, bad information being found by analyzing and understanding the semantics of the text.
4. The bad information filtering method based on content understanding according to claim 2, characterized in that image content is filtered by similarity matching between images, using as indices the color, texture, shape, and contour of the image, the spatial relationships among these features, and the image semantics.
5. The bad information filtering method based on content understanding according to claim 1, characterized in that the bad information of Step 2 comprises obscene and pornographic content, reactionary and violent content, and junk information.
6. The bad information filtering method based on content understanding according to claim 1, characterized in that the preprocessing removes irrelevant information from the network information source, retains the useful information, separates and quantifies the features that describe it, and then extracts the explicit and implicit information that reflects the content or helps to distinguish its characteristics, so that bad information can be effectively expressed through feature items.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011100712318A CN102693236A (en) | 2011-03-24 | 2011-03-24 | Bad information filtering method based on content understanding |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011100712318A CN102693236A (en) | 2011-03-24 | 2011-03-24 | Bad information filtering method based on content understanding |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102693236A (en) | 2012-09-26 |
Family
ID=46858693
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011100712318A Pending CN102693236A (en) | 2011-03-24 | 2011-03-24 | Bad information filtering method based on content understanding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102693236A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102609516A (en) * | 2012-02-08 | 2012-07-25 | 苏州中联互通信息科技有限公司 | Content understanding-based bad information filter method |
CN103473299A (en) * | 2013-09-06 | 2013-12-25 | 北京锐安科技有限公司 | Website bad likelihood obtaining method and device |
WO2015058631A1 (en) * | 2013-10-23 | 2015-04-30 | Tencent Technology (Shenzhen) Company Limited | Method, server and system for malicious url identification |
CN105740752A (en) * | 2014-12-11 | 2016-07-06 | 世纪龙信息网络有限责任公司 | Method and system for sensitive image filtering |
WO2018000273A1 (en) * | 2016-06-29 | 2018-01-04 | 深圳狗尾草智能科技有限公司 | Device and method for detecting unacceptable corpus data content |
CN107547555A (en) * | 2017-09-11 | 2018-01-05 | 北京匠数科技有限公司 | A kind of web portal security monitoring method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101055621A (en) * | 2006-04-10 | 2007-10-17 | 中国科学院自动化研究所 | Content based sensitive web page identification method |
CN101359329A (en) * | 2008-04-01 | 2009-02-04 | 北京恒金恒泰信息技术有限公司 | Plugin for filtrating erotic software based on browser |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102609516A (en) * | 2012-02-08 | 2012-07-25 | 苏州中联互通信息科技有限公司 | Content understanding-based bad information filter method |
CN103473299A (en) * | 2013-09-06 | 2013-12-25 | 北京锐安科技有限公司 | Website bad likelihood obtaining method and device |
CN103473299B (en) * | 2013-09-06 | 2017-02-08 | 北京锐安科技有限公司 | Website bad likelihood obtaining method and device |
WO2015058631A1 (en) * | 2013-10-23 | 2015-04-30 | Tencent Technology (Shenzhen) Company Limited | Method, server and system for malicious url identification |
CN105740752A (en) * | 2014-12-11 | 2016-07-06 | 世纪龙信息网络有限责任公司 | Method and system for sensitive image filtering |
CN105740752B (en) * | 2014-12-11 | 2021-05-11 | 世纪龙信息网络有限责任公司 | Sensitive picture filtering method and system |
WO2018000273A1 (en) * | 2016-06-29 | 2018-01-04 | 深圳狗尾草智能科技有限公司 | Device and method for detecting unacceptable corpus data content |
CN107547555A (en) * | 2017-09-11 | 2018-01-05 | 北京匠数科技有限公司 | A kind of web portal security monitoring method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106202211B (en) | Integrated microblog rumor identification method based on microblog types | |
CN102279894B (en) | Method for searching, integrating and providing comment information based on semantics and searching system | |
CN102693236A (en) | Bad information filtering method based on content understanding | |
CN103631948B (en) | Identifying method of named entities | |
Jiang et al. | Spotting suspicious behaviors in multimodal data: A general metric and algorithms | |
CN109684513B (en) | Low-quality video identification method and device | |
CN107391598B (en) | Automatic threat information generation method and system | |
CN104504150A (en) | News public opinion monitoring system | |
CN102542061B (en) | Intelligent product classification method | |
CN103020159A (en) | Method and device for news presentation facing events | |
CN103744877A (en) | Public opinion monitoring application system deployed in internet and application method | |
CN103324622A (en) | Method and device for automatic generating of front page abstract | |
CN103473340A (en) | Classifying method for internet multimedia contents based on video image | |
CN103729178A (en) | Method and system for processing multiple tabs of browsers | |
WO2021114634A1 (en) | Text annotation method, device, and storage medium | |
CN105117434A (en) | Webpage classification method and webpage classification system | |
JP2009506394A5 (en) | ||
CN106844588A (en) | A kind of analysis method and system of the user behavior data based on web crawlers | |
CN102609516A (en) | Content understanding-based bad information filter method | |
Jin et al. | Filtering spam in Weibo using ensemble imbalanced classification and knowledge expansion | |
US8266140B2 (en) | Tagging system using internet search engine | |
CN103092838B (en) | A kind of method and device for obtaining English words | |
CN104331396A (en) | Intelligent advertisement identifying method | |
US20140379806A1 (en) | Data matching method and device | |
CN101562603A (en) | Method and system for parsing telnet protocol by echoing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20120926 |