CN102693236A - Bad information filtering method based on content understanding - Google Patents

Bad information filtering method based on content understanding

Publication number
CN102693236A
Authority
CN
China
Prior art keywords
content
bad information
information
text
filtering method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011100712318A
Other languages
Chinese (zh)
Inventor
宦奕奕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SUZHOU STYLE INFORMATION TECHNOLOGY CO LTD
Original Assignee
SUZHOU STYLE INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2011-03-24
Filing date
2011-03-24
Publication date
2012-09-26
Application filed by SUZHOU STYLE INFORMATION TECHNOLOGY CO LTD
Priority to CN2011100712318A
Publication of CN102693236A
Legal status: Pending

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a bad information filtering method based on content understanding. The method comprises the following steps: first, preprocessing the content of a network information source and extracting the explicit and implicit features that reflect the content or help to distinguish it, so that bad information content is effectively expressed through feature items; next, matching a bad information template against the content to be processed according to matching rules and methods; then filtering the information source according to the matching result; and finally returning the processed result to the user of the Web page. The method can therefore filter bad information out of network information accurately and effectively, based on the context of text content and the various features of image content, providing the user with a clean network environment; its application prospects are very broad.

Description

Bad information filtering method based on content understanding
Technical field
The present invention relates to an information filtering method, and in particular to a bad information filtering method based on content understanding.
Background
With the development of Internet technology, information content of widely varying quality has expanded sharply in recent years, and network information security problems have become increasingly prominent, seriously harming the social climate. As a result, the demand from society and individuals for information filtering grows stronger by the day. However, the bad information filtering software and systems currently in use suffer from missed detections and false positives, and their filtering speed is slow. The content-analysis-based method proposed by the present invention can not only filter bad information accurately and effectively, providing the user with a clean network environment, but also filters quickly, and its application prospects are very broad.
Summary of the invention
The object of the present invention is to solve the above problems in the prior art by providing a bad information filtering method based on content understanding.
The object of the present invention is achieved through the following technical solution:
The bad information filtering method based on content understanding comprises the following steps:
Step 1: preprocess the content of a network information source and extract from it the explicit and implicit features that reflect the content or help to identify it, so that bad information content is effectively expressed through feature items;
Step 2: match a bad information template against the bad information content to be processed, according to matching rules and methods;
Step 3: apply the corresponding filtering to the information source according to the matching result;
Step 4: return the processed result to the user of the Web page.
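The four steps can be pictured with a minimal Python sketch. The patent does not specify a feature representation, a template format, or a matching rule, so everything here is an assumption made for illustration: the `Template` class, the weighted word-count score, the threshold, and the sample `SPAM` template are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Template:
    """Hypothetical bad-information template: weighted feature terms plus a threshold."""
    label: str
    weights: Dict[str, float]
    threshold: float


def extract_features(text: str) -> Dict[str, int]:
    """Step 1 (assumed form): lowercase the text and count word tokens."""
    counts: Dict[str, int] = {}
    for token in text.lower().split():
        token = token.strip(".,!?;:\"'()")
        if token:
            counts[token] = counts.get(token, 0) + 1
    return counts


def match(features: Dict[str, int], template: Template) -> float:
    """Step 2 (assumed rule): weighted count of template terms found in the features."""
    return sum(w * features.get(term, 0) for term, w in template.weights.items())


def filter_page(text: str, templates: List[Template]) -> dict:
    """Steps 3 and 4: block the page if any template matches, then return the result."""
    features = extract_features(text)
    for template in templates:
        if match(features, template) >= template.threshold:
            return {"allowed": False, "label": template.label, "content": ""}
    return {"allowed": True, "label": None, "content": text}


# Illustrative template for junk information; the terms and weights are made up.
SPAM = Template("junk", {"free": 1.0, "winner": 2.0, "prize": 2.0}, threshold=3.0)
```

Calling `filter_page("You are a WINNER! Claim your free prize now.", [SPAM])` blocks the page under this template, while an ordinary sentence passes through unchanged. A real system would replace the word-count score with whatever matching rule the templates actually call for.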
In the above bad information filtering method based on content understanding, the network information source comprises text content information and image content information.
Further, in the above bad information filtering method based on content understanding, text information is filtered according to the context and textual elements of the text content: bad information is found by analyzing and understanding the semantics of the text content.
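To illustrate why context matters for text filtering, the sketch below flags a sensitive term only when none of the nearby words suggest a harmless sense. This is a deliberately simple stand-in for the semantic analysis the patent describes, which it does not detail; the `SENSITIVE` lexicon, the window size, and the function names are hypothetical.

```python
import re

# Hypothetical lexicon: each sensitive term maps to context words that, when
# found nearby, indicate a harmless sense of the term.
SENSITIVE = {
    "shoot": {"photo", "camera", "film", "basketball"},
    "kill": {"process", "task", "server"},
}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def flag_terms(text, window=3):
    """Return sensitive terms whose surrounding words do not suggest a benign sense."""
    tokens = tokenize(text)
    flagged = []
    for i, token in enumerate(tokens):
        benign = SENSITIVE.get(token)
        if benign is None:
            continue
        # Look at up to `window` tokens on each side of the term.
        context = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
        if not benign.intersection(context):
            flagged.append(token)
    return flagged
```

With this lexicon, "We will shoot the film on Monday" is not flagged because "film" appears in the window, whereas a plain keyword filter would have blocked it. That difference between keyword matching and context-aware matching is the point the text makes about false positives.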
Further, in the above bad information filtering method based on content understanding, image content is filtered by using the color, texture, shape and contour of the image, the spatial relationships among these features, and image semantics as indexes, and by matching on the degree of similarity between images.
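As one possible reading of similarity matching on image features, the sketch below compares images by color alone, using a quantized color histogram and histogram intersection. The patent names color, texture, shape, contour, and spatial relationships but does not say how any of them is computed, so the histogram, the bin count, and the 0.8 threshold are assumptions for illustration only.

```python
def color_histogram(pixels, bins=4):
    """Quantize (r, g, b) pixels with 0-255 channels into a normalized histogram."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        index = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[index] += 1.0
    total = len(pixels)
    return [count / total for count in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def matches_template(pixels, template_hist, threshold=0.8):
    """Treat an image as matching when its color similarity reaches the threshold."""
    return histogram_intersection(color_histogram(pixels), template_hist) >= threshold
```

Color by itself is a weak signal, which is presumably why the text lists several features together; a fuller sketch would combine scores from texture and shape descriptors before applying the threshold.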
Further, in the above bad information filtering method based on content understanding, the bad information of step 2 comprises obscene and pornographic content, reactionary and violent content, and junk information.
Still further, in the above bad information filtering method based on content understanding, the preprocessing removes irrelevant information from the network information source, retains the useful information, separates and quantifies the features that describe it, and then extracts the explicit and implicit information that reflects or helps to identify the nature of the content, so that bad information can be effectively expressed through feature items.
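The preprocessing described above (drop irrelevant material, keep the useful text, quantify it as features) might look like the following sketch for an HTML page. It uses Python's standard `html.parser` module; treating markup, scripts, styles, and a small stopword list as the "irrelevant information", and using term counts as the quantified features, are assumptions rather than anything the patent states.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


# Assumed, illustrative stopword list; a real one would be much larger.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}


def preprocess(html):
    """Strip markup and stopwords, then quantify the remaining text as term counts."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.parts).lower())
    return Counter(w for w in words if w not in STOPWORDS)
```

The resulting `Counter` covers only the explicit features. Implicit features such as page structure, link targets, or image metadata would need separate extractors that the text does not describe.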
The advantages of the technical solution of the present invention are mainly as follows: bad information in network information can be shielded accurately and effectively according to the context of text content and the various features of image information, providing the user with a clean network environment; its application prospects are very broad.
The objects, advantages and features of the present invention are illustrated by the following non-limiting description of preferred embodiments. These embodiments are only typical examples of applying the technical solution of the present invention; all technical solutions formed by equivalent replacement or equivalent transformation fall within the scope of protection claimed by the present invention.
Embodiment
The bad information filtering method based on content understanding is distinctive in that it comprises the following steps. First, the content of a network information source is preprocessed, and the explicit and implicit features that reflect the content or help to identify it are extracted, so that bad information content is effectively expressed through feature items. Specifically, the network information source comprises text content information and image content information.
Next, a bad information template is matched against the bad information content to be processed, according to matching rules and methods. Specifically, the bad information comprises obscene and pornographic content, reactionary and violent content, and junk information.
Then, the corresponding filtering is applied to the information source according to the matching result. Finally, the processed result is returned to the user of the Web page.
In the actual implementation of the present invention, text information is filtered according to the context and textual elements of the text content: bad information is found by analyzing and understanding the semantics of the text content. Image content, in turn, is filtered by using the color, texture, shape and contour of the image, the spatial relationships among these features, and image semantics as indexes, and by matching on the degree of similarity between images. In addition, to achieve a better filtering effect, the preprocessing removes irrelevant information from the network information source, retains the useful information, separates and quantifies the features that describe it, and then extracts the explicit and implicit information that reflects or helps to identify the nature of the content, so that bad information can be effectively expressed through feature items.
As the above description shows, the present invention can shield bad information in network information accurately and effectively according to the context of text content and the various features of image information, providing the user with a clean network environment; its application prospects are very broad.

Claims (6)

1. A bad information filtering method based on content understanding, characterized by comprising the following steps:
Step 1: preprocessing the content of a network information source and extracting from it the explicit and implicit features that reflect the content or help to identify it, so that bad information content is effectively expressed through feature items;
Step 2: matching a bad information template against the bad information content to be processed, according to matching rules and methods;
Step 3: applying the corresponding filtering to the information source according to the matching result;
Step 4: returning the processed result to the user of the Web page.
2. The bad information filtering method based on content understanding according to claim 1, characterized in that the network information source comprises text content information and image content information.
3. The bad information filtering method based on content understanding according to claim 2, characterized in that text information is filtered according to the context and textual elements of the text content, bad information being found by analyzing and understanding the semantics of the text content.
4. The bad information filtering method based on content understanding according to claim 2, characterized in that image content is filtered by using the color, texture, shape and contour of the image, the spatial relationships among these features, and image semantics as indexes, and by matching on the degree of similarity between images.
5. The bad information filtering method based on content understanding according to claim 1, characterized in that the bad information of step 2 comprises obscene and pornographic content, reactionary and violent content, and junk information.
6. The bad information filtering method based on content understanding according to claim 1, characterized in that the preprocessing removes irrelevant information from the network information source, retains the useful information, separates and quantifies the features that describe it, and then extracts the explicit and implicit information that reflects or helps to identify the nature of the content, so that bad information can be effectively expressed through feature items.
CN2011100712318A 2011-03-24 2011-03-24 Bad information filtering method based on content understanding Pending CN102693236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100712318A CN102693236A (en) 2011-03-24 2011-03-24 Bad information filtering method based on content understanding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011100712318A CN102693236A (en) 2011-03-24 2011-03-24 Bad information filtering method based on content understanding

Publications (1)

Publication Number Publication Date
CN102693236A true CN102693236A (en) 2012-09-26

Family

ID=46858693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100712318A Pending CN102693236A (en) 2011-03-24 2011-03-24 Bad information filtering method based on content understanding

Country Status (1)

Country Link
CN (1) CN102693236A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101055621A (en) * 2006-04-10 2007-10-17 中国科学院自动化研究所 Content based sensitive web page identification method
CN101359329A (en) * 2008-04-01 2009-02-04 北京恒金恒泰信息技术有限公司 Plugin for filtrating erotic software based on browser

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609516A (en) * 2012-02-08 2012-07-25 苏州中联互通信息科技有限公司 Content understanding-based bad information filter method
CN103473299A (en) * 2013-09-06 2013-12-25 北京锐安科技有限公司 Website bad likelihood obtaining method and device
CN103473299B (en) * 2013-09-06 2017-02-08 北京锐安科技有限公司 Website bad likelihood obtaining method and device
WO2015058631A1 (en) * 2013-10-23 2015-04-30 Tencent Technology (Shenzhen) Company Limited Method, server and system for malicious url identification
CN105740752A (en) * 2014-12-11 2016-07-06 世纪龙信息网络有限责任公司 Method and system for sensitive image filtering
CN105740752B (en) * 2014-12-11 2021-05-11 世纪龙信息网络有限责任公司 Sensitive picture filtering method and system
WO2018000273A1 (en) * 2016-06-29 2018-01-04 深圳狗尾草智能科技有限公司 Device and method for detecting unacceptable corpus data content
CN107547555A (en) * 2017-09-11 2018-01-05 北京匠数科技有限公司 A kind of web portal security monitoring method and device

Similar Documents

Publication Publication Date Title
CN106202211B (en) Integrated microblog rumor identification method based on microblog types
CN102279894B (en) Method for searching, integrating and providing comment information based on semantics and searching system
CN102693236A (en) Bad information filtering method based on content understanding
CN103631948B (en) Identifying method of named entities
Jiang et al. Spotting suspicious behaviors in multimodal data: A general metric and algorithms
CN109684513B (en) Low-quality video identification method and device
CN107391598B (en) Automatic threat information generation method and system
CN104504150A (en) News public opinion monitoring system
CN102542061B (en) Intelligent product classification method
CN103020159A (en) Method and device for news presentation facing events
CN103744877A (en) Public opinion monitoring application system deployed in internet and application method
CN103324622A (en) Method and device for automatic generating of front page abstract
CN103473340A (en) Classifying method for internet multimedia contents based on video image
CN103729178A (en) Method and system for processing multiple tabs of browsers
WO2021114634A1 (en) Text annotation method, device, and storage medium
CN105117434A (en) Webpage classification method and webpage classification system
JP2009506394A5 (en)
CN106844588A (en) A kind of analysis method and system of the user behavior data based on web crawlers
CN102609516A (en) Content understanding-based bad information filter method
Jin et al. Filtering spam in Weibo using ensemble imbalanced classification and knowledge expansion
US8266140B2 (en) Tagging system using internet search engine
CN103092838B (en) A kind of method and device for obtaining English words
CN104331396A (en) Intelligent advertisement identifying method
US20140379806A1 (en) Data matching method and device
CN101562603A (en) Method and system for parsing telnet protocol by echoing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120926