CN102170640A - Mode library-based smart mobile phone terminal adverse content website identifying method - Google Patents
Mode library-based smart mobile phone terminal adverse content website identifying method
- Publication number
- CN102170640A
- Authority
- CN
- China
- Prior art keywords
- keyword
- storehouse
- content
- grade
- mobile phone
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to a pattern-library-based method for identifying websites with objectionable content on a smartphone terminal. A cloud server on the network provides a pattern library for smartphones to download. The method comprises the following steps: (1) keywords are extracted from existing samples of objectionable website content and graded according to their frequency of occurrence and degree of objectionability; the keyword pattern library is divided into several grades, each grade is assigned a unique objectionability score, and the higher the score, the more likely it is that content containing keywords from that grade is objectionable; (2) keywords are extracted from the content of the website to be accessed using a word segmentation algorithm; (3) each extracted keyword is matched against the keyword pattern library to determine the grade it belongs to; and (4) the objectionability scores of the keywords are summed, and when the sum exceeds a preset threshold, the website content is judged to be objectionable. The invention offers a high detection rate and a low false-detection rate.
Description
I. Technical field
The present invention relates to a method that uses a pattern library to identify objectionable website content on a smartphone terminal.
II. Background art
The rapid development of the mobile Internet has greatly boosted the smartphone industry. Like the PC, the mobile phone has become an important device for connecting to and accessing the Internet. According to recent figures, China has about 700 million mobile phone users, and more than 150 million people go online from a mobile platform. Along with this growth, obscene and pornographic mobile websites, countless mobile network scams and the like have also come into users' view. As security problems on smart terminals of all kinds become more serious, how to effectively control and protect a mobile phone's network access has become an increasingly important issue. Current countermeasures concentrate mainly on finding and shutting down vulgar websites. This mode of protection does not cover the whole transmission chain of pornographic information and is limited to administrative means; protection and control measures should exist at every link, especially on the mobile terminals that access the Internet. Moreover, because of the huge profits involved, pornographic mobile websites at home and abroad keep emerging. Relying only on shutting websites down inevitably brings delays and a certain rate of omission, and sometimes leaves large gaps in technical protection.
III. Summary of the invention
The object of the present invention is to provide a system scheme, applied on a smartphone terminal, that uses an updatable graded pattern library to analyze, judge and give feedback on website content; in particular, a method that uses the pattern library to identify objectionable website content that the smartphone may access. The method lets a smartphone automatically shield itself from the harmful influence of objectionable websites. In particular, whether a page's content is objectionable is decided from multiple objectionable keywords in the pattern library, so the decision criterion is more accurate and comprehensive.
The object of the invention is achieved through the following technical solution: a pattern-library-based method for identifying websites with objectionable content on a smartphone terminal, in which a cloud server on the network provides a pattern library for the smartphone terminal (client) to download. The pattern library (keyword pattern library) is built and used as follows:
(1) Keywords are extracted from existing samples of objectionable website content and graded according to their frequency of occurrence and degree of objectionability. The keyword pattern library is divided into several grades, and each grade is assigned a unique objectionability score; the higher the score, the more likely it is that content containing keywords from that grade is objectionable.
(2) For the content of the website to be accessed, a word segmentation algorithm is used to extract its keywords.
(3) Each extracted keyword is matched against the keyword pattern library to determine the grade it belongs to, which gives the objectionability score of that keyword. If no grade matches, the keyword's objectionability score is 0.
(4) The objectionability scores of the keywords are summed; when the sum exceeds a preset threshold, the page content can be judged objectionable.
(5) When the summed objectionability score does not reach the threshold, the keyword pattern library also provides a semantic-clue behavior discrimination approach. That is, an objectionable keyword sequence A, B, C, D is defined in the pattern library, where A, B, C and D are objectionable keywords whose summed score does not reach the threshold; when a website's content contains these four keywords in the order defined by the preset sequence, the page content is judged objectionable.
(6) The objectionable website content is uploaded to the cloud server; the cloud server runs a pattern library update and lets clients download the latest pattern library.
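Steps (1) to (4) can be sketched in Python as follows. This is only an illustration under stated assumptions: the grades, scores, placeholder keywords and function names are hypothetical, and the sketch assumes that every occurrence of an extracted keyword contributes to the sum, which the text does not specify.

```python
from typing import Dict, Iterable

# Hypothetical graded pattern library: grade -> score and keyword set.
# All grades, scores and keywords here are illustrative placeholders.
PATTERN_LIBRARY: Dict[int, dict] = {
    1: {"score": 1, "keywords": {"kw_mild_a", "kw_mild_b"}},
    2: {"score": 3, "keywords": {"kw_medium_a"}},
    3: {"score": 5, "keywords": {"kw_severe_a", "kw_severe_b"}},
}


def keyword_score(keyword: str, library: Dict[int, dict] = PATTERN_LIBRARY) -> int:
    """Return the score of the grade containing `keyword`, or 0 if no grade matches."""
    for grade in library.values():
        if keyword in grade["keywords"]:
            return grade["score"]
    return 0


def page_score(keywords: Iterable[str], library: Dict[int, dict] = PATTERN_LIBRARY) -> int:
    """Sum the scores of all extracted keywords (assumption: occurrences are counted)."""
    return sum(keyword_score(k, library) for k in keywords)


def is_objectionable(keywords: Iterable[str], threshold: int,
                     library: Dict[int, dict] = PATTERN_LIBRARY) -> bool:
    """Judge the page objectionable when the summed score exceeds the threshold."""
    return page_score(keywords, library) > threshold
```

For example, a page whose extracted keywords are `["kw_mild_a", "kw_severe_a", "unrelated"]` scores 1 + 5 + 0 = 6 in this sketch, so it would be flagged with a threshold of 5 but not with a threshold of 6, because the comparison is strict.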
The features of the present invention are as follows. The proposed scheme can identify objectionable websites on a smartphone terminal. It makes full use of pattern-library matching and network technology; in particular, it scores page content against a graded pattern library to obtain the page's objectionability grade, which avoids the high misjudgment rate of plain keyword matching and lets a smartphone automatically shield itself from the harmful influence of objectionable websites. It also uses semantic-clue behavior discrimination to make up for the shortcomings of the threshold decision method, reducing the rate of missed detections. Whether a page's content is objectionable is decided from multiple objectionable keywords in the pattern library, so the decision criterion is more accurate and comprehensive. The invention can serve as a technical means for the comprehensive management of networks.
IV. Description of the drawings
Fig. 1 is a block diagram of the application of the scheme of the present invention.
V. Embodiments
Fig. 1 shows a block diagram of how the decision algorithm of the present invention is applied in a smartphone-side objectionable-content identification system.
1. Generate the keyword pattern library for objectionable website content. Keywords are extracted from existing samples of objectionable website content and graded according to their frequency of occurrence and degree of objectionability. The keyword pattern library can be divided into several grades, and each grade is assigned a unique objectionability score; the higher the score, the more likely it is that content containing keywords from that grade is objectionable;
2. Use a low-level hooking technique to obtain the content of the website to be accessed, and use a word segmentation algorithm to extract its keywords;
3. Match each extracted keyword against the pattern library to determine the grade it belongs to, which gives the objectionability score of that keyword. If no grade matches, the keyword's objectionability score is 0;
4. Sum the objectionability scores of the keywords; when the sum exceeds a preset threshold, the page content can be judged objectionable. Multiple threshold levels can be provided for the client to choose from: the higher the threshold, the lower the misjudgment rate but the higher the rate of missed detections; the lower the threshold, the higher the misjudgment rate but the lower the rate of missed detections;
5. Upload the objectionable website content to the cloud storage server, so that the keyword pattern library can be improved and the preset threshold adjusted, reducing both the rate of missed detections and the misjudgment rate;
6. Besides the threshold-based decision method, the pattern library also provides a semantic-clue behavior discrimination approach. For some objectionable website content, the objectionability scores of the keywords do not reach the threshold, so the threshold method cannot be applied; semantic-clue behavior discrimination can be used in that case. That is, an objectionable keyword sequence is defined in the pattern library, for example (A, B, C, D), where A, B, C and D are objectionable keywords whose summed score does not reach the threshold; when a website's content contains these four keywords in the order defined by the preset sequence, the page content can be judged objectionable.
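The fallback in item 6 and the multi-level thresholds in item 4 can be sketched in Python as follows. The sketch makes assumptions the text does not settle: the sequence keywords only need to appear in the defined order, not adjacently, and the threshold levels and their names are hypothetical.

```python
from typing import Iterable, List, Sequence

# Hypothetical threshold levels that a client could choose from.
THRESHOLDS = {"strict": 4, "normal": 8, "lenient": 12}


def contains_in_order(keywords: Iterable[str], sequence: Sequence[str]) -> bool:
    """True if `sequence` occurs in `keywords` in the given order (gaps allowed)."""
    remaining = iter(keywords)
    # `target in remaining` consumes the iterator up to the first match,
    # so each later target is only searched for after the previous one.
    return all(target in remaining for target in sequence)


def judge(keywords: List[str], score: int, level: str,
          sequences: List[Sequence[str]]) -> bool:
    """Threshold decision first; ordered-sequence check only as a fallback."""
    if score > THRESHOLDS[level]:
        return True
    return any(contains_in_order(keywords, seq) for seq in sequences)
```

With the sequence `("A", "B", "C", "D")`, the keyword list `["x", "A", "y", "B", "C", "D"]` matches, while `["B", "A", "C", "D"]` does not, because B appears before A and never again after it.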
The scheme is further described below through an embodiment, with reference to the accompanying drawing:
1. As shown in Fig. 1, when the content of the page to be accessed (1) is obtained, the word segmentation module (2) is used to obtain the keywords (3) of the page.
2. As shown in Fig. 1, the graded pattern library (4) is used for graded matching to obtain the objectionability score (5) of the page's keywords.
3. As shown in Fig. 1, the objectionability score (5) is compared (6) with the preset threshold (7). If the comparison result (8) shows that the objectionability score (5) is larger, the page is judged to be objectionable content; otherwise, semantic-sequence behavior discrimination (9) is carried out.
4. As shown in Fig. 1, semantic-sequence behavior discrimination (9) further judges the page keywords (3) according to the sequence criteria in the pattern library and yields the final decision result (10).
5. As shown in Fig. 1, the decision result (10) is fed back (11) and uploaded to the cloud server (12).
6. As shown in Fig. 1, the cloud server (12) runs a pattern library update (13) based on the feedback and lets clients download the latest pattern library. That is, the cloud server collects feedback from the terminals and updates the pattern library: the objectionable-content keywords that have been identified are used to supplement the existing library, and the latest library is distributed to every smartphone terminal.
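The word segmentation module and the cloud update loop of Fig. 1 can be sketched in Python as follows. The patent names neither the segmentation algorithm nor the rule for choosing new keywords, so everything specific here is an assumption: forward maximum matching stands in for the segmentation module, a keyword is added to the library only after `min_reports` separate reports (a hypothetical heuristic), and all class and function names are illustrative.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Set


def segment(text: str, vocabulary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest vocabulary word,
    falling back to a single character when nothing matches."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


@dataclass
class PatternLibrary:
    version: int = 1
    scores: Dict[str, int] = field(default_factory=dict)  # keyword -> score


@dataclass
class CloudServer:
    library: PatternLibrary
    min_reports: int = 2        # hypothetical promotion rule
    new_keyword_score: int = 1  # hypothetical score for newly added keywords
    _seen: Counter = field(default_factory=Counter)

    def upload(self, page_keywords: List[str]) -> None:
        """Record feedback from a client that judged a page objectionable."""
        for kw in set(page_keywords):
            if kw not in self.library.scores:
                self._seen[kw] += 1

    def update_library(self) -> PatternLibrary:
        """Add keywords reported often enough and bump the library version."""
        promoted = [kw for kw, n in self._seen.items() if n >= self.min_reports]
        for kw in promoted:
            self.library.scores[kw] = self.new_keyword_score
            del self._seen[kw]
        if promoted:
            self.library.version += 1
        return self.library


def download_if_newer(local: PatternLibrary, server: CloudServer) -> PatternLibrary:
    """Client side: copy the server's library only when its version is newer."""
    remote = server.library
    if remote.version > local.version:
        return PatternLibrary(remote.version, dict(remote.scores))
    return local
```

Versioning the library lets a client skip the download when nothing has changed; a real deployment would also need a way to keep harmless words that merely co-occur on reported pages out of the library, which this sketch does not attempt.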
Claims (1)
1. A pattern-library-based method for identifying websites with objectionable content on a smartphone terminal, characterized in that a cloud server on the network provides a pattern library for the smartphone terminal (client) to download, and the pattern library (keyword pattern library) is built and used as follows: (1) keywords are extracted from existing samples of objectionable website content and graded according to their frequency of occurrence and degree of objectionability; the keyword pattern library is divided into several grades, and each grade is assigned a unique objectionability score, a higher score meaning that content containing keywords from that grade is more likely to be objectionable; (2) for the content of the website to be accessed, a word segmentation algorithm is used to extract its keywords; (3) each extracted keyword is matched against the keyword pattern library to determine the grade it belongs to, which gives the objectionability score of that keyword, and if no grade matches, the keyword's objectionability score is 0; (4) the objectionability scores of the keywords are summed, and when the sum exceeds a preset threshold, the page content can be judged objectionable; (5) when the summed objectionability score does not reach the threshold, the keyword pattern library also provides a semantic-clue behavior discrimination approach, namely, an objectionable keyword sequence A, B, C, D is defined in the pattern library, where A, B, C and D are objectionable keywords whose summed score does not reach the threshold, and when a website's content contains these four keywords in the order defined by the preset sequence, the page content is judged objectionable; (6) the objectionable website content is uploaded to the cloud server, and the cloud server runs a pattern library update and lets clients download the latest pattern library.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110146136XA CN102170640A (en) | 2011-06-01 | 2011-06-01 | Mode library-based smart mobile phone terminal adverse content website identifying method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110146136XA CN102170640A (en) | 2011-06-01 | 2011-06-01 | Mode library-based smart mobile phone terminal adverse content website identifying method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102170640A (en) | 2011-08-31 |
Family
ID=44491581
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110146136XA (Pending), CN102170640A (en) | 2011-06-01 | 2011-06-01 | Mode library-based smart mobile phone terminal adverse content website identifying method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102170640A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1761204A (en) * | 2005-11-18 | 2006-04-19 | 郑州金惠计算机系统工程有限公司 | System for blocking off erotic images and unhealthy information in internet |
CN101035128A (en) * | 2007-04-18 | 2007-09-12 | 大连理工大学 | Three-folded webpage text content recognition and filtering method based on the Chinese punctuation |
CN101692639A (en) * | 2009-09-15 | 2010-04-07 | 西安交通大学 | Bad webpage recognition method based on URL |
CN101996203A (en) * | 2009-08-13 | 2011-03-30 | 阿里巴巴集团控股有限公司 | Web information filtering method and system |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI456511B (en) * | 2012-01-06 | 2014-10-11 | ||
CN103208014A (en) * | 2012-01-13 | 2013-07-17 | 施亿民 | Image recognition system and operation method thereof |
CN103324615A (en) * | 2012-03-19 | 2013-09-25 | 哈尔滨安天科技股份有限公司 | Method and system for detecting phishing website based on SEO (search engine optimization) |
CN102663093B (en) * | 2012-04-10 | 2014-07-09 | 中国科学院计算机网络信息中心 | Method and device for detecting bad website |
CN102663093A (en) * | 2012-04-10 | 2012-09-12 | 中国科学院计算机网络信息中心 | Method and device for detecting bad website |
CN103167499A (en) * | 2012-09-07 | 2013-06-19 | 深圳市金立通信设备有限公司 | Entertainment safe limiting system and method of smartphone |
CN102902790A (en) * | 2012-09-29 | 2013-01-30 | 北京奇虎科技有限公司 | Web page classification system and method |
CN102902794A (en) * | 2012-09-29 | 2013-01-30 | 北京奇虎科技有限公司 | Web page classification system and method |
CN102902794B (en) * | 2012-09-29 | 2016-08-03 | 北京奇虎科技有限公司 | Web page classification system and method |
CN103841076A (en) * | 2012-11-20 | 2014-06-04 | 天讯天网(福建)网络科技有限公司 | Pornographic-webpage monitoring method |
CN103116647A (en) * | 2013-02-27 | 2013-05-22 | 武汉虹旭信息技术有限责任公司 | Data mining system and method based on mobile internet harmful information |
CN103279476A (en) * | 2013-04-11 | 2013-09-04 | 深圳市易聆科信息技术有限公司 | Detection method and system for WEB application system sensitive words |
CN103279476B (en) * | 2013-04-11 | 2016-12-28 | 深圳市易聆科信息技术股份有限公司 | The detection method of a kind of WEB application system sensitive word and system |
CN104217156A (en) * | 2013-06-03 | 2014-12-17 | 腾讯科技(深圳)有限公司 | Method and device for preventing plug-in of games |
CN104217156B (en) * | 2013-06-03 | 2018-04-20 | 腾讯科技(深圳)有限公司 | Prevent plug-in method and device of playing |
CN103475642A (en) * | 2013-08-22 | 2013-12-25 | 北京奇虎科技有限公司 | Malicious forum identification method and malicious forum identification device |
CN103473299B (en) * | 2013-09-06 | 2017-02-08 | 北京锐安科技有限公司 | Website bad likelihood obtaining method and device |
CN103473299A (en) * | 2013-09-06 | 2013-12-25 | 北京锐安科技有限公司 | Website bad likelihood obtaining method and device |
WO2015058631A1 (en) * | 2013-10-23 | 2015-04-30 | Tencent Technology (Shenzhen) Company Limited | Method, server and system for malicious url identification |
CN104933055A (en) * | 2014-03-18 | 2015-09-23 | 腾讯科技(深圳)有限公司 | Webpage identification method and webpage identification device |
CN104933055B (en) * | 2014-03-18 | 2020-01-31 | 腾讯科技(深圳)有限公司 | Webpage identification method and webpage identification device |
CN106815200A (en) * | 2015-11-30 | 2017-06-09 | 任子行网络技术股份有限公司 | Objectionable text detection method and device based on keyword |
US10176000B2 (en) | 2016-02-29 | 2019-01-08 | International Business Machines Corporation | Dynamic assistant for applications based on pattern analysis |
CN109076167A (en) * | 2016-06-17 | 2018-12-21 | 索尼公司 | Image processor, photographic device and image processing system |
US10262041B2 (en) | 2017-03-29 | 2019-04-16 | Accenture Global Solutions Limited | Scoring mechanism for discovery of extremist content |
CN107547555A (en) * | 2017-09-11 | 2018-01-05 | 北京匠数科技有限公司 | A kind of web portal security monitoring method and device |
CN112507086A (en) * | 2020-12-21 | 2021-03-16 | 中电福富信息科技有限公司 | Bad information monitoring method combining deep learning and keyword factors |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20110831 |