CN102170640A - Mode library-based smart mobile phone terminal adverse content website identifying method - Google Patents

Info

Publication number
CN102170640A
Authority
CN
China
Prior art keywords
keyword
storehouse
content
grade
mobile phone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201110146136XA
Other languages
Chinese (zh)
Inventor
肖波
孙浩量
刘建树
肖顺华
李骥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NANTONG HAIYUN INFORMATION TECHNOLOGY SERVICE Co Ltd
Original Assignee
NANTONG HAIYUN INFORMATION TECHNOLOGY SERVICE Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NANTONG HAIYUN INFORMATION TECHNOLOGY SERVICE Co Ltd filed Critical NANTONG HAIYUN INFORMATION TECHNOLOGY SERVICE Co Ltd
Priority to CN201110146136XA priority Critical patent/CN102170640A/en
Publication of CN102170640A publication Critical patent/CN102170640A/en
Pending legal-status Critical Current

Abstract

The invention relates to a pattern-library-based method for identifying adverse-content websites on a smartphone terminal. A cloud server on the network provides a pattern library for smartphones to download. The method comprises the following steps: (1) extracting keywords from existing adverse website content samples and grading them according to their frequency of occurrence and degree of adversity; the keyword pattern library is divided into several grades, each grade is assigned a unique adverse grade score, and the higher the score, the more likely it is that content containing keywords from that grade is adverse information; (2) extracting keywords from the content of the website to be accessed using a word segmentation algorithm; (3) matching the extracted keywords against the keyword pattern library to determine the grade of the library each keyword belongs to; and (4) summing the adverse grade scores of the keywords and, when the sum is larger than a preset threshold, judging the website content to be adverse information. The invention has the advantages of a high detection rate and a low false detection rate.

Description

Pattern-library-based method for identifying adverse-content websites on a smartphone terminal
I. Technical field
The present invention relates to a method that uses a pattern library to identify adverse website content on a smartphone terminal.
II. Background
The rapid development of the mobile Internet has greatly advanced the smartphone industry. Like the PC, the mobile phone has become an important device for connecting to and accessing the Internet. According to recent figures, China has about 700 million mobile phone users, and more than 150 million people go online from mobile phones. Alongside this growth, mobile pornographic websites and countless forms of mobile network fraud have come into users' view. As security problems on smart terminals grow more serious, how to effectively control and protect mobile phone network access has become an increasingly important topic. Current countermeasures focus mainly on inspecting and shutting down vulgar websites. This mode of protection does not cover the whole transmission chain of pornographic information and is limited to administrative means, whereas protection and control measures should exist at every link, especially on mobile terminals that access the Internet. In addition, because of the huge profits involved, mobile pornographic websites at home and abroad keep emerging; relying on shutting down websites alone inevitably involves a certain delay and miss rate and leaves large technical gaps in prevention.
III. Summary of the invention
The object of the present invention is to provide a system scheme, applied on the smartphone terminal, that uses an updatable graded pattern library to analyze, judge, and give feedback on website content; in particular, a method that uses the pattern library to identify adverse website content that the smartphone may access. The method enables the smartphone to shield itself automatically from the harmful effects of adverse-information websites. In particular, whether a page's content is adverse information is decided from multiple adverse keywords in the pattern library, so the decision criterion is more accurate and comprehensive.
The object of the invention is achieved by the following technical solution. In the pattern-library-based method for identifying adverse-content websites on a smartphone terminal, a cloud server on the network provides the pattern library for the smartphone (client) to download. The pattern library (keyword pattern library) is built and used as follows:
(1) keywords are extracted from existing adverse website content samples and graded according to their frequency of occurrence and degree of adversity; the keyword pattern library is divided into several grades, each grade is assigned a unique adverse grade score, and the higher the score, the more likely it is that content containing keywords from that grade is adverse information;
(2) for the content of the website to be accessed, keywords are extracted using a word segmentation algorithm;
(3) each extracted keyword is matched against the keyword pattern library to determine the grade of the library it belongs to, which gives the adverse grade score of that keyword; if no grade matches, the adverse grade score of the keyword is 0;
(4) the adverse grade scores of the keywords are summed, and when the sum is greater than a preset threshold, the web page content can be judged to be adverse information;
(5) when the sum of the adverse grade scores does not reach the threshold, the keyword pattern library also provides semantic sequence behavior discrimination: an adverse keyword sequence A, B, C, D is defined in the pattern library, where A, B, C, and D are adverse keywords whose summed adverse grade scores do not reach the threshold; when a website's content contains these four keywords in the order defined by the preset sequence, the page content is judged to be adverse information;
(6) the adverse website content is uploaded to the cloud server; the cloud server runs a pattern library update so that clients can download the latest pattern library.
The invention is characterized as follows. The proposed scheme can identify adverse-information websites on a smartphone terminal. It makes full use of pattern library matching and network technology; in particular, it scores web page content with a graded pattern library to obtain its adverse grade, which avoids the high misjudgment rate of ordinary keyword matching and enables the smartphone to shield itself automatically from the harmful effects of adverse-information websites. It also uses semantic sequence behavior discrimination to make up for the shortcomings of the threshold decision method, which reduces the misdetection rate. Because whether a page's content is adverse information is decided from multiple adverse keywords in the pattern library, the decision criterion is more accurate and comprehensive. The invention can serve as a technical means for the comprehensive management of networks.
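Steps (1) to (4) can be illustrated with a minimal Python sketch. It is not the patented implementation: the grade scores, the placeholder keywords, and the function names are assumptions made for illustration, and the sketch assumes that every occurrence of a keyword contributes its score to the sum, which the text does not specify.

```python
from typing import Dict, Iterable, Set

# Hypothetical graded pattern library: adverse grade score -> keywords of that grade.
# The scores and the placeholder keywords are illustrative assumptions only.
PATTERN_LIBRARY: Dict[int, Set[str]] = {
    1: {"kw_low"},
    3: {"kw_mid"},
    5: {"kw_high"},
}


def build_score_index(library: Dict[int, Set[str]]) -> Dict[str, int]:
    """Flatten the graded library into a keyword -> adverse grade score lookup."""
    index: Dict[str, int] = {}
    for score, keywords in library.items():
        for keyword in keywords:
            index[keyword] = score
    return index


def keyword_score(keyword: str, index: Dict[str, int]) -> int:
    """Step (3): a keyword that matches no grade scores 0."""
    return index.get(keyword, 0)


def total_score(keywords: Iterable[str], index: Dict[str, int]) -> int:
    """Step (4): sum the adverse grade scores of the extracted keywords."""
    return sum(keyword_score(k, index) for k in keywords)


def is_adverse(keywords: Iterable[str], index: Dict[str, int], threshold: int) -> bool:
    """Step (4): the page is judged adverse only when the sum is greater than the threshold."""
    return total_score(keywords, index) > threshold
```

The keyword list passed to `is_adverse` stands in for the output of the word segmentation algorithm in step (2), which the sketch does not implement.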
IV. Description of the drawings
Fig. 1 is an application block diagram of the scheme of the present invention.
V. Embodiment
Fig. 1 shows the application block diagram of the decision algorithm of the present invention in a smartphone-terminal adverse-content identification system.
1. Generate the adverse website content keyword pattern library. Extract keywords from existing adverse website content samples and grade them according to their frequency of occurrence and degree of adversity. The keyword pattern library may be divided into several grades; each grade is assigned a unique adverse grade score, and the higher the score, the more likely it is that content containing keywords from that grade is adverse information;
2. Use a low-level hook technique to obtain the website content to be accessed, and extract keywords from it with a word segmentation algorithm;
3. Match each extracted keyword against the pattern library to determine the grade of the library it belongs to, which gives the adverse grade score of that keyword; if no grade matches, the adverse grade score of the keyword is 0;
4. Sum the adverse grade scores of the keywords; when the sum is greater than a preset threshold, the web page content can be judged to be adverse information. Multiple threshold levels may be provided for the client to choose from: the higher the threshold, the lower the misdetection rate but the higher the misjudgment rate; the lower the threshold, the higher the misdetection rate but the lower the misjudgment rate;
5. Upload the adverse website content to the cloud storage server so that the keyword pattern library can be improved and the preset threshold adjusted, reducing the misdetection rate and the misjudgment rate;
6. In addition to the threshold-based decision method, the pattern library also provides semantic sequence behavior discrimination. For some adverse website content, the adverse grade scores of the keywords do not reach the threshold, so the threshold decision method cannot be applied; semantic sequence behavior discrimination can be used in that case. An adverse keyword sequence is defined in the pattern library, for example (A, B, C, D), where A, B, C, and D are adverse keywords whose summed adverse grade scores do not reach the threshold; when a website's content contains these four keywords in the order defined by the preset sequence, the page content can be judged to be adverse information.
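The sequence check in step 6 can be sketched as an ordered-subsequence test. This is an interpretation rather than the patent's stated algorithm: the text only says the keywords appear "in the order defined by the preset sequence", and the sketch assumes that other words may occur between them. The function names are hypothetical.

```python
from typing import Iterable, Sequence


def contains_in_order(page_keywords: Iterable[str], pattern: Sequence[str]) -> bool:
    """Return True if all keywords of `pattern` occur in `page_keywords`
    in the defined order (assumed here: not necessarily adjacent)."""
    if not pattern:
        return False  # an empty sequence definition never matches
    remaining = iter(page_keywords)
    # `p in remaining` consumes the iterator up to the first match, so each
    # later pattern keyword must be found after the previous one.
    return all(p in remaining for p in pattern)


def matches_any_sequence(page_keywords: Iterable[str],
                         sequences: Iterable[Sequence[str]]) -> bool:
    """Check the page keywords against every sequence defined in the pattern library."""
    keywords = list(page_keywords)
    return any(contains_in_order(keywords, seq) for seq in sequences)
```

For the example sequence (A, B, C, D), a page whose keywords are x, A, y, B, C, z, D would match, while a page containing D, C, B, A would not.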
The invention is further described below with reference to the drawing and the embodiment:
1. As shown in Fig. 1, when the page content 1 to be accessed is obtained, the word segmentation module 2 is used to obtain the keywords 3 of the page.
2. As shown in Fig. 1, the graded pattern library 4 is used for graded matching, which yields the adverse grade score 5 of the page keywords.
3. As shown in Fig. 1, the adverse grade score 5 and the preset threshold 7 undergo threshold comparison 6; if the threshold comparison result 8 shows that the adverse grade score 5 is larger, the page is judged to be adverse content; otherwise, semantic sequence behavior discrimination 9 is performed.
4. As shown in Fig. 1, semantic sequence behavior discrimination 9 further judges the page keywords 3 according to the sequence criteria in the pattern library, which yields the final decision result 10.
5. As shown in Fig. 1, the decision result 10 is fed back as result feedback 11 and uploaded to the cloud server 12.
6. As shown in Fig. 1, the cloud server 12 runs the pattern library update 13 according to the feedback so that clients can download the latest pattern library. That is, the cloud server collects feedback from the terminals and updates the pattern library, supplementing the existing library with the adverse-content keywords that were identified, and distributes the latest pattern library to each smartphone terminal.
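The flow of Fig. 1 can be sketched end to end as follows. This is a hedged illustration under stated assumptions, not the patented implementation: `judge_page` and `merge_feedback` are hypothetical names, whitespace splitting stands in for the word segmentation module, and the rule that newly reported keywords enter the library at a default score is an assumption, since the text does not say how the cloud server grades them.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

ScoreIndex = Dict[str, int]  # keyword -> adverse grade score


def judge_page(text: str,
               index: ScoreIndex,
               sequences: Iterable[Sequence[str]],
               threshold: int,
               segment: Callable[[str], List[str]] = str.split) -> Tuple[bool, str]:
    """Client side: return (is_adverse, reason) for one page."""
    keywords = segment(text)                          # segmentation module 2 -> keywords 3
    score = sum(index.get(k, 0) for k in keywords)    # graded matching 4 -> score 5
    if score > threshold:                             # threshold comparison 6 to 8
        return True, "threshold"
    for seq in sequences:                             # sequence discrimination 9
        remaining = iter(keywords)
        if seq and all(k in remaining for k in seq):  # keywords found in the defined order
            return True, "sequence"
    return False, "none"                              # final decision result 10


def merge_feedback(index: ScoreIndex,
                   reported_keywords: Iterable[str],
                   default_score: int = 1) -> ScoreIndex:
    """Cloud side (update 13): add reported keywords to a copy of the library.
    Assumption: existing keywords keep their score; new ones get `default_score`."""
    updated = dict(index)
    for keyword in reported_keywords:
        updated.setdefault(keyword, default_score)
    return updated
```

In a real client the `segment` argument would be replaced by a proper Chinese word segmentation routine, and the updated index would be downloaded from the cloud server rather than built locally.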

Claims (1)

1. A pattern-library-based method for identifying adverse-content websites on a smartphone terminal, characterized in that a cloud server on the network provides a pattern library for the smartphone (client) to download, and the pattern library (keyword pattern library) is built and used as follows: (1) keywords are extracted from existing adverse website content samples and graded according to their frequency of occurrence and degree of adversity; the keyword pattern library is divided into several grades, each grade is assigned a unique adverse grade score, and the higher the score, the more likely it is that content containing keywords from that grade is adverse information; (2) for the content of the website to be accessed, keywords are extracted using a word segmentation algorithm; (3) each extracted keyword is matched against the keyword pattern library to determine the grade of the library it belongs to, which gives the adverse grade score of that keyword; if no grade matches, the adverse grade score of the keyword is 0; (4) the adverse grade scores of the keywords are summed, and when the sum is greater than a preset threshold, the web page content can be judged to be adverse information; (5) when the sum of the adverse grade scores does not reach the threshold, the keyword pattern library also provides semantic sequence behavior discrimination: an adverse keyword sequence A, B, C, D is defined in the pattern library, where A, B, C, and D are adverse keywords whose summed adverse grade scores do not reach the threshold, and when a website's content contains these four keywords in the order defined by the preset sequence, the page content is judged to be adverse information; (6) the adverse website content is uploaded to the cloud server; the cloud server runs a pattern library update so that clients can download the latest pattern library.
CN201110146136XA 2011-06-01 2011-06-01 Mode library-based smart mobile phone terminal adverse content website identifying method Pending CN102170640A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110146136XA CN102170640A (en) 2011-06-01 2011-06-01 Mode library-based smart mobile phone terminal adverse content website identifying method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110146136XA CN102170640A (en) 2011-06-01 2011-06-01 Mode library-based smart mobile phone terminal adverse content website identifying method

Publications (1)

Publication Number Publication Date
CN102170640A true CN102170640A (en) 2011-08-31

Family

ID=44491581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110146136XA Pending CN102170640A (en) 2011-06-01 2011-06-01 Mode library-based smart mobile phone terminal adverse content website identifying method

Country Status (1)

Country Link
CN (1) CN102170640A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1761204A (en) * 2005-11-18 2006-04-19 郑州金惠计算机系统工程有限公司 System for blocking off erotic images and unhealthy information in internet
CN101035128A (en) * 2007-04-18 2007-09-12 大连理工大学 Three-folded webpage text content recognition and filtering method based on the Chinese punctuation
CN101692639A (en) * 2009-09-15 2010-04-07 西安交通大学 Bad webpage recognition method based on URL
CN101996203A (en) * 2009-08-13 2011-03-30 阿里巴巴集团控股有限公司 Web information filtering method and system

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI456511B (en) * 2012-01-06 2014-10-11
CN103208014A (en) * 2012-01-13 2013-07-17 施亿民 Image recognition system and operation method thereof
CN103324615A (en) * 2012-03-19 2013-09-25 哈尔滨安天科技股份有限公司 Method and system for detecting phishing website based on SEO (search engine optimization)
CN102663093B (en) * 2012-04-10 2014-07-09 中国科学院计算机网络信息中心 Method and device for detecting bad website
CN102663093A (en) * 2012-04-10 2012-09-12 中国科学院计算机网络信息中心 Method and device for detecting bad website
CN103167499A (en) * 2012-09-07 2013-06-19 深圳市金立通信设备有限公司 Entertainment safe limiting system and method of smartphone
CN102902790A (en) * 2012-09-29 2013-01-30 北京奇虎科技有限公司 Web page classification system and method
CN102902794A (en) * 2012-09-29 2013-01-30 北京奇虎科技有限公司 Web page classification system and method
CN102902794B (en) * 2012-09-29 2016-08-03 北京奇虎科技有限公司 Web page classification system and method
CN103841076A (en) * 2012-11-20 2014-06-04 天讯天网(福建)网络科技有限公司 Pornographic-webpage monitoring method
CN103116647A (en) * 2013-02-27 2013-05-22 武汉虹旭信息技术有限责任公司 Data mining system and method based on mobile internet harmful information
CN103279476A (en) * 2013-04-11 2013-09-04 深圳市易聆科信息技术有限公司 Detection method and system for WEB application system sensitive words
CN103279476B (en) * 2013-04-11 2016-12-28 深圳市易聆科信息技术股份有限公司 The detection method of a kind of WEB application system sensitive word and system
CN104217156A (en) * 2013-06-03 2014-12-17 腾讯科技(深圳)有限公司 Method and device for preventing plug-in of games
CN104217156B (en) * 2013-06-03 2018-04-20 腾讯科技(深圳)有限公司 Prevent plug-in method and device of playing
CN103475642A (en) * 2013-08-22 2013-12-25 北京奇虎科技有限公司 Malicious forum identification method and malicious forum identification device
CN103473299B (en) * 2013-09-06 2017-02-08 北京锐安科技有限公司 Website bad likelihood obtaining method and device
CN103473299A (en) * 2013-09-06 2013-12-25 北京锐安科技有限公司 Website bad likelihood obtaining method and device
WO2015058631A1 (en) * 2013-10-23 2015-04-30 Tencent Technology (Shenzhen) Company Limited Method, server and system for malicious url identification
CN104933055A (en) * 2014-03-18 2015-09-23 腾讯科技(深圳)有限公司 Webpage identification method and webpage identification device
CN104933055B (en) * 2014-03-18 2020-01-31 腾讯科技(深圳)有限公司 Webpage identification method and webpage identification device
CN106815200A (en) * 2015-11-30 2017-06-09 任子行网络技术股份有限公司 Objectionable text detection method and device based on keyword
US10176000B2 (en) 2016-02-29 2019-01-08 International Business Machines Corporation Dynamic assistant for applications based on pattern analysis
CN109076167A (en) * 2016-06-17 2018-12-21 索尼公司 Image processor, photographic device and image processing system
US10262041B2 (en) 2017-03-29 2019-04-16 Accenture Global Solutions Limited Scoring mechanism for discovery of extremist content
CN107547555A (en) * 2017-09-11 2018-01-05 北京匠数科技有限公司 A kind of web portal security monitoring method and device
CN112507086A (en) * 2020-12-21 2021-03-16 中电福富信息科技有限公司 Bad information monitoring method combining deep learning and keyword factors

Similar Documents

Publication Publication Date Title
CN102170640A (en) Mode library-based smart mobile phone terminal adverse content website identifying method
CN110119948B (en) Power consumer credit evaluation method and system based on time-varying weight dynamic combination
CN105550583A (en) Random forest classification method based detection method for malicious application in Android platform
CN104735074A (en) Malicious URL detection method and implement system thereof
CN105426762A (en) Static detection method for malice of android application programs
CN105787366A (en) Android software visualization safety analysis method based on module relations
CN107870945B (en) Content rating method and apparatus
CN103136372A (en) Method of quick location, classification and filtration of universal resource locator (URL) in network credibility behavior management
CN102073707A (en) Method and device for identifying short text category information in real time, and computer equipment
CN102867038A (en) Method and device for determining type of file
Cummins et al. Evolving local and global weighting schemes in information retrieval
CN107958154A (en) A kind of malware detection device and method
CN114338064B (en) Method, device, system, equipment and storage medium for identifying network traffic type
CN107341371A (en) A kind of script control method suitable for web configurations
CN111310021A (en) Network public opinion monitoring method
CN112765660A (en) Terminal security analysis method and system based on MapReduce parallel clustering technology
CN102999538A (en) Character searching method and equipment
CN103914534B (en) Content of text sorting technique based on specialist system URL classification knowledge base
CN104714947A (en) Preset type number recognition method and device
CN107766342A (en) A kind of recognition methods of application and device
CN105099996B (en) Website verification method and device
CN110225007A (en) The clustering method of webshell data on flows and controller and medium
CN105447616A (en) Knowledge management system based on multidimensional classification and full-text retrieval
CN110032596B (en) Method and system for identifying abnormal traffic user
CN115358214A (en) Keyword identification method and system based on user browsing and searching behaviors

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20110831