CN1588879A - Internet content filtering system and method - Google Patents

Internet content filtering system and method

Info

Publication number
CN1588879A
Authority
CN
China
Prior art keywords
url
cfa
classification
information
cams
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200410053683
Other languages
Chinese (zh)
Inventor
薛向阳
石静
郭小鹏
许源
赵泽宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN 200410053683
Publication of CN1588879A

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

This invention relates to a filtering system and a filtering method for Internet content. The system includes a content filtering agent (CFA), a query server (QS), and a content analysis and management server (CAMS). The filtering flow is as follows: when a user sends a request to access a URL, the CFA permits or forbids the request according to the whitelist and blacklist set by the user. If the URL is in neither CFA list, the CFA sends a query to the QS, which looks up the rating information of the URL in its own library and returns the result to the CFA; the CFA then responds accordingly. The QS regularly downloads updated URL rating information. The invention can accurately identify harmful information on the network and actively prevent users from accessing it.

Description

Internet content filtering system and filtering method
Technical field
The invention belongs to the field of Internet technology and specifically relates to an Internet content filtering system and filtering method. It can be used to block user access to all kinds of media data on the Internet, including text, images, video, audio, graphics, and animation.
Background
The Internet has become an indispensable part of daily life. People spend much of their lives online and enjoy the wide range of services the network provides: online shopping, online banking, e-mail, information lookup, and so on. Yet alongside these benefits, the negative effects of the Internet are growing day by day, for example teenagers becoming absorbed in adult websites, the spread of harmful information, and crime committed over the network.
According to statistics from the U.S. company N2H2, roughly 8% of web pages worldwide are pornographic. One quarter of the requests submitted to search engines each day relate to pornography, and pornographic spam has become one of people's biggest annoyances. A typical mainstream free mailbox receives 3 to 10 such messages a day, and the senders do not care whether the mailbox owner is an adult.
Alongside online pornography, websites and pages with anti-government or antisocial content are also countless. "Falun Gong" wording is seen everywhere, and so-called "government secrets" are spread widely. Public opinion is confused and people's lives are disrupted. The scale of the network's negative effects and the breadth of harmful content are greater than people expected.
How to ensure the operational security and information security of the Internet has drawn the attention of society as a whole. To promote what is beneficial, eliminate what is harmful, and foster the healthy development of the Internet in China, the Standing Committee of the National People's Congress adopted the Decision on the Protection of Internet Security in December 2000. The decision expressly provides: "In order to safeguard state security and social stability, anyone who commits one of the following acts, where the act constitutes a crime, shall be held criminally liable in accordance with the relevant provisions of the Criminal Law:
(1) using the Internet to spread rumors, to slander, or to publish or disseminate other harmful information in order to incite subversion of state power, overthrow the socialist system, incite the splitting of the country, or undermine national unification;
(2) stealing or divulging state secrets, intelligence, or military secrets via the Internet;
(3) using the Internet to incite ethnic hatred or ethnic discrimination, or to undermine ethnic unity;
(4) using the Internet to organize cults or to contact cult members in order to undermine the enforcement of state laws and administrative regulations."
At present, the Central Committee of the Communist Party of China and the State Council are emphasizing the further strengthening and improvement of the moral education of minors. In May 2004 the Ministry of Education also required that civilized Internet use and network security knowledge be made an important part of school moral education, in order to improve minors' ability to resist harmful information.
To guard against illegal and harmful information, three kinds of technical measures are mainly used. The first is deleting documents from the server: once a hosting provider recognizes that illegal information is on its server, that information must be deleted. The second is blocking transmission: if the owner of the server hosting the illegal information, or the country where it is located, does not regard the information as illegal or takes an uncooperative attitude, other countries can only resort to blocking and forbid retrieval of such information. The third is developing effective filtering software. Three generations of filtering software have been developed so far: the first generation is called "blacklist" software, the second is "whitelist" software, and the third is the PICS system.
Blacklist software works by blocking addresses that should not be retrieved, whereas whitelist software retrieves only addresses that are explicitly allowed. Blacklist software was widely used in the first generation of filtering software. The best known is Cyber Patrol, which came into use in the early 1990s and can work together with the retrieval software of search providers and online service providers. The software records about 7000 addresses covering 12 broad categories of illegal and harmful information (violence/profanity, racism/inappropriate comments about ethnic groups, Satanism, drugs, militant statements/extremism, gambling, and so on). Whitelist software works on exactly the opposite principle: it first blocks all Internet addresses and then selects the addresses that may be accessed. Because this runs counter to the logic of the Internet, its range of application is very limited.
Another effective technical means of filtering illegal and harmful information is the Platform for Internet Content Selection (PICS), a "neutral labeling" system. It was developed by Professor Jim Miller of the computer science laboratory at the Massachusetts Institute of Technology, and it is similar to the V-chip program selector that filters out pornography and violence in television programs. It was formally released in May 1996 by the World Wide Web Consortium (W3C) and is now widely used. PICS has gained broad support from 39 international computer companies, software and hardware manufacturers, search providers, online service providers, publishers, and content providers, and it is built into Internet browsers for users to enable as they choose. PICS works by classifying the content of each web page and attaching labels according to the nature of the content, while software monitors the page labels in order to restrict retrieval of pages with particular content. A page label can be a numeric string or a password. Labels are embedded in the RFC-822 transport format and the HTML text format, and can be transmitted with the file over HTTP.
Today many software companies recognize the business opportunity in web content filtering, and filtering products keep appearing. "Network father", "the anti-yellow expert of U.S. duckweed", "the anti-yellow software of e", "just soldier", "screen pack", and "Escort" are among the early anti-pornography ("anti-yellow") products that emerged in China. Surveying domestic filtering software, most products filter web pages using simple URL matching and keyword detection; there are essentially no products that properly apply content-based analysis to screen media files.
By contrast, comparable products abroad have developed faster than domestic ones, and their filtering technology is relatively mature. ZyXEL, WebSense, FilterLogix, and SurfControl are widely used web content filters, and each has a huge, categorized URL database. The techniques they generally use are also black/whitelists and keyword matching.
Proventia Web Filter from ISS has the largest and most up-to-date content filtering database in the world. It relies not only on keyword queries and manual website collection but also uses a text and image analysis system to process media content.
The FortiGuard URL database contains more than 5 million URLs with classification information. When a request arrives, the system first queries the FortiGuard database for the classification of the page and then allows or denies the request according to the policy set in advance by the customer.
WebWacher from Korea is a fairly good network image filtering product. Aimed at home users, it provides two main functions, controlling online time and screening harmful content, to help children use the Internet sensibly.
In addition, most U.S. companies that develop filtering software are also engaged in developing anti-spam and antivirus software. On the one hand, the existing base of each large software company lets it quickly build a huge URL database for queries; on the other hand, filtering software operates in much the same way as anti-spam or antivirus software, and sales are often aimed only at enterprise customers.
Summary of the invention
The object of the invention is to propose a new Internet content filtering system and filtering method in which the system has self-learning capability, improves classification accuracy, and reduces labor cost. When a user accesses the network, the system actively filters all kinds of media data on the Internet, including text, images, video, audio, graphics, and animation.
The concept of a URL is introduced first.
URL is the abbreviation of Uniform Resource Locator. Its structure is: protocol://hostname:port/directory-path/filename.
A URL corresponds to a concrete data object on a website or server. For example, a URL may correspond to a portal site or a BBS server, or to a particular picture under a directory of a website. Therefore, to stop a user from accessing a certain website, server, or data object, it is enough to stop the user from sending the corresponding URL request to the network.
The protocol part indicates the type of Internet resource; for example, http denotes the Hypertext Transfer Protocol, that is, the WWW. Other protocols include ftp (File Transfer Protocol), telnet (remote login), news (newsgroups), mailto (e-mail), and mms (streaming media).
The hostname part gives the name of the Internet server, for example www.fudan.edu.cn. The directory path part points to the location of a file or group of files on the server. Each directory level is separated by a forward slash (/).
The filename part is the actual name of the document, image, or script to be accessed, for example index.html, logo.gif, or script.cgi. The port number, directory path, and filename are all optional parts of a URL.
Some example URLs are given below:
http://www.w3.org/index.html: this URL corresponds to a website
http://10.64.130.4/images/advice.gif: this URL corresponds to a picture
ftp://10.11.3.8: this URL corresponds to an FTP server
mms://10.11.4.6/abc.avi: this URL is used to request an audio/video program on demand
telnet://bbs.fudan.edu.cn: this URL corresponds to a BBS server
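The URL structure described above can be illustrated with Python's standard urllib.parse module. This is a minimal sketch and not part of the patent; the helper name describe_url and the field names it returns are illustrative assumptions.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts named in the text (hypothetical helper)."""
    parts = urlsplit(url)
    # Everything up to the last "/" is the directory path; the rest is the filename.
    directory, slash, filename = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,        # e.g. http, ftp, mms, telnet
        "host": parts.hostname,          # server name or IP address
        "port": parts.port,              # None when the optional port is omitted
        "directory": directory + slash,  # "" when the URL has no path
        "filename": filename,            # "" when the URL names no file
    }


for example in (
    "http://www.w3.org/index.html",
    "http://10.64.130.4/images/advice.gif",
    "ftp://10.11.3.8",
    "mms://10.11.4.6/abc.avi",
    "telnet://bbs.fudan.edu.cn",
):
    print(describe_url(example))
```

Because the port, directory path, and filename are optional, the sketch reports them as None or as empty strings rather than failing when they are absent.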
The web content filtering system proposed by the invention consists of the following parts (see Fig. 1): a content filtering agent, a query server, and a content analysis and management server, all operating over the Internet and located between the user terminal and the target site. Specifically:
1. User terminal (UT: User Terminal): a computer or any other device that can access the Internet. The user accesses network resources through the UT, for example browsing web pages, searching documents, or downloading files.
2. Content filtering agent (CFA: Content Filtering Agent): stores a blacklist (websites or files to which access is forbidden) and a whitelist (websites or files to which access is allowed); both are in effect URL lists. This module runs in various forms on different types of platforms.
3. Query server (QS: Query Server): holds a massive library of URL classification and rating information. When the QS receives a URL that a UT has requested, it looks the URL up in the classification and rating library and reports the result back. The basic reason for using a QS is that CFA resources are limited: a CFA cannot store much classification and rating information, only a small black/whitelist, whereas a QS can store such information in bulk and can serve concurrent access from a large number of CFAs. Multiple QSs can be deployed on the Internet, and a QS can also be deployed on the intranet of an organization, to handle large volumes of concurrent queries.
4. Content analysis and management server (CAMS: Content Analysis and Management Server): its main task is to classify and rate resources on the Internet, for example by recording a list of "bad URLs of websites that host pornographic pictures or audio/video". An authorized QS can download the URL library with classification and rating information from it. In general, different enterprises or departments are concerned with different types of content, so there can be multiple CAMSs for different categories. A CAMS must also have management and publishing functions, and can exist as a network portal site.
5. Target site (TWS: Target Website or Server): any website or server that stores resources; a UT can access its public resources over the Internet.
The working steps of the web content filtering system are summarized as follows:
1. When a user sends a request to access a URL, the CFA allows or forbids the request according to its blacklist or whitelist.
2. If the URL is in neither the blacklist nor the whitelist of the CFA, the CFA sends a query request to the QS.
3. The QS looks up the rating information of the URL in its local URL library and returns the result to the CFA, which responds accordingly.
4. The QS regularly downloads updated URL rating information from the CAMS.
5. The CAMS automatically searches, downloads, and analyzes multimedia data on the Internet, and uses both interactive manual annotation and automatic machine classification to classify and rate web content, forming a classified and rated URL information library.
The Internet content filtering system proposed by the invention can be applied in various scenarios, for example:
1. Blocking access to websites that are politically reactionary or that endanger national security.
2. Blocking access to pornographic websites and websites that harm the healthy development of teenagers.
3. Blocking access to e-sports and gaming websites.
4. Blocking access to websites or resources of a particular type, as determined by the specific application requirements.
The filtering agent CFA can run in several ways on many types of hardware and software platforms, for example:
1. The CFA can run on a proxy server.
2. The CFA can run on a firewall.
3. The CFA can run in a browser as a browser plug-in.
4. The CFA can run in network access devices such as ADSL modems, cable modems, telephone-line modems, and ISDN PC adapters.
Description of drawings
Fig. 1 is a diagram of the overall framework of the Internet content filtering system.
Fig. 2 shows the basic composition and workflow of the content analysis and management server (CAMS).
Reference numerals in the figures: 1 is the user terminal UT, 2 is the content filtering agent CFA, 3 is the query server QS, 4 is the content analysis and management server CAMS, and 5 is the target site TWS.
Embodiment
The invention is further described below by way of example.
About the content analysis and management server (CAMS)
Content on the Internet is diverse and constantly changing: text, images, video, audio, graphics, animation, dynamic web pages, Flash, and so on. Viewed globally, the volume of data on the Internet is truly massive.
The CAMS must continuously track this massive, ever-changing multimedia content and classify and rate it objectively and promptly. This is a difficult and challenging task that requires large-scale computing and storage equipment as well as a great deal of manual assistance.
The following tables give example rating levels for categories such as "violence" and "nudity".
Category: Violence
Level 0, No violence: no aggressive violence and no natural or accidental violence.
Level 1, Fighting: creatures are injured or killed; lifelike objects are damaged.
Level 2, Killing: humans or creatures are injured or killed; injuring non-threatening creatures is rewarded.
Level 3, Killing with blood and gore: humans are killed or injured.
Level 4, Wanton and gratuitous violence: malicious and gratuitous violence.

Category: Nudity
Level 0, None: no nudity.
Level 1, Revealing attire: revealing attire.
Level 2, Partial nudity: partial nudity.
Level 3, Frontal nudity: frontal nudity.
Level 4, Provocative frontal nudity: provocative frontal nudity.

Category: Sex
Level 0, None: no sexual activity depicted; romance.
Level 1, Passionate kissing: passionate kissing.
Level 2, Clothed sexual touching: clothed sexual touching.
Level 3, Non-explicit sexual touching: non-explicit sexual touching.
Level 4, Explicit sexual activity: explicit sexual activity.

Category: Politics
Level 0, None: no reactionary content of any kind.
Level 1, Veiled references: reactionary content mentioned obliquely.
Level 2, General statements of politically sensitive content: politically sensitive reactionary content stated in general terms.
Level 3, Reactionary speech: reactionary views stated bluntly.
Level 4, Extremely reactionary speech: extremely reactionary speech.
Classifying and rating the various data on websites or servers, automatically or semi-automatically, is a very important task of the CAMS. It must be pointed out that the classification and rating standards should be formulated and published by the relevant national authorities.
Once classification and rating standards for web content exist, different companies, organizations, or portal sites can carry out rating evaluations for a particular class of data. For example, one CAMS may focus only on political content while another focuses only on pornographic content, which can create many commercial opportunities.
Clearly, whether the CAMS for a given category can rate network content comprehensively and accurately bears directly on the accuracy of web content filtering. Relying entirely on computer processing and analysis to evaluate web content fully automatically is very difficult. The invention therefore combines manual guidance with machine learning to direct the computer in evaluating massive, time-varying network data.
Fig. 2 shows the content-based multimedia data analysis and evaluation method (for a specific category; the categories can be determined manually in advance). It can rate various media content such as images, video, audio, and text. Its working steps are:
1. Extract features from the various media objects. For example, extract colors and color histograms from pictures and analyze the color and texture structure of image regions; extract camera or object motion information, color information, and texture information from video data; extract keywords from text.
2. Manually annotate a small portion of the objects. These manually annotated objects serve as the samples for machine learning.
3. The system learns from the manual annotations, obtains higher-level semantic information, and builds a knowledge base for classification.
4. Finally, the system automatically rates the large majority of data objects that have not been manually annotated, greatly reducing labor cost.
To ensure that the machine achieves sufficient classification accuracy, the machine's results also need to be spot-checked and manually evaluated. This further manual evaluation, known as relevance feedback, improves the classification performance of the machine.
The main features of this method are: it uses content-based analysis, so that the understanding of media objects reaches the semantic level; it introduces interactive manual annotation for the machine to learn from, improving the classification accuracy of the system; and it uses a feedback mechanism, giving the system self-learning capability. With appropriate manual guidance and machine learning, machine classification accuracy can be improved and labor cost greatly reduced.
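The four steps and the feedback loop can be sketched in a few lines of Python. The patent does not name a specific feature set or learning algorithm, so everything here is an assumption chosen for brevity: a coarse RGB histogram stands in for feature extraction, a nearest-centroid rule stands in for the learner, and the names color_histogram and CentroidRater, as well as the toy pixel data and levels, are hypothetical.

```python
from collections import defaultdict


def color_histogram(pixels, bins=4):
    """Step 1: return a normalized coarse RGB histogram with bins**3 cells."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        cell = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[cell] += 1.0
    total = len(pixels) or 1
    return [count / total for count in hist]


class CentroidRater:
    """Toy nearest-centroid rater; a stand-in for the unspecified learner."""

    def __init__(self):
        self.samples = defaultdict(list)  # level -> labeled feature vectors

    def annotate(self, features, level):
        """Steps 2 and 3: record a manual label (also used for feedback)."""
        self.samples[level].append(features)

    def rate(self, features):
        """Step 4: return the level whose centroid is closest, or None."""
        best_level, best_dist = None, float("inf")
        for level, vectors in self.samples.items():
            centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
            dist = sum((a - b) ** 2 for a, b in zip(features, centroid))
            if dist < best_dist:
                best_level, best_dist = level, dist
        return best_level


rater = CentroidRater()
# Toy manual annotations: the colors and levels carry no real meaning.
rater.annotate(color_histogram([(250, 10, 10)] * 4), level=3)
rater.annotate(color_histogram([(10, 10, 250)] * 4), level=0)
print(rater.rate(color_histogram([(240, 20, 20)] * 4)))

# Relevance feedback: a reviewer spot-checks a result and supplies a correction,
# which simply becomes one more labeled sample.
green = color_histogram([(10, 250, 10)] * 4)
rater.annotate(green, level=1)
print(rater.rate(green))
```

Treating feedback as additional labeled samples is the simplest way to give the sketch the self-learning behavior described above; a production system would presumably retrain a stronger model on the enlarged sample set.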
In addition, the CAMS has other functional modules: one manages the URL classification and rating library, and another publishes it. One further essential module is the "web crawler", which automatically explores the Internet, visits websites or servers, and fetches media files. Much crawler software with similar functions already exists, and it is not the focus of the invention.
The detailed working steps of the CAMS are given below (see Fig. 2):
(1) Web crawler group: automatically searches the Internet and downloads various types of data, such as web pages, pictures, video, and music [flow 1]; it also downloads data objects as required by the suspicious-URL library [flow 7]. Note that the "suspicious-URL library" here consists mainly of URLs sent by query servers (QS) that the QS could not yet resolve.
(2) Feature extraction: analyzes the downloaded multimedia data objects and extracts features, for example the color, texture, and shape of images, or object motion and camera motion in video; the URL of each downloaded data object is organized and stored together with its features [flow 2].
(3) Manual annotation: a small portion of the downloaded multimedia data objects is selected manually and annotated with category and rating; the results of automatic classification and rating are also checked manually, which both reduces errors and, through relevance feedback, improves classification performance [flow 3].
(4) Classifier training: to classify and rate the data objects behind each URL automatically, machine learning methods can be used; the classifier is trained with the manual annotations and the relevance feedback to obtain high-accuracy classification and rating results [flow 4].
(5) Automatic classification and rating: the trained classifier automatically classifies and rates each downloaded data object, producing the classified and rated URL information library [flow 5]. This library can be updated and published regularly; because Internet content changes from moment to moment, the update and publication cycle should be as short as possible [flow 6].
About the query server (QS)
The QS stores a massive URL classification and rating library, whose information may come from one or more CAMSs. The general data structure of this library in the QS is as follows (example):
No.   URL (string)   "Violence" level (integer)   "Nudity" level (integer)   "Politics" level (integer)   …
1     URL-1          1                            0                          4                            …
2     URL-2          2                            2                          0                            …
…     …              …                            …                          …                            …
L     URL-L          3                            2                          1                            …
The basic job of the QS is to make a determination on the URL submitted by the content filtering agent (CFA), which is a simple table lookup. If the URL is present in the classification and rating table, the QS returns the lookup result (the rating levels) to the CFA. Otherwise the QS does two things: (1) it returns "undecidable" (NAN) to the CFA; (2) it submits the URL to the CAMS for analysis. Because content on the Internet changes constantly, undecidable cases are unavoidable. If the CAMS analyzes, processes, and tracks content changes in a timely manner, the probability of an "undecidable" result will be very small.
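The lookup behavior just described can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the class name QueryServer, its attribute names, and the use of None to represent NAN are all hypothetical, and the library is held in a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Ratings = Dict[str, int]  # category name -> level 0..4


@dataclass
class QueryServer:
    library: Dict[str, Ratings] = field(default_factory=dict)
    undecided: List[str] = field(default_factory=list)  # URLs to hand to the CAMS

    def lookup(self, url: str) -> Optional[Ratings]:
        """Return the stored levels, or None to signal NAN (undecidable)."""
        ratings = self.library.get(url)
        if ratings is None:
            if url not in self.undecided:
                self.undecided.append(url)  # (2) submit the URL to the CAMS
            return None                     # (1) reply NAN to the CFA
        return dict(ratings)

    def load_update(self, update: Dict[str, Ratings]) -> None:
        """Merge a library update downloaded from a CAMS."""
        self.library.update(update)
        self.undecided = [u for u in self.undecided if u not in self.library]


qs = QueryServer({"URL-1": {"violence": 1, "nudity": 0, "politics": 4}})
print(qs.lookup("URL-1"))  # known URL: rating levels
print(qs.lookup("URL-9"))  # unknown URL: NAN, and the URL is queued for the CAMS
```

Once the CAMS has rated a queued URL and the QS has loaded the update, later lookups for that URL return levels instead of NAN.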
A QS implementation must support concurrent access. The invention uses a URL index structure based on a trie, combined with a main-memory caching strategy: frequently accessed URL entries are kept in the main memory of the server, while rarely used ones are kept on disk. This combination of index structure and caching greatly improves the lookup speed of the QS and supports a high level of concurrent access.
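A minimal sketch of the two ideas, a trie index and a small cache in front of it, is given below. The patent does not specify the trie layout or the eviction rule, so the character-level trie, the least-recently-used eviction, and the names UrlTrie and CachedIndex are assumptions; here the trie itself stands in for the slower on-disk store.

```python
from collections import OrderedDict

_END = ""  # marker key; cannot clash with the one-character edge labels


class UrlTrie:
    """Character-level trie mapping URL strings to rating records."""

    def __init__(self):
        self.root = {}

    def insert(self, url, ratings):
        node = self.root
        for ch in url:
            node = node.setdefault(ch, {})
        node[_END] = ratings

    def get(self, url):
        node = self.root
        for ch in url:
            node = node.get(ch)
            if node is None:
                return None
        return node.get(_END)


class CachedIndex:
    """Keeps recently used entries in a small in-memory LRU cache."""

    def __init__(self, store, capacity):
        self.store = store          # slower backing index (assumed on disk)
        self.capacity = capacity
        self.cache = OrderedDict()  # url -> ratings, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self.cache:
            self.cache.move_to_end(url)  # mark as most recently used
            self.hits += 1
            return self.cache[url]
        self.misses += 1
        ratings = self.store.get(url)
        if ratings is not None:
            self.cache[url] = ratings
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used
        return ratings


trie = UrlTrie()
trie.insert("http://a.example/x", {"violence": 1})
trie.insert("http://a.example/y", {"violence": 0})
index = CachedIndex(trie, capacity=2)
index.get("http://a.example/x")
index.get("http://a.example/x")
print(index.hits, index.misses)
```

URLs that share a prefix, such as pages on the same host, share trie nodes, which is one plausible reason a trie suits a large URL library.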
QSs can be deployed in large numbers on the Internet or on intranets to serve all types of users, including home users and enterprise users. A QS downloads classification and rating libraries from the various CAMSs for which it is authorized. The CAMS should promptly process the data behind URLs that the QS could not resolve, and periodically publish classification and rating information for QSs to download.
About the content filtering agent (CFA)
The CFA is a very simple software module that runs in various forms on all kinds of hardware and software platforms. The CFA stores a whitelist (WNList) and a blacklist (BNList). In essence, the black/whitelist is a URL list.
The data structure of the CFA black/whitelist is as follows:
No.   URL (string)                      Attribute (Boolean)
1     http://www.fudan.edu.cn/news/     0 (belongs to the whitelist)
2     http://www.private.com            1 (belongs to the blacklist)
3     …                                 …
The basic working process of the CFA:
1. When the URL belongs to WNList, the CFA lets the URL through and forwards it to the TWS, which returns the result of the request to the UT.
2. When the URL belongs to BNList, the CFA blocks the URL and sends an "access forbidden" or warning message directly to the UT; in effect this cuts off the request from the UT.
3. When the URL belongs to neither WNList nor BNList, the CFA sends the URL to the QS, asks the QS to check it, and acts according to the result.
These operations are described in more detail in the workflow that follows.
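The three-way branch above amounts to a single table lookup. The sketch below stores the list as a dictionary from URL to the Boolean attribute shown in the table (0 for whitelist, 1 for blacklist). The names Action and local_decision are hypothetical, and exact string matching is an assumption; the patent does not say whether an entry such as http://www.fudan.edu.cn/news/ should also match the pages beneath it.

```python
from enum import Enum

WHITE, BLACK = 0, 1  # attribute values from the black/whitelist table


class Action(Enum):
    FORWARD_TO_TWS = "forward"  # URL is in WNList
    BLOCK = "block"             # URL is in BNList
    ASK_QS = "ask_qs"           # URL is in neither list


def local_decision(url, name_list):
    """Decide from the local list alone (exact match is an assumption)."""
    attribute = name_list.get(url)
    if attribute == WHITE:
        return Action.FORWARD_TO_TWS
    if attribute == BLACK:
        return Action.BLOCK
    return Action.ASK_QS


name_list = {
    "http://www.fudan.edu.cn/news/": WHITE,
    "http://www.private.com": BLACK,
}
print(local_decision("http://www.private.com", name_list))
```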
Each CFA has an authorized account. Through the graphical interface on the user side, the authorized user can set the CFA options and so form an individual filtering policy. Specifically:
1. Setting the category levels that determine whether a URL belongs to the blacklist or the whitelist.
For example, suppose the user specifies that URLs rated level 1 or above for "violence", or level 2 or above for "nudity", are blacklisted. When a UT requests a URL that is not in the black/whitelist of the CFA, the CFA sends the URL to the QS. Suppose that in the classification and rating library of the QS this URL is rated "violence" level 0 and "nudity" level 3. When the QS returns this rating to the CFA, the CFA determines from the user's settings that the URL belongs to the blacklist and blocks it.
2. Setting how a URL is treated when the QS returns "NAN".
If the user sets this option to "whitelist", then when the QS returns "NAN" the CFA automatically treats the URL as whitelisted; otherwise it is treated as blacklisted.
3. The user can manually manage the black/whitelist in the CFA, including browsing, adding, and deleting entries.
4. The user can change the password of the authorized account in the CFA.
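Settings 1 and 2 can be captured in one small policy object. The sketch below is illustrative: the name FilterPolicy, the "white"/"black" return values, the use of None for NAN, and the choice to treat a category missing from the QS reply as level 0 are all assumptions. The thresholds reproduce the example in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FilterPolicy:
    # Setting 1: a URL is blacklisted when ANY category reaches its threshold.
    thresholds: Dict[str, int] = field(default_factory=dict)
    # Setting 2: how to treat a NAN reply from the QS.
    nan_as_white: bool = True

    def verdict(self, reply: Optional[Dict[str, int]]) -> str:
        """Return "white" or "black" for a QS reply (None means NAN)."""
        if reply is None:
            return "white" if self.nan_as_white else "black"
        for category, limit in self.thresholds.items():
            if reply.get(category, 0) >= limit:  # missing category counts as 0
                return "black"
        return "white"


policy = FilterPolicy(thresholds={"violence": 1, "nudity": 2})
# The example from the text: violence level 0, nudity level 3.
print(policy.verdict({"violence": 0, "nudity": 3}))
print(policy.verdict(None))
```

The first call yields "black" because the nudity level meets its threshold even though the violence level does not, which matches the outcome described in the example.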
When the storage resources of the CFA are limited, some caching strategy is needed, for example keeping only the most recently and most frequently used black/whitelist entries.
The computing power and storage resources of a CFA are usually limited. For example, when the CFA runs in an ADSL modem, computing power is clearly scarce and the black/whitelist that can be stored is also quite limited. For such applications the CFA must be designed to be simple, small, and fast. The CFA proposed by the invention needs no complex program: it is only a table lookup plus cache maintenance, and the caching mechanism significantly reduces the storage required.
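One way to keep "recent and frequently used" entries within a fixed budget is sketched below. The eviction rule (drop the entry with the fewest uses, breaking ties by the oldest use) is an assumption, since the patent only names the goal; the class name BoundedNameList is hypothetical, and the sketch assumes a capacity of at least 1.

```python
class BoundedNameList:
    """Fixed-size black/whitelist that evicts rarely and long-unused entries."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed to be >= 1
        self.entries = {}         # url -> [attribute, use_count, last_used]
        self.clock = 0            # logical time, advanced on every access

    def get(self, url):
        """Return the attribute (0 white, 1 black) or None if not stored."""
        entry = self.entries.get(url)
        if entry is None:
            return None
        self.clock += 1
        entry[1] += 1
        entry[2] = self.clock
        return entry[0]

    def put(self, url, attribute):
        self.clock += 1
        if url in self.entries:
            entry = self.entries[url]
            entry[0], entry[2] = attribute, self.clock
            entry[1] += 1
            return
        if len(self.entries) >= self.capacity:
            # Evict the entry with the fewest uses; ties go to the oldest use.
            victim = min(self.entries, key=lambda u: tuple(self.entries[u][1:]))
            del self.entries[victim]
        self.entries[url] = [attribute, 1, self.clock]


names = BoundedNameList(capacity=2)
names.put("http://www.fudan.edu.cn/news/", 0)
names.put("http://www.private.com", 1)
names.get("http://www.fudan.edu.cn/news/")  # used again, so it is kept
names.put("http://new.example/", 1)         # evicts http://www.private.com
print(sorted(names.entries))
```

An evicted URL is not lost: the next request for it simply falls through to the QS again, so the cache trades a little query traffic for a small memory footprint.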
It is pointed out that at last communicates by letter between CFA, QS and the CAMS three can realize by the Socket programming, also can realize by other method.Between CFA and the QS, communicating by letter between QS and the CAMS all will be by authentication.The concrete steps following (seeing shown in Figure 1) that internet content filters among the present invention
1. When the user wishes to visit a target site or server (TWS), for web browsing, video on demand, or file download, the UT sends an HTTP (or FTP, MMS, Telnet, etc.) request. The content filtering agent CFA immediately intercepts the URL of this request and compares it with the URLs in its blacklist and whitelist [flow step 1].
If the URL requested by the UT is in the CFA blacklist, the CFA blocks the request and returns an error or warning message to the UT [flow step 2].
2. If the URL requested by the UT is in the CFA whitelist, the CFA forwards the request directly to the target site TWS [flow step 3]; the TWS then sends the corresponding response to the UT [flow step 6].
3. If the requested URL is in neither the CFA blacklist nor the CFA whitelist, the CFA sends the URL to the query server QS [flow step 4]. The QS looks up the URL, obtains its rating information or "NAN", and sends the result to the CFA [flow step 5].
(1) If the URL is in the URL library of the QS and, according to the user's settings, its category level falls in the blacklist, the CFA determines that the URL belongs to the blacklist, immediately and automatically updates its blacklist, forbids the UT from accessing the URL, and returns an error or warning message to the UT [flow step 2].
(2) If the URL is in the URL library of the QS and, according to the user's settings, its category level falls in the whitelist, the CFA determines that the URL belongs to the whitelist, immediately and automatically updates its whitelist, and forwards the request to the TWS [flow step 3]; the TWS then sends the corresponding response to the UT [flow step 6].
(3) If the URL is not in the URL library of the QS, the QS notifies the CFA that the URL cannot be rated, and the CFA reacts according to the policy set in advance by the user: either it automatically handles the URL as whitelisted, or it automatically handles it as blacklisted. In this case, however, the CFA does not update its blacklist or whitelist. Meanwhile, the QS can hand the URL over to the CAMS for processing [flow step 7].
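Steps 1 to 3 above can be summarized in a short Python sketch of the CFA decision path. The names `handle_request` and `query_qs`, the "block"/"forward" return values, and the use of `None` for a "NAN" reply are illustrative assumptions; the network exchange with the QS is replaced by a plain callable.

```python
from typing import Callable, Dict, Optional, Set

Rating = Optional[Dict[str, int]]  # None stands for a "NAN" reply from the QS


def handle_request(url: str,
                   blacklist: Set[str],
                   whitelist: Set[str],
                   query_qs: Callable[[str], Rating],
                   thresholds: Dict[str, int],
                   nan_policy: str = "block") -> str:
    """Return "block" or "forward" for one intercepted URL request."""
    # Steps 1 and 2: local lookup in the CFA lists.
    if url in blacklist:
        return "block"
    if url in whitelist:
        return "forward"
    # Step 3: the URL is unknown locally, so ask the QS.
    rating = query_qs(url)
    if rating is None:
        # Case (3): follow the user's preset policy; the lists stay unchanged.
        return nan_policy
    if any(rating.get(cat, 0) >= level for cat, level in thresholds.items()):
        blacklist.add(url)   # case (1): remember the verdict locally
        return "block"
    whitelist.add(url)       # case (2): remember the verdict locally
    return "forward"


# Hypothetical QS library used in place of a real query server.
qs_library = {"http://example.test/bad": {"violence": 2, "nudity": 0}}
black, white = set(), set()
action = handle_request("http://example.test/bad", black, white,
                        qs_library.get, {"violence": 1, "nudity": 2})
```

After this call the URL is stored in `black`, so a repeated request is answered from the local list without contacting the QS.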
The content analysis and management server CAMS periodically publishes updated URL rating libraries to the QS [flow step 10], so that changes in Internet content are reflected promptly. The performance of the CAMS therefore directly affects filtering accuracy, and considerable effort is needed to maintain and update the CAMS.
To speed up the CFA's decisions and reduce its storage requirements, a caching mechanism is introduced in the CFA: it stores the blacklist and whitelist entries that the user terminal UT accesses frequently. This reduces how often a verification request must be sent to the QS, since each such request involves some waiting time.
An authorized user can manage the blacklist and whitelist in the CFA as needed, browsing, adding, or deleting entries [flow steps 8 and 9].

Claims (10)

1. An Internet content filtering system, characterized in that it consists of a content filtering agent (denoted CFA), a query server (denoted QS), and a content analysis and management server (denoted CAMS), wherein the content filtering agent stores a blacklist and a whitelist; the query server has a URL library with classification and rating information; and the content analysis and management server classifies and rates resources on the Internet.
2. The Internet content filtering system according to claim 1, characterized in that user-specific settings are provided in the CFA, comprising: (1) the URL category levels that determine whether a URL belongs to the blacklist or the whitelist; (2) the attribute assigned to a URL when the QS returns that it has no entry for this URL; (3) manual management of the blacklist and whitelist in the CFA, including browse, add, and delete functions.
3. The Internet content filtering system according to claim 1, characterized in that the CFA runs in various forms on the following kinds of software and hardware platforms: (1) proxy servers; (2) firewalls; (3) browsers; (4) network access devices such as ADSL modems, cable modems, telephone-line modems, and ISDN PC adapters.
4. The Internet content filtering system according to claim 1, characterized in that the QS holds a massive amount of URL classification and rating information, quickly looks up each URL submitted by the CFA, and returns the corresponding rating information.
5. The Internet content filtering system according to claim 1, characterized in that the QS can be deployed in large numbers on the Internet or an intranet, supports concurrent queries, and serves all types of users; the QS downloads classification and rating libraries from authorized CAMS.
6. The Internet content filtering system according to claim 1, characterized in that the CAMS uses content-based multimedia analysis and processing methods to analyze and assess all kinds of media content on the Internet, and labels the content with levels according to its category.
7. The Internet content filtering system according to claim 1, characterized in that the CAMS incorporates manual interaction and annotation, and uses machine learning to improve the classification accuracy of the system.
8. The Internet content filtering system according to claim 1, characterized in that communication between the CFA and the QS, and between the QS and the CAMS, must be authenticated.
9. An Internet content filtering method, characterized in that it uses the Internet content filtering system of claim 1, with the following specific steps:
(1) when the user sends a request to access a URL, the CFA forbids or allows the request according to its blacklist or whitelist;
(2) if the URL is in neither the blacklist nor the whitelist of the CFA, the CFA sends a query request to the QS;
(3) the QS looks up the rating information of the URL in its local URL library and returns the result to the CFA, and the CFA responds accordingly;
(4) the QS periodically downloads updated URL rating information from the CAMS;
(5) the CAMS automatically searches for, downloads, and analyzes multimedia data on the Internet, and uses manual interactive annotation together with automatic machine classification to classify and rate Internet content, forming a classified and rated URL information library.
10. The Internet content filtering method according to claim 9, characterized in that the working steps of the CAMS are as follows:
(1) web crawler group: autonomously searches the Internet and downloads data of all types, and downloads data objects as required by the suspicious-URL information library;
(2) feature extraction: the downloaded multimedia data objects of all types are analyzed and processed to extract features;
(3) manual annotation: a subset of the downloaded multimedia data objects is selected for classification and rating annotation; the results of automatic classification and rating are checked manually;
(4) classifier training: for automatic classification and rating of the data objects corresponding to URLs, machine learning methods are used to train the classifier, guided by the manual annotations and the related feedback;
(5) automatic classification and rating: the trained classifier automatically classifies and rates each downloaded data object, producing the classified and rated URL information library; this URL information library is regularly updated and published.
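The CAMS working steps in claim 10 can be pictured with a toy Python pipeline. Everything here is an assumption made for illustration: the crawler is replaced by an in-memory dictionary of page text, the features are plain word counts, and the classifier is a simple word-frequency scorer. The patent does not name a specific feature set or learning method, and the sketch assigns only a category, not a level.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def extract_features(text: str) -> Counter:
    """Feature extraction: lower-cased word counts of one downloaded object."""
    return Counter(text.lower().split())


def train(labelled: List[Tuple[Counter, str]]) -> Dict[str, Counter]:
    """Classifier training: sum the word counts of manually labelled samples."""
    model: Dict[str, Counter] = defaultdict(Counter)
    for features, label in labelled:
        model[label].update(features)
    return dict(model)


def classify(model: Dict[str, Counter], features: Counter) -> str:
    """Automatic classification: pick the label whose word profile fits best."""
    def score(label: str) -> float:
        totals = model[label]
        size = sum(totals.values()) or 1
        return sum(n * totals[tok] / size for tok, n in features.items())
    # Labels are sorted so that ties are resolved deterministically.
    return max(sorted(model), key=score)


def build_library(pages: Dict[str, str],
                  model: Dict[str, Counter]) -> Dict[str, str]:
    """Produce a URL -> category library from downloaded pages."""
    return {url: classify(model, extract_features(text))
            for url, text in pages.items()}


# Manual annotation step: two hypothetical hand-labelled samples.
samples = [(extract_features("fight blood weapon"), "violence"),
           (extract_features("recipe garden weather"), "none")]
model = train(samples)
library = build_library({"http://example.test/a": "blood and weapon",
                         "http://example.test/b": "garden recipe ideas"}, model)
```

In a fuller system the resulting library would be the artifact that the CAMS periodically publishes to the QS, and manual review of misclassified entries would feed new labelled samples back into `train`.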
CN 200410053683 2004-08-12 2004-08-12 Internet content filtering system and method Pending CN1588879A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200410053683 CN1588879A (en) 2004-08-12 2004-08-12 Internet content filtering system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200410053683 CN1588879A (en) 2004-08-12 2004-08-12 Internet content filtering system and method

Publications (1)

Publication Number Publication Date
CN1588879A true CN1588879A (en) 2005-03-02

Family

ID=34602956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200410053683 Pending CN1588879A (en) 2004-08-12 2004-08-12 Internet content filtering system and method

Country Status (1)

Country Link
CN (1) CN1588879A (en)




Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication