WO2007016868A2 - System and method for verifying links and electronic addresses in web pages and messages - Google Patents

System and method for verifying links and electronic addresses in web pages and messages

Info

Publication number
WO2007016868A2
WO2007016868A2 (PCT/CN2006/001986)
Authority
WO
WIPO (PCT)
Prior art keywords
page
checker
asker
link
pages
Prior art date
Application number
PCT/CN2006/001986
Other languages
French (fr)
Inventor
Marvin Shannon
Wesley Boudeville
Original Assignee
Metaswarm (Hongkong) Ltd.
Priority date
Filing date
Publication date
Application filed by Metaswarm (Hongkong) Ltd. filed Critical Metaswarm (Hongkong) Ltd.
Publication of WO2007016868A2 publication Critical patent/WO2007016868A2/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2119 Authenticating web pages, e.g. with suspicious links

Definitions

  • This invention relates generally to information delivery and management in a computer network. More particularly, the invention relates to techniques for automatically classifying electronic communications as bulk versus non-bulk and categorizing the same.
  • ECM electronic communications modality
  • Links to pages at other domains, and also email addresses can be validated.
  • a visitor to the page can query the domain in the link or email address.
  • the query might be done by an "Asker", which is a plug-in to the browser, or, on a trusted website, by the website itself.
  • the Asker sends the URI of the page being viewed, and that of the link in the page, to a "Checker" at a domain given by the link. The Checker has a table of pages external to the domain, which are approved as having links to its pages.
  • the Checker can reply with whether the external URI in the query is approved to link to the URI of its page that was in the query.
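  • By way of illustration only (these URIs and the table layout are ours, not the patent's), a Checker's approval table and reply logic might be sketched as:

```python
# Minimal sketch of a Checker's approval table (illustrative names only).
# It maps each page at the Checker's domain to the set of external page
# URIs approved to hold verifiable links to it.
APPROVED = {
    "http://b.com/products.html": {
        "http://a.com/review.html",
        "http://news.example.org/story.html",
    },
}

def check(external_uri, local_uri):
    """Reply True if the external page is approved to link to the local page."""
    return external_uri in APPROVED.get(local_uri, set())

if __name__ == "__main__":
    print(check("http://a.com/review.html", "http://b.com/products.html"))      # True
    print(check("http://evil.example/fake.html", "http://b.com/products.html")) # False
```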
  • the query and reply can also be generalized to ask about more information about a person. This lets a user with a browser view a page with more confidence.
  • our method lets sellers on auction sites write information about an item for sale, that can be verified by a potential bidder, which should reduce the incidence of fraud, and lead to more bids and higher valued bids. Also, job seekers could post their resumes on job sites, and have various items verifiable by a reader using a browser. We also permit an automated detection of broken links, and an easy manual or automated repair of those links.
  • ISP Internet Service Provider
  • plug-in to a browser
  • a given browser might one day incorporate the functionality ascribed here to the plug-in.
  • By a browser we also include other programs that can display documents in a hypertext markup language (including but not limited to HTML), and follow any hyperlinks in those documents. Plus, when we discuss the Internet, occasionally we will give examples of Internet addresses, in the IPv4 format. All our remarks also apply to IPv6.
  • the ⁇ ask> tag might have an optional port number, specified perhaps as
  • she might put a button before or after the tags (or possibly between them), such that the user can press it to ask for validation of that address. Or, optionally, if there are several addresses in the page (in general these are different from each other, though some could be the same), then she might have a button that will ask for validation of all such addresses.
  • the validations are only performed against addresses delineated by the above tags.
  • the site's web server might choose to perform such validations. That is, the web server incorporates the functionality of the Asker. Then, for those addresses which validate, the server might amend the pages to display these in some distinctive fashion. Or, for an address which had the validation tags, but then did not validate, it might be shown in another manner. And above, we said that Beth might have written in buttons next to each address to be validated, or perhaps a button for the entire page. But this might be done by the server, saving Beth some effort.
  • an Asker might be implemented as a Service Agent that runs on the user's computer.
  • the ISP can retrofit the above maintenance of a PL into its existing address book, assuming that it offers the latter to its users.
  • Dinesh can search his address book for a person with a surname containing a string that he supplies. Then Dinesh might enter the URI for his PL into a new address book contact, in the surname field.
  • the Checker later gets a query for a possible URI for Dinesh, it searches the surname field of Dinesh's address book, and then returns an indicator of whether any matches were found. In this outline, clearly other fields in the address book might also be used.
  • a page can be made of frames. So in the above, a frame might have a button, to validate only those addresses with tags that are in that frame. Or, a frame might have a button, which verifies those addresses with tags that are in another frame.
  • Dinesh might store a domain, like "b.com", in his PL. This can be taken to mean that he approves of all instances of his address in URIs at that domain or its subdomains. Or Dinesh might store a "negative" domain, like "-c.com", which can be taken to mean that no queries coming from that domain or its subdomains will be verified. Or Dinesh might have "b.com" and "-d.b.com". These mean that the Checker should verify for queries coming from b.com and all of its subdomains, unless they come from d.b.com and any of its subdomains.
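  • A sketch of how a Checker might evaluate such domain rules, assuming the most specific matching entry wins (that precedence is our reading of the b.com / -d.b.com example):

```python
# Hypothetical PL domain rules; a leading "-" denies a domain and its
# subdomains. The most specific matching rule decides the outcome.
PL_RULES = ["b.com", "-d.b.com"]

def _matches(rule_domain, query_domain):
    return query_domain == rule_domain or query_domain.endswith("." + rule_domain)

def verify_domain(query_domain):
    best_len, approved = -1, False
    for rule in PL_RULES:
        deny = rule.startswith("-")
        dom = rule[1:] if deny else rule
        if _matches(dom, query_domain) and len(dom) > best_len:
            best_len, approved = len(dom), not deny
    return approved

if __name__ == "__main__":
    print(verify_domain("x.b.com"))    # True: covered by "b.com"
    print(verify_domain("y.d.b.com"))  # False: "-d.b.com" is more specific
```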
  • the Checker could record, for each URI key in his PL, how many times it gets asked. And possibly other statistics, like when each request was made, and from which network address. The Checker might offer this data to Dinesh, maybe as a premium service.
  • the Checker can use the clear text URIs in Dinesh's PL to extract more information about his interests. This can be used to provide improved ad access to him, from advertisers that the Checker/ISP approves of. It could also use information about requests that come in, for Dinesh's username, but for URIs that are not in his list, in some fashion. For example, if there are many of these, did he inadvertently delete a popular URI from his PL? Or, more disturbingly, are there some unauthorized pages that refer to his address, and indicate that it is verifiable? The author of such a page might put in the above tags for verification, and possibly some visual indication as well, in the expectation that most readers will not actually ask for verification. In some ways, this should be more troubling to Dinesh than a conventional page that points to him, but does not use our verification tags.
  • the Checker might retain the URIs that are not in his PL, for some portion of time, so that he or it can do an analysis.
  • the Checker can use the clear text URIs across all its users' PLs to help it build a social network. Currently, these are made using other types of data or behavior. But with some or many users now making PLs, our method gives the Checker greater efficacy in making such networks.
  • a user with a PL might also disseminate it to other users, who are not necessarily at the same ISP.
  • the user's ISP might optionally offer a simple programmatic way for the user to do this. Though in the last recourse, a user could always manually write a message containing his PL and send it to others.
  • One motivation might be to use those PLs as extra data in making a social network that is not constrained to users at one ISP.
  • Such a link might be bracketed by <ask> and </ask>.
  • a custom attribute, call it "ask", might be put into an <a> tag, as in, for example,
  • image.py refers to an applet written in the Python language.
  • each such link might be a button to verify just that link. Or a button for the entire page (or frame) might do the verification for such links (and email addresses as already discussed). Where the button might go back to its web server to do this. Or perhaps invoke a script that is in the page. Or the browser may have a plug-in with this ability.
  • We call a program with this ability an "Asker", as the term was also used above.
  • An Asker asks (sends a query to) a Checker for information.
  • the Asker finds the link's base domain, D. Then, for each different D in the page, the Asker makes an array of the URIs containing it. For each D, it sends a request to a Checker at D, which contains the URI of the above page and the array of URIs that point to D.
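  • A minimal sketch of this batching step (a production Asker would find base domains with the Public Suffix List rather than the naive last-two-labels rule used here):

```python
# Group a page's verifiable links by base domain, one request per Checker.
from collections import defaultdict
from urllib.parse import urlparse

def base_domain(uri):
    host = urlparse(uri).hostname or ""
    parts = host.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else host

def build_requests(page_uri, link_uris):
    """Return {base domain D: (page_uri, [link URIs that point to D])}."""
    groups = defaultdict(list)
    for link in link_uris:
        groups[base_domain(link)].append(link)
    return {d: (page_uri, uris) for d, uris in groups.items()}

if __name__ == "__main__":
    reqs = build_requests(
        "http://a.com/review.html",
        ["http://b.com/p1.html", "http://shop.b.com/p2.html", "http://c.org/x.html"],
    )
    for domain, (page, links) in reqs.items():
        print(domain, "->", page, links)
```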
  • DoS Denial of Service
  • the linked object above is a page, but it could be any network addressable object.
  • Earlier, we mentioned an applet written in Python. It had a URI, so it could be used in the context of the Checker.
  • This link verification is expected to be mainly for outgoing links. It can also be used for incoming links, which are typically for loading images onto a page. An important case is where the incoming link is for downloading a file. It is a common mode for the spreading of viruses that they masquerade as innocuous files from a trusted domain. Our method offers a means to reduce such spreading.
  • an HTML opening tag like <body> might have a custom attribute, e.g., <body ask>, which means that for the scope of that tag, all enclosed links and email addresses should be verified, by default. This saves putting an explicit "ask" in every enclosed <a>, for instance.
  • a given tag that contains a link might have an exempt flag, e.g., "askNo", to exempt that link from verification.
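  • The default-scope rule and the exemption can be sketched as follows (the attribute names "ask" and "askNo" are from the text; the resolution order is our assumption):

```python
# Decide whether a link should be verified, given its own attributes and
# those of its enclosing tags (outermost first).
def is_verifiable(link_attrs, ancestor_attrs):
    if "askNo" in link_attrs:
        return False                  # explicit exemption wins
    if "ask" in link_attrs:
        return True
    # Otherwise inherit a default from an enclosing tag, e.g. <body ask>.
    return any("ask" in attrs for attrs in ancestor_attrs)

if __name__ == "__main__":
    body = {"ask"}
    print(is_verifiable(set(), [body]))      # True: inherited from <body ask>
    print(is_verifiable({"askNo"}, [body]))  # False: exempted
    print(is_verifiable(set(), [set()]))     # False: no default in scope
```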
  • An Asker might have the option of being able to try to verify all links in a page, whether those have been designated as verifiable or not. The user might thus be able to choose this option for extra information on the page's links.
  • the web server publishing a page with verifiable links might pre-verify these and change the page to indicate the results, in some manner. Likewise, it might put buttons into the page, next to each verifiable link, to implement the above testing.
  • a Checker were to store, for one of its given pages, or email addresses, the popularity of each entry in the PL for that page (or address), as measured by how many requests it gets for verification of those entries. Then, it might choose to reveal to an Asker, what the most popular or least popular entry is. Or even the entire list of partners for that page (or address) and the number of requests for each entry. The Checker might also reveal this to other programs, asking over the network.
  • Suppose the author of Alpha is investigating pages at Beta's website, possibly to link to them, as she is imagined to do with Beta.
  • Several useful features can be offered on that website, to help her easily see which links might be verifiable.
  • Each file can have a parameter, here implemented as an attribute - <html mightApprove> or <html wontApprove>. These values are mutually exclusive. The latter says a verifiable link to the file is moot, because the Checker will never approve it. So imagine editing software that is being used to write Alpha. If the author puts in a link to Beta, the program can go across the network and get Beta and look for the existence of the above attributes. Hence, it can indicate in some manner via its user interface, whether she should even put in a verify attribute.
  • the files index.htm and index.html can have the extra optional attributes - <html mightApproveDir> and <html wontApproveDir>.
  • mightApproveDir and wontApproveDir are mutually exclusive, and they apply to the other files in the directory.
  • using wontApproveDir here means that it is unnecessary to write a wontApprove attribute in the other files, while a mightApprove in one of those files will override this directory setting.
  • the files index.htm and index.html can have the extra optional attributes - <html mightApproveSubDir> and <html wontApproveSubDir>. These apply to subdirectories recursively.
  • index.htm or index.html might have a list of such files -
  • index file might have entries like these, to indicate which subdirectories could have files that might be approved for linking -
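  • Leaving the elided index entries aside, the override order among these attributes might be resolved as in this sketch (the precedence shown is our reading of the text: file attribute, then directory index, then ancestor index):

```python
# Resolve the approval hint for a file from the attributes described above.
def approval_hint(file_attr=None, dir_attr=None, subdir_attr=None):
    if file_attr in ("mightApprove", "wontApprove"):
        return file_attr                         # file-level attribute wins
    if dir_attr in ("mightApproveDir", "wontApproveDir"):
        return "mightApprove" if dir_attr == "mightApproveDir" else "wontApprove"
    if subdir_attr in ("mightApproveSubDir", "wontApproveSubDir"):
        return "mightApprove" if subdir_attr == "mightApproveSubDir" else "wontApprove"
    return None   # no hint; editing software cannot advise either way

if __name__ == "__main__":
    # mightApprove on the file overrides wontApproveDir in index.html.
    print(approval_hint(file_attr="mightApprove", dir_attr="wontApproveDir"))
    print(approval_hint(dir_attr="wontApproveDir"))   # wontApprove
```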
  • the value is the name of a tag in the page given by the href value.
  • the Asker goes to the page and looks for a tag with the askLabel's value.
  • This value can be that of an HTML tag or of a custom tag. If it is of an HTML tag, then the value is case insensitive. Because though official HTML style guidelines recommend that HTML tags be written in uppercase, most browsers accept any case combination for these tags. If the value refers to a custom tag, then it could be case sensitive or not.
  • the askLabel can be assumed to be present, with a default value of "title". This refers to the HTML <title>, which is optionally present in the <head> section of an HTML document.
  • the only valid HTML tag name accepted by the Asker for askLabel would be "title”.
  • the Asker could do the following. It loads Beta and looks for opening and closing tags with the name given by askLabel's value. If the tags are missing then the link label is unverified. Else if the tags exist and the enclosed string is not the same as the link's label in Alpha, then the link label is unverified. Else the link label is verified. The string comparison should be case sensitive.
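  • Those steps might look like the following sketch (the regex tag scan is our simplification; a real Asker would use an HTML parser):

```python
import re

def verify_link_label(beta_html, ask_label, alpha_label):
    """True only if Beta contains <askLabel>...</askLabel> whose enclosed
    string equals, case sensitively, the link label used in Alpha."""
    pattern = re.compile(
        r"<{0}[^>]*>(.*?)</{0}>".format(re.escape(ask_label)),
        re.IGNORECASE | re.DOTALL,   # tag names are matched case-insensitively
    )
    match = pattern.search(beta_html)
    if match is None:
        return False                 # tags missing: label unverified
    return match.group(1) == alpha_label   # enclosed string: case sensitive

if __name__ == "__main__":
    beta = "<html><head><title>BSc, 2004</title></head></html>"
    print(verify_link_label(beta, "title", "BSc, 2004"))  # True
    print(verify_link_label(beta, "title", "PhD, 2004"))  # False
```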
  • An Asker could also have the extra ability to find the correct value of a link label, from Beta, and then, if the value in Alpha is wrong, to produce a new Alpha page, with the correct value. And also perhaps to be able to store the new, correct page. If the Asker is on Alpha's website, and being run by someone with write authorization, then she might tell the Asker to write a new, correct Alpha.
  • Alpha is a dynamic spreadsheet. It has a table, with several cells having links to pages at other sites. Where the technique of this section is used to have the link values be derived from special tagged values on those pages.
  • the person who views Alpha can have an Asker that can dynamically verify the accuracy of the table's cells, and possibly correct any wrong values.
  • the response of Alpha to the user changing parameters might be slower than that of a conventional spreadsheet.
  • the advantage here is that Alpha has a dynamic dependence on remotely distributed inputs. And the people maintaining those linked pages can just use standard HTML to maintain them. With custom tags to isolate the values that Alpha will use. Possibly, their websites might run Checkers, if link approval is seen as a desirable feature in Alpha, by those who will use it.
  • Suppose that in Beta there is a tag called "garlic", with some value between its opening and closing instances, that should be the same as the above value in Alpha.
  • the Asker might have the ability to show to a user such values in a distinctive manner in Alpha.
  • the notation is arbitrary, but it mimics and is a simplification of the standard XML notation for denoting different XML namespaces.
  • “garlic” and “porridge” are assumed to be the names of tags in Beta.
  • the notation also lets us intersperse values from different linked pages.
  • link labels should be generalized to include checking the values of custom tags in a page, where those tag values are derived from tags in linked pages.
  • Consider a page Rho that is in Phi's PL.
  • W's web server or Asker might go to the link in Rho that points to Phi, download that page, see Rho in the PL, and then amend Rho to include some indication that the link was verified, as well as possibly put in a button to allow verification. Though, as earlier, it would probably only follow the link in Rho to Phi if the link had tags to indicate that it was verifiable.
  • inline tells W's server to first look in the linked page. While if we are using an ask attribute, we might just use an extra attribute, like, for example -
  • a website that hosts pages with inline PLs can also run a Checker.
  • the simplest way is that the files with inline PLs are not dealt with by the Checker.
  • Another way might be that a file might have an inline PL, and then another PL in another file, as discussed in the previous paragraph, and then the Checker might be consulted.
  • In the file might be a code defining the order of lookup, which could vary from that example in the previous sentence.
  • An analogy can be made to the file /etc/nsswitch.conf, which is found on many Unix or Linux machines. This defines the order in which the databases for various system processes are searched.
  • Inline PLs give an incentive for a future browser (or plug-in) to implement a bi-directional web, by allowing the display of such PLs if they exist.
  • a future browser or plug-in
  • the above method of writing a PL in special tags renders it invisible to a standard browser, since browsers have always been designed to skip tags they do not recognize. This is highly desirable, because it means that an implementation of our inline method does not cause any visual change to existing pages that are being linked to. But it is clearly foreseeable that a future browser might arise to take advantage of the extra information in a PL, and display it in some manner, and let the user select links in it.
  • One simple implementation might be for that browser to make a new page with 2 frames. Then it puts the page to be displayed in one frame, which is the main frame. The second frame shows the PL from that page. Clicking on a link in the PL causes that new page to be fetched and shown in the first frame. And the second frame gets erased if the new page has no PL, or it gets the PL of the new page.
  • the web server might still keep tabs on the requests for each entry. Hence, even for static files, it might periodically overwrite these, by using a simple XML extension of the example, to write the number of requests for each entry. If so, an Asker or browser might have the ability to read and display this extra information.
  • a phisher user might write a verify button that does not actually verify, for example indicating that an email address or link has already been verified when it has not. Thus, the user should not have the ability to write all of the web page.
  • the website might confine the user's content to a frame, for example, and the website writes its own content outside that frame. The latter having a verify button that operates on the user's content.
  • Theta is a page at an ISP.
  • a user Jane
  • the URI has the base domain, somewhere.com, and then to the right of it, what amounts to a unique string that identifies the message currently being shown in her browser.
  • the Checker then replies true or false, or codes to that effect, if its records show that the user dinesh performed the above.
  • the hash might be of the body+from or body+from+to+subject or similar arrangements.
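  • For instance, a sketch of such a hash (SHA-256 and the concatenation order are our assumptions; the text only names the fields):

```python
import hashlib

def message_hash(body, frm, to="", subject=""):
    """Hash of body+from (optionally +to+subject) identifying a message."""
    material = (body + frm + to + subject).encode("utf-8")
    return hashlib.sha256(material).hexdigest()

if __name__ == "__main__":
    print(message_hash("Hello Jane", "dinesh@somewhere.com"))
```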
  • a better approach is that the recipient gets to decide whether to verify a given message or not. This is a manual operation, and the human reaction time limits how many queries a person can ask for, in a given time period.
  • Beta's author can use the Checker to see a list of which other pages link to Beta, and which desire an approved link. That is, the approval or not is a manual operation. But the Checker can optionally perform steps to present the author with extra information, that she might use in her decision.
  • the Checker can analyze the linking page, Alpha. It can extract the other links and their base domains. And test if any of these are in a blacklist. If so, then it might count up the number of such links. This information can be given to the author. Because if Alpha is suspected of being associated with a spammer domain in the blacklist, say, then perhaps Alpha's author is trying to burnish Alpha's credibility by having an approved link to a good page, Beta.
  • the Checker might spider to find other pages at Alpha's domain. While this entails more work, it can all be done programmatically. Then, the Checker can apply the above test to those pages, and report these.
  • the Checker can check Alpha's base domain itself, to see if it is in a blacklist. Or possibly in a neighborhood of an entry in the blacklist.
  • For a page Beta, the Checker might hold four sets of URIs:
  • A: URIs of pages with verifiable links to Beta, that it approves of.
  • B: URIs of pages with verifiable links to Beta, that it disapproves of.
  • C: URIs of pages with (normal) links to Beta, that it approves of.
  • D: URIs of pages with (normal) links to Beta, that it disapproves of.
  • Sets C and D might be found by perhaps consulting a search engine to find which pages it knows about that link to Beta. As discussed above, so too might some pages that go into A and B, if the authors of those pages do not contact Beta's author for inclusion.
  • Any of these sets might be empty, perhaps because the Checker did not decide to hold such data. It is also possible that two nonempty sets might have contradictory information. Imagine if A had a page Chi, and so did B. This indicates an error, perhaps in the manual construction of those tables. A Checker might optionally, but probably preferably, have algorithms to detect such contradictions and alert its operators, and possibly not promulgate such results to the network. Any Asker that asks a Checker should be aware that the latter might not run such checks. So the Asker might have some optional filters to detect such contradictions. There is an element of redundancy here. But in a distributed network, it may be necessary.
  • Another implementation might be that the reply is stored as characters. In which case, a natural implementation would be to map the above 4 bit replies into the hexadecimal characters, '0'-'9' and 'A'-'F'.
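  • A sketch of that mapping, assuming one bit per set in the order A..D (the bit order is our assumption):

```python
# Pack the four set-membership answers (A..D) into one hex character.
def encode_reply(in_a, in_b, in_c, in_d):
    bits = (in_a << 3) | (in_b << 2) | (in_c << 1) | int(in_d)
    return format(bits, "X")              # '0'-'9', 'A'-'F'

def decode_reply(ch):
    bits = int(ch, 16)
    return tuple(bool(bits & (1 << i)) for i in (3, 2, 1, 0))

if __name__ == "__main__":
    print(encode_reply(True, False, False, False))  # '8': in set A only
    print(decode_reply("8"))                        # (True, False, False, False)
```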
  • An Asker might reside as a plug-in in Jane's browser, or in the web server of a website. In either case, suppose it has one or both of a whitelist and a blacklist of URIs and domains.
  • the Asker might have options that are shown to, and adjustable by Jane, that control what it asks for, when looking at a page or message. These include, but are not limited to -
  • the whitelist might come from antiphishing methods, including those in our Antiphishing Provisionals.
  • the blacklist might come from antispam or antiphishing methods, perhaps performed by one or more ISPs.
  • An Asker can have methods to prevent auto verification.
  • the last choice above might be the default setting.
  • the Asker's user might be able to see its choice, and to change it.
  • An Asker might also be able to show the user any links that violate this test.
  • An Asker might have an option, selectable by its user, of testing the validity of any digital certificates on the Checker. Note that there is little need for an Asker to be able to do this with the page that is currently being viewed on the browser. Because most browsers can already do this.
  • An Asker's query might be formulated as an HTTP GET or POST (instead of using the XML format we have described). And sent either to the same port at a Checker that is used for HTTP, or, preferably, to a new port. This might be if the implementation of that Checker is compatible with that of a standard HTTP web server. Its reply would be in HTTP and enclose an HTML message. In such an event, the information going to and from the Checker would be functionally equivalent to what has been discussed earlier.
  • a given Checker might support both formats, so that this is moot to an Asker.
  • an Asker might have prior knowledge of that Checker's preferred format. Perhaps from previous queries to that Checker. Where the Checker might return a message indicating its preferred format, and the Asker stored that information about that Checker, for use if the user were to want another verification from the Checker.
  • Suppose the Asker is associated with a website, rather than with a plug-in. It might keep statistics on which of its pages users most often verify, and associated data.
  • the Asker might/should have the means of displaying the comment and address to the user. Plus a means of letting her easily pick the address and have the browser go to it.
  • Let Beta be a page at a website of a bank, bank0.com.
  • Alpha is written by a critic or competitor of the bank.
  • Alpha's author included a link to Beta. This link need not necessarily be marked as verifiable.
  • the bank has learnt of Alpha, possibly via a search engine.
  • the bank might want to write a page, Gamma, that is a direct reply to Alpha.
  • Beta's PL now includes ("disapprove", C, Gamma), where C is a comment.
  • This is not the case where Alpha is a page on a pharm, that might be pretending to be the bank or an affiliate of the bank; in such cases, the bank would take stronger measures, like contacting the government and Alpha's network service providers, to have the pharm taken offline. Rather, Alpha is imagined to be a validly posted page, that has a right to be posted. If so, then the bank has the same right to reply to Alpha, and our technique here lets the bank easily do this.
  • Beta "Referer” [sic] header line in the http request said it came from Alpha. Then the server can use Beta's PL. It finds that Alpha is a key in the PL, where we imagine the latter implemented as a hash table. Corresponding to the key is the above triplet. Hence, instead of returning Beta, the server returns Gamma. Now, many current web servers have an ability to customize what page they return, based on the page that the requester's browser came from. But what we have shown here is how our method can produce a PL that has several uses. Not just in our method, but also by the web server.
  • Suppose a verifiable link in a page points to a URI whose domain is one.two.base.com.
  • the query was considered as going to a Checker at base.com.
  • An alternative is that the query go to a Checker at one.two.base.com. If this does not exist, then a query might be sent to two.base.com, and if no server were there, then a query might be sent to base.com.
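  • A sketch of that fallback walk (contact_checker is a hypothetical stand-in for the real network query; it returns None where no Checker answers):

```python
def find_checker(host, base, contact_checker):
    """Try the most specific host first, then walk up toward the base domain."""
    labels = host.split(".")
    base_len = len(base.split("."))
    while len(labels) >= base_len:
        candidate = ".".join(labels)
        reply = contact_checker(candidate)
        if reply is not None:        # a Checker answered here
            return candidate, reply
        labels = labels[1:]          # e.g. one.two.base.com -> two.base.com
    return None, None

if __name__ == "__main__":
    fake = lambda d: "approve" if d == "base.com" else None    # only base.com answers
    print(find_checker("one.two.base.com", "base.com", fake))  # ('base.com', 'approve')
```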
  • a website W with users who login to it. A user, Mary, can maintain a Partner List.
  • Each item in the PL is a URI or (in general) a network address of somewhere on that network (usually the Internet) with a page that links to Mary's address at W. And Mary approves of that link.
  • W runs a Checker. It gets a query.
  • An example might be
  • Phi is the URI of the outside page, with a link to Mary at W. She might have an alias, as shown in the second example. Assume that both her user name and alias are unique, amongst W's users. In the above, there might also be several URIs in the ask. Assume one, without loss of generality.
  • the url tag might have an attribute to ask about the expiration time, if it exists.
  • Mary might also have various attributes, and these might be asked for. Like for example, asking for all attributes she might have - <ask>
  • the reply by the Checker might list the attributes in various ways. One is to have the order implicitly be the same order as the attributes in the ask. So we might get ... <attributes>
  • the reply might also list each attribute name and Mary's value for it.
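  • Since the example queries themselves are elided in this text, the following is only a guess at the shape of such an exchange, reusing the <ask> and <url> tag names mentioned above:

```python
import xml.etree.ElementTree as ET

def build_ask(page_uri, user):
    """Construct a hypothetical ask about one URI and one user at W."""
    ask = ET.Element("ask")
    ET.SubElement(ask, "url").text = page_uri
    ET.SubElement(ask, "user").text = user
    return ET.tostring(ask, encoding="unicode")

def parse_reply(reply_xml):
    """Collect attribute name/value pairs from a hypothetical reply."""
    root = ET.fromstring(reply_xml)
    return {a.get("name"): a.text for a in root.iter("attribute")}

if __name__ == "__main__":
    print(build_ask("http://phi.example/page.html", "mary"))
    reply = '<reply><attributes><attribute name="degree">MSc</attribute></attributes></reply>'
    print(parse_reply(reply))   # {'degree': 'MSc'}
```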
  • Our method lets a game site condone it, and by operating a Checker, facilitate such transactions.
  • the reason is that currently, on an auction site, a potential buyer has to take a seller's word about whether she is even a member of a game site. Let alone how much experience she has amassed with her character.
  • the main protection the site offers is usually a feedback rating of that seller for previous auctions.
  • the company might let its graduates be members of one of its groups, with network access. Mary can write up her auction description and post it, and find its URI. Then, she can login to her special (limited functionality) company account and add that URI to her PL.
  • the attributes are not directly changeable by her - like her graduation details. The company writes these attributes. But this can be done once, at graduation. From then on, there is little manual intervention needed by the company. Naturally, it might choose to charge Mary for this extra, post-graduation service.
  • Beta is a link to a page at uni-x.edu. This page has -
  • the Asker checks that the degree and date in her resume match those in the official listing. Plus, a Checker can confirm that Mary's link to Beta is approved. This is necessary, otherwise someone else claiming to be Mary might put up a fake resume page.
  • the Checker approval follows some steps that Mary is assumed to have done, outside the scope of this invention, to prove her identity to the university.
  • the checking of her link label prevents her from "upgrading" her degree on her page, to a Masters or PhD, say, after her page has been added to Beta's PL.
  • The tag value used in Beta should be unique within the scope of Beta, and should have no other significance outside that page. Specifically, it should not be her former student identification label.
  • Beta is presumed to be publicly readable. So long as the tags follow this recommendation, then the tags do not give out any personal information.
  • our method envisions that any organization with information about its members or past members, might choose to implement it, to better serve these members.
  • the ongoing manual effort is by members logging in to maintain their PLs.
  • the information that is made verifiable is done so by the explicit activity of those to whom the information refers, which addresses any privacy concerns.
  • the merit of our method is that it simplifies this ability, and essentially lets any user who has a website or electronic address at a location that runs a Checker act as a Validator. If a Checker is running, along with Askers in users' plug-ins and possibly in the validated website, then it enables validation that is much simpler to operate, and cheaper.
  • the Checker can have the option of dropping (i.e., not replying to) a request from an Asker. This might be if that Asker has made too many requests in some given time. Or if the Asker is coming from a network address that is on the Checker's blacklist (if the latter exists). Or for any other reason. The main reason for this is to protect a Checker against attacks, like Denial of Service; some of which might involve trying to swamp the Checker.
  • any implementation of an Asker should have the ability to recover from sending a query to a Checker, and not getting a reply.
  • the Checker has a blacklist of spammer domains. It can have a policy of automatically replying "disapprove” to any ask it gets about an URI Phi in that blacklist that points to one of the Checker's pages (or email address of a Checker's user). This might override any entry for Phi in one of its page's PL or in one of its user's PL.
  • the Checker might apply other analysis to its page or user, if either had Phi in its PL. In part, it could suspect that the author of the page, or the user, might be involved in spamming or phishing.
  • the Checker works at a website that is an ISP. So a spammer might open a mail account, and on her (spam) domain, have pages that try to verify to the spammer's account at the ISP. This helps protect the Checker/ISP, from possibly being blacklisted by other ISPs, if they associate the Checker's domain with the spammer.
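  • A sketch of the blacklist-override policy above (domain names illustrative): the blacklist is consulted before any PL entry.

```python
from urllib.parse import urlparse

BLACKLIST = {"spammer.example"}

def answer(phi_uri, pl):
    """'disapprove' on a blacklist hit, overriding any PL entry for Phi."""
    host = urlparse(phi_uri).hostname or ""
    if host in BLACKLIST or any(host.endswith("." + d) for d in BLACKLIST):
        return "disapprove"
    return "approve" if phi_uri in pl else "disapprove"

if __name__ == "__main__":
    pl = {"http://spammer.example/page.html"}
    print(answer("http://spammer.example/page.html", pl))  # disapprove, despite PL
```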
  • Suppose Beta's website has a Checker. If visitors to Alpha use their plug-ins to verify, then in general, Alpha does not know this, because a plug-in would use the instance of Alpha that has been downloaded to its browser, and the Asker would talk directly with the Checker, bypassing Alpha's server. But periodically, the Checker might send statistics about those verifications that came to it about Alpha, to Alpha's server. This is a Reporting Service for which the Checker might bill that server, which might then combine that information with its various server logs, to try to discern more information about its visitors. This could be valuable if Alpha is one page of a commercial website that performs transactions.
  • a future web server might incorporate some or all of the functionality of a Checker.
  • BME Envelope
  • Each BME would have metadata in various metadata spaces, like domain, hash, style, relay and user.
  • the Ask Space is a data space that consists of those ask queries that go from an Asker to a Checker. Call an entry in this space an "ASK". It has various metadata. Most importantly, the two URIs, call them Alpha --> Beta. Unlike the BME's metadata, these two form a directed edge, in the language of graph theory. From each URI, we can extract their domains and base domains. If the latter are d1 and d2, then we have d1 --> d2.
  • Suppose a request to Beta's web server to return Beta gets to a redirector, which then goes to another address, Gamma, possibly at a different base domain. Then we have Alpha --> Beta --> Gamma. If we reduce down to base domains, and Gamma is in the same base domain as Beta, then as before, we would just have two entries, d1 --> d2.
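  • A sketch of reducing ASKs to base-domain edges, including the redirect case (the naive base-domain rule is for illustration only):

```python
from urllib.parse import urlparse

def base(uri):
    parts = (urlparse(uri).hostname or "").split(".")
    return ".".join(parts[-2:])

def ask_edges(asks):
    """asks: iterable of (alpha_uri, beta_uri, gamma_uri-or-None) entries."""
    edges = []
    for alpha, beta, gamma in asks:
        edges.append((base(alpha), base(beta)))       # d1 --> d2
        if gamma and base(gamma) != base(beta):
            edges.append((base(beta), base(gamma)))   # redirect left d2
    return edges

if __name__ == "__main__":
    asks = [("http://a.com/p", "http://b.com/q", "http://cdn.b.com/q2")]
    print(ask_edges(asks))   # [('a.com', 'b.com')] -- Gamma stayed inside b.com
```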
  • the Ask domain clustering can be used for various purposes. Like observing how user behavior changes over time, as indicated by what items verifications are asked for.
  • Another way to make a cluster of spammer domains is to find a spammer domain, by whatever means. Then use the method of this invention to find all domains that approve of it, and put those domains and the original domain into a cluster.
  • a search engine finds various clusters of domains in Ask Space. It could send (or sell) this information to an antispam organization or an ISP. This could be a continual process, as new lists of Ask Space clusters could be made at regular time intervals, to reflect changing links and user behavior.
  • C be a cluster of domains in Ask Space
  • Let f and g be domains in C, where f links to g, and g approves of the link.
  • From the ISP's incoming or outgoing messages it can find clusters of domains, possibly using our method of "1745".
  • f is in one such cluster, D. If g is not in D, then the ISP might add g to D.
  • the proposed added domain might be subject to an exclusion list maintained by the ISP.
  • If f and g are already in an existing cluster D, then they need not necessarily be directly connected. So the f --> g association from C can be used to make such a new connection within D.
  • the ISP might perform the above actions for some or all domains in each Ask Space cluster, and across some or all Ask Space clusters. After which, it can try to coalesce its clusters. Because a domain which was added to a cluster might already exist in another cluster.
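  • The coalescing step can be sketched with a union-find structure (our choice of data structure; the text only says the clusters should be merged):

```python
# Merge domain clusters as f --> g associations arrive.
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path compression
        x = parent[x]
    return x

def union(f, g):
    parent[find(f)] = find(g)

if __name__ == "__main__":
    for f, g in [("a.com", "b.com"), ("c.com", "d.com"), ("b.com", "c.com")]:
        union(f, g)                      # the last edge coalesces two clusters
    print({d: find(d) for d in parent})  # all four domains share one root
```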
  • the Asker is a website, and the user is looking at a page on that website, and then asks the Asker to ask the Checker.
  • the Checker might require from the Asker the network address of the user. So it might want a query like this example,
  • When an Asker starts up, it could scan its pages, looking for the verifiable links. Then it might contact the Checkers at these links, asking, in part, for whether they want this information or not. There is also the possibility that a Checker can change its policy, and then contact its Askers to inform them of this change. Where perhaps an Asker could expose a listener or Web Service at one of its ports, for this purpose.
  • the Asker does not have to give the Checker that information. Perhaps the Asker is making a voluntary policy choice that it believes is protecting the privacy of those using it. This might also be a function of the physical location of the Checker, insofar as that can be determined from its network address. Or it might also be a function of external knowledge about that Checker, or knowledge of the Asker's past interactions with that Checker. For example, the Asker might suspect that the Checker and its website are associated with the dissemination of spam, or the hosting of spam domains. Or, more broadly, that the Checker is located in a country with a high incidence of fraudulent websites. While it may seem like an Asker is making sweeping generalizations, in general, it may have leeway to do so, whether its assessments are correct or not.
  • the Asker is located in a region with government regulations that prohibit it giving the Checker that information. Conversely, the regulations might mandate that it hand over such information. In either instance, the regulations may also be a function of the network address or perceived physical location of the Checker. For example, it might be permissible to hand over the information only if the Checker is in the Asker's country.
  • the Checker may have reasons of its own to ask for the user's address. (And we will discuss some of these below.) But in some regions, regulations might mandate that it must ask for such information, or perhaps that it must not. And this may also be based on the Asker's network address or perceived physical location.
  • our method allows for a policy enforcement on the actions of the parties, where the policies might be set by some combination of government regulation, corporate decisions or individual decisions.
  • the user's network address is not necessarily that private, with respect to the Checker. Often, the user might click on the link, and thus presumably get the page at the Checker's website. Thus, the Checker or its associated web server knows her address, because that is where it sends the page to. This might happen after she sees the Checker's reply, as shown by the Asker. Or, it might even happen before. Where perhaps she clicked on the link, got the Checker's page loaded into her browser. Then she went back on the browser to the previous page, and asked the Asker for information.
  • an Asker could, but does not have to, reveal the user's address to the Checker.
  • the Asker might ask the user to manually approve it sending out the user's address. And apply this.
  • the Asker might refrain from asking the Checker, and tell the user that the link could not be verified for this reason.
  • the user might set a default policy of always yes or always no.
  • the Asker might have its own policy, which might not involve consulting the user or applying the user's policy.
  • the Asker can choose to give (or perhaps be required to by regulation) a false address to the Checker.
  • the Checker might be able to detect this in a statistical sense.
  • the Asker often asks the Checker and furnishes various addresses. Then, often soon after (or soon before), the Checker's page in question gets downloaded, to addresses different from those furnished by the Asker. Especially if these time intervals are short compared to the average time between page downloads seen by the Checker.
  • the Checker may have various policies to decide what to do if it detects suspected false addresses.
  • If the Asker gets repeated requests for verification of the Alpha --> Beta link, inside the Alpha page, and it hands over real addresses to the Checker, then the Asker might have heuristics to try to see patterns in the Checker's replies. For example, if the Asker notices that all asks from 10.20.30.* get verified, and all asks from other addresses do not get verified, then it might infer that the Checker is approving based on that address range. But the Asker can never be fully sure that this is the entire Checker rule set. Because the Asker might never have gotten a request from the 40.55.*.* range, say; and for this range, the Checker will verify. Thus the Asker cannot get a full deterministic result.
  • the Asker can actively experiment by probing the Checker with false addresses, to perhaps map out possible ranges of addresses that the Checker will or will not verify. Though if it wants to avoid the Checker detecting this, it should probe only within the statistical envelope of actual queries received by it, and do so in a pseudo-random fashion.
  • a Denial of Service (DoS) attack might be happening against the Checker.
  • the attacker is coming from one or more browsers and using the Asker to directly hit the Checker.
  • the addresses given by the Asker to the Checker might help the latter locate the attacker.
  • the Checker and its associated web server might block any http request from those addresses. Because these might be a direct DoS attack on the web server.
  • the Checker could also tell any router it controls to block incoming messages from those addresses. And maybe inform any upstream routers that it does not control, to ask them to do likewise. Plus, when we describe blocking addresses, the Checker might also block neighborhoods of those addresses. Where it determines these according to some heuristics.
  • the Checker can also tell the Asker that it (the Checker) is under attack. Perhaps the Asker already thinks it (the Asker) is under attack, due to a high frequency of messages coming to it. Or perhaps not, if the attacker is contacting various Askers with requests that point to the Checker, so that each Asker might not think that it is under attack, and hence passes on those requests. Whereas if the Checker can get (accurate) information about the addresses of the requests, summed over all its Askers, then it might be able to detect patterns in the address distribution or frequencies. Hence, it might ask the Askers and their routers if they could block such requests.
  • the browser is showing page Alpha, with a verifiable link to Beta.
  • the Checker might also ask the Asker for what address or domain the user was at, when coming to Alpha. If the Asker is a plug-in for the browser, then it has access to this information from the browser, since this is what the Back button on the browser uses. If the Asker is on Alpha's website, then the http request to Alpha's web server might have the Referer field. This sometimes has the domain that the user came from, to Alpha. The field could be blank or it could be misleading. In general, Alpha's Asker cannot tell if a non-blank field is true or not.
  • servers on the Internet might have lists of various types of accredited companies or organizations.
  • the companies might be banks, chartered by various governments. Or recognized schools or universities.
  • Etc. An Asker might query these, in addition to asking a Checker. Or perhaps instead of asking that Checker; depending on the reply from the server. This is useful, in part, to stop a phisher starting up a fake bank, that has a domain name similar to an established bank. And then running a Checker (and possibly an Asker) for the fake domain, in order to try to visually mislead users.
  • the above has involved a manual effort by a user adding a URI to her PL at some website where she is a member.
  • This can be generalized.
  • the website, Gamma, could have procedures, online or offline, so that a person who satisfies them could add a URI of a web page at another website, that links to an address at Gamma.
  • an external query could get a reply from Gamma's Checker, that gave data, perhaps in the format suggested above for the attributes of a user. But here, the person need not necessarily have a login at Gamma.
  • Phi is a page at an auction site
  • Dinesh is trying to sell his car.
  • In his item description he writes a verifiable link to the DMV's website. This may be instrumental in getting him more bids, and higher valued bids, compared to car sellers on that site without such aids.
  • V selling an item T on its website.
  • V obtains T from company Y, which has a website, y.com.
  • V makes a page with URI Alpha, that offers T for sale.
  • Alpha has a link to a page at Y, Kappa.
  • T has a unique id.
  • V contacts Y and asks Y to put Alpha in its PL for Kappa, such that when an Asker comes to Y's Checker, the latter is able to attest that V is selling T, with that unique id, and T came from Y.
  • a variant is if T is not distinguished by a unique id, but by a model number. Y's Checker may then attest that V is approved to sell an instance of, or some number of instances of, that model, which came from Y.
  • her verify query should be done from her plug-in, and not from any verify button or facility offered by V. Again, here our method helps protect her against pharms offering perhaps non-existent merchandise.
  • Y in turn has obtained T from a reseller Z.
  • Y's page for T could similarly provide a verifiable link to Z. This can be continued recursively, all the way back to the manufacturer. Then, if T is made from several parts, this procedure could in turn be followed for the suppliers of those parts.
  • a chain of authentication that is programmatically available to a potential end user of T. This can be interpreted as an extension of supply chain management, where manufacturers, resellers and retailers integrate their electronic procedures, to track inventory at each stage of distribution.
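  • A sketch of walking such a chain (ask_checker is a hypothetical stand-in for the per-hop query to each supplier's Checker):

```python
def verify_chain(chain, ask_checker):
    """chain: list of (page_uri, supplier_page_uri) hops, retailer first."""
    for page, supplier_page in chain:
        if not ask_checker(page, supplier_page):
            return False, supplier_page   # the chain breaks at this supplier
    return True, None

if __name__ == "__main__":
    chain = [
        ("http://v.com/item", "http://y.com/kappa"),      # retailer -> supplier
        ("http://y.com/kappa", "http://maker.com/spec"),  # supplier -> manufacturer
    ]
    approved = set(chain)   # pretend every hop is in the relevant PL
    print(verify_chain(chain, lambda p, s: (p, s) in approved))  # (True, None)
```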
  • Our method can also be used to verify analyses of various types. Including, for example, appraisals on the authenticity or value of an item. Imagine some real estate being offered for sale on a website. It might say in the text of the page Chi that it has been appraised by appraiser Marcus at some value M1, and appraised by appraiser Jody at another value J1. Chi could have links to those appraisers' websites, and where they, separately, have added Chi to their PLs. Also, Chi might describe a structural/engineering analysis of the property by Susan, with a link to her website and where she has added Chi to her PL.
  • our method has described use cases where items are offered on an electronic marketplace, like an auction or seller's website. But our method can also be extended to a physical marketplace. Imagine such a marketplace with customer Jane walking through it. She sees a desirable item, and looks it up on an electronic device she is carrying, that has wireless communication ability. A type of a cellphone or PDA, perhaps.
  • the marketplace has a transceiver that broadcasts what is effectively a web page on this item. (And broadcasts other pages for other items.) The page gives more information about the item. And it might have verify tags that let her verify various details about it, like who supplied it to the marketplace.
  • a variant is that instead of, or in addition to, the marketplace having a centralized transceiver, an item might have the equivalent of its own transceiver. Imagine perhaps an item that already has a built-in transceiver, for normal use by an end user. Like a car, or a laptop computer.
  • Another variant is that an item might have the equivalent of a passive, physical tag. When Jane queries it with her device, it replies with enough information for her to perform a verification.
  • the hypertext markup language used to describe the item is not HTML.
  • it might be the Wireless Application Protocol, which is expressly designed for wireless devices with small screens.
  • Our method permits a simpler, universal approach to extending the functionality of the pages. It could be used by G to run a Checker, insofar as constructing PLs for its pages. Then, perhaps the Checker does not answer queries from Askers. But the PLs can then be naturally used when an http request comes in for those pages, to decide whether a given page should be returned to an address that is in the page's PL.
  • Our method has mostly been concerned with being able to verify some types of information related to a seller. Like data about an item the seller is offering, or possibly data about the seller herself/itself. We can also make similar statements about a buyer. She might post a web page requesting to buy an item. This page might have verifiable links to independent third parties, that approve the placement of these links on her page. And where these third parties can attest to facts or opinions about her.
  • escrow companies for online auctions might also use our method to improve their credibility. There have been instances of fake escrow companies, that essentially accept money from buyers, and then disappear.
  • Another context is that of online dating or socializing. There is merit in being able to verify some claims posted by a person about himself in a dating website, say. By providing bona fides in the manner of our method, he may be able to attract more responses.
  • An extension of our method involves two programs, on different computers, that take part in transactions. Without loss of generality, imagine that these are implemented using Web Services. Let one program/computer be Alpha and the other Beta. Suppose Alpha needs certain resources. This might be extra computing power to run a program that Alpha has, for example. Or, Alpha might be an automated purchasing agent, looking to buy certain software or hardware items. Beta claims to offer the resources or items Alpha is looking for. We assume that if Alpha decides to go with Beta, then the transaction can take place in some programmatic fashion.
  • Alpha needs some assurance that Beta is qualified.
  • Alpha could read Beta's message, with those tags, and then make a query to such a third party, Gamma. Assume that Gamma runs a Checker, with the equivalent of the Partner Lists described above. Gamma could reply "true" to Alpha if it "approves" of Beta's reference to it.
  • Gamma might be some third party that is credible to Alpha.
  • a search engine deals mostly with web pages, and the links between them. It is these links which are typically the main determinant in how to rank search results. Typically, the more pages that link to a given page, the more relevant or "important" the latter is presumed to be. But this use of links can be considered a static analysis. Even though many pages are being continually created, destroyed or changed, that is still largely true. What would also be useful to the Engine is data on actual user browsings. That is, on actual page viewing. To some extent, the Engine can use the queries it gets from users, as an indication of general user interest. And, if the Engine hosts ads on the pages showing results, then often these ads are links that first go back to the Engine, which redirects to the desired destination. So from the user clicks on the ads, the Engine can get additional data on user interest.
  • the basic problem facing the Engine is that it has no direct access to users' browsers, when those users are surfing the Web. (although links can be found by
  • Define the Verify Score (VS) of a page with URI Alpha as a measure of how many pages it links to, and that approve of those links.
  • the VS might also be modified by taking into account the VS, or some other ranking, of those pages it links to, in some self-consistent formulation. For example, if the page that is pointed to has many pages that point to it, then this might increase a weighting of its approval of the link from the original page we were looking at.
  • a VS of a page is easy to find, and can be done deterministically. Unlike, for example, Google's PageRank™, which depends on finding the pages that link to a given page. As noted earlier, that can only be done statistically, by performing as comprehensive a spidering of the Web as possible.
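  • A sketch of computing a VS, with an optional weighting hook for the self-consistent variant above (the weighting rule is illustrative):

```python
def verify_score(outgoing_links, is_approved, weight=lambda target: 1.0):
    """Sum a weight over the page's outgoing links whose targets approve."""
    return sum(weight(t) for t in outgoing_links if is_approved(t))

if __name__ == "__main__":
    links = ["http://b.com/x", "http://c.org/y", "http://d.net/z"]
    approved = {"http://b.com/x", "http://c.org/y"}
    print(verify_score(links, approved.__contains__))           # 2.0
    pop = {"http://b.com/x": 3.0, "http://c.org/y": 1.0}        # hypothetical popularity
    print(verify_score(links, approved.__contains__, pop.get))  # 4.0
```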
  • the Engine can show the most popular Checkers it knows, and for each, the pages that are the most popular approvals. Analogous to showing popular search queries.
  • the Engine can also spider Askers which are located at websites, where an Asker might have an API or Web Service. The Asker might reveal which are the most popular Checkers that its visitors call for verification. And it might reveal which of its pages correspond to these requests.
  • the Engine can let its users search for Askers and Checkers, in addition to the usual searching for websites. It might also have classifications of them, that users could peruse. The classifications might perhaps be done algorithmically, using, say, keyword or Bayesian analysis of their pages, or manually.
  • One way is for the Engine to sum up the number of requests that each Checker gets, and sort the Checkers using this, where the data comes from the Checkers. Then, the Engine can look at its Askers' data of the Checkers that they access, and sum up the number of requests to those Checkers. So two ordered lists of Checkers can be made. The order should be approximately the same, if most Askers are independent of the Checkers. The totals for a Checker will be different, in general, because the Checker also gets requests from Askers at plug-ins, and the Engine has no access to that data.
  • a Checker might have incentive to inflate the number of requests it gets. Perhaps to boost its ranking in a listing of Checkers, which might translate into more visitors to its website. But it is difficult to imagine much economic value, if any, in a Checker falsifying its data, to make itself appear less important.
  • This method of testing the Checkers' data can also be used, in mirror form, to test the Askers' data. Plus, if the Engine has wide usage, and there are many Askers and Checkers, the sheer global nature of the data may make it harder to tamper with in a meaningful way.
  • the Engine has an affiliated website or websites with a lot of content. For example, Yahoo.com and Msn.com.
  • the Engine can run its own Checkers on these sites. So it knows that this data is reliable. It can then find the biggest Askers that ask its websites, and rank those reliably. Then, it can ask those Askers, preferably in an automated manner, for their statistics. From these, it can find the biggest Checkers. Which it can in turn ask. This is a spidering method.
  • the Engine runs messaging services, like email or Instant Messaging. For example, Google's Gmail, Yahoo's mail service, or Microsoft's Hotmail. It can run its own Askers for these services. Then it can find the biggest Checkers that its Askers ask. So it can ask those Checkers, preferably in an automated manner, for their statistics. From these, it can find the biggest Askers. Which it can in turn ask. Another spidering method.
  • the Engine has both content websites and messaging services, then it can combine the methods of the previous two paragraphs.
  • the Engine can also find from Askers and Checkers which pages are more likely to be verified. Presumably, this is due to more than just casual browsing by users. Timely knowledge of those pages can be exploited by the Engine, in ways including but not limited to the following -
  • the page's topic is non-financial and non-controversial, perhaps. So visitors to the page might see verifiable links, but have little interest or incentive in actually asking for verification.
  • the link farm scenario deserves further comment. Its authors might avoid using Askers and Checkers, so as not to appear on an A&C ranking. This in itself could be desirable to the Engine, acting to keep the latter ranking freer from distortion. But suppose a link farm wanted to get a high A&C ranking, where this is ultimately as bogus as its link ranking. It has to do more work, as these requests have to be continuously made from Askers to Checkers. And the Askers can have rules in place to restrict the number of requests coming from a given network address or address neighborhood, per unit time. Remember too that these Askers are not those in plug-ins, but at websites, and presumably the Engine might restrict its Asker data collection to those on its whitelist of reputable Askers. Where reputable might also mean that they implement various precautions like those described here.
  • Suppose the farm has a page, Beta (and other pages), and the farm wants to boost its A&C ranking. So it runs a Checker for Beta, and has outside pages, Alphas, that make verifiable links to Beta and query that Checker.
  • Alpha is probably written by the farmer. It might be at a web hosting site, for example, where she can open an account and write pages.
  • an independent page is unlikely to link to Beta. So the more Alpha pages she needs, the more work she has to do. And possibly the more money she might have to spend, if the outside hosting service charges fees.
  • Suppose she only has a few Alpha pages. Then she has to send relatively more queries per page.
  • the Alphas are very likely to have low link rankings. Unless she cross-references them with each other, and with the pages inside the farm. But in this case, the Alphas are a de facto part of the farm. If not, then the Engine can search for pages (Alphas) with low link rankings, but which make a large number of asks to her Checker. A possible indication of manipulation, as sketched below.
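A minimal Python sketch of this search; the thresholds and the input data are illustrative assumptions, and the link rankings are taken as already computed.

# url -> (normalized link ranking, number of asks made to the farm's Checker)
pages = {
    "http://host.example/alpha1.htm": (0.02, 50000),
    "http://news.example/story.htm": (0.70, 120),
}

def suspicious_alphas(pages, rank_threshold=0.1, ask_threshold=1000):
    # Flag pages with low link rankings that nonetheless send many
    # verification requests to one Checker - a possible sign of manipulation.
    return [url for url, (link_rank, asks) in pages.items()
            if link_rank < rank_threshold and asks > ask_threshold]

print(suspicious_alphas(pages))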
  • the Engine might also choose that its A&C rankings have this extra criterion. If it gets data from an Asker at a web hosting site, it might give preference to such Askers when the site charges its users for hosting pages. To increase pressure on link farms trying the above scheme. Here, giving preference might mean increasing a weighting on such data, or possibly not getting data from Askers at free web hosting sites.
  • Notphish and its Agg let us defend large, trusted companies against phishing. While the corporate customers of Notphish+Agg can be extended downwards to smaller companies, as perhaps by the method described in [8], this has to be carefully done, in order to prevent phishers from joining. Whereas the above implementations of P described cases where anyone can use it at the level of supplying a Partner List.
  • The Notphish implementation should not be extended to include the functionality of P.
  • A Notphish corporate user, like a bank, might also want to run our method. This is because Notphish only works with users who have a browser with the corresponding Notphish plug-in, which is NOT the Asker of this invention. Users without the Notphish plug-in lack the Notphish protection.
  • the bank might insert verifiable links into some of its pages. These links could be to various governmental bodies that regulate banks, or to a banking industry organizational body, for example. Where those entities would run Checkers to provide verification. The domains of these links might be published in the bank's Notphish Partner Lists, so that the links are compatible with the Notphish plug-in.
  • Our method can be extended to permit the checking of links and email addresses in arbitrary sections of a web page. For example, a page might have these custom tags -
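A sketch of what such markup might look like; the attribute names and values here are illustrative assumptions, consistent with the <ask> and askLimit tags discussed in the points below -

<askLimit>
Some section of the page, with text and links, that can be verified.
<ask checker="203.0.113.5" port="8319" name="Some Approving Organization" />
</askLimit>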
  • the <ask> is shown next to the </askLimit> for clarity. In general, it might appear anywhere between the askLimit tags.
  • This generalizes the syntax of the <ask> tag, which previously had been used in the context of verifying an email address.
  • There, the syntax of the email address was used to derive a domain at which the Asker would query a Checker.
  • Here, the network address of the Checker domain is given explicitly in the <ask>, along with a port number. If the latter is omitted, then a default port number is used.
  • the <ask> also has an optional attribute ("name") that might be used to indicate the name of the person or organization doing the approval. There might be other attributes with more information. These might be used by the Asker, in part, to be displayed to the user.
  • the Asker might also have the ability to query the Checker for its corresponding values for these attributes, if they exist, in order to find and highlight any discrepancies. If such exist, the values given by the Checker might be considered as more authoritative than those written in the page being analyzed.
  • the Asker may (or should) have code that can display the portion of the page between the askLimit tags, in some visible fashion.
  • a page might have different portions that can be verified. For simplicity, these should be nonoverlapping. The only overlapping permitted should be where one askLimit pair is completely embedded in the scope of another pair. This is standard XML practice, and permits a programmatic validation of correct syntax. Which is different from the verification process described in this invention, where an Asker goes out over the network.
  • Approval is granted by Beta's Checker, which essentially means ultimately by Beta's author.
  • Alpha "wants” to be approved by Beta is more "reputable” than Alpha
  • Alpha's use of a verifiable link is an attempt to gain Beta's consent to use Beta's reputation.
  • Beta is another modality.
  • But the reverse can hold: Alpha is perhaps a more popular page, or a more reputable page.
  • Then it is Beta that gains from being linked to from Alpha.
  • Or Beta wants to affirm support of Alpha.
  • Alpha is an essay or petition or political statement, perhaps. Some people might want to publicly affirm support of Alpha.
  • Our method, as already delineated, lets each of them add Alpha to her PL, while Alpha's author adds a verifiable link to her.
  • a normal browser will see the first <askOthers>, and simply ignore it. While the Asker can read it, and possibly change the visible display of Alpha, to indicate its presence, and also, if instructed by the user, read other.htm and any other associated files, and perform a verification. In this instance, the Asker might have settings that restrict the maximum number of links it will do this for, with the user being able to alter it. This handles the case of a massive number of links that would take very long to confirm.
  • the file other.htm, and any other such files that it might reference, are written in HTML. This is optional. But it also lets those files be viewed in a standard browser, with many links active.
  • the other users who have approved the links to them might be searchable by the Asker, under various criteria settable by the user.
  • a search engine which spiders and finds Alpha might also use information in these associated files, to build a more structured picture of the network.
  • the auctioneer required that any real such message, from the seller to a bidder, sent via its messaging system, contain certain structure. Specifically, that it mention the auction id and seller handle and possibly email address, in a manner that can be programmatically parsed for. Imagine, for example, a line with example values -
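auctionId=1234567890; seller=bob_the_seller; sellerEmail=bob@example.com

where the field names and values shown are illustrative assumptions.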
  • the messaging software might write this automatically as the first lines, say, in any such message from the seller to a bidder. This by itself is insufficient. For Amy might write that explicitly in the first lines, if she is not the seller. Or likewise if Amy sends the message entirely outside the auction's messaging system.
  • Instead of the data being in hardwired locations in the message, they might be written in XML format. And placed anywhere in the message. They might be written as attributes inside custom tags, or each datum might be bracketed between the opening and closing instances of a custom tag. This is more flexible. We will assume henceforth that this is done, instead of hardwired locations. Such tags might look like the sketch below.
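A sketch of such tags, where the tag names and values are illustrative assumptions -

<auctionId>1234567890</auctionId>
<seller>bob_the_seller</seller>
<sellerEmail>bob@example.com</sellerEmail>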
  • When a seller sent such a message, not only would the auction software insert the above data into the seller's message, but it would also store the data, as a record of what the seller sent.
  • When the Checker gets a query, it would try to match the furnished data with its stored data. And it can return true if there is a match, and false otherwise. Or perhaps more elaborate error codes, to indicate which field or fields are wrong. A sketch of this matching follows.
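A small Python sketch of that matching, with illustrative field names and an illustrative error convention (an empty list meaning a match, else the names of the wrong fields):

# Stored records, keyed by auction id, as written by the auction software.
stored = {"1234567890": {"seller": "bob_the_seller", "sellerEmail": "bob@example.com"}}

def check(query):
    record = stored.get(query.get("auctionId"))
    if record is None:
        return ["auctionId"]   # no such auction
    # Compare each stored field against the furnished data.
    return [field for field, value in record.items() if query.get(field) != value]

print(check({"auctionId": "1234567890", "seller": "bob_the_seller",
             "sellerEmail": "bob@example.com"}))   # [] means true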
  • the auctioneer's internal messaging system might also be available for the general case of a user, who is not a seller, to send a message to another user.
  • a message from a non-seller can be automatically parsed, to look for those tags. If they are present, then this could be suspicious.
  • the values can be compared to the Checker's data. For example, if the sender is not the seller, where the seller was found from the appropriate tag, then this could be fraud.
  • the other tags could also be tested. Hence, a suspected fraudulent message might not be forwarded to the recipient. And the sender might be placed under investigation.
  • Jack has a plug-in Asker. It recognizes what purports to be a message from an auction seller. It parses to find the above data. If they are absent, then it does not validate the message. Suppose the Asker finds the above data. It contacts a Checker at the auction site, and sends the extracted data. Based on the reply, the Asker could then validate or invalidate the message.
  • the plug-in Asker is also needed because a fake message from a "seller" Amy might arrive at Jack's external email address, without originating in the auctioneer's messaging system. This lets Amy bypass the above test done by the messaging system on a message. Amy might be sending out a mass mailing and hoping that some recipients get fooled. Or, Amy might have scrutinized Jack's bidding (on a high value item perhaps), and by other means, mapped from his handle to his email address. Some users choose handles to be their email addresses. While Amy might send non-fraudulent messages from another of her accounts at the auction site, to various bidders who bid high amounts. To try to get replies that reveal their email addresses. Which she can then later use as a fake seller.
  • For a particular auctioneer, it might distribute an Asker plug-in. There might be different versions of this, for different browsers. There is an important simplification for the Asker, as compared to a Notphish plug-in. When the latter is used to check email that the user is reading in the browser, it needs knowledge of where the message body starts and ends, within the BODY tags of the HTML page. Because it has to find links in the message body, and typically, outside this body, the ISP has various other links. Hence, the Notphish plug-in needs knowledge of how major ISPs demarcate the message body in their web pages. [3] Whereas the Asker has a more limited functionality. It only needs to find the custom tags defined by the auctioneer. Hence, if those tag names are unique, with respect to the tags found in the message web pages used by the major ISPs, then this is sufficient for the Asker to find them and their values.
  • the auctioneer can then simply train its users that if they have its plug-in and get a message claiming to be from a seller, then ONLY if the Asker validates it should they regard it as genuine.
  • An alternative to the Asker is for Jack to forward the message to, say, test@auctioneer.com. That address leads to an equivalent Asker/Checker at the server, which parses the message, gets the purported data, and compares it to the records. This can be fully automated and very quick. It should be able to reply to the user, giving a "true" or "false" result.
  • a refinement of the above is for the stored data to also include the handle of the recipient, and that the starting lines also show this information.
  • In terms of privacy, it can be seen that the server reveals very little that is not already publicly known or knowable. For any given auction, its id and seller handle are public knowledge. But the server does not broadcast even these. It gets submissions from plug-ins of purported data. Its reply at most shows which data fields are wrong, and not the correct values for those fields.
  • That plug-in is basically the Notphish plug-in that was described in those Provisionals. But imagine that the user also has an Asker plug-in. Instead of the Notphish plug-in automatically doing its verification, she might turn it off, by default. Then, when she reads the purported message, the Asker will do the verification of this method, where the auction website has a Checker. It has, for each user, a PL. This is slightly different from what we described earlier about reading email and doing verification. There, we described how, on the Checker side, when a user sent out email, it could record various header attributes. Here, instead of, or in addition to, those, it could record various auction attributes, like those discussed above.
  • Laura might guess that perhaps the file d.html has been renamed to another file in the same directory, b/. So she uses her browser to search b/. But she may not be able to do this. Because for security reasons, say, Susan may have told her server to prevent users from rummaging around in b/. That is, a browser must supply the name of an actual file, in order to see it. Now suppose Laura can look at b/. There might be hundreds of files here, with the suffix html. Even if the server lets her read these, it may take some time to identify which of these she should link to, if any. Now suppose Susan moved d.html to a different directory, c/e/f/. Even if Laura has all the necessary permissions to look through Susan's directory tree, this can be a lot of effort. Worse yet is if Susan decided to replace d.html by a file in another domain entirely. In general, Laura will never find this.
  • Beta's Checker can contact a Web Service or API at Alpha's domain. Assume that this is incorporated into the functionality of an Asker running at that domain.
  • the Checker sends a structured message, perhaps written in XML, as for example,
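A reconstruction of what such a message might contain, using the oldBase, item, and changeover tags discussed below; the outer tag name and the example addresses and time are illustrative assumptions -

<changed>
<oldBase>http://b.com</oldBase>
<item>
<old>http://b.com/b/d.html</old>
<new>http://c.com/e/f/d.html</new>
</item>
<changeover>2006-09-01T00:00:00Z</changeover>
</changed>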
  • these messages might have some digital signature or other means whereby the recipient can check that the message did come from the Checker.
  • the instruction to the Checker to send the message might be done manually. Perhaps Susan, who renamed Beta, might do this, after she has done the renaming. Or more generally, Beta might have been deleted by her, and Gamma might exist at a different domain. In either case, Susan might tell the Checker the equivalent of the above message, and tell it to send the information to all those in Beta's PL that were approved. Presumably, there is no need to tell those in the PL whose links to Beta she disapproved of in the first place. This may even have been a reason for her to change from Beta to Gamma.
  • Alpha's Asker has received the message. It might attempt to authenticate the message. If so, assume that this succeeds. Then, it might contact Alpha's author, Laura, with the message. She can then manually make any necessary changes to Alpha. In this manual case, there is less need for the message to be authenticated, because Laura can by manual inspection check the links to see if the changes make sense. Though perhaps the authentication should still be done, since this can act as a filter, trapping any fake messages, before they reach Laura.
  • Or the Asker might programmatically change the appropriate links in Alpha (there could be several links to Beta) to point to Gamma.
  • This assumes the Asker has the necessary permissions on the Alpha file, if Alpha is a static file. But suppose Alpha is a dynamic file, held in a database, and then assembled by a web server. The latter could have code to take the Checker's message and then replace the instances of Beta in the database with Gamma. A sketch for the static case follows.
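A Python sketch of the static case; the file name and addresses are illustrative, and in practice the Asker would first authenticate and vet the message, as discussed below.

def repair_links(path, old_uri, new_uri):
    # Replace every link to the old address; there could be several links to Beta.
    with open(path, "r", encoding="utf-8") as f:
        page = f.read()
    updated = page.replace(old_uri, new_uri)
    if updated != page:
        with open(path, "w", encoding="utf-8") as f:
            f.write(updated)

# repair_links("alpha.html", "http://b.com/b/d.html", "http://c.com/e/f/d.html")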
  • This syntax can also be used when the domain remains the same, but directories are renamed.
  • the Asker should require some authentication of this message. Plus, it might also have a requirement that the Checker's base domain be the same as the base domain in the oldBase tag.
  • the Asker might also have rules about the new addresses that it is being asked to link to. For example, it might have a blacklist of addresses. It applies this to the proposed changes. If any new addresses are in this list, then it might flag the original address and indicate this and the new address to Laura, in some fashion. She then can make a decision whether to use the new address, or perhaps remove the old link.
  • the blacklist may be useful to protect her website. Because other parties on the web, like search engines and ISPs, might spider the web. If they find that Laura links to an address in one of their blacklists, then they might also add Laura's page or domain to that list.
  • the Asker might have a whitelist of addresses, and it might restrict new addresses to be within this list. Or that the base domains of the new addresses be in this list.
  • the Asker might have a whitelist of protocols and addresses. It might have policies against having an http link to a non-default port, for example. Maybe in part because the intended users who will visit the Asker's website are behind a firewall that permits only a few outgoing protocols and ports. A sketch of such vetting follows.
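A Python sketch of how an Asker might vet a proposed new address against such lists and policies; the list contents and the rule set are illustrative assumptions.

from urllib.parse import urlparse

BLACKLIST = {"bad.example"}    # illustrative
WHITELIST = {"good.example"}   # illustrative; an empty set would mean no whitelist

def vet_new_address(uri):
    # Return None if acceptable, else a reason to flag the address for Laura.
    parts = urlparse(uri)
    host = parts.hostname or ""
    if host in BLACKLIST:
        return "blacklisted domain"
    if WHITELIST and host not in WHITELIST:
        return "not on whitelist"
    if parts.scheme == "http" and parts.port not in (None, 80):
        return "http link to a non-default port"
    return None

print(vet_new_address("http://bad.example/x.htm"))   # blacklisted domain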
  • the above messages might also have a changeover time, indicating when the changes will take place.
  • This time might be measured in Universal Time. It could designate that the old addresses will remain valid up to that time.
  • There might also be another time, call it newTime, say, with newTime < changeover, where newTime is when the new addresses become valid.
  • the <item> tags can be used to encapsulate each page.
  • These changes might prompt Laura to revisit Beta, to see if she still wants to link to it. Or perhaps to modify the textual part of Alpha, to take into account Beta's change. Or, perhaps, when the title or a custom tag changed, the Asker might be able to automatically incorporate the changes into a new Alpha.
  • the Asker should have a filter that operates on the new addresses it gets in these messages. So that questionable character substrings might be detected and eliminated. These could be scripts or other malware that might be inadvertently executed by the Asker's machines. Or perhaps by the user's browser, in a cross-site scripting manner. Even if the Asker successfully authenticates the Checker's message as coming from the Checker, it may want to apply this filter. Because if the Checker has been electronically or physically subverted by a cracker, then malware might be introduced. And authentication methods on the Checker's messages are insufficient to detect this.
  • the Asker might ask the Checker not to be notified about any changes. It is up to the Checker whether to honor this request or not.
  • If the Asker can automatically make changes, especially if its pages are dynamic, then the changes amount to a non-financial transaction. So the Asker could save the message in a log. In part, because if we regard the message as an operator on the Asker's pages, then the message can be inverted. That is, from the message, an undo operation can be derived. Hence, recording the message allows for the possibility of a rollback if the need arises.
  • the Checker has informed the Asker of changes that have happened, or will happen, and that some of these changes involve Beta moving or being deleted. Then, some time after the change has occurred, the Checker can spider Alpha, to see if Alpha has been updated to reflect the new address change (or deletion) of Beta. If not, it can send another message to the Asker. Perhaps at a later time, if it detects still no change, it might have a policy of removing Alpha from its PL for Beta, or in that PL, changing Alpha from approved to unapproved. It could also do this for other pages at Alpha's domain or base domain, that point to pages in the Checker's purview.
  • Big can send certain data to the Agg. Including but not limited to the following:
  • a simple generalization of the first point is that Big can send a list of approved domains to the Agg. So that a message from one of these domains is allowed to load files from big.com. Or that these domains can have pages that load files from big.com. (Or these might be 2 lists, to distinguish between messages and pages.) This list is similar to the Partner List of the Antiphishing Provisionals.
  • the Agg amasses data from Big and other companies. Note that the data, per company, is relatively small. Periodically, it sends the data to its plug-ins.
  • a plug-in might store the data as a hash table, Kappa, with the key being a company domain (like "big.com"), and the corresponding value being that company's data, as described above. A sketch follows.
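A Python sketch of Kappa and its use; the per-company field name is an illustrative assumption.

# Kappa: company base domain -> that company's data from the Agg.
kappa = {
    "big.com": {"approved_domains": ["partner1.com", "partner2.com"]},
}

def may_load_from(sender_base, resource_base):
    # May a message from sender_base load files from resource_base?
    data = kappa.get(resource_base)
    if data is None:
        return True   # company not in Kappa; no policy to enforce
    return sender_base == resource_base or sender_base in data["approved_domains"]

print(may_load_from("partner1.com", "big.com"))    # True
print(may_load_from("random.example", "big.com"))  # False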
  • the plug-in uses the methods of "2528" to detect if the user is reading messages at a known message provider, like a major ISP. It extracts the message and can then determine the (purported) sender. It checks if the body is loading any files. If so, it finds the base domains of those addresses.
  • the plug-in can do other steps. It extracts any links. These, or the suspect messages, might periodically be reported to the Agg. Assuming that the user gives permission for entire suspect messages to be uploaded. If not, summary metadata about the message might be uploaded. These metadata might be those found using our antispam canonical steps of "8757".
  • the Agg can then report to Big that its web resources appear to be misused, and send it various supporting data. Like the links or even the texts of those suspect messages or the URLs of the suspect pages. If the plug-in also has our antiphishing capabilities, then the above steps can optionally, but preferably, be done after the antiphishing steps. Where, if the message has been found to be phishing, there might be no need to do the above. Since a phishing message is worse than the type of messages we are detecting here.

Description

System and Method for Verifying Links and Electronic Addresses in Web Pages and Messages
CROSS-REFERENCES TO RELATED APPLICATIONS
This application claims the benefit of the filing date of U.S. Provisional Application, Number 60/595807, "System and Method for Verifying Links and Electronic Addresses in Web Pages and Messages", filed August 7, 2005. That Application is incorporated by reference in its entirety.
TECHNICAL FIELD
This invention relates generally to information delivery and management in a computer network. More particularly, the invention relates to techniques for automatically classifying electronic communications as bulk versus non-bulk and categorizing the same.
BACKGROUND OF THE INVENTION
The development of HTML and http led to the explosive growth of the World Wide Web. Driven in large part by the ease of use. The key item was the hyperlink. The intuitiveness of clicking on a link, and then having the browser go to that link and display its page, becomes almost axiomatic after brief use. But if you consider a web page, at a given URI, how do you find what other pages link to it? There is no direct method. A hyperlink is unidirectional. A search engine might aid by listing some of those pages. These it finds by virtue of a comprehensive spidering of the Web, which in many respects is a necessary but brute force approach. Alternative formulations for hyperlinks have included a bidirectional capability. But these have not gained wide traction. Perhaps because of a complex implementation. And possibly because they lacked a compelling reason for use.
Why would you want to know who links to your web page? After all, being able to freely link to any destination on the Web, without the permission of that destination, has been one of the driving forces in its growth. But as various types of malware, like phishing and pharming, have become more common, there is value in knowing, and approving, certain links to your page.
Imagine that you are BankO, with a website bankO.com. Let Amy be a phisher. She might send out messages, claiming to be from you, with links to URIs inside your website, for verisimilitude. Or she might have a website that pretends to be your website, known as a pharm. Or, her messages or website might not actually pretend to be BankO, but perhaps affiliated in some way authorized by you, like a marketing partner.
Consider a user, Jane, who uses a browser. She might read a message from Amy, or, by some means, visit Amy's pharm. The latter might be done with a message from Amy that links to her pharm. It would be very useful to both Jane and BankO if there were a programmatic and objective means to protect her, by indicating that the message or website is dangerous.
Another context refers to electronic addresses. Specifically email addresses, though our method below applies to any electronic communications modality (ECM), like SMS or Instant Messaging. Suppose you are an individual with an email address. Anyone who knows your address can write a web page containing that address. While you cannot prevent this, there can be value to you and that author, if you are able to approve its usage, and where a user visiting that page can ascertain that you have done so. It adds credibility to that page.
REFERENCES CITED
Antiphishing Working Group, antiphishing.org
World Wide Web Consortium - HTML definition, w3.org/MarkUp
(Distributed) Denial of Service attack - en.wikipedia.org/wiki/Denial_of_Service
SUMMARY OF THE INVENTION
The foregoing has outlined some of the more pertinent objects and features of the present invention. These objects and features should be construed to be merely illustrative of some of the more prominent features and applications of the invention. Other beneficial results can be achieved by using the disclosed invention in a different manner or changing the invention as will be described. Thus, other objects and a fuller understanding of the invention may be had by referring to the following detailed description of the Preferred Embodiment.
On a web page, links to pages at other domains, and also email addresses, can be validated. A visitor to the page can query the domain in the link or email address. The query might be done by an "Asker", which is a plug-in to the browser, or, on a trusted website, by the website itself. The Asker sends the URI of the page being viewed and that of the link in the page, to a "Checker" at a domain given by the link. The Checker has a table of pages external to the domain, which are approved as having links to its pages. The Checker can reply with whether the external URI in the query is approved to link to the URI of its page that was in the query. The query and reply can also be generalized to ask about more information about a person. This lets a user with a browser view a page with more confidence.
Specifically, our method lets sellers on auction sites write information about an item for sale, that can be verified by a potential bidder, which should reduce the incidence of fraud, and lead to more bids and higher valued bids. Also, job seekers could post their resumes on job sites, and have various items verifiable by a reader using a browser. We also permit an automated detection of broken links, and an easy manual or automated repair of those links.
BRIEF DESCRIPTION OF THE DRAWINGS
There is one drawing. It shows a user ("Jane") on a computer browsing the Internet. Her browser has an "Asker" that communicates with a "Checker" at another website.
For a more complete understanding of the present invention and the advantages thereof, reference should be made to the following Detailed Description taken in connection with the accompanying drawing.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
What we claim as new and desire to secure by letters patent is set forth in the following claims.
We have described many ways, using an Aggregation Center (Agg) in conjunction with a browser plug-in, to detect phishing and pharming in these U.S.
Provisional Patents:
#60/522245 ("2245"), "System and Method to Detect Phishing and Verify Electronic Advertising", September 7, 2004; #60/522458 ("2458"), "System and Method for Enhanced Detection of Phishing", October 4, 2004; #60/552528 ("2528"), "System and Method for Finding Message Bodies in Web-Displayed Messaging", October 11, 2004; #60/552640 ("2640"), "System and Method for Investigating Phishing Web Sites", October 22, 2004; #60/552644 ("2644"), "System and Method for Detecting Phishing Messages In Sparse Data Communications", October 24, 2004; #60/593114 ("3114"), "System and Method of Blocking Pornographic Websites and Content", December 12, 2004; #60/593115 ("3115"), "System and Method for Attacking Malware in Electronic Messages", December 12, 2004; #60/593186 ("3186"), "System and Method for Making a Validated Search Engine", December 18, 2004; #60/593877 ("3877"), "System and Method for Improving Multiple Two Factor Usage", February 21, 2005; #60/593878 ("3878"), "System and Method for Registered and Authenticated Electronic Messages", February 21, 2005; #60/593879 ("3879"), "System and Method of Mobile Anti-Pharming", February 21, 2005; #60/594043 ("4043"), "System and Method for Upgrading an Anonymizer for Mobile Anti-Pharming", March 7, 2005; #60/594051 ("4051"), "System and Method for Using a Browser Plug-in to Combat Click Fraud", March 7, 2005. Collectively, we shall refer to these as the "Antiphishing Provisionals".
We will also reference these U.S. Provisional Patents:
"System and Method for an Anti-Phishing Plug-in to Aid e-Commerce", ; #60/521698, "System and Method Relating to Dynamically Constructed Addresses in Electronic Messages, June 20, 2004; #60/320,046 ("0046"), "System and Method for the Classification of Electronic Communications", March 24, 2003; #60/481,745 ("1745"), "System and Method for the Algorithmic Categorization and Grouping of Electronic Communications", December 5, 2003; #10/905037 ("5037") (ref: Provisional #60/481,789, Provisional #60/48 l,745),"System and Method for the Algorithmic Disposition of Electronic Communications", December 14, 2003; #60/481,899 ("1899") "Systems and Method for Advanced Statistical Categorization of Electronic Communications", January 15, 2004.
In what follows, we often refer to a user having a message account at an ISP.
For brevity, we take "ISP" to mean any electronic message provider, and not just one who also provides a user's physical connection to the Internet. When we refer to a plug-in (to a browser) performing certain actions, we also include the possibility that a given browser might one day incorporate the functionality ascribed here to the plug-in. When we refer to a browser, we also include other programs that can display documents in a hypertext markup language (including but not limited to HTML), and follow any hyperlinks in those documents. Plus, when we discuss the Internet, occasionally we will give examples of Internet addresses, in the IPv4 format. All our remarks also apply to IPv6.
Most, if not all of our invention can also be applied to other electronic networks. We give numerous examples below in the context of the Internet, like the format of email addresses, joe@isp.com, say. Or of HTML and http. But other networks can be envisioned as having similar functional attributes. Likewise, our steps can be re-expressed in those networks.
We refer to a user visually experiencing what the browser shows. But the browser might offer an audio interaction. Perhaps for blind or visually handicapped users. All our remarks below about visible effects apply, with suitable modification, to this case.
Sections
1. Email Address
2. Link
3. Check Link Labels
4. Alternative to a Checker
5. Trusted Website or Plug- in
6. Reading Email
7. Approving a Linking Page
8. Partner List Extensions
9. Asker Extensions
10. Checker Extensions
11. Ask Space
12. Antispam Extensions
13. User Network Address
14. Other Extensions
15. Verifying Buyers and Other Roles
16. Full Automation
17. Search Engine
18. Differences With Notphish
19. Digital Certificates
20. Verifying Arbitrary Sections of a Page
21. Different Modality
22. Online Auction Fraud
23. Broken Links
24. Unauthorized Use of Web Resources
1. Email Address
We start by explaining the case of a user, Dinesh, approving the publication of his electronic address in a web page. We show one implementation. Later, we will generalize this.
Suppose we have a user Dinesh, with the address dinesh@isp.com. Suppose another user, Beth, writes a page at the URI http://a.b.com/beth.html. In the page, she writes the text "dinesh@isp.com". This might optionally be preceded by "mailto:", which in many browsers would then cause the address to be clickable. And if clicked, a small browser subwindow might appear, to let the user write and send a message to that address. Notice that the two base domains, isp.com and b.com, are independent. In general, these are different organizations or corporations, with no particular affiliation between them. Beth writes the address with these example bracketing tags, as
<ask>dinesh@isp.com</ask> or perhaps as
<ask>mailto:dinesh@isp.com</ask>.
The <ask> tag might have an optional port number, specified perhaps as
<ask port=8319> . . . </ask>
Optionally, she might put a button before or after the tags. (Or possibly between them.) Such that the user can press it to ask for validation of that address. Or, optionally, if there are several addresses in the page, and in general these are different from each other, though some could be the same, then she might have a button that will ask for validation of all such addresses. The validations are only performed against addresses delineated by the above tags.
Beth now contacts Dinesh. She sends him the URI of her page, and asks if he could enable validation, on his end. He goes to the URI. Suppose he approves of the context in which his address appears. Then, he logs into isp.com. It implements part of our method, by letting him maintain a list, call it a "Partner List" (PL). This is similar to our Partner List in "2245". There, a corporation or organization would maintain one or more PLs, and the entities in a PL were the Internet domains or URIs of other corporations or organizations. That PL was disseminated to an Aggregation Center (Agg). Here, an individual can maintain his own PL. And it is not (necessarily) disseminated to an Agg.
Dinesh types in the above URI into his PL. Or copies and pastes it. Optionally, there might be some programmatic capability of his browser, interacting with the ISP's web page, that does a "form-filling", to make it even simpler for him to transfer this URI into his PL. This form-filling is like that already performed by electronic wallets. Optionally, there might be a lifetime on such an item, where he can set it, and with perhaps a default value. The ISP could periodically programmatically inspect his PL, and delete any expired entries.
Now imagine that Jane, a user, goes to the above URI, and reads the page in her browser. She sees some indication that Dinesh's address can be validated, like a button. She presses it. A process is invoked, which finds the address enclosed by the tags, and finds the base domain, which in this case is isp.com. We term this process an "Asker". It then goes to an Application Programming Interface (API) or Web Service at isp.com and presents two strings, "dinesh@isp.com" and "http://a.b.com/beth.html". Assume for simplicity that it is a Web Service waiting at some port on isp.com. We term this Web Service a "Checker". On a unix or linux machine, it might be a daemon that listens continually on that port. (Similar statements could be made under other operating systems). It checks that "dinesh@isp.com" is a valid address. If not, it returns some error message. Suppose the address is valid. Then it checks that the submitted URI is in Dinesh's PL. If not, then an error message is returned. Else, it returns a message saying that the address is approved. The Asker then does some steps, to make the results visible to Jane.
In the above, if the port number was not explicitly specified in the <ask>, then the Asker would go to some default port. Otherwise, it would go to that port at the destination address.
Naturally, for compactness, the various error and success messages could be implemented as integer codes. Though in other instances, they might be explanatory textual strings.
While the implementation of the Checker is up to isp.com, it can be seen that collectively, all the PLs can be stored as a hash table, where the key is the name of a user, and the result might be an array of character strings, one string for each URI in the PL for that user.
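A Python sketch of such a table and the Checker's test; the integer result codes are illustrative, per the remark above about error and success codes.

# All PLs, collectively: user name -> array of approved URIs.
partner_lists = {
    "dinesh": ["http://a.b.com/beth.html"],
}

def check(address, uri):
    # 0: approved; 1: unknown user; 2: URI not in the user's PL.
    user, _, domain = address.partition("@")
    if user not in partner_lists:   # in practice, also test that the address is valid
        return 1
    if uri not in partner_lists[user]:
        return 2
    return 0

print(check("dinesh@isp.com", "http://a.b.com/beth.html"))   # 0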
From the above implementation, variations are possible. For example, the site's web server might choose to perform such validations. That is, the web server incorporates the functionality of the Asker. Then, for those addresses which validate, the server might amend the pages to display these in some distinctive fashion. Or, for an address which had the validation tags, but then did not validate, it might be shown in another manner. And above, we said that Beth might have written in buttons next to each address to be validated, or perhaps a button for the entire page. But this might be done by the server, saving Beth some effort.
Also, instead of being a browser plug-in, an Asker might be implemented as a Service Agent that runs on the user's computer.
One variation is that the ISP can retrofit the above maintenance of a PL into its existing address book, assuming that it offers the latter to its users. (Most do.) Suppose Dinesh can search his address book for a person with a surname containing a string that he supplies. Then Dinesh might enter the URI for his PL into a new address book contact, in the surname field. When the Checker later gets a query for a possible URI for Dinesh, it searches the surname field of Dinesh's address book, and then returns an indicator of whether any matches were found. In this outline, clearly other fields in the address book might also be used.
Also, in a markup language like HTML, a page can be made of frames. So in the above, a frame might have a button, to validate only those addresses with tags that are in that frame. Or, a frame might have a button, which verifies those addresses with tags that are in another frame.
To optimize the network traffic, imagine that the verifiable addresses in a page are gathered up, before being sent to the various ISPs. If several addresses are the same, then there is only a need to verify one of these. Also, if there are several addresses at the same ISP, like dinesh@isp.com and joe@isp.com, then these can be sent in the same network message to the Checker at that ISP, in the obvious generalization of the above.
Another variation is that Dinesh might store a domain, like "b.com" in his PL. This can be taken to mean that he approves of all instances of his address in URIs at that domain or its subdomains. Or Dinesh might store a "negative" domain, like "-c.com". Which can be taken to mean that no queries coming from that domain or its subdomains will be verified. Or Dinesh might have "b.com" and "-d.b.com". These mean that the Checker should verify for queries coming from b.com and all of its subdomains, unless they come from d.b.com and any of its subdomains.
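A Python sketch of this matching; the precedence rule, that any negative entry vetoes approval, is an assumption consistent with the example just given.

def domain_matches(entry, host):
    # True if host is entry itself or a subdomain of entry.
    return host == entry or host.endswith("." + entry)

def approved_by_pl(pl, host):
    if any(domain_matches(e[1:], host) for e in pl if e.startswith("-")):
        return False
    return any(domain_matches(e, host) for e in pl if not e.startswith("-"))

pl = ["b.com", "-d.b.com"]
print(approved_by_pl(pl, "a.b.com"))     # True
print(approved_by_pl(pl, "x.d.b.com"))   # False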
From Dinesh's viewpoint, his actions are very simple. The addition of an URI to his PL is conceptually not so different from him adding another person's email address to his Address Book.
When Dinesh adds an URI to his PL, it might be stored as a hash, computed according to some common hash function. This might be if he is cautious about his ISP knowing too much about himself. His browser might have the ability to take an URI, and hash it, possibly from a choice of hash functions, and return the result. Then, this is copied, perhaps manually or programmatically, into his PL. His ISP might even offer this hashing ability itself. Though given that he is storing a hash, he may want to decline this offer. If an URI is stored as a hash, then along with it might be stored a code to indicate what hash function was chosen. If this is omitted, there might be a default function. Thus, when the Checker gets a request with an address and an URI, if the PL has hashes, it may automatically convert the URI to a hash, using an appropriate hash function, in order to do a comparison.
Also, when Dinesh adds an URI or hash to his PL, he could also add a password. Plus, optionally, a requirement that the interaction be made using some encryption method, like Secure Sockets Layer or a modification of it, for example. This also assumes that his ISP supports this ability. If so, and a query comes in, then the Checker will ask for a password and compare it with what Dinesh has specified. This transmission of a password from the requester can be done in cleartext or encrypted, if Dinesh had required the latter.
But it should be realized that using a hash or password is only a minor part of our method. Typically, if a request comes in about an URI in a PL, then anyone who can reach that URI in the first place can view the validation. Plus, she could copy that validated information and then email it to anyone else. (Of course, this copy will not, in general, be able to be validated as per our method.) So any information that Dinesh voluntarily exposes for verification using our method should fundamentally not be something he wants to keep a secret.
Assume for brevity that the entries in his PL are URIs. The Checker could record, for each URI key in his PL, how many times it gets asked. And possibly other statistics, like when each request was made, and from which network address. The Checker might offer this data to Dinesh, maybe as a premium service.
The Checker can use the clear text URIs in Dinesh's PL to extract more information about his interests. This can be used to provide improved ad access to him, from advertisers that the Checker/ISP approves of. It could also use information about requests that come in, for Dinesh's username, but for URIs that are not in his list, in some fashion. For example, if there are many of these, did he inadvertently delete a popular URI from his PL? Or, more disturbingly, are there some unauthorized pages that refer to his address, and indicate that it is verifiable? The author of such a page might put in the above tags for verification, and possibly some visual indication as well, in the expectation that most readers will not actually ask for verification. In some ways, this should be more troubling to Dinesh than a conventional page that points to him, but does not use our verification tags.
The Checker might retain the URIs that are not in his PL, for some portion of time, so that he or it can do an analysis.
Also, the Checker can use the clear text URIs across all its users' PLs to help it build a social network. Currently, these are made using other types of data or behavior. But with some or many users now making PLs, our method gives the Checker greater efficacy in making such networks.
A user with a PL might also disseminate it to other users, who are not necessarily at the same ISP. The user's ISP might optionally offer a simple programmatic way for the user to do this. Though in the last recourse, a user could always manually write a message containing his PL and send it to others. Along these lines, we can envisage a group of users voluntarily submitting their PLs to the group or another organization. One motivation might be to use those PLs as extra data in making a social network that is not constrained to users at one ISP.
2. Link
The above method can be generalized to handle the verification of links. Such a link might be bracketed by <ask> and </ask>. Or, preferably, we might insert a custom attribute, call it "ask", into an <a> tag, as in, for example,
<a href="http://32. somewhere. com/d. htm" ask > Click here </a>
Likewise, for an image being loaded, we could also have an attribute, like
<img src="http://somewhere.com/b.jpg" ask >
More generally, we might have, for example, <object title="test" classid="http://blah.blah.com/image.py" ask> </object>
Here, the "image.py" refers to an applet written in the Python language.
Optionally, there might be another custom attribute to indicate a port at the Checker's domain to send the query to. For example, "askPort=3811 ".
Most browsers are designed to ignore both non-HTML tags and non-HTML attributes inside an HTML tag. So the above is backwardly compatible. We suggest the use of an attribute inside the appropriate HTML tags, as being more concise.
Suppose we have such a link in the above example page. Then, next to each such link might be a button to verify just that link. Or a button for the entire page (or frame) might do the verification for such links (and email addresses as already discussed). Where the button might go back to its web server to do this. Or perhaps invoke a script that is in the page. Or the browser may have a plug-in with this ability. We term this ability an "Asker", as was also used above. An Asker asks (sends a query to) a Checker for information.
In one implementation, for each link to be verified, the Asker finds the link's base domain, D. Then, for each different D in the page, the Asker makes an array of the URIs containing it. For each D, it sends to a Checker at D a request. Which contains the URI of the above page, and the array of URIs that point to D. The Checker "might" reply. In its simplest form, the reply is a Boolean array, where result[i]=true if URI i is approved, and false otherwise. We say "might" because the Checker could be overloaded, and so deliberately discard some requests, in order to handle its workload. Or, it could have policies restricting how many requests it will handle from a given network address or network neighborhood. In part, perhaps to ward off Denial of Service (DoS) attacks involving repeated requests from one or more network addresses.
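A Python sketch of the grouping and query construction; the two-label base domain heuristic and the query layout are illustrative simplifications.

from collections import defaultdict
from urllib.parse import urlparse

def base_domain(uri):
    host = urlparse(uri).hostname or ""
    return ".".join(host.split(".")[-2:])

def build_queries(page_uri, links):
    by_domain = defaultdict(list)
    for link in links:
        by_domain[base_domain(link)].append(link)
    # One request per Checker: the viewed page's URI plus the array of link URIs.
    # A reply would be a Boolean array: result[i] is True if links[i] is approved.
    return [(d, {"page": page_uri, "links": uris}) for d, uris in by_domain.items()]

print(build_queries("http://a.b.com/beth.html",
                    ["http://32.somewhere.com/d.htm", "http://other.somewhere.com/e.htm"]))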
While we consider the typical case where the linked object is a page, in general it could be any network addressable object. Recall the above example of an applet written in Python. It had an URI, so it could be used in the context of the Checker.
This link verification is expected to be mainly for outgoing links. It can also be used for incoming links, which are typically for loading images onto a page. An important case is where the incoming link is for downloading a file. It is a common mode for the spreading of viruses that they masquerade as innocuous files from a trusted domain. Our method offers a means to reduce such spreading.
In general, an HTML opening tag, like <body>, might have a custom attribute, e.g., <body ask>, which means that for the scope of that tag, all enclosed links and email addresses should be verified, by default. This saves putting an explicit "ask" in every enclosed <a>, for instance. Along with this, a given tag that contains a link might have an exempt flag, e.g., "askNo", to exempt that link from verification.
An Asker might have the option of being able to try to verify all links in a page, whether those have been designated as verifiable or not. The user might thus be able to choose this option for extra information on the page's links.
As with the discussion on email addresses, the web server publishing a page with verifiable links might pre-verify these and change the page to indicate the results, in some manner. Likewise, it might put buttons into the page, next to each verifiable link, to implement the above testing.
Suppose a Checker were to store, for one of its given pages, or email addresses, the popularity of each entry in the PL for that page (or address), as measured by how many requests it gets for verification of those entries. Then, it might choose to reveal to an Asker, what the most popular or least popular entry is. Or even the entire list of partners for that page (or address) and the number of requests for each entry. The Checker might also reveal this to other programs, asking over the network.
One possible use might be if Jane uses her browser (with an Asker) and goes to a page, Gamma, that points to a page, Beta, at the above Checker. Suppose the Asker finds that Gamma is indeed in Beta's PL. Jane might be interested in knowing what are the most popular other approved pages that point to Beta.
Where those pages might be in domains completely independent of Gamma's domain and Beta's domain. An example of "collaborative filtering", that could enhance Jane's browsing experience.
Note that privacy issues need not be a factor here. The addresses in Beta's PL that its Checker chooses to reveal to the Asker (and hence Jane) are presumably those of publicly accessible web pages.
Suppose the author of Alpha is investigating pages at Beta's website, possibly to link to them, as she is imagined to do with Beta. Several useful features can be done on that website, to help her easily see which links might be verifiable. Each file can have a parameter, here implemented as an attribute - <html mightApprove> or <html wontApprove>. These values are mutually exclusive. The latter says a verifiable link to the file is moot, because the Checker will never approve it. So imagine editing software that is being used to write Alpha. If the author puts in a link to Beta, the program can go across the network and get Beta and look for the existence of the above attributes. Hence, it can indicate in some manner via its user interface, whether she should even put in a verify attribute.
Along these lines, the files index.htm and index.html can have the extra optional attributes - <html mightApproveDir> and <html wontApproveDir>. Mutually exclusive, and they apply to other files in the directory. Hence, using wontApproveDir here means that it is unnecessary to write a wontApprove attribute in the other files. While a mightApprove in one of those files will override this directory setting.
Also, the files index.htm and index.html can have the extra optional attributes - <html mightApproveSubDir> and <html wontApproveSubDir>. These apply to subdirectories recursively.
To reduce network traffic, and burden on Askers and Checkers and web servers, we can also do the following. By reading one of the above index files, it can tell us which of the files in the current directory might be approvable, without us having to inspect each file. So index.htm or index.html might have a list of such files -
<mightApprove>a.htm</mightApprove>
<mightApprove>bee.html</mightApprove>
This is especially useful if the files and directory are dynamic. Because a user cannot typically use a browser to find a listing of all the "files" in the directory.
Similarly, the index file might have entries like these, to indicate which subdirectories could have files that might be approved for linking -
<mightApprove>./dir1/dir2/</mightApprove> <mightApprove>./apple/pear/peach/</mightApprove>
While the labels of these attributes are arbitrary, the use of these permits a compact way of indicating which files are off-limits to verification, and which might be verifiable, across an entire directory tree.
Of course, when we above put custom attributes into the <html> tag, this was a logical but arbitrary choice. They could have gone into the <head> or <body> tags, for example. Another possibility is that these are written not as attributes of a tag, but between the <meta> and </meta> tags. Or it might be done in addition to writing them as attributes. This could be useful because search engines use the meta information as descriptive metadata about the document. So one might be able to use a general-purpose search engine to find resources implementing our method.
3. Check Link Labels
It is also possible for the label on a link to be compared against data in the linked page. Why is this desirable? Because it enables a further optional programmatic maintenance. On a page, Alpha, with a link that points to Beta, some visible text is written as the label of the link. The content of this text is completely arbitrary, and need have no correlation with the content of Beta. Hence, Alpha's author might perhaps mislead the reader. So imagine three levels of testing -
1. Check links, as described in section 2.
2. Check link labels.
3. Do the first two checks.
To check link labels, consider for example,
<a href="http://32. somewhere. com/d. htm" ask askLabel="title" >A page title</a>
Here, there is a custom attribute, which we call "askLabel". It has a value, which is the name of a tag in the page given by the href value. The Asker goes to the page and looks for a tag with the askLabel's value. This value can be that of an HTML tag or of a custom tag. If it is of an HTML tag, then the value is case insensitive. Because though official HTML style guidelines recommend that HTML tags be written in uppercase, most browsers accept any case combination for these tags. If the value refers to a custom tag, then it could be case sensitive or not.
The askLabel value has an optional special meaning when it is of the form askLabel="a-xyz". This means that at the linked page, it refers to a tag with <a id="xyz">.
If the askLabel is missing, and the Asker is checking labels, then the askLabel can be assumed to be present, with a default value of "title". This refers to the HTML <title>, which is optionally present in the <head> section of an HTML document.
Optionally, but preferably, the only valid HTML tag name accepted by the Asker for askLabel would be "title". Other names of HTML tags, like "body", are not well suited for a natural parsing into a label. Whereas title's role is as a one line descriptive string that usually appears at the top of a browser, for the page shown by the browser. Hence, it is a natural candidate for use in our method. But there may be instances where the author of Beta wants to make a string available for use in our method, and have this different from the title. Hence she can put in a custom tag. We suggest "askLabel". So that Alpha's link tag would include askLabel="askLabel", and Beta would have
<askLabel>some string goes here</askLabel>
Then, for checking link labels, the Asker could do the following. It loads Beta and looks for opening and closing tags with the name given by askLabel's value. If the tags are missing then the link label is unverified. Else if the tags exist and the enclosed string is not the same as the link's label in Alpha, then the link label is unverified. Else the link label is verified. The string comparison should be case sensitive.
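A Python sketch of that label test; fetching Beta over the network is elided, and the tag search is simplified to an exact, case-sensitive tag name, which suffices for custom tags.

import re

def check_link_label(beta_html, tag_name, label_in_alpha):
    pattern = r"<{0}>(.*?)</{0}>".format(re.escape(tag_name))
    m = re.search(pattern, beta_html, re.S)
    if m is None:
        return "unverified"   # tags missing in Beta
    # The string comparison is case sensitive, as stated above.
    return "verified" if m.group(1) == label_in_alpha else "unverified"

beta = "<html><head><title>A page title</title></head></html>"
print(check_link_label(beta, "title", "A page title"))   # verified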
This ability for an Asker to check links and link labels is useful in preventing this: The author of Alpha writes an initial version of Alpha and then informs Beta's author and asks for approval. If the latter does so, afterwards Alpha's author changes the link label to something else, that might be misleading. And that, if Beta's author had seen this new value, might not have approved of it.
Note however that there may be times when it is only needed to check link labels. In this case, as shown above, a Checker is not needed. Perhaps where Beta's site lets anyone link to Beta, and does not perform approval of those links. (The current default on the Web.)
An Asker could also have the extra ability to find the correct value of a link label, from Beta, and then, if the value in Alpha is wrong, to produce a new Alpha page, with the correct value. And also perhaps to be able to store the new, correct page. If the Asker is on Alpha's website, and being run by someone with write authorization, then she might tell the Asker to write a new, correct Alpha.
As an example of what checking link labels can enable, consider if Alpha is a dynamic spreadsheet. It has a table, with several cells having links to pages at other sites. Where the technique of this section is used to have the link values be derived from special tagged values on those pages. The person who views Alpha can have an Asker that can dynamically verify the accuracy of the table's cells, and possibly correct any wrong values. Of course, it should be expected that the response of Alpha to the user changing parameters might be slower than that of a conventional spreadsheet. But the advantage here is that Alpha has a dynamic dependence on remotely distributed inputs. And the people maintaining those linked pages can just use standard HTML to maintain them. With custom tags to isolate the values that Alpha will use. Possibly, their websites might run Checkers, if link approval is seen as a desirable feature in Alpha, by those who will use it.
This can be generalized. Consider a page Alpha that has links and link values which can be verified and, optionally, updated if any values are wrong. The page might have parts which are functions of links or values, where these functions need not be numerical functions of (numerical) values. On the client side, the functions might be implemented as scripts. Or the functions might be done by Alpha's server. Our method makes this possible, as a simple extension to HTML.
We can extend the case of values in Alpha coming from Beta, to where these values in Alpha are not link values. Imagine Alpha having the following -
<askLabel link=Beta id="garlic">garlic ice cream</askLabel>
This means that in Beta, there is a tag called "garlic", with some value between its opening and closing instances, that should be the same as the above value in Alpha. The Asker might have the ability to show to a user such values in a distinctive manner in Alpha.
For convenience, if there are many such values, then we might have notation such as the following -
<af:O xmlns:af=Beta />
<af:garlic>garlic ice cream</af:garlic>
<af:porridge>more porridge</af:porridge>
The notation is arbitrary, but it mimics and is a simplification of the standard XML notation for denoting different XML namespaces. In the above, "garlic" and "porridge" are assumed to be the names of tags in Beta. The notation also lets us intersperse values from different linked pages.
Hence, the above checking of link labels should be generalized to include checking the values of custom tags in a page, where those tag values are derived from tags in linked pages.
4. Alternative to a Checker
There is an alternative to a website running a Checker to list approved links to its pages. It does not give all of the Checker functionality described above. But enough can be done that some might consider this instead. Imagine a page Phi, to which other URIs, (d_1,...,d_m), link, where in general these are on other websites. Assume that the author of Phi knows of and approves those links. She can embed tags in Phi that list the approved links. These could be in some standard format, for example -
<approved>
<app-i>http://orange.pear.com:831/cgi/d3.htm</app-i>
<app-i>https://3.somewhere.co.hk/test/h3/dataListing.html</app-i>
</approved>
If these are outside any HTML comment tags, then they will still be invisible in the display of the page. Or these might be put within comment tags. In the latter case, there is no strict need to use XML-style tags. But given the flexibility of XML, and the presence of many robust XML parsers, we recommend that any such information be expressed in XML.
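As an illustration only, an Asker might extract such an inline PL with a standard XML parser. This Python sketch assumes the <approved>/<app-i> form of the example above; the function name is ours.

import xml.etree.ElementTree as ET

def parse_inline_pl(fragment):
    # fragment: the <approved>...</approved> block lifted out of page Phi.
    root = ET.fromstring(fragment)
    return [item.text.strip() for item in root.findall("app-i")]

pl = parse_inline_pl(
    "<approved>"
    "<app-i>http://orange.pear.com:831/cgi/d3.htm</app-i>"
    "<app-i>https://3.somewhere.co.hk/test/h3/dataListing.html</app-i>"
    "</approved>")
# pl now holds the two approved URIs.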
Of course, publishing the approved links in the page has scaling limitations if there are many links, because the page will take longer to download. But some websites might consider it acceptable, especially if they intend to approve only a few links per page.
On the other side, consider a website W with a page Rho that is in Phi's PL. W's web server or Asker might go to the link in Rho that points to Phi, download that page, see Rho in the PL, and then amend Rho to include some indication that the link was verified, as well as possibly put in a button to allow verification. Though, as earlier, it would probably only follow the link in Rho to Phi if the link had tags to indicate that it was verifiable.
For reasons of efficiency, if we are using an ask tag, then it could have an attribute to indicate that the PL is in the page itself, like, for example -
<ask inline> ... </ask>
Here, "inline" tells Ws server to first look in the linked page. While if we are using an ask attribute, we might just use an extra attribute, like, for example -
<a href="http://t.somewhere.com/l .htm" ask asldnline> Click </a>
It is also possible for several pages at a website to have a common set of approved links to them. This common set might be held in a file separate from each of those pages. This might be in addition to a given page also having in it specific approved links not in that file. The arrangement here is similar to Cascading Style Sheets, and how they are used to factor out presentation elements common to many HTML files. Of course, CSS refers to the visual representation of web pages, while here we are discussing other types of data.
An important special case of the above is when one page has its approved links in another file. This can have the benefit of reducing the size of that page, for downloading, especially by a standard browser. While the second file need only be accessed by an Asker.
A website that hosts pages with inline PLs can also run a Checker. The simplest way is that the files with inline PLs are not dealt with by the Checker. Another way might be that a file might have an inline PL, and then another PL in another file, as discussed in the previous paragraph, and then the Checker might be consulted. In the file might be a code defining the order of lookup, which could vary from the example in the previous sentence. An analogy can be made to the file /etc/nsswitch.conf, which is found on many Unix or Linux machines. This defines the order in which the databases for various system processes are searched.
Note that the presence of inline PLs gives an incentive for a future browser (or plug-in) to implement a bi-directional web, by allowing the display of such PLs if they exist. As explained, currently the above method of writing a PL in special tags renders them invisible to a standard browser; browsers have always been designed to ignore tags they do not recognize. This is highly desirable, because it means that an implementation of our inline method does not cause any visual change to existing pages that are being linked to. But it is clearly foreseeable that a future browser might take advantage of the extra information in a PL, display it in some manner, and let the user select links in it.
One simple implementation might be for that browser to make a new page with 2 frames. Then it puts the page to be displayed in one frame, which is the main frame. The second frame shows the PL from that page. Clicking on a link in the PL causes that new page to be fetched and shown in the first frame. And the second frame gets erased if the new page has no PL, or it gets the PL of the new page.
In the above example of the <approved>...</approved> tags, the web server might still keep tabs on the requests for each entry. Hence, even for static files, it might periodically overwrite these, using a simple XML extension of the example, to write the number of requests for each entry. If so, an Asker or browser might have the ability to read and display this extra information.
5. Trusted Website or Plug-in
An important point is that for verifying email addresses or links, Jane asked the web page to perform the verification. This is fine if that domain is well known and reputable. There are many possibilities. The domain might be one that lets people contribute content, where in general those people are not employees or contractors of the corporation that owns the domain. Like an auction site, where sellers write the descriptions of what they are selling. Some auction sites might let sellers write HTML descriptions, and these could contain links to external domains.
Other examples might be domains that host blogs or reviews or act as bulletin boards, and where users can write HTML content. Currently, these do not offer our verification method. Plus, while some may do a manual scrutiny of submitted content, before allowing it to be posted, others may perform a programmatic scrutiny (e.g., looking for "bad" words, whatever those are deemed to be), or none at all. In any of these instances, there is still value for the website to offer our method, to give their readers more options in trusting what is posted. Plus, using our method involves only some simple programmatic changes by the website. But, once those are done, there are no manual steps, except perhaps on an occasional basis.
For all of these websites, there is a caveat. If users can write their own HTML, then a phisher user might write a verify button that does not actually verify. Like indicating that an email address or link has already been verified, when it has not. Thus, the user should not have the ability to write all of the web page. The website might confine the user's content to a frame, for example, and write its own content outside that frame. The latter having a verify button that operates on the user's content.
But what if Jane is browsing at an unfamiliar website? It might be a pharm. And thus, any verification buttons, and any visual indication on the page that an item has already been verified, could be fake. Hence, Jane might have a plug-in to her browser that is an Asker, which directly does the above query of the Checker, and displays the result.
6. Reading Email
Thus far, we have discussed using a browser to go to a web page at an URI Theta, say, and then verifying an email address or link in that page. A very important special case is where Theta is a page at an ISP. In other words, a user, Jane, logs into her email account at somewhere.com, and proceeds to read her mail. Typically, at most such message providers, when she is reading a given message, the URI has the base domain, somewhere.com, and then to the right of it, what amounts to a unique string that identifies the message currently being shown in her browser.
So if Dinesh sends email to jane@somewhere.com, with verifiable links to his pages at t.com, say, then he should add "somewhere.com" to his PL. Since in general he cannot know the actual full URI of his message, when Jane gets to reading it.
When Jane reads a message with verifiable elements, she should not use any buttons within it that claim to verify. Because these could be bogus. Instead, she should use a verify button provided by her ISP's mail reading interface, if such a button exists. Or she might use an Asker plug-in with this verify ability.
However, this has the drawback that a spammer or phisher could easily forge such a mailing, if she knew that a bank, say, had sent out mailings to users at somewhere.com, and used a verifiable link, and put somewhere.com into its mailing PL. She might send out a mailing to that ISP, using a similar verifiable link, where the mailing has a From address of the bank. But here, the content would be misleading in some fashion.
Given the above, the following is an immediate extension, that offers stronger protection against spoofing. Let Dinesh's address be dinesh@isp.com. His ISP might offer the ability to let a recipient verify that he sent a message. It could offer two "send" buttons. One is the normal sending of a message. The other allows the verification. It adds a field to the header, say, "X-Verify: true". Then, it records in a database that the user dinesh sent a message to "jane@somewhere.com", with a subject line of "Meeting next week?", for example. Optionally, it might also record other data about the message, like the date. But, for space reasons, it should not save the entire message.
When Jane reads the message, she sees the following:
From: dinesh@isp.com
To: jane@somewhere.com
Subject: Meeting next week?
Date:
X-Verify: true
[etc]
She presses a verify button provided by her ISP or plug-in. It notices that there is an "X-Verify" flag set to true. So from the From line, it finds the base domain, isp.com, and sends it this query -
<ask>
<from>dinesh</from>
<to>jane@somewhere.com</to>
<subject>Meeting next week?</subject>
<!-- perhaps other fields here -->
</ask>
The Checker then replies true or false, or codes to that effect, if its records show that the user dinesh performed the above.
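A minimal sketch, in Python, of this record-and-answer behavior follows. The tuple-based storage and the function names are illustrative assumptions; a production Checker would presumably use a database, as described above.

sent_records = set()   # populated when a user presses the verifying "send" button

def record_send(user, to_addr, subject):
    sent_records.add((user, to_addr, subject))

def answer_ask(from_user, to_addr, subject):
    # Reply true only if the records show this user sent such a message.
    return (from_user, to_addr, subject) in sent_records

record_send("dinesh", "jane@somewhere.com", "Meeting next week?")
assert answer_ask("dinesh", "jane@somewhere.com", "Meeting next week?")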
In passing, we point out one aspect. We have now used "ask" in 3 different ways in tags. One was as the name of a tag that brackets an email address or link (incoming or outgoing). The other was as the name of an attribute within an HTML tag, like <a> or <img>. The third is above, as the root tag of an XML message that goes to the Checker. It is not a necessity of our invention that all these names be the same. The choices of these names are arbitrary, subject only to the requirements that in an HTML message, a name of our tag must be different from any HTML tag name, and that a name of our attribute inside an HTML tag must be different from any HTML attribute of that tag. We deliberately make the choice that the names of our tags and attribute be the same, for conceptual clarity. Hopefully, the context of any given usage will not cause confusion.
However, a spammer could still overcome this requirement for extra information. Thus, a stronger variant of the above is that when Dinesh sends out the message, his ISP computes a hash of the body. Then, at the recipient's end, the Asker can also hash the body and send the hash in the query. Optionally, the hash might be of the body+from or body+from+to+subject or similar arrangements.
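For illustration, a sketch of this hash variant in Python, assuming SHA-256 and UTF-8 encoding (the choice of hash and the exact fields folded in are not mandated here); both the sender's ISP and the recipient's Asker must agree on the same fields, in the same order.

import hashlib

def message_hash(body, from_addr="", to_addr="", subject=""):
    # Fold in whichever optional fields both ends have agreed upon.
    data = (body + from_addr + to_addr + subject).encode("utf-8")
    return hashlib.sha256(data).hexdigest()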
It is important to note that the above X-Verify should not be used by the receiving ISP as an antispam indicator. Or, if it is going to do so, it should first apply most if not all of its other antispam filters. Because if this flag becomes widespread in messages, and major ISPs use it to suggest that the message is not spam, then spammers will soon append it to their spam headers. Along these lines, it is perhaps inadvisable for an ISP to pre-verify messages with this flag, by making network connections to those Checkers, asking about the messages. Because spammers can try to swamp the ISP by sending it spam, deliberately to trigger those network requests. The bandwidth and latency of which can be costly to the ISP, if it performs many such queries.
A better approach is that the recipient gets to decide whether to verify a given message or not. This is a manual operation, and the human reaction time limits how many queries a person can ask for, in a given time period.
Also, there is a continuum of spammers. Some are more "respectable" than others. Perhaps in what they sell, or perhaps because they do not forge items like their sender fields in messages. So they might actually install a Checker on their domain, and make the above records for their outgoing messages. Hence a recipient could actually verify that the message came from that particular spammer. But such messages might still be regarded as spam by the vast majority of their recipients.
Notice that we recommend that an ISP not pre-verify its incoming messages with the X-Verify, while earlier we suggested that a website hosting web pages written by its users could pre-verify any such verification-tagged pages. The reason for the difference is that spammers can easily generate millions of messages. So that currently, some of the largest ISPs in the world estimate that they each get over a billion spam messages a day. The marginal cost of making an extra spam message is effectively zero to a spammer. Whereas for web pages, manual creation is very labor intensive. While one could imagine web pages made by a program, many web sites might place constraints on the number of submitted pages per day. Aside from which, there seems to be no incentive for anyone to make millions of web pages. There are certainly fake web pages, in the sense of link farms, and fake web sites (pharms), which is one of the motivations for our method and earlier Provisionals. But these involve relatively few pages. The intent is to drive as much traffic to those few pages as possible.

7. Approving a Linking Page
Thus far, we have said that the author of Beta can use the Checker to see a list of which other pages link to Beta, and which desire an approved link. That is, the approval or not is a manual operation. But the Checker can optionally perform steps to present the author with extra information, that she might use in her decision.
For example, the Checker can analyze the linking page, Alpha. It can extract the other links and their base domains. And test if any of these are in a blacklist. If so, then it might count up the number of such links. This information can be given to the author. Because if Alpha is suspected of being associated with a spammer domain in the blacklist, say, then perhaps Alpha's author is trying to burnish Alpha's credibility by having an approved link to a good page, Beta.
Similarly, the Checker might spider to find other pages at Alpha's domain. While this entails more work, it can all be done programmatically. Then, the Checker can apply the above test to those pages, and report these.
Plus, the Checker can check Alpha's base domain itself, to see if it is in a blacklist. Or possibly in a neighborhood of an entry in the blacklist.
It might also run various heuristics, including but not limited to Bayesians, and neural networks, against Alpha and other pages in its domain, to get an estimate of the content of that site. And then present these to the author of Beta.
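By way of example, the core of the blacklist analysis of Alpha might be sketched as follows in Python. The regular-expression link extraction and the base_domain() helper are simplifications for illustration.

import re
from urllib.parse import urlparse

def base_domain(uri):
    host = urlparse(uri).hostname or ""
    parts = host.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else host

def count_blacklisted_links(alpha_html, blacklist):
    # Extract Alpha's outgoing links and count those whose base domains
    # appear in the blacklist; this count is reported to Beta's author.
    hrefs = re.findall(r'href="(https?://[^"]+)"', alpha_html)
    return sum(1 for h in hrefs if base_domain(h) in blacklist)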
8. Partner List Extensions
Consider a web page at URI Alpha. It has a verifiable link to a page Beta at another domain. The Checker at Beta's domain has this pairing - (Beta, S) where S is a set of pages, including presumably Alpha, that link to Beta, and it approves of those links. Let Gamma be another web page that has a verifiable link to Beta. Suppose Gamma is not in S. As formulated above, we cannot logically distinguish the reason for this, which could be -
1. Beta does not know of Gamma linking to it.
2. Beta knows of Gamma linking to it, and does not approve.
Item 1 deserves comment. Earlier, we discussed a procedure where the author of Gamma, after having made the verifiable link to Beta, would then communicate with Beta's author, in the hope that the latter would put Gamma into its PL. Yet the communication might not happen. Maybe the author of Gamma simply forgot. Or, she had no intention of doing so. Perhaps she is a phisher, who is putting in a verifiable link. But hopes that few readers will actually do the verification. This is similar to how she might write fake electronic seals on her pharm website, in the reasonable expectation that most visitors won't click on those. ["5804"]
To resolve the above ambiguity as to why Gamma is not in S, we extend what the Checker can hold for one of its pages, Beta. It has 4 sets, (Beta, [A,B,C,D]), where
A = URIs of pages with verifiable links to Beta, that it approves of.
B = URIs of pages with verifiable links to Beta, that it disapproves of.
C = URIs of pages with (normal) links to Beta, that it approves of.
D = URIs of pages with (normal) links to Beta, that it disapproves of.
Sets C and D might be found by perhaps consulting a search engine to find which pages it knows about that link to Beta. As discussed above, so too might some pages that go into A and B, if the authors of those pages do not contact Beta's author for inclusion.
Any of these sets might be empty, perhaps because the Checker did not decide to hold such data. It is also possible that two nonempty sets might have contradictory information. Imagine if A had a page Chi, and so did B. This indicates an error, perhaps in the manual construction of those tables. A Checker might optionally, but probably preferably, have algorithms to detect such contradictions and alert its operators, and possibly not promulgate such results to the network. Any Asker that asks a Checker should be aware that the latter might not run such checks. So the Asker might have some optional filters to detect such contradictions. There is an element of redundancy here. But in a distributed network, it may be necessary.
Suppose a query (Beta, Alpha) comes to the Checker. It wants to know if Alpha is in any of the above sets. As one implementation, we can imagine a 4 bit reply, where
bit 0 = 1 if Alpha is in A, = 0 if Alpha is not in A.
bit 1 = 1 if Alpha is in B, = 0 if Alpha is not in B.
bit 2 = 1 if Alpha is in C, = 0 if Alpha is not in C.
bit 3 = 1 if Alpha is in D, = 0 if Alpha is not in D.
Clearly, the choice of associating set A with bit 0 is arbitrary. (And similarly with the other sets.)
Another implementation might be that the reply is stored as characters. In which case, a natural implementation would be to map the above 4 bit replies into the hexadecimal characters, '0'-'9' and 'A'-'F'.
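As an illustrative sketch in Python, combining the 4 bit reply with the hexadecimal character mapping just described; the function name is ours.

def encode_reply(uri, A, B, C, D):
    # Bit 0 <-> set A, bit 1 <-> B, bit 2 <-> C, bit 3 <-> D (an
    # arbitrary assignment, as noted above).
    bits = 0
    for i, s in enumerate((A, B, C, D)):
        if uri in s:
            bits |= 1 << i
    return format(bits, "X")   # a single hexadecimal character, '0'-'F'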
The above choice of 4 sets being voluntarily filled by each Checker means that our method could be independently implemented by servers for different domains. With some servers only finding the (default) set A, and others also perhaps finding the other sets.
(Related to this is an optional extension to the query syntax, (Beta, code, Alpha), where "code" refers to which of the sets A-D that the query wants tested for the presence of Alpha. At the simplest level, it might be a bitmap of 4 bit positions. We refrain from writing this as an XML implementation, but such a one should be straightforward to imagine.)
Now suppose that the verifying code that is looking at Alpha got back a result from the Checker, with bit 1 = 1, or bit 3 = 1. These indicate disapproval. The code might perhaps turn off the link, and warn the user. In some manner, the code might indicate a more severe warning than if merely bit 0 = 0, because in this case it has specific information about disapproval from the linked page.
Note also that there could be 8 sets associated with each page. This arises if we choose to distinguish between links that go out and do not retrieve data, and links that do. (For the latter, imagine an outgoing link that loads an image file.) So that set A splits into 2 such subsets, and likewise with B, C and D.
For brevity, elsewhere in this invention, when we say we add an item to a page's PL, unless otherwise qualified, we mean that we add it to the page's approved PL.
9. Asker Extensions
Consider an Asker. It might reside as a plug-in in Jane's browser. Or in the web server of a website. In either case, suppose it has one or both of a whitelist and a blacklist of URIs and domains. The Asker might have options that are shown to, and adjustable by, Jane, that control what it asks for, when looking at a page or message. These include, but are not limited to -
1. Only verify for links in the whitelist.
2. Only verify for links not in the whitelist.
3. Only verify for links in the blacklist.
4. Only verify for links not in the blacklist.

There might also be combinations of using both lists. Related to the above, an Asker might adopt the policy of not verifying any page (or message) with a link to an entry in its blacklist.
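A minimal Python sketch of these policy options, where the numeric policy codes 1-4 mirror the list above and are merely an illustrative convention.

def should_verify(link, policy, whitelist=frozenset(), blacklist=frozenset()):
    if policy == 1:              # only verify for links in the whitelist
        return link in whitelist
    if policy == 2:              # only verify for links not in the whitelist
        return link not in whitelist
    if policy == 3:              # only verify for links in the blacklist
        return link in blacklist
    if policy == 4:              # only verify for links not in the blacklist
        return link not in blacklist
    return True                  # no restriction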
Optionally, the whitelist might come from antiphishing methods, including those in our Antiphishing Provisionals. Also, the blacklist might come from antispam or antiphishing methods, perhaps performed by one or more ISPs.
When the Asker is scanning the web page or email, to find links, some of these might have a static address written into them, but the link would actually go to an address dynamically calculated. Typically, by invoking instructions written in a scripting language, where these instructions are part of the page. This is client-side scripting, where the author of the page or email expects the user's browser to run the instructions and go to the address made by them. It can be anticipated that spammers or phishers might attempt this, to evade various antispam or antiphishing methods. Accordingly, we can use our techniques of "1698" here, in order to attempt to find the actual address that the browser would go to. Here, we have a slight extension of "1698", for that concerned itself with electronic messages, while now we broaden it to handle web pages.
Imagine a future hypertext markup language (like a future version of HTML) where a tag can contain links to several addresses. Where this might be for some failover ability - so that if the first address does not reply, the second is queried and so on. The Asker can thus also have the ability to call Checkers at those links, possibly in the above order of normal usage of the links.
An Asker can have methods to prevent auto verification. Suppose Dinesh has a page at the URI Theta = http://a.b.com/c/d/e.html. It has a verifiable link to a page Omega = http://a.b.com/c/d/f.html. Both URIs refer to the same domain, and within it, to the same directory. If Dinesh has write access to Theta, it's a reasonable assumption that he has write access to Omega. In this case, the verifiable link probably has little significance. Note that we draw a distinction between this and the case where Omega is at a completely different base domain. An important usage of our method is to let a user demonstrate that he has write access to web pages at two different base domains. Thus, an Asker can have these optional abilities -
1. It will not verify if Theta and Omega are the same.
2. It will not verify if Theta and Omega are at the same domain.
3. It will not verify if Theta and Omega are at the same base domain.
The last choice above might be the default setting. The Asker's user might be able to see its choice, and to change it. An Asker might also be able to show the user any links that violate this test.
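For illustration, this guard against auto verification might be sketched as below in Python, with option 3 (same base domain) as the default. The naive two-label base-domain reduction is a simplification; real code would need a public-suffix list.

from urllib.parse import urlparse

def allow_verification(theta, omega, mode=3):
    if mode == 1:                     # refuse only if the URIs are identical
        return theta != omega
    t = urlparse(theta).hostname or ""
    o = urlparse(omega).hostname or ""
    if mode == 2:                     # refuse if at the same domain
        return t != o
    # mode 3 (default): refuse if at the same base domain
    return ".".join(t.split(".")[-2:]) != ".".join(o.split(".")[-2:])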
An Asker might have an option, selectable by its user, of testing the validity of any digital certificates on the Checker. Note that there is little need for an Asker to be able to do this with the page that is currently being viewed on the browser. Because most browsers can already do this.
An Asker's query might be formulated as an HTTP GET or POST (instead of using the XML format we have described). And sent either to the same port at a Checker that is used for HTTP, or, preferably, to a new port. This might be if the implementation of that Checker is compatible with that of a standard HTTP web server. Its reply would be in HTTP and enclose an HTML message. In such an event, the information going to and from the Checker would be functionally equivalent to what has been discussed earlier.
How can an Asker know which format a Checker might want? For one thing, a given Checker might support both formats, so that this is moot to an Asker. Alternatively, an Asker might have prior knowledge of that Checker's preferred format. Perhaps from previous queries to that Checker. Where the Checker might return a message indicating its preferred format, and the Asker stored that information about that Checker, for use if the user were to want another verification from the Checker. Another approach might be that the author of the page, Alpha, which has a verifiable link to Beta, might use prior knowledge of Beta's Checker preference, and encode this into the link. For example, as <a href="..." ask askMode="http">, where the other value for askMode might be "ws" (for Web Service). The latter might be the default, and is implicitly what we have been using. If the value is this default, it might be omitted for brevity.
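As an example only of the "http" askMode, an Asker might build a GET request along the following lines. The path, parameter names and port here are illustrative assumptions, not a defined protocol; the reply would be functionally equivalent to the XML reply discussed earlier.

from urllib.parse import urlencode

def build_http_ask(checker_host, beta, alpha, port=830):
    # Functionally the same information as the XML ask: which page (beta)
    # is being asked about, and which page (alpha) contains the link.
    query = urlencode({"url": beta, "referrer": alpha})
    return "http://%s:%d/ask?%s" % (checker_host, port, query)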
Apart from these two askMode values, there might in future be other values. Which could designate other modes of communication between an Asker and a Checker.
Suppose an Asker is associated with a website, rather than with a plug-in. It might keep statistics on which are the most popular of its pages that users verify, and associated data.
10. Checker Extensions
Refer to our earlier discussion about extensions to a PL for a page Beta. Suppose the PL has entries P1 ... Pn. Some of these are pages with approved links to Beta, and some are unapproved. Associated with each is an optional comment string and an optional string that is a network address, like an URI. Then, when the Checker replies to an Asker, if the referring page is in the PL, then it could also return the associated comment and network address, if these exist.
The Asker might/should have the means of displaying the comment and address to the user. Plus a means of letting her easily pick the address and have the browser go to it.
These optional adjuncts to the PL are useful. Imagine an address, Alpha, in the PL that is unapproved. Let Beta be a page at a website of a bank, bankO.com. Suppose Alpha is written by a critic or competitor of the bank. And Alpha's author included a link to Beta. This link need not necessarily be marked as verifiable. Now suppose the bank has learnt of Alpha, possibly via a search engine. Then the bank might want to write a page, Gamma, that is a direct reply to Alpha. So Beta's PL now includes ("disapprove", C, Gamma), where C is a comment.
Note that in this example, we do not consider Alpha to be a page on a pharm, that might be pretending to be the bank or an affiliate of the bank. In such latter cases, the bank would take stronger measures, like contacting the government and Alpha's network service providers, to have the pharm taken offline. Rather, Alpha is imagined to be a validly posted page, that has a right to be posted. If so, then the bank has the same right to reply to Alpha, and our technique here lets the bank easily do this.
Suppose the bank's web server gets a request to serve the page Beta. And the "Referer" [sic] header line in the http request said it came from Alpha. Then the server can use Beta's PL. It finds that Alpha is a key in the PL, where we imagine the latter implemented as a hash table. Corresponding to the key is the above triplet. Hence, instead of returning Beta, the server returns Gamma. Now, many current web servers have an ability to customize what page they return, based on the page that the requester's browser came from. But what we have shown here is how our method can produce a PL that has several uses. Not just in our method, but also by the web server.
We have previously considered a Checker to be located at a base domain. Imagine that a verifiable link in a page pointed to a URI with a domain one.two.base.com. Previously, the query was considered as going to a Checker at base.com. An alternative is that the query go to a Checker at one.two.base.com. If this does not exist, then a query might be sent to two.base.com, and if no server were there, then a query might be sent to base.com.

Consider a website W, with users who login to it. A user, Mary, can maintain a Partner List. Each item in the PL is a URI or (in general) a network address of somewhere on that network (usually the Internet) with a page that links to Mary's address at W. And Mary approves of that link.
W runs a Checker. It gets a query. An example might be
<ask> <!-- example with a username -->
<user>Mary</user>
<url>Phi</url>
</ask>

or

<ask> <!-- example with an alias -->
<alias>Zoltan the Crafty</alias>
<url>Phi</url>
</ask>
Here, Phi is the URI of the outside page, with a link to Mary at W. She might have an alias, as shown in the second example. Assume that both her user name and alias are unique, amongst Ws users. In the above, there might also be several URIs in the ask. Assume one, without loss of generality.
The url tag might have an attribute to ask about the expiration time, if it exists. Like for example,
<url expire>Phi</url>
At W, Mary might also have various attributes, and these might be asked for. Like for example, asking for all attributes she might have -

<ask>
<user>Mary</user>
<url>Phi</url>
<attributes/>
</ask>
Or, one might ask for the values of specific attributes, where it is assumed that we already know the names of those attributes -
<ask>
<user>Mary</user>
<url>Phi</url>
<attributes>
<i>Experience</i>
<i>Hit Points</i>
<i>Gold</i>
<i>Bottle of Poison</i>
</attributes>
</ask>
When Mary had earlier logged into W and added Phi to her PL, she could also list the names of attributes that W's Checker can reveal to Phi about her. Likewise, the Checker might also run a filter that prevents it from revealing some attributes, like Mary's real life address if she is a child, even if she specifically authorizes this to Phi. Along these lines, if Mary does not specifically list, for a given URI, which attributes can be revealed to it, she might have a default list prepared. And the Checker can also have various default lists for its users.
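These disclosure rules might be sketched as follows in Python. The per-URI list, the default list, and the Checker-wide filter are as described above; the attribute names are illustrative assumptions.

checker_filter = {"real_life_address"}    # attributes the Checker never reveals

def attributes_to_reveal(per_uri_list, default_list, requested):
    # per_uri_list: names Mary listed for this URI, or None if she listed none,
    # in which case her default list applies.
    allowed = per_uri_list if per_uri_list is not None else default_list
    return [a for a in requested
            if a in allowed and a not in checker_filter]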
The reply by the Checker might list the attributes in various ways. One is to have the order implicitly be the same as the order of the attributes in the ask. So we might get -

... <attributes>
<i>Raw Beginner</i>
<i>-3</i>
<i>15</i>
<i>0</i>
</attributes> ...
Or the reply might also list each attribute name and Mary's value for it.
The query to the Checker, and its reply, are analogous to an HTTP request and response.
Consider possible usages of the above. Suppose Mary is the author of web page Phi. In it, she claims to be a player in a Massive Multiplayer Online Role Playing Game located at w.com. She lets the reader verify this, by using tags like
<ask link="w.com:830" alias="Zoltan the Crafty"> Zoltan the Crafty</ask>
The above is only meant as an example notation. (Other conventions are possible.) It is implied that in Phi, there is a button from which the query can be made to W. Or, perhaps there is a plug-in Asker to do this. The gist of the example notation is that there is enough information in it, along with the implied knowledge of Phi, the address of the page, to construct the example query shown earlier that asked about that alias.
Upon receiving a reply from the Checker, it might be displayed in some fashion. Perhaps by changing the Phi page, or by bringing up a new window showing the reply.
Note that this example of ask syntax is different from the first example of this invention, where we wrote <ask>dinesh@isp.com</ask>. Our current example embedded the address of the Checker as an attribute, while this has the address implicitly available by reducing the email address to the base domain of isp.com. In these examples, we assume that the Asker which reads in an ask has logic to handle the different variants.
This example using an alias also illustrates another point. The tags are invisible, but the visible text says "Zoltan the Crafty". In general, this does not have to be the same as that given in the alias attribute. Exactly equivalent to the standard HTML situation of <a href="..."> ... </a> tags, where the visible text need have no correspondence to the href value. Spammers and phishers have used that to try to fool users. In our method, optionally, we could apply the steps in Section 3, to verify the link label.
It is useful that our method lets Mary allow her readers to verify for themselves her claim to be a player at W's website. This burnishes her credibility. It can be helpful in several contexts. Some people make a living by joining a game site and creating characters with experience. Then, they go to an auction site, and offer to sell off their character. This may attract a buyer who does not want to start from scratch tediously building up experience. Some game sites prohibit this. Others ignore it.
Our method lets a game site condone it, and by operating a Checker, facilitate such transactions. The reason is that currently, on an auction site, a potential buyer has to take a seller's word about whether she is even a member of a game site. Let alone how much experience she has amassed with her character. The main protection the site offers is usually a feedback rating of that seller for previous auctions.
There is much wider scope, even in the context of a seller in an auction.
Imagine that the seller, Mary, is offering her consulting services, perhaps as a network engineer. She might claim certification by various networking companies. Certainly, a potential employer might imagine emailing one of those companies, and asking if "Mary Jones" was certified by it on 8 August 2003, as she claims in her resume. The company might reply, yes or no. But on what time scale? Plus, this manual effort by the company might well be seen by it as a cost burden. Our method allows for a rapid programmatic verification, triggered by a manual query by a potential bidder.
Under one implementation, the company might let its graduates be members of one of its groups, with network access. Mary can write up her auction description and post it, and find its URI. Then, she can login to her special (limited functionality) company account and add that URI to her PL. Here, the attributes are not directly changeable by her - like her graduation details. The company writes these attributes. But this can be done once, at graduation. From then on, there is little manual intervention needed by the company. Naturally, it might choose to charge Mary for this extra, post-graduation service.
It also offers an incentive for the auction site to add a verification ability to its interface. Because it helps reduce the incidence of auction fraud, and hence should attract more bidders.
More generally, if Mary posts her resume, on her personal website, say, then she might add in similar verification abilities. Universities and schools that she attended might also implement our Checkers. The context of resumes is one that might benefit greatly from our method. Currently, if Mary posts her resume, virtually all the details are unverified. Some resumes do nowadays give email addresses or physical addresses or phone numbers of references. While useful, all this involves manual effort by a potential recruiter. This has also given rise to a speciality of companies occasionally using investigators to check an applicant. Our method expands the scope of what can be automated and indeed perhaps what should be. It also gives Mary a chance to distinguish herself from others who post their resumes online. By offering readers the chance to verify in real time, she stands out from others who do not. Given the often competitive nature of job searching, a job site thus has incentive to enable this front end query ability for its users, and possibly charge them accordingly for it.
We offer a simple example here, where Mary's resume web page has the following entry -
<a href=Beta ask askLabel="A83">Mary Jones B. Arts (June 1990)</a>, University X.
where Beta is a link to a page at uni-x.edu. This page has -
<G30>Joshi Aggarwal B. S. (April 1985)</G30> <A83>Mary Jones B. Arts (June 1990)</A83> <D78>Harry Runge B. S. (Dec 1989)</D78>
The Asker checks that the degree and date in her resume matches that in the official listing. Plus, a Checker can confirm that Mary's link to Beta is approved. This is necessary, otherwise someone else claiming to be Mary might put up a fake resume page. Here, the Checker approval follows some steps that Mary is assumed to have done, outside the scope of this invention, to prove her identity to the university. The checking of her link label prevents her from "upgrading" her degree on her page, to a Masters or PhD, say, after her page has been added to Beta's PL.
Note in the above that the tag name in Beta that is associated with Mary, A83, should be unique within the scope of Beta, and should have no other significance outside that page. Specifically, it should not be her former student identification label. Remember that Beta is presumed to be publicly readable. So long as the tags follow this recommendation, then the tags do not give out any personal information.
In essence, our method envisions that any organization with information about its members or past members, might choose to implement it, to better serve these members. In our method, the ongoing manual effort is by members logging in to maintain their PLs. Furthermore, the information that is made verifiable is done so by the explicit activity of those to whom the information refers. Which addresses any privacy concerns.
Our method also lets any individual (or organization) that might be considered an expert on a topic, or whose opinion is valued, to offer the equivalent of an economical electronic seal. Currently, those are issued by "Validator" companies that specialize in validating other companies. ["5804"] Typically, a Validator charges a customer company a sizable fee for performing that assessment. The customer then gets, amongst other services, the right to post an electronic seal on its website. Often, this seal is clickable and goes back to the Validator.
The merit of our method is that it simplifies this ability, and essentially lets any user who has a website or electronic address at a location that runs a Checker act as a Validator. If a Checker is running, along with Askers in users' plug-ins and possibly in the validated website, then it enables a much simpler and cheaper validation.
Of course, how valid is an individual Validator who uses our method? This can be left to the marketplace to determine. Our method also offers a potential source of income for those whose opinions are valued. They can charge other websites for their approval of those websites. This charging might be a flat fee, per unit time, or perhaps a fee that is some function of the number of approval requests received for a given website.
The Checker can have the option of dropping (i.e., not replying to) a request from an Asker. This might be if that Asker has made too many requests in some given time. Or if the Asker is coming from a network address that is on the Checker's blacklist (if the latter exists). Or for any other reason. The main reason for this is to protect a Checker against attacks, like Denial of Service; some of which might involve trying to swamp the Checker.
Hence, any implementation of an Asker should have the ability to recover from sending a query to a Checker, and not getting a reply.
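For example, a plug-in Asker might simply bound its wait and treat silence as a non-answer, as in this Python sketch; the send_query helper and the timeout value are assumptions for illustration.

import socket

def ask_checker(send_query, timeout_seconds=5):
    # send_query is assumed to perform the network query with the given timeout.
    try:
        return send_query(timeout=timeout_seconds)
    except (socket.timeout, ConnectionError):
        return None   # a dropped or refused query is reported as "no answer"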
Suppose the Checker has a blacklist of spammer domains. It can have a policy of automatically replying "disapprove" to any ask it gets about an URI Phi in that blacklist that points to one of the Checker's pages (or email address of a Checker's user). This might override any entry for Phi in one of its page's PL or in one of its user's PL.
Carrying this one step further, the Checker might apply other analysis to its page or user, if either had Phi in its PL. In part, it could suspect that the author of the page, or the user, might be involved in spamming or phishing. Suppose, for example, that the Checker works at a website that is an ISP. So a spammer might open a mail account, and on her (spam) domain, have pages that try to verify to the spammer's account at the ISP. This helps protect the Checker/ISP, from possibly being blacklisted by other ISPs, if they associate the Checker's domain with the spammer.
Suppose we have a page with address Alpha, with a verifiable link to address Beta. And Beta has a Checker. If visitors to Alpha use their plug-ins to verify, then in general, Alpha does not know this. Because a plug-in would use the instance of Alpha that has been downloaded to its browser. And the Asker would talk directly with the Checker, bypassing Alpha's server. But periodically, the Checker might send statistics about those verifications that came to it about Alpha, to Alpha's server. A Reporting Service for which the Checker might bill that server. Which might then combine that information with its various server logs, to try to discern more information about its visitors. This could be valuable if Alpha is one page of a commercial website that performs transactions.
A future web server might incorporate some or all of the functionality of a Checker that is described in this invention.
11. Ask Space
In ["8757", "1745", "5037", "1899"], for email, we defined a Bulk Message
Envelope (BME) in the data space of email. Each BME would have metadata in various metadata spaces, like domain, hash, style, relay and user. In each of the metadata spaces, we were able to define clusters of metadata, and use these in, for example, antispam applications.
Here, the Ask Space is a data space consisting of those ask queries that go from an Asker to a Checker. Call an entry in this space an "ASK". It has various metadata. Most importantly, the two URIs, call them Alpha --> Beta. Unlike the BME's metadata, these two form a directed edge, in the language of graph theory. From each URI, we can extract their domains and base domains. If the latter are d1 and d2, then we have d1 --> d2.
It is possible for there to be more URIs in an ASK. Imagine that a request to Beta's web server to return Beta gets to a redirector, which then goes to another address, Gamma, possibly at a different base domain. Then we have Alpha --> Beta --> Gamma. If we reduce down to base domains, and Gamma is in the same base domain as Beta, then as before, we would just have two entries, d1 --> d2.
Other metadata in an ASK might include, but not be limited to -
1. The time when the query was made. (Preferably measured in Universal Time.)
2. Extra attributes in the query.
3. The Checker's reply.
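One possible representation of an ASK and its metadata, sketched in Python following the list above; the field names are illustrative, and the base-domain reduction is the same simplification used earlier.

from dataclasses import dataclass, field
from typing import Dict, List, Tuple
from urllib.parse import urlparse

@dataclass
class Ask:
    uris: List[str]                 # directed path, e.g. [Alpha, Beta, Gamma]
    time_utc: str                   # when the query was made, in Universal Time
    attributes: Dict[str, str] = field(default_factory=dict)
    reply: str = ""                 # the Checker's reply

    def base_domain_edges(self) -> List[Tuple[str, str]]:
        # Reduce the URI path to directed edges between base domains.
        doms = []
        for u in self.uris:
            host = urlparse(u).hostname or ""
            doms.append(".".join(host.split(".")[-2:]))
        return [(a, b) for a, b in zip(doms, doms[1:]) if a != b]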
Note that, following our methods of ["8757", "1745", "5037", "1899"], we define a datum to be the envelope or query. While certain salient parts of that query we define to be metadata.
We can use our clustering methods of ["1745", "5037", "1899"] to build clusters here. With the important proviso that now we have, from the directed edges, a directed graph. The methods could be used with the full URIs, or with subsets, like the domains or base domains. Based on our experience with our antispam domain clusters, where these domains are base domains, we suggest doing likewise, to make Ask (base) domain clusters.
It is a merit of this invention that our methods of ["1745", "5037", "1899"] be likewise applied here.
The Ask domain clustering can be used for various purposes. Like observing how user behavior changes over time, as indicated by which items verifications are asked for.
What is the difference between our Ask Space and "Link Space" that can be defined from the links between pages? As mentioned earlier, the latter is a static picture. The Ask Space derives from actual queries for verification.
12. Antispam Extensions

In ["8757", "1745", "5037", "1899"], we described the use of clustering across various metadata spaces. In part to enhance antispam methods that detect spammer domains. Suppose we are an ISP or other organization that wants to find spammer domains. Suppose we start with a domain, Alpha, that we have classified, by whatever means, as a spammer domain. We can spider it, looking for links that are verifiable. Let one such link be to an URI Kappa, at some different base domain. We can run an Asker that asks Kappa's Checker if it approves of that link. If the Checker exists and it approves, then we could choose to add Kappa or, more powerfully, Kappa's base domain, to our set of spammer domains. Because we might take Kappa's approval as suggesting that it is also complicit in spamming. This gets around a possibility that Alpha might have regular links to domains uninvolved in spamming, in order to prevent us building up a list of spammer domains, starting from Alpha.
Plus, if Alpha is part of a spammer cluster, then this method lets us extend that cluster. Notice that this could be an example of a cross-Electronic Communications Modality technique. The domain cluster was made perhaps using Bulk Message Envelopes ["8757"], which are derived from single emails. While the extension here is in the space of websites. Two different ECM spaces.
Another way to make a cluster of spammer domains is to find a spammer domain, by whatever means. Then use the method of this invention to find all domains that approve of it, and put those domains and the original domain into a cluster.
Suppose a search engine ("Engine") finds various clusters of domains in Ask Space. It could send (or sell) this information to an antispam organization or an ISP. This could be a continual process, as new lists of Ask Space clusters could be made at regular time intervals, to reflect changing links and user behavior. Consider now what an ISP might do with this information. Let C be a cluster of domains in Ask Space, and let f and g be domains in C, where f links to g, and g approves of the link. From the ISP's incoming or outgoing messages, it can find clusters of domains, possibly using our method of "1745". Suppose f is in one such cluster, D. If g is not in D, then the ISP might add g to D. This is stronger than merely adding a generic g to D, simply because f has a web page with a link to a page in g. Because if f is run by a spammer, she might put in spurious links to non-spammer domains. Whereas the affirmation by g is a more positive indicator.
Suppose that instead, g is in D and f is not in D. But since g approved f, and f links to g, then f might be added to D.
In either case of adding to D, the proposed added domain might be subject to an exclusion list maintained by the ISP.
If f and g are already in an existing cluster D, then they need not necessarily be directly connected. So the f --> g association from C can be used to make such a new connection within D.
In any case, when we add such a link, the issue of what weight we should associate with that link arises. The ISP might have various rules in place to make such a decision.
Any new links that are added, or any links whose weights are changed because of the information from the Ask Space clusters, might be designated differently from other existing links. Plus, the new links might incorporate a possible directionality.
The ISP might perform the above actions for some or all domains in each Ask Space cluster, and across some or all Ask Space clusters. After which, it can try to coalesce its clusters. Because a domain which was added to a cluster might already exist in another cluster.
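A compact Python sketch of the adding rules above, for one ISP cluster D and the approved-link edges from one Ask Space cluster C. The link weights and the intra-cluster connections of the preceding paragraphs are omitted for brevity; the exclusion list is passed in as described.

def extend_cluster(D, ask_edges, exclusion=frozenset()):
    # D: a set of domains in one ISP cluster.
    # ask_edges: (f, g) pairs where f links to g and g approves the link.
    for f, g in ask_edges:
        if f in D and g not in D and g not in exclusion:
            D.add(g)   # g affirmed a link from a clustered domain
        elif g in D and f not in D and f not in exclusion:
            D.add(f)   # f links to, and is approved by, a clustered domain
    return D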
13. User Network Address
Imagine that the Asker is a website, and the user is looking at a page on that website, and then asks the Asker to ask the Checker. The Checker might require from the Asker the network address of the user. So it might want a query like this example,
<ask>
<url>Alpha</url> <!-- page at Asker's website -->
<link>Beta</link> <!-- page at Checker's website -->
<address>20.30.40.50</address> <!-- user's network address -->
</ask>
When an Asker starts up, it could scan its pages, looking for the verifiable links. Then it might contact the Checkers at these links, asking, in part, for whether they want this information or not. There is also the possibility that a Checker can also change its policy, and then contact its Askers to inform them of this change. Where perhaps an Asker could expose a listener or Web Service at one of its ports, for this purpose.
Or, suppose that an Asker does not know that the Checker wants this information. So it sends the above query, omitting the address data. The Checker might reject it, with some error message indicating that the Asker should resend, with the address data.
This requirement leads to several consequences. For one, the Asker does not have to give the Checker that information. Perhaps the Asker is making a voluntary policy choice that it believes is protecting the privacy of those using it. This might also be a function of the physical location of the Checker, insofar as that can be determined from its network address. Or it might also be a function of external knowledge about that Checker, or knowledge of the Asker's past interactions with that Checker. For example, the Asker might suspect that the Checker and its website are associated with the dissemination of spam, or the hosting of spam domains. Or, more broadly, that the Checker is located in a country with a high incidence of fraudulent websites. While it may seem like an Asker is making sweeping generalizations, in general, it may have leeway to do so, whether its assessments are correct or not.
Or perhaps the Asker is located in a region with government regulations that prohibit it giving the Checker that information. Conversely, the regulations might mandate that it hand over such information. In either instance, the regulations may also be a function of the network address or perceived physical location of the Checker. For example, it might be permissible to hand over the information only if the Checker is in the Asker's country.
Likewise, the Checker may have reasons of its own to ask for the user's address. (And we will discuss some of these below.) But in some regions, regulations might mandate that it must ask for such information, or perhaps that it must not. And this may also be based on the Asker's network address or perceived physical location.
In all these cases, our method allows for a policy enforcement on the actions of the parties, where the policies might be set by some combination of government regulation, corporate decisions or individual decisions.
Note that the user's network address is not necessarily that private, with respect to the Checker. Often, the user might click on the link, and thus presumably get the page at the Checker's website. Thus, the Checker or its associated web server knows her address, because that is where it sends the page to. This might happen after she sees the Checker's reply, as shown by the Asker. Or, it might even happen before. Where perhaps she clicked on the link, got the Checker's page loaded into her browser. Then she went back on the browser to the previous page, and asked the Asker for information.
Plus, the user might be going through an anonymizer, which acts to conceal her address. Or she might be using some hardware configuration to do so. Like two Network Address Translation boxes, back-to-back.
Consider now where an Asker could, but does not have to, reveal the user's address to the Checker. The Asker might ask the user to manually approve it sending out the user's address. And apply this. Where, for example, if the user says no, and the Checker requires it, then the Asker might refrain from asking the Checker, and tell the user that the link could not be verified for this reason. Or the user might set a default policy of always yes or always no. Or the Asker might have its own policy, which might not involve consulting the user or applying the user's policy.
The Asker can choose to give (or perhaps be required to by regulation) a false address to the Checker.
If the Checker requires an address, it can have its own policy with respect to this. Including -
1. Don't verify if no address is given.
2. Verify if no address is given.
3. If it gets an address, then it could apply further tests.
If the Asker often gives a false address to the Checker, for repeated queries, then the Checker might be able to detect this in a statistical sense. Suppose the Asker often asks the Checker and furnishes various addresses. Then, often soon after (or soon before), the Checker's page in question gets downloaded, to addresses different from those furnished by the Asker. Especially if these time intervals are short compared to the average time between page downloads seen by the Checker. The Checker may have various policies to decide what to do if it detects suspected false addresses.
If the Asker gets repeated requests for verification of the Alpha --> Beta link, inside the Alpha page, and it hands over real addresses to the Checker, then the Asker might have heuristics to try to see patterns in the Checker's replies. For example, if the Asker notices that all asks from 10.20.30.* get verified, and all asks from other addresses do not get verified, then it might infer that the Checker is approving based on that address range. But the Asker can never be fully sure that this is the entire Checker rule set. Because the Asker might never have gotten a request from the 40.55.*.* range, say; and for this range, the Checker will verify. Thus the Asker cannot get a full deterministic result.
However, the Asker can actively experiment by probing the Checker with false addresses, to perhaps map out possible ranges of addresses that the Checker will or will not verify. Though if it wants to avoid the Checker detecting this, it should probe only within the statistical envelope of actual queries received by it, and do so in a pseudo-random fashion.
Why would a Checker base a verification in part or entirely on the user's address? There are many possible reasons. We discuss a few here.
A Denial of Service (DoS) attack might be happening against the Checker. The attacker is coming from one or more browsers and using the Asker to directly hit the Checker. Hence the addresses given by the Asker to the Checker might help the latter locate the attacker. For example, the Checker and its associated web server might block any http request from those addresses. Because these might be a direct DoS attack on the web server. The Checker could also tell any router it controls to block incoming messages from those addresses. And maybe inform any upstream routers that it does not control, to ask them to do likewise. Plus, when we describe blocking addresses, the Checker might also block neighborhoods of those addresses. Where it determines these according to some heuristics.
The Checker can also tell the Asker that it (the Checker) is under attack. Perhaps the Asker already thinks it (the Asker) is under attack, due to a high frequency of messages coming to it. Or perhaps not, if the attacker is contacting various Askers with requests that point to the Checker, so that each Asker might not think that it is under attack, and hence passes on those requests. Whereas if the Checker can get (accurate) information about the addresses of the requests, summed over all its Askers, then it might be able to detect patterns in the address distribution or frequencies. Hence, it might ask the Askers and their routers if they could block such requests.
Another reason for the Checker wanting the user's address is that some address ranges might correspond to organizations that have paid the Checker for offering its service, and it does not want to offer the service to other users.
Another reason is that it might want to analyze which websites that it approves of are more likely to have users who want to verify. This can be important feedback for the Checker. For example, are some of these Asker websites considered trustworthy by visitors, so that they are then less likely to ask to verify any links? More importantly, are some Asker websites continually getting a lot more requests to verify? This might indicate that those sites are riskier for the Checker to approve of. The extra information given by the user addresses might be used by the Checker to do a closer study of the makeup of the visitors going to these Askers. If an Asker with many verifies has a different distribution of visitors than an Asker with fewer verifies, then there might be other reasons, independent of the Askers' website credibilities.

Suppose the browser is showing page Alpha, with a verifiable link to Beta. The Checker might also ask the Asker what address or domain the user was at when coming to Alpha. If the Asker is a plug-in for the browser, then it has access to this information from the browser, since this is what the Back button on the browser uses. If the Asker is on Alpha's website, then the http request to Alpha's web server might have the Referer field. This sometimes has the domain that the user came from, to Alpha. The field could be blank or it could be misleading. In general, Alpha's Asker cannot tell if a non-blank field is true or not.
Suppose the Asker has this previous domain or address. The same issues arise as above, regarding whether a Checker can ask for this information, and whether the Asker should furnish it.
14. Other Extensions
For greater assurance of verification, there might be optional servers on the Internet. These might have lists of various types of accredited companies or organizations. The companies might be banks, chartered by various governments. Or recognized schools or universities. Etc. An Asker might query these, in addition to asking a Checker. Or perhaps instead of asking that Checker, depending on the reply from the server. This is useful, in part, to stop a phisher starting up a fake bank that has a domain name similar to an established bank's, and then running a Checker (and possibly an Asker) for the fake domain, in order to try to visually mislead users.
The above has involved a manual effort by a user adding a URI to her PL at some website where she is a member. This can be generalized. The website, Gamma, could have procedures, online or offline, so that a person who satisfies them could add a URI of a web page at another website that links to an address at Gamma. Or, an external query could get a reply from Gamma's Checker that gives data, perhaps in the format suggested above for the attributes of a user. But here, the person need not necessarily have a login at Gamma.
For example, imagine that Gamma is a U.S. state's Department of Motor Vehicles. Dinesh owns a car in that state, and has registered it with the DMV. Assume that the DMV has a website and a Checker implementing this method. Dinesh has a web page in which he claims he owns that car, and he writes various details about it, including, say, the license plate and Vehicle Identification Number. He records the URI of the page, Phi. He takes that to the DMV along with other information deemed necessary by it. This is done online or offline. If everything is satisfactory to them, possibly (if not inevitably) involving him paying a fee, then they will make certain information about his car available on their Checker, in response to a query from Phi.
Next, imagine that Phi is a page at an auction site, and Dinesh is trying to sell his car. In his item description, he writes a verifiable link to the DMVs website. This may be instrumental in him getting more bids, and higher valued bids, compared to car sellers on that site without such aids.
Another example is if Dinesh is trying to sell software at an auction site. Bootleg software is an endemic problem in the software industry. In auctions, a bidder often faces two questions. Does the seller actually have the software he is describing? And is it a valid copy? (Not bootleg.) The software copyright owner can set up a Checker. Then, when Dinesh registers his copy, he might also furnish the URI of his auction web page, which has a verifiable link to the software owner. Of course, it is not a given that the owner will promulgate the information. If Dinesh is a valid registered end user of a copy, the owner might have a policy of prohibiting resales. But if Dinesh is an approved reseller, say, then this would not be a problem. Or, if Dinesh is an end user, the software owner might let its Checker verify him, for a fee.
We have extended the scope beyond information about a person, to information about a person's belongings.
Also, all that we have said about a "person" could equally well apply to a company or organization. So a company might publish on its website various licenses it has obtained, where it has these to conduct its business. The licenses are issued by government agencies or industry bodies. By using the above procedures, the company might let its licenses be verified by a visitor to its website.
Related to this, a company might sell goods on its own website, instead of or in addition to doing so on an auction site. Consider such a company, V, selling an item T on its website. Suppose V obtains T from company Y, which has a website, y.com. V makes a page with URI Alpha, that offers T for sale. Alpha has a link to a page at Y, Kappa. Suppose T has a unique id. V contacts Y and asks Y to put Alpha in its PL for Kappa, such that when an Asker comes to Y's Checker, the latter is able to attest that V is selling T, with that unique id, and that T came from Y.
A variant is if T is not distinguished by a unique id, but by a model number. Y's Checker may then attest that V is approved to sell an instance of, or some number of instances of, that model, which came from Y.
As before, if Jane does not know V, then her verify query should be done from her plug-in, and not from any verify button or facility offered by V. Again, here our method helps protect her against pharms offering perhaps non-existent merchandise.
Suppose Y in turn has obtained T from a reseller Z. Y's page for T could similarly provide a verifiable link to Z. This can be continued recursively, all the way back to the manufacturer. Then, if T is made from several parts, this procedure could in turn be followed for the suppliers of those parts. Thus, what is possible is a chain of authentication that is programmatically available to a potential end user of T. This can be interpreted as an extension of supply chain management, where manufacturers, resellers and retailers integrate their electronic procedures, to track inventory at each stage of distribution. However, there is currently no implementation that extends this knowledge to a (potential) customer. Where for the customer, the reason for use may be primarily to authenticate the item.
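A hedged sketch of how such a chain of authentication might be walked programmatically; ask_checker is a hypothetical function that queries a page's Checker and returns whether the page is approved, plus the URI of the next verifiable link upstream (or None at the manufacturer):

def verify_supply_chain(page_uri, ask_checker, max_depth=10):
    # Follow verifiable links from the retailer's page back toward the
    # manufacturer; return the approved chain and whether it is complete.
    chain = []
    uri = page_uri
    for _ in range(max_depth):      # guard against cycles and long chains
        approved, upstream = ask_checker(uri)
        if not approved:
            return chain, False     # chain broken: do not trust
        chain.append(uri)
        if upstream is None:        # reached the end (the manufacturer)
            return chain, True
        uri = upstream
    return chain, False             # too deep; treat as unverified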
Here, there might arise standards and programmatic methods that remove much, if not all, of the manual effort in maintaining PLs on Checkers, and for writing the pages with items and verifiable links that go to these servers. Earlier, we had discussed examples of individuals maintaining their resumes or selling items. In such cases, the manual effort would not be great. But clearly, for large scale commercial deployment, there is benefit in a programmatic implementation of our methods.
Our method can also be used to verify analysis of various types. Including, for example, appraisals on the authenticity or value of an item. Imagine some real estate being offered for sale on a website. It might say in the text of the page Chi that it has been appraised by appraiser Marcus at some value M1, and appraised by appraiser Jody at another value J1. Chi could have links to those appraisers' websites, where they, separately, have added Chi to their PLs. Also, Chi might describe a structural/engineering analysis of the property by Susan, with a link to her website, where she has added Chi to her PL.
Thus far, our method has described use cases where items are offered on an electronic marketplace, like an auction or seller's website. But our method can also be extended to a physical marketplace. Imagine such a marketplace with customer Jane walking through it. She sees a desirable item, and looks it up on an electronic device she is carrying, that has wireless communication ability. A type of a cellphone or PDA, perhaps. The marketplace has a transceiver that broadcasts what is effectively a web page on this item. (And broadcasts other pages for other items.) The page gives more information about the item. And it might have verify tags that let her verify various details about it, like who supplied it to the marketplace.
A variant is that instead of, or in addition to, the marketplace having a centralized transceiver, an item might have the equivalent of its own transceiver. Imagine perhaps that an item that already has a built-in transceiver, for normal use by an end user. Like a car, or a laptop computer.
Another variant is that an item might have the equivalent of a passive, physical tag. When Jane queries it with her device, it replies with enough information for her to perform a verification.
It is possible that the hypertext markup language used to describe the item is not HTML. For example, it might be WML, used with the Wireless Application Protocol, which is expressly designed for wireless devices with small screens.
In "5804", we described how to find fake electronic seals. In that invention, a validator was a company that issued a real electronic seal. But a phisher might fake this in a pharm website or in an email. In this invention, we extend "5804" by having the validator run a Checker that lists the valid URIs of pages that can show its seal. And when an actual client of the validator could use our above methods to indicate that the seal it is showing is verifiable. This action by the validator might be in addition to or in place of its actions in "5804".
Imagine a content provider, F, with a website at which it has numerous pages. It might require that users going to some of these pages login first. Where the login could entail that the user had paid a fee. But the provider might also have arrangements with another company, G. Such that someone logged into G can click on a link in a page in G, and that link goes to a page in F. So that she can read that page, and possibly other pages in F, without having to login to F. An example might be where F is wsj.com, run by the Wall Street Journal Corporation. While G is LexisNexis.com, run by LexisNexis Corporation. Currently, a company in the role of F that wants to set up such an arrangement may need to customize its web server. And different companies may need to independently devise various such ad hoc solutions.
Our method permits a simpler, universal approach to extending the functionality of the pages. F could run a Checker, insofar as constructing PLs for its pages. Then, perhaps the Checker does not answer queries from Askers. But the PLs can be naturally used when an http request comes in for those pages, to decide whether a given page should be returned to a request that comes via a page in that page's PL.
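A minimal sketch of this gating, assuming F keeps a PL per protected page and consults the (forgeable) Referer header; the PL entries and function are illustrative assumptions, and a real deployment might add stronger evidence, such as tokens issued by G:

PARTNER_LISTS = {
    # hypothetical PL: page path -> set of approved referring domains
    "/articles/story1.html": {"lexisnexis.com"},
}

def should_serve_without_login(requested_path, referer_domain):
    # Serve the page without a login only if the request appears to come
    # via a page in the requested page's PL. The Referer can be blank or
    # misleading, so this is a heuristic, not proof.
    pl = PARTNER_LISTS.get(requested_path)
    return pl is not None and referer_domain in pl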
15. Verifying Buyers and Other Roles
Our method has mostly concerned being able to verify some types of information related to a seller. Like data about an item the seller is offering, or possibly data about the seller herself/itself. We can also make similar statements about a buyer. She might post a web page requesting to buy an item. This page might have verifiable links to independent third parties, that approve the placement of these links on her page. And where these third parties can attest to facts or opinions about her.
More generally, someone may be wanting to take part in a more complex transaction, like a barter. (Where she may be considered a seller and a buyer.) She could also put verifiable links in her page. Where some links might be about validating whatever item she is offering. And other links might be about validating information about herself.
Likewise, escrow companies for online auctions might also use our method to improve their credibility. There have been instances of fake escrow companies, that essentially accept money from buyers, and then disappear.
Another context is that of online dating or socializing. There is merit in being able to verify some claims posted by a person about himself in a dating website, say. By providing bona fides in the manner of our method, he may be able to attract more responses.
16. Full Automation
An extension of our method involves two programs, on different computers, that take part in transactions. Without loss of generality, imagine that these are implemented using Web Services. Let one program/computer be Alpha and the other Beta. Suppose Alpha needs certain resources. This might be extra computing power to run a program that Alpha has, for example. Or, Alpha might be an automated purchasing agent, looking to buy certain software or hardware items. Beta claims to offer the resources or items Alpha is looking for. We assume that if Alpha decides to go with Beta, then the transaction can take place in some programmatic fashion.
But our focus is on Alpha being able to decide if Beta is qualified. In Beta's Web Service description, it could add tags, similar in format perhaps to the ones described above. These would effectively be links to third parties on the network. Hence, Alpha could read Beta's message, with those tags, and then make a query to such a third party, Gamma. Assume that Gamma runs a Checker, with the equivalent of the Partner Lists described above. Gamma could reply "true" to Alpha if it "approves" of Beta's reference to it. Here, Gamma might be some third party that is credible to Alpha.
17. Search Engine
Our method can be used by a search engine to improve its searching of the web, and the analysis of those results. A search engine ("Engine") deals mostly with web pages, and the links between them. It is these links which are typically the main determinant in how to rank search results. Typically, the more pages that link to a given page, the more relevant or "important" the latter is presumed to be. But this use of links can be considered a static analysis. Even though many pages are being continually created, destroyed or changed, that is still largely true. What would also be useful to the Engine is data on actual user browsing; that is, on actual page viewing. To some extent, the Engine can use the queries it gets from users as an indication of general user interest. And, if the Engine hosts ads on the pages showing results, then often these ads are links that first go back to the Engine, which redirects to the desired destination. So from the user clicks on the ads, the Engine can get additional data on user interest.
The basic problem facing the Engine is that it has no direct access to users' browsers, when those users are surfing the Web. (While links can be found by Engine spidering, independent of any user activity.) Of course, a user might regard this as favorable, in protecting her privacy from the Engine.
But if our method becomes implemented on the Web, with many (independent) Askers and Checkers, then this can be a valuable source of dynamic data that more closely reflects actual user interest. Ways the Engine can use this data include, but are not limited to:
1. Spidering Checkers on the Internet, to get their data, if they are willing and able to provide this to the Engine. Some Checkers might be located in countries that prohibit the divulging of the data.
2. For a page, showing a list of those pages that it links to, and that approve of it linking to them.
3. For a page, showing a list of those pages that it links to, and that disapprove of it linking to them.
4. When it has a page Alpha, and a page Beta that Alpha links to and that approves of the link, "collapsing" the two pages into one. This can be to simplify the topology between a set of pages at various addresses. Or to combine the text of both pages and then compute various rankings for the combined page.

The data might also include addresses of the users that asked the Checkers. There is the possibility that some of this information might be false, due to Askers sending false addresses to the Checkers. We also include the case where the Engine is run by a government, possibly for its internal use. And where the government might mandate that Checkers located in its country must divulge any user address data they have to the Engine.
The verifiable links in pages and the Checkers give the possibility of using extra, high value structure on the Web. We define a Verify Score (VS) of a page with URI Alpha as a measure of how many pages it links to that approve of those links. There can be various specific implementations, for example (a sketch follows the list):
1. VS(Alpha) = number of verified links in Alpha.
2. VS(Alpha) = number of verified links in Alpha / number of links in Alpha.
3. VS(Alpha) = number of verified links - (number of verified links that disapprove).
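A minimal sketch of the three variants, reading "verified" as approved and assuming each of a page's links is represented by its Checker's reply ("approved", "disapproved", or None when the link is not verifiable); the representation is an assumption for illustration:

def verify_scores(link_replies):
    n_links = len(link_replies)
    approved = sum(1 for r in link_replies if r == "approved")
    disapproved = sum(1 for r in link_replies if r == "disapproved")
    vs1 = approved                                # variant 1
    vs2 = approved / n_links if n_links else 0.0  # variant 2
    vs3 = approved - disapproved                  # variant 3
    return vs1, vs2, vs3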
The VS might also be modified by taking into account the VS, or some other ranking, of those pages it links to, in some self-consistent formulation. For example, if the page that is pointed to has many pages that point to it, then this might increase a weighting of its approval of the link from the original page we were looking at.
Naturally, from the above, it also follows that one could define a "Verify Anti-Score" (VAS), which is a measure of how many pages a page links to that disapprove of it. This VAS might also have merit in some usages.
We point out that a VS of a page is easy to find, and can be done deterministically. Unlike, for example, Google's PageRank™, which depends on finding the pages that link to a given page. As noted earlier, that can only be done statistically, by performing as comprehensive a spidering of the Web as possible. The Engine can show the most popular Checkers it knows, and for each, the pages that are the most popular approvals. Analogous to showing popular search queries.
The Engine can also spider Askers which are located at websites, where an Asker might have an API or Web Service. The Asker might reveal which are the most popular Checkers that its visitors call for verification. And it might reveal which of its pages correspond to these requests.
The Engine can let its users search for Askers and Checkers, in addition to the usual searching for websites. It might also have classifications of them, that users could peruse. The classifications might perhaps be done algorithmically, using, say, keyword or Bayesian analysis of their pages, or manually.
In our earlier discussions of a Checker, this was an entity that verified pages outside its website. But how to verify a Checker? An Engine is a natural entity to do this. It has name recognition with a large, global user base that, presumably, trusts its regular search results. It could offer a whitelist of its approved Checkers, based on whatever criteria it chooses. It might also offer a separate (or combined) list of its approved Askers.
Why would a Checker or Asker give its statistics (or subsets thereof) to the Engine? One reason is that it might appear higher in (free) search results of such entities. Hence, the entity or its associated website might get more visitors. It might typically consider this desirable.
Note that a Checker or Asker could send false data to the Engine. This is somewhat unlike the case when the Engine spiders links. Though even there, false data can surface. Often, that phenomenon is a link farm, where closely affiliated websites try to bolster their search engine rankings with heavy linkage between their pages. So just as an Engine might have heuristics to try to detect link farms, it might also have other heuristics to detect false Checker or Asker data.
One way is for the Engine to sum up the number of requests that each Checker gets, and sort the Checkers using this, where the data comes from the Checkers themselves. Then, the Engine can look at its Askers' data of the Checkers that they access, and sum up the number of requests to those Checkers. So two ordered lists of Checkers can be made. The order should be approximately the same, if most Askers are independent of the Checkers. The totals for a Checker will be different, in general, because the Checker also gets requests from Askers at plug-ins, and the Engine has no access to that data.
These ordered lists of Checkers can be used to try to detect any falsified data from a Checker, if its position in one list is different from that on the other list. Specifically, perhaps, if its position on the list based directly on Checkers' data, is "significantly" higher than its position on the list based on Askers' data. It suggests that this Checker is inflating its numbers. Of course, what "significantly" means above is up to that Engine to determine. But our method gives it a valuable technique to investigate fake data.
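A hedged sketch of this cross-check: rank Checkers once by their self-reported counts and once by counts summed from Asker data, then flag those whose self-reported position is "significantly" better. The gap threshold is a policy choice for the Engine, and the dictionary inputs are assumptions:

def flag_inflated_checkers(checker_reported, asker_summed, gap=10):
    # Both arguments map checker_id -> total request count.
    by_self = sorted(checker_reported, key=checker_reported.get, reverse=True)
    by_askers = sorted(asker_summed, key=asker_summed.get, reverse=True)
    pos_self = {c: i for i, c in enumerate(by_self)}
    pos_askers = {c: i for i, c in enumerate(by_askers)}
    # A Checker much higher on its own list than on the Askers' list is
    # a candidate for inflating its numbers.
    return [c for c in pos_self
            if c in pos_askers and pos_askers[c] - pos_self[c] >= gap]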
The above technique uses an asymmetry. A Checker might have incentive to inflate the number of requests it gets. Perhaps to boost its ranking in a listing of Checkers, which might translate into more visitors to its website. But it is difficult to imagine much economic value, if any, in a Checker falsifying its data, to make itself appear less important.
This method of testing the Checkers' data can also be used, in mirror form, to test the Askers' data. Plus, if the Engine has wide usage, and there are many Askers and Checkers, the sheer global nature of the data may make it harder to tamper with in a meaningful way. Suppose the Engine has an affiliated website or websites with a lot of content. For example, Yahoo.com and Msn.com. The Engine can run its own Checkers on these sites. So it knows that this data is reliable. It can then find the biggest Askers that ask its websites, and rank those reliably. Then, it can ask those Askers, preferably in an automated manner, for their statistics. From these, it can find the biggest Checkers. Which it can in turn ask. This is a spidering method.
Suppose the Engine runs a messaging service, like email or Instant Messaging. For example, Google Corporation's Gmail, Yahoo's mail service, or Microsoft's Hotmail. It can run its own Askers for these services. Then it can find the biggest Checkers that its Askers ask. So it can ask those Checkers, preferably in an automated manner, for their statistics. From these, it can find the biggest Askers. Which it can in turn ask. Another spidering method.
If the Engine has both content websites and messaging services, then it can combine the methods of the previous two paragraphs.
The Engine can also find from Askers and Checkers which pages are more likely to be verified. Presumably, this is due to more than just casual browsing by users. Timely knowledge of those pages can be exploited by the Engine, in ways including but not limited to the following:
1. Find topics from those pages that people are likely to want verified. This might be done programmatically, perhaps by keyword extraction.
2. Those pages might be more valuable for ads than "generic" web pages. So the Engine can contact the pages' owners to arrange ad placement.
3. Companies selling things related to the pages' topics can be approached by the Engine, to perhaps place their ads on those pages.
4. Or to place their ads for keywords found from those pages, where these ads might appear in the Engine's general pages of search results for those keywords.
5. Other pages with similar topics (or keywords) might also be good for ads, even if they do not appear in the Asker and Checker data.
Suppose we have a web page that is heavily linked to by others, and thus has a high ranking in the search results for a given phrase. But the Asker and Checker data ranks it low, though not zero. This assumes that the page has verifiable links, or it has a Checker, and the Checker says that the page has a PL of other pages. As discussed earlier, the Asker and Checker ranking is unlikely to be low because of false data, because false data would tend to inflate the ranking. So why the discrepancy with the regular search ranking?
Perhaps because the web page is part of a link farm. The farm has successfully boosted the page's link ranking. But link farms have relatively little real user traffic. And the Asker and Checker data reflects that.
Or, the page's topic is non-financial and non-controversial, perhaps. So visitors to the page might see verifiable links, but have little interest or incentive in actually asking for verification.
We stress that in either case, more tests might be applied. But after such tests, the Engine could decide to lower the page's link ranking, influenced by its low A&C ranking. This can help act as an extra filter to improve the usefulness of the search results, making them more valuable to users and advertisers.
The link farm scenario deserves further comment. Its authors might avoid using Askers and Checkers, so as not to appear on an A&C ranking. This in itself could be desirable to the Engine, acting to keep the latter ranking freer from distortion. But suppose a link farm wanted to get a high A&C ranking, where this is ultimately as bogus as its link ranking. It has to do more work, as these requests have to be continuously made from Askers to Checkers. And the Askers can have rules in place to restrict the number of requests coming from a given network address or address neighborhood, per unit time. Remember too that these Askers are not those in plug-ins, but at websites, and presumably the Engine might restrict its Asker data collection to those on its whitelist of reputable Askers. Where reputable might also mean that they implement various precautions like those described here.
Consider a given link farm. It has a page, Beta (and other pages). The farm wants to boost its A&C ranking. Suppose it runs a Checker for Beta. Let verify requests come from an Asker for a page Alpha that is outside the farm. Alpha is probably written by the farmer. It might be at a web hosting site, for example, where she can open an account and write pages. For her Checker to get many requests from outside the farm, most of these might have to come from such Alpha pages. Because an independent page is unlikely to link to Beta. So the more Alpha pages she needs, the more work she has to do. And possibly the more money she might have to spend, if the outside hosting service charges fees. Suppose instead she only has a few Alpha pages. Then she has to send relatively more queries per page. In both cases, the Alphas are very likely to have low link rankings. Unless she cross-references them with each other, and with the pages inside the farm. But in this case, the Alphas are a de facto part of the farm. If not, then the Engine can search for pages (Alphas) with low link rankings, but which make a large number of asks to her Checker. A possible indication of manipulation.
But the farmer making the Alphas part of the farm also makes them vulnerable to detection. Various spidering and clustering techniques can be used to trace out a possible link farm, and gather in these Alphas. Thence, asks going from within this set to other members of the set might be detected and used as an additional indicator of a link farm. Because, essentially, in "Ask Space", that draws the link farm pages closer together. Whereas for a farm to elude detection, it should have many external pages linking to its pages, and likewise for ask requests.
As earlier, we use data in two different conceptual spaces, link space and Ask Space, to test against each other.
The Engine might also choose that its A&C rankings have this extra criterion. If it gets data from an Asker at a web hosting site, it might give preference to such Askers when the site charges its users for hosting pages. To increase pressure on link farms trying the above scheme. Here, giving preference might mean increasing a weighting on such data, or possibly not getting data from Askers at free web hosting sites.
18. Differences with Notphish
Call the plug-in described in the Antiphishing Provisionals as Notphish. There, this term was used primarily as the name of a tag to be inserted in messages and web pages. Here, we use the term more generally to encompass the range of techniques used in those inventions. Call the plug-in of this method P. There are various differences between the two -
1. P verifies email addresses. Notphish does not.
2. P verifies link values with respect to the pages in which they appear. Typically, Notphish could but does not.
3. P does not typically use an Agg. Notphish does.
4. P can be used by anyone. Notphish is for (large) companies or organizations.
It is the last reason which is the crucial difference. Notphish and its Agg let us defend large, trusted companies against phishing. While the corporate customers of Notphish+Agg can be extended downwards to smaller companies, as perhaps by the method described in [8], this has to be carefully done, in order to prevent phishers from joining. Whereas the above implementations of P described cases where anyone can use it at the level of supplying a Partner List.
Thus, it may be that a given Notphish implementation should not be extended to include the functionality of P.
However, there may be some circumstances where a Notphish corporate user, like a bank, might also want to run our method. Because Notphish only works for users who have a browser with the corresponding Notphish plug-in, which is NOT the Asker of this invention. Users without the Notphish plug-in lack the Notphish protection. To partly alleviate this, the bank might insert verifiable links into some of its pages. These links could be to various governmental bodies that regulate banks, or to a banking industry organizational body, for example. Where those entities would run Checkers to provide verification. The domains of these links might be published in the bank's Notphish Partner Lists, so that the links are compatible with the Notphish plug-in.
Separately, suppose a company, like a bank, BankO, is using our Notphish method. Jane uses her Notphish plug-in to validate bankO.com when she visits that website. Where this validation is done using the Notphish Agg and BankO's Notphish PL. But imagine that BankO also runs a Checker, because it wants to approve certain links to its pages. This is the obverse of the situation in the previous paragraph. If Jane is able to access the clear text Checker's PL, then she might choose to regard those pages, or even those entire websites, as reputable for her interaction. This may or may not involve financial transactions.
It should be stressed that the decision is up to her as to making use of the Checker's PL in this manner. The bank might disclaim all responsibility for any misuse of its PL. Continuing in this manner, one can imagine Jane going to one of those websites. Then if it runs a Checker, she might in turn go to one of those referred websites for interactions. And hence recursively. A "friend of a friend" approach.
19. Digital Certificates
For several years, many browsers have come equipped with the ability to handle digital certificates. Where these certificates might use some symmetric cryptosystem, or a public key cryptosystem. These implementations have been impressively done, and mathematically correct. Yet for all this effort, the average user typically does not understand these, or their usages. In contrast, our invention is much simpler to explain to such a user. It does not revolve around encryption, but authentication. And it does so in a manner that is not much more complicated than understanding what clicking on a conventional link in an HTML page does.
While our method does not challenge digital certificates in what they do, we believe they are overly complex for many common authentication needs. Plus, our method is far more computationally lightweight.
20. Verifying Arbitrary Sections of a Page
Our method can be extended to permit the checking of links and email addresses in arbitrary sections of a web page. For example, a page might have these custom tags -
<askLimit>
  <!-- various HTML tags and content to be approved -->
  <!-- Susan will (presumably) approve the above content -->
  <ask link="20.30.40.50:3408" name="Susan Jones" />
</askLimit>
The <ask> is shown next to the </askLimit> for clarity. In general, it might appear anywhere between the askLimit tags.
In the above, we extend the syntax of the <ask> tag, which previously had been used in the context of verifying an email address. In that situation, the syntax of the email address was used to derive a domain at which the Asker would query a Checker. Here, the network address of the Checker domain is given explicitly in the <ask>, along with a port number. If the latter is omitted, then a default port number is used. The <ask> also has an optional attribute ("name") that might be used to indicate the name of the person or organization doing the approval. There might be other attributes with more information. These might be used by the Asker, in part, to be displayed to the user. The Asker might also have the ability to query the Checker for its corresponding values for these attributes, if they exist, in order to find and highlight any discrepancies. If such exist, the values given by the Checker might be considered as more authoritative than those written in the page being analyzed.
It is also assumed here that when the author of the page contacts the person ("Susan") at that address, in order for Susan to approve, that Susan is shown the portion of the page to which she is asked to approve. At Susan's end, she might have software that can show this portion. Or, in the last recourse, she might inspect the page's source code.
Likewise, the Asker may (or should) have code that can display the portion of the page between the askLimit tags, in some visible fashion.
A page might have different portions that can be verified. For simplicity, these should be nonoverlapping. The only overlapping permitted should be where one askLimit pair is completely embedded in the scope of another pair. This is standard XML practice, and permits a programmatic validation of correct syntax. Which is different from the verification process described in this invention, where an Asker goes out over the network.
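A minimal sketch of an Asker processing <askLimit> sections, assuming the page parses as XML and that query_checker performs the network query; the default port used when none is given is an assumed value:

import xml.etree.ElementTree as ET

def verify_sections(page_xml, query_checker, default_port=3408):
    # For each <askLimit> section (nested pairs are visited too), find its
    # <ask> tag and query the Checker at the given address and port,
    # sending the section content for approval checking.
    results = []
    root = ET.fromstring(page_xml)
    for section in root.iter("askLimit"):
        ask = section.find(".//ask")
        if ask is None:
            continue
        host, _, port = ask.get("link", "").partition(":")
        approved = query_checker(host, int(port) if port else default_port,
                                 ET.tostring(section))
        results.append((ask.get("name"), approved))
    return results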
21. Different Modality
Consider a page, with URI Alpha, that has a verifiable link to a page at URI Beta. Thus far, we have considered usages where Alpha "wants" to be approved by Beta's Checker, which essentially means ultimately by Beta's author. For brevity, we might say that Alpha "wants" to be approved by Beta. In some sense, Beta is more "reputable" than Alpha, and Alpha's use of a verifiable link is an attempt to gain Beta's consent to use Beta's reputation. But there is another modality. Where Alpha is perhaps a more popular page, or a more reputable page. And it is Beta that gains from being linked to from Alpha. Perhaps (the author of) Beta wants to affirm support of Alpha. Imagine Alpha as being an essay or petition or political statement, perhaps. Some people might want to publicly affirm support of Alpha. Our method, as already delineated, lets each of them add Alpha to her PL, while Alpha's author adds a verifiable link to her.
There is a scaling problem, if many people want to have such links. Alpha's author might be unwilling to add these, perhaps because it might greatly increase the size of Alpha, and thus hamper its downloading. So Alpha might use a custom tag, that factors out the links into another file, which is not downloaded by a normal browser. But which the Asker can access to perform its verification. For example, we might have in Alpha's source code, this tag -
<askOthers file="other.htm" />
This refers to a file, "other.htm", in the same directory as Alpha. And the file might have this format -
<html>
<body>
  <a href="http:// ... "> Joe Smith </a>
  <a href="http:// ... "> Mark Wong </a>
  <ask>mary@somewhere.org</ask>
  <!-- Let another file of this format be included -->
  <askOthers file="other2.htm" />
</body>
</html>
Thus, a normal browser will see the first <askOthers>, and simply ignore it. While the Asker can read it, and possibly change the visible display of Alpha to indicate its presence. Also, if instructed by the user, it can read other.htm and any other associated files, and perform a verification. In this instance, the Asker might have settings that restrict the maximum number of links it will do this for, with the user being able to alter them, to handle the case of a massive number of links that would take very long to confirm.
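A hedged sketch of the Asker expanding <askOthers> files with a cap on the total number of links; the fetch function and the regular-expression parsing are simplifying assumptions:

import re

ASK_OTHERS = re.compile(r'<askOthers\s+file="([^"]+)"\s*/>')
HREF = re.compile(r'<a\s+href="([^"]+)"')

def collect_links(start_file, fetch, max_links=200):
    # Read start_file and any files it includes via <askOthers>, returning
    # at most max_links href values to verify.
    links, queue, seen = [], [start_file], set()
    while queue and len(links) < max_links:
        fname = queue.pop(0)
        if fname in seen:           # avoid include cycles
            continue
        seen.add(fname)
        text = fetch(fname)
        links += HREF.findall(text)
        queue += ASK_OTHERS.findall(text)
    return links[:max_links]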
The file other.htm, and any other such files that it might reference, are written in HTML. This is optional. But it also lets those files be viewed in a standard browser, with many links active.
The other users who have approved the links to them might be searchable by the Asker, under various criteria settable by the user.
A search engine which spiders and finds Alpha might also use information in these associated files, to build a more structured picture of the network.
22. Online Auction Fraud
At eBay Corporation and other online auctioneers, fraud has become an increasing problem. We now describe one common type of fraud on eBay, and our means of attacking it. Our remarks can also be applied, with trivial modification, to the other auction websites, and other organizations, including governments, that want to regularly disseminate email to their users or constituents.
Let the auctioneer use our Notphish methods to send a PL to an Agg, which then disseminates it to its Notphish plug-ins. Let Amy be someone trying to commit fraud on a bidder, Jack. Suppose he is not or was not the highest bidder. (The auction might still be proceeding.) Amy might send him a message, via the auctioneer's internal messaging system, where this message will get forwarded to Jack's email address that the auctioneer has for him. The message could purport to be from the seller, offering perhaps to sell another instance of the item, at Jack's price. Typically, if Jack responds affirmatively, this interaction might then be conducted with Amy trying to get Jack to pay her first, whereupon she does not deliver the item, or she delivers an inferior or different item.
Suppose now that the auctioneer required that any real such message, from the seller to a bidder, sent via its messaging system, contains certain structure. Specifically, that it mentions the auction id and seller handle and possibly email address, in a manner that can be programmatically parsed for. Imagine, for example, a line with example values -
Auction Id = 123456789abc; Seller = someHandle; real email address;
Optionally, there might also be a date on that line or the next line. As well as the closing price if the auction has ended, or the current high price. Possibly too the title of the auction, and other related details. The messaging software might write this automatically as the first lines, say, in any such message from the seller to a bidder. This by itself is insufficient. For Amy might write that explicitly in the first lines, if she is not the seller. Or likewise if Amy sends the message entirely outside the auction's messaging system.
Or, preferably, instead of the data being in hardwired locations in the message, it might be written in XML format, and placed anywhere in the message. The data might be written as attributes inside custom tags, or each datum might be bracketed between the opening and closing instances of a custom tag. This is more flexible. We will assume henceforth that this is done, instead of hardwired locations. When a seller sent such a message, not only would the auction software insert the above data into the seller's message, but it would also store the data, as a record of what the seller sent. Hence, when the Checker gets a query, it would try to match the furnished data with its stored data. It can return true if there is a match, and false otherwise. Or perhaps more elaborate error codes, to indicate which field or fields are wrong.
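A minimal sketch of the Checker-side matching; the tag names, the stored-record format and the keying are assumptions for illustration:

import xml.etree.ElementTree as ET

# Hypothetical store: (auction_id, seller_handle) -> record saved when the
# real message was sent through the auctioneer's messaging system.
SENT_RECORDS = {
    ("123456789abc", "someHandle"): {"recipient": "jackHandle"},
}

def check_seller_message(message_xml):
    # Extract the purported auction data from the custom tags and compare
    # it against the stored record. A fuller version might return error
    # codes identifying which field failed.
    root = ET.fromstring(message_xml)
    auction_id = root.findtext(".//auctionId")
    seller = root.findtext(".//seller")
    return (auction_id, seller) in SENT_RECORDS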
The auctioneer's internal messaging system might also be available for the general case of a user, who is not a seller, to send a message to another user. A message from a non-seller can be automatically parsed, to look for those tags. If they are present, then this could be suspicious. The values can be compared to the Checker's data. For example, if the sender is not the seller, where the seller was found from the appropriate tag, then this could be fraud. The other tags could be also tested. Hence, a suspected fraudulent message might not be forwarded to the recipient. And the sender might be placed under investigation.
But why should a sender, who wants to imitate a real seller, put in those tags? Imagine that Jack has a plug-in Asker. It recognizes what purports to be a message from an auction seller. It parses to find the above data. If they are absent, then it does not validate the message. Suppose the Asker finds the above data. It contacts a Checker at the auction site, and sends the extracted data. Based on the reply, the Asker could then validate or invalidate the message.
The plug-in Asker is also needed because a fake message from a "seller" Amy might arrive at Jack's external email address, without originating in the auctioneer's messaging system. This lets Amy bypass the above test done by the messaging system on a message. Amy might be sending out a mass mailing and hoping that some recipients get fooled. Or, Amy might have scrutinized Jack's bidding (on a high value item perhaps), and by other means, mapped from his handle to his email address. Some users choose handles to be their email addresses. While Amy might send non-fraudulent messages from another of her accounts at the auction site, to various bidders who bid high amounts. To try to get replies that reveal their email addresses. Which she can then later use as a fake seller.
A particular auctioneer might distribute an Asker plug-in, with different versions for different browsers. There is an important simplification for the Asker, as compared to a Notphish plug-in. When the latter is used to check email that the user is reading in the browser, it needs knowledge of where the message body starts and ends, within the BODY tags of the HTML page. Because it has to find links in the message body, and typically, outside this body, the ISP has various other links. Hence, the Notphish plug-in needs knowledge of how major ISPs demarcate the message body in their web pages. [3] Whereas the Asker has a more limited functionality. It only needs to find the custom tags defined by the auctioneer. Hence, if those tag names are unique, with respect to the tags found in the message web pages used by the major ISPs, then this is sufficient for the Asker to find them and their values.
The auctioneer can then simply train its users that if they have its plug-in and get a message claiming to be from a seller, then ONLY if the Asker validates it should they regard it as genuine.
An alternative to the Asker is for Jack to forward the message to, say, test@auctioneer.com. That address leads to an equivalent Asker/Checker at the server, which parses the message, extracts the purported data and compares it to the records. This can be fully automated and very quick. It should be able to reply to the user, giving a "true" or "false" result.
A refinement of the above is for the stored data to also include the handle of the recipient, and that the starting lines also show this information.
For Amy to fake such a message, she would have to know that for a given auction, the seller actually did send out a message, at some given time. Plus, if the recipient's handle is also used, she would have to know that the seller did send the message to that user. Very hard to do.
In terms of privacy, it can be seen that the server reveals very little that is not already publicly known or knowable. For any given auction, its id and seller handle are public knowledge. But the server does not broadcast even these. It gets submissions from plug-ins of purported data. Its reply only at most shows which data fields are wrong, and not the correct values for those fields.
The above, in essence, is a specific implementation of ["2245", "2458"]. That plug-in is basically the Notphish plug-in that was described in those Provisionals. But imagine that the user also has an Asker plug-in. Instead of that plug-in automatically doing the Notphish verification, she might turn it off, by default. Then, when she reads the purported message, the Asker will do the verification of this method, where the auction website has a Checker. It has for each user, a PL. This is slightly different from what we described earlier about reading email and doing verification. There, we described how on the Checker side, when a user sent out email, it could record various header attributes. Here, instead of, or in addition to, it could add various auction attributes, like those discussed above.
Consider a generalization. Imagine a government that wants to mail out millions of messages to its constituents. Each message might have items specific to that recipient, and items common across the messages. The government can embed tags around various recipient-specific data. And also around some common data. Then, assume that it had earlier distributed a custom Asker to its citizens. When a recipient gets a message, she can run the Asker against it. The tags are used to extract the data, and these can be sent to a government Checker for verification. There is an advantage to doing it this way, so that the user can manually ask for verification. Because of the number of messages, having the Notphish plug-in automatically verify might be very computationally intensive at the Agg. Whereas, the manual Asker -> Checker invocation may reduce the number of requests.
Naturally, our remarks about the government usage also apply to any organization that regularly sends mailings to many of its members, and wants them to have a deterministic way of proving authenticity.
23. Broken Links
A persistent problem ever since the Web arose has been broken links. These arise because the resources (pages) pointed to can be renamed by their authors, or deleted, without notification to the author of the pointing page (Alpha). Partly a consequence of the unidirectional nature of the HTML hyperlink. So that Susan, the author of a pointed page, Beta, need not even know of the existence of Alpha, that points to Beta. And even if she did, she is under no obligation to inform Alpha's owner if she renames Beta.
Our method helps alleviate this longstanding problem with web pages. While the breaking of links can be programmatically detected, finding the new links has been a tedious and error-prone manual effort. This is one downside of the free ability to link to any page on the Web. The repair effort can be very difficult if the author of the linked page does not cooperate. For example, ideally, if a page is moved, the author can tell her web server to redirect a query to the new page. But it is well known that this is often not done.

Now consider what Laura, the author of Alpha, might do when she finds that Beta has disappeared. As an example, let Beta be http://1.two.com/c/b/d.html. Laura might guess that perhaps the file d.html has been renamed to another file in the same directory, b/. So she uses her browser to search b/. But she may not be able to do this. Because for security reasons, say, Susan may have told her server to prevent users from rummaging around in b/. That is, a browser must supply the name of an actual file in order to see it. Now suppose Laura can look at b/. There might be hundreds of files here, with the suffix html. Even if the server lets her read these, it may take some time to identify which of these she should link to, if any. Now suppose Susan moved d.html to a different directory, c/e/f/. Even if Laura has all the necessary permissions to look through Susan's directory tree, this can be a lot of effort. Worse yet is if Susan decided to replace d.html by a file in another domain entirely. In general, Laura will never find this.
The above assumed that we are talking about files. So that it at least makes sense to try to traverse a directory tree and look at listings of files. Assuming all permissions are granted for this. But a very common case is that some websites also serve up dynamic pages. So the above address of c/b/d.html could actually be a dynamic address. In such cases, Laura typically cannot search a directory tree. She would have to provide actual full addresses of resources that the server can then serve. But in general, she does not know what these are.
Our method can be extended to provide a simple manual or automated detection and repair of broken verifiable links. Assume that in Alpha, the link to Beta is marked as verifiable. And suppose that Beta's PL includes Alpha as one of the approved linking pages. We also assume that the Alpha and Beta are on different domains, and that the persons responsible for maintaining them are different. Because if it was the same person, then when she renames Beta, she can merely alter Alpha to point to the new address.
If Beta is renamed to a new address, Gamma, then Beta's Checker can contact a Web Service or API at Alpha's domain. Assume that this is incorporated into the functionality of an Asker running at that domain. The Checker sends a structured message, perhaps written in XML, as for example,
<change>
  <old>Beta</old>
  <new>Gamma</new>
</change>
If several files are being changed, then we might have a syntax like this -
<change>
  <item>
    <old>Beta</old>
    <new>Gamma</new>
  </item>
  <item>
    <old>Rho</old>
    <new>Phi</new>
  </item>
</change>
Optionally but preferably, these messages might have some digital signature or other means whereby the recipient can check that the message did come from the Checker.
The instruction to the Checker to send the message might be done manually. Perhaps Susan, who renamed Beta, might do this, after she has done the renaming. Or more generally, Beta might have been deleted by her, and Gamma might exist at a different domain. In either case, Susan might tell the Checker the equivalent of the above message, and tell it to send the information to all those in Beta's PL that were approved. Presumably, there is no need to tell those in the PL which she disapproved of linking to Beta in the first place. This may even have been a reason for her to change from Beta to Gamma.
So now Alpha's Asker has received the message. It might attempt to authenticate the message. If so, assume that this succeeds. Then, it might contact Alpha's author, Laura, with the message. She can then manually make any necessary changes to Alpha. In this manual case, there is less need for the message to be authenticated, because Laura can by manual inspection check the links to see if the changes make sense. Though perhaps the authentication should still be done, since this can act as a filter, trapping any fake messages, before they reach Laura.
An automated alternative is for the Asker to try traversing the old link to Beta, to see if this fails, and to try accessing Gamma, to see if this succeeds. Suppose both are true. The message should optionally also be authenticated. Then, the Asker might programmatically change the appropriate links in Alpha (there could be several links to Beta) to point to Gamma. We assume that the Asker has the necessary permissions on the Alpha file, if Alpha is a static file. But suppose Alpha is a dynamic file, held in a database and then assembled by a web server. The latter could have code to take the Checker's message and then replace the instances of Beta in the database with Gamma.
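A hedged sketch of this automated repair for a static Alpha, assuming the change message has already been authenticated; the liveness checks and the plain text substitution are simplifications, and a dynamic Alpha would instead be updated in its database:

import urllib.request

def is_live(uri):
    try:
        with urllib.request.urlopen(uri, timeout=10) as resp:
            return resp.status == 200
    except Exception:
        return False

def apply_change(alpha_path, old_uri, new_uri):
    # Rewrite links only if the old address has indeed failed and the new
    # address does succeed, per the checks described above.
    if is_live(old_uri) or not is_live(new_uri):
        return False
    with open(alpha_path) as f:
        text = f.read()
    with open(alpha_path, "w") as f:
        f.write(text.replace(old_uri, new_uri))
    return True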
There are other useful cases of messages that the Checker can send to the Asker. Suppose that Susan has moved the directory tree from 1.two.com to 2.three.org. Then instead of the Checker sending out changes for each file, it can send this -
<change>
  <oldBase>http://1.two.com</oldBase>
  <newBase>http://2.three.org</newBase>
</change>
This syntax can also be used when the domain remains the same, but directories are renamed.
As above, the Asker should require some authentication of this message. Plus, it might also have a requirement that the Checker's base domain be the same as the base domain in the oldBase tag.
The Asker might also have rules about the new addresses that it is being asked to link to. For example, it might have a blacklist of addresses. It applies this to the proposed changes. If any new addresses are in this list, then it might flag the original address and indicate this and the new address to Laura, in some fashion. She then can make a decision whether to use the new address, or perhaps remove the old link. The blacklist may be useful to protect her website. Because other parties on the web, like search engines and ISPs, might spider the web. If they find that Laura links to an address in one of their blacklists, then they might also add Laura's page or domain to that list.
The Asker might have a whitelist of addresses, and it might restrict new addresses to be within this list. Or that the base domains of the new addresses be in this list.
The Asker might have a whitelist of protocols and addresses. It might have policies against having an http link to a non-default port, for example. Maybe in part because the intended users who will visit the Asker's website are behind a firewall that permits only a few outgoing protocols and ports.
Suppose that Susan is changing the port number across all the files. Then the Checker might send out
<change>
  <oldPort>80</oldPort>
  <newPort>8088</newPort>
</change>
Likewise, the protocols might be changed, as in this example -

<change>
  <oldProto>http</oldProto>
  <newProto>https</newProto>
</change>
Optionally, the above messages might also have a changeover time, indicating when the changes will take place. This time might be measured in Universal Time. It could designate that the old addresses will remain valid up to that time. There might also be another time, call it newTime, say, with newTime <= changeover, where newTime is when the new addresses become valid. These times can let Susan and others who link to Beta and other files that will be changing, manage a schedule to decide to incorporate the changes or not.
The above syntax examples can be combined, where a specific change overrides a more general change. For example, if many files are being moved to a new domain. But some files are being moved to a different domain. Like
<change>
  <oldBase>http://1.two.com</oldBase>
  <newBase>http://2.three.org</newBase>
  <item>
    <old>http://1.two.com/g.htm</old>
    <new>http://apple.pear.org/q.htm</new>
  </item>
  <item> . . . </item>
</change>
If a file is being deleted, without any replacement, then the syntax might be
<change>
  <old>http://1.two.com/t.htm</old>
  <new>null</new>
</change>
In the <item>, there could be two or more <new> addresses. It is a characteristic of HTML that a link is from one resource to one resource. But here, Susan might put in several. Then, Laura can get these from her Asker and decide how she might manually modify her pages, if she wants to accommodate these extra links. While this is manual, our method lets her get the proposed new information programmatically, which eases the effort to change.
If an <item> has several <new>, one programmatic way to handle this is for the Asker to make the necessary changes, but using only the first <new> entry.
If Susan decides that she will no longer let Laura verify Beta, then this message might be sent -
<change>
  <old>Beta</old>
  <verify>false</verify>
</change>
If this is done for several pages, then the <item> tags can be used to encapsulate each page.
If Susan changes the title on a page, she can convey this to all those approved pages that link to it. So the Checker can send to an Asker -
<change>
  <old>Beta</old>
  <newTitle>some title goes here</newTitle>
</change>
Or see Section 3, where we discussed the checking of link labels. If Susan uses a custom tag, like "askLabel", and she changes its value, then her Checker might send to an Asker -
<change>
  <old>Beta</old>
  <askLabel>some title goes here</askLabel>
</change>
More generally, if Susan changes the content of Beta, she can tell the Checker to inform all the approved linking pages. The Checker can send out to the Asker -
<change>
  <old>Beta</old>
</change>
Thus, Laura will be told of this by her Asker, and she can inspect the new Beta to see if she still wants to link to it, or perhaps modify the textual part of Alpha to take Beta's change into account. Or, when the title or a custom tag has changed, the Asker might be able to automatically incorporate the changes into a new Alpha.
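As a sketch of that automatic incorporation, assuming Alpha is static HTML and that a simple regular expression suffices (a production Asker would use a proper HTML parser):

import re

def update_label(alpha_html: str, beta_url: str, new_label: str) -> str:
    """Replace the anchor text of every link to beta_url with new_label."""
    pattern = re.compile(
        r'(<a\b[^>]*href="' + re.escape(beta_url) + r'"[^>]*>).*?(</a>)',
        re.IGNORECASE | re.DOTALL)
    return pattern.sub(lambda m: m.group(1) + new_label + m.group(2), alpha_html)

page = '<p>See <a href="http://1.two.com/beta.htm">old title</a>.</p>'
print(update_label(page, "http://1.two.com/beta.htm", "some title goes here"))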
The Asker should have a filter that operates on the new addresses it gets in these messages, so that questionable character substrings can be detected and eliminated. These could be scripts or other malware that might be inadvertently executed by the Asker's machines, or perhaps by the user's browser, in a cross-site scripting manner. Even if the Asker successfully authenticates the Checker's message as coming from the Checker, it may want to apply this filter, because if the Checker has been electronically or physically subverted by a cracker, then malware might be introduced, and authentication of the Checker's messages is insufficient to detect this.
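A minimal sketch of such a filter follows. The substring list is purely illustrative; a real filter would be far more thorough:

# Illustrative only: a real filter needs a much longer list and
# should also normalize encodings (e.g. %3C for '<') before checking.
SUSPECT_SUBSTRINGS = ["<script", "javascript:", "onerror=", "data:text/html"]

def looks_questionable(new_address: str) -> bool:
    """Flag a proposed new address that carries questionable substrings."""
    lowered = new_address.lower()
    return any(s in lowered for s in SUSPECT_SUBSTRINGS)

print(looks_questionable("http://2.three.org/q.htm"))                    # False
print(looks_questionable('http://evil.example/"><script>x()</script>'))  # True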
The Asker might ask the Checker not to notify it about any changes. It is up to the Checker whether to honor this request.
If the Asker can automatically make changes, especially if its pages are dynamic, then the changes amount to a non-financial transaction. So the Asker could save the message in a log, in part because, if we regard the message as an operator on the Asker's pages, then the message can be inverted. That is, from the message, an undo operation can be derived. Hence, recording the message allows for the possibility of a rollback if the need arises.
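A sketch of deriving that undo operation, by swapping the old*/new* tags in a logged message (assuming the tag conventions above):

import xml.etree.ElementTree as ET

def invert_change(change_xml: str) -> str:
    """Swap every old*/new* tag pair to derive the undo of a change message."""
    root = ET.fromstring(change_xml)
    for elem in root.iter():
        if elem.tag.startswith("old"):
            elem.tag = "new" + elem.tag[3:]
        elif elem.tag.startswith("new"):
            elem.tag = "old" + elem.tag[3:]
    return ET.tostring(root, encoding="unicode")

logged = ("<change><old>http://1.two.com/t.htm</old>"
          "<new>http://2.three.org/t.htm</new></change>")
print(invert_change(logged))
# <change><new>http://1.two.com/t.htm</new><old>http://2.three.org/t.htm</old></change>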
Suppose the Checker has informed the Asker of changes that have happened, or will happen, and that some of these changes involve Beta moving or being deleted. Then, some time after the change has occurred, the Checker can spider Alpha, to see if Alpha has been updated to reflect the new address (or deletion) of Beta. If not, it can send another message to the Asker. If at a later time it still detects no change, it might have a policy of removing Alpha from its PL for Beta, or, in that PL, changing Alpha from approved to unapproved. It could also do this for other pages at Alpha's domain or base domain that point to pages in the Checker's purview.
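A sketch of such a follow-up policy, where pl maps each of the Checker's addresses to the set of approved linking pages, and strikes counts failed checks; both structures, and the exact escalation schedule, are illustrative:

import urllib.request

def fetch_page(url: str) -> str:
    """Naive fetch; a real Checker would spider more carefully."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def follow_up(alpha_url: str, old_beta_address: str,
              pl: dict, strikes: dict) -> None:
    """Some time after the changeover, see if Alpha still carries Beta's
    old address; re-notify once, then drop Alpha from Beta's Partner List."""
    if old_beta_address not in fetch_page(alpha_url):
        strikes.pop(alpha_url, None)        # Alpha was updated; all clear
        return
    strikes[alpha_url] = strikes.get(alpha_url, 0) + 1
    if strikes[alpha_url] == 1:
        print(f"send another change message to the Asker for {alpha_url}")
    else:
        # Policy: after a second failed check, remove Alpha from Beta's PL.
        pl.get(old_beta_address, set()).discard(alpha_url)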
24. Unauthorized Use of Web Resources
Consider a company, "Big", with a domain big.com. It has a web server that serves up web pages. Some of these might be static and some might be dynamic. In either case, a page typically has various images. These are often stored in files, one image per file. Some images might have a stylized trademark. If big.com has many pages, then factoring out the images into files is useful. Notably because image files that are used in several pages helps those pages have a consistent look, which the reader might associate with the company. (If a page is dynamic, these images are still often held in actual static files.)
These image files almost always have addresses (URLs) that can be used by anyone else on the web, to directly load those files. Suppose a file is at http://big.com/images/a1.jpg, for example. If the server gets an http request with that URL, then it supplies a1.jpg.
Why is this a problem? Imagine a spammer, Joe. He has a website, joe.com. He sends out spam email that uses those images. If he is also claiming to be sending from big.com, then this is phishing, and the methods of the Antiphishing Provisionals can be used to combat it. In those Provisionals, we gave extensive discussion of how to use a browser plug-in in tandem with an Aggregation Center (Agg) and with the use of a Notphish tag <notphish> in email or web pages.
But suppose instead he does not claim to be from big.com, either in the sender line or in the text of the message. He could have several reasons for doing this. Firstly, while not explicitly claiming to be from big.com, he might be "subliminally" trying to trick a reader into thinking so. If big.com is a well known website, then by imitating its look and feel, Joe adds an unwarranted credence to his spam.
Secondly, he reduces the bandwidth and computational load on joe.com, compared to copying those files from big.com to joe.com and then having his spam load from joe.com. This may be a significant saving to him, because the number of people who read his message, in a program that can show HTML (like a browser), may often be much greater than the number who then click on a link that goes to joe.com. This assumes that the spam has such a link; most do. Alternatively, he might not have such a link or a website; the spam might ask the reader to call a phone number, for example. While Joe might also have other reasons, the above two may be the most likely. Each of the above reasons is undesirable to Big. If Joe sells shoddy (or illegal) goods or services, then there is a risk that people get a negative impression of Big. He is also imposing extra bandwidth and computational costs onto Big.
Another thing that Joe might do is at joe.com itself. Here, his pages might load images from big.com.
Hitherto, there has been little that Big could do to prevent either of these cases. Here, we offer the use of a browser plug-in. The functionality to be described below can optionally, but preferably, be regarded as an extension of the antiphishing plug-in from the Antiphishing Provisionals. Big can send certain data to the Agg, including but not limited to the following:
- Can an email that is not from big.com load files from big.com?
- Can an Instant Message that is not from big.com, or from big.com's phone numbers, load files from big.com? (The data sent to the Agg also includes these phone numbers.)
- Ditto for an SMS.
A simple generalization of the first point is that Big can send a list of approved domains to the Agg, so that a message from one of these domains is allowed to load files from big.com, or so that these domains can have pages that load files from big.com. (Or these might be two lists, to distinguish between messages and pages.) This list is similar to the Partner List of the Antiphishing Provisionals.
The Agg amasses data from Big and other companies. Note that the data, per company, is relatively small. Periodically, it sends the data to its plug-ins. A plug-in might store the data as a hash table, Kappa, with the key being a company domain (like "big.com"), and the corresponding value being that company's data, as described above. The plug-in uses the methods of "2528" to detect if the user is reading messages at a known message provider, like a major ISP. It extracts the message and can then determine the (purported) sender. It checks if the body is loading any files. If so, it finds the base domains of those addresses. If any of these are keys of Kappa, then it looks up the keys' values to see if the sender domain is in them. If not, then it can mark the message as "suspect", where, following the discussion in ["2245", "2458", "2528"], this might be via a color change in a button on the browser that is associated with the plug-in. Plus, it might turn off any links in the message, to protect the user. While this does not prevent the resource consumption on big.com, it reduces the risk that the user will misperceive the message as being from big.com, and it reduces the ability of Joe to derive a sale from the message.
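A sketch of that lookup, where Kappa maps a company's base domain to the set of sender domains it approves. The base-domain extraction here is deliberately simplified; real code must handle multi-part TLDs:

from urllib.parse import urlparse

# Kappa, as pushed down from the Agg (contents illustrative).
kappa = {"big.com": {"big.com", "partner.com"}}

def base_domain(url_or_domain: str) -> str:
    """Simplified: take the last two labels. Real code must handle e.g. .co.uk."""
    host = urlparse(url_or_domain).hostname or url_or_domain
    return ".".join(host.split(".")[-2:])

def is_suspect(sender_domain: str, loaded_urls: list) -> bool:
    """Flag a message whose body loads files from a protected domain
    that does not approve of the (purported) sender."""
    sender = base_domain(sender_domain)
    for url in loaded_urls:
        owner = base_domain(url)
        if owner in kappa and sender not in kappa[owner]:
            return True
    return False

print(is_suspect("joe.com", ["http://big.com/images/a1.jpg"]))  # True
print(is_suspect("big.com", ["http://big.com/images/a1.jpg"]))  # False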
The above assumed that the user was viewing the message. Another possibility is that the plug-in might be able to perform the above on all or a portion of the user's messages. Then, those that are suspect might be placed into a Bulk folder, for example.
Given a suspect message, the plug-in can do other steps. It extracts any links. These, or the suspect messages themselves, might periodically be reported to the Agg, assuming that the user gives permission for entire suspect messages to be uploaded. If not, summary metadata about the message might be uploaded. These metadata might be those found using our antispam canonical steps of "8757".
Suppose now that the user is surfing the web and goes to a page at joe.com, and the page loads a file from big.com. Essentially the same steps as above can be performed to detect and mark a web page as suspect.
The Agg can then report to Big that its web resources appear to be misused, and send it various supporting data, like the links or even the texts of those suspect messages, or the URLs of the suspect pages. If the plug-in also has our antiphishing capabilities, then the above steps can optionally, but preferably, be done after the antiphishing steps. Here, if the message has been found to be phishing, there might be no need to do the above, since a phishing message is worse than the type of messages we are detecting here.

Claims

1. A method where a user with an email address at an ISP maintains a "Partner List" at that ISP of URIs of web pages that contain her email address, and of which she approves.
2. A method, using claim 1, where an author of a web page at a given URL/URI writes an email address, optionally demarcated by a custom tag (e.g. "<ask>" and "</ask>"), and asks the addressee to add the URL/URI to the latter's Partner List.
3. A method, using claim 1, where the ISP runs a Web Service ("Checker") that answers queries on the network, using the Partner Lists of its customers, where a query consists of a URI of a web page, and an address of a customer of the ISP, and the reply is whether the URI is in that customer's Partner List.
4. A method, using claim 3, where there is a modification ("Asker") to a browser, which can detect an email address in the currently viewed web page, and sends a Checker, that might exist at that address's base domain, the URI of the page and the email address and asks whether the URI is in that address's Partner List, and the Asker indicates the result in the browser in some fashion.
5. A method where a Web domain ("Alpha") maintains a Partner List of external web pages that have links to its pages, and of which it approves.
6. A method, using claim 5, where an author of a web page at a given URI writes a link, with an optional custom attribute (e.g. "ask"), and asks the base domain of that link to add the URI to its Partner List.
7. A method, using claim 5, where Alpha runs a Checker that answers queries on the network, where a query consists of a URI of a web page, and a link in that page to a page in Alpha, and the reply is whether the URI is in Alpha's page's Partner List.
8. A method, using claim 5, where there is a modification ("Asker") to a browser, which can detect a link in the currently viewed web page, and sends a Checker, that might exist at that link's base domain, the URI of the page and the link and asks whether the URI is in that link's Partner List, and the Asker indicates the result in the browser in some fashion.
9. A method, where an author of a web page writes, using custom tags, the Partner List of that page, into the page itself.
10. A method, using claim 9, where the Asker downloads a page that is linked to from the page being currently seen in the browser, and if the linked page has a Partner List, then the Asker indicates in the browser in some fashion whether the viewed page is in that list.
11. A method, using claim 5, of defining a "Verify Score" of a web page Gamma, that is a measure of the pages that it links to, at other domains, and which have Gamma in their Partner Lists.
12. A method, using claim 11, of a search engine finding the Verify Scores (if any) of web pages, and using these to improve the pages' rankings and to improve a programmatic understanding of the pages' meanings.
13. A method, using claims 3 and 7, of a search engine displaying a list of Checkers, sorted by various criteria, and possibly showing the most popular queries gotten by each Checker, where the Checkers have consented to share this information with the search engine.
PCT/CN2006/001986 2005-08-07 2006-08-07 System and method for verifying links and electronic addresses in web pages and messages WO2007016868A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US59580705P 2005-08-07 2005-08-07
US60/595,807 2005-08-07
US46271006A 2006-08-06 2006-08-06
US11/462,710 2006-08-06

Publications (1)

Publication Number Publication Date
WO2007016868A2 true WO2007016868A2 (en) 2007-02-15

Family

ID=37727660

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2006/001986 WO2007016868A2 (en) 2005-08-07 2006-08-07 System and method for verifying links and electronic addresses in web pages and messages

Country Status (1)

Country Link
WO (1) WO2007016868A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7930303B2 (en) 2007-04-30 2011-04-19 Microsoft Corporation Calculating global importance of documents based on global hitting times
US7860971B2 (en) 2008-02-21 2010-12-28 Microsoft Corporation Anti-spam tool for browser
US9065845B1 (en) * 2011-02-08 2015-06-23 Symantec Corporation Detecting misuse of trusted seals
CN112711455A (en) * 2020-12-31 2021-04-27 京东数字科技控股股份有限公司 Page interaction method and device, electronic equipment and storage medium
CN112711455B (en) * 2020-12-31 2024-04-16 京东科技控股股份有限公司 Page interaction method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US7822620B2 (en) Determining website reputations using automatic testing
US8516377B2 (en) Indicating Website reputations during Website manipulation of user information
US9384345B2 (en) Providing alternative web content based on website reputation assessment
US8826155B2 (en) System, method, and computer program product for presenting an indicia of risk reflecting an analysis associated with search results within a graphical user interface
US7765481B2 (en) Indicating website reputations during an electronic commerce transaction
US8566726B2 (en) Indicating website reputations based on website handling of personal information
US20060253584A1 (en) Reputation of an entity associated with a content item
US20060253582A1 (en) Indicating website reputations within search results
US20140331119A1 (en) Indicating website reputations during user interactions
US20150213131A1 (en) Domain name searching with reputation rating
US20070094500A1 (en) System and Method for Investigating Phishing Web Sites
Gandhi et al. Badvertisements: Stealthy click-fraud with unwitting accessories
Jakobsson The death of the internet
Klein Defending Against the Wily Surfer: Web-Based Attacks and Defenses
WO2007016868A2 (en) System and method for verifying links and electronic addresses in web pages and messages
WO2007076715A1 (en) System and method of approving web pages and electronic messages
WO2006026921A2 (en) System and method to detect phishing and verify electronic advertising
WO2006042480A2 (en) System and method for investigating phishing web sites
Medlin et al. The cost of electronic retailing: Prevalent security threats and their results
Suresh et al. Detailed investigation: stratification of phishing websites assisted by user ranking mechanism
WO2006060967A2 (en) System and method for extending an antiphishing aggregator
WO2023003457A1 (en) Method for increasing web security and device and system for doing the same
Bian New Approaches for Ensuring User Online Privacy
Jakobsson et al. Real-World Phishing Experiments: A Case Study
Ronda iTrustPage: pretty good phishing protection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct app. not ent. europ. phase

Ref document number: 06775304

Country of ref document: EP

Kind code of ref document: A2