US20140379443A1 - Methods, systems, and media for applying scores and ratings to web pages, web sites, and content for safe and effective online advertising - Google Patents
- Publication number
- US20140379443A1 (application US14/184,264)
- Authority
- US
- United States
- Prior art keywords
- rating
- ordinomial
- ratings
- content
- posterior
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
- G06Q30/0263—Targeted advertisements based upon Internet or website rating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24578—Query processing with adaptation to user needs using ranking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
-
- G06F17/3053—
-
- G06F17/30861—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0282—Rating or review of business operators or products
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0277—Online advertisement
Definitions
- the disclosed subject matter generally relates to methods, systems, and media for applying scores and ratings to web pages, web sites, and other pieces of content of interest to advertisers or content providers for safe and effective online advertising.
- Online advertisers use tools that provide information about websites or publishers and the viewers of such websites to facilitate more effective planning and management of online advertising by advertisers.
- online advertisers continually desire increased control over the web pages on which their advertisements and brand messages appear. For example, particular online advertisers want to control the risk that their advertisements and brand messages appear on pages or sites that contain objectionable content (e.g., pornography or adult content, hate speech, bombs, guns, ammunition, alcohol, offensive language, tobacco, spyware, malicious code, illegal drugs, music downloading, particular types of entertainment, illegality, obscenity, etc.).
- advertisers for adult-oriented products, such as alcohol and tobacco, want to avoid pages directed towards children.
- the disclosed subject matter provides advertisers, agencies, advertisement networks, advertisement exchanges, and publishers with the ability to make risk-controlled decisions based on the category-specific risk and/or general risk associated with a given web page, website, etc.
- advertisers, agencies, advertisement networks, advertisement exchanges, and publishers can determine whether to place a particular advertisement on a particular web page based on a high confidence that the page does not contain objectionable content.
- advertisers, agencies, advertisement networks, advertisement exchanges, and publishers can request to view a list of pages in their current advertisement network traffic assessed to have the highest risk of objectionable content.
- the risk rating can, in some embodiments, represent the probability that a page or a site contains or will contain objectionable content, the degree of objectionability of the content, and/or any suitable combination thereof.
- the risk rating can be determined for a single domain and/or a single category such that a particular piece of media or content can have a rating for each of a number of objectionable content categories.
- the risk rating can be determined across several objectionable content categories, across multiple pieces of content (e.g., the pages appearing in the advertiser's traffic), and/or across multiple domains managed by a publisher.
- these mechanisms can generate risk ratings using multiple statistical models and multiple pieces of evidence. In some embodiments, these mechanisms can account for temporal dynamics in content by determining a risk rating that is based on the probability of encountering different severity levels from a given URL and that is based on the types of estimated severity exhibited in the past.
- these mechanisms can evaluate the quality of collections of content. More particularly, these mechanisms can collect individual content ratings (e.g., ordinal ratings and/or real-valued ratings), aggregate these ratings across arbitrary subsets, normalize these ordinal and real-valued ratings onto a general index scale, and calibrate and/or map the normalized ratings using a global mean to provide a benchmark for comparison. This mapping can capture the risk and/or severity profiles of appearance of content.
- the method comprises: extracting one or more features from a piece of web content; applying a plurality of statistical models to the extracted features to generate a plurality of ordinomial estimates, wherein each ordinomial estimate represents a probability that the web content is a member of one of a plurality of severity groups; determining a posterior ordinomial estimate for the web content by combining the plurality of ordinomial estimates; generating a risk rating that encodes severity and confidence based on the determined posterior ordinomial estimate, wherein the risk rating identifies whether the web content is likely to contain objectionable content of a given category; and providing the risk rating for determining whether an advertisement should be associated with the web content.
- the method further comprises: determining a plurality of posterior ordinomial estimates at a plurality of times for the web content; and determining an expected posterior ordinomial estimate by combining the plurality of posterior ordinomial estimates over the plurality of times.
- the method further comprises: extracting a uniform resource locator from the one or more features; assembling a first set of posterior ordinomial estimates from the plurality of posterior ordinomial estimates based on the uniform resource locator; and determining the expected posterior ordinomial estimate by combining the first set of posterior ordinomial estimates over the plurality of times.
- the method further comprises: determining that the web content belongs to a sitelet, wherein the sitelet includes a plurality of web pages; determining a sitelet ordinomial by aggregating the plurality of posterior ordinomial estimates associated with each of the plurality of web pages; and generating a sitelet rating based on the aggregated plurality of posterior ordinomials.
- the method further comprises: comparing the sitelet ordinomial with the plurality of posterior ordinomial estimates associated with each of the plurality of web pages belonging to the sitelet; and determining whether to store at least one of the sitelet ordinomial and the plurality of posterior ordinomial estimates based on the comparison and a sensitivity value.
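The sitelet steps above (aggregate page-level posterior ordinomials, then compare each page against the aggregate using a sensitivity value) can be sketched as follows. This is a minimal illustration under stated assumptions: the sitelet ordinomial is taken to be the unweighted mean of its pages' posteriors, the comparison uses an L1 distance, and a page-level ordinomial is kept only when it deviates from the sitelet by more than the sensitivity. The function names and the distance measure are hypothetical choices, not taken from the source.

```python
def sitelet_ordinomial(page_ordinomials):
    """Aggregate page-level posterior ordinomials into one sitelet ordinomial.

    Assumption: a simple unweighted mean over pages, bin by bin.
    """
    m = len(page_ordinomials)
    n = len(page_ordinomials[0])
    return [sum(page[j] for page in page_ordinomials) / m for j in range(n)]


def pages_to_store(sitelet, pages, sensitivity):
    """Return the page ordinomials that differ from the sitelet by more than sensitivity.

    Pages close to the sitelet ordinomial can be represented by the sitelet
    alone; only the outliers are kept. The L1 distance is a hypothetical choice.
    """
    keep = {}
    for url, probs in pages.items():
        distance = sum(abs(a - b) for a, b in zip(probs, sitelet))
        if distance > sensitivity:
            keep[url] = probs
    return keep


pages = {
    "s.com/a": [1.0, 0.0],
    "s.com/b": [1.0, 0.0],
    "s.com/c": [0.0, 1.0],
}
sitelet = sitelet_ordinomial(list(pages.values()))
print(sitelet, pages_to_store(sitelet, pages, sensitivity=1.0))
```

A larger sensitivity value stores fewer page-level records, trading per-page precision for storage.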
- the method further comprises: collecting a plurality of ratings associated with a plurality of pieces of web content, wherein the plurality of ratings includes ordinal ratings and real-valued ratings; and determining an aggregate rating for the plurality of pieces of web content based on the collected plurality of ratings.
- the method further comprises normalizing the aggregate rating by mapping the aggregate rating to an index-scaled rating.
- the method further comprises: applying a severity weight to the index-scaled rating; and generating a severity-weighted index-scaled rating for the plurality of pieces of web content.
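The aggregation, index-scaling, and severity-weighting steps above can be illustrated with a short sketch. It assumes that ordinal ratings are first converted to numbers through a hypothetical lookup table (`ORDINAL_TO_SCORE`), that the aggregate is a plain mean, and that the index scale sets the global mean to a benchmark value of 100. None of these specific choices come from the source; they only show one way the steps could fit together.

```python
# Hypothetical numeric stand-ins for ordinal ratings (severity groups b1..b4).
ORDINAL_TO_SCORE = {"b1": 900.0, "b2": 700.0, "b3": 500.0, "b4": 200.0}


def to_numeric(rating):
    """Map an ordinal rating (str) or a real-valued rating (number) to a float."""
    if isinstance(rating, str):
        return ORDINAL_TO_SCORE[rating]
    return float(rating)


def aggregate_rating(ratings):
    """Aggregate mixed ordinal and real-valued ratings with a simple mean."""
    values = [to_numeric(r) for r in ratings]
    return sum(values) / len(values)


def index_scale(aggregate, global_mean, base=100.0):
    """Normalize so that the global mean maps to `base` (the benchmark)."""
    return base * aggregate / global_mean


def severity_weighted_index(ratings, global_mean, severity_weight):
    """Apply a severity weight to the index-scaled aggregate rating."""
    return severity_weight * index_scale(aggregate_rating(ratings), global_mean)


collection = ["b1", 700, 500.0]
print(aggregate_rating(collection), index_scale(aggregate_rating(collection), 700.0))
```

An index value above 100 then reads as "better than the global benchmark" and below 100 as "worse".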
- the method further comprises generating a combined risk rating by combining the generated risk rating that encodes whether the web content is likely to contain objectionable content of the given category with a second risk rating that encodes whether the web content is likely to contain objectionable content of a second category.
- a system for rating webpages for safe advertising comprising a processor that: extracts one or more features from a piece of web content; applies a plurality of statistical models to the extracted features to generate a plurality of ordinomial estimates, wherein each ordinomial estimate represents a probability that the web content is a member of one of a plurality of severity groups; determines a posterior ordinomial estimate for the web content by combining the plurality of ordinomial estimates; generates a risk rating that encodes severity and confidence based on the determined posterior ordinomial estimate, wherein the risk rating identifies whether the web content is likely to contain objectionable content of a given category; and provides the risk rating for determining whether an advertisement should be associated with the web content.
- a non-transitory computer-readable medium containing computer-executable instructions that, when executed by a processor, cause the processor to perform a method for rating webpages for safe advertising, the method comprising: extracting one or more features from a piece of web content; applying a plurality of statistical models to the extracted features to generate a plurality of ordinomial estimates, wherein each ordinomial estimate represents a probability that the web content is a member of one of a plurality of severity groups; determining a posterior ordinomial estimate for the web content by combining the plurality of ordinomial estimates; generating a risk rating that encodes severity and confidence based on the determined posterior ordinomial estimate, wherein the risk rating identifies whether the web content is likely to contain objectionable content of a given category; and providing the risk rating for determining whether an advertisement should be associated with the web content.
- FIG. 1 is a diagram of an illustrative example of a process for determining the probability of membership in a severity group for a category of objectionable content in accordance with some embodiments of the disclosed subject matter.
- FIG. 2 is a diagram of an illustrative example of combining ordinomial estimates into a posterior ordinomial estimate in accordance with some embodiments of the disclosed subject matter.
- FIG. 3 is an illustrative example of temporal aggregation of posterior ordinomials in accordance with some embodiments of the disclosed subject matter.
- FIG. 4 is an illustrative example of the map reduction approach (MapReduce) for determining the temporal aggregation of posterior ordinomials in accordance with some embodiments of the disclosed subject matter.
- FIG. 5 is a diagram of an illustrative example of a process for generating one or more ratings for a webpage in accordance with some embodiments of the disclosed subject matter.
- FIG. 6 is a diagram of a graph showing the selection of an appropriate bin ($b_i$) in an ordinomial given a confidence parameter (β) in accordance with some embodiments of the disclosed subject matter.
- FIG. 7 is a diagram of an illustrative rating scale in accordance with some embodiments of the disclosed subject matter.
- FIG. 8 is an illustrative example showing that incoming URLs can be matched to the sitelet with the longest available shared prefix in accordance with some embodiments of the disclosed subject matter.
- FIG. 9 is an illustrative example of calculating sitelet ordinomials in accordance with some embodiments of the disclosed subject matter.
- FIG. 10 is an illustrative example of calculating sitelet ordinomials and sitelet ratings in settings with small domains in accordance with some embodiments of the disclosed subject matter.
- FIG. 11 is an illustrative example of calculating sitelet ordinomials and sitelet ratings in settings with larger domains in accordance with some embodiments of the disclosed subject matter.
- FIG. 13 is a diagram of an illustrative system on which a rating application can be implemented in accordance with some embodiments of the disclosed subject matter.
- FIG. 14 is a diagram of an illustrative user computer and server as provided, for example, in FIG. 13 in accordance with some embodiments of the disclosed subject matter.
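Matching an incoming URL to the sitelet with the longest available shared prefix (FIG. 8) can be sketched as follows. The sketch assumes URLs and sitelets are scheme-less strings such as `example.com/sports`. It also assumes that a prefix should only match on a path boundary, so `example.com/sports` does not claim `example.com/sportswear`. That boundary rule and the function name `match_sitelet` are assumptions made for illustration.

```python
def match_sitelet(url, sitelets):
    """Return the sitelet sharing the longest prefix with url, or None.

    Prefixes only match on a path boundary ("/" or "?") or exactly, so a
    sitelet never claims a URL whose path merely starts with the same letters.
    """
    best, best_len = None, -1
    for sitelet in sitelets:
        root = sitelet.rstrip("/")
        on_boundary = (
            url == root
            or url.startswith(root + "/")
            or url.startswith(root + "?")
        )
        if on_boundary and len(root) > best_len:
            best, best_len = sitelet, len(root)
    return best


known = ["example.com", "example.com/sports", "example.com/sports/boxing"]
print(match_sitelet("example.com/sports/boxing/fight-night", known))
```

A production system would more likely use a trie keyed on path segments, but the linear scan keeps the matching rule easy to read.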
- mechanisms for scoring and rating web pages, web sites, and other pieces of content of interest to advertisers or content providers for safe and effective online advertising are provided. These mechanisms, among other things, generate a risk rating that accounts for the inclusion of objectionable content with the use of ordinomials.
- these mechanisms can be used in a variety of applications.
- these mechanisms can provide a rating application that allows advertisers, ad networks, publishers, site managers, and/or other entities to make risk-controlled decisions based at least in part on risk associated with a given webpage, website, or any other suitable content (generally referred to herein as a “webpage” or “page”).
- these mechanisms can provide a rating application that allows advertisers, agencies, advertisement networks, advertisement exchanges, and/or publishers to determine whether to place a particular advertisement on a particular web page based on a high confidence that the page does not contain objectionable content.
- these mechanisms allow an advertiser to designate that an advertisement should not be placed on a web page unless a particular confidence (e.g., high confidence, medium-high confidence, etc.) is achieved.
- the particular confidence may be determined based on having a severity greater than a particular severity group in a particular category.
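One way to read the advertiser's requirement is as a threshold on the cumulative ordinomial: the ad may be placed only if $F(b_j \mid x) = p(y \le b_j \mid x)$ is at least the required confidence. The sketch below encodes that reading. It assumes the ordinomial is a list ordered from least severe ($b_1$) to most severe ($b_n$) and that `j` is 1-based. The function name and the small floating-point tolerance are hypothetical.

```python
def confident_no_worse_than(probs, j, confidence):
    """True if p(y <= b_j | x) >= confidence.

    probs[k] holds p(y = b_{k+1} | x), ordered from least to most severe;
    j is the 1-based index of the worst severity group the advertiser accepts.
    """
    if not 1 <= j <= len(probs):
        raise ValueError("j must index an existing severity group")
    # Small tolerance so that floating-point rounding in the sum does not
    # flip a decision that sits exactly on the threshold.
    return sum(probs[:j]) >= confidence - 1e-12


page = [0.70, 0.20, 0.08, 0.02]
print(confident_no_worse_than(page, 2, 0.9))
```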
- these categories can include content that relates to guns, bombs, and/or ammunition (e.g., sites that describe or provide information on weapons including guns, rifles, bombs, and ammunition, sites that display and/or discuss how to obtain weapons, manufacture of weapons, trading of weapons (whether legal or illegal), sites that describe or offer for sale weapons including guns, ammunition, and/or firearm accessories, etc.).
- these categories can include content relating to alcohol (e.g., sites that provide information relating to alcohol, sites that provide recipes for mixing drinks, sites that provide reviews and locations for bars, etc.), drugs (e.g., sites that provide instructions for or information about obtaining, manufacturing, or using illegal drugs), and/or tobacco (e.g., sites that provide information relating to smoking, cigarettes, chewing tobacco, pipes, etc.).
- these categories can include offensive language (e.g., sites that contain swear words, profanity, hard language, inappropriate phrases and/or expressions), hate speech (e.g., sites that advocate hostility or aggression towards individuals or groups on the basis of race, religion, gender, nationality, or ethnic origin, sites that denigrate others or justify inequality, sites that purport to use scientific or other approaches to justify aggression, hostility, or denigration), and/or obscenities (e.g., sites that display graphic violence, the infliction of pain, gross violence, and/or other types of excessive violence).
- these categories can include adult content (e.g., sites that contain nudity, sex, use of sexual language, sexual references, sexual images, and/or sexual themes).
- these categories can include spyware or malicious code (e.g., sites that provide instructions to practice illegal or unauthorized acts of computer crime using technology or computer programming skills, sites that contain malicious code, etc.) or other illegal content (e.g., sites that provide instructions for threatening or violating the security of property or the privacy of others, such as theft-related sites, lock picking and burglary-related sites, and fraud-related sites).
- objectionable content on one or more of these webpages can generally be defined as having a severity level worse than (or greater than) $b_j$ in a category y.
- Each category (y) can include various severity groups $b_j$, where $j = 1, \ldots, n$ and n is an integer greater than one.
- an adult content category can have various severity levels, such as G, PG, PG-13, R, NC-17, and X.
- an adult content category and an offensive speech category can be combined to form one category of interest.
- a category may not have fine grained severity groups and a binomial distribution can be used.
- a binomial probability can be used for binary outcome events, where there is typically one positive event (e.g., good, yes, etc.) and one negative event (e.g., bad, no, etc.).
- FIG. 1 is a diagram showing an example of a process for determining the probability of membership in a severity group for one or more categories of objectionable content in accordance with some embodiments of the disclosed subject matter.
- process 100 begins by receiving or reviewing content on a webpage, website, or any other suitable content (generally referred to herein as a “webpage” or “page”) at 110.
- a rating application can receive multiple requests to rate a group of webpages or websites.
- a rating application can receive, from an advertiser, a list of websites on which the advertiser is interested in placing an advertisement, provided that each of these websites does not contain, or does not have a high likelihood of containing, objectionable content.
- a rating application can receive, from an advertiser, that advertiser's current advertisement network traffic for assessment.
- the rating application or a component of the rating application selects a uniform resource locator (URL) for rating at 120.
- the rating application can receive one or more rating requests from other components and select among them (e.g., the most popular requests are assigned a higher priority, particular components of the rating application are assigned a higher priority, or requests are selected at random).
- a fixed, prioritized list of URLs can be defined based, for example, on ad traffic or any other suitable input (e.g., use of the rating for scoring, use of the rating for active learning, etc.).
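A prioritized URL list based on ad traffic, as described above, can be sketched with Python's standard `heapq` module. The sketch assumes the traffic input is a hypothetical mapping from URL to impression count. Because `heapq` implements a min-heap, counts are negated so that the highest-traffic URLs are selected first. `build_queue` and `next_urls` are illustrative names only.

```python
import heapq


def build_queue(ad_traffic):
    """Build a max-priority queue of URLs keyed on ad traffic volume."""
    # heapq is a min-heap, so negate the counts to pop the busiest URL first.
    heap = [(-count, url) for url, count in ad_traffic.items()]
    heapq.heapify(heap)
    return heap


def next_urls(heap, k):
    """Pop up to k URLs for rating, highest traffic first."""
    return [heapq.heappop(heap)[1] for _ in range(min(k, len(heap)))]


queue = build_queue({"a.com": 5, "b.com": 50, "c.com": 10})
print(next_urls(queue, 2))  # prints ['b.com', 'c.com']
```

Other priority signals mentioned above (use of the rating for scoring or active learning) could be folded into the key in the same way.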
- One or more pieces of evidence can be extracted from the uniform resource locator or page at 130.
- These pieces of evidence can include, for example, the text of the URL, image analysis, HyperText Markup Language (HTML) source code, site or domain registration information, ratings, categories, and/or labeling from partner or third party analysis systems (e.g., site content categories), source information of the images on the page, page text or any other suitable semantic analysis of the page content, metadata associated with the page, anchor text on other pages that point to the page of interest, ad network links and advertiser information taken from a page, hyperlink information, malicious code and spyware databases, site traffic volume data, micro-outsourced data, any suitable auxiliary derived information (e.g., ad-to-content ratio), and/or any suitable combination thereof.
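The first evidence source listed above, the text of the URL, is the simplest to illustrate. The sketch below splits a URL into lowercase alphanumeric tokens using the standard `urllib.parse.urlsplit` and `unquote` functions. Such tokens could then serve as features for a statistical model. The tokenization rule and the `url_tokens` name are assumptions for illustration only.

```python
import re
from urllib.parse import unquote, urlsplit


def url_tokens(url):
    """Split the host, path, and query of a URL into lowercase word tokens."""
    parts = urlsplit(url)
    # Decode percent-escapes (e.g., %20) before tokenizing.
    text = unquote(" ".join([parts.netloc, parts.path, parts.query]))
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


print(url_tokens("https://www.Example.com/beer-reviews/top_10?q=craft%20ale"))
```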
- evidence and/or any other suitable information relating to the page can be collected, extracted, and/or derived using one or more evidentiary sources.
- an ordinomial can be generated at 140.
- a multi-severity classification can be determined by using an ordinomial to encode the probability of membership in an ordered set of one or more severity groups.
- the ordinomial can be represented as follows:

$$p(y = b_j \mid x), \quad j = 1, \ldots, n, \qquad \sum_{j=1}^{n} p(y = b_j \mid x) = 1$$

- y is a variable representing the severity class that page x belongs to. It should be noted that the ordinal nature implies that $b_i$ is less severe than $b_j$ when $i < j$. It should also be noted that ordinomial probabilities can be estimated using any suitable statistical models, such as the ones described herein, and using the evidence derived from the pages.
- an ordinomial distribution that includes each generated ordinomial for one or more severity groups can be generated. Accordingly, the cumulative ordinal distribution F can be described as:

$$F(b_j \mid x) = p(y \le b_j \mid x) = \sum_{i=1}^{j} p(y = b_i \mid x)$$
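A minimal sketch of an ordinomial and its cumulative distribution F follows. It assumes the ordinomial is stored as a Python list whose j-th entry is $p(y = b_j \mid x)$, ordered from least severe ($b_1$) to most severe ($b_n$). The helper names `validate_ordinomial` and `cumulative` are hypothetical.

```python
from itertools import accumulate


def validate_ordinomial(probs, tol=1e-9):
    """Check that probs is a valid ordinomial: non-negative and summing to 1."""
    if len(probs) < 2:
        raise ValueError("an ordinomial needs at least two severity groups")
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if abs(sum(probs) - 1.0) > tol:
        raise ValueError("probabilities must sum to 1")
    return list(probs)


def cumulative(probs):
    """Return [F(b_1|x), ..., F(b_n|x)], where F(b_j|x) = p(y <= b_j | x)."""
    return list(accumulate(validate_ordinomial(probs)))


# Four severity groups, from least severe (b_1) to most severe (b_4).
ordinomial = [0.70, 0.20, 0.08, 0.02]
print(cumulative(ordinomial))
```

Because F is a running sum, it is non-decreasing and its last entry is 1 (up to floating-point rounding).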
- a binary or binomial-probability determination of appropriateness or objectionability can be projected onto an ordinomial by considering the extreme classes, $b_1$ and $b_n$.
- a binomial determination can be performed, where the extreme classes include one positive class (e.g., malware is present in the content) and one negative class (e.g., malware is not present in the content).
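Projecting a binomial determination onto an ordinomial by considering only the extreme classes can be sketched as follows. All probability mass goes to $b_1$ (for example, malware absent) and $b_n$ (for example, malware present), and the intermediate bins are set to zero. The function name and the argument convention (probability of the objectionable outcome) are assumptions.

```python
def binomial_to_ordinomial(p_objectionable, n_groups):
    """Project a binary probability onto an n-bin ordinomial.

    Mass is placed only on the extreme bins: b_1 (least severe, e.g. no
    malware) and b_n (most severe, e.g. malware present).
    """
    if not 0.0 <= p_objectionable <= 1.0:
        raise ValueError("p_objectionable must be a probability")
    if n_groups < 2:
        raise ValueError("an ordinomial needs at least two severity groups")
    probs = [0.0] * n_groups
    probs[0] = 1.0 - p_objectionable  # b_1: negative outcome
    probs[-1] = p_objectionable       # b_n: positive (objectionable) outcome
    return probs


print(binomial_to_ordinomial(0.25, 5))
```

The projected estimate can then be combined with genuinely multi-severity estimates using the same machinery.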
- Ordinomial probabilities can be estimated using one or more statistical models, for example, from evidence derived or extracted from the received web pages.
- in process 100 of FIG. 1 and in other processes described herein, some steps can be added, some steps may be omitted, the order of the steps may be rearranged, and/or some steps may be performed simultaneously.
- ordinomials can be generated from a variety of different statistical models based on a diverse range of evidence. For example, different pieces of evidence can be accounted for in the determination of an ordinomial. These ordinomial estimates can be combined into a posterior ordinomial estimate using, for example, ensemble approaches and information fusion approaches.
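- example aggregation approaches include weighted averaging, AdaBoost-type mixing, or using sub-ordinomials as covariates in a secondary model. Accordingly, as shown in FIG. 2, combining m sub-ordinomial estimates $p_1(y \mid x), \ldots, p_m(y \mid x)$ with an aggregation function f can be represented as:

$$p(y \mid x) = f\big(p_1(y \mid x), \ldots, p_m(y \mid x)\big)$$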
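Of the aggregation approaches listed above, weighted averaging is the simplest to show. It computes $p(y = b_j \mid x) = \sum_k w_k\, p_k(y = b_j \mid x) / \sum_k w_k$ bin by bin. This is a minimal sketch that assumes every sub-ordinomial has the same number of bins and that the per-model weights are non-negative with a positive sum. How the weights are chosen (for example, from each model's validation accuracy) is left open. AdaBoost-type mixing or a secondary model would replace `combine_weighted`, which is a hypothetical name.

```python
def combine_weighted(estimates, weights):
    """Combine sub-ordinomials into a posterior ordinomial by weighted averaging."""
    if not estimates or len(estimates) != len(weights):
        raise ValueError("need one weight per sub-ordinomial")
    n = len(estimates[0])
    if any(len(e) != n for e in estimates):
        raise ValueError("all sub-ordinomials must have the same bins")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Normalizing by the weight total keeps the posterior summing to 1.
    return [sum(w * e[j] for w, e in zip(weights, estimates)) / total for j in range(n)]


text_model = [0.6, 0.3, 0.1]
image_model = [0.2, 0.5, 0.3]
print(combine_weighted([text_model, image_model], [2.0, 1.0]))
```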
- the rating application can provide temporal aggregation features to account for the change to web pages over time.
- FIG. 4 shows an illustrative example of the map reduction approach (MapReduce) for determining the temporal aggregation of posterior ordinomials in accordance with some embodiments of the disclosed subject matter.
- URLs can be used as the key for the reduction phase of the MapReduce process. This has the effect of compiling all samples that belong to a given domain onto a single computer during the reduction.
- for each sample, the ordinomial probabilities and a timestamp denoting the instant at which the sample was taken are passed along with the key. More particularly, as shown in FIG. 4, the posterior ordinomials for a given domain can be sorted based on the timestamp or observation time.
- Probability estimation can then be performed, in which the sorted posterior ordinomials for a given domain are combined and an expected posterior ordinomial is calculated. Depending on the computational nature of the temporal aggregation, this expected ordinomial can be stored for use in future temporal aggregations, thereby alleviating the need to store each individual record explicitly. Additionally, the reduction phase of this MapReduce process can compute and output a rating as described herein.
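The map, shuffle, and reduce steps above can be simulated in plain Python. This sketch keys records by URL, groups them, sorts each group by timestamp, and folds the sorted posteriors into an expected ordinomial. The folding rule is an assumption: the source does not name one, and this sketch substitutes an exponential moving average with a hypothetical smoothing weight `ALPHA`. That rule depends on order, which is why sorting matters. It also needs only the previous expected ordinomial as state, so a stored result can seed the next aggregation instead of keeping every record.

```python
from collections import defaultdict

ALPHA = 0.5  # hypothetical weight given to each newer sample


def ema_update(state, sample, alpha=ALPHA):
    """Fold one posterior ordinomial into the running expected ordinomial."""
    if state is None:
        return list(sample)
    return [(1 - alpha) * s + alpha * p for s, p in zip(state, sample)]


def map_phase(records):
    """Emit (key, value) pairs: the URL is the key, (timestamp, posterior) the value."""
    for url, timestamp, posterior in records:
        yield url, (timestamp, posterior)


def reduce_phase(samples, prior_state=None):
    """Sort one key's samples by observation time and fold them in order."""
    state = prior_state
    for _, posterior in sorted(samples, key=lambda s: s[0]):
        state = ema_update(state, posterior)
    return state


def temporal_aggregate(records, stored=None):
    """Run map, shuffle (group by key), and reduce; `stored` seeds prior results."""
    stored = stored or {}
    groups = defaultdict(list)
    for url, value in map_phase(records):
        groups[url].append(value)
    return {url: reduce_phase(vals, stored.get(url)) for url, vals in groups.items()}


records = [("u", 3, [0.0, 1.0]), ("u", 1, [1.0, 0.0]), ("u", 2, [1.0, 0.0])]
print(temporal_aggregate(records))
```

A plain running mean (stored together with a sample count) would be an equally valid, order-insensitive alternative.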
- FIG. 5 is a diagram of an example of a process 500 for generating a rating (R) for a webpage in accordance with some embodiments of the disclosed subject matter.
- a rating derived from the ordinomial $p(y = b_i \mid x)$ that includes severity and confidence parameters is determined.
- for example, an advertiser may want the rating to represent a particular confidence that the page's content is no worse than severity group $b_j$.
- alternatively, an advertiser may want the rating to encode the confidence that a particular webpage is no better than a particular severity group.
- process 500 begins at 510 by selecting the worst severity in accordance with a user-specified confidence parameter, denoted here $\beta$.
- as shown in FIG. 6, starting from the least severe or objectionable category in the ordinomial ($b_1$), the bins of the ordinomial are ascended while maintaining a running sum of the probabilities encountered.
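- the bin $b_i$ at which the level of confidence $\beta$ is reached is the least severe bin whose cumulative probability meets the confidence parameter, i.e., $i = \min\{k : \sum_{m=1}^{k} p(y = b_m \mid x) \ge \beta\}$.
- that is, the bin $b_i$ is selected such that the application has at least the level of confidence $\beta$ that the content is no worse than $b_i$.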
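The bin-selection step reads directly as code. Walk the bins from least to most severe, keep a running sum, and stop at the first bin where that sum reaches β. The sketch assumes the bins are stored least severe first. It also uses a small hypothetical tolerance, `tol`, so that floating-point rounding does not push the selection one bin too far (for example, 0.6 + 0.3 evaluates just below 0.9).

```python
from itertools import accumulate
from typing import Sequence


def select_bin(ordinomial: Sequence[float], beta: float, tol: float = 1e-12) -> int:
    """Return the 0-based index i of the least severe bin b_i such that
    P(y is no worse than b_i | x) >= beta.

    Bins are ordered from least severe (index 0, i.e. b_1) to most severe.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    for i, cumulative in enumerate(accumulate(ordinomial)):
        if cumulative >= beta - tol:
            return i
    # Only reachable if the probabilities sum to less than beta.
    return len(ordinomial) - 1
```

A higher β pushes the selected bin toward the severe end, which yields a lower rating for the same ordinomial.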
- the rating application can determine, from a given page's ordinomial probability estimates, ratings that encode both severity and confidence. It should be noted that the rating application can assume that ratings are given on a numeric scale that can be divided into ranges $B_j$, with a one-to-one mapping between these ranges and the severity groups $b_j$. That is, step 510 of process 500 indicates that there is a particular confidence that a page has severity no worse than $b_j$, and the rating (R) is somewhere in the range $B_j$. For example, as shown in FIG. 7, the rating scale 700 can be a numeric scale of the numbers 0 through 1000, where 1000 denotes the least severe (safest) end of the scale.
- rating scale 700 can be further divided such that particular portions of the rating scale are designated for the best pages, e.g., ratings falling between 800 and 1000. Accordingly, if there is greater than $\beta$ confidence that the page's content is no worse than the best category, then the page's rating falls in the 800-1000 range.
- interior rating ranges for a particular objectionability category can be defined.
- the rating application can generate one or more ratings that take into account the difference between being uncertain between R-rated content and PG-rated content, where R and PG are two interior severity levels within the adult content category.
- the rating application can generate one or more ratings that take into account the difference between a page having no evidence of X-rated content and a page having some small evidence of containing X-rated content.
- rating range $B_j$ can be defined as the interval between $s_{j-1}$ and $s_j$.
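The text does not specify where inside $B_j$ a rating should fall. The sketch below uses one hypothetical rule that captures the distinction drawn above between no evidence and small evidence of worse content. Under this rule, the rating moves toward the safe end of $B_j$ as the probability mass on bins worse than $b_j$ shrinks. Only the 800-1000 range for the best bin comes from the text; the other boundaries $s_j$ and the interpolation rule are assumptions.

```python
from typing import Sequence, Tuple

# Hypothetical ranges B_j on the 0-1000 scale, one per severity bin
# (index 0 = least severe). Only (800, 1000) comes from the text.
RANGES: Sequence[Tuple[float, float]] = [(800, 1000), (600, 800), (300, 600), (0, 300)]


def rating_in_range(ordinomial: Sequence[float], j: int, beta: float,
                    ranges: Sequence[Tuple[float, float]] = RANGES) -> float:
    """Place rating R inside B_j = (s_low, s_high) for the selected bin j.

    Hypothetical rule: the less probability mass on bins worse than b_j,
    the closer R is to the safe end of B_j. Because bin j was reached at
    confidence beta, that worse mass is at most 1 - beta.
    """
    low, high = ranges[j]
    worse_mass = sum(ordinomial[j + 1:])
    q = 1.0 if beta >= 1.0 else 1.0 - worse_mass / (1.0 - beta)
    q = min(max(q, 0.0), 1.0)  # clamp against rounding
    return low + (high - low) * q
```

With β = 0.95, a page whose ordinomial puts all of its mass on the best bin rates 1000. A page with 3% of its mass on worse bins rates about 880, which is still inside the 800-1000 range but visibly lower.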
- one or more ratings can be generated for one or more objectionable categories. For example, multiple ratings can be generated, where one rating is generated for each selected objectionable content category (e.g., adult content, offensive language, and alcohol).
- ratings for two or more objectionable categories can be combined to create a combined score. For example, a first rating generated for an adult content category and a second rating generated for an offensive language category can be combined.
- weights can be assigned to each category such that a higher weight can be assigned to the adult content category and a lower weight can be assigned to the offensive language category. Accordingly, an advertiser or any other suitable user of the rating application can customize the score by assigning weights to one or more categories.
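A minimal sketch of the weighted combination follows. It assumes per-category ratings on the same scale, where higher is safer, and weights chosen by the advertiser. The category keys and weight values are illustrative only.

```python
from typing import Dict


def combined_score(ratings: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-category ratings (higher = safer)."""
    missing = set(ratings) - set(weights)
    if missing:
        raise ValueError(f"no weight for categories: {sorted(missing)}")
    total_weight = sum(weights[c] for c in ratings)
    return sum(weights[c] * ratings[c] for c in ratings) / total_weight


# Adult content weighted three times as heavily as offensive language.
score = combined_score({"adult_content": 400.0, "offensive_language": 900.0},
                       {"adult_content": 3.0, "offensive_language": 1.0})
```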
- a multi-dimensional rating vector can be created that represents, for each site, the distribution of risk of adjacency to objectionable content along different dimensions: guns, bombs, and ammunition; alcohol; offensive language; hate speech; tobacco; spyware and malicious code; illegal drugs; adult content; gaming and gambling; entertainment; illegality; and/or obscenity.
- the rating application can determine a rating for a sitelet.
- a sitelet is a collection or subset of web pages and, more particularly, is often a topically homogeneous portion of a site, such as a topic-oriented subtree of a large site's hierarchical tree structure. For example, “finance.yahoo.com” can receive a rating as a sitelet of the website “yahoo.com.”
- the rating application can rate sitelets because there are web pages that the rating application has never seen before. However, that does not mean that the rating application has no evidence with which to rate such a page. Ratings exhibit substantial locality within sitelets, so a page from a risky site or sitelet is likely to be risky itself.
- the rating application can also rate sitelets for storage and computational efficiency, as it may not be necessary to store the scores for individual pages if they are not significantly different from the score for the sitelet. For example, if the ratings for the individual pages that make up the website www.foo.com are within a given threshold (e.g., a 5% difference), the rating application can store a single rating for the sitelet (the collection of those individual pages). It should also be noted that sitelet scores can provide additional evidence to the rating computation even when the page has been seen before.
- advertising on a website can be an indication of direct financial support of the website. Even if a particular page does not contain objectionable content or is determined to not likely contain objectionable content, an advertiser may not want to support a site that otherwise promotes objectionable categories of content.
- the rating application can provide an indication when a particular news item promotes or supports a major Vietnamese website.
- the rating application can provide an indication when a particular advertiser that supports or advertises on a particular website falls in an objectionable category.
- the rating application can detect whether the content falls within an objectionable category and whether advertisers, promoters, or other entities associated with the content fall within an objectionable category.
- FIG. 8 shows an illustrative example in which incoming URLs are matched to the sitelet with the longest available shared prefix. The aggregated ordinomials and associated rating of this longest prefix are then used for the query URL. Radix trees can, in some embodiments, be used to make this query computationally efficient.
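The longest-prefix lookup can be sketched with a trie keyed on URL segments. This is a plain segment-level trie rather than a compressed radix tree; a radix tree merges single-child chains but answers the same query. The segmenting scheme is an assumption made for illustration: reversed host labels followed by path segments, so that `finance.yahoo.com` extends the `yahoo.com` prefix. The class and function names are also illustrative.

```python
from typing import Dict, List, Optional
from urllib.parse import urlsplit


def url_segments(url: str) -> List[str]:
    """Reversed host labels, then path segments (com, yahoo, finance, ...)."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    labels = list(reversed((parts.hostname or "").split(".")))
    return labels + [seg for seg in parts.path.split("/") if seg]


class _Node:
    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.rating: Optional[float] = None


class SiteletIndex:
    """Segment-level trie for longest-shared-prefix lookup of sitelets."""

    def __init__(self) -> None:
        self.root = _Node()

    def insert(self, sitelet: str, rating: float) -> None:
        node = self.root
        for seg in url_segments(sitelet):
            node = node.children.setdefault(seg, _Node())
        node.rating = rating

    def lookup(self, url: str) -> Optional[float]:
        """Rating of the stored sitelet sharing the longest prefix with url."""
        node, best = self.root, self.root.rating
        for seg in url_segments(url):
            child = node.children.get(seg)
            if child is None:
                break
            node = child
            if node.rating is not None:
                best = node.rating
        return best
```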
- a rating for every URL or sub-string in the file tree implied by a domain's URLs need not be stored explicitly. If the rating for a page or sub-tree is not significantly different from that of its parent, then explicit storage offers little additional benefit at the cost of increased storage and computation. A child is therefore stored explicitly only when $|R(c) - R(p)| \ge \epsilon$, where:
- $\epsilon$ denotes a sensitivity parameter or threshold;
- $R(\cdot)$ denotes the rating for an entity;
- $c$ denotes the child page or subtree whose rating is under consideration; and
- $p$ denotes the parent of child page $c$.
- sitelet ratings can be generated from sitelet ordinomials.
- the sitelet ordinomials can be produced by an aggregation process over the pages in the sitelet.
- the sitelet ordinomial can be a weighted combination of the page ordinomials, a Bayesian combination, or generated using any suitable explicit mathematical function.
- FIG. 9 shows an illustrative example of calculating sitelet ordinomials in accordance with some embodiments of the disclosed subject matter.
- the pages in the sitelet can be considered as a large set, or the tree structure can be taken into account explicitly. In the latter case, the calculation can be done efficiently by recursion.
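- the base step is to calculate the rating at the root node. Then, at each step, the ratings for all the children are calculated, and for each child the inequality $|R(c) - R(p)| \ge \epsilon$ is evaluated to decide whether that child is stored explicitly as its own sitelet.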
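The recursive, top-down calculation can be sketched as follows. The sketch assumes that each node of the domain tree already carries a rating computed from its subtree. A node is kept as an explicit sitelet when it is the root, or when it satisfies $|R(c) - R(p)| \ge \epsilon$ against its parent. The nested-dictionary tree format and the function name are hypothetical.

```python
from typing import Dict, Optional, Tuple

Path = Tuple[str, ...]


def explicit_sitelets(node: dict, epsilon: float, path: Path = (),
                      parent_rating: Optional[float] = None,
                      out: Optional[Dict[Path, float]] = None) -> Dict[Path, float]:
    """Walk a rated tree from the root; keep a node only if it is the root
    or |R(c) - R(p)| >= epsilon with respect to its parent p.

    node = {"rating": float, "children": {segment: node, ...}}
    """
    if out is None:
        out = {}
    rating = node["rating"]
    if parent_rating is None or abs(rating - parent_rating) >= epsilon:
        out[path] = rating
    for segment, child in node.get("children", {}).items():
        explicit_sitelets(child, epsilon, path + (segment,), rating, out)
    return out
```

For example, a domain rated 700 with a "finance" subtree rated 900 and a "news" subtree rated 710 keeps only the root and "finance" when ε = 50.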
- sitelet ordinomials can be efficiently calculated using a map reduction process in accordance with some embodiments of the disclosed subject matter.
- the rating application can generate ratings using a single pass via MapReduce or any other suitable mapping approach.
- the reduction phase is performed using the domain as a key. Once the URLs belonging to a domain are assembled together, a file tree or domain tree can be generated, and the above-mentioned calculation of sitelet ordinomials can be used to find pertinent ratings in a domain.
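The reduction-phase grouping and tree building can be sketched in a few lines. The sketch assumes that the URL host name serves as the domain key and that a nested dictionary serves as the file tree. Both choices, and the function names, are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def group_by_domain(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Reduce-phase grouping: collect the URLs that share a host name."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).hostname or ""].append(url)
    return dict(groups)


def build_tree(urls: Iterable[str]) -> dict:
    """Nested-dict file tree built from one domain's URL path segments."""
    root: dict = {}
    for url in urls:
        node = root
        for segment in (s for s in urlsplit(url).path.split("/") if s):
            node = node.setdefault(segment, {})
    return root
```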
- FIG. 11 shows that sitelet ordinomials can be efficiently calculated using a map reduction process for settings with larger domains.
- the reduction via MapReduce can occur iteratively.
- let M denote the number of suffixes in the largest domain.
- for t from M down to 1, all ordinomials at level t are combined in accordance with the inequality $|R(c) - R(p)| \ge \epsilon$.
- if the inequality returns false, all children are stored for rating and sitelet computation at higher levels, which may lead to an unacceptable demand for memory and other resources.
- to limit this demand, children with the same or very similar ratings can be combined using explicit combination functions, for example, Bayesian combination or weighted averaging.
- those children whose ratings differ by at least $\epsilon$ are stored explicitly as their own sitelet rating. Each step reduces t by one ($t \leftarrow t - 1$). This is repeated until t equals 1, at which point the rating and sitelet are calculated and stored to ensure that every URL present in a domain receives some rating.
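A single-machine sketch of this bottom-up loop over one domain follows. It simplifies in two ways. First, it combines scalar ratings rather than full ordinomials, using a count-weighted average as the explicit combination function. Second, it interprets "very similar" as lying within ε of the parent's aggregate. Paths are tuples of segments below the domain, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def bottom_up_sitelets(pages: Dict[Path, float], epsilon: float) -> Dict[Path, float]:
    """Level-by-level (t = M down to 1) sitelet aggregation for one domain.

    pages maps a path (segments below the domain) to a page rating.
    Returns the ratings stored explicitly; the domain root () is always kept.
    """
    # (sum of ratings, number of pages) for every node holding data.
    agg: Dict[Path, Tuple[float, int]] = {p: (r, 1) for p, r in pages.items()}
    stored: Dict[Path, float] = {}
    M = max((len(p) for p in pages), default=0)
    for t in range(M, 0, -1):
        by_parent: Dict[Path, List[Path]] = defaultdict(list)
        for path in [p for p in agg if len(p) == t]:
            by_parent[path[:-1]].append(path)
        for parent, kids in by_parent.items():
            own_sum, own_n = agg.get(parent, (0.0, 0))
            total = own_sum + sum(agg[k][0] for k in kids)
            count = own_n + sum(agg[k][1] for k in kids)
            mean = total / count  # count-weighted average below parent
            for k in kids:
                k_sum, k_n = agg.pop(k)
                if abs(k_sum / k_n - mean) >= epsilon:
                    stored[k] = k_sum / k_n  # distinct: its own sitelet
            agg[parent] = (total, count)  # similar children merged upward
    root_sum, root_n = agg.get((), (0.0, 0))
    if root_n:
        stored[()] = root_sum / root_n
    return stored
```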
- the rating application can calculate ratings using both temporal aggregation and sitelet aggregation. Generally speaking, the rating application accomplishes this by performing the temporal aggregation on URLs as the first step of sitelet aggregation. For example, as shown in FIG. 12, the rating application can aggregate posterior ordinomials over all times (t). A reduction phase is then performed using the domain as a key, and, once the URLs belonging to a domain are assembled together, a file tree or domain tree can be generated. The expected ordinomial for each URL can then be calculated.
- mechanisms are provided for evaluating the quality of collections of online media and other suitable content. Because online media is often purchased by advertisers at different levels of granularity (e.g., ranging from individual pages to large sets of domains), it is desirable to develop metrics for comparing the quality of such diverse sets of content. More particularly, these mechanisms, among other things, collect individual content ratings, aggregate these ratings across arbitrary subsets, normalize these ratings to be on a general index scale, and calibrate the normalized ratings such that the global mean provides a benchmark for comparison.
- the application calculates several metrics for particular content (e.g., media, web pages, etc.). For example, in the case of objectionable content, a category can have metrics encapsulating the risk related to the appearance of adult content, metrics encapsulating the risk related to the appearance or use of hate speech, etc. Accordingly, in some embodiments, the application can provide a single metric encapsulating the different aspects of the content.
- $x_j$ refers to an individual example of a piece of online media or online content, such as a particular web page, video, or image.
- the multiple risk ratings can be combined into a single concise metric, $r(x_j)$, using, for example, a specialized combination function, h, such that $r(x_j) = h(r^{(1)}(x_j), \ldots, r^{(M)}(x_j))$.
- example combination functions include weighted averaging, where the weights are set to the importance of particular objectionable content categories, Bayesian mixing, a secondary combining model, and/or a simple minimum function that determines the most risky category in the case of a brand safety model.
- multiple combining functions can also be used and aggregated to create the single concise metric.
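The combining functions named above can be sketched as ordinary functions of the M per-category ratings. The sketch assumes the index-scaled convention, in which a lower rating means riskier content. `h_mixed` illustrates how two combining functions can be aggregated into one metric. Its `mix` parameter is a hypothetical knob that the text does not define.

```python
from typing import Sequence


def h_weighted(ratings: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average; weights encode the importance of each category."""
    return sum(w * r for w, r in zip(weights, ratings)) / sum(weights)


def h_min(ratings: Sequence[float]) -> float:
    """Brand-safety view: the riskiest category (lowest rating) dominates."""
    return min(ratings)


def h_mixed(ratings: Sequence[float], weights: Sequence[float],
            mix: float = 0.5) -> float:
    """Aggregate two combining functions into a single concise metric."""
    return mix * h_weighted(ratings, weights) + (1.0 - mix) * h_min(ratings)
```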
- the single concise metric can, for example, be used to compare diverse sets of content.
- the application can allow the advertiser to compare the content managed by two different advertising networks.
- r(•) can be ordinal, where $r(\cdot) \in \{V_0, \ldots, V_d\}$ such that, without loss of generality, $V_0 < V_1 < \cdots < V_d$.
- the ratings r(•) can also be real-valued, where $r(\cdot) \in \mathbb{R}$.
- r(x j ) can provide a measure that includes both the quality (or severity) of x j , and the confidence that x j deserves that level of quality. That is, the rating application can provide a rating r(•) that combines both the likelihood and the severity of content considered.
- online media is often packaged into arbitrary collections when being traded in the online advertising marketplace. Additionally, natural boundaries may exist, segregating a collection of content into distinct subsets. Given a rating defined on individual examples in this content space, r(•), it can be desirable to combine the ratings on individual pages into aggregate ratings denoting the expected rating of an entire subset of content.
- let X denote a collection of media: for example, the media holdings of an online publisher, a particular category of web pages (such as pages related to sports), or the pages offered by a supply-side advertising network, including any subsets thereof.
- the rating application can aggregate the ratings of content in this collection, $x \in X$.
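- for ordinal ratings, the aggregate rating can be defined as $r_{agg} = \arg\max_{V} \sum_{x \in X} \delta(r(x) = V)$, where $\delta(\cdot)$ is an indicator function that takes the value 1 when its operand is true and 0 otherwise. This corresponds to the most common ordinal value in the collection. It should also be noted that ties may be broken arbitrarily, for example, by choosing the most severe category in the tie, for safety.
- for real-valued ratings, the aggregate rating can be defined as the mean, $r_{agg} = \frac{1}{|X|} \sum_{x \in X} r(x)$.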
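Both aggregation rules take only a few lines of code. The sketch assumes that ordinal values are encoded as integers k for $V_k$, with $V_0$ the most severe. Under that encoding, breaking ties toward the most severe category means taking the smallest tied value.

```python
from collections import Counter
from typing import Sequence


def aggregate_ordinal(ratings: Sequence[int]) -> int:
    """Most common ordinal value; ties go to the most severe (lowest) value."""
    counts = Counter(ratings)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)


def aggregate_real(ratings: Sequence[float]) -> float:
    """Mean rating: r_agg = (1/|X|) * sum of r(x) over x in X."""
    return sum(ratings) / len(ratings)
```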
- when aggregating content ratings, the rating application considers that content may be presented in a pre-aggregated form.
- the input may be domains, each with an aggregate rating.
- let $Y_l$ be a collection of one or more examples of content, such that $x \in Y_l$. Let X then be extended to be a collection of such collections, $Y_l \in X$. Rating aggregation can then be extended to such sub-aggregations of content.
- for ordinal ratings: $r_{agg} = \arg\max_{V} \sum_{Y_l \in X} \delta(r_{agg}(Y_l) = V)$,
- for real-valued ratings: $r_{agg} = \frac{1}{|X|} \sum_{Y_l \in X} |Y_l| \, r_{agg}(Y_l)$,
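For real-valued ratings, the pre-aggregated rule is a size-weighted mean. The sketch reads $|X|$ as the total number of underlying examples, $\sum_l |Y_l|$, so that the result equals the mean over all individual examples. That reading is an assumption, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def aggregate_preaggregated(subs: Sequence[Tuple[int, float]]) -> float:
    """Combine pre-aggregated ratings, weighting each r_agg(Y_l) by |Y_l|.

    subs holds (|Y_l|, r_agg(Y_l)) pairs.
    """
    total = sum(size for size, _ in subs)
    return sum(size * rating for size, rating in subs) / total
```

Consider a domain with 10 pages rated 800 and another with 30 pages rated 400. The size-weighted result is 500, rather than the unweighted 600.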
- the rating application takes unconstrained, real-valued ratings and projects them onto a bounded region of the number line for ease of comparison.
- This mapping to the number value assigned to each ordinal category can be constructed to capture the risk and severity profiles of content in each respective category.
- the rating application can be configured to define an index-scaled rating to be a numerical rating assigned to online media constrained to a fixed range, $r_i(x) \in [L, U]$, with lower bound L and upper bound U.
- this rating is assumed to capture both the severity and the risk of appearance of content on online media. Here $r_i(x_j) \ge r_i(x_k)$ implies that $x_k$ is at least as risky as $x_j$, meaning there is at least as great a chance of riskier content appearing on $x_k$ as on $x_j$.
- this implies that $x_j$ is likely to be safer for brand advertisers or other online media buyers.
- given an index-scaled rating, $r_i(x)$, on a particular example x, the rating application defines a mapping from an unscaled rating to a scaled rating $r_i(x)$, for both ordinal and real-valued ratings.
- for ordinal ratings, where $r(\cdot) \in \{V_0, \ldots, V_d\}$, the mapping to an index-scaled rating is performed by assigning a constant, $a_V$, to each ordinal non-index-scaled rating V, so that the index-scaled rating can be expressed as $r_i(x) = a_{r(x)}$.
- each constant is bounded by the index-scaled rating range, $a_V \in [L, U]$, and, without loss of generality, $a_{V_m} \le a_{V_n}$ whenever $V_m \le V_n$.
- that is, more risky ordinal categories have lower numerical values in the mapping.
- for real-valued ratings, the index-scaled rating can be expressed as $r_i(x_j) = f(r(x_j))$,
- where f(•) is a monotonic function, i.e., $f(r(x_j)) \le f(r(x_k))$ whenever $r(x_j) \le r(x_k)$. That is, lower unscaled ratings tend to get lower scaled ratings. Additionally, the range of f(•) is the index-scaled range $[L, U]$.
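Both mappings can be sketched as follows. For illustration, the ordinal table reuses the bucket values 50, 100, 150, and 200 from the bucket example in this section. The logistic function is one assumed choice of monotonic f; any non-decreasing function onto the index range would do. The `center` and `scale` parameters are hypothetical.

```python
import math
from typing import Mapping


def index_scale_ordinal(value: int, constants: Mapping[int, float]) -> float:
    """Ordinal case: r_i(x) = a_{r(x)}, looked up in a table of constants."""
    return constants[value]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def index_scale_real(r: float, low: float, high: float,
                     center: float = 0.0, scale: float = 1.0) -> float:
    """Real-valued case: r_i(x) = f(r(x)), f a logistic map into [low, high]."""
    return low + (high - low) * _sigmoid((r - center) / scale)


A = {0: 50.0, 1: 100.0, 2: 150.0, 3: 200.0}  # illustrative a_V values
```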
- the rating application can transform arbitrary raw ratings into an index-scaled rating.
- a numerical rating can encode the likelihood of encountering risky or inappropriate content on a given example of online media, in addition to the likely severity of such content.
- the resulting index-scaled rating represents the value of online content to buyers and advertisers, with risky and severely inappropriate content generally being of low value.
- the rating application can be configured to aggregate ratings for collected content, $x \in X$, such that riskier individual pages have a commensurate impact. This can be represented as follows:
- $r_{i,agg} = \dfrac{\sum_{x \in X} r_i(x)}{\sum_{x \in X} w(r_i(x))}$
- where $w(\cdot) \in [1, \infty)$ represents a weight function associated with a content rating. More particularly, riskier content both receives a lower numerical rating and contributes a higher weight, thereby lowering the expected score via a larger denominator.
- for example, the rating application can create four risk buckets (very high risk, high risk, moderate risk, and low risk), each covering a range of index-scaled ratings. For a given aggregation of content, the rating application also denotes the number of examples in each bucket by $r_1$, $r_2$, $r_3$, and $r_4$, respectively.
- the rating application can also assign a native index-scaled rating to each bucket.
- the rating application can assign 50, 100, 150, and 200 to each bucket, respectively.
- the rating application can provide combination weights for each category.
- for example, the application can assign the combination weights 35.2, 8.8, 2.2, and 1.0 to the buckets, respectively. Accordingly, a severity-weighted aggregation of such content can be determined by calculating:
- $r_{i,agg} = \dfrac{50 r_1 + 100 r_2 + 150 r_3 + 200 r_4}{35.2 r_1 + 8.8 r_2 + 2.2 r_3 + 1.0 r_4}$
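The worked formula translates directly into code, using the bucket ratings and weights given above. Only the function name is hypothetical.

```python
from typing import Sequence

# Native index-scaled rating and combination weight per bucket, from the
# example above: very high, high, moderate, and low risk.
BUCKET_RATINGS = (50.0, 100.0, 150.0, 200.0)
BUCKET_WEIGHTS = (35.2, 8.8, 2.2, 1.0)


def severity_weighted_aggregate(counts: Sequence[int],
                                ratings: Sequence[float] = BUCKET_RATINGS,
                                weights: Sequence[float] = BUCKET_WEIGHTS) -> float:
    """r_{i,agg} = sum_k a_k * r_k / sum_k w_k * r_k, r_k = count in bucket k."""
    numerator = sum(n * a for n, a in zip(counts, ratings))
    denominator = sum(n * w for n, w in zip(counts, weights))
    if denominator == 0:
        raise ValueError("no content to aggregate")
    return numerator / denominator
```

Ten low-risk pages aggregate to 200.0. Now take 100 pages in which a single page is very high risk and the rest are low risk, i.e., counts (1, 0, 0, 99). The aggregate drops to about 147.9, which shows how the weights let a small amount of severe content pull the score down.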
- the rating application not only considers how content rates with respect to risk and severity, but also determines how that content compares to other similar content. To perform such a comparison, the rating application recalibrates ratings relative to the mean rating of the content being considered.
- the mean, $\mu_r$, of the uncalibrated set of ratings can be determined by calculating:
- $\mu_r = \frac{1}{|X|} \sum_{x \in X} r_i(x)$
- let $\gamma$ denote the value to which the mean is mapped after calibration, and let $Y_j$ denote a subset of content in X.
- the rating application then defines a calibration of $Y_j$'s rating, $r_c$, relative to $\mu_r$ using the following cases:
- the re-calibration can be performed by determining:
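One hypothetical form for these cases is a piecewise-linear map, with one case on each side of the mean. It sends $\mu_r$ to γ, keeps the ends of the index range fixed, and preserves the ordering of ratings when γ lies inside that range. This form is an assumption for illustration, not the stated calibration, and the function names are hypothetical.

```python
from typing import Sequence


def mean_rating(ratings: Sequence[float]) -> float:
    """mu_r: mean of the uncalibrated index-scaled ratings."""
    return sum(ratings) / len(ratings)


def calibrate(r: float, mu: float, gamma: float, low: float, high: float) -> float:
    """Assumed piecewise-linear calibration: [low, mu] -> [low, gamma] and
    [mu, high] -> [gamma, high], so the mean maps to gamma."""
    if not low < mu < high:
        raise ValueError("need low < mu < high")
    if r <= mu:
        return low + (r - low) * (gamma - low) / (mu - low)
    return gamma + (r - mu) * (high - gamma) / (high - mu)
```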
- FIG. 13 is a generalized schematic diagram of a system 1300 on which the rating application may be implemented in accordance with some embodiments of the disclosed subject matter.
- system 1300 may include one or more user computers 1302 .
- User computers 1302 may be local to each other or remote from each other.
- User computers 1302 are connected by one or more communications links 1304 to a communications network 1306 that is linked via a communications link 1308 to a server 1310 .
- System 1300 may include one or more servers 1310 .
- Server 1310 may be any suitable server for providing access to the application, such as a processor, a computer, a data processing device, or a combination of such devices.
- the application can be distributed into multiple backend components and multiple frontend components or interfaces.
- backend components such as data collection and data distribution can be performed on one or more servers 1310 .
- the graphical user interfaces displayed by the application such as a data interface and an advertising network interface, can be distributed by one or more servers 1310 to user computer 1302 .
- each of the client 1302 and server 1310 can be any of a general purpose device such as a computer or a special purpose device such as a client, a server, etc.
- Any of these general or special purpose devices can include any suitable components such as a processor (which can be a microprocessor, digital signal processor, a controller, etc.), memory, communication interfaces, display controllers, input devices, etc.
- client 1302 can be implemented as a personal computer, a personal digital assistant (PDA), a portable email device, a multimedia terminal, a mobile telephone, a set-top box, a television, etc.
- any suitable computer readable media can be used for storing instructions for performing the processes described herein, can be used as a content distribution that stores content and a payload, etc.
- computer readable media can be transitory or non-transitory.
- non-transitory computer readable media can include media such as magnetic media (such as hard disks, floppy disks, etc.), optical media (such as compact discs, digital video discs, Blu-ray discs, etc.), semiconductor media (such as flash memory, electrically programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), etc.), any suitable media that is not fleeting or devoid of any semblance of permanence during transmission, and/or any suitable tangible media.
- transitory computer readable media can include signals on networks, in wires, conductors, optical fibers, circuits, any suitable media that is fleeting and devoid of any semblance of permanence during transmission, and/or any suitable intangible media.
- communications network 1306 may be any suitable computer network including the Internet, an intranet, a wide-area network (“WAN”), a local-area network (“LAN”), a wireless network, a digital subscriber line (“DSL”) network, a frame relay network, an asynchronous transfer mode (“ATM”) network, a virtual private network (“VPN”), or any combination of any of such networks.
- Communications links 1304 and 1308 may be any communications links suitable for communicating data between user computers 1302 and server 1310 , such as network links, dial-up links, wireless links, hard-wired links, any other suitable communications links, or a combination of such links.
- User computers 1302 enable a user to access features of the application.
- User computers 1302 may be personal computers, laptop computers, mainframe computers, dumb terminals, data displays, Internet browsers, personal digital assistants (“PDAs”), two-way pagers, wireless terminals, portable telephones, any other suitable access device, or any combination of such devices.
- User computers 1302 and server 1310 may be located at any suitable location. In one embodiment, user computers 1302 and server 1310 may be located within an organization. Alternatively, user computers 1302 and server 1310 may be distributed between multiple organizations.
- user computer 1302 may include processor 1402 , display 1404 , input device 1406 , and memory 1408 , which may be interconnected.
- memory 1408 contains a storage device for storing a computer program for controlling processor 1402 .
- Processor 1402 uses the computer program to present on display 1404 the application and the data received through communications link 1304 and commands and values transmitted by a user of user computer 1302 . It should also be noted that data received through communications link 1304 or any other communications links may be received from any suitable source.
- Input device 1406 may be a computer keyboard, a cursor-controller, dial, switchbank, lever, or any other suitable input device as would be used by a designer of input systems or process control systems.
- Server 1310 may include processor 1420 , display 1422 , input device 1424 , and memory 1426 , which may be interconnected.
- memory 1426 contains a storage device for storing data received through communications link 1308 or through other links, and also receives commands and values transmitted by one or more users.
- the storage device further contains a server program for controlling processor 1420 .
- the application may include an application program interface (not shown), or alternatively, the application may be resident in the memory of user computer 1302 or server 1310 .
- the only distribution to user computer 1302 may be a graphical user interface (“GUI”) which allows a user to interact with the application resident at, for example, server 1310 .
- the application may include client-side software, hardware, or both.
- the application may encompass one or more Web-pages or Web-page portions (e.g., via any suitable encoding, such as HyperText Markup Language (“HTML”), Dynamic HyperText Markup Language (“DHTML”), Extensible Markup Language (“XML”), JavaServer Pages (“JSP”), Active Server Pages (“ASP”), Cold Fusion, or any other suitable approaches).
- although the application is described herein as being implemented on a user computer and/or server, this is only illustrative.
- the application may be implemented on any suitable platform (e.g., a personal computer (“PC”), a mainframe computer, a dumb terminal, a data display, a two-way pager, a wireless terminal, a portable telephone, a portable computer, a palmtop computer, an H/PC, an automobile PC, a laptop computer, a cellular phone, a personal digital assistant (“PDA”), a combined cellular phone and PDA, etc.) to provide such features.
- a procedure is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. These steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared and otherwise manipulated. It proves convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. It should be noted, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities.
- the manipulations performed are often referred to in terms, such as adding or comparing, which are commonly associated with mental operations performed by a human operator. No such capability of a human operator is necessary, or desirable in most cases, in any of the operations described herein which form part of the present invention; the operations are machine operations.
- Useful machines for performing the operation of the present invention include general purpose digital computers or similar devices.
- the present invention also relates to apparatus for performing these operations.
- This apparatus may be specially constructed for the required purpose or it may comprise a general purpose computer as selectively activated or reconfigured by a computer program stored in the computer.
- the procedures presented herein are not inherently related to a particular computer or other apparatus.
- Various general purpose machines may be used with programs written in accordance with the teachings herein, or it may prove more convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these machines will appear from the description given.
Abstract
Description
- This application is a continuation of U.S. patent application Ser. No. 13/151,146, filed Jun. 1, 2011, which claims the benefit of U.S. Provisional Patent Application No. 61/350,393, filed Jun. 1, 2010 and U.S. Provisional Patent Application No. 61/431,789, filed Jan. 11, 2011, which are hereby incorporated by reference herein in their entireties.
- This application is also related to U.S. patent application Ser. No. 12/859,763, filed Aug. 19, 2010, which is hereby incorporated by reference herein in its entirety.
- The disclosed subject matter generally relates to methods, systems, and media for applying scores and ratings to web pages, web sites, and other pieces of content of interest to advertisers or content providers for safe and effective online advertising.
- Brands are carefully crafted and incorporate a firm's image as well as a promise to the firm's stakeholders. Unfortunately, in the current online environment, advertising networks may juxtapose advertisements that represent such brands with undesirable content due to the opacity of the ad-placement process and possibly to a misalignment of incentives in the ad-serving ecosystem. Currently, neither the ad network nor the brand can efficiently recognize whether a website contains or has a tendency to contain questionable content.
- Online advertisers use tools that provide information about websites or publishers and the viewers of such websites to facilitate more effective planning and management of online advertising by advertisers. Moreover, online advertisers continually desire increased control over the web pages on which their advertisements and brand messages appear. For example, particular online advertisers want to control the risk that their advertisements and brand messages appear on pages or sites that contain objectionable content (e.g., pornography or adult content, hate speech, bombs, guns, ammunition, alcohol, offensive language, tobacco, spyware, malicious code, illegal drugs, music downloading, particular types of entertainment, illegality, obscenity, etc.). In another example, advertisers for adult-oriented products, such as alcohol and tobacco, want to avoid pages directed towards children. In yet another example, particular online advertisers want to increase the probability that their content appears on specific sorts of sites (e.g., websites containing news-related information, websites containing entertainment-related information, etc.). However, current advertising tools merely categorize websites into categories indicating that a web site contains a certain sort of content.
- There is therefore a need in the art for approaches for applying scores and ratings to web pages, web sites, and content for safe and effective online advertising. Accordingly, it is desirable to provide methods, systems, and media that overcome these and other deficiencies of the prior art.
- For example, the disclosed subject matter provides advertisers, agencies, advertisement networks, advertisement exchanges, and publishers with the ability to make risk-controlled decisions based on the category-specific risk and/or general risk associated with a given web page, website, etc. In a more particular example, advertisers, agencies, advertisement networks, advertisement exchanges, and publishers can determine whether to place a particular advertisement on a particular web page based on a high confidence that the page does not contain objectionable content. In another more particular example, advertisers, agencies, advertisement networks, advertisement exchanges, and publishers can request to view a list of pages in their current advertisement network traffic assessed to have the highest risk of objectionable content.
- In accordance with various embodiments of the disclosed subject matter, mechanisms for scoring and rating web pages, web sites, and other pieces of content of interest to advertisers or content providers for safe and effective online advertising are provided.
- These mechanisms, among other things, generate a risk rating that accounts for the inclusion of objectionable content with the use of ordinomials. The risk rating can, in some embodiments, represent the probability that a page or a site contains or will contain objectionable content, the degree of objectionability of the content, and/or any suitable combination thereof. In a more particular example, the risk rating can be determined for a single domain and/or a single category such that a particular piece of media or content can have a rating for each of a number of objectionable content categories. Alternatively, in another more particular example, the risk rating can be determined across several objectionable content categories, across multiple pieces of content (e.g., the pages appearing in the advertiser's traffic), and/or across multiple domains managed by a publisher.
- In some embodiments, the risk rating can be generated using multiple statistical models and multiple pieces of evidence. In some embodiments, these mechanisms can account for temporal dynamics in content by determining a risk rating that is based on the probability of encountering different severity levels from a given URL and on the types of estimated severity exhibited in the past.
- In some embodiments, these mechanisms can evaluate the quality of collections of content. More particularly, these mechanisms can collect individual content ratings (e.g., ordinal ratings and/or real-valued ratings), aggregate these ratings across arbitrary subsets, normalize these ordinal and real-valued ratings onto a general index scale, and calibrate and/or map the normalized ratings using a global mean to provide a benchmark for comparison. This mapping can capture the risk and/or severity profiles of appearance of content.
- Systems, methods, and media for rating websites for safe advertising are provided. In accordance with some embodiments of the disclosed subject matter, the method comprises: extracting one or more features from a piece of web content; applying a plurality of statistical models to the extracted features to generate a plurality of ordinomial estimates, wherein each ordinomial estimate represents a probability that the web content is a member of one of a plurality of severity groups; determining a posterior ordinomial estimate for the web content by combining the plurality of ordinomial estimates; generating a risk rating that encodes severity and confidence based on the determined posterior ordinomial estimate, wherein the risk rating identifies whether the web content is likely to contain objectionable content of a given category; and providing the risk rating for determining whether an advertisement should be associated with the web content.
- In some embodiments, the method further comprises: determining a plurality of posterior ordinomial estimates at a plurality of times for the web content; and determining an expected posterior ordinomial estimate by combining the plurality of posterior ordinomial estimates over the plurality of times.
- In some embodiments, the method further comprises: extracting a uniform resource locator from the one or more features; assembling a first set of posterior ordinomial estimates from the plurality of posterior ordinomial estimates based on the uniform resource locator; and determining the expected posterior ordinomial estimate by combining the first set of posterior ordinomial estimates over the plurality of times.
- In some embodiments, the method further comprises: determining that the web content belongs to a sitelet, wherein the sitelet includes a plurality of web pages; determining a sitelet ordinomial by aggregating the plurality of posterior ordinomial estimates associated with each of the plurality of web pages; and generating a sitelet rating based on the aggregated plurality of posterior ordinomials.
- In some embodiments, the method further comprises: comparing the sitelet ordinomial with the plurality of posterior ordinomial estimates associated with each of the plurality of web pages belonging to the sitelet; and determining whether to store at least one of the sitelet ordinomial and the plurality of posterior ordinomial estimates based on the comparison and a sensitivity value.
- In some embodiments, the method further comprises: collecting a plurality of ratings associated with a plurality of pieces of web content, wherein the plurality of ratings includes ordinal ratings and real-valued ratings; and determining an aggregate rating for the plurality of pieces of web content based on the collected plurality of ratings.
- In some embodiments, the method further comprises normalizing the aggregate rating by mapping the aggregate rating to an index-scaled rating.
- In some embodiments, the method further comprises: applying a severity weight to the index-scaled rating; and generating a severity-weighted index-scaled rating for the plurality of pieces of web content.
- In some embodiments, the method further comprises generating a combined risk rating by combining the generated risk rating that encodes whether the web content is likely to contain objectionable content of the given category with a second risk rating that encodes whether the web content is likely to contain objectionable content of a second category.
- In some embodiments, a system for rating webpages for safe advertising is provided, the system comprising a processor that: extracts one or more features from a piece of web content; applies a plurality of statistical models to the extracted features to generate a plurality of ordinomial estimates, wherein each ordinomial estimate represents a probability that the web content is a member of one of a plurality of severity groups; determines a posterior ordinomial estimate for the web content by combining the plurality of ordinomial estimates; generates a risk rating that encodes severity and confidence based on the determined posterior ordinomial estimate, wherein the risk rating identifies whether the web content is likely to contain objectionable content of a given category; and provides the risk rating for determining whether an advertisement should be associated with the web content.
- In some embodiments, a non-transitory computer-readable medium containing computer-executable instructions that, when executed by a processor, cause the processor to perform a method for rating webpages for safe advertising, the method comprising: extracting one or more features from a piece of web content; applying a plurality of statistical models to the extracted features to generate a plurality of ordinomial estimates, wherein each ordinomial estimate represents a probability that the web content is a member of one of a plurality of severity groups; determining a posterior ordinomial estimate for the web content by combining the plurality of ordinomial estimates; generating a risk rating that encodes severity and confidence based on the determined posterior ordinomial estimate, wherein the risk rating identifies whether the web content is likely to contain objectionable content of a given category; and providing the risk rating for determining whether an advertisement should be associated with the web content.
- Various objects, features, and advantages of the disclosed subject matter can be more fully appreciated with reference to the following detailed description of the invention when considered in connection with the following drawings, in which like reference numerals identify like elements.
- FIG. 1 is a diagram of an illustrative example of a process for determining the probability of membership in a severity group for a category of objectionable content in accordance with some embodiments of the disclosed subject matter.
- FIG. 2 is a diagram of an illustrative example of combining ordinomial estimates into a posterior ordinomial estimate in accordance with some embodiments of the disclosed subject matter.
- FIG. 3 is an illustrative example of temporal aggregation of posterior ordinomials in accordance with some embodiments of the disclosed subject matter.
- FIG. 4 is an illustrative example of the MapReduce approach for determining the temporal aggregation of posterior ordinomials in accordance with some embodiments of the disclosed subject matter.
- FIG. 5 is a diagram of an illustrative example of a process for generating one or more ratings for a webpage in accordance with some embodiments of the disclosed subject matter.
- FIG. 6 is a diagram of a graph showing the selection of an appropriate bin (b_i) in an ordinomial given a confidence parameter (β) in accordance with some embodiments of the disclosed subject matter.
- FIG. 7 is a diagram of an illustrative rating scale in accordance with some embodiments of the disclosed subject matter.
- FIG. 8 is an illustrative example showing that incoming URLs can be matched to the sitelet with the longest available shared prefix in accordance with some embodiments of the disclosed subject matter.
- FIG. 9 is an illustrative example of calculating sitelet ordinomials in accordance with some embodiments of the disclosed subject matter.
- FIG. 10 is an illustrative example of calculating sitelet ordinomials and sitelet ratings in settings with small domains in accordance with some embodiments of the disclosed subject matter.
- FIG. 11 is an illustrative example of calculating sitelet ordinomials and sitelet ratings in settings with larger domains in accordance with some embodiments of the disclosed subject matter.
- FIG. 12 is an illustrative example of using the rating application to calculate ratings using both temporal aggregation and sitelet aggregation in accordance with some embodiments of the disclosed subject matter.
- FIG. 13 is a diagram of an illustrative system on which a rating application can be implemented in accordance with some embodiments of the disclosed subject matter.
- FIG. 14 is a diagram of an illustrative user computer and server as provided, for example, in FIG. 13 in accordance with some embodiments of the disclosed subject matter.
- In accordance with some embodiments of the disclosed subject matter, mechanisms for scoring and rating web pages, web sites, and other pieces of content of interest to advertisers or content providers for safe and effective online advertising are provided. These mechanisms, among other things, generate a risk rating that accounts for the inclusion of objectionable content with the use of ordinomials. The risk rating can, in some embodiments, represent the probability that a page or a site contains or will contain objectionable content, the degree of objectionability of the content, and/or any suitable combination thereof. In a more particular example, the risk rating can be determined for a single domain and/or a single category such that a particular piece of media or content can have a rating for each of a number of objectionable content categories. Alternatively, in another more particular example, the risk rating can be determined across several objectionable content categories, across multiple pieces of content (e.g., the pages appearing in the advertiser's traffic), and/or across multiple domains managed by a publisher.
- In some embodiments, the risk rating can be generated using multiple statistical models and multiple pieces of evidence. In some embodiments, these mechanisms can account for temporal dynamics in content by determining a risk rating that is based on the probability of encountering different severity levels from a given URL and on the types of estimated severity exhibited in the past.
- These mechanisms can be used in a variety of applications. For example, these mechanisms can provide a rating application that allows advertisers, ad networks, publishers, site managers, and/or other entities to make risk-controlled decisions based at least in part on risk associated with a given webpage, website, or any other suitable content (generally referred to herein as a “webpage” or “page”). In another example, these mechanisms can provide a rating application that allows advertisers, agencies, advertisement networks, advertisement exchanges, and/or publishers to determine whether to place a particular advertisement on a particular web page based on a high confidence that the page does not contain objectionable content. In a more particular example, these mechanisms allow an advertiser to designate that an advertisement should not be placed on a web page unless a particular confidence (e.g., high confidence, medium-high confidence, etc.) is achieved. In such an example, the particular confidence can be the confidence that the page does not have a severity greater than a particular severity group in a particular category. In another example, advertisers, agencies, advertisement networks, advertisement exchanges, and publishers can request to view a list of pages in their current advertisement network traffic assessed to have the highest risk of objectionable content.
- It should be noted that there can be several categories of objectionable content that may be of interest. For example, these categories can include content that relates to guns, bombs, and/or ammunition (e.g., sites that describe or provide information on weapons including guns, rifles, bombs, and ammunition, sites that display and/or discuss how to obtain weapons, manufacture of weapons, trading of weapons (whether legal or illegal), sites that describe or offer for sale weapons including guns, ammunition, and/or firearm accessories, etc.). In another example, these categories can include content relating to alcohol (e.g., sites that provide information relating to alcohol, sites that provide recipes for mixing drinks, sites that provide reviews and locations for bars, etc.), drugs (e.g., sites that provide instructions for or information about obtaining, manufacturing, or using illegal drugs), and/or tobacco (e.g., sites that provide information relating to smoking, cigarettes, chewing tobacco, pipes, etc.). In yet another example, these categories can include offensive language (e.g., sites that contain swear words, profanity, hard language, inappropriate phrases and/or expressions), hate speech (e.g., sites that advocate hostility or aggression towards individuals or groups on the basis of race, religion, gender, nationality, or ethnic origin, sites that denigrate others or justify inequality, sites that purport to use scientific or other approaches to justify aggression, hostility, or denigration), and/or obscenities (e.g., sites that display graphic violence, the infliction of pain, gross violence, and/or other types of excessive violence). In another example, these categories can include adult content (e.g., sites that contain nudity, sex, use of sexual language, sexual references, sexual images, and/or sexual themes).
In another example, these categories can include spyware or malicious code (e.g., sites that provide instructions to practice illegal or unauthorized acts of computer crime using technology or computer programming skills, sites that contain malicious code, etc.) or other illegal content (e.g., sites that provide instructions for threatening or violating the security of property or the privacy of others, such as theft-related sites, lock-picking and burglary-related sites, fraud-related sites).
- It should be noted that objectionable content on one or more of these webpages can generally be defined as having a severity level worse than (or greater than) b_j in a category y. Each category (y) can include various severity groups b_j, where j ranges from 1 to n and n is an integer greater than one. For example, an adult content category can have various severity levels, such as G, PG, PG-13, R, NC-17, and X. In another example, an adult content category and an offensive speech category can be combined to form one category of interest. In yet another example, unlike the adult content category example, a category may not have fine-grained severity groups, in which case a binomial distribution can be used. For example, a binomial probability can be used for binary outcome events, where there is typically one positive event (e.g., good, yes, etc.) and one negative event (e.g., bad, no, etc.).
- FIG. 1 is a diagram showing an example of a process for determining the probability of membership in a severity group for one or more categories of objectionable content in accordance with some embodiments of the disclosed subject matter. As shown in FIG. 1, process 100 begins by receiving or reviewing content on a webpage, website, or any other suitable content (generally referred to herein as a “webpage” or “page”) at 110. For example, in some embodiments, a rating application can receive multiple requests to rate a group of webpages or websites. In another example, a rating application can receive, from an advertiser, a list of websites on which the advertiser is interested in placing an advertisement, provided that each of these websites does not contain, or does not have a high likelihood of containing, objectionable content. In yet another example, a rating application can receive, from an advertiser, that advertiser's current advertisement network traffic for assessment.
- In response to receiving one or more webpages, the rating application or a component of the rating application selects a uniform resource locator (URL) for rating at 120. For example, the rating application can receive one or more requests from other components and select among them (e.g., the most popular requests are assigned a higher priority, particular components of the rating application are assigned a higher priority, or requests are selected at random). In another example, a fixed, prioritized list of URLs can be defined based, for example, on ad traffic or any other suitable input (e.g., use of the rating for scoring, use of the rating for active learning, etc.).
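The prioritized selection at 120 can be sketched with a heap. This is a minimal illustration, assuming requests arrive as hypothetical `(url, priority)` pairs where the priority stands in for ad traffic or request popularity; the function name and input format are not from the source.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def select_urls_for_rating(requests: Iterable[Tuple[str, int]], k: int) -> List[str]:
    """Return up to k distinct URLs, highest priority first.

    `requests` is a hypothetical stream of (url, priority) pairs; priority
    could be ad-traffic volume or request popularity.
    """
    best: Dict[str, float] = {}
    for url, priority in requests:
        # Keep the highest priority seen for each URL (duplicate requests collapse).
        if priority > best.get(url, float("-inf")):
            best[url] = priority
    # nlargest behaves like sorted(..., reverse=True)[:k]; ties keep first-seen order.
    top = heapq.nlargest(k, best.items(), key=lambda item: item[1])
    return [url for url, _ in top]
```

For example, two requests for `a.com/x` with priorities 5 and 20 collapse into one entry with priority 20, which then outranks a `b.com` request with priority 10.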
- One or more pieces of evidence can be extracted from the uniform resource locator or page at 130. These pieces of evidence can include, for example, the text of the URL, image analysis, HyperText Markup Language (HTML) source code, site or domain registration information, ratings, categories, and/or labeling from partner or third party analysis systems (e.g., site content categories), source information of the images on the page, page text or any other suitable semantic analysis of the page content, metadata associated with the page, anchor text on other pages that point to the page of interest, ad network links and advertiser information taken from a page, hyperlink information, malicious code and spyware databases, site traffic volume data, micro-outsourced data, any suitable auxiliary derived information (e.g., ad-to-content ratio), and/or any suitable combination thereof. As described herein, evidence and/or any other suitable information relating to the page can be collected, extracted, and/or derived using one or more evidentiary sources.
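Of the evidence sources listed above, the text of the URL is the simplest to extract. The sketch below covers only that one source and assumes a simple tokenization (split on non-alphanumeric runs); the feature names are hypothetical and are not the disclosed feature set.

```python
import re
from urllib.parse import urlsplit


def url_text_features(url: str) -> dict:
    """Derive simple text features from a URL (one of many possible evidence sources)."""
    parts = urlsplit(url)
    # Lower-case host, path, and query, then split on any run of non-alphanumerics.
    text = " ".join([parts.netloc, parts.path, parts.query]).lower()
    tokens = [t for t in re.split(r"[^a-z0-9]+", text) if t]
    return {
        "host": parts.hostname or "",  # urlsplit already lower-cases the hostname
        "tokens": tokens,  # e.g. input to a downstream text classifier
        "path_depth": len([s for s in parts.path.split("/") if s]),
    }
```

Tokens such as `guns` or `ammo` could then be fed, together with other evidence, to the statistical models that estimate ordinomials.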
- Approaches for collecting and analyzing various pieces of evidence for generating a risk rating are further described in, for example, above-referenced U.S. patent application Ser. No. 12/859,763, filed Aug. 19, 2010, which is hereby incorporated by reference herein in its entirety.
- To encode the probability of membership in severity group b_j, an ordinomial can be generated at 140. For example, a multi-severity classification can be determined by using an ordinomial to encode the probability of membership in an ordered set of one or more severity groups. The ordinomial can be represented as follows:
$$\forall j \in \{1, \ldots, J\}:\quad p(y = b_j \mid x)$$
- where y is a variable representing the severity class that page x belongs to. It should be noted that the ordinal nature implies that b_i is less severe than b_j when i < j. It should also be noted that ordinomial probabilities can be estimated using any suitable statistical models, such as the ones described herein, and using the evidence derived from the pages.
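An ordinomial is, concretely, a probability vector whose index order carries the severity order. A minimal sketch of such a container, assuming the labels are given from least to most severe; the class name and validation rules are illustrative, not part of the disclosure.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Ordinomial:
    """Probabilities p(y = b_j | x) over ordered severity groups.

    labels[0] is the least severe group b_1 and labels[-1] the most severe b_J,
    so index order encodes the ordinal relation b_i < b_j for i < j.
    """

    labels: Tuple[str, ...]
    probs: Tuple[float, ...]

    def __post_init__(self) -> None:
        if len(self.labels) != len(self.probs) or len(self.labels) < 2:
            raise ValueError("need one probability per severity group (at least two groups)")
        if any(p < 0.0 for p in self.probs):
            raise ValueError("probabilities must be non-negative")
        if not math.isclose(sum(self.probs), 1.0, abs_tol=1e-9):
            raise ValueError("probabilities must sum to 1")

    def prob(self, label: str) -> float:
        """Return p(y = label | x)."""
        return self.probs[self.labels.index(label)]
```

For the adult content category, `Ordinomial(("G", "PG", "PG-13", "R", "NC-17", "X"), (0.5, 0.2, 0.1, 0.1, 0.05, 0.05))` is a valid instance.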
- At 150, an ordinomial distribution that includes each generated ordinomial for one or more severity groups can be generated. Accordingly, the cumulative ordinal distribution F can be described as:
$$F(y = b_j \mid x) = \sum_{i=1}^{j} p(y = b_i \mid x)$$
- Alternatively, unlike the adult content category example described above, a category may not have fine-grained severity groups, and a binomial distribution can be used. For example, a binomial probability can be used for binary outcome events, where there is typically one positive event (e.g., good, yes, etc.) and one negative event (e.g., bad, no, etc.). At 160, in some embodiments, a binary or binomial-probability determination of appropriateness or objectionability can be projected onto an ordinomial by considering the extreme classes, b_1 and b_n. For example, in cases where a large spectrum of severity groups may not be present, such as malware, a binomial determination can be performed, where the extreme classes include one positive class (e.g., malware is present in the content) and one negative class (e.g., malware is not present in the content). Ordinomial probabilities can be estimated using one or more statistical models, for example, from evidence derived or extracted from the received web pages.
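Both operations above reduce to a few lines on a plain probability list: the cumulative distribution F is a running sum, and the binomial projection puts all mass on the extreme classes. A minimal sketch, assuming index 0 is b_1 (least severe) and the last index is b_n; function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative(probs: Sequence[float]) -> List[float]:
    """F(y = b_j | x) = sum_{i=1}^{j} p(y = b_i | x), for j = 1..J."""
    return list(accumulate(probs))


def project_binomial(p_objectionable: float, n_groups: int) -> List[float]:
    """Place a binary estimate on the extreme classes b_1 and b_n of an n-group ordinomial."""
    if not 0.0 <= p_objectionable <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if n_groups < 2:
        raise ValueError("an ordinomial needs at least two severity groups")
    probs = [0.0] * n_groups
    probs[0] = 1.0 - p_objectionable  # b_1: e.g. malware not present
    probs[-1] = p_objectionable  # b_n: e.g. malware present
    return probs
```

For example, a 25% malware estimate projected onto four groups gives `[0.75, 0.0, 0.0, 0.25]`, so the interior groups carry no mass.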
- It should be noted that, in process 100 of FIG. 1 and other processes described herein, some steps can be added, some steps may be omitted, the order of the steps may be rearranged, and/or some steps may be performed simultaneously.
- In some embodiments, multiple ordinomials can be generated from a variety of different statistical models based on a diverse range of evidence. For example, different pieces of evidence can be accounted for in the determination of an ordinomial. These ordinomial estimates can be combined into a posterior ordinomial estimate using, for example, ensemble approaches and information fusion approaches. Example aggregation approaches include weighted averaging, AdaBoost-type mixing, and using sub-ordinomials as covariates in a secondary model. Accordingly, as shown in FIG. 2, this can be represented as:
$$p(y = b_i \mid x) = f\big(p_1(y = b_i \mid x), \ldots, p_m(y = b_i \mid x)\big)$$
- where p_1 through p_m are the ordinomial estimates of predictive models 1 through m.
- It should also be noted that, as web pages change over time, subsequent estimates of posterior ordinomials can provide different results. The rating application can account for these temporal dynamics, where the output can be based on the probability of encountering different severity levels from a given URL based on the type of estimated severity exhibited in the past. Given posterior ordinomials p(y = b_i | x) estimated at times t_v, an aggregate estimate can be computed using, for example, Bayesian combination, ensemble modeling techniques, exponential discounting over time, conditional random fields, hidden Markov models, and/or any other techniques that explicitly account for time differences. For example, one suitable technique weights each estimate according to the age of its data.
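The combination function f is left open above; weighted averaging is the simplest of the listed options. A minimal sketch of that one option, assuming non-negative per-model weights supplied by the caller (how the weights would be learned is not specified here):

```python
from typing import List, Sequence


def fuse_ordinomials(estimates: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of m model ordinomials p_1..p_m into one posterior ordinomial.

    A convex combination of probability vectors is itself a probability vector,
    so the result is a valid ordinomial.
    """
    if not estimates or len(estimates) != len(weights):
        raise ValueError("need one weight per model estimate")
    n_bins = len(estimates[0])
    if any(len(e) != n_bins for e in estimates):
        raise ValueError("all estimates must cover the same severity groups")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [sum(w * e[j] for w, e in zip(weights, estimates)) / total for j in range(n_bins)]
```

AdaBoost-type mixing or a secondary model over sub-ordinomials would replace this function while keeping the same inputs and output shape.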
- In some embodiments, the rating application can provide temporal aggregation features to account for changes to web pages over time. FIG. 3 provides an illustrative example of temporal aggregation of posterior ordinomials in accordance with some embodiments of the disclosed subject matter. As shown, temporal aggregation can be implemented in an efficient and distributed manner using a MapReduce paradigm, where the reduction key is the URL being considered. The posterior ordinomials for all times (t) are aggregated, and a final p(y = b_i | x) is calculated from the aggregated ordinomials and output.
- FIG. 4 shows an illustrative example of the MapReduce approach for determining the temporal aggregation of posterior ordinomials in accordance with some embodiments of the disclosed subject matter. As shown, URLs can be used as the key for the reduction phase of the MapReduce process. This has the effect of compiling all samples that belong to a given domain onto a single computer during the reduction. Along with the URL, the ordinomial probabilities and a timestamp denoting the instant at which each ordinomial probability sample was made are passed. More particularly, as shown in FIG. 4, the posterior ordinomials for a given domain can be sorted based on the timestamp or observation time. The sorted posterior ordinomials for a given domain are then combined, and an expected posterior ordinomial is calculated. Depending on the computational nature of the temporal aggregation, this expected ordinomial can be stored for use in future temporal aggregations, thereby alleviating the need for explicit storage of each individual record. Additionally, the reduction phase of this MapReduce process can compute and output a rating as described herein.
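The map, shuffle, and reduce steps of FIGS. 3 and 4 can be simulated in a single process. This sketch assumes a hypothetical record layout `(url, timestamp_seconds, ordinomial)` and uses exponential discounting with a half-life, one of the time-aware techniques listed earlier, as the combiner; it is not a distributed implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (url, observation_time_seconds, posterior ordinomial).
Record = Tuple[str, float, List[float]]


def map_phase(records: Iterable[Record]) -> Dict[str, List[Tuple[float, List[float]]]]:
    """Emit (key=url, value=(timestamp, ordinomial)) and group by key, as a shuffle would."""
    groups: Dict[str, List[Tuple[float, List[float]]]] = defaultdict(list)
    for url, ts, probs in records:
        groups[url].append((ts, probs))
    return groups


def reduce_phase(samples: List[Tuple[float, List[float]]], now: float, half_life: float) -> List[float]:
    """Combine one URL's samples into an expected posterior ordinomial.

    Each sample is weighted by 0.5 ** (age / half_life), i.e. exponential
    discounting by data age. Sorting by time mirrors FIG. 4; a plain weighted
    average does not depend on order, but order-aware combiners would.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    samples = sorted(samples, key=lambda s: s[0])
    acc = [0.0] * len(samples[0][1])
    total = 0.0
    for ts, probs in samples:
        w = 0.5 ** ((now - ts) / half_life)
        total += w
        for j, p in enumerate(probs):
            acc[j] += w * p
    return [a / total for a in acc]


def temporal_aggregate(records: Iterable[Record], now: float, half_life: float) -> Dict[str, List[float]]:
    """Run the map and reduce phases locally and return one ordinomial per URL."""
    return {url: reduce_phase(s, now, half_life) for url, s in map_phase(records).items()}
```

With a half-life of 10 seconds, a sample observed 10 seconds ago counts half as much as one observed now, so recent changes to a page dominate its expected ordinomial.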
- FIG. 5 is a diagram of an example of a process 500 for generating a rating (R) for a webpage in accordance with some embodiments of the disclosed subject matter. Generally speaking, one or more ratings that encode both severity and confidence can be determined from a webpage's ordinomial probability estimates. That is, a rating (R) that encodes severity and confidence parameters is determined from a particular ordinomial, p(y = b_i | x). For example, an advertiser may desire that the rating represent a particular confidence that the page's content is no worse than severity group b_j. Alternatively, in another example, an advertiser may desire that the rating encode the confidence that a particular webpage is no better than a particular severity group.
- As shown in FIG. 5, process 500 begins by selecting the worst severity in accordance with a user-specified confidence parameter (β) at 510. For example, as shown in FIG. 6, starting from the least severe or objectionable category in the ordinomial (b_1), the bins of the ordinomial are ascended while maintaining a running sum of the probabilities encountered. The bin b_i at which the level of confidence (β) is reached can be represented by:
$$b_i = \min_{k} \Big\{ b_k : \sum_{j=1}^{k} p(y = b_j \mid x) \ge \beta \Big\}$$
- Accordingly, the bin b_i is selected such that the application has at least the level of confidence (β) that the content is no worse than b_i.
- It should be noted that assigning a larger confidence parameter (β) ensures that a smaller probability mass resides in categories more severe than the selected bin.
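The bin selection at 510 is a single pass over the ordinomial. A minimal sketch, assuming probabilities are ordered from b_1 (least severe) upward and returning the 1-based group index; the small tolerance for floating-point round-off is an implementation assumption.

```python
from typing import Sequence


def select_bin(probs: Sequence[float], beta: float) -> int:
    """Return the 1-based index i of the least severe bin b_i with
    sum_{j=1}^{i} p(y = b_j | x) >= beta."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    running = 0.0
    for i, p in enumerate(probs, start=1):
        running += p
        # Tolerance guards against round-off when the running sum should equal beta.
        if running >= beta - 1e-12:
            return i
    # Only reached if round-off leaves the total just short of beta: fall back to b_J.
    return len(probs)
```

With `probs = [0.5, 0.25, 0.125, 0.125]`, β = 0.75 selects b_2, while β = 0.9 selects b_4, illustrating how a larger β pushes the selected bin toward the severe end.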
- Referring back to FIG. 5, one or more ratings are generated at 520. The rating application can determine, from a given page's ordinomial probability estimates, ratings that encode both severity and confidence. It should be noted that the rating application can assume that ratings are given on a numeric scale that can be divided into ranges B_j, where there is a one-to-one mapping between these ranges and the severity groups b_j. That is, if step 510 of process 500 indicates that there is a particular confidence that a page has severity no worse than b_j, then the rating (R) is somewhere in the range B_j. For example, as shown in FIG. 7, rating scale 700 can be a numeric scale of the numbers 0 through 1000, where 1000 denotes the least severe, or highly safe, end of the scale. In another example, rating scale 700 can be further divided such that particular portions of the rating scale are determined to contain the best pages (e.g., ratings falling between 800 and 1000). Accordingly, if there is at least β confidence that the page's content is no worse than the best category, then the page's rating falls in the 800-1000 range.
- Additional features of the rating scale are described further below.
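The one-to-one mapping between severity groups b_j and rating ranges B_j can be expressed with boundaries s_0 > s_1 > ... > s_J, so that B_j spans s_j to s_{j-1}. In this sketch only the 800-1000 "best pages" range comes from the text; the remaining boundaries are hypothetical placeholders.

```python
from typing import Sequence, Tuple

# Hypothetical boundaries s_0 > s_1 > ... > s_J on the 0-1000 scale of FIG. 7.
# Only the 800-1000 "best pages" range comes from the text; the rest are placeholders.
BOUNDARIES: Tuple[int, ...] = (1000, 800, 600, 300, 0)


def range_for_bin(j: int, s: Sequence[int] = BOUNDARIES) -> Tuple[int, int]:
    """Return (low, high) of range B_j for severity group b_j (1 = least severe)."""
    n_groups = len(s) - 1
    if not 1 <= j <= n_groups:
        raise ValueError("severity group index out of range")
    return s[j], s[j - 1]


def bin_for_rating(r: float, s: Sequence[int] = BOUNDARIES) -> int:
    """Inverse mapping: the severity group whose range B_j contains rating r."""
    if not s[-1] <= r <= s[0]:
        raise ValueError("rating outside the scale")
    for j in range(1, len(s)):
        if r >= s[j]:
            return j
    raise AssertionError("unreachable: r >= s[-1] was checked above")
```

Here a shared boundary such as 800 is assigned to the less severe range; the source does not say which side a boundary rating belongs to.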
- To determine the rating (R) within the range, boundaries of the rating range (B_j) and a center (c_j) of each bin are defined in the configuration of the application. For example, consider two pages A and B, where there is 99.9% confidence that page A contains pornography and a confidence of (1−β)+ε that page B contains pornography. It should be noted that ε is generally an arbitrarily small number. That is, page A almost certainly contains pornography, whereas for page B it merely cannot be stated with confidence that the page does not contain pornography. Both pages A and B fall in the lowest ratings range. However, the rating application generates a significantly lower rating for page A.
- It should be noted that, in some embodiments, interior rating ranges for a particular objectionability category can be defined. For example, the rating application can generate one or more ratings that take into account the difference between being uncertain between R-rated content and PG-rated content, where R and PG are two interior severity levels within the adult content category. In another example, the rating application can generate one or more ratings that take into account the difference between a page having no evidence of X-rated content and a page having some small evidence of containing X-rated content.
- The boundaries of rating range B_j can be defined as s_{j-1} and s_j. In addition, a center c_j can be defined for each bin. It should be noted that the center of each bin is not necessarily the median of the range. Rather, the center is the rating desired by the application if either all of the probability resides in this range or the probabilities above and below the range are balanced in accordance with the given level of β assurance. Accordingly, the rating given the chosen bin, b_i, and the ordinomial encoding of p(y = b_j | x) can be represented by:
-
- It should be noted that one or more ratings can be generated for one or more objectionable categories. For example, multiple ratings can be generated, where one rating is generated for each selected objectionable content category (e.g., adult content, offensive language, and alcohol).
- It should also be noted that, in some embodiments, ratings for two or more objectionable categories can be combined to create a combined score. For example, a first rating generated for an adult content category and a second rating generated for an offensive language category can be combined. Alternatively, weights can be assigned to each category such that a higher weight can be assigned to the adult content category and a lower weight can be assigned to the offensive language category. Accordingly, an advertiser or any other suitable user of the rating application can customize the score by assigning weights to one or more categories. That is, a multi-dimensional rating vector can be created that represents, for each site, the distribution of risk of adjacency to objectionable content along different dimensions: guns, bombs and ammunition; alcohol; offensive language; hate speech; tobacco; spyware and malicious code; illegal drugs; adult content; gaming and gambling; entertainment; illegality; and/or obscenity.
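One way to realize the advertiser-weighted combination is a weighted mean of per-category ratings. This is a minimal sketch under that assumption; the source does not fix the combination rule, and a conservative alternative would be to take the minimum (most severe) category rating instead.

```python
from typing import Mapping


def combined_rating(ratings: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted mean of per-category ratings, using only the categories given weights."""
    missing = set(weights) - set(ratings)
    if missing:
        raise KeyError(f"no rating for categories: {sorted(missing)}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(weights[c] * ratings[c] for c in weights) / total
```

For example, weighting adult content 3 and offensive language 1 turns ratings of 900 and 500 into a combined 800.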
- Additionally or alternatively to generating a rating for a website or a webpage, the rating application can determine a rating for a sitelet. As used herein, a sitelet is a collection or subset of web pages and, more particularly, is often a topically homogeneous portion of a site, such as a topic-oriented subtree of a large site's hierarchical tree structure. For example, “finance.yahoo.com” can receive a rating as a sitelet of the website “yahoo.com.”
- It should be noted that the rating application can rate sitelets because it will encounter web pages that it has never seen before. That does not mean that the rating application has no evidence with which to rate such a page: there is substantial rating locality within sitelets, and a page from a risky site or sitelet is itself risky. In addition, the rating application can rate sitelets for storage efficiency, as it may not be necessary to store the scores for individual pages if they are not significantly different from the score for the sitelet. For example, if the ratings for the individual pages that make up website www.foo.com are within a given threshold value (e.g., a 5% difference), the rating application can store a single rating for a sitelet (a collection of those individual pages). It should also be noted that sitelet scores can provide additional evidence to the rating computation even when the page has been seen before.
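The storage saving described above can be sketched in two steps: aggregate page ordinomials into a sitelet ordinomial, then keep explicit page ratings only where they differ from the sitelet rating by more than the threshold. The simple mean and the relative 5% test are assumptions for illustration; the sitelet rating itself would come from the rating procedure described earlier.

```python
from typing import Dict, List, Sequence


def sitelet_ordinomial(page_ordinomials: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate page posterior ordinomials into a sitelet ordinomial (simple mean, an assumption)."""
    n = len(page_ordinomials)
    if n == 0:
        raise ValueError("a sitelet needs at least one page")
    n_bins = len(page_ordinomials[0])
    return [sum(p[j] for p in page_ordinomials) / n for j in range(n_bins)]


def pages_to_store(page_ratings: Dict[str, float], sitelet_rating: float, rel_threshold: float = 0.05) -> List[str]:
    """URLs whose rating differs from the sitelet rating by more than rel_threshold (relative)."""
    limit = rel_threshold * abs(sitelet_rating)
    return sorted(url for url, r in page_ratings.items() if abs(r - sitelet_rating) > limit)
```

Pages not returned by `pages_to_store` can simply inherit the sitelet rating at query time.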
- It should further be noted that advertising on a website can be an indication of direct financial support of the website. Even if a particular page does not contain objectionable content or is determined to not likely contain objectionable content, an advertiser may not want to support a site that otherwise promotes objectionable categories of content. For example, the rating application can provide an indication when a particular news item promotes or supports a major Nazi website. In another example, aside from the content of a page, the rating application can provide an indication when a particular advertiser that supports or advertises on a particular website falls in an objectionable category. In a more particular example, the rating application can detect whether the content falls within an objectionable category and whether advertisers, promoters, or other entities associated with the content fall within an objectionable category.
- In the example where sitelets are subtrees in the hierarchical site structure, FIG. 8 shows an illustrative example in which incoming URLs are matched to the sitelet with the longest available shared prefix. The aggregated ordinomials and associated rating of this longest prefix are then used for the query URL. In some embodiments, radix trees can be used to make this query computationally efficient.
- It should be noted that a rating need not be stored explicitly for every URL or sub-string in the file tree implied by a domain's URLs. If the rating for a page or subtree is not significantly different from that of its parent, explicit storage offers little additional benefit at the expense of increased storage and computation. Given a sensitivity parameter or threshold, $\tau$, that expresses the trade-off between sensitivity and storage, the rating application can store ratings for those components of the subtree with:
$$|R(p) - R(c)| \geq \tau$$
- where $R(\cdot)$ denotes the rating for an entity, $c$ denotes the child page or subtree whose rating is under consideration, and $p$ denotes the parent of child page $c$.
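The longest-prefix lookup of FIG. 8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: sitelet keys are tuples of reversed host labels followed by path segments (so `finance.yahoo.com` nests under `yahoo.com`), and stored ratings live in a plain dictionary. Rather than the radix tree mentioned above, it simply probes successively shorter prefixes, costing one dictionary lookup per URL component. The names `url_components` and `longest_prefix_rating` and the sample ratings are hypothetical.

```python
from urllib.parse import urlsplit


def url_components(url: str) -> tuple:
    """Split a URL into hierarchical components: reversed host labels, then path segments.

    Assumed key scheme: reversing the host lets a subdomain such as
    finance.yahoo.com nest under yahoo.com, as in the sitelet example.
    """
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host_labels = (parts.hostname or "").split(".")
    path_segments = [seg for seg in parts.path.split("/") if seg]
    return tuple(reversed(host_labels)) + tuple(path_segments)


def longest_prefix_rating(url: str, stored: dict):
    """Return (sitelet_key, rating) for the longest stored prefix of url, or (None, None)."""
    components = url_components(url)
    # Try the full component tuple first, then successively shorter prefixes.
    for k in range(len(components), 0, -1):
        key = components[:k]
        if key in stored:
            return key, stored[key]
    return None, None


# Hypothetical explicitly stored sitelet ratings.
stored = {
    ("com", "yahoo"): 150.0,
    ("com", "yahoo", "finance"): 180.0,
}
```

For example, a query for `news.yahoo.com/story` falls back to the `yahoo.com` sitelet because no `news` sitelet has been stored explicitly.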
- Similar to individual pages, sitelet ratings can be generated from sitelet ordinomials. The sitelet ordinomials can be produced by an aggregation process over the pages in the sitelet. For example, the sitelet ordinomial can be a weighted combination of the page ordinomials, a Bayesian combination, or generated using any suitable explicit mathematical function.
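A minimal sketch of sitelet aggregation combined with the threshold test $|R(p) - R(c)| \geq \tau$ is shown below. It assumes an ordinomial is a probability vector over the ordinal rating classes, uses an equal-weight average of page ordinomials as the aggregation (one of the weighted-combination options named above), and uses the expected class index as a stand-in rating function; all three are illustrative choices, not the source's definitions. Following the recursive procedure of FIG. 9, each node is compared against its nearest ancestor with an explicitly stored rating. Continuing the traversal into children that are not stored is an assumption about how the recursion proceeds. `TreeNode`, `isolate_sitelets`, and the example tree are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed representation: an ordinomial is a probability vector over V_0..V_d.
Ordinomial = list


@dataclass
class TreeNode:
    name: str
    pages: list = field(default_factory=list)     # page ordinomials at this node
    children: list = field(default_factory=list)  # child TreeNodes


def subtree_pages(node: TreeNode) -> list:
    """Collect every page ordinomial in the subtree rooted at node."""
    pages = list(node.pages)
    for child in node.children:
        pages.extend(subtree_pages(child))
    return pages


def aggregate(ordinomials: list) -> Ordinomial:
    """Equal-weight average of page ordinomials, class by class."""
    n = len(ordinomials)
    return [sum(col) / n for col in zip(*ordinomials)]


def rating(ordinomial: Ordinomial) -> float:
    """Hypothetical rating function: the expected ordinal class index."""
    return sum(k * p for k, p in enumerate(ordinomial))


def isolate_sitelets(root: TreeNode, tau: float) -> dict:
    """Store the root's rating, plus that of every node whose rating differs
    from its nearest stored ancestor's rating by at least tau."""
    stored = {}

    def visit(node, path, ref_rating):
        pages = subtree_pages(node)  # recomputed per node; acceptable for a sketch
        if not pages:
            return
        r = rating(aggregate(pages))
        if ref_rating is None or abs(ref_rating - r) >= tau:
            stored[path] = r
            ref_rating = r  # this node becomes p for its descendants
        for child in node.children:
            visit(child, path + (child.name,), ref_rating)

    visit(root, (root.name,), None)
    return stored


site = TreeNode("foo.com", pages=[[0, 0, 1], [0, 0, 1]], children=[
    TreeNode("news", pages=[[0, 0, 1]]),
    TreeNode("forum", pages=[[1, 0, 0], [1, 0, 0]]),
])
sitelets = isolate_sitelets(site, tau=1.0)
```

Here the `news` subtree stays folded into the site rating because its rating differs from the site's by less than $\tau$, while `forum` is isolated as its own sitelet.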
- FIG. 9 shows an illustrative example of calculating sitelet ordinomials in accordance with some embodiments of the disclosed subject matter. As shown, to calculate the aggregated sitelet ordinomial, the pages in the sitelet can be treated as one large set, or the tree structure can be taken into account explicitly. In the latter case, the calculation can be done efficiently by recursion. The base step is to calculate the rating at the root node. Then, at each step, the ratings for all of the children are calculated, and for each child the inequality $|R(p) - R(c)| \geq \tau$ is evaluated. It should be noted that, in this embodiment, $p$ represents the nearest super-node of $c$ in the file tree that has been isolated as a sitelet and given an explicitly stored rating. Children for which the inequality holds are stored explicitly as their own sitelets and subjected to further recursion.
- In some embodiments, sitelet ordinomials can be calculated efficiently using a map-reduce process. For example, as shown in FIG. 10, in settings with small domains (e.g., those where processing can occur comfortably on a single reducer machine), the rating application can generate ratings in a single pass via MapReduce or any other suitable mapping approach. The reduction phase is performed using the domain as a key. Once the URLs belonging to a domain are assembled together, a file tree or domain tree can be generated, and the sitelet ordinomial calculation described above can be used to find the pertinent ratings in the domain.
- Alternatively, FIG. 11 shows that sitelet ordinomials can be calculated efficiently using a map-reduce process in settings with larger domains. In such cases, the reduction via MapReduce can occur iteratively. Let $M$ denote the number of suffixes in the largest domain. Then, for $t$ from $M$ down to 1, all ordinomials at level $t$ are combined in accordance with the inequality $|R(p) - R(c)| \geq \tau$. Where the inequality returns false, all children are stored for rating and sitelet computation at higher levels, which may lead to an unacceptable demand for memory and resources. To alleviate this demand, children with the same or very similar ratings can be combined using explicit combination functions, for example, Bayesian or weighted averaging. Children whose ratings differ by at least $\tau$ are stored explicitly as their own sitelets. Each step reduces $t$ by one ($t \leftarrow t - 1$). This is repeated until $t$ reaches 1, at which point the rating and sitelet are calculated and stored so that every URL present in a domain receives some rating.
- In some embodiments, the rating application can calculate ratings using both temporal aggregation and sitelet aggregation. Generally speaking, the rating application accomplishes this by performing the temporal aggregation on URLs in the first step of sitelet aggregation. For example, as shown in FIG. 12, the rating application can aggregate posterior ordinomials over all times $t$; a reduction phase is then performed using the domain as a key; and, once the URLs belonging to a domain are assembled together, a file tree or domain tree can be generated. The expected ordinomial for each URL can then be calculated.
- In accordance with some embodiments of the disclosed subject matter, mechanisms are provided for evaluating the quality of collections of online media and other suitable content. Because online media is often purchased by advertisers at different levels of granularity (e.g., ranging from individual pages to large sets of domains), it is desirable to develop metrics for comparing the quality of such diverse sets of content. More particularly, these mechanisms, among other things, collect individual content ratings, aggregate these ratings across arbitrary subsets, normalize these ratings to a general index scale, and calibrate the normalized ratings such that the global mean provides a benchmark for comparison.
- Generally speaking, the application calculates several metrics for particular content (e.g., media, web pages, etc.). For example, in the case of objectionable content, a category can have metrics encapsulating the risk related to the appearance of adult content, metrics encapsulating the risk related to the appearance or use of hate speech, etc. Accordingly, in some embodiments, the application can provide a single metric encapsulating the different aspects of the content.
- For example, let $x_j$ refer to an individual example of a piece of online media or online content, such as a particular web page, video, or image. Given multiple risk ratings for the piece of content $x_j$, e.g., $r^{(1)}(x_j), \ldots, r^{(M)}(x_j)$ for $M$ different categories of objectionable content, the multiple risk ratings can be combined into a single concise metric, $r(x_j)$, using, for example, a specialized combination function, $h$, such that:
$$r(x_j) = h\big(r^{(1)}(x_j), \ldots, r^{(M)}(x_j)\big)$$
- Example combination functions include weighted averaging, where the weights reflect the importance of particular objectionable content categories; Bayesian mixing; a secondary combining model; and/or a simple minimum function that selects the riskiest category, as in a brand safety model. As described above, multiple combining functions can also be used and aggregated to create the single concise metric.
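The combination function $h$ can take several forms. The Python sketch below shows two of those named above, a weighted average with advertiser-chosen category weights and a minimum over categories, assuming index-scaled ratings where a lower value means riskier content. The category names, weights, and ratings are hypothetical.

```python
def h_weighted(ratings: dict, weights: dict) -> float:
    """Weighted average of per-category ratings; weights encode category importance."""
    total = sum(weights[c] for c in ratings)
    return sum(weights[c] * r for c, r in ratings.items()) / total


def h_min(ratings: dict) -> float:
    """Brand-safety style: the riskiest category alone determines the score
    (assumes a lower index-scaled rating means riskier content)."""
    return min(ratings.values())


# Hypothetical per-category ratings for one page, and advertiser weights.
page_ratings = {"adult": 180.0, "offensive_language": 120.0, "hate_speech": 60.0}
advertiser_weights = {"adult": 3.0, "offensive_language": 1.0, "hate_speech": 1.0}
```

With these values the weighted average is 144.0, while the minimum reports the hate-speech rating of 60.0, so a buyer using `h_min` sees the worst category directly.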
- The single concise metric can, for example, be used to compare diverse sets of content. In a more particular example, the application can allow the advertiser to compare the content managed by two different advertising networks.
- Note that, in some embodiments, $r(\cdot)$ can be ordinal, where $r(\cdot) \in \{V_0, \ldots, V_d\}$ such that, without loss of generality, $V_0 < V_1 < \cdots < V_d$. Additionally or alternatively, the ratings $r(\cdot)$ can be real-valued, where $r(\cdot) \in \mathbb{R}$. As used herein, $r(x_j)$ can provide a measure that includes both the quality (or severity) of $x_j$ and the confidence that $x_j$ deserves that level of quality. That is, the rating application can provide a rating $r(\cdot)$ that combines both the likelihood and the severity of the content considered.
- Generally speaking, online media is often packaged into arbitrary collections when traded in the online advertising marketplace. Additionally, natural boundaries may exist that segregate a collection of content into distinct subsets. Given a rating $r(\cdot)$ defined on individual examples in this content space, it can be desirable to combine the ratings on individual pages into aggregate ratings denoting the expected rating of an entire subset of content. Let $X$ denote a collection of media, for example, the media holdings of an online publisher having a particular category of web pages (such as pages related to sports), or the pages offered by a supply-side advertising network, including any subsets thereof. The rating application can aggregate the ratings of content in this collection, $x \in X$.
- For ordinal ratings, the aggregation of ratings can be expressed as:
$$r_{agg} = \arg\max_{V \in \{V_0, \ldots, V_d\}} \sum_{x \in X} \Pi\big(r(x) = V\big)$$
- It should be noted that, in the above equation, $\Pi(\cdot)$ is an indicator function that takes the value 1 when its operand is true and 0 otherwise, so $r_{agg}$ corresponds to the most common ordinal value in the collection. Ties may be broken arbitrarily, for example, by choosing the most severe category in the tie, for safety.
- For real-valued ratings, the aggregation of ratings can be expressed as:
$$r_{agg} = \frac{1}{|X|} \sum_{x \in X} r(x)$$
- It should be noted that, in the above equation, $|X|$ is the number of examples in $X$, so this aggregation corresponds to the arithmetic mean of the individual content ratings.
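Both aggregation rules can be sketched in a few lines of Python. This sketch assumes ordinal classes are integer indices $0, \ldots, d$ where a lower index is more severe, consistent with the index-scaled mapping later in this section in which riskier categories receive lower values; the function names are hypothetical.

```python
from collections import Counter


def aggregate_ordinal(ratings: list) -> int:
    """Most common ordinal class; ties go to the most severe (lowest-index) class."""
    counts = Counter(ratings)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)


def aggregate_real(ratings: list) -> float:
    """Arithmetic mean of real-valued ratings."""
    return sum(ratings) / len(ratings)
```

For instance, `aggregate_ordinal([2, 2, 0, 0, 1])` has a tie between classes 2 and 0 and resolves it to the more severe class 0.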
- When aggregating content ratings, the rating application considers that content may be presented in a pre-aggregated form. For example, the input may be domains, each with an aggregate rating. Formally, let $Y_l$ be a collection of one or more examples of content, $x \in Y_l$. Let $X$ then be extended to be a collection of such collections, $Y_l \in X$. Rating aggregation can then be extended to such sub-aggregations of content.
- For ordinal ratings, the aggregation of ratings can be expressed as:
$$r_{agg} = \arg\max_{V} \sum_{Y_l \in X} |Y_l|\, \Pi\big(r_{agg}(Y_l) = V\big)$$
- where $|Y_l|$ is the number of examples in $Y_l$.
- For real-valued ratings, the aggregation of ratings can be expressed as:
$$r_{agg} = \frac{1}{|X|} \sum_{Y_l \in X} |Y_l|\, r_{agg}(Y_l)$$
- where $|X|$ is extended to be $\sum_{Y_l \in X} \sum_{x \in Y_l} \Pi(x \in Y_l)$, the count of all examples in all subsets.
- It should be noted that aggregate ratings on pre-collected online media are recursive. That is, content ratings can be aggregated on collections of collections of collections, and so on.
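Because the aggregation is recursive, it maps naturally onto a recursive function. The sketch below, under the same assumed encoding (ordinal classes as integer indices with lower meaning more severe), represents a collection either as a single rating or as a list of sub-collections and returns a `(count, aggregate)` pair so that each sub-aggregate $Y_l$ is weighted by $|Y_l|$. The function names and nested-list encoding are hypothetical. Note that, as in the ordinal equation above, each subset votes with its own aggregate, which need not equal the mode over all underlying examples.

```python
from collections import Counter


def agg_real(coll):
    """coll is a single rating (number) or a non-empty list of sub-collections.
    Returns (count, mean), weighting each sub-aggregate by its size |Y_l|."""
    if isinstance(coll, (int, float)):
        return 1, float(coll)
    parts = [agg_real(sub) for sub in coll]
    n = sum(k for k, _ in parts)
    return n, sum(k * mean for k, mean in parts) / n


def agg_ordinal(coll):
    """coll is an ordinal class index (int) or a non-empty list of sub-collections.
    Returns (count, mode): each sub-aggregate votes for its mode with weight |Y_l|."""
    if isinstance(coll, int):
        return 1, coll
    parts = [agg_ordinal(sub) for sub in coll]
    votes = Counter()
    for k, mode in parts:
        votes[mode] += k
    best = max(votes.values())
    # Tie-break toward the most severe (lowest-index) class, for safety.
    return sum(k for k, _ in parts), min(v for v, c in votes.items() if c == best)
```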
- In some embodiments, the rating application takes unconstrained ratings and projects them onto a bounded region of the number line for ease of comparison. For ordinal ratings, the numerical value assigned to each ordinal category can be constructed to capture the risk and severity profiles of content in that category.
- For example, the rating application can be configured to define an index-scaled rating as a numerical rating assigned to online media and constrained to the range $r_i(x) \in [\alpha, \beta]$. This rating is assumed to capture both the severity and the risk of appearance of objectionable content on online media, with $r_i(x_j) \geq r_i(x_k)$ implying that $x_k$ is at least as risky as $x_j$: there is at least as great a chance of riskier content appearing on $x_k$ as on $x_j$. This implies that $x_j$ is likely to be safer for brand advertisers or other online media buyers. The values of $\alpha$ and $\beta$ can be set arbitrarily; for example, $\alpha = 0$ and $\beta = 200$.
- Given an index-scaled rating, $r_i(x)$, on a particular example $x$, the rating application defines a mapping from unscaled ratings to $r_i(x)$ for both ordinal and real-valued ratings.
- For ordinal ratings, the index-scaled rating can be expressed as:
$$r_i(x_j) = a_{r(x_j)}$$
- It should be noted that the mapping to an index-scaled rating is performed by assigning a constant, $a$, to each ordinal non-index-scaled rating. As mentioned above, in the ordinal setting, $r(\cdot) \in \{V_0, \ldots, V_d\}$. Here, each $a \in [\alpha, \beta]$ (i.e., $a$ is bounded by the index-scaled rating range) and, without loss of generality, $a_{V_m} < a_{V_n}$ whenever $V_m < V_n$, so more risky ordinal categories have lower numerical values in the mapping.
- For real-valued ratings, the index-scaled rating can be expressed as:
$$r_i(x_j) = f\big(r(x_j)\big)$$
- Here, $f(\cdot)$ is a monotonic (non-decreasing) function; that is, $f(r(x_j)) \leq f(r(x_k))$ whenever $r(x_j) \leq r(x_k)$, so lower unscaled ratings get lower scaled ratings. Additionally, the range of $f(\cdot)$ is $[\alpha, \beta]$.
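Both index-scaling mappings can be sketched as follows. The ordinal constants are illustrative (they reuse the bucket values from the example later in this section), and the logistic curve is an assumed choice of the monotonic function $f$; the source does not specify either. The logistic is computed in a numerically stable way so that very large or very small unscaled ratings do not overflow, and its output always stays within $[\alpha, \beta]$.

```python
import math

# Illustrative constants a_V for ordinal classes V_0..V_3 (riskier -> lower value).
ORDINAL_CONSTANTS = {0: 50.0, 1: 100.0, 2: 150.0, 3: 200.0}


def index_scale_ordinal(v: int, constants=ORDINAL_CONSTANTS) -> float:
    """r_i(x_j) = a_{r(x_j)}: look up the constant assigned to the ordinal class."""
    return constants[v]


def index_scale_real(r: float, alpha=0.0, beta=200.0, scale=1.0) -> float:
    """Monotone map of an unbounded rating into [alpha, beta] via a logistic curve (assumed f)."""
    z = r / scale
    if z >= 0:
        s = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)  # avoids overflow for large negative z
        s = e / (1.0 + e)
    return alpha + (beta - alpha) * s
```

A logistic is only one option; any non-decreasing function with range $[\alpha, \beta]$ satisfies the constraints stated above.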
- Accordingly, the rating application can transform arbitrary raw ratings into an index-scaled rating. Such a numerical rating can encode the likelihood of encountering risky or inappropriate content on a given example of online media, in addition to the likely severity of such content. The resulting index-scaled rating represents the value of online content to buyers and advertisers, with risky and severely inappropriate content generally being of low value.
- Because a single example of an advertisement appearing on severely inappropriate content (e.g., pornography or hate speech) may have harsh consequences for the advertiser placing the advertisement, individual examples of inappropriate content may have a disproportionate influence on the aggregate rating of collected content. That is, even a few severely inappropriate pages in a site containing thousands of examples may bring the aggregate rating down significantly. To capture this disproportionate influence, the rating application can be configured to aggregate ratings for collected content, $x \in X$, with commensurate impact from riskier individual pages. This can be represented as follows:
$$r_{agg} = \frac{\sum_{x \in X} w\big(r_i(x)\big)\, r_i(x)}{\sum_{x \in X} w\big(r_i(x)\big)}$$
- It should be noted that $w(\cdot) \in [1, \infty)$ represents a weight function associated with a content rating. More particularly, riskier content receives both a lower numerical rating and a higher weight, so it pulls the weighted average down more strongly than safer content does. For example, assume that the rating application creates four risk buckets (e.g., very high risk, high risk, moderate risk, and low risk), each covering a range of index-scaled ratings. For a given aggregation of content, the rating application denotes the number of examples in each bucket by $r_1$, $r_2$, $r_3$, and $r_4$, respectively. The rating application can also assign a native index-scaled rating to each bucket, for example, 50, 100, 150, and 200, respectively, and a combination weight to each bucket, for example, 35.2, 8.8, 2.2, and 1.0, respectively. Accordingly, a severity-weighted aggregation of such content can be determined by calculating:
$$r_{agg} = \frac{35.2 \cdot 50\, r_1 + 8.8 \cdot 100\, r_2 + 2.2 \cdot 150\, r_3 + 1.0 \cdot 200\, r_4}{35.2\, r_1 + 8.8\, r_2 + 2.2\, r_3 + 1.0\, r_4}$$
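The bucket example can be checked numerically. The sketch below implements the severity-weighted average with the bucket ratings and weights given in the text, alongside a plain unweighted mean for comparison; the function names are hypothetical and `counts` plays the role of $r_1, \ldots, r_4$.

```python
# Bucket definitions from the example in the text:
# very high, high, moderate, and low risk.
BUCKET_RATINGS = [50.0, 100.0, 150.0, 200.0]
BUCKET_WEIGHTS = [35.2, 8.8, 2.2, 1.0]


def severity_weighted_rating(counts: list) -> float:
    """Weighted mean of bucket ratings; riskier buckets carry larger weights."""
    num = sum(n * w * r for n, w, r in zip(counts, BUCKET_WEIGHTS, BUCKET_RATINGS))
    den = sum(n * w for n, w in zip(counts, BUCKET_WEIGHTS))
    return num / den


def unweighted_rating(counts: list) -> float:
    """Plain mean of bucket ratings, for comparison."""
    return sum(n * r for n, r in zip(counts, BUCKET_RATINGS)) / sum(counts)
```

With one very-high-risk page and nine low-risk pages (`counts = [1, 0, 0, 9]`), the severity-weighted rating is about 80.5, compared with an unweighted mean of 185. This illustrates the disproportionate influence of a single risky page described above.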
- In some embodiments, the rating application not only considers how content rates with respect to risk and severity, but also determines how that content compares to other similar content. To perform such a comparison, the rating application recalibrates ratings relative to the mean rating of the content being considered. The mean ($\mu_r$) of the uncalibrated set of ratings can be determined by calculating:
$$\mu_r = \frac{1}{|X|} \sum_{x \in X} r(x)$$
- It should be noted that $\gamma$ denotes the value to which the mean is mapped after calibration, and $Y_j$ denotes a subset of content in $X$. The rating application then defines the calibrated rating of $Y_j$, $r_c$, relative to $\mu_r$ using the following cases:
-
- For example, in the case above where $\alpha = 0$ and $\beta = 200$, let $\gamma = 100$. The recalibration can then be performed by determining:
-
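The exact case definitions for the recalibration are not reproduced in this text. As a hedged sketch only, the code below assumes a piecewise-linear mapping that sends $\alpha$ to $\alpha$, the mean $\mu_r$ to $\gamma$, and $\beta$ to $\beta$, with one linear piece on each side of the mean. This matches the stated roles of $\alpha$, $\beta$, $\gamma$, and $\mu_r$, but it is an assumption, not the source's formula. The function names are hypothetical.

```python
def mean_rating(ratings: list) -> float:
    """mu_r: arithmetic mean of the uncalibrated ratings."""
    return sum(ratings) / len(ratings)


def calibrate(r: float, mu: float, alpha=0.0, beta=200.0, gamma=100.0) -> float:
    """Assumed piecewise-linear recalibration: alpha -> alpha, mu -> gamma, beta -> beta.

    Requires alpha < mu < beta.
    """
    if r <= mu:
        # Below the mean: stretch [alpha, mu] onto [alpha, gamma].
        return alpha + (gamma - alpha) * (r - alpha) / (mu - alpha)
    # Above the mean: stretch [mu, beta] onto [gamma, beta].
    return gamma + (beta - gamma) * (r - mu) / (beta - mu)
```

With $\alpha = 0$, $\beta = 200$, and $\gamma = 100$, a subset whose rating equals the mean always calibrates to 100, so calibrated scores above 100 read as better than average for the content considered.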
- FIG. 13 is a generalized schematic diagram of a system 1300 on which the rating application may be implemented in accordance with some embodiments of the disclosed subject matter. As illustrated, system 1300 may include one or more user computers 1302. User computers 1302 may be local to each other or remote from each other. User computers 1302 are connected by one or more communications links 1304 to a communications network 1306 that is linked via a communications link 1308 to a server 1310.
- System 1300 may include one or more servers 1310. Server 1310 may be any suitable server for providing access to the application, such as a processor, a computer, a data processing device, or a combination of such devices. For example, the application can be distributed into multiple backend components and multiple frontend components or interfaces. In a more particular example, backend components, such as data collection and data distribution, can be performed on one or more servers 1310. Similarly, the graphical user interfaces displayed by the application, such as a data interface and an advertising network interface, can be distributed by one or more servers 1310 to user computer 1302.
- More particularly, for example, each of the client 1302 and server 1310 can be any of a general purpose device such as a computer or a special purpose device such as a client, a server, etc. Any of these general or special purpose devices can include any suitable components such as a processor (which can be a microprocessor, digital signal processor, a controller, etc.), memory, communication interfaces, display controllers, input devices, etc. For example, client 1302 can be implemented as a personal computer, a personal digital assistant (PDA), a portable email device, a multimedia terminal, a mobile telephone, a set-top box, a television, etc.
- In some embodiments, any suitable computer readable media can be used for storing instructions for performing the processes described herein, can be used as a content distribution that stores content and a payload, etc. For example, in some embodiments, computer readable media can be transitory or non-transitory. For example, non-transitory computer readable media can include media such as magnetic media (such as hard disks, floppy disks, etc.), optical media (such as compact discs, digital video discs, Blu-ray discs, etc.), semiconductor media (such as flash memory, electrically programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), etc.), any suitable media that is not fleeting or devoid of any semblance of permanence during transmission, and/or any suitable tangible media. As another example, transitory computer readable media can include signals on networks, in wires, conductors, optical fibers, circuits, any suitable media that is fleeting and devoid of any semblance of permanence during transmission, and/or any suitable intangible media.
- Referring back to FIG. 13, communications network 1306 may be any suitable computer network including the Internet, an intranet, a wide-area network ("WAN"), a local-area network ("LAN"), a wireless network, a digital subscriber line ("DSL") network, a frame relay network, an asynchronous transfer mode ("ATM") network, a virtual private network ("VPN"), or any combination of any of such networks. Communications links 1304 and 1308 may be any communications links suitable for communicating data among user computers 1302 and server 1310, such as network links, dial-up links, wireless links, hard-wired links, any other suitable communications links, or a combination of such links. User computers 1302 enable a user to access features of the application. User computers 1302 may be personal computers, laptop computers, mainframe computers, dumb terminals, data displays, Internet browsers, personal digital assistants ("PDAs"), two-way pagers, wireless terminals, portable telephones, any other suitable access device, or any combination of such devices. User computers 1302 and server 1310 may be located at any suitable location. In one embodiment, user computers 1302 and server 1310 may be located within an organization. Alternatively, user computers 1302 and server 1310 may be distributed between multiple organizations.
- Referring back to FIG. 13, the server and one of the user computers depicted in FIG. 13 are illustrated in more detail in FIG. 14. Referring to FIG. 14, user computer 1302 may include processor 1402, display 1404, input device 1406, and memory 1408, which may be interconnected. In a preferred embodiment, memory 1408 contains a storage device for storing a computer program for controlling processor 1402.
- Processor 1402 uses the computer program to present on display 1404 the application and the data received through communications link 1304 and commands and values transmitted by a user of user computer 1302. It should also be noted that data received through communications link 1304 or any other communications links may be received from any suitable source. Input device 1406 may be a computer keyboard, a cursor-controller, dial, switchbank, lever, or any other suitable input device as would be used by a designer of input systems or process control systems.
- Server 1310 may include processor 1420, display 1422, input device 1424, and memory 1426, which may be interconnected. In a preferred embodiment, memory 1426 contains a storage device for storing data received through communications link 1308 or through other links, and also receives commands and values transmitted by one or more users. The storage device further contains a server program for controlling processor 1420.
- In some embodiments, the application may include an application program interface (not shown), or alternatively, the application may be resident in the memory of user computer 1302 or server 1310. In another suitable embodiment, the only distribution to user computer 1302 may be a graphical user interface ("GUI") which allows a user to interact with the application resident at, for example, server 1310.
- In one particular embodiment, the application may include client-side software, hardware, or both. For example, the application may encompass one or more Web-pages or Web-page portions (e.g., via any suitable encoding, such as HyperText Markup Language ("HTML"), Dynamic HyperText Markup Language ("DHTML"), Extensible Markup Language ("XML"), JavaServer Pages ("JSP"), Active Server Pages ("ASP"), Cold Fusion, or any other suitable approaches).
- Although the application is described herein as being implemented on a user computer and/or server, this is only illustrative. The application may be implemented on any suitable platform (e.g., a personal computer (“PC”), a mainframe computer, a dumb terminal, a data display, a two-way pager, a wireless terminal, a portable telephone, a portable computer, a palmtop computer, an H/PC, an automobile PC, a laptop computer, a cellular phone, a personal digital assistant (“PDA”), a combined cellular phone and PDA, etc.) to provide such features.
- It will also be understood that the detailed description herein may be presented in terms of program procedures executed on a computer or network of computers. These procedural descriptions and representations are the means used by those skilled in the art to most effectively convey the substance of their work to others skilled in the art.
- A procedure is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. These steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared and otherwise manipulated. It proves convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. It should be noted, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities.
- Further, the manipulations performed are often referred to in terms, such as adding or comparing, which are commonly associated with mental operations performed by a human operator. No such capability of a human operator is necessary, or desirable in most cases, in any of the operations described herein which form part of the present invention; the operations are machine operations. Useful machines for performing the operation of the present invention include general purpose digital computers or similar devices.
- The present invention also relates to apparatus for performing these operations. This apparatus may be specially constructed for the required purpose or it may comprise a general purpose computer as selectively activated or reconfigured by a computer program stored in the computer. The procedures presented herein are not inherently related to a particular computer or other apparatus. Various general purpose machines may be used with programs written in accordance with the teachings herein, or it may prove more convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these machines will appear from the description given.
- Accordingly, methods, systems, and media for applying scores and ratings to web pages, web sites, and other pieces of content of interest to advertisers or content providers for safe and effective online advertising are provided.
- It is to be understood that the invention is not limited in its application to the details of construction and to the arrangements of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced and carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting.
- Although the invention has been described and illustrated in the foregoing illustrative embodiments, it is understood that the present disclosure has been made only by way of example, and that numerous changes in the details of implementation of the invention can be made without departing from the spirit and scope of the invention. Features of the disclosed embodiments can be combined and rearranged in various ways.
Claims (21)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/184,264 US20140379443A1 (en) | 2010-06-01 | 2014-02-19 | Methods, systems, and media for applying scores and ratings to web pages,web sites, and content for safe and effective online advertising |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US35039310P | 2010-06-01 | 2010-06-01 | |
US12/859,763 US20110047006A1 (en) | 2009-08-21 | 2010-08-19 | Systems, methods, and media for rating websites for safe advertising |
US201161431789P | 2011-01-11 | 2011-01-11 | |
US13/151,146 US8732017B2 (en) | 2010-06-01 | 2011-06-01 | Methods, systems, and media for applying scores and ratings to web pages, web sites, and content for safe and effective online advertising |
US14/184,264 US20140379443A1 (en) | 2010-06-01 | 2014-02-19 | Methods, systems, and media for applying scores and ratings to web pages,web sites, and content for safe and effective online advertising |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/151,146 Continuation US8732017B2 (en) | 2010-06-01 | 2011-06-01 | Methods, systems, and media for applying scores and ratings to web pages, web sites, and content for safe and effective online advertising |
Publications (1)
Publication Number | Publication Date |
---|---|
US20140379443A1 true US20140379443A1 (en) | 2014-12-25 |
Family
ID=45439233
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/151,146 Active 2032-03-17 US8732017B2 (en) | 2010-06-01 | 2011-06-01 | Methods, systems, and media for applying scores and ratings to web pages, web sites, and content for safe and effective online advertising |
US14/184,264 Pending US20140379443A1 (en) | 2010-06-01 | 2014-02-19 | Methods, systems, and media for applying scores and ratings to web pages,web sites, and content for safe and effective online advertising |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/151,146 Active 2032-03-17 US8732017B2 (en) | 2010-06-01 | 2011-06-01 | Methods, systems, and media for applying scores and ratings to web pages, web sites, and content for safe and effective online advertising |
Country Status (1)
Country | Link |
---|---|
US (2) | US8732017B2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180253661A1 (en) * | 2017-03-03 | 2018-09-06 | Facebook, Inc. | Evaluating content for compliance with a content policy enforced by an online system using a machine learning model determining compliance with another content policy |
CN109101502A (en) * | 2017-06-20 | 2018-12-28 | 阿里巴巴集团控股有限公司 | A kind of flow configuration method, switching method and the device of the page |
Families Citing this family (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9990674B1 (en) | 2007-12-14 | 2018-06-05 | Consumerinfo.Com, Inc. | Card registry systems and methods |
US8312033B1 (en) | 2008-06-26 | 2012-11-13 | Experian Marketing Solutions, Inc. | Systems and methods for providing an integrated identifier |
US8060424B2 (en) | 2008-11-05 | 2011-11-15 | Consumerinfo.Com, Inc. | On-line method and system for monitoring and reporting unused available credit |
US9195990B2 (en) | 2010-06-02 | 2015-11-24 | Integral Ad Science, Inc. | Methods, systems, and media for reviewing content traffic |
US9483606B1 (en) | 2011-07-08 | 2016-11-01 | Consumerinfo.Com, Inc. | Lifescore |
US9311599B1 (en) | 2011-07-08 | 2016-04-12 | Integral Ad Science, Inc. | Methods, systems, and media for identifying errors in predictive models using annotators |
US9223888B2 (en) * | 2011-09-08 | 2015-12-29 | Bryce Hutchings | Combining client and server classifiers to achieve better accuracy and performance results in web page classification |
US9106691B1 (en) | 2011-09-16 | 2015-08-11 | Consumerinfo.Com, Inc. | Systems and methods of identity protection and management |
US8738516B1 (en) | 2011-10-13 | 2014-05-27 | Consumerinfo.Com, Inc. | Debt services candidate locator |
US9014717B1 (en) * | 2012-04-16 | 2015-04-21 | Foster J. Provost | Methods, systems, and media for determining location information from real-time bid requests |
US9853959B1 (en) | 2012-05-07 | 2017-12-26 | Consumerinfo.Com, Inc. | Storage and maintenance of personal data |
US10387911B1 (en) | 2012-06-01 | 2019-08-20 | Integral Ad Science, Inc. | Systems, methods, and media for detecting suspicious activity |
US9552590B2 (en) | 2012-10-01 | 2017-01-24 | Dstillery, Inc. | Systems, methods, and media for mobile advertising conversion attribution |
US9654541B1 (en) | 2012-11-12 | 2017-05-16 | Consumerinfo.Com, Inc. | Aggregating user web browsing data |
US9916621B1 (en) | 2012-11-30 | 2018-03-13 | Consumerinfo.Com, Inc. | Presentation of credit score factors |
US11068931B1 (en) | 2012-12-10 | 2021-07-20 | Integral Ad Science, Inc. | Systems, methods, and media for detecting content viewability |
US10102570B1 (en) | 2013-03-14 | 2018-10-16 | Consumerinfo.Com, Inc. | Account vulnerability alerts |
US9406085B1 (en) | 2013-03-14 | 2016-08-02 | Consumerinfo.Com, Inc. | System and methods for credit dispute processing, resolution, and reporting |
US10497030B1 (en) | 2013-03-15 | 2019-12-03 | Integral Ad Science, Inc. | Methods, systems, and media for enhancing a blind URL escrow with real time bidding exchanges |
US10482477B2 (en) * | 2013-03-15 | 2019-11-19 | Netflix, Inc. | Stratified sampling applied to A/B tests |
US10685398B1 (en) | 2013-04-23 | 2020-06-16 | Consumerinfo.Com, Inc. | Presenting credit score information |
CN104123328A (en) * | 2013-04-28 | 2014-10-29 | 北京千橡网景科技发展有限公司 | Method and device used for inhibiting spam comments in website |
US9477737B1 (en) | 2013-11-20 | 2016-10-25 | Consumerinfo.Com, Inc. | Systems and user interfaces for dynamic access of multiple remote databases and synchronization of data based on user rules |
US10963470B2 (en) | 2017-09-06 | 2021-03-30 | Siteimprove A/S | Website scoring system |
CN108313374A (en) * | 2017-12-28 | 2018-07-24 | 芜湖瑞思机器人有限公司 | Aloe Vera Gel boxing device |
CN108284981A (en) * | 2017-12-28 | 2018-07-17 | 芜湖瑞思机器人有限公司 | Aloe Vera Gel mounted box method |
CN108298137A (en) * | 2017-12-28 | 2018-07-20 | 芜湖瑞思机器人有限公司 | Aloe Vera Gel boxing device front delivery mechanism |
CN108298138A (en) * | 2017-12-28 | 2018-07-20 | 芜湖瑞思机器人有限公司 | Aloe Vera Gel boxing device send case structure |
US20200074541A1 (en) | 2018-09-05 | 2020-03-05 | Consumerinfo.Com, Inc. | Generation of data structures based on categories of matched data items |
US11315179B1 (en) | 2018-11-16 | 2022-04-26 | Consumerinfo.Com, Inc. | Methods and apparatuses for customized card recommendations |
US11238656B1 (en) | 2019-02-22 | 2022-02-01 | Consumerinfo.Com, Inc. | System and method for an augmented reality experience via an artificial intelligence bot |
US11544653B2 (en) * | 2019-06-24 | 2023-01-03 | Overstock.Com, Inc. | System and method for improving product catalog representations based on product catalog adherence scores |
US11941065B1 (en) | 2019-09-13 | 2024-03-26 | Experian Information Solutions, Inc. | Single identifier platform for storing entity data |
US11055208B1 (en) | 2020-01-07 | 2021-07-06 | Allstate Insurance Company | Systems and methods for automatically assessing and conforming software development modules to accessibility guidelines in real-time |
US11836439B2 (en) | 2021-11-10 | 2023-12-05 | Siteimprove A/S | Website plugin and framework for content management services |
US11397789B1 (en) | 2021-11-10 | 2022-07-26 | Siteimprove A/S | Normalizing uniform resource locators |
US11461430B1 (en) | 2021-11-10 | 2022-10-04 | Siteimprove A/S | Systems and methods for diagnosing quality issues in websites |
US11461429B1 (en) | 2021-11-10 | 2022-10-04 | Siteimprove A/S | Systems and methods for website segmentation and quality analysis |
US11687613B2 (en) | 2021-11-12 | 2023-06-27 | Siteimprove A/S | Generating lossless static object models of dynamic webpages |
US11468058B1 (en) | 2021-11-12 | 2022-10-11 | Siteimprove A/S | Schema aggregating and querying system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060179053A1 (en) * | 2005-02-04 | 2006-08-10 | Microsoft Corporation | Improving quality of web search results using a game |
US20080320010A1 (en) * | 2007-05-14 | 2008-12-25 | Microsoft Corporation | Sensitive webpage content detection |
US8589391B1 (en) * | 2005-03-31 | 2013-11-19 | Google Inc. | Method and system for generating web site ratings for a user |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6643641B1 (en) | 2000-04-27 | 2003-11-04 | Russell Snyder | Web search engine with graphic snapshots |
US7284008B2 (en) * | 2000-08-30 | 2007-10-16 | Kontera Technologies, Inc. | Dynamic document context mark-up technique implemented over a computer network |
US20030236721A1 (en) | 2002-05-21 | 2003-12-25 | Plumer Edward S. | Dynamic cost accounting |
US7392474B2 (en) * | 2004-04-30 | 2008-06-24 | Microsoft Corporation | Method and system for classifying display pages using summaries |
US7788132B2 (en) * | 2005-06-29 | 2010-08-31 | Google, Inc. | Reviewing the suitability of Websites for participation in an advertising network |
US9286388B2 (en) * | 2005-08-04 | 2016-03-15 | Time Warner Cable Enterprises Llc | Method and apparatus for context-specific content delivery |
US8769673B2 (en) * | 2007-02-28 | 2014-07-01 | Microsoft Corporation | Identifying potentially offending content using associations |
- 2011-06-01: US application US13/151,146, patent US8732017B2 (Active)
- 2014-02-19: US application US14/184,264, patent US20140379443A1 (Pending)
Non-Patent Citations (1)
Title |
---|
How to Write Advertisements that Sell, author unknown, from System, the magazine of Business, dated 1912, downloaded from http://library.duke.edu/digitalcollections/eaa_Q0050/ on 21 February 2015 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180253661A1 (en) * | 2017-03-03 | 2018-09-06 | Facebook, Inc. | Evaluating content for compliance with a content policy enforced by an online system using a machine learning model determining compliance with another content policy |
US11023823B2 (en) * | 2017-03-03 | 2021-06-01 | Facebook, Inc. | Evaluating content for compliance with a content policy enforced by an online system using a machine learning model determining compliance with another content policy |
CN109101502A (en) * | 2017-06-20 | 2018-12-28 | 阿里巴巴集团控股有限公司 | Flow configuration method, switching method, and device for a page |
Also Published As
Publication number | Publication date |
---|---|
US8732017B2 (en) | 2014-05-20 |
US20120010927A1 (en) | 2012-01-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8732017B2 (en) | | Methods, systems, and media for applying scores and ratings to web pages, web sites, and content for safe and effective online advertising |
US11868375B2 (en) | | Method, medium, and system for personalized content delivery |
US11061946B2 (en) | | Systems and methods for cross-media event detection and coreferencing |
US20110047006A1 (en) | | Systems, methods, and media for rating websites for safe advertising |
US8812494B2 (en) | | Predicting content and context performance based on performance history of users |
US9223849B1 (en) | | Generating a reputation score based on user interactions |
JP6130609B2 (en) | | Client-side search templates for online social networks |
US9324112B2 (en) | | Ranking authors in social media systems |
US8412648B2 (en) | | Systems and methods of making content-based demographics predictions for website cross-reference to related applications |
US20130124653A1 (en) | | Searching, retrieving, and scoring social media |
US9311599B1 (en) | | Methods, systems, and media for identifying errors in predictive models using annotators |
US8732015B1 (en) | | Social media pricing engine |
US20120016642A1 (en) | | Contextual-bandit approach to personalized news article recommendation |
US20120102027A1 (en) | | Compatibility Scoring of Users in a Social Network |
US20160239865A1 (en) | | Method and device for advertisement classification |
CN110597962B (en) | | Search result display method and device, medium and electronic equipment |
US9779169B2 (en) | | System for ranking memes |
WO2018130201A1 (en) | | Method for determining associated account, server and storage medium |
CN107526718B (en) | | Method and device for generating text |
US8838435B2 (en) | | Communication processing |
WO2019055654A1 (en) | | Systems and methods for cross-media event detection and coreferencing |
US20160055521A1 (en) | | Methods, systems, and media for reviewing content traffic |
AlMansour et al. | | A model for recalibrating credibility in different contexts and languages-a twitter case study |
Margaris et al. | | Improving Collaborative Filtering's Rating Prediction Accuracy by Considering Users' Rating Variability |
US20230041339A1 (en) | | Method, device, and computer program product for user behavior prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: SILICON VALLEY BANK, CALIFORNIA Free format text: SECURITY INTEREST;ASSIGNOR:INTEGRAL AD SCIENCE, INC.;REEL/FRAME:043305/0443 Effective date: 20170719 |
| | AS | Assignment | Owner name: ADSAFE MEDIA, LTD., NEW YORK Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ATTENBERG, JOSHUA M.;PROVOST, FOSTER J.;SIGNING DATES FROM 20110831 TO 20110921;REEL/FRAME:045848/0059 |
| | AS | Assignment | Owner name: INTEGRAL AD SCIENCE, INC., NEW YORK Free format text: CHANGE OF NAME;ASSIGNOR:ADSAFE MEDIA, LTD.;REEL/FRAME:046245/0517 Effective date: 20121221 |
| | AS | Assignment | Owner name: GOLDMAN SACHS BDC, INC., AS COLLATERAL AGENT, NEW YORK Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:INTEGRAL AD SCIENCE, INC.;REEL/FRAME:046594/0001 Effective date: 20180719 |
| | AS | Assignment | Owner name: INTEGRAL AD SCIENCE, INC., NEW YORK Free format text: TERMINATION AND RELEASE OF INTELLECTUAL PROPERTY SECURITY AGREEMENT;ASSIGNOR:SILICON VALLEY BANK;REEL/FRAME:046615/0943 Effective date: 20180716 |
| | STCV | Information on status: appeal procedure | Free format text: NOTICE OF APPEAL FILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STCV | Information on status: appeal procedure | Free format text: NOTICE OF APPEAL FILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STCV | Information on status: appeal procedure | Free format text: NOTICE OF APPEAL FILED |
| | AS | Assignment | Owner name: INTEGRAL AD SCIENCE, INC., NEW YORK Free format text: RELEASE OF SECURITY INTEREST IN PATENT COLLATERAL AT REEL/FRAME NO. 46594/0001;ASSIGNOR:GOLDMAN SACHS BDC, INC., AS COLLATERAL AGENT;REEL/FRAME:057673/0706 Effective date: 20210929 |
| | AS | Assignment | Owner name: PNC BANK, NATIONAL ASSOCIATION, AS ADMINISTRATIVE AGENT, PENNSYLVANIA Free format text: PATENT SECURITY AGREEMENT;ASSIGNOR:INTEGRAL AD SCIENCE, INC.;REEL/FRAME:057673/0653 Effective date: 20210929 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION COUNTED, NOT YET MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STCV | Information on status: appeal procedure | Free format text: NOTICE OF APPEAL FILED |