WO2013039789A1 - System and method for indirectly classifying a computer based on usage - Google Patents

Publication number
WO2013039789A1
WO2013039789A1 PCT/US2012/054262 US2012054262W WO2013039789A1 WO 2013039789 A1 WO2013039789 A1 WO 2013039789A1 US 2012054262 W US2012054262 W US 2012054262W WO 2013039789 A1 WO2013039789 A1 WO 2013039789A1
Authority
WO
WIPO (PCT)
Prior art keywords
web
web request
user computer
computer
instructions
Application number
PCT/US2012/054262
Other languages
French (fr)
Inventor
Simon Michael Rowe
Original Assignee
Google Inc.
Application filed by Google Inc.
Priority to EP20120832565 (EP2758892A4)
Priority to JP2014530707A (JP6165734B2)
Priority to CA2848472A (CA2848472C)
Priority to KR1020147009864A (KR102021062B1)
Priority to CN201280054521.4A (CN103917969A)
Publication of WO2013039789A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0241 Advertisements

Definitions

  • the disclosed embodiments relate generally to web browsing activity, and more specifically to classifying a user computer based on that web browsing activity.
  • Disclosed embodiments provide methods to classify a user's computer, and thereby enable a web server to provide better information to the user of the computer.
  • each computer is classified as a "home" computer, a "work" computer, a "mobile" computer, or a smart phone. Once this classification is made, a web server responding to a request can provide more relevant or better targeted information. For example, knowing that a computer is used at work can enable better selection of
  • This information also makes it possible to make suggestions to users as to other content of interest.
  • the classification of a user's computer also enables providing valuable information to advertisers, such as the viewing and Internet behaviors of different viewer segments.
  • classification of a user's computer is implemented on a server with one or more processors and memory.
  • the memory stores programs that are executed by the processors.
  • the server computer system receives a plurality of web request events corresponding to web requests issued by users. Each web request event includes: (i) a cookie that identifies the user computer that originated the corresponding web request; (ii) an IP address corresponding to the user computer at the time the web request was issued; and (iii) a date/time stamp indicating when the corresponding web request was received at a web server.
  • the server computer system stores the web request events.
  • the server system selects a subset of the plurality of web request events. All of the web request events in the subset are associated with a single first cookie.
  • the server system computes a geographical location corresponding to the user computer, where the computation uses the IP address associated with the web request event.
  • the server system determines the local time and local day of week corresponding to the web request using the stored date/time stamp of the web request event and the computed geographic location.
  • the server system then classifies the user computer based, at least in part, on a usage pattern of the local time and local day of week data corresponding to the web request events in the subset.
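The event fields and the per-cookie subset selection summarized above might be modeled as follows. This is a minimal sketch, not the disclosed implementation: the names `WebRequestEvent` and `group_by_cookie`, the cookie values, and the sample IP addresses (taken from reserved documentation ranges) are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class WebRequestEvent:
    cookie: str            # classification cookie identifying the user computer
    ip_address: str        # IP address associated with the computer at request time
    received_at: datetime  # date/time stamp assigned when the request was received


def group_by_cookie(events):
    """Return a dict mapping each cookie to the list of events that carry it."""
    subsets = defaultdict(list)
    for event in events:
        subsets[event.cookie].append(event)
    return dict(subsets)


# Hypothetical sample data.
events = [
    WebRequestEvent("cookie-A", "203.0.113.7", datetime(2012, 9, 7, 17, 30, tzinfo=timezone.utc)),
    WebRequestEvent("cookie-B", "198.51.100.9", datetime(2012, 9, 7, 18, 0, tzinfo=timezone.utc)),
    WebRequestEvent("cookie-A", "203.0.113.7", datetime(2012, 9, 8, 2, 15, tzinfo=timezone.utc)),
]
subsets = group_by_cookie(events)
```

Each value in `subsets` corresponds to one subset of events associated with a single cookie, which is the unit the later location, local-time, and classification steps operate on.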
  • Figure 1 is a block diagram of a system that classifies user computers in accordance with some embodiments.
  • Figure 2 is a functional block diagram of a client computer in accordance with some embodiments.
  • Figure 3 is a functional block diagram of a log server in accordance with some embodiments.
  • Figure 4 is a functional block diagram of a web server in accordance with some embodiments.
  • Figure 5 is an exemplary screen shot viewed by a panelist who participates in a research panel in accordance with some embodiments.
  • Figures 6 - 7 are exemplary screen shots of programs used to manage a research panel in accordance with some embodiments.
  • Figure 8 illustrates a process used to generate and correlate survey information from panelists according to some embodiments.
  • Figures 9A-B illustrate an exemplary process flow according to some embodiments.
  • Embodiments illustrated in Figure 1 can be used to classify user computers
  • a user computer 200 can be any electronic device that runs a web browser and has access to the Internet. Examples include desktop computers, laptop computers, tablet computers, and many handheld devices such as smart phones.
  • the classifications of user computers are "home computer," "work computer," "mobile computer," and "smart phone." In some embodiments there are more or fewer classifications of computers, such as a single classification that combines "mobile computers" (e.g., laptop computers) and "smart phones."
  • the various user computers 200, web servers 400, and log servers 300 communicate over a communications network 100, such as the Internet, local area networks, wide area networks, wireless networks, etc.
  • a web server 400 provides responses to user requests, as described more fully below with respect to Figure 4.
  • a log server 300 maintains a log 320 of user web requests 322. In some embodiments, the log server 300 includes modules to perform calculations based on the information in the log 320. This is described more fully below in Figure 3.
  • FIG. 2 illustrates a typical client computer 200.
  • a client computer 200 generally includes one or more processing units (CPUs) 202, one or more network or other communications interfaces 204, memory 214, and one or more communication buses 212 for interconnecting these components.
  • the communication buses 212 may include circuitry
  • a client computer 200 includes a user interface 206, for instance a display 208 and one or more input devices 210, such as a keyboard and a mouse.
  • Memory 214 may include high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices.
  • Memory 214 may include mass storage that is remotely located from the central processing unit(s) 202.
  • Memory 214, or alternately the non-volatile memory device(s) within memory 214, comprises a computer readable storage medium.
  • memory 214 or the computer readable storage medium of memory 214 stores the following programs, modules and data structures, or a subset thereof:
  • an operating system 216 (e.g., WINDOWS or MAC OS X) that generally includes procedures for handling various basic system services and for performing hardware dependent tasks;
  • a network communications module 218 that is used for connecting the client computer 200 to servers or other computing devices via one or more communication networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and the like;
  • each browser is associated with a unique browser agent 330;
  • cookies 222 which provide persistent data for web sites visited by a household member 118 at the client computer 200.
  • there is a special classification cookie which uniquely identifies the computer where the cookie is stored.
  • a single classification cookie is used by multiple web pages.
  • the log server 300 generally includes one or more processing units (CPUs) 302, one or more network or other communications interfaces 304, memory 314, and one or more communication buses 312 for interconnecting these components.
  • the communication buses 312 may include circuitry (sometimes called a chipset) that interconnects and controls communications between system components.
  • the log server 300 may optionally include a user interface 306, for instance a display 308 and a keyboard 310.
  • Memory 314 may include high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices.
  • Memory 314 may include mass storage that is remotely located from the central processing unit(s) 302. Memory 314, or alternately the non-volatile memory device(s) within memory 314, comprises a computer readable storage medium. In some embodiments, memory 314 or the computer readable storage medium of memory 314 stores the following programs, modules and data structures, or a subset thereof:
  • an operating system 316 (e.g., Linux or Unix) that generally includes procedures for handling various basic system services and for performing hardware dependent tasks.
  • a network communications module 318 that is used for connecting the log server 300 to servers or other computing devices via one or more communication networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and the like.
  • each web request record includes cookie data 324, which identifies the computer that issued the web request (e.g., a MAC address); the IP address 326 of the user computer that issued the request (generally the IP address of the modem or router used by the user computer); a date/time stamp 328 that specifies when the user request was issued (generally the date/time when the web request was received at a web server 400, or the date/time when the web request information is received at the log server if web request information is forwarded quickly from the web server); and a browser agent 330, which identifies the web browser and version.
  • the web request records 322 include an alternative date/time stamp 332, which is from the user computer 200 that originated the web request. In some embodiments, more or less data is stored with each record in the web request records 322.
  • a date/time conversion module 340, which translates between time zones based on the location of the user computer and the location of the web server responding to the web request or the location of the log server.
  • a location module 342 which computes the geographic location of a user computer based on the IP address 326 assigned to the computer.
  • the correlation between IP address 326 and location may use a database that correlates IP address ranges to specific geographic locations.
  • the correlation database is updated on a periodic basis as new IP address ranges are assigned or IP address ranges are relocated to different geographic locations.
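One way to picture the correlation between IP address ranges and locations is a sorted range table searched by binary search. The sketch below is an assumption-laden illustration, not the database the disclosure refers to: the table contents, the `locate` function, and the fixed UTC offsets are hypothetical, and the address ranges are reserved documentation ranges that map to no real place.

```python
import bisect
import ipaddress

# Hypothetical table rows: (first address, last address, place, UTC offset in hours).
_RANGES = sorted(
    (int(ipaddress.ip_address(first)), int(ipaddress.ip_address(last)), place, offset)
    for first, last, place, offset in [
        ("192.0.2.0", "192.0.2.255", "San Francisco, CA", -8),
        ("198.51.100.0", "198.51.100.255", "New York, NY", -5),
        ("203.0.113.0", "203.0.113.255", "Madrid, Spain", 1),
    ]
)
_STARTS = [row[0] for row in _RANGES]


def locate(ip):
    """Return (place, utc_offset_hours) for an IPv4 address, or None if unknown."""
    value = int(ipaddress.ip_address(ip))
    # Find the last range whose first address is <= value.
    index = bisect.bisect_right(_STARTS, value) - 1
    if index < 0:
        return None
    first, last, place, offset = _RANGES[index]
    return (place, offset) if first <= value <= last else None
```

Periodic updates, as described above, would amount to rebuilding the sorted table. A fixed offset per range ignores daylight saving time; a production table would more likely store a time zone name.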
  • a classification module 344 which identifies a classification corresponding to a computer based on a usage pattern for the computer.
  • a computer is classified as a "home computer" if it is used primarily during nonbusiness hours, e.g., only mornings, evenings, or on weekends. Conversely, some embodiments classify a computer as a "work computer" if it is used primarily during business hours (e.g., Monday - Friday, 8:00 - 5:00).
  • a computer is classified as a "smart phone" based on the browser agent. For example, the web browser may be one that is only used on a phone.
  • a computer is classified as a "mobile computer" when none of the other classifications apply. In different geographic regions, different patterns of usage may be applied. For example, standard business hours in San Francisco may be different from standard business hours in Spain.
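The order of these rules can be sketched as a small decision function. Everything specific here is an assumption made for illustration: the `PHONE_MARKERS` substrings, the 80% cutoff used to interpret "primarily," and the function names. The business-hours window follows the Monday - Friday, 8:00 - 5:00 example above.

```python
# Assumed substrings that would mark a phone-only browser agent (illustrative only).
PHONE_MARKERS = ("iPhone", "Windows Phone")
# Assumed cutoff for "primarily": at least 80% of requests fall on one side.
THRESHOLD = 0.8


def in_business_hours(weekday, hour):
    """weekday: 0 = Monday ... 6 = Sunday; hour: local hour of day, 0-23."""
    return weekday < 5 and 8 <= hour < 17


def classify(browser_agent, local_times):
    """Classify from a browser agent string and (weekday, hour) local-time pairs."""
    if any(marker in browser_agent for marker in PHONE_MARKERS):
        return "smart phone"
    if local_times:
        share = sum(in_business_hours(d, h) for d, h in local_times) / len(local_times)
        if share >= THRESHOLD:
            return "work computer"
        if share <= 1 - THRESHOLD:
            return "home computer"
    # None of the other classifications apply.
    return "mobile computer"
```

For example, a desktop browser agent whose requests all fall on weekday mornings and afternoons yields "work computer", while an even mix of weekday and weekend usage falls through to "mobile computer".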
  • Although Figure 3 shows a log server, Figure 3 is intended more as a functional description of the various features which may be present in a set of servers than as a structural schematic of the embodiments described herein.
  • items shown separately could be combined and some items could be separated.
  • some items shown separately in Figure 3 could be implemented on a single server and single items could be implemented by one or more servers.
  • the actual number of servers used to implement a log server, and how features are allocated among them will vary from one implementation to another, and may depend in part on the amount of data traffic that the system must handle during peak usage periods as well as during average usage periods.
  • a web server 400 generally includes one or more processing units (CPUs) 402, one or more network or other communications interfaces 404, memory 414, and one or more communication buses 412 for interconnecting these components.
  • the communication buses 412 may include circuitry (sometimes called a chipset) that interconnects and controls communications between system components.
  • the web server 400 may optionally include a user interface 406, for instance a display 408 and a keyboard 410.
  • Memory 414 may include high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices.
  • Memory 414 may include mass storage that is remotely located from the central processing unit(s) 402. Memory 414, or alternately the non-volatile memory device(s) within memory 414, comprises a computer readable storage medium. In some embodiments, memory 414 or the computer readable storage medium of memory 414 stores the following programs, modules and data structures, or a subset thereof:
  • an operating system 416 (e.g., Linux or Unix) that generally includes procedures for handling various basic system services and for performing hardware dependent tasks.
  • a network communications module 418 that is used for connecting the web server 400 to servers or other computing devices via one or more communication networks 100, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and the like.
  • an HTTP server module 420 (e.g., Apache Tomcat)
  • HTTPS (HyperText Transfer Protocol Secure)
  • FTP (File Transfer Protocol)
  • a web page repository 422 which includes web pages that may be requested by users.
  • Web pages may be static, or dynamically constructed as requested. In many cases, web pages include certain fixed content and certain dynamic content that is filled in while responding to the user request.
  • web request storage 424 which holds information about web requests until forwarded to a log server 300.
  • web request storage is volatile memory, because the web requests are forwarded on to the log server 300 immediately.
  • web request storage 424 is a database, and web request records 322 are accumulated for some period of time before being transmitted to a log server 300.
  • the web request records 322 stored at an individual web server are removed after transmission to a log server 300.
  • the web request records 322 are retained at the web server.
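The storage variants described above (immediate forwarding, accumulation before transmission, and removal or retention after transmission) can be sketched with a single buffer class. This is a hedged illustration only: `WebRequestBuffer`, its parameters, and the `send` callback standing in for transmission to the log server are hypothetical.

```python
class WebRequestBuffer:
    """Holds web request records until they are forwarded to a log server."""

    def __init__(self, send, batch_size=1, retain=False):
        self._send = send              # callable that transmits a list of records
        self._batch_size = batch_size  # 1 forwards each record immediately
        self._retain = retain          # keep a local copy after transmission
        self._pending = []
        self.archive = []

    def add(self, record):
        self._pending.append(record)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        if not self._pending:
            return
        batch = list(self._pending)
        self._send(batch)
        if self._retain:
            self.archive.extend(batch)
        # Transmitted records leave the pending store in either mode.
        self._pending.clear()


sent = []
buffer = WebRequestBuffer(sent.append, batch_size=2)
buffer.add("request-1")   # held: batch not yet full
buffer.add("request-2")   # batch full: both records are forwarded together
```

With `batch_size=1` the buffer behaves like the volatile, forward-immediately variant; a larger `batch_size` approximates accumulating records for some period before transmission.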
  • a date/time module 426 which assigns a date/time stamp to web requests as they are received.
  • a highly stable time clock system is utilized to guarantee accuracy of the generated date/time stamps.
  • Although Figure 4 shows a web server, Figure 4 is intended more as a functional description of the various features which may be present in a set of servers than as a structural schematic of the embodiments described herein.
  • items shown separately could be combined and some items could be separated.
  • some items shown separately in Figure 4 could be implemented on a single server and single items could be implemented by one or more servers.
  • the actual number of servers used to implement a web server, and how features are allocated among them will vary from one implementation to another, and may depend in part on the amount of data traffic that the system must handle during peak usage periods as well as during average usage periods.
  • Each of the methods described herein may be performed by instructions that are stored in a computer readable storage medium and that are executed by one or more processors of one or more servers or clients.
  • Each of the operations shown in Figures 1 - 4 may correspond to instructions stored in a computer memory or computer readable storage medium.
  • a log server comprises just a database engine, and the analytic operations of the Date/Time module 340, Location Module 342, and Classification Module 344 are implemented on a third server.
  • the Date/Time Module 426 is implemented on the log server, so that a Date/Time Module is not required on each individual web server. This configuration for the Date/Time Module 426 works when information about web requests is forwarded to the log server 300 quickly.
  • a "single source panel" is an all-in-one ratings system that measures viewership across television and the Internet.
  • Surveys are one way to get ratings information. For example, a sample page of an exemplary survey is depicted in Figure 5.
  • In a typical survey tool 500, there are three ads 504 that were actually shown on a computer, and another three ads that were not shown (used as a control group).
  • a consumer completes the survey in 2 - 3 minutes, and indicates which ads were remembered.
  • the tool 500 asks (502) questions of the panelist, and instructs (502) the panelist how to use the tool.
  • the panelist answers (508) yes or no to each question 506.
  • the panelist can return (510) to a previous ad, or submit (512) the survey after answering all of the questions.
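The survey composition described above (ads that were actually shown plus control ads that were not) could be assembled roughly as follows. This is a minimal sketch under stated assumptions: `build_survey`, the ad identifiers, and the use of random sampling to pick ads are illustrative choices, not details given in the source.

```python
import random


def build_survey(shown_ads, all_ads, per_group=3, rng=None):
    """Return a shuffled list of (ad_id, was_shown) pairs for one panelist."""
    if rng is None:
        rng = random.Random()
    shown = set(shown_ads)
    not_shown = [ad for ad in all_ads if ad not in shown]
    # random.sample raises ValueError if either pool has fewer than per_group ads.
    exposed = rng.sample(sorted(shown), per_group)
    control = rng.sample(not_shown, per_group)
    questions = [(ad, True) for ad in exposed] + [(ad, False) for ad in control]
    rng.shuffle(questions)  # avoid revealing which ads are controls by position
    return questions


survey = build_survey(
    shown_ads=["a1", "a2", "a3", "a4"],
    all_ads=["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"],
    rng=random.Random(0),
)
```

Comparing each yes/no answer against the `was_shown` flag then separates genuine recall from false positives on the control ads.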
  • the selected panelists must be within the target population (e.g., in the right demographic segment and having an appropriate viewing history and/or set of interests) and they must be willing to complete the survey tools.
  • identifying appropriate panelists has required the use of lengthy screening procedures, including written and/or online questionnaires, and incentives for panelists to complete the surveys. While these approaches may be effective for small panels, they are cumbersome if used for large panels (for example, to generate surveys for those large panels).
  • Information from large panels is especially useful because it can be aggregated and used to spot trends among large groups of users. The information can also provide helpful information for targeting ads and content for large numbers of individuals.
  • a method implemented at a log server 300 can conduct searches on web request records 322 for individuals to generate survey tools 500 for individual panelists and identify likely panelists. For example, likely panelists can be identified based on records of click-through history for web ads, and logs of media content viewed. Once identified, the surveyor can interact with the panelists and prospective panelists through additional status and approval screens, such as the screens illustrated in Figures 6 and 7. Panelist feedback can also be associated with the IP address 326, and used to tailor subsequent survey tools. In some embodiments, this process can also be used to identify and place panelists or computers on a white list, indicating clearance and willingness to participate in surveys.
  • Figure 6 illustrates a Survey Panel Ad Approval Tool 600, which is an interactive web form.
  • the form 600 instructs (602) a panelist how to use the form.
  • the form displays one or more media excerpts 604, which can be film clips, individual frames from film clips, or single images (e.g., from a web-based advertisement).
  • Each media excerpt advertisement 604 has a corresponding Ad ID 606, and a corresponding status 608.
  • the status indicates the audience for which the panelist considers the ad to be approved (e.g., what ages, such as "all," "18+," or "21+").
  • form 600 includes an indicator 610 of the last (most recent) status change.
  • the information on the last status change includes the name 612 of the panelist who made the change, and the date/time 614 when the change was made.
  • Some embodiments also include change notes 616, which provide free-form space 618 for the panelist to write additional notes.
  • Figure 7 illustrates a Problem History and Ad Approval tool 700, which is used in some embodiments to track both approvals as well as problems identified by panelists.
  • the form 700 displays an advertisement 702, together with the corresponding Ad ID 704, which uniquely identifies the advertisement.
  • the advertisement 702 is displayed together with a question 706 asking the panelist whether the advertisement 702 had been seen.
  • there is space 708 to answer the question which may be implemented as a pair of yes / no radio buttons.
  • the form includes a control (not shown) to move forward or backward in the set of questions.
  • the form 700 includes a problem history section 710, which enables panelists to report problems with the survey questions.
  • a panelist can specify the problem category 716 and additional comments 720.
  • the form automatically fills in the date 714 during entry or when saved.
  • the panelist specifies the problem category 718 (e.g., "other" in the illustrated embodiment) and comments 722.
  • the ad approval section 724 in form 700 is similar to Figure 6. Because Figure 7 illustrates a combination problem history / ad approval form 700, there is less space available for ad approval.
  • the approval information corresponds to the one ad 702 shown on the form 700.
  • the Ad ID 726 is repeated (duplicating Ad ID 704). Some embodiments omit this repetition.
  • the ad approval section 724 also includes the approval status 728.
  • the approval status 728 may indicate an age range for which the panelist believes the ad is appropriate.
  • the ad approval section 724 includes the last approval status change 730, which typically includes the date 732, the changed status 736, the reason 740 for the change, and the name 744 of the panelist making the change.
  • the values for these fields may be displayed below the field labels.
  • the date and panelist name are filled in automatically by the form.
  • the reason 740 is optional, and thus the reason field 742 may be blank as illustrated in form 700.
  • Figure 8 shows an exemplary process flow for a process that automatically generates survey tools.
  • This flow employs a log file, which is an XML file in this example.
  • the log file represents TV viewing and web usage data for one or more households to generate ad viewing information for panelists and to identify panelists for a particular survey from white list information.
  • the flow then generates new surveys for the white listed panelists.
  • the flow automatically generates appropriate ads for surveying each enrolled household (including already viewed ads and control ads).
  • When the illustrated process flow 800 in Figure 8 starts (802), two independent operations occur.
  • the status of existing surveys from the market research system is generated (804).
  • the generation (804) includes creating an XML file of panelists and survey results.
  • the process flow 800 generates (806) a survey results file for existing surveys.
  • the survey results file may include partially completed surveys, or may be limited to surveys that are fully complete.
  • some embodiments process (810) the created panel XML file. While processing (810) the panel XML file, the process flow 800 generates (812) ad viewing information for the panelists and processes (814) the panelist white list from the market research system. In some embodiments, the panelist white list identifies TV subscribers who meet the eligibility requirements (e.g., having appropriate hardware and software). Using the ad viewing information and the panelist white list, the process 800 generates (816) new surveys for panelists. In addition, the process flow 800 processes (808) the survey status updates from the market research system. The process flow 800 ends (818) when the survey status updates are processed (808), the new surveys for panelists are generated (816), and the survey results for existing surveys are generated (806).
  • Figures 9A and 9B illustrate an exemplary process flow according to some embodiments.
  • the method 900 classifies (902) a user computer without explicit input from a user. Rather than directly asking the user to classify the computer, the method 900 here classifies the computer indirectly based on an assessment of the user's web activity.
  • This indirect methodology has several beneficial aspects. First, the determination is made without placing a burden on a user or interrupting a user's normal tasks. Second, because the classification is based on actual usage, it is less subject to incorrect classification by the user. As illustrated in more detail below, some embodiments classify the computer based on when a user is accessing the computer. Both the time of day and the day of the week provide information about where the computer is being used. In other embodiments, the classification is based on other information, such as the websites accessed by the user.
  • the web requests are logged, either at the web server 400 processing the request, or a separate log server 300, or both.
  • the log server 300 receives (904) a plurality of web request events corresponding to web requests issued by users.
  • a web request is the actual request issued by a user, such as an HTTP request for a web page or a search query.
  • a web request event comprises information about a web request.
  • Each web request event 322 includes (906) a cookie 324 that identifies the user computer that originated the corresponding web request.
  • the cookie 324 is referred to herein as a classification cookie because it is used in the classification process 900.
  • the classification cookie 324 uniquely identifies the computer that originated the request, using a unique identifier, such as a MAC address.
  • the unique computer identifier is generated by the log server 300 (or other central server).
  • the classification cookie 324 is reused for multiple web requests, allowing the log server to correlate multiple requests from the same user computer.
  • Each web request event 322 also includes (908) the IP address 326 corresponding to the user computer at the time the web request was issued.
  • the IP address 326 is generally not the IP address assigned to the user computer because web requests are generally sent over an internal network before being routed over the Internet. IP addresses from the internal network provide no information about the geographic location of a computer because the same internal IP addresses can be used everywhere. For example, many computers throughout the world are assigned an IP address of 192.168.1.5. Therefore, the relevant IP address 326 is the external IP address of the router that handles web traffic from the user computer. Because the IP address 326 is the one generally associated with a router or modem that connects to the Internet, the same IP address generally applies to the web request events 322 from multiple user computers. This is okay here because the IP address 326 will be used to establish a geographic location, not to uniquely identify the user computer 200.
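The distinction between internal and external addresses can be checked with Python's standard `ipaddress` module. The helper name `usable_for_geolocation` is an assumption introduced for this sketch; the source does not describe such a check.

```python
import ipaddress


def usable_for_geolocation(ip):
    """True when the address is not from a private (internal network) range."""
    return not ipaddress.ip_address(ip).is_private


# 192.168.1.5 is a private address reused on many internal networks,
# so it carries no geographic information.
internal_ok = usable_for_geolocation("192.168.1.5")
# A publicly routable address can be mapped to a location.
external_ok = usable_for_geolocation("8.8.8.8")
```

Here `internal_ok` is `False` and `external_ok` is `True`, which mirrors why the logged IP address 326 is the router's external address rather than the computer's internal one.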
  • the classification cookie 324 uniquely identifies the user computer 200.
  • the IP address 326 corresponding to a user computer is subject to change, and thus an IP address is associated with each transaction.
  • the relevant IP address 326 is the IP address of the router or modem that connects the user computer to the Internet, and does not change frequently. However, a new IP address for the modem or router may be assigned when the modem or router is rebooted.
  • Each web request event 322 includes (910) a date/time stamp 328 indicating when the corresponding web request event was received at the web server.
  • the date/time at the web server 400 is effectively a surrogate for the date/time the request was issued at the user computer. Since there is very little delay between sending the request and receiving the request at the web server 400, the two date/time stamps are effectively the same.
  • the clock on the user computer is both unreliable and potentially inconsistent with the actual time. For example, a user computer originally configured to be on Pacific Time may be physically located in a different time zone, such as Eastern Time.
  • the date/time stamp 328 is the date/time that the web request event is received at the log server 300.
  • These embodiments have the advantage that only the clock on the log server 300 is required to be accurate, instead of the clocks on a potentially large number of web servers 400. As long as the web request events 322 are forwarded directly to the log server, the time at the log server 300 reflects when the web request was issued.
  • each web request event 322 includes (912) information identifying the browser agent 330 corresponding to the user computer 200.
  • the browser agent 330 identifies the web browser that issued the web request, and generally includes the version number of the web browser as well.
  • the log server 300 stores (914) the web request events 322 in a web access log
  • After collecting and storing web request events 322, the log server 300 begins a series of operations to classify user computers corresponding to the web request events.
  • the log server 300 selects (916) a subset of the stored web request events 322 that are all associated with a single classification cookie 324. Because the classification cookie uniquely identifies a user computer, all of the web request events in the subset are associated with a single user computer 200.
  • For each (918) web request event 322 in the subset, the log server 300 computes (920) a geographical location corresponding to the user computer 200 using the IP address 326. In some embodiments, the computation uses a lookup table or dictionary that associates ranges of IP addresses with geographical locations. One objective is to establish the time zone that is applicable to the location of the user computer, and thus a precise location is not required. E.g., for locations in the United States, it is typically enough to know the city and state where IP addresses are assigned.
  • the log server 300 uses the stored date/time 328 and the computed geographic location of the user computer 200 to determine (922) the local time and day of week corresponding to the web request. Because the time zone of the web server 400 (or log server 300) is known, and the time zone of the user computer 200 is known from the computed geographic location, the offset between the two time zones is determined. This offset (which can be negative) is added to the date/time stamp 328 to determine the local date and time that the web request was issued. Using a calendar, the local date corresponds to a unique day of the week.
  • an additional originating date/time stamp 332 is also collected and stored (914). This alternate date/time stamp 332 identifies the date/time of the user computer 200 when the web request was issued. Although this alternate date/time stamp 332 may be less reliable than the determination (922) described above, it can be used when the IP address 326 is insufficient to determine the geographic location of the user computer 200.
  • the log server 300 then classifies (924) the user computer 200 based, at least in part, on a usage pattern of the local time and local day of week data corresponding to the web request events 322 in the subset.
  • classifying the computer includes assigning (926) a computer classification to the user computer 200.
  • the user computer 200 may be classified (928) as a home computer, work computer, or mobile computer.
  • the user computer 200 is classified (934) as a work computer when the web request events substantially occur during normal business hours. For example, normal business hours may be (936) 9:00 AM to 5:00 PM, Monday through Friday.
  • the user computer 200 is classified (938) as a home computer when the web requests substantially occur at times other than normal business hours.
  • normal business hours are (940) Monday to Friday, 9:00 AM to 5:00 PM in some embodiments.
  • normal business hours are defined regionally. Normal business hours are different for different countries, and normal business hours even vary within countries. For example, in California a normal work week is 40 hours, whereas many places on the east coast of the United States have a 37.5 hour standard work week.
  • a user computer 200 is classified (942) as a mobile computer when substantial portions of the web requests occur both during normal business hours and during hours other than normal business hours. For example, a user may have a laptop computer that is used both at the office and at home.
  • a classification label other than "mobile computer" is used to classify computers that are used both at home and for work. For example, a person who works at home may have a single computer that is used both for work and for personal activities. Because such a "mixed-use" computer need not be mobile, some embodiments refer to these computers as "mixed-use."
  • the log server 300 classifies (932) the user computer
  • the browser agent 330 identifies the web browser that is running on the device 200, and generally identifies the browser version as well.
  • the log server 300 classifies (930) the user computer 200 as a mobile phone. The classification of a user computer 200 as a mobile phone uses the browser agent information 330 in some embodiments.
  • a web server 400 receives (944) a subsequent request for a web page from the user computer 200.
  • the web server 400 selects (946) an information item for the web page, where the selection is based, at least in part, on the classification of the user computer 200.
  • An information item can be an advertisement, a search result, a press release or other news announcement, or any other content for the web page that is dynamically generated at run time.
  • the web server 400 then returns (948) the web page with the selected information item.
  • the selection of the information item is performed by the log server 300, or another server (not depicted in Figure 1).
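The usage-based classification (924) described above can be illustrated with a minimal sketch. It assumes the example business hours given in the text (9:00 AM to 5:00 PM, Monday through Friday). The function names and the 0.8 cut-off standing in for "substantially" are hypothetical, since the source does not state a numeric threshold.

```python
from datetime import datetime

# Example business hours taken from the text: 9:00 AM to 5:00 PM, Monday-Friday.
BUSINESS_START_HOUR = 9
BUSINESS_END_HOUR = 17
# Hypothetical cut-off for "substantially"; the source gives no number.
SUBSTANTIAL = 0.8


def in_business_hours(local: datetime) -> bool:
    """Return True when a local timestamp falls within the example business hours."""
    is_weekday = local.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and BUSINESS_START_HOUR <= local.hour < BUSINESS_END_HOUR


def classify_by_usage(local_times: list[datetime]) -> str:
    """Classify one user computer from the local times of its web requests."""
    if not local_times:
        return "unknown"
    share = sum(in_business_hours(t) for t in local_times) / len(local_times)
    if share >= SUBSTANTIAL:
        return "work computer"
    if share <= 1 - SUBSTANTIAL:
        return "home computer"
    # Substantial activity both inside and outside business hours.
    return "mobile computer"
```

The input is expected to be the local times for a single classification cookie, already converted from the stored date/time stamps. Regional business hours could be supported by passing the hour range as a parameter instead of using module constants.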

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A server computer receives web request events corresponding to web requests issued by users. Each web request event includes: a cookie that identifies the user computer that originated the corresponding web request; an IP address corresponding to the user computer; and a date/time stamp indicating when the corresponding web request was received at a web server. The server stores the web request events. The server selects a subset of the web request events, all of which are associated with the same cookie. Then the server computes a geographical location corresponding to the user computer, where the computation uses the IP address associated with the web request event. The server determines the local time and day of week corresponding to the web request. The server then classifies the user computer based, at least in part, on a usage pattern corresponding to the web request events in the subset.

Description

SYSTEM AND METHOD FOR INDIRECTLY CLASSIFYING A
COMPUTER BASED ON USAGE
TECHNICAL FIELD
[0001] The disclosed embodiments relate generally to web browsing activity, and more specifically to classifying a user computer based on that web browsing activity.
BACKGROUND
[0002] Users access a wide variety of web sites over the Internet. In general, the web server receiving a request has little or no knowledge about the user, and thus the response must be made generically. In some instances, a user explicitly provides information, such as responding to online questions. Usage of user-provided information is subject to error, and is burdensome to users. In other instances, information about a user is collected over time based on previous activity.
[0003] However, even with some information about a user, a web server typically has little information about the user's computer.
SUMMARY OF THE INVENTION
[0004] Disclosed embodiments provide methods to classify a user's computer, and thereby enable a web server to provide better information to the user of the computer. In some embodiments, each computer is classified as a "home" computer, a "work" computer, a "mobile" computer, or a smart phone. Once this classification is made, a web server responding to a request can provide more relevant or better targeted information. For example, knowing that a computer is used at work can enable better selection of advertisements for web pages or better selection of search results responsive to a user query. This information also makes it possible to make suggestions to users as to other content of interest. The classification of a user's computer also enables providing valuable information to advertisers, such as the viewing and Internet behaviors of different viewer segments.
[0005] In some embodiments, classification of a user's computer is implemented on a server with one or more processors and memory. The memory stores programs that are executed by the processors. The server computer system receives a plurality of web request events corresponding to web requests issued by users. Each web request event includes: (i) a cookie that identifies the user computer that originated the corresponding web request; (ii) an IP address corresponding to the user computer at the time the web request was issued; and (iii) a date/time stamp indicating when the corresponding web request was received at a web server. The server computer system stores the web request events. The server system selects a subset of the plurality of web request events. All of the web request events in the subset are associated with a single first cookie. Then, for each web request event in the subset, the server system computes a geographical location corresponding to the user computer, where the computation uses the IP address associated with the web request event. The server system determines the local time and local day of week corresponding to the web request using the stored date/time stamp of the web request event and the computed geographic location. The server system then classifies the user computer based, at least in part, on a usage pattern of the local time and local day of week data corresponding to the web request events in the subset.
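The determination of local time and local day of week can be illustrated with a short sketch. It assumes that the time zones of the server and of the user computer are each available as a fixed UTC offset in hours, which ignores daylight saving transitions. The function name `to_local` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def to_local(stamp: datetime,
             server_utc_offset_hours: float,
             user_utc_offset_hours: float) -> tuple[datetime, str]:
    """Shift a server-side date/time stamp into the user computer's local time.

    The offset between the two time zones (which can be negative) is added
    to the stamp; the day of week then follows from the local date.
    """
    offset = timedelta(hours=user_utc_offset_hours - server_utc_offset_hours)
    local = stamp + offset
    return local, DAY_NAMES[local.weekday()]


# A request stamped Friday 10:30 PM on a UTC-8 server, issued from a UTC-5 location.
local, day = to_local(datetime(2012, 9, 7, 22, 30), -8, -5)
print(local, day)
```

In this example the three-hour offset moves the request from Friday evening to early Saturday morning, which is why the conversion matters for a classification based on business hours.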
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] Figure 1 is a block diagram of a system that classifies user computers in accordance with some embodiments.
[0007] Figure 2 is a functional block diagram of a client computer in accordance with some embodiments.
[0008] Figure 3 is a functional block diagram of a log server in accordance with some embodiments.
[0009] Figure 4 is a functional block diagram of a web server in accordance with some embodiments.
[0010] Figure 5 is an exemplary screen shot viewed by a panelist who participates in a research panel in accordance with some embodiments.
[0011] Figures 6 - 7 are exemplary screen shots of programs used to manage a research panel in accordance with some embodiments.
[0012] Figure 8 illustrates a process used to generate and correlate survey information from panelists according to some embodiments.
[0013] Figures 9A-B illustrate an exemplary process flow according to some embodiments.
[0014] Like reference numerals refer to corresponding parts throughout the several views of the drawings.
DESCRIPTION OF EMBODIMENTS
[0015] Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
[0016] Embodiments illustrated in Figure 1 can be used to classify user computers 200. As used herein, a user computer 200 can be any electronic device that runs a web browser and has access to the Internet. Examples include desktop computers, laptop computers, tablet computers, and many handheld devices such as smart phones. In some embodiments, the classifications of user computers are "home computer," "work computer," "mobile computer," and "smart phone." In some embodiments there are more or fewer classifications of computers, such as a single classification that combines "mobile computers" (e.g., laptop computers) and "smart phones." As shown, the various user computers 200, web servers 400, and log servers 300 communicate over a communications network 100, such as the Internet, local area networks, wide area networks, wireless networks, etc. A web server 400 provides responses to user requests, as described more fully below with respect to Figure 4. A log server 300 maintains a log 320 of user web requests 322. In some embodiments, the log server 300 includes modules to perform calculations based on the information in the log 320. This is described more fully below with respect to Figure 3.
[0017] Figure 2 illustrates a typical client computer 200. A client computer 200 generally includes one or more processing units (CPUs) 202, one or more network or other communications interfaces 204, memory 214, and one or more communication buses 212 for interconnecting these components. The communication buses 212 may include circuitry (sometimes called a chipset) that interconnects and controls communications between system components. A client computer 200 includes a user interface 206, for instance a display 208 and one or more input devices 210, such as a keyboard and a mouse. Memory 214 may include high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 214 may include mass storage that is remotely located from the central processing unit(s) 202. Memory 214, or alternately the non-volatile memory device(s) within memory 214, comprises a computer readable storage medium. In some embodiments, memory 214 or the computer readable storage medium of memory 214 stores the following programs, modules and data structures, or a subset thereof:
• an operating system 216 (e.g., WINDOWS or MAC OS X) that generally includes procedures for handling various basic system services and for performing hardware dependent tasks;
• a network communications module 218 that is used for connecting the client computer 200 to servers or other computing devices via one or more communication networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and the like;
• a web browser 220, which allows a user of the client computer 200 to access web sites and other resources over the communication network. In some embodiments, each browser is associated with a unique browser agent 330; and
• one or more cookies 222, which provide persistent data for web sites visited by a household member 118 at the client computer 200. In some embodiments there is a special classification cookie, which uniquely identifies the computer where the cookie is stored. In some embodiments, a single classification cookie is used by multiple web pages.
[0018] Referring to Figure 3, the log server 300 generally includes one or more processing units (CPUs) 302, one or more network or other communications interfaces 304, memory 314, and one or more communication buses 312 for interconnecting these components. The communication buses 312 may include circuitry (sometimes called a chipset) that interconnects and controls communications between system components. The log server 300 may optionally include a user interface 306, for instance a display 308 and a keyboard 310. Memory 314 may include high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 314 may include mass storage that is remotely located from the central processing unit(s) 302. Memory 314, or alternately the non-volatile memory device(s) within memory 314, comprises a computer readable storage medium. In some embodiments, memory 314 or the computer readable storage medium of memory 314 stores the following programs, modules and data structures, or a subset thereof:
• an operating system 316 (e.g., Linux or Unix) that generally includes procedures for handling various basic system services and for performing hardware dependent tasks.
• a network communications module 318 that is used for connecting the log server 300 to servers or other computing devices via one or more communication networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and the like.
• one or more Web Access Logs 320, which store information about online web browsing activity. The log 320 includes a collection of web request records 322. In some embodiments, each web request record includes cookie data 324, which identifies the computer that issued the web request (e.g., a MAC address); the IP address 326 of the user computer that issued the request (generally the IP address of the modem or router used by the user computer); a date/time stamp 328 that specifies when the user request was issued (generally the date/time when the web request was received at a web server 400, or the date/time when the web request information is received at the log server if web request information is forwarded quickly from the web server); and a browser agent 330, which identifies the web browser and version. In some embodiments, the web request records 322 include an alternative date/time stamp 332, which is from the user computer 200 that originated the web request. In some embodiments, more or less data is stored with each record in the web request records 322.
• a date/time conversion module 340, which translates between time zones based on the location of the user computer and the location of the web server responding to the web request or the location of the log server.
• a location module 342, which computes the geographic location of a user computer based on the IP address 326 assigned to the computer. The correlation between IP address 326 and location may use a database that correlates IP address ranges to specific geographic locations. In some embodiments, the correlation database is updated on a periodic basis as new IP address ranges are assigned or IP address ranges are relocated to different geographic locations.
• a classification module 344, which identifies a classification corresponding to a computer based on a usage pattern for the computer. In some embodiments, a computer is classified as a "home computer" if it is used primarily during nonbusiness hours, e.g., only mornings, evenings, or on weekends. Conversely, some embodiments classify a computer as a "work computer" if it is used primarily during business hours (e.g., Monday - Friday, 8:00 - 5:00). In some embodiments, a computer is classified as a "smart phone" based on the browser agent. For example, the web browser may be one that is only used on a phone. In some embodiments, a computer is classified as a "mobile computer" when none of the other classifications apply. In different geographic regions, different patterns of usage may be applied. For example, standard business hours in San Francisco may be different from standard business hours in Spain.
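The lookup performed by the location module 342 can be sketched as a search over sorted IP address ranges. The table contents below are hypothetical and use address blocks reserved for documentation. The name `locate` is an assumption, and a real correlation database would be far larger and periodically updated.

```python
import bisect
import ipaddress
from typing import Optional

# Hypothetical correlation table: (first address, last address, location).
_RAW_RANGES = [
    ("192.0.2.0", "192.0.2.255", "San Francisco, CA"),
    ("198.51.100.0", "198.51.100.255", "New York, NY"),
    ("203.0.113.0", "203.0.113.255", "Madrid, Spain"),
]

# Convert addresses to integers and sort by range start for binary search.
RANGES = sorted(
    (int(ipaddress.ip_address(first)), int(ipaddress.ip_address(last)), place)
    for first, last, place in _RAW_RANGES
)
STARTS = [start for start, _, _ in RANGES]


def locate(ip: str) -> Optional[str]:
    """Return the location whose range contains the address, or None."""
    value = int(ipaddress.ip_address(ip))
    # Find the last range that starts at or before the address.
    index = bisect.bisect_right(STARTS, value) - 1
    if index >= 0 and value <= RANGES[index][1]:
        return RANGES[index][2]
    return None
```

Returning `None` for an unmatched address leaves room for a fallback, such as the alternative date/time stamp 332 from the user computer.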
[0019] Although Figure 3 shows a log server, Figure 3 is intended more as functional descriptions of the various features which may be present in a set of servers than as a structural schematic of the embodiments described herein. In practice, and as recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. For example, some items shown separately in Figure 3 could be implemented on a single server and single items could be implemented by one or more servers. The actual number of servers used to implement a log server, and how features are allocated among them will vary from one implementation to another, and may depend in part on the amount of data traffic that the system must handle during peak usage periods as well as during average usage periods.
[0020] Referring to Figure 4, a web server 400 generally includes one or more processing units (CPUs) 402, one or more network or other communications interfaces 404, memory 414, and one or more communication buses 412 for interconnecting these components. The communication buses 412 may include circuitry (sometimes called a chipset) that interconnects and controls communications between system components. The web server 400 may optionally include a user interface 406, for instance a display 408 and a keyboard 410. Memory 414 may include high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices; and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 414 may include mass storage that is remotely located from the central processing unit(s) 402. Memory 414, or alternately the non-volatile memory device(s) within memory 414, comprises a computer readable storage medium.
In some embodiments, memory 414 or the computer readable storage medium of memory 414 stores the following programs, modules and data structures, or a subset thereof:
• an operating system 416 (e.g., Linux or Unix) that generally includes procedures for handling various basic system services and for performing hardware dependent tasks.
• a network communications module 418 that is used for connecting the web server 400 to servers or other computing devices via one or more communication networks 100, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and the like.
• an HTTP server module 420 (e.g., Apache Tomcat), which receives web requests from users and responds by providing web pages. Although many web page requests use HTTP, embodiments can use other network protocols as well, such as HTTPS or FTP.
• a web page repository 422, which includes web pages that may be requested by users. Web pages may be static, or dynamically constructed as requested. In many cases, web pages include certain fixed content and certain dynamic content that is filled in while responding to the user request.
• web request storage 424, which holds information about web requests until forwarded to a log server 300. In some embodiments, web request storage is volatile memory, because the web requests are forwarded on to the log server 300 immediately. In some embodiments, web request storage 424 is a database, and web request records 322 are accumulated for some period of time before being transmitted to a log server 300. In some embodiments, the web request records 322 stored at an individual web server are removed after transmission to a log server 300. In other embodiments, the web request records 322 are retained at the web server.
• a date/time module 426, which assigns a date/time stamp to web requests as they are received. In some embodiments, a highly stable time clock system is utilized to guarantee accuracy of the generated date/time stamps.
[0021] Although Figure 4 shows a web server, Figure 4 is intended more as functional descriptions of the various features which may be present in a set of servers than as a structural schematic of the embodiments described herein. In practice, and as recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. For example, some items shown separately in Figure 4 could be implemented on a single server and single items could be implemented by one or more servers. The actual number of servers used to implement a web server, and how features are allocated among them will vary from one implementation to another, and may depend in part on the amount of data traffic that the system must handle during peak usage periods as well as during average usage periods.
[0022] Each of the methods described herein may be performed by instructions that are stored in a computer readable storage medium and that are executed by one or more processors of one or more servers or clients. Each of the operations shown in Figures 1 - 4 may correspond to instructions stored in a computer memory or computer readable storage medium.
[0023] Although some embodiments use web servers 400 and log servers 300 as illustrated in Figure 1, the functionality can be distributed to one or more servers in various ways. In some embodiments, all of the functionality of the log server 300 and web server 400 is implemented at a single server. In other embodiments, the functionality illustrated for log server 300 and web server 400 is allocated among three or more computers. In one such embodiment, a log server comprises just a database engine, and the analytic operations of the Date/Time Module 340, Location Module 342, and Classification Module 344 are implemented on a third server. In some embodiments, the Date/Time Module 426 is implemented on the log server, so that a Date/Time Module is not required on each individual web server. This configuration for the Date/Time Module 426 works when information about web requests is forwarded to the log server 300 quickly.
[0024] The classification of computers can also be valuable to users who agree to participate in a ratings panel. A "single source panel" is an all-in-one ratings system that measures viewership across television and the Internet.
[0025] Surveys are one way to get ratings information. For example, a sample page of an exemplary survey is depicted in Figure 5. In a typical survey tool 500, there are three ads 504 that were actually shown on a computer, and another three ads that were not shown (used as a control group). A consumer completes the survey in 2 - 3 minutes, and indicates which ads were remembered. The tool 500 asks (502) questions of the panelist, and instructs (502) the panelist how to use the tool. The panelist answers (508) yes or no to each question 506. The panelist can return (510) to a previous ad, or submit (512) the survey after answering all of the questions.
[0026] In addition, for such surveys to be useful to an advertiser, the selected panelists must be within the target population (e.g., in the right demographic segment and having an appropriate viewing history and/or set of interests) and they must be willing to complete the survey tools. Conventionally, identifying appropriate panelists has required the use of lengthy screening procedures, including written and/or online questionnaires, and incentives for panelists to complete the surveys. While these approaches may be effective for small panels, they are cumbersome if used for large panels (for example, to generate surveys for those large panels). Information from large panels is especially useful because it can be aggregated and used to spot trends among large groups of users. The information can also provide helpful information for targeting ads and content for large numbers of individuals.
[0027] For example, a method implemented at a log server 300 can conduct searches on web request records 322 for individuals to generate survey tools 500 for individual panelists and identify likely panelists. For example, likely panelists can be identified based on records of click-through history for web ads, and logs of media content viewed. Once identified, the surveyor can interact with the panelists and prospective panelists through additional status and approval screens, such as the screens illustrated in Figures 6 and 7. Panelist feedback can also be associated with the IP address 326, and used to tailor subsequent survey tools. In some embodiments, this process can also be used to identify and place panelists or computers on a white list, indicating clearance and willingness to participate in surveys.
[0028] Figure 6 illustrates a Survey Panel Ad Approval Tool 600, which is an interactive web form. The form 600 instructs (602) a panelist how to use the form. The form displays one or more media excerpts 604, which can be film clips, individual frames from film clips, or single images (e.g., from a web-based advertisement). Each media excerpt advertisement 604 has a corresponding Ad ID 606, and a corresponding status 608. In some embodiments, the status indicates the audience for which the panelist considers the ad to be approved (e.g., what ages, such as "all," "18+," or "21+"). In some embodiments, form 600 includes an indicator 610 of the last (most recent) status change. In some embodiments, the information on the last status change includes the name 612 of the panelist who made the change, and the date/time 614 when the change was made. Some embodiments also include change notes 616, which provides free-form space 618 for the panelist to write additional notes.
[0029] Figure 7 illustrates a Problem History and Ad Approval tool 700, which is used in some embodiments to track both approvals as well as problems identified by panelists. In the illustrated embodiment, the form 700 displays an advertisement 702, together with the corresponding Ad ID 704, which uniquely identifies the advertisement. In some embodiments, the advertisement 702 is displayed together with a question 706 asking the panelist whether the advertisement 702 had been seen. In these embodiments, there is space 708 to answer the question, which may be implemented as a pair of yes / no radio buttons. In some embodiments, the form includes a control (not shown) to move forward or backward in the set of questions.
[0030] The form 700 includes a problem history section 710, which enables panelists to report problems with the survey questions. In the illustrated embodiment, a panelist can specify the problem category 716 and additional comments 720. In some embodiments, there is also a date field 712 for each problem. In preferred embodiments, the form automatically fills in the date 714 during entry or when saved. The panelist specifies the problem category 718 (e.g., "other" in the illustrated embodiment) and comments 722.
[0031] The ad approval section 724 in form 700 is similar to Figure 6. Because Figure 7 illustrates a combination problem history / ad approval form 700, there is less space available for ad approval. In the illustrated embodiment, the approval information corresponds to the one ad 702 shown on the form 700. In this embodiment, the Ad ID 726 is repeated (duplicating Ad ID 704). Some embodiments omit this repetition. The ad approval section 724 also includes the approval status 728. The approval status 728 may indicate an age range for which the panelist believes the ad is appropriate. Similar to the embodiment of Figure 6, the ad approval section 724 includes the last approval status change 730, which typically includes the date 732, the changed status 736, the reason 740 for the change, and the name 744 of the panelist making the change. The values for these fields (date 734, status 738, reason 742, and panelist 746) may be displayed below the field labels. In preferred embodiments, the date and panelist name are filled in automatically by the form. In some embodiments, the reason 740 is optional, and thus the reason field 742 may be blank as illustrated in form 700.
[0032] Using the log information maintained by the log server 300, some embodiments are configured to automatically generate new surveys and otherwise monitor the progress of surveys. For example, Figure 8 shows an exemplary process flow for a process that automatically generates survey tools. This flow employs a log file, which is an XML file in this example. The log file represents TV viewing and web usage data for one or more households. The flow uses this data to generate ad viewing information for panelists and to identify panelists for a particular survey from white list information. Using this information, the flow then generates new surveys for the white listed panelists. In other words, using the associated TV and web data, along with panelist information, such as white list information, the flow automatically generates appropriate ads for surveying each enrolled household (including already viewed ads and control ads).
[0033] When the illustrated process flow 800 in Figure 8 starts (802), two independent operations occur. In one branch of the process flow 800, the status of existing surveys from the market research system is generated (804). In some embodiments, the generation (804) includes creating an XML file of panelists and survey results. In addition, the process flow 800 generates (806) a survey results file for existing surveys. The survey results file may include partially completed surveys, or may be limited to surveys that are fully complete.
[0034] After the survey status is generated (804), some embodiments process (810) the created panel XML file. While processing (810) the panel XML file, the process flow 800 generates (812) ad viewing information for the panelists and processes (814) the panelist white list from the market research system. In some embodiments, the panelist white list identifies TV subscribers who meet the eligibility requirements (e.g., having appropriate hardware and software). Using the ad viewing information and the panelist white list, the process 800 generates (816) new surveys for panelists. In addition, the process flow 800 processes (808) the survey status updates from the market research systems. The process flow 800 ends (818) when the survey status updates are processed (808), the new surveys for panelists are generated (816), and the survey results for existing surveys are generated (806).
[0035] Figures 9A and 9B illustrate an exemplary process flow according to some embodiments. The method 900 classifies (902) a user computer without explicit input from a user. Rather than directly asking the user to classify the computer, the method 900 here classifies the computer indirectly based on an assessment of the user's web activity. This indirect methodology has several beneficial aspects. First, the determination is made without placing a burden on a user or interrupting a user's normal tasks. Second, because the classification is based on actual usage, it is less subject to incorrect classification by the user. As illustrated in more detail below, some embodiments classify the computer based on when a user is accessing the computer. Both the time of day and the day of the week provide information about where the computer is being used. In other embodiments, the classification is based on other information, such as the websites accessed by the user.
[0036] When users issue web requests, the requests are logged at the web server 400 processing the request, at a separate log server 300, or both. In the embodiment illustrated in Figures 9A and 9B, the log server 300 receives (904) a plurality of web request events corresponding to web requests issued by users. A web request is the actual request issued by a user, such as an HTTP request for a web page or a search query. A web request event comprises information about a web request. Each web request event 322 includes (906) a cookie 324 that identifies the user computer that originated the corresponding web request. The cookie 324 is referred to herein as a classification cookie because it is used in the classification process 900. The classification cookie 324 uniquely identifies the computer that originated the request, using a unique identifier, such as a MAC address. In some embodiments, the unique computer identifier is generated by the log server 300 (or other central server). The classification cookie 324 is reused for multiple web requests, allowing the log server to correlate multiple requests from the same user computer.
[0037] Each web request event 322 also includes (908) the IP address 326 corresponding to the user computer at the time the web request was issued. Note that the IP address 326 is generally not the IP address assigned to the user computer because web requests are generally sent over an internal network before being routed over the Internet. IP addresses from the internal network provide no information about the geographic location of a computer because the same internal IP addresses can be used everywhere. For example, many computers throughout the world are assigned an IP address of 192.168.1.5. Therefore, the relevant IP address 326 is the external IP address of the router that handles web traffic from the user computer. Because the IP address 326 is the one generally associated with a router or modem that connects to the Internet, the same IP address generally applies to the web request events 322 from multiple user computers. This is okay here because the IP address 326 will be used to establish a geographic location, not to uniquely identify the user computer 200. The classification cookie 324 uniquely identifies the user computer 200.
[0038] The IP address 326 corresponding to a user computer is subject to change, and thus an IP address is associated with each transaction. As noted above, the relevant IP address 326 is the IP address of the router or modem that connects the user computer to the Internet, and does not change frequently. However, a new IP address for the modem or router may be assigned when the modem or router is rebooted.
[0039] Each web request event 322 includes (910) a date/time stamp 328 indicating when the corresponding web request event was received at the web server. In this embodiment, the date/time at the web server 400 is effectively a surrogate for the date/time the request was issued at the user computer. Since there is very little delay between sending the request and receiving the request at the web server 400, the two date/time stamps are effectively the same. Although some embodiments use the date/time from the user computer, the clock on the user computer is both unreliable and potentially inconsistent with the actual time. For example, a user computer originally configured to be on Pacific Time may be physically located in a different time zone, such as Eastern Time. In other embodiments, the date/time stamp 328 is the date/time that the web request event is received at the log server 300. These embodiments have the advantage that only the clock on the log server 300 is required to be accurate, instead of the clocks on a potentially large number of web servers 400. As long as the web request events 322 are forwarded directly to the log server, the time at the log server 300 reflects when the web request was issued.
[0040] In some embodiments, each web request event 322 includes (912) information identifying the browser agent 330 corresponding to the user computer 200. The browser agent 330 identifies the web browser that issued the web request, and generally includes the version number of the web browser as well.
[0041] The log server 300 stores (914) the web request events 322 in a web access log 320. After collecting and storing web request events 322, the log server 300 begins a series of operations to classify user computers corresponding to the web request events. The log server 300 selects (916) a subset of the stored web request events 322 that are all associated with a single classification cookie 324. Because the classification cookie uniquely identifies a user computer, all of the web request events in the subset are associated with a single user computer 200.
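The event record and the selection step (916) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `WebRequestEvent`, its field names, and the helper `group_by_cookie` are hypothetical, and the sample IP addresses come from reserved documentation ranges.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class WebRequestEvent:
    """One logged web request event 322 (field names are illustrative)."""
    cookie: str                           # classification cookie 324
    ip_address: str                       # external IP address 326
    received_at: datetime                 # date/time stamp 328 from the server clock
    browser_agent: Optional[str] = None   # browser agent 330, if collected


def group_by_cookie(events):
    """Return {cookie: [events]}; each list covers a single user computer."""
    groups = defaultdict(list)
    for event in events:
        groups[event.cookie].append(event)
    return dict(groups)


# Sample data: two events from one computer, one from another.
events = [
    WebRequestEvent("cookie-a", "203.0.113.7", datetime(2011, 9, 13, 15, 0)),
    WebRequestEvent("cookie-b", "198.51.100.9", datetime(2011, 9, 13, 15, 5)),
    WebRequestEvent("cookie-a", "203.0.113.7", datetime(2011, 9, 13, 16, 0)),
]
subsets = group_by_cookie(events)
```

Each value in `subsets` plays the role of the subset selected in step (916): all of its events share one classification cookie.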
[0042] For each (918) web request event 322 in the subset, the log server 300 computes (920) a geographical location corresponding to the user computer 200 using the IP address 326. In some embodiments, the computation uses a lookup table or dictionary that associates ranges of IP addresses with geographical locations. One objective is to establish the time zone that is applicable to the location of the user computer, and thus a precise location is not required. For example, for locations in the United States, it is typically enough to know the city and state where IP addresses are assigned.
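A lookup table of this kind might be sketched as below, using Python's standard `ipaddress` module. The table contents are assumptions made for illustration: the two networks are reserved documentation ranges, and the locations and time zone names attached to them are invented. The function name `locate` is likewise hypothetical.

```python
import ipaddress

# Hypothetical lookup table: (network, city/state, time zone name).
# The networks are reserved documentation blocks, not real assignments.
IP_RANGES = [
    (ipaddress.ip_network("198.51.100.0/24"), "Mountain View, CA", "America/Los_Angeles"),
    (ipaddress.ip_network("203.0.113.0/24"), "New York, NY", "America/New_York"),
]


def locate(ip_address):
    """Return (location, time zone) for an IP address, or None if unknown."""
    addr = ipaddress.ip_address(ip_address)
    for network, location, tz_name in IP_RANGES:
        if addr in network:
            return location, tz_name
    return None
```

An internal address such as 192.168.1.5 matches no entry and returns `None`, which is consistent with the earlier point that internal addresses carry no geographic information.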
[0043] For each (918) web request event 322 in the subset, the log server 300 uses the stored date/time 328 and the computed geographic location of the user computer 200 to determine (922) the local time and day of week corresponding to the web request. Because the time zone of the web server 400 (or log server 300) is known, and the time zone of the user computer 200 is known from the computed geographic location, the offset between the two time zones is determined. This offset (which can be negative) is added to the date/time stamp 328 to determine the local date and time that the web request was issued. Using a calendar, the local date corresponds to a unique day of the week.
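The offset arithmetic can be sketched as follows. The function name and parameters are hypothetical, and the sketch assumes whole-hour or fractional-hour UTC offsets supplied by the caller; it does not model daylight saving time transitions, which a production system would need to handle.

```python
from datetime import datetime, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def local_time_and_day(stamp, server_utc_offset_hours, user_utc_offset_hours):
    """Shift a server-side date/time stamp into the user's local time.

    The offset between the two zones may be negative, as the text notes.
    """
    offset = timedelta(hours=user_utc_offset_hours - server_utc_offset_hours)
    local = stamp + offset
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    return local, DAY_NAMES[local.weekday()]


# A request logged at 02:30 on Tuesday 2011-09-13 by a server at UTC+0,
# issued from a computer located in a zone at UTC-7.
local, day = local_time_and_day(datetime(2011, 9, 13, 2, 30), 0, -7)
```

Here the local time is 19:30 on the previous calendar day, so both the hour and the day of week differ from the server's view of the same request.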
[0044] In some embodiments, an additional originating date/time stamp 332 is also collected and stored (914). This alternate date/time stamp 332 identifies the date/time of the user computer 200 when the web request was issued. Although this alternate date/time stamp 332 may be less reliable than the determination (922) described above, it can be used when the IP address 326 is insufficient to determine the geographic location of the user computer 200.
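One way to express this fallback is sketched below. The function `resolve_local_time` and its parameters are hypothetical; the sketch assumes the server stamp is in UTC and that a failed geographic lookup is signaled by passing `None` for the user's offset.

```python
from datetime import datetime, timedelta


def resolve_local_time(server_stamp_utc, user_utc_offset_hours, client_stamp):
    """Prefer the server stamp shifted by the computed offset.

    Falls back to the originating date/time stamp 332 only when the IP
    address did not yield a location (offset is None).
    """
    if user_utc_offset_hours is not None:
        return server_stamp_utc + timedelta(hours=user_utc_offset_hours)
    return client_stamp
```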
[0045] The log server 300 then classifies (924) the user computer 200 based, at least in part, on a usage pattern of the local time and local day of week data corresponding to the web request events 322 in the subset. In some embodiments, classifying the computer includes assigning (926) a computer classification to the user computer 200. For example, the user computer 200 may be classified (928) as a home computer, work computer, or mobile computer. In some embodiments, the user computer 200 is classified (934) as a work computer when the web request events substantially occur during normal business hours. For example, normal business hours may be (936) 9:00 AM to 5:00 PM, Monday through Friday. In some embodiments, the user computer 200 is classified (938) as a home computer when the web requests substantially occur at times other than normal business hours. As noted above, normal business hours are (940) Monday to Friday, 9:00 AM to 5:00 PM in some embodiments. In some embodiments, normal business hours are defined regionally. Normal business hours are different for different countries, and normal business hours even vary within countries. For example, in California a normal work week is 40 hours, whereas many places on the east coast of the United States have a 37.5 hour standard work week. In some embodiments, a user computer 200 is classified (942) as a mobile computer when substantial portions of the web requests occur both during normal business hours and during hours other than normal business hours. For example, a user may have a laptop computer that is used both at the office and at home. In some embodiments, a classification label other than "mobile computer" is used to classify computers that are used both at home and for work. For example, a person who works at home may have a single computer that is used both for work and for personal activities. Because such a computer need not be mobile, some embodiments refer to these computers as "mixed-use."
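The classification rules above can be sketched as a simple share-of-requests test. The text does not quantify "substantially," so the 0.8 threshold below is an assumption chosen only for illustration, as are the function names and the "unknown" label for an empty subset. Business hours follow the Monday through Friday, 9:00 AM to 5:00 PM example.

```python
from datetime import datetime


def in_business_hours(local):
    """True for Monday-Friday, 9:00 AM up to (not including) 5:00 PM local time."""
    return local.weekday() < 5 and 9 <= local.hour < 17


def classify(local_times, threshold=0.8):
    """Classify a computer from the local times of its web requests.

    threshold is a hypothetical stand-in for "substantially".
    """
    if not local_times:
        return "unknown"
    share = sum(in_business_hours(t) for t in local_times) / len(local_times)
    if share >= threshold:
        return "work"
    if share <= 1 - threshold:
        return "home"
    return "mobile"  # some embodiments would label this "mixed-use"


weekday_morning = datetime(2011, 9, 13, 10, 0)   # Tuesday, 10:00 AM
weekday_evening = datetime(2011, 9, 13, 20, 0)   # Tuesday, 8:00 PM
saturday = datetime(2011, 9, 17, 10, 0)          # Saturday, 10:00 AM
```

A regional embodiment could replace `in_business_hours` with a function that looks up the hours for the computed location.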
[0046] In some embodiments, the log server 300 classifies (932) the user computer 200 based, at least in part, on the browser agent 330 associated with the user computer 200. Because many smart phone devices have much of the functionality of a desktop or laptop computer, such phones are included in the set of devices considered as a "user computer" 200. In fact, any electronic device that has a web browser and the ability to connect to the Internet is considered a user computer 200. The browser agent 330 identifies the web browser that is running on the device 200, and generally identifies the browser version as well. In some embodiments, the log server 300 classifies (930) the user computer 200 as a mobile phone. The classification of a user computer 200 as a mobile phone uses the browser agent information 330 in some embodiments.
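A browser-agent check might look like the sketch below. The text does not say how the browser agent is interpreted, so the substring heuristic and the token list are assumptions; real user-agent strings vary widely, and a tablet or desktop browser can match or miss these tokens.

```python
# Hypothetical tokens that often appear in phone browser agent strings.
MOBILE_TOKENS = ("iphone", "android", "mobile")


def is_mobile_phone(browser_agent):
    """Heuristically decide whether a browser agent 330 belongs to a phone."""
    if not browser_agent:
        return False
    agent = browser_agent.lower()
    return any(token in agent for token in MOBILE_TOKENS)


PHONE_AGENT = ("Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X) "
               "AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 "
               "Mobile/9A334 Safari/7534.48.3")
DESKTOP_AGENT = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 "
                 "(KHTML, like Gecko) Chrome/14.0.835.163 Safari/535.1")
```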
[0047] In some embodiments, a web server 400 receives (944) a subsequent request for a web page from the user computer 200. The web server 400 selects (946) an information item for the web page, where the selection is based, at least in part, on the classification of the user computer 200. An information item can be an advertisement, a search result, a press release or other news announcement, or any other content for the web page that is dynamically generated at run time. The web server 400 then returns (948) the web page with the selected information item. In some embodiments, the selection of the information item is performed by the log server 300, or another server (not depicted in Figure 1).
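The selection step (946) can be sketched as a lookup keyed on the stored classification. The item names, the mapping, and the default are all hypothetical placeholders; the text leaves the selection policy open.

```python
# Hypothetical mapping from computer classification to an information item.
ITEMS_BY_CLASSIFICATION = {
    "work": "business-software-ad",
    "home": "streaming-service-ad",
    "mobile": "travel-accessory-ad",
    "mobile phone": "app-install-ad",
}
DEFAULT_ITEM = "general-interest-ad"


def select_information_item(classification):
    """Pick an item for the web page; fall back when the class is unknown."""
    return ITEMS_BY_CLASSIFICATION.get(classification, DEFAULT_ITEM)
```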
[0048] The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.

Claims

What is claimed is:
1. A method of indirectly classifying a user computer, comprising:
receiving at a server computer system a plurality of web request events corresponding to web requests issued by users, wherein each web request event includes:
i) a cookie that identifies a user computer that originated the corresponding web request;
ii) an IP address corresponding to the user computer at the time the web request was issued; and
iii) a date/time stamp indicating when the corresponding web request was received at a web server;
storing the web request events;
selecting a subset of the plurality of web request events, wherein each web request event in the subset is associated with a single first cookie;
for each web request event in the subset:
computing a geographical location corresponding to the user computer using the IP address associated with the web request event; and
determining a local time and local day of week corresponding to the web request using the stored date/time stamp of the web request event and the computed geographic location; and
classifying the user computer based, at least in part, on a usage pattern of the local time and local day of week data corresponding to the web request events in the subset.
2. The method of claim 1, wherein classifying the user computer comprises assigning a computer classification to the user computer.
3. The method of claim 1, wherein the user computer is classified as a home computer, work computer, or mobile computer.
4. The method of claim 1, wherein the user computer is classified as a mobile phone.
5. The method of claim 1, wherein each web request event further includes information identifying a browser agent corresponding to the user computer.
6. The method of claim 5, wherein classifying the user computer is further based, at least in part, on the browser agent.
7. The method of claim 1, wherein the user computer is classified as a work computer when the web request events substantially occur during normal business hours.
8. The method of claim 6, wherein normal business hours are Monday through Friday, 9:00 AM to 5:00 PM.
9. The method of claim 1, wherein the user computer is classified as a home computer when the web requests substantially occur at times other than normal business hours.
10. The method of claim 8, wherein normal business hours are Monday through Friday, 9:00 AM to 5:00 PM.
11. The method of claim 1, wherein the user computer is classified as a mobile computer when a substantial portion of the web requests occur during normal business hours and a substantial portion of the web requests occur during hours other than normal business hours.
12. The method of claim 1, further comprising:
receiving a subsequent request for a web page from the user computer;
selecting an information item for the web page, wherein the selection is based, at least in part, on the classification of the user computer; and
returning the requested web page with the selected information item.
13. A server computer system for indirectly classifying a user computer, comprising:
memory;
one or more processors; and
one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs including:
instructions for receiving at the server computer system a plurality of web request events corresponding to web requests issued by users, wherein each web request event includes:
i) a cookie that identifies a user computer that originated the corresponding web request;
ii) an IP address corresponding to the user computer at the time the web request was issued; and
iii) a date/time stamp indicating when the corresponding web request was received at a web server;
instructions for storing the web request events;
instructions for selecting a subset of the plurality of web request events, wherein each web request event in the subset is associated with a single first cookie;
instructions for processing each web request event in the subset, including:
instructions for computing a geographical location corresponding to the user computer using the IP address associated with the web request event; and
instructions for determining a local time and local day of week corresponding to the web request using the stored date/time stamp of the web request event and the computed geographic location; and
instructions for classifying the user computer based, at least in part, on a usage pattern of the local time and local day of week data corresponding to the web request events in the subset.
14. The server computer system of claim 13, wherein the instructions for classifying the user computer include instructions for assigning a computer classification to the user computer.
15. The server computer system of claim 13, wherein the user computer is classified as a home computer, work computer, or mobile computer.
16. The server computer system of claim 13, wherein the user computer is classified as a mobile phone.
17. The server computer system of claim 13, wherein each web request event further includes information identifying a browser agent corresponding to the user computer.
18. The server computer system of claim 17, wherein the instructions for classifying the user computer further include instructions to perform the classification based, at least in part, on the browser agent.
19. The server computer system of claim 13, wherein the instructions for classifying a user computer include instructions for classifying the user computer as a work computer when the web request events substantially occur during normal business hours.
20. The method of claim 19, wherein normal business hours are Monday through Friday, 9:00 AM to 5:00 PM.
21. The server computer system of claim 13, wherein the instructions for classifying a user computer include instructions for classifying the user computer as a home computer when the web requests substantially occur at times other than normal business hours.
22. The server computer system of claim 21, wherein normal business hours are Monday through Friday, 9:00 AM to 5:00 PM.
23. The server computer system of claim 13, wherein the instructions for classifying a user computer include instructions for classifying the user computer as a mobile computer when a substantial portion of the web requests occur during normal business hours and a substantial portion of the web requests occur during hours other than normal business hours.
24. The server computer system of claim 13, further comprising:
instructions for receiving a subsequent request for a web page from the user computer;
instructions for selecting an information item for the web page, wherein the selection is based, at least in part, on the classification of the user computer; and
instructions for returning the requested web page with the selected information item.
25. A non-transitory computer readable storage medium storing one or more programs to be executed by a server computer system, the one or more programs comprising:
instructions for receiving at the server computer system a plurality of web request events corresponding to web requests issued by users, wherein each web request event includes:
i) a cookie that identifies a user computer that originated the corresponding web request;
ii) an IP address corresponding to the user computer at the time the web request was issued; and
iii) a date/time stamp indicating when the corresponding web request was received at a web server;
instructions for storing the web request events;
instructions for selecting a subset of the plurality of web request events, wherein each web request event in the subset is associated with a single first cookie;
instructions for processing each web request event in the subset, including:
instructions for computing a geographical location corresponding to the user computer using the IP address associated with the web request event; and
instructions for determining a local time and local day of week corresponding to the web request using the stored date/time stamp of the web request event and the computed geographic location; and
instructions for classifying the user computer based, at least in part, on a usage pattern of the local time and local day of week data corresponding to the web request events in the subset.
PCT/US2012/054262 2011-09-13 2012-09-07 System and method for indirectly classifying a computer based on usage WO2013039789A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP20120832565 EP2758892A4 (en) 2011-09-13 2012-09-07 System and method for indirectly classifying a computer based on usage
JP2014530707A JP6165734B2 (en) 2011-09-13 2012-09-07 System and method for indirectly classifying computers based on usage
CA2848472A CA2848472C (en) 2011-09-13 2012-09-07 System and method for indirectly classifying a computer based on usage
KR1020147009864A KR102021062B1 (en) 2011-09-13 2012-09-07 System and method for indirectly classifying a computer based on usage
CN201280054521.4A CN103917969A (en) 2011-09-13 2012-09-07 System and method for indirectly classifying a computer based on usage

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/231,805 2011-09-13
US13/231,805 US8700766B2 (en) 2011-09-13 2011-09-13 System and method for indirectly classifying a computer based on usage

Publications (1)

Publication Number Publication Date
WO2013039789A1 true WO2013039789A1 (en) 2013-03-21

Family

ID=47830848

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/054262 WO2013039789A1 (en) 2011-09-13 2012-09-07 System and method for indirectly classifying a computer based on usage

Country Status (7)

Country Link
US (1) US8700766B2 (en)
EP (1) EP2758892A4 (en)
JP (1) JP6165734B2 (en)
KR (1) KR102021062B1 (en)
CN (1) CN103917969A (en)
CA (1) CA2848472C (en)
WO (1) WO2013039789A1 (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9218611B1 (en) 2011-09-27 2015-12-22 Google Inc. System and method for determining bid amount for advertisement to reach certain number of online users
US9064269B1 (en) * 2011-09-27 2015-06-23 Google Inc. Cookie correction system and method
EP2636371B1 (en) * 2012-03-09 2016-10-19 Sony Mobile Communications AB Activity classification
US9847948B2 (en) 2012-07-09 2017-12-19 Eturi Corp. Schedule and location responsive agreement compliance controlled device throttle
US9887887B2 (en) 2012-07-09 2018-02-06 Eturi Corp. Information throttle based on compliance with electronic communication rules
US9854393B2 (en) 2012-07-09 2017-12-26 Eturi Corp. Partial information throttle based on compliance with an agreement
US8706872B2 (en) 2012-07-09 2014-04-22 Parentsware, Llc Agreement compliance controlled information throttle
US10079931B2 (en) * 2012-07-09 2018-09-18 Eturi Corp. Information throttle that enforces policies for workplace use of electronic devices
US9184994B2 (en) * 2012-08-01 2015-11-10 Sap Se Downtime calculator
US9588675B2 (en) 2013-03-15 2017-03-07 Google Inc. Document scale and position optimization
US9231996B2 (en) * 2013-04-12 2016-01-05 International Business Machines Corporation User-influenced page loading of web content
WO2015062652A1 (en) * 2013-10-31 2015-05-07 Telefonaktiebolaget L M Ericsson (Publ) Technique for data traffic analysis
US11589083B2 (en) 2014-09-26 2023-02-21 Bombora, Inc. Machine learning techniques for detecting surges in content consumption
US9940634B1 (en) 2014-09-26 2018-04-10 Bombora, Inc. Content consumption monitor
US9979693B2 (en) * 2016-01-28 2018-05-22 Fiber Logic Communications, Inc. IP allocation method for use in telecommunication network automatic construction
US10204146B2 (en) 2016-02-09 2019-02-12 Ca, Inc. Automatic natural language processing based data extraction
US10002144B2 (en) * 2016-03-25 2018-06-19 Ca, Inc. Identification of distinguishing compound features extracted from real time data streams
US9996409B2 (en) 2016-03-28 2018-06-12 Ca, Inc. Identification of distinguishable anomalies extracted from real time data streams
WO2018053448A1 (en) * 2016-09-16 2018-03-22 Eturi Corp. Information throttle that enforces policies for workplace use of electronic devices
US10908983B2 (en) * 2017-03-31 2021-02-02 Cae Inc. Method and system for preventing an anomaly in a simulator
US10440063B1 (en) 2018-07-10 2019-10-08 Eturi Corp. Media device content review and management
US11631015B2 (en) 2019-09-10 2023-04-18 Bombora, Inc. Machine learning techniques for internet protocol address to domain name resolution systems
WO2022071615A1 (en) * 2020-09-29 2022-04-07 제이엠사이트 주식회사 Failure prediction method and apparatus implementing same

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020138331A1 (en) * 2001-02-05 2002-09-26 Hosea Devin F. Method and system for web page personalization
US20050203952A1 (en) * 2004-03-11 2005-09-15 Microsoft Corporation Tracing a web request through a web server
US20080010307A1 (en) * 2000-02-14 2008-01-10 Overture Sevices, Inc. System and Method to Determine the Validity of an Interaction on a Network
US20100192069A1 (en) * 2009-01-23 2010-07-29 Cisco Technology, Inc. Differentiating a User from Multiple Users Based on a Determined Pattern of Accessing a Prescribed Network Destination
US20110016129A1 (en) * 2008-03-04 2011-01-20 Invicta Networks, Inc. Method and system for variable or dynamic classification

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6466970B1 (en) * 1999-01-27 2002-10-15 International Business Machines Corporation System and method for collecting and analyzing information about content requested in a network (World Wide Web) environment
KR100366311B1 (en) * 2000-02-25 2002-12-31 이은정 A direct transaction information service system by automatic crossing and out-calling both side directions communication and a method combining ars telecommunication system with network
KR100307723B1 (en) * 2000-03-21 2001-11-03 이재원 An Advertiser Driven Advertising Method and the Operating System on both the Wireless Internet and the Internet
JP2001358745A (en) * 2000-06-09 2001-12-26 Toshiyuki Nakanishi Method and system for providing adapted contents
US7634463B1 (en) * 2005-12-29 2009-12-15 Google Inc. Automatically generating and maintaining an address book
US8108517B2 (en) * 2007-11-27 2012-01-31 Umber Systems System and method for collecting, reporting and analyzing data on application-level activity and other user information on a mobile data network
US9767464B2 (en) * 2009-09-11 2017-09-19 Comscore, Inc. Determining client system attributes
US8626901B2 (en) * 2010-04-05 2014-01-07 Comscore, Inc. Measurements based on panel and census data
US8615605B2 (en) * 2010-10-22 2013-12-24 Microsoft Corporation Automatic identification of travel and non-travel network addresses

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2758892A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103812961A (en) * 2013-11-01 2014-05-21 北京奇虎科技有限公司 Method and device for recognizing Internet protocol (IP) addresses of designated class and defending method and system
CN103812961B (en) * 2013-11-01 2016-08-17 北京奇虎科技有限公司 Identify and specify the method and apparatus of classification IP address, defence method and system
US10033694B2 (en) 2013-11-01 2018-07-24 Beijing Qihoo Technology Company Limited Method and device for recognizing an IP address of a specified category, a defense method and system

Also Published As

Publication number Publication date
EP2758892A4 (en) 2015-05-20
US8700766B2 (en) 2014-04-15
JP2014527250A (en) 2014-10-09
JP6165734B2 (en) 2017-07-19
KR102021062B1 (en) 2019-09-11
CA2848472C (en) 2017-02-28
EP2758892A1 (en) 2014-07-30
US20130067070A1 (en) 2013-03-14
KR20140064958A (en) 2014-05-28
CA2848472A1 (en) 2013-03-21
CN103917969A (en) 2014-07-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12832565

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2848472

Country of ref document: CA

Ref document number: 2014530707

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

REEP Request for entry into the european phase

Ref document number: 2012832565

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2012832565

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 20147009864

Country of ref document: KR

Kind code of ref document: A