EP1277141A2 - System, method and computer program product for building an inventory-centric demographic hyper-cube

Info

Publication number
EP1277141A2
EP1277141A2 EP00939705A EP00939705A EP1277141A2 EP 1277141 A2 EP1277141 A2 EP 1277141A2 EP 00939705 A EP00939705 A EP 00939705A EP 00939705 A EP00939705 A EP 00939705A EP 1277141 A2 EP1277141 A2 EP 1277141A2
Authority
EP
European Patent Office
Prior art keywords
user
data
user data
users
location
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00939705A
Other languages
English (en)
French (fr)
Inventor
Christopher M. Kirby
Steven C. P. Chang
John D. Bartels
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Teralytics Inc
Original Assignee
Teralytics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising

Definitions

  • the present invention relates to understanding purchasing behavior in traditional stores and to an improved method of performing demographic, psychographic, and behavioral analysis of the Internet.
  • a first technique for solving this problem can involve simple counting of attributes of interest. Specific examples can include tracking the amount of user activity, the number of users, the number of users that visited a given page X, and the breakdown of males versus females. Companies like WebTrends of Portland, OR and others perform this type of analysis. By providing these metrics, this technique can increase understanding by providing information on the total audience. Unfortunately, this technique does not scale well when information on targeted audiences rather than a total audience is needed.
  • a second technique involves user-centric clustering that extends the first technique to provide information on subgroups of the total audience. All users can be assigned to a cluster, using a classification, such as, e.g., matching pre-defined categories, or using traditional clustering techniques, where clusters are generated dynamically. Personify of San Francisco, CA and DataSage of Reading, MA are examples of companies performing this type of analysis. With users assigned to clusters, the second technique can generate totals as before, but on a per-cluster basis. So, e.g., if there are five clusters of users, then there could be five sets of totals. This technique can allow for an understanding of subgroups of the total audience. Unfortunately, the second technique is constrained in that it can only provide information about those users represented by clusters. If there is interest in some other subgroup not represented by a cluster, this second technique cannot offer any information.
  • Advertisers, e.g., are interested in targeting ads to particular users.
  • Electronic commerce (e-commerce) companies also attempt to target customers on the Internet.
  • Web servers also want to recognize a return visitor to a web site in order to provide customized presentation of a web page. Different methods of detecting and tracking access to web sites are available.
  • conventional web traffic analysis and tracking tools have limitations.
  • Effectiveness of conventional tracking and reporting systems analyzing user accesses of web sites is limited by the granularity of data available regarding user behavior, and by methods used to access the Internet.
  • Conventional user interfaces for analyzing demographic data are limited in that they provide statistical summary data only on a per site basis. It is desirable that statistics regarding user behavior such as, e.g., Internet user behavior, be provided on a per location basis.
  • conventional systems cannot provide user behavior information on a per location basis. Instead, conventional systems can only provide user traffic statistics on a per-site basis.
  • a per site basis as compared to a per location basis, can only provide statistics regarding Internet user behavior traffic generally about a site itself, and cannot provide granular statistical data down to the level of a specific page within that site.
  • location refers to a distinct webpage within a website.
  • A website of Amazon.com can include many web pages; each web page corresponds to a separate file having a separate universal resource locator (URL) filename associated with it.
  • A URL filename of the Amazon.com site such as, e.g., http://www.amazon.com/subdirectory/subsubdirectory/filename.htm, is thus referred to as a "location."
  • User behavior statistics are not conventionally available to this level of granularity.
  • Analysis of behavior of client users can include tracking demographic attributes of the users of the Internet.
  • a demographic attribute can include a "pure demographic" attribute such as, e.g., a client user's age, gender, and salary range.
  • Another demographic attribute can be obtained by analyzing the behavior of a client user. It is desirable that user demographic statistical data be available on a per location basis.
  • NCSs network communications servers
  • ISP Internet service providers
  • Proxy servers can shield some Internet requests by a client host from the rest of the Internet. For example, proxy servers often cache, i.e., store for future access, certain popular web pages from a web server. Caching can improve access time to the web page for users and can save communications costs for the ISP.
  • When the web client requests access to a cached web page, the cached web page is accessed from the proxy server's cache and no request is made to the web site where the requested web page resides. It is possible then that the web server of the requested page is never accessed once the page is cached. Thus the web server is not made aware that the web client accessed its web site.
  • IP Internet protocol
  • An Internet host's IP address is analogous to a postal mailing address and is used for sending information between multiple Internet hosts.
  • a network communications server (NCS) is often used to assign an IP address to a computer host.
  • NCS can permanently assign an IP address to a host, known as static IP address assignment.
  • the NCS can also temporarily assign IP addresses, known as dynamic IP address assignment.
  • Web servers have, for a long time, had the ability to customize a web site for a particular person on a person-by-person basis. Imagine how difficult it would be to maintain a list of preferences for each user that ever visited a particular search site such as, e.g., Yahoo. To keep such preferences up-to-date, if a Yahoo web server was being accessed by millions of users, then it could amount to millions of bytes of data requiring to be stored on the web server, which would need to be retrieved in a timely manner. It was thought to be better to have each user maintain his or her own preferences locally to eliminate retrieval time and to maintain privacy. "Cookies" came about to enable timely retrieval of customized web pages.
  • Cookies can be used to identify web access by some user clients. Cookies are a general mechanism which server side Internet web connections such as, e.g., common gateway interface (CGI) scripts, can use to store and retrieve information on a client side of a hypertext transfer protocol (HTTP) connection.
  • A cookie is a well-known term used for describing an opaque piece of software data held by an intermediary.
  • a cookie is a holder of information. It cannot be used to get information off of a client's hard disk drive. Rather, a cookie can be used to save information entered voluntarily by a client and can be saved for future reference to avoid retyping of this information.
  • Uses of cookies include, e.g., indicating a preference for viewing web pages in frames or text-only format, viewing a page in a particular language, storing a password and user name or other account number for sites that charge for viewing, and saving any other personal data, so long as the data does not exceed the 4 KB limit for a cookie.
  • a cookie can be sent from an HTTP server to a client. Once sent, the cookie will be forwarded along with any request to the server from the client.
  • HTTP servers are internet servers which can contain hypertext software code such as, e.g., hypertext markup language (HTML).
  • When a client on the Internet enters a universal resource locator (URL) address into a web browser, it is converted by a domain name server (DNS) into an IP address corresponding to a file on a server.
  • the HTML source code is sent from the server to the client's browser.
  • the browser parses the code into several requests which can be sent to the server from the client.
  • A server, when returning an object to a client, can also send along a cookie, which the client can store on its workstation. Included in the state object is a description of the range of URLs for which that state is valid, i.e., the domain of the cookie. Any future requests made by the client which fall in that range, i.e., domain, can include a transmittal of the current value of the state object from the client's browser to the server.
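
The cookie exchange described above can be sketched in Python. This is a minimal illustration, not the patent's implementation: `SimpleCookie` is the standard-library cookie helper, while `in_cookie_range`, `cookie_header_value`, the dictionary-based cookie jar and the host names are hypothetical, and the suffix/prefix matching rule is a simplifying assumption rather than the full browser behavior.

```python
from http.cookies import SimpleCookie


def set_cookie_header(name, value, domain, path="/"):
    """Server side: build the header that hands a cookie to the client."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["domain"] = domain   # range of hosts the state is valid for
    cookie[name]["path"] = path       # range of paths the state is valid for
    return cookie.output(header="Set-Cookie:")


def in_cookie_range(host, request_path, domain, path="/"):
    """Client side: does this request fall in the cookie's range?
    Simplified rule (an assumption): host suffix match plus path prefix match."""
    host_ok = host == domain or host.endswith("." + domain)
    return host_ok and request_path.startswith(path)


def cookie_header_value(jar, host, request_path):
    """Client side: collect the name=value pairs to send with a request."""
    pairs = [
        f"{name}={value}"
        for name, (value, domain, path) in jar.items()
        if in_cookie_range(host, request_path, domain, path)
    ]
    return "; ".join(pairs)


header = set_cookie_header("lang", "en", "example.com")

# Hypothetical client-side store: name -> (value, domain, path).
jar = {"lang": ("en", "example.com", "/")}
sent = cookie_header_value(jar, "www.example.com", "/index.htm")
not_sent = cookie_header_value(jar, "www.other.org", "/")

# The server can parse what the client sent back.
parsed_value = SimpleCookie(sent)["lang"].value
```

The cookie is returned only for requests inside its range, which is why a conventional cookie tells one domain nothing about a user's activity on another.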
  • references to HTTP requests, HTTP servers and HTTP clients in this document could also include other types of servers, clients and information transfer such as, e.g., data, media, audio, telephony, and streaming technologies.
  • Each cookie is a multipurpose Internet mail extension (MIME) header that can be used to exchange information automatically between a server and a browser without a user seeing what is being transmitted.
  • the server can provide the user's browser a web page customized according to the pre-defined preferences contained in the cookie.
  • Cookies can be used, e.g., by e-commerce shopping applications to store information about currently selected items, and by fee-based services to store registration information. Cookies can free the client from retyping a user ID on the next connection, and can store user preferences for the client such as, e.g., initial screens preferred upon entry to a domain.
  • If a user disables the use of cookies, it can be difficult to identify access by a client user.
  • a cookie can be useful in tracking a user's actions and preferences.
  • cookie data can be used to save values of data entered into a form.
  • a cookie can be used by a web server site to store a user's preference information over several visits such as, e.g., how the user prefers to view the web page (in text or frame format), a user's name or address and preferred language.
  • Conventionally designed cookies support only one domain, so a different cookie is needed for each domain. Unfortunately, this can require a large number of cookies to be placed on the user's hard drive. Once an architectural limit is reached, some cookies are also deleted. Also, if more than one person uses the same computer, unfortunately, no provision exists to alert the web server that a different user is accessing the site.
  • If a user accesses a web site using a different computer, there is no provision to notify the web server of the identity of the user. Also, if a user disables the use of cookies, then a domain has no way to identify the user requesting a web page view so as to permit customizing the view of the domain's web page. To some users, the automatic creation and retrieval of cookies raises privacy concerns. Browsers can allow users to disable the cookie feature, thus eliminating the tracking mechanisms.
  • Although useful, conventional cookies do not identify all attempted accesses to web sites.
  • Using a proxy server presents special problems to those attempting to gather data regarding web access usage by users accessing the Internet via a proxy. It has proven especially difficult to track usage by such users which cannot be identified uniquely by a permanently assigned unique IP address.
  • the proxy server does not forward all requests to the website (i.e., the server). Instead, the proxy server returns pages previously retrieved which it has stored in its cache.
  • a document describing a methodology, "Basic Advertising Measures," is basicadmeasures.
  • the methodology can help a single site to determine how many persons saw a particular ad on a web site and clicked on the ad. Therefore, this methodology lets a web server know that someone came to the site, but it does not permit the web site to know who came to the site, unless they also use a cookie.
  • the methodology is used for counting Internet banner ad impressions and clicks. The methodology was designed such that two compliant implementations would generate basic impression and basic click counts that differ by less than 5%.
  • Ad requests refer to the method of counting an ad impression when a page containing the ad HTML is requested.
  • the ad download method counts an ad impression when the ad media (in this case, an image) is requested from a server.
  • the methodology defines an ad counter as a program that responds to browser requests (e.g., an image tag IMG SRC request, and an anchor tag A HREF request) related to advertising.
  • a valid basic impression is counted only when the ad counter receives and responds to a request for an image from a browser.
  • This image request must be the result of an IMG tag in the HTML page.
  • the ad counter returns a location redirect, specifying the location of a file or other program that delivers the image media.
  • a valid basic click is recorded only when an ad counter receives and responds to a click request from a browser.
  • the click request is the result of a user clicking on an anchor tag in the HTML page.
  • the ad counter returns a location redirect, specifying the location of the destination for the ad.
  • the methodology includes several mechanisms to defeat proxy caching. To defeat caching, the methodology requires the IMG SRC URL to be unique across page requests by a single browser.
  • the methodology suggests inserting the current time with seconds, or a sufficiently large random number in the IMG SRC URL as the page is delivered to the browser.
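
The cache-defeating step described above can be sketched as follows. This is an illustrative sketch only: the function name `cache_busting_img_src`, the parameter names `ad`, `t` and `r`, and the host `ads.example.com` are hypothetical. The methodology suggests either the current time with seconds or a large random number; this sketch includes both.

```python
import secrets
import time
from urllib.parse import parse_qs, urlencode, urlsplit


def cache_busting_img_src(base_url: str, ad_id: str) -> str:
    """Return an IMG SRC URL that is unique each time a page is delivered,
    so a proxy cache cannot answer the image request on the ad counter's behalf."""
    params = {
        "ad": ad_id,
        "t": int(time.time()),       # current time with seconds
        "r": secrets.randbits(64),   # sufficiently large random number
    }
    return f"{base_url}?{urlencode(params)}"


first = cache_busting_img_src("http://ads.example.com/count", "banner42")
second = cache_busting_img_src("http://ads.example.com/count", "banner42")

# Both URLs name the same ad, but they differ as strings, so a cache keyed
# on the URL treats them as different objects.
first_ad = parse_qs(urlsplit(first).query)["ad"]
```

Because every page delivery yields a different URL, each impression reaches the ad counter, at the cost of defeating the bandwidth savings the proxy cache was meant to provide.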
  • the methodology is rather complex and still only results in information including the number of ad impressions, with no identification of who accessed the ad, unless the user has enabled cookies.
  • using a cookie is complementary to using the methodology to provide additional information to the basic ad impressions. A better approach is needed.
  • a global profile cookie is provided by a global profile service.
  • a global profile service can provide ads to multiple web content providers.
  • a global profile service can store a single file on a user's machine that includes identification information for that user.
  • Different domains can then subscribe to the global profile service to permit the domains to use the global profile service to provide features such as targeted ad banners on the domains' web pages.
  • the subscribed domains use the global profile service to perform broader analysis of user traffic across the subscribed domains using the global profile cookie. Unfortunately, if a domain does not subscribe to the global profile service, traffic by the user to the unsubscribed domain would not be tracked and/or analyzed.
  • Analysis of a client user can include tracking demographic attributes of the user.
  • a demographic attribute can include a "pure demographic" attribute such as, e.g., a client user's age, gender, and salary range.
  • Another demographic attribute can be obtained by analyzing the behavior of a client user.
  • a method, system, and computer program product for analyzing client user accesses to the Internet in a substantially real-time manner can include accessing raw data, processing the raw data using a core technology and interfacing the raw data with the core technology.
  • interfacing can include accessing a proxy log including a proxy log data record having a field including a location requested by the client user, a first IP address of the client user making the request, an action requested by the client user, or a time of the request; accessing an IP address assignment log including an IP address assignment log data record having a field including a second IP address assigned to the client user, a userID of the client user, or a time window of assignment of the second IP address to the client user; and merging the proxy log and IP address assignment log to obtain the clean raw data including a virtual cookie identification data including a location, an action, or a userID.
  • An exemplary embodiment of the invention includes generating a virtual cookie.
  • An exemplary embodiment includes identifying a user accessing the Internet via a proxy server, including accessing a proxy log, accessing an IP address assignment log, and merging the proxy log and the IP address assignment log to obtain virtual cookie identification data.
  • the method can be performed post-browsing.
  • the method can be performed real-time.
  • the proxy server is owned, leased or operated by an Internet service provider (ISP).
  • the proxy server is owned, leased or operated by a corporate network.
  • the proxy server is a caching technology or a logging technology that can observe and record activity of users.
  • the proxy log is a log of the caching technology or logging technology.
  • the IP address assignment log is a dial-up log or a dynamically assigned IP address log.
  • the IP address assignment log is a statically assigned IP address log, where a network of workstations are assigned an IP address by a server.
  • the proxy log can include a proxy log data record having fields including a location requested, a first IP address of the computer of the user making the request, an action requested, and a time of the request.
  • the IP address assignment log includes an IP address assignment log data record having fields including a second IP address, a userID of the user being assigned the second IP address, and a time window of the assignment.
  • Another embodiment features virtual cookie identification data including a location, an action, and a userID.
  • merging can further feature correlating the first IP address and the second IP address and the time of the request and the time window of the assignment to the user to determine the userID making the request.
  • The IP address fields of the two log files can be referred to as a first IP address and a second IP address, respectively; the correlating step matches identical IP addresses and overlapping request time windows to determine the user making the request. Information in other logs could also be correlated in this way.
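
The correlating step described above can be sketched as a join of the two logs on IP address and time. This is a minimal sketch under stated assumptions: the record classes, field names, timestamps (plain seconds) and sample data are hypothetical, and the linear scan over assignments is chosen for clarity rather than speed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ProxyRecord:
    location: str   # URL requested
    ip: str         # "first IP address": host that made the request
    action: str     # e.g. a page view
    time: float     # time of the request


@dataclass(frozen=True)
class AssignmentRecord:
    ip: str         # "second IP address": address handed out
    user_id: str    # userID the address was assigned to
    start: float    # start of the assignment time window
    end: float      # end of the assignment time window


@dataclass(frozen=True)
class VirtualCookie:
    location: str
    action: str
    user_id: str


def find_user(req: ProxyRecord, assignments: List[AssignmentRecord]) -> Optional[str]:
    """Match identical IP addresses where the request time falls in the window."""
    for a in assignments:
        if a.ip == req.ip and a.start <= req.time <= a.end:
            return a.user_id
    return None


def merge_logs(proxy_log: Iterable[ProxyRecord],
               assignment_log: Iterable[AssignmentRecord]) -> List[VirtualCookie]:
    """Merge the two logs into virtual cookie identification data."""
    assignments = list(assignment_log)
    merged = []
    for req in proxy_log:
        user_id = find_user(req, assignments)
        if user_id is not None:   # requests with no matching assignment are skipped
            merged.append(VirtualCookie(req.location, req.action, user_id))
    return merged


# The same dynamic IP address is held by two different users in turn.
assignment_log = [
    AssignmentRecord("10.0.0.5", "alice", 0.0, 100.0),
    AssignmentRecord("10.0.0.5", "bob", 101.0, 200.0),
]
proxy_log = [
    ProxyRecord("http://www.someplace.com/index.htm", "10.0.0.5", "page_view", 50.0),
    ProxyRecord("http://www.someplace.com/news.htm", "10.0.0.5", "page_view", 150.0),
    ProxyRecord("http://www.someplace.com/index.htm", "10.0.0.9", "page_view", 60.0),
]
cookies = merge_logs(proxy_log, assignment_log)
```

In this example the two requests from 10.0.0.5 resolve to different users because their timestamps fall in different assignment windows, and the request from the unassigned address is dropped.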
  • Another embodiment outputs the virtual cookie identification data.
  • Yet another embodiment analyzes the virtual cookie identification data.
  • demographic analysis is performed using the virtual cookie identification data.
  • the analyzing step includes associating demographic information with the userID.
  • Demographic information can include attribute information about the user, provided by the user.
  • psychographic analysis is performed using the virtual cookie identification data.
  • Psychographic information can include attribute information about the user which is based on analysis of observed behavior of the user.
  • analyzing can include associating psychographic information with the userID.
  • analysis of the virtual cookie identification data is done post-browsing.
  • the analysis of the virtual cookie identification data is real-time.
  • the raw data can be provided by a website, a store, or other provider which has access to user activity information.
  • processing the raw data using the core technology can include receiving clean raw data from the interfacing step, processing the clean raw data using a raw data processor, processing output of the raw data processor to obtain an inventory-centric demographic hyper-cube, merging a plurality of the inventory-centric demographic hyper-cubes into a merged inventory-centric demographic hyper-cube, and analyzing the merged inventory-centric demographic hyper-cube.
  • processing output of the raw data processor can include loading user demographics, loading user actions, detecting and removing atypical user activity, determining behavioral interest groups, determining user profiles, and building the inventory-centric demographic hyper-cubes.
  • loading user demographics can include accessing user demographics records including a userID of the client user, or demographic data of the client user; accessing a user demographics database; or adding the user demographic records to the user demographics database.
  • loading user actions can include accessing user action records from the virtual cookie, including a userID of the client user, a location requested by the client user, or an action requested by the client user; accessing a user action database, or adding the user action records to the user action database.
  • detecting and removing atypical user activity can include accessing a user action database, scanning records for an atypical client user such as a software robot or an administrative user, accessing the user demographics database, or removing the atypical client users from the clean raw data.
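
The detection and removal of atypical users described above can be sketched as follows. The source only says that records are scanned for atypical users such as software robots or administrative users; the volume threshold used here, the function names and the sample data are assumptions made for illustration.

```python
from collections import Counter


def find_atypical_users(actions, max_actions, admin_ids=frozenset()):
    """Flag users with an implausibly high action count (likely robots)
    and any known administrative users that appear in the action records."""
    counts = Counter(user_id for user_id, _location in actions)
    heavy = {user_id for user_id, n in counts.items() if n > max_actions}
    admins = set(admin_ids) & set(counts)
    return heavy | admins


def remove_atypical(actions, atypical):
    """Drop every action record that belongs to a flagged user."""
    return [(user_id, loc) for user_id, loc in actions if user_id not in atypical]


# Hypothetical user action records: (userID, location).
actions = [("bot7", f"/p{i}") for i in range(500)]
actions += [("u1", "/home"), ("admin", "/stats"), ("u2", "/home")]

atypical = find_atypical_users(actions, max_actions=100, admin_ids={"admin"})
clean = remove_atypical(actions, atypical)
```

Removing these records before the cubes are built keeps a single crawler or operator account from distorting the per-location audience counts.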
  • determining behavioral interest groups can include accessing a user action database, accessing an interest group definition, matching the interest group definitions and the user actions to obtain an interest group record, accessing the user demographics database, or inserting the interest group records in the user demographics database.
  • determining user profiles can include accessing a user demographics database, accessing a profile definitions database, matching the user demographics database and the profile definitions database to obtain a user profile record, accessing the user profiles database, or updating the user profiles database with the user profile record.
  • building the inventory-centric demographic hyper-cubes can include accessing user demographics, accessing user actions, or combining the user demographics and the user actions to obtain the inventory-centric demographic hyper-cubes including inventory-centric data hyper-cube files and a timestamp.
  • merging of the hyper-cubes can include accessing a plurality of the inventory-centric demographic hyper-cubes, accessing a demographic date file, or merging the inventory-centric demographic hyper-cubes and the demographic date file to obtain the merged inventory-centric demographic hyper-cubes.
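
One possible in-memory shape for an inventory-centric demographic hyper-cube, and for the build and merge steps described above, is sketched below. The source does not specify the cube's layout, so everything here is an assumption: the cube is modeled as a mapping from location to counts per demographic cell, `DIMENSIONS`, `build_cube`, `merge_cubes` and `drill_down` are hypothetical names, and the timestamp and demographic date file mentioned in the text are not modeled.

```python
from collections import Counter, defaultdict

# Hypothetical demographic dimensions; a real cube could have many more.
DIMENSIONS = ("gender", "age_band")


def build_cube(demographics, actions):
    """Combine user demographics and user actions into location -> cell counts.
    Each user is counted once per location within one cube."""
    cube = defaultdict(Counter)
    seen = set()
    for user_id, location in actions:
        if (user_id, location) in seen or user_id not in demographics:
            continue
        seen.add((user_id, location))
        cell = tuple(demographics[user_id][d] for d in DIMENSIONS)
        cube[location][cell] += 1
    return cube


def merge_cubes(cubes):
    """Merge several cubes (e.g. one per day) by summing matching cells.
    A user active in two cubes is therefore counted in both."""
    merged = defaultdict(Counter)
    for cube in cubes:
        for location, cells in cube.items():
            merged[location].update(cells)
    return merged


def drill_down(cube, location, **wanted):
    """Audience count at one location, restricted to the wanted attribute values."""
    total = 0
    for cell, n in cube.get(location, {}).items():
        attrs = dict(zip(DIMENSIONS, cell))
        if all(attrs[k] == v for k, v in wanted.items()):
            total += n
    return total


demographics = {
    "u1": {"gender": "F", "age_band": "25-34"},
    "u2": {"gender": "M", "age_band": "25-34"},
    "u3": {"gender": "F", "age_band": "35-44"},
}
monday = build_cube(demographics, [("u1", "/home"), ("u1", "/home"),
                                   ("u2", "/home"), ("u3", "/pageX")])
tuesday = build_cube(demographics, [("u3", "/home"), ("u1", "/pageX")])
merged = merge_cubes([monday, tuesday])

women_on_home = drill_down(merged, "/home", gender="F")       # 2
all_on_home = drill_down(merged, "/home")                     # 3
young_on_page_x = drill_down(merged, "/pageX", age_band="25-34")  # 1
```

Because the counts are keyed by location first, any combination of attribute values can be queried after the fact, which is what distinguishes this layout from the fixed per-cluster totals of the user-centric techniques discussed earlier.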
  • the method of the present invention in an illustrative embodiment can further include providing an interactive user interface to the core technology.
  • An exemplary embodiment of a system, method and computer program product of the interactive user interface can feature reporting Internet user behavior statistics on a per location basis.
  • An exemplary embodiment can include a method for displaying location-specific reports as a user browses the Internet, including browsing the Internet using a browser, monitoring activity with the browser, observing a location browsed where the location includes content, requesting a report on the location, and displaying the report regarding the location.
  • the method's content can be from a website.
  • content is static or dynamically generated.
  • monitoring activity can include using an activity monitor.
  • requesting a report can include requesting the report from a report server.
  • displaying the report can include displaying it on a report display.
  • the browser is an Internet browser application program.
  • the browsing step is performed by a user.
  • the user can be a producer researching audience for the location, an advertising sales person looking for a specific target audience, or an advertiser looking for a specific target audience.
  • the activity monitor can perform steps including monitoring the browser using a separate browser window, monitoring the browser using a separate application or a separate applet, monitoring the browser using a plug-in module installed into the browser, or monitoring the browser with a module incorporated into the browser.
  • report requesting can include requesting a demographic and behavioral breakdown of an audience of the location, requesting a targeted demographic and behavioral breakdown of an audience subset of the location, requesting historical traffic levels for the location, or requesting predicted future traffic availability for the location.
  • the report server is running on a computer of a user, a separate computer from the computer of the user, or the computer of the user having an activity monitor integrated with the report server.
  • the requesting of a location report can include sending the location, sending two or more preferences of a user, generating the report on a report server, or receiving the report from the report server.
  • the preferences of the user include a type of the report to be generated by the report server; or a display preference determining how the report is to be displayed.
  • FIG. 1A depicts a high level block diagram of an example implementation of the analysis technology of the present invention
  • FIG. 1B depicts an example block diagram of a network illustrating client access to the Internet
  • FIG. 2A depicts a flow diagram illustrating an exemplary implementation of generation of an exemplary virtual cookie according to the present invention
  • FIG. 2B depicts a block diagram of an exemplary embodiment of a proxy server telecommunications network configuration
  • FIG. 2C depicts a block diagram illustrating an exemplary embodiment of use of a conventional cookie
  • FIG. 2D depicts a block diagram illustrating an exemplary embodiment of use of a conventional global profile cookie
  • FIG. 2E depicts an example environment illustrating the virtual cookie and an example universal profile server of the present invention
  • FIG. 3 depicts a detailed block diagram of an example embodiment of the present invention illustrating an example implementation of the core technology
  • FIG. 4A depicts a flow diagram illustrating an example of loading user demographic data in an exemplary process database
  • FIG. 4B depicts a flow diagram illustrating an example of loading user action data in an exemplary process database
  • FIG. 4C depicts a flow diagram illustrating an example of detecting and removing activity of atypical users such as robots in an exemplary process database
  • FIG. 4D depicts a flow diagram illustrating an example of determining user behavioral data in an exemplary process database
  • FIG. 4E depicts a flow diagram illustrating an example of determining user profiles in an exemplary process database
  • FIG. 4F depicts a flow diagram illustrating an example of building inventory-centric cubes in an exemplary process database
  • FIG. 4G depicts a flow diagram illustrating an example process database technique of the present invention
  • FIG. 5 depicts a flow diagram illustrating an example of cube validity merger in an exemplary data merger of the present invention.
  • FIG. 6 depicts an exemplary computer system.
  • FIG. 7 depicts an exemplary embodiment of a traffic report display illustrating an example range of traffic summarized on an exemplary weekly basis
  • FIG. 8 depicts an exemplary embodiment block diagram illustrating an example of interaction between an activity monitor, a report server and report display with an internet browser
  • FIG. 9 depicts an exemplary embodiment of the report display
  • FIG. 10 depicts an exemplary embodiment of the demographic report display
  • FIG. 11 depicts an exemplary embodiment of the targeted demographic report display.

Detailed Description of the Invention

  • the present invention is directed to an improved demographic and behavioral analysis system architecture for use in, e.g., identifying, tracking, and understanding user behavior on the Internet and in traditional stores.
  • FIG. 1A depicts a block diagram of an exemplary high level system architecture 100 according to the present invention.
  • High level system architecture 100 includes, e.g., raw data 102, a raw data interface 104, a core technology 106, a user interface 108, and users 110.
  • the raw data 102 is inputted into raw data interface 104.
  • the output of raw data interface 104 is input into a core technology 106.
  • Core technology 106 is described further with respect to FIG. 3, below.
  • Core technology 106 is accessed interactively by users 110 via user interface 108.
  • Raw data 102 includes, e.g., Internet client user information including logs of web page views by web browsers, demographic profile information, and information regarding purchases by users.
  • Raw data 102 can be made available by a website, an online service provider (OLSP), an internet services provider (ISP) or other entity tracking Internet client users, such as, e.g., a corporation. Hereafter these entities will be collectively referred to as "customers," or users 110, although users 110 can be different than those providing raw data 102.
  • Raw data files 102 can be in the form of database records and data files, such as, e.g., a proxy server log file, a web server log file, an ad (i.e., advertisement, such as, e.g., a banner ad) server log file, and user registration information.
  • Raw data files 102 can also include, e.g., user purchase information, which could be obtained from a customer ISP or company, or, e.g., from the output of a supermarket grocery store's point-of-sale (POS) purchase card user purchase tracking file.
  • The format of raw data 102 could include log files and other files stored in a customer-specific data format.
  • Raw data 102 files include information regarding "who, what, and where."
  • High level system architecture 100 can be used to determine "who" on the Internet did "what" kind of action, and "where" they did it.
  • "Who" can be the client user of the Internet, such as those shown in FIG. 1B, below.
  • "What" can be the action the Internet client user performed.
  • "Where" can be the Internet location, e.g., uniform resource locator (URL), or address of the domain and path where the Internet user performed the action, i.e., the web page requested.
  • For example, a client user identified as User395 could have looked at (i.e., a page view action) a web page referred to as page94 (which could be a URL such as, e.g., http://www.someplace.com/index.htm).
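  • The "who, what, and where" triple can be pictured as a small record. The following is a minimal sketch only; the class name ActivityRecord and its field names are assumptions chosen for illustration and do not appear in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityRecord:
    """One observed action (hypothetical record layout, for illustration)."""
    who: str    # client user identifier, e.g. "User395"
    what: str   # action performed, e.g. a page view
    where: str  # Internet location, e.g. the URL requested


# The example from the text: User395 viewed page94.
record = ActivityRecord(
    who="User395",
    what="pageview",
    where="http://www.someplace.com/index.htm",
)
print(record)
```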
  • Raw data interface 104 can be an application program, which can be customer specific, that processes raw data 102 files and sends the resulting processed output to core technology 106.
  • In one embodiment, raw data interface 104 can read in raw data 102 files, break the files down into useful data, and then pass the data to a raw data processor 302, described further below with reference to FIG. 3.
  • Core technology 106 processes raw data to gain an understanding of inventory-centric demographics.
  • Inventory-centric means tracking the demographics of the client user audience that visits each location, i.e., Internet location, such as, e.g., a server of HTTP data, audio, video, telephony, media, streaming technology, or other kind of data.
  • Core technology 106 enables near real-time (or real-time) demographic reporting on a per location basis.
  • Core technology 106 enables near real-time (or real-time) demographic reporting with drill-down analysis of specific target audiences on a per location basis.
  • Core technology 106 enables near real-time (or real-time) searching for locations that best match a specified target audience.
  • Core technology is described further with reference to FIG. 3 below.
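  • Inventory-centric reporting and location search can be sketched as follows. This is an illustrative sketch under stated assumptions, not the core technology of FIG. 3: the profile attributes, the sample data, and the function names audience_by_location, demographic_report, and rank_locations are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical demographic profiles keyed by client user identifier.
profiles = {
    "User395": {"age": "25-34", "gender": "F"},
    "User12": {"age": "25-34", "gender": "M"},
    "User77": {"age": "45-54", "gender": "F"},
}

# Hypothetical (user, location) visits.
visits = [
    ("User395", "page94"),
    ("User12", "page94"),
    ("User77", "page17"),
    ("User395", "page17"),
]


def audience_by_location(visits):
    """Group the distinct visiting users under each location."""
    audience = defaultdict(set)
    for user, location in visits:
        audience[location].add(user)
    return audience


def demographic_report(location, audience, profiles):
    """Count visitors of one location per (attribute, value) pair."""
    report = Counter()
    for user in audience[location]:
        for attribute, value in profiles.get(user, {}).items():
            report[(attribute, value)] += 1
    return report


def rank_locations(audience, profiles, target):
    """Order locations by the share of visitors matching the target audience."""
    def share(users):
        matches = sum(
            1 for user in users
            if all(profiles.get(user, {}).get(k) == v for k, v in target.items())
        )
        return matches / len(users)
    return sorted(audience, key=lambda loc: share(audience[loc]), reverse=True)


audience = audience_by_location(visits)
report = demographic_report("page94", audience, profiles)
ranking = rank_locations(audience, profiles, {"gender": "F"})
```

  • In this sketch the report is keyed by location rather than by user, which is the sense in which the analysis is inventory-centric.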
  • User interface 108 can be used for searching and dynamically generating reports. User interface 108 enables users 110 to drill down through the data contained in core technology 106. For example, user interface 108 can permit analysis of demographic and behavioral data stored in core technology 106. User interface 108 can provide a product-specific user interface into core technology 106. User interface 108 can allow users 110 to interact with demographic and behavioral data. User interface 108 provides users 110 with access to the functionality of core technology 106.
  • User interface 108 can be web-based in one embodiment. In another embodiment any other user interface could be used, such as, e.g., a client-server based interface.
  • Users 110 can include, e.g., any entity with a large amount of data that wants to analyze the data.
  • For example, large Internet server sites, ISPs, grocery stores, and corporations may desire to analyze large amounts of data tracking client user requests or purchasing.
  • Users 110 can also include companies seeking to perform targeted promotions or advertising.
  • FIG. 1B illustrates an example environment.
  • FIG. 1B depicts an example block diagram of a network illustrating client access to the Internet.
  • FIG. 1B depicts a block diagram of an exemplary telecommunications network 120.
  • Telecommunications network 120 includes a plurality of networks interconnected via the global Internet.
  • An internet (with a lower case "i") is a network that connects multiple networks.
  • The Internet (with a capitalized "I") is an internet which connects computer workstation hosts in many networks which communicate using the Internet protocol (IP). Each host of the Internet has its own IP address, which is used as a source or destination address in routing packets of information through the Internet.
  • FIG. 1B illustrates a variety of methods available for connecting to the Internet.
  • Telecommunications network 120 includes a network 122 and a network 124 which are connected to Internet 158 via a proxy server 148.
  • Network 122 is a token ring network including workstations 126, 128, 130 and 132.
  • Network 124 is an ethernet network including workstations 136, 138, 140 and 142.
  • The workstations in network 122 and the workstations in network 124 are connected to proxy server 148 via network connections as represented by lines 134 and 144.
  • Lines 134 and 144 are logical connections and could represent a variety of different communications links and devices such as, e.g., cabling, gateways, bridges and routers.
  • Proxy server 148 connects to Internet 158 via a connection to Internet 156.
  • Connection to Internet 156 is a logical connection and could also represent a variety of different communications links and devices.
  • Each of workstations 126-132 and 136-142 includes a network interface card (NIC) for physically connecting to the other workstations on networks 122 and 124. It will be apparent to those skilled in the art that other network connections could equally be used.
  • Subscriber 146 is connected to proxy server 148 via a modem connection (such as, e.g., a dial-up connection) using modems 150 and 152 to proxy server 148.
  • Proxy server 148 also serves to permit subscriber 146 to access Internet 158 even though it does not have a network interface card (NIC).
  • The machine running proxy server 148 can also act as a network communications server (NCS) 242 (described further with reference to FIG. 2E).
  • An NCS can provide network access to workstations so they can access the Internet by, e.g., a dial-up communications link.
  • Subscriber 146 can dial into proxy server 148 using a modem 150.
  • Subscriber 146 can be, e.g., a corporate user accessing a corporate network while out of town on business, or a home user dialing up via a modem, such as, e.g., a cable modem connection, or via other means of access to the proxy server, such as an integrated services digital network (ISDN) connection, a digital subscriber line (DSL), or an ISDN concentrator.
  • In other embodiments, modems 150 and 152 could include any other dynamic access methods, such as, e.g., cable modems, digital subscriber line (DSL), or other means of remote or local access.
  • Modems 150 and 152 are conventional analog modems that can operate at different speeds and can include various error-checking capabilities and modulation protocols.
  • An NCS manages a pool of IP addresses which it can assign to any of workstations 126-132, 136-142 and
  • Proxy server 148 can include the functionality of both a proxy server and an NCS. ISPs often route all their hypertext transfer protocol (HTTP) traffic through a proxy server. Routing traffic through a proxy permits caching requests. Box
  • Proxy server 148 can act as a firewall to provide security to workstations on downstream networks 122 and 124.
  • proxy server 148 is used by an entity other than an ISP, such as, e.g., a company with telecommuting employees dialing in via an NCS.
  • the NCS is a remote access device (RAD) which can be compliant with the dynamic host configuration protocol (DHCP).
  • RAD can be used to connect off-site users to a corporate network. These users can include, e.g., salespeople, and other business professionals who travel or telecommute rather than work in a fixed office location.
  • Internet 158 includes various networks connected together communicating via the Internet protocol (IP). Different networks can be coupled via a router. A router communicates between networks, is knowledgeable of workstations on multiple domains, and can route information between those domains.
  • Network 162 is coupled to Internet 158 via router 160 as indicated by line 172.
  • Network 162 includes workstations 164, 166, 168 and 170 connected in an exemplary ethernet topology.
  • Router 160 of network 120 routes IP packets from workstations on network 162 to other workstations on Internet 158. It is important to note that workstations 164, 166, 168 and 170 each have their own permanently assigned IP address.
  • Hidden workstations in box 154 can be assigned an IP address by an NCS and can have some hypertext transfer protocol (HTTP) requests hidden by proxy server 148.
  • The hidden workstations can use a different IP address, i.e., a dynamically assigned one, each time they connect to Internet 158.
  • A network communications server, such as NCS 242, below, can manage assigning a pool of IP addresses to the workstations it is responsible for; alternatively, NCS functionality in the computer workstation of proxy server 148 could do so. It would be apparent to those skilled in the art that workstations 136-142 could also have permanently assigned IP addresses, known as statically assigned addresses, assigned by proxy server 148.
  • Proxy server 148 running proxy server software can perform numerous proxy functions such as, e.g., caching of web pages. Caching of web page requests can save, e.g., as much as 50% of the traffic between connection to Internet 156 and Internet 158. Caching can be used to attain a high cache hit rate to decrease network traffic for an ISP and to enable faster access time for users.
  • Example cache protocols include, e.g., Internet cache protocol (ICP) and cache array routing protocol (CARP). Multiple proxies can be used to store large amounts of cached data. It will be apparent to those skilled in the art that when a proxy server is referred to in this document, the proxy server could also be any other caching or logging technology that can observe and record user activity.
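  • Managing a pool of IP addresses can be sketched as follows. This is a simplified illustration, not an implementation of DHCP or of NCS 242; the class AddressPool, its methods, the addresses, and the integer timestamps are assumptions made for the example.

```python
class AddressPool:
    """Hypothetical pool of IP addresses handed out per connection."""

    def __init__(self, addresses):
        self.free = list(addresses)  # addresses not currently assigned
        self.leases = {}             # user_id -> (ip, start time)
        self.log = []                # (user_id, ip, start, end) entries

    def connect(self, user_id, now):
        """Assign the next free address to the user for this session."""
        if not self.free:
            raise RuntimeError("no free IP address in the pool")
        ip = self.free.pop(0)
        self.leases[user_id] = (ip, now)
        return ip

    def disconnect(self, user_id, now):
        """Release the address and record the time window it was held."""
        ip, start = self.leases.pop(user_id)
        self.free.append(ip)
        self.log.append((user_id, ip, start, now))


pool = AddressPool(["10.0.0.1"])
first = pool.connect("userA", now=100)
pool.disconnect("userA", now=200)
second = pool.connect("userB", now=300)
pool.disconnect("userB", now=400)
```

  • Here the same address is held by two different users at different times, which is why an IP address alone does not identify a user, while the recorded time windows make the assignment recoverable.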
  • FIG. 2A depicts a flow diagram illustrating an exemplary optional process which can be used to process file data for use as a raw data source.
  • the data processing step creates a virtual cookie.
  • FIG. 2A illustrates a more detailed block diagram of a raw data interface 104.
  • An example of proxy server software is SQUID Internet Object Cache 2 (SQUID-2), available from FTP site squid.nlanr.net.
  • SQUID-2 is derived from software developed and funded by the Advanced Research Projects Agency (ARPA) Harvest Project.
  • SQUID-2 is a high-performance proxy caching server for web clients, supporting FTP, Gopher and HTTP data object requests.
  • The SQUID-2 cache software, which is available only in source code, is relatively fast because it handles all requests in a single, non-blocking, I/O-driven process.
  • SQUID-2 never needs to fork, is implemented with non-blocking input/output (I/O), keeps meta data and hot objects in virtual memory (VM), caches domain name server (DNS) lookups, supports non-blocking DNS lookups, and implements negative caching of failed requests.
  • SQUID runs on all popular UNIX operating system platforms, such as, e.g., AIX, FreeBSD, HP-UX, IRIX, Linux, NeXTStep, OSF/1, Solaris, and SunOS, the OS/2 operating system platform, and the Windows/NT platform.
  • A detailed description of SQUID is available at URL http://squid.nlanr.net/Squid/, and a frequently asked question (FAQ) list is available at URL http://squid.nlanr.net/Squid/FAQ/FAQ.html, the contents of which are hereby incorporated by reference in their entirety.
  • Another example of a proxy server 148 is MICROSOFT Proxy Server 2.0 available from Microsoft Corporation of Redmond, WA.
  • FIG. 2A illustrates a flowchart 200 which depicts the process of creating a virtual cookie 214.
  • Flowchart 200 includes as input, proxy logs 202a, 202b and 202c and IP address assignment logs 204a, 204b and 204c.
  • Proxy logs 202a-c can reside on the same or different proxy servers, or web servers of customers such as an ISP.
  • IP address assignment logs 204a, 204b and 204c can also reside on the same or different proxy servers, or other servers.
  • Proxy logs 202a-c can contain requests from user client machines, such as, e.g., a subscriber. It will be apparent to those skilled in the art that requests referred to as "HTTP requests" or "requests" could also include other types of requests from other types of servers by other kinds of clients, such as, e.g., data, media, audio, telephony, and streaming technology requests.
  • The request can contain the requested URL (location), the IP address or number of the client user making the request, and the time that the request was made.
  • Actions requested by a client user can also be captured. The action will usually be a web page pageview, although some users may perform another action, such as, e.g., selecting an advertisement (a click-through), or another parsed HTML request, such as, e.g., an image request.
  • Information may be logged relating to pageview actions, and in other embodiments, other actions can also be logged, such as, e.g., click-through information to identify other behavior. Therefore, proxy logs 202a-c, including location, action, IP address, and time of the requests, can be combined as indicated in processing step 206, and can be output as represented by line 210.
  • IP address assignment logs 204a-c can be maintained by, e.g., an NCS, an RAD or other type of server of, e.g., an ISP, a corporation or other customer entity.
  • Internet client users can connect to the global Internet by using, e.g., a modem to establish a dial-up connection to, e.g., an ISP or customer.
  • the ISP can assign the user a temporary IP address or an IP number, which the client user can use for the duration of the user's connection to the Internet.
  • IP address assignment logs 204a-c can also be consolidated as depicted in FIG. 2A as part of raw data interface 104.
  • IP address assignment logs 204a-c can include, for a dial-up Internet access by a client user, the client user's user identification (userID), an IP address, and a time window during which the client user was connected to the Internet.
  • A userID is a unique identifier for a user.
  • IP address assignment logs 204a-c, including userID, IP address, and time window logged on, can be combined as indicated in processing step 208, and output as represented by line 212 of raw data interface 104.
  • Virtual cookie 214 can take as input the output of steps 206 and 208 as indicated by lines 210 and 212, respectively.
  • Line 210 can represent the output of processing of proxy logs 202a-c and line 212 can represent the output of processing of IP address assignment logs 204a-c.
  • Virtual cookie 214 can merge the data output by steps 206 and 208 and can correlate using IP address (or IP number) and time to obtain the locations requested and actions requested by userID. Specifically, virtual cookie 214 can create a merged file which can include, e.g., for each location accessed, the action requested and the userID that requested it, which is indicated in step 216.
  • Virtual cookie 214 can identify, e.g., all locations accessed by a specific user 222 (shown in FIG. 2E). Further demographic and psychographic analysis can be performed to create a profile for user 222 using the identified locations accessed by the user.
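  • The correlation performed to create virtual cookie 214 can be sketched as a join of the two consolidated logs on IP address and time. The following is a minimal sketch under stated assumptions: the record types ProxyEntry and Assignment, the function names, the integer timestamps, and the sample addresses are hypothetical, and real log formats are customer specific.

```python
from typing import List, NamedTuple, Optional, Tuple


class ProxyEntry(NamedTuple):
    """One proxy log request (output of step 206); hypothetical layout."""
    location: str
    action: str
    ip: str
    time: int


class Assignment(NamedTuple):
    """One IP address assignment (output of step 208); hypothetical layout."""
    user_id: str
    ip: str
    start: int
    end: int


def resolve_user(entry: ProxyEntry, assignments: List[Assignment]) -> Optional[str]:
    """Find the userID that held the entry's IP address at the entry's time."""
    for a in assignments:
        if a.ip == entry.ip and a.start <= entry.time <= a.end:
            return a.user_id
    return None


def build_virtual_cookie(
    proxy_entries: List[ProxyEntry], assignments: List[Assignment]
) -> List[Tuple[str, str, str]]:
    """Merge both logs into (location, action, userID) rows (step 216)."""
    merged = []
    for entry in proxy_entries:
        user_id = resolve_user(entry, assignments)
        if user_id is not None:  # requests with no matching assignment are skipped
            merged.append((entry.location, entry.action, user_id))
    return merged


proxy_log = [
    ProxyEntry("http://www.someplace.com/index.htm", "pageview", "10.0.0.7", 120),
    ProxyEntry("http://www.someplace.com/news.htm", "pageview", "10.0.0.7", 950),
    ProxyEntry("http://www.someplace.com/index.htm", "pageview", "10.0.0.9", 130),
]
assignment_log = [
    Assignment("userA", "10.0.0.7", 100, 500),
    Assignment("userB", "10.0.0.7", 900, 1200),
]
merged = build_virtual_cookie(proxy_log, assignment_log)
```

  • In this example the two requests from the same IP address resolve to different userIDs because they fall in different assignment windows. The linear scan is chosen for clarity; a larger log would likely warrant an index on IP address.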
  • FIG. 2B depicts block diagram 248 which illustrates an exemplary network configuration for an example proxy server 148.
  • Block diagram 248 includes, at its base at the physical level, dynamic access method 258, which in one embodiment could be, e.g., modem 152; an internal network interface card (NIC) 250 facing downstream networks 122 and 124; and an external network interface card (NIC) 252, which provides upstream access to Internet 158 via connection to Internet 156, which could be, for example, a router.
  • An example router is a CISCO router available from CISCO Corporation of Mountain View, CA.
  • Included in block diagram 248 are low level protocol drivers 254, transmission control protocol/Internet protocol (TCP/IP) network protocol stack 256, web proxy 260, IP address assignment log 204 and proxy log 202.
  • IP address assignment log 204 is a dial-up log, which could be, e.g., a log of dial- up subscribers 146 dialing up to access an ISP. In another embodiment, IP address assignment log 204 tracks static or dynamic assignments of IP addresses to users over time.
  • IP address assignment log 204 and dynamic access method 258 run on a network communications server (NCS) computer separate from the proxy server; see FIG. 2B below.
  • proxy server 148 could include web proxy 260 and proxy log 202 and would have the proxy server software running on a separate computer from the NCS.
  • an example NCS is a remote access device (RAD).
  • The RAD can comply with dynamic host configuration protocol (DHCP).
  • the RAD can be used to provide dynamic IP address assignment to network workstations connected through a proxy server.
  • a dynamic IP log tracks the assignment of IP addresses to the network workstations. IP addresses can be dynamically or statically assigned to the network workstations.
  • Proxy server 148 supports Internet access requests from downstream workstations 126-130 and 136-142, and subscriber 146. Requests can come into proxy server 148 through internal NIC 250 and can be handled by, e.g., web proxy 260, to open a connection to Internet 158 out through external NIC 252. Requests can also come into an NCS (shown in FIG. 1B as part of proxy server 148) via modem 152 from a subscriber 146 and can be similarly handled.
  • Proxy server software can perform logging functions. Each request from a workstation in box 135 to access Internet 158 is logged in proxy log 202. Proxy log 202 in a typical environment can log a location that a client attempts to request, e.g., a URL address. Proxy log 202 can also include a log of the action requested such as to open the URL, the IP address requesting the action, and the time at which the request was made by that IP address.
  • proxy logs could also be the log from any caching or logging technology that can observe and record user activity.
  • a network communication server also performs logging functions. For example, when subscriber 146 attempts to log onto Internet 158 by initiating a connection via modem 150 to modem 152 of proxy server 148, an IP address assignment log 204 records information such as the time period that a subscriber was logged on or the time period an IP address was assigned (statically or dynamically) to a network workstation. Specifically, IP address assignment log 204 can track information about, e.g., subscriber 146, including, e.g., a user ID of subscriber 146, the IP address assigned to subscriber 146, and the time period logged on including, e.g., a start time along with either an end time or a duration.
  • a dynamic IP address assignment log can record IP address assignments of a DHCP-compliant remote access device (RAD). Other alerts and logs can also be maintained.
  • FIG. 2C depicts a block diagram 262 illustrating the use of a persistent client state cookie.
  • A user 264 accesses the Internet 158 through a client 266 having an IP address.
  • Block diagram 262 assumes that the IP address is either permanently assigned to client 266, such as, for workstations 164-170, or is temporarily assigned to client 266, such as, e.g., for workstations 126-132, 136-142 or subscriber 146, using an assigned IP address from proxy server 148.
  • client 266 is connected to Internet 158 to access various servers such as, e.g., servers 268 and 270.
  • An administrator of server 268 wishes to be able to track accesses by user 264.
  • A software tool, the cookie, has been developed to enable server 268 to do so.
  • User 264 requests to view a particular URL (e.g., http://www.something.com), as illustrated by line 272.
  • Server 268 can then respond to client 266, as illustrated by line 274.
  • the process of accessing a particular web page is now briefly described.
  • the browser of user 264 parses the HTML source code which comprises the entered URL.
  • Parsing involves breaking up the HTML source file into separate requests of the domain server corresponding to the URL.
  • the HTML source file could include several image tag references.
  • An image tag reference (IMG SRC) can require the browser to request a graphical bitmap image for insertion in the hypertext document.
  • Server 268 can send down to user 264, e.g., the requested text of the web page, and/or parsed images, associated with the URL requested in line 272.
  • Server 268 can send along an embedded software object, known as a persistent client state cookie 276, the "cookie."
  • Cookie 276 can include a required name field which contains a value which may include information encoded within its value placed there by server 268 to identify user 264.
  • Cookie 276 can contain an expiration date and time, in Greenwich Mean Time (GMT), a domain of the cookie which is limited to a single domain, a path, and a security setting. Assuming user 264 is connected to Internet 158 via a connection, such as subscriber 146, then cookie 276 could be placed on the hard disk drive of the workstation of subscriber 146.
  • The browser can also send cookie 276 from the hard disk drive of dial-up subscriber 146 to server 268, which can decode the information contained within cookie 276 in order to recognize user 264 and customize presentation of the web page according to the preferences of user 264.
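  • The cookie fields described above can be illustrated with Python's standard http.cookies module. This is a sketch only; the cookie name "visitor", its value, the domain, and the expiration date are assumptions chosen for the example, not values from the description.

```python
from http.cookies import SimpleCookie

# Server side: build a cookie with a name/value pair identifying the user,
# plus expiration (GMT), domain, path, and security attributes.
cookie = SimpleCookie()
cookie["visitor"] = "user264-token"          # hypothetical name and encoded value
cookie["visitor"]["expires"] = "Sun, 31 Dec 2000 23:59:59 GMT"
cookie["visitor"]["domain"] = "www.something.com"
cookie["visitor"]["path"] = "/"
cookie["visitor"]["secure"] = True

# Header line the server would send along with the web page.
header = cookie.output()
print(header)

# Later request: the browser returns the name/value pair, and the server
# decodes it to recognize the user.
received = SimpleCookie()
received.load("visitor=user264-token")
recognized = received["visitor"].value
```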
  • FIG. 2D depicts a block diagram 284 illustrating another way of tracking users on the Internet, in this case using a global cookie 289.
  • Global cookie 289 is based on the idea that the more user actions are observed of a user, the more information will be available about the user, and the more accurately the user can be targeted by analysis.
  • Block diagram 284 includes a user 286a accessing the Internet via a client 288 with a permanently assigned IP address.
  • Block diagram 284 also includes a user 286b which is using the subscriber 146 work station as indicated within box 299 whose access to Internet 158 is via proxy server 148 which could be that of e.g., an Internet service provider (ISP).
  • user 286b is assigned an IP address by ISP proxy server 148.
  • Client 288 can be one of workstations 164-170 with a permanently assigned IP address.
  • FIG. 2D illustrates how users 286a and 286b can be identified using a global cookie so as to provide better serving of advertisements, through the pooling of ad requests.
  • a global cookie 289 is used in this example.
  • When servers 290 and 292 provide web pages to users 286a and 286b, they pool their banner ads by using an ad server, shown as global profile server 298.
  • a single global cookie 287 is placed on the workstation of user 286a.
  • Global profile server 298 can analyze the browsing habits of user 286a and target an ad using the user profile 281 of user 286a.
  • Servers 290 and 292 would need to have subscribed to the advertising services of global profile server 298 which can include user profiles 281 including ad preferences and viewing history for users 286a and 286b for web sites which have subscribed with global profile server 298.
  • An example global profile server 298 is ProfileServer 4.0 available from Engage Technologies, Inc. of Andover, MA.
  • Global profile server 298 only supports HTTP servers 406 and 408 which have subscribed to the global profile server 298 advertising services.
  • the browser of user 286a parses the HTML source of the page into multiple requests, such as, e.g., IMG SRC requests, as indicated by line 294. For example, a bitmap image can be sent down as indicated by line 296.
  • an advertisement banner request can be parsed out which is then sent as a GET request to global profile server 298, including global cookie 287.
  • global cookie 287 can be used by global profile server 298, to access user profile 281 corresponding to user 286a to determine a banner ad to display in the requested web page.
  • global profile server 298 can send the banner ad to the browser of workstation client 288 of user 286a for viewing.
  • Server 290 can then query global profile server 298, as indicated by line 291, and can receive results about user 286a, as indicated by line 293, from a subscribed user profile.
  • Global profile server 298 can store browsing information about user 286a on global profile server 298 for use in targeting future ads to user 286a.
  • User profile 281 can include other information about user 286a such as, e.g., declared profiles and behavior profiles, for local browsing behavior and web-wide browsing behavior (so long as the behavior occurs on a subscribed server 290, 292).
  • Declared profiles would need somehow to be captured; e.g., during access to an ad, user 286a would need to offer information, or such information would need to be supplied to subscribed servers 290 and 292, which would need to capture and forward such information to global profile server 298.
  • Global profile server 298 would be sent the same global cookie 287 by the browser of user 286a, as indicated by line 295.
  • Global profile server 298 would send a targeted ad, as indicated by line 297, to the browser of client 288 of user 286a.
  • Global profile server 298 would pool the information gleaned regarding user 286a from the multiple subscribing servers 290 and 292.
  • Global profile server 298 could place an ad on the requested web page from server 292 to user 286a, to direct user 286a to services on server 290.
  • Global profile server 298 can pool behavior from the multiple subscribed sites of servers 290 and 292 to more narrowly target advertising to user 286a, based on user profile 281.
  • When user 286b connects to server 290 and server 292 via subscriber workstation 146 and proxy server 148, the browser of user 286b could similarly parse the requested web page and then could request part of the HTML from servers 290 and 292, and could similarly request ad banners from global profile server 298 by sending global cookie 289 to global profile server 298 to identify user 286b.
  • Global profile cookies 289 and 287 provide the advantage of using a single global profile cookie for permitting observation of behavior of users across multiple subscribed domains.
  • using global profile cookies has limitations. For example, servers 290 and 292 must be subscribed with global profile server 298. Use of a global cookie still requires that the cookies feature of a user's browser be enabled. If servers 290, 292 do not subscribe to global profile cookie 287, 289, then browsing of non-subscribed sites by users 286a and 286b is not tracked.
  • An example of an ad server of this sort is, e.g., http://www.doubleclick.com.
  • global profile server 298 can observe behavior of anonymous visitors across multiple subscribed web sites.
  • Global profile server 298 can build an interest profile for users 286a and 286b based on which subscribed sites global profile server 298 observes users 286a and 286b browsing.
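  • The pooling behavior of a global profile server, and its limitation to subscribed sites, can be sketched as follows. This is an illustrative model only and does not describe any actual ad server product; the class GlobalProfileServer, its methods, the site names, the topics, and the ad labels are all hypothetical.

```python
from collections import Counter, defaultdict


class GlobalProfileServer:
    """Hypothetical ad server pooling observations across subscribed sites."""

    def __init__(self, subscribed_sites):
        self.subscribed = set(subscribed_sites)
        self.profiles = defaultdict(Counter)  # global cookie id -> interest counts

    def observe(self, cookie_id, site, topic):
        """Record an ad request; browsing on non-subscribed sites is not tracked."""
        if site not in self.subscribed:
            return False
        self.profiles[cookie_id][topic] += 1
        return True

    def pick_ad(self, cookie_id, ads, default="generic-ad"):
        """Choose the ad matching the most frequently observed interest."""
        profile = self.profiles.get(cookie_id)
        if not profile:
            return default
        topic, _count = profile.most_common(1)[0]
        return ads.get(topic, default)


server = GlobalProfileServer(["server290", "server292"])
server.observe("cookie287", "server290", "sports")
server.observe("cookie287", "server292", "sports")
server.observe("cookie287", "server292", "travel")
tracked = server.observe("cookie287", "unsubscribed-site", "finance")

ads = {"sports": "sports-banner", "travel": "travel-banner"}
chosen = server.pick_ad("cookie287", ads)
unknown = server.pick_ad("cookie289", ads)
```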
  • FIG. 2E depicts an example environment illustrating the virtual cookie and an example universal profile server of the present invention. Specifically, FIG. 2E depicts a block diagram 220 illustrating the use of a universal profile server 240 according to the present invention.
  • Universal profile server 240 can use post browsing analysis to create a virtual cookie 214 to track web browsing behavior by users, in one embodiment of the invention. In another embodiment of the invention, universal profile server 240 can create the virtual cookie 214 in real-time. Virtual cookie 214 advantageously does not actually require that any user enable the cookie feature of browsers. Virtual cookie 214 also provides much more targeted information regarding user browsing habits than available through any conventional behavior tracking approaches.
  • virtual cookie 214 can provide much more robust analysis information regarding not only how many visited sites, but also, e.g., what types of client users, i.e., who visited a location, what action was performed at the location visited, and where was the location visited.
  • the present invention uses the IP address of the workstation of users 222a and 222b in order to uniquely identify users 222a and 222b. It is conventionally thought that the IP address of the workstations of users 222a and 222b is insufficient to track all users, because of the large amount of dynamic IP allocation.
  • user 222a has a permanently assigned IP address, assigned by network communications server (NCS) 242.
  • NCS functionality is contained on the same machine as the proxy server software, e.g., proxy server 148.
  • User 222b accesses Internet 158 using a temporarily or dynamically assigned IP address from NCS 242, so user 222b can use a different IP address each time it accesses Internet 158.
  • HTTP servers 226 and 228 can never definitively know whether a particular user 222b is accessing servers 226 and 228.
  • Using a post browsing analysis technique, or a real-time technique in an alternative embodiment of the present invention, named a virtual cookie 214 (recall FIG. 2A above), all web site browsing of users 222b and 222a can be tracked and analyzed by individual user.
  • Although the analysis tool is named a "virtual cookie," it is not in fact a cookie at all, and does not require enablement of browser cookie features.
  • universal profile server 240 uses a virtual cookie 214 to identify and analyze all web sites browsed by users 222a and 222b.
  • Using virtual cookies 214a or 214b, all sites accessed by the users can be analyzed with no cookie needing to be stored on HTTP client 224 or workstation 146 of users 222a and 222b, respectively.
  • virtual cookie 214 enables tracking and analysis of requests which were completed by the proxy server as a result of cache hits.
  • For HTTP client 224 (or other data, media, audio, video, telephony or streaming technology client), such as workstations 164-170, 126-132, or 136-142, accessing web pages of HTTP servers 226 and 228 (or other servers) using a permanently assigned IP address through proxy server 148, all traffic can be tracked by virtual cookie 214a.
  • HTTP client 224 would request websites from HTTP servers 226 and 228, as represented by lines 230-232 and 236-238, but would often rather receive a cached version of the requested web pages from the proxy server 148 as represented by lines 244 and 246.
  • Virtual cookie 214 tracks all requests by HTTP client 224 (and subscriber 146), including those for which a proxy server returned a cached web page to the client.
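  • Why proxy-side logging captures requests that the origin server never sees can be sketched with a toy caching proxy. This is a simplified illustration, not the behavior of any particular proxy product; the class CachingProxy, the origin function, and the sample values are assumptions.

```python
class CachingProxy:
    """Hypothetical proxy that logs every request, then serves from cache."""

    def __init__(self, fetch):
        self.fetch = fetch  # callable that retrieves a page from the origin server
        self.cache = {}     # url -> cached page
        self.log = []       # (location, action, ip, time) for every request

    def request(self, ip, url, time):
        # Log first, so cache hits are recorded as well as cache misses.
        self.log.append((url, "pageview", ip, time))
        if url not in self.cache:
            self.cache[url] = self.fetch(url)
        return self.cache[url]


origin_hits = []  # requests that actually reached the origin server


def origin(url):
    origin_hits.append(url)
    return "<html>" + url + "</html>"


proxy = CachingProxy(origin)
page_a = proxy.request("10.0.0.7", "http://www.someplace.com/index.htm", 120)
page_b = proxy.request("10.0.0.9", "http://www.someplace.com/index.htm", 130)
```

  • In this sketch the origin server observes one request while the proxy log holds two, so analysis based on the proxy log sees the page view that was answered from the cache.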
  • the user's browser is configured to use the proxy server. If HTTP client 224 is assigned an address by a DHCP compliant RAD type NCS 242 (as shown), then the IP address assignment log 204 can track all locations accessed by user 222a.
  • If HTTP client 222b is a dynamically assigned IP address device, assigned via an NCS 242, the client can also be analyzed and tracked in the same way as described with reference to statically assigned IP address devices.
  • Virtual cookie 214 can be created by using analysis after browsing by user 222b is completed and logged. This post browsing analysis can be performed at the proxy server as illustrated by virtual cookie 214a of FIG. 2E. Alternatively, post browsing analysis can be performed at a separate server such as, e.g., universal profile server 240 with access to the log data necessary for creating virtual cookie 214b.
  • a significant advantage of universal profile server 240 over any conventional profile server technology is its ability to identify users accessing Internet 158 via a proxy server 148, such as user 222b using subscriber workstation 146.
  • a very large portion of the Internet population accesses the Internet via proxy servers, and in particular via proxy servers of internet service providers (ISPs) and other corporate entities.
  • An example of such an ISP is America Online (AOL).
  • Often all HTTP (and data, media, audio, video, telephony and streaming technology) traffic is sent through proxies to take advantage of proxy functions such as caching. With access via a proxy server, caching of web pages conventionally prevents accurate tracking of web site requests.
  • Using the universal profile server 240 also provides the advantage of enabling cross web site analysis. For example, a web client may go to a combination of web sites which when analyzed together can indicate a particular attribute about the user.
  • Creating a virtual cookie 214 as already described above with reference to FIG. 2A permits tracking and analyzing all browsing activity of users 222a and 222b.
  • The technique maps an IP address to a userID. In one embodiment, this mapping is performed post-browsing. In another embodiment, this mapping is performed in substantially real-time.
  • virtual cookie 214a can be determined on proxy server 148, which can, e.g., be a proxy server.
  • virtual cookie 214b can be created on a separate universal profile server 240. Universal profile server 240 can be a separate server computer or several computers with connectivity to Internet 158.
  • Virtual cookie 214a can identify user 222b by using information contained on proxy server 148, including requests to servers 226 and 228, and information contained in logs on NCS 242.
  • users 222b could not be identified by IP address, since it could have changed with every access.
  • specific browsing habits of such users were only accessible by using a conventional cookie and without one, only general information was available.
  • Using proxy logs 202 from, e.g., an ISP, browsing by users 222b of the ISP could only be reviewed generally, because the proxy logs 202 contain only the requesting IP address, which does not uniquely identify a specific user 222b. Instead, the IP address represents a number of different users 222b, since a pool of IP addresses is assigned to a variety of users 222b by NCS server 242 at the time of IP address assignment.
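  • The mapping from a dynamically assigned IP address to a userID can be illustrated with a minimal sketch. It assumes that the IP address assignment log records a lease interval and a userID for each assignment, and that proxy log entries carry a timestamp. The record layouts, sample values, and function names below are hypothetical and are not taken from the described embodiment.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical IP assignment log: (ip, lease_start, lease_end, user_id).
# Times are plain integers (e.g., epoch seconds) to keep the sketch simple.
ASSIGNMENTS = [
    ("10.0.0.5", 100, 200, "user28"),
    ("10.0.0.5", 200, 300, "user29"),
    ("10.0.0.6", 100, 300, "user30"),
]

# Hypothetical proxy log: (timestamp, requesting_ip, url).
PROXY_LOG = [
    (150, "10.0.0.5", "http://www.somewhere.com/sports/football.htm"),
    (250, "10.0.0.5", "http://www.somewhere.com/news.htm"),
    (400, "10.0.0.5", "http://www.somewhere.com/late.htm"),
]


def build_lease_index(assignments):
    """Group leases by IP address and sort each group by lease start."""
    index = defaultdict(list)
    for ip, start, end, user_id in assignments:
        index[ip].append((start, end, user_id))
    for leases in index.values():
        leases.sort()
    return index


def resolve_user(index, ip, timestamp):
    """Return the userID holding `ip` at `timestamp`, or None if unknown."""
    leases = index.get(ip, [])
    starts = [start for start, _, _ in leases]
    pos = bisect_right(starts, timestamp) - 1
    if pos < 0:
        return None
    start, end, user_id = leases[pos]
    return user_id if start <= timestamp < end else None


def build_virtual_cookies(assignments, proxy_log):
    """Attach each proxy log request to the user who held the IP at that time."""
    index = build_lease_index(assignments)
    cookies = defaultdict(list)
    for timestamp, ip, url in proxy_log:
        user_id = resolve_user(index, ip, timestamp)
        if user_id is not None:
            cookies[user_id].append((timestamp, url))
    return dict(cookies)


virtual_cookies = build_virtual_cookies(ASSIGNMENTS, PROXY_LOG)
```

  • In this sketch the same IP address resolves to different users at different times, and a request that falls outside every lease is left unattributed rather than guessed.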
  • Virtual cookie 214 can be used to analyze demographic and behavioral information about client users of the Internet 158.
  • User activity data from, e.g., a virtual cookie, or user activity recorded by websites, stores, or other entities can be used and analyzed.
  • Demographic information can include, e.g., attribute information about a given user that is provided by the given user.
  • Demographic information can be collected by an ISP or website, for example.
  • Demographic information is often misleading. For example, an Internet user can often attempt to protect his or her privacy by withholding information or providing intentionally false or misleading information in a profile request of, e.g., the ISP.
  • Demographic information can be collected from registration and from other sources.
  • Behavioral information can often be substantially different from demographic information attributes provided by a given user. Behavioral information includes observed behavior based attributes for a given user. Behavioral information is based on tracking observed behavior to create a behavioral profile for a given user. Since behavioral information tracks real user behavior, it is thought to be often more trustworthy than entered or claimed attribute information.
  • A universal profile server 240 can associate user demographic information with an identified user. For example, once a userID is determined for a client user, demographic information can be associated with the userID. For example, demographic information can be obtained by an ISP during the registration process. Demographic information is also often entered into servers during interaction between the user client and the server. However, demographic information captured by servers is often difficult to access and can be retained as proprietary by a given server.
  • universal profile server 240 can prepare a behavioral profile based on observed behavior of users.
  • this behavior information is more easily accessed and can be used as a highly reliable proxy for less reliable, less easily accessible demographic information.
  • Demographic information which can be collected about users and observed behavioral information compiled from the virtual cookie can then be analyzed in combination to provide, e.g., targeted advertising, targeted e-commerce offerings, and customized or personalized content, products and services. Analysis can be performed post-browsing or in real-time.
  • Universal profile server 240 can store the information output from flowchart 200 in step 210 and can perform further analysis on the information, using the information in the virtual cookie 214 as an index of information regarding browsing habits of users 222a and 222b. Additional profile information could be collected and associated with a given user 222a and 222b by associating the information with the userID, in universal profile server 240.
  • universal profile server 240 could track user demographic information.
  • a demographic information profile can be gathered about a user.
  • a user's demographic profile information can include information such as, e.g., user ID, nicknames, aliases, e-mail addresses, home post office addresses, home city and state, home zip codes, work post office addresses, work city and state, work zip codes, home telephone numbers with area codes, work telephone numbers with area codes, home and work fax numbers, personal URL homepages, favorite URLs, preferred languages, and other user demographic information.
  • User demographic information for a given user can also include, e.g., gender, age, national origin, race, orientation, marital status, weight, height, other dimensions, music preferences, drinking preference, smoking preference, education attained, income brackets, occupation, years employed, particular interest groups such as, e.g., golf, fishing, sewing, safety, women's issues, and for business accounts, other interest groups such as, e.g., industry areas, employer information, size of the business, sales of the business, earnings of the business, number of employees employed by the business, the business type, SIC code, and industry SIC code, business location, business size (small, medium, large, multi-national business), other company information such as, company e-mail information, address information, telephone, fax, and company home page URL.
  • User psychographics, or behavior, can be observed and associated with a userID of a user to provide perhaps an even more accurate profile of the user.
  • Behavioral information is tracked based on analyzing the virtual cookie 214 for a user including the locations browsed and actions taken by the user.
  • the sites accessed by users 222a and 222b would be generated by virtual cookie 214.
  • Universal profile server 240 can place a user in one or more categories. For example, if a user frequents many golf club manufacturers' sites and golf course condition sites, then the user might be placed in a golfing enthusiasts' interest group.
  • psychographics or behavioral analysis can indicate an expected demographic profile based on behavior analysis of comparable users 222a and 222b. Analysis decisions could be made such as, e.g., determining whether to trust a user's declared profile, or whether to rather rely on the behavior observed in virtual cookie
  • User behavioral information could include the history of URLs visited, and advertising banners selected.
  • Interest group profile categories could be created based on certain observed behavior as recognized by analyzing browsing history captured in virtual cookie 214 of users
  • FIG. 3 depicts a detailed block diagram 300 of an example embodiment of the present invention illustrating an example implementation of the core technology. Specifically, FIG. 3 includes a detailed description of core technology 106. Block diagram 300 details components of core technology 106, including a raw data processor 302, a process database (DB) 304, a data merger 306, and a data analyzer 308.
  • Raw data processor 302 interacts with the raw data stream sent by raw data interface 104. Due to the large volume of raw data that potentially must be handled, speed and memory efficiency are critical. For example, an Internet portal site can easily have 10 gigabytes (GB) of raw data to process each day.
  • Raw data processor 302 efficiently handles incoming raw data, identifying known users, actions, and locations, and prepares the raw data file records for input into process database 304.
  • Raw data processor 302 takes as input the output of raw data interface 104.
  • Raw data processor 302 can be thought of as manipulating raw data 102 and cleaning up the data for processing by the process database 304.
  • Raw data interface 104 is described further with respect to FIG. 2A, above.
  • Process database 304 performs in-depth analysis on the processed data. Individual behavioral demographics can be generated based on client user activity. Inventory-centric demographics can also be calculated by process database 304.
  • Process database 304 converts the raw data into a cleaned up form and generates an inventory-centric demographic hyper-cube.
  • the inventory-centric demographic hyper-cube (ICDHC) or "cube" holds demographic information by location.
  • An inventory-centric demographics hyper-cube is an n-dimensional cube of data including, for each location, demographic information including, e.g., pure demographic information (such as, e.g., age, sex, and occupation), and behavioral demographics, (i.e., generated by observing behavior of a client user, such as the locations requested on the Internet by the user).
  • Process database 304 is described further below with reference to FIGs. 3, 4A, 4B, 4C, 4D, 4E, 4F, and 4G.
  • Data merger 306 can support processing of data which has become too large for an individual computer. For example, data merger 306 can take fully processed data from multiple process databases 304 and can combine the data into a single, consolidated data set. Specifically, data merger 306 can merge multiple cubes contained in process database 304 to form a single consolidated ICDHC cube. Data merger 306 enables merger of data from, e.g., several days, weeks or months. Data merger 306 also enables merging of data from a data set that is too large to be easily processed at once. The large data set can be split into subsets, which can then be processed separately.
  • Data merger 306 can merge a plurality of ICDHC cubes by, e.g., averaging, running rolling averages, and averaging with data from a former year to detect shifts from previous years.
  • Data merger 306 permits merger of such data without requiring processing of all data at once, e.g., to process 6 months of data, one need not load all 6 months worth of data, but can rather load, analyze and process each of the 6 months separately and then can merge the results to obtain a merged ICDHC. Thereafter, the merged ICDHC can be stored rather than storing the data of all 6 months. It will be apparent to those skilled in the art that several advantages are obtained from processing a plurality of cubes and then merging the separately processed results. Data merger 306 is described further with reference to FIG. 5 below.
  • Data analyzer 308 can enable near real-time (or real-time) in-depth reporting and search capabilities on the processed data set. Queries can be made against the ICDHC cube. Queries on, e.g., individual locations, target audience versus individual locations, and searches based on target audience, are supported. Data analyzer 308 enables near real-time (or real-time) demographic reporting on a per location basis by accessing a cube. Data analyzer 308 enables near real-time (or real-time) demographic reporting with, e.g., drill-down capability on specific target audiences on a per location basis by accessing a cube. Data analyzer 308 enables near real-time (or real-time) searching for locations that best match a specific target audience by accessing a cube. Data analyzer 308 allows users 110 to query the demographics for a particular location.
  • User 110 can query the breakdown by sex of client users accessing a location, e.g., a page on a web site on the Internet. Then user 110 can drill down within the percentage of females accessing a site, to determine, e.g., what percentage of the females are interested in sports as demonstrated by their behavior.
  • a user 110 can seek a target audience.
  • user 110 can seek a target audience of male, ages 2-17, and interested in yo-yos.
  • a query with such information could yield Internet locations most likely to yield the targeted male 2-17 yo-yo enthusiast audience. Then user 110 could use this information to target an advertisement directly to those client users by placing ads on the resulting locations.
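  • A target audience search of this kind can be sketched as a ranking over the per-location buckets of a cube. The cube layout, the sample figures, and the scoring rule (the share of a location's audience that falls in the target bucket and shows the target interest) are assumptions made for illustration only.

```python
# Hypothetical cube: location -> (sex, age range) bucket -> audience statistics.
CUBE = {
    "http://www.somewhere.com/toys.htm": {
        ("male", "2-17"): {"users": 60, "interests": {"yo-yos": 0.5}},
        ("female", "2-17"): {"users": 40, "interests": {"yo-yos": 0.25}},
    },
    "http://www.somewhere.com/sports/football.htm": {
        ("male", "2-17"): {"users": 20, "interests": {"yo-yos": 0.1}},
        ("male", "18-21"): {"users": 80, "interests": {"yo-yos": 0.0}},
    },
}


def target_share(buckets, sex, age, interest):
    """Fraction of a location's audience matching the target audience."""
    total = sum(bucket["users"] for bucket in buckets.values())
    bucket = buckets.get((sex, age))
    if bucket is None or total == 0:
        return 0.0
    matching = bucket["users"] * bucket["interests"].get(interest, 0.0)
    return matching / total


def best_locations(cube, sex, age, interest, top_n=5):
    """Rank locations by target share, best first, dropping zero matches."""
    scored = [
        (target_share(buckets, sex, age, interest), location)
        for location, buckets in cube.items()
    ]
    scored.sort(reverse=True)
    return [(location, share) for share, location in scored[:top_n] if share > 0]


ranked = best_locations(CUBE, "male", "2-17", "yo-yos")
```

  • Because the query reads only pre-aggregated bucket values, it never touches the underlying action records, which is consistent with the near real-time reporting described for data analyzer 308.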
  • The process database 304 component of core technology 106 is now described with reference to FIGs. 4A-4G.
  • FIG. 4A depicts a flow diagram 304A illustrating an example of loading user demographic data in an exemplary process database 304.
  • Flow diagram 304A illustrates how user-specific demographic information that raw data processor 302 has collected can be stored in process database 304.
  • Flow diagram 304A depicts that demographic data can be indexed by userID, as represented by processing step 402.
  • the user demographic data in processing step 402 is then added as indicated by line 404 (representing adding records) to user demographic data 406 of process database 304.
  • An example of user demographic data for client users follows. For a user user28, demographic data can include, e.g., gender is male, age is 18-21, and occupation is student.
  • For a user user29, demographic data can include, e.g., gender is female, age is 2-17, and occupation is student.
  • the data records for users user28 and user29 can both be added to process database 304 as illustrated in FIG. 4A.
  • Such demographic data can be obtained from, e.g., a registration process.
  • FIG. 4B depicts a flow diagram 304B illustrating an example of loading user action data in an exemplary process database 304.
  • Flow diagram 304B represents an example technique by which user action records collected by raw data processor 302 can be stored in process database 304.
  • Flow diagram 304B depicts that user action data can be indexed by userID, including location accessed (e.g., the URL requested), and action requested (e.g., a page view, or click through), as represented by processing step 408.
  • the user action data in processing step 408 is then added as indicated by line 410 (representing adding records) to user action data 412 of process database 304.
  • An example of user action data for client users follows.
  • a first record of user action data can include, e.g., location is URL requested, http://www.somewhere.com/sports/football.htm, action is pageview, requested by user, user28.
  • a second record of user action data can include, e.g., location requested is ad934 clicked on webpage http://www.somewhere.com/sports/football.htm, action is clickthrough, requested by user user28.
  • A third record of user action data can include, e.g., location is product10035, action is purchase, requested by user user28.
  • The data records for the three user actions performed by user user28 can be added to process database 304 as illustrated in FIG. 4B.
  • FIG. 4C depicts a flow diagram 304C illustrating an example of detecting and removing robots in an exemplary process database 304.
  • Some raw data 102 that was collected can contain non-representative data. For example, user actions requested by in-house system administrators monitoring a web site location, and requests from visits by computer robots can be logged as activity, but such requests are not really typical client user requests of the sort that are sought to be tracked. By analyzing actions, atypical user data activity can be detected and removed in order to yield more accurate resultant data.
  • flow diagram 304C depicts detection and removal of robot requests from user demographics 406 of process database 304, by detecting actions by robots by scanning action records 412.
  • Flow diagram 304C includes scanning records represented by line 414 of actions database 412.
  • Scanning records step 414 finds requests by robots or spiders (i.e., computer software agents created by parsing routines and search engines), for example by using statistical methods.
  • step 416 can be performed.
  • robot action records can be removed as represented by line 418, from user demographics database 406 of process database 304, by removing users identified as non-representative of client users. It would be apparent to those skilled in the art that other means could be used to remove atypical users from further processing.
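  • The text does not specify which statistical method is used to find robots. As one possible illustration, the sketch below flags any user whose request count exceeds a multiple of the median request count, and then drops that user's demographic and action records. The threshold factor, record layout, and sample users are hypothetical.

```python
from collections import Counter
from statistics import median


def detect_robots(actions, factor=10.0):
    """Flag users whose request volume is far above the typical (median) user."""
    counts = Counter(action["user"] for action in actions)
    if not counts:
        return set()
    typical = median(counts.values())
    return {user for user, n in counts.items() if n > factor * typical}


def remove_robots(demographics, actions, factor=10.0):
    """Return demographics and actions with flagged users removed."""
    robots = detect_robots(actions, factor)
    clean_demographics = {u: d for u, d in demographics.items() if u not in robots}
    clean_actions = [a for a in actions if a["user"] not in robots]
    return clean_demographics, clean_actions, robots


# Hypothetical data: two ordinary users and one crawler issuing 500 requests.
DEMOGRAPHICS = {
    "user28": {"gender": "male", "age": "18-21"},
    "user29": {"gender": "female", "age": "2-17"},
    "crawler1": {},
}
ACTIONS = (
    [{"user": "user28", "location": "/sports/football.htm"}] * 3
    + [{"user": "user29", "location": "/news.htm"}] * 2
    + [{"user": "crawler1", "location": f"/page{i}.htm"} for i in range(500)]
)

clean_demographics, clean_actions, robots = remove_robots(DEMOGRAPHICS, ACTIONS)
```

  • A median-based threshold is used here because a single heavy crawler would inflate a mean and standard deviation computed over a small population; other detection rules could be substituted without changing the removal step.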
  • FIG. 4D depicts a flow diagram 304D illustrating an example of a technique for determining user behavioral data in an exemplary process database 304.
  • behavior-based demographics can be determined. For example, since user28 visited the page known as http://www.somewhere.com/sports/football.htm one can assume that user28 is interested in football.
  • flow diagram 304D depicts how an interest group definitions database 420 can be matched with user actions database 412, as indicated by line 422, and can be processed as shown by an interest group builder 424 step.
  • Interest group definitions 420 can include actions that a client user performs to be considered part of that particular interest group.
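  • Matching interest group definitions against user actions can be sketched as follows. The sketch assumes that each definition names a URL fragment and a minimum number of matching actions; this rule format, the group names, and the sample records (other than the football page visited by user28) are hypothetical.

```python
# Hypothetical interest group definitions: a URL fragment that must appear in
# the location, and how many matching actions qualify a user for the group.
INTEREST_GROUPS = {
    "football": {"url_contains": "/sports/football", "min_actions": 1},
    "golf": {"url_contains": "golf", "min_actions": 2},
}

USER_ACTIONS = [
    {"user": "user28", "location": "http://www.somewhere.com/sports/football.htm", "action": "pageview"},
    {"user": "user29", "location": "http://www.somewhere.com/golf/clubs.htm", "action": "pageview"},
    {"user": "user30", "location": "http://www.somewhere.com/golf/clubs.htm", "action": "pageview"},
    {"user": "user30", "location": "http://www.somewhere.com/golf/courses.htm", "action": "pageview"},
]


def build_interest_groups(actions, definitions):
    """Return user -> set of interest groups whose definition the user satisfies."""
    hits = {}  # (user, group) -> number of matching actions
    for action in actions:
        for group, rule in definitions.items():
            if rule["url_contains"] in action["location"]:
                key = (action["user"], group)
                hits[key] = hits.get(key, 0) + 1
    membership = {}
    for (user, group), count in hits.items():
        if count >= definitions[group]["min_actions"]:
            membership.setdefault(user, set()).add(group)
    return membership


interest_groups = build_interest_groups(USER_ACTIONS, INTEREST_GROUPS)
```

  • Here user28 joins the football group after a single page view, while user29 does not reach the two-action threshold assumed for the golf group.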
  • FIG. 4E depicts a flow diagram 304E illustrating an example of determining user profiles in an exemplary process database 304.
  • Profile definitions 428 are sets of demographics that a client user has in order to be considered part of a particular profile. For example, in one embodiment the audience of client users can be divided into separate buckets for separate analysis.
  • Flow diagram 304E depicts that user demographics 406 and profile definitions 428 can be matched, as indicated by line 430, as part of a profile determination step 432. As indicated by line 434, records of a user profiles database 436 can then be updated using the output of profile determination 432.
  • An example of profile determination of client users follows.
  • males and females may be analyzed separately.
  • client users of different age group ranges could be analyzed separately.
  • A third example could analyze groups divided up by sex and age into separate buckets, to obtain, for example, separate analysis for males 2-17 and males 18-21, and females 2-17 and females 18-21.
  • The resulting data records can update records, as shown by line 434, of user profiles database 436 of process database 304, as illustrated in FIG. 4E.
  • User profiles 436 gives, e.g., several buckets of information for a given location, i.e., additional differentiation can be provided because data is stored in separate buckets.
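  • Profile determination by sex and age can be sketched as assigning each user a bucket key built from the attributes named in a profile definition. Representing the definition as a tuple of attribute names is an assumption for illustration; the user records reuse the user28 and user29 examples given above.

```python
USER_DEMOGRAPHICS = {
    "user28": {"gender": "male", "age": "18-21", "occupation": "student"},
    "user29": {"gender": "female", "age": "2-17", "occupation": "student"},
}

# Hypothetical profile definition: the attributes that split users into buckets.
PROFILE_DEFINITION = ("gender", "age")


def determine_profile(demographics, definition):
    """Build the bucket key for one user; missing attributes become 'unknown'."""
    return tuple(demographics.get(attribute, "unknown") for attribute in definition)


def determine_profiles(users, definition):
    """Return user -> bucket key for every user in the demographics table."""
    return {user: determine_profile(d, definition) for user, d in users.items()}


user_profiles = determine_profiles(USER_DEMOGRAPHICS, PROFILE_DEFINITION)
```

  • Changing the definition to ("gender",) or ("age",) reproduces the simpler splits described above without changing the rest of the processing.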
  • additional drill-down analysis is enabled by users 110 analyzing the resultant data.
  • Another embodiment could use standard clustering techniques on a site or per location basis to better separate users into groups.
  • FIG. 4F depicts a flow diagram 304F illustrating an example technique of building inventory-centric cubes, such as, e.g., inventory-centric demographic hyper-cubes ICDHC, in an exemplary process database 304.
  • Flow diagram 304F illustrates how a profile builder 440 can interact with process database 304 to split out a plurality of files, collectively referred to as a cube 450.
  • Profile builder 440 generates inventory-centric demographic hyper-cubes including data on multiple locations, and for each location tracks information such as, e.g., the profiles of the people in the various buckets, such as the average demographics, and a timestamp indicating what data set the cube was generated from, i.e., including the effective date of the data.
  • flow diagram 304F depicts user demographics database 406 and user actions database 412 being matched, as indicated by line 438, by profile builder 440.
  • Profile builder 440 can convert information in the database into a cube 450.
  • Profile builder 440 can combine records at the same location with the same profile and output a cube 450.
  • the cube 450 can include the averaged demographic data for each of the location's profile types.
  • the timestamp file 448 can contain the timestamp of the data set that the cube 450 was generated from.
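  • Combining records at the same location with the same profile can be sketched as follows. The sketch assumes that the averaged demographic is the fraction of distinct users in each bucket who belong to a given interest group, and that the cube is a nested dictionary carrying a timestamp. The layout, the function name, and user31 are hypothetical.

```python
from collections import defaultdict


def build_cube(actions, profiles, interests, timestamp):
    """Aggregate user actions into per-location, per-profile averaged buckets."""
    # Collect the distinct users seen in each (location, profile) bucket.
    visitors = defaultdict(set)
    for action in actions:
        user = action["user"]
        if user in profiles:
            visitors[(action["location"], profiles[user])].add(user)

    locations = defaultdict(dict)
    for (location, profile), users in visitors.items():
        group_counts = defaultdict(int)
        for user in users:
            for group in interests.get(user, ()):
                group_counts[group] += 1
        locations[location][profile] = {
            "users": len(users),
            # Average membership: share of the bucket's users in each group.
            "interests": {g: n / len(users) for g, n in group_counts.items()},
        }
    return {"timestamp": timestamp, "locations": dict(locations)}


FOOTBALL = "http://www.somewhere.com/sports/football.htm"
actions = [
    {"user": "user28", "location": FOOTBALL, "action": "pageview"},
    {"user": "user28", "location": FOOTBALL, "action": "clickthrough"},
    {"user": "user31", "location": FOOTBALL, "action": "pageview"},
    {"user": "user29", "location": FOOTBALL, "action": "pageview"},
]
profiles = {
    "user28": ("male", "18-21"),
    "user31": ("male", "18-21"),
    "user29": ("female", "2-17"),
}
interests = {"user28": {"football"}}

cube = build_cube(actions, profiles, interests, timestamp="2000-06-01")
```

  • The resulting cube holds only aggregated values per location and bucket, so individual user records need not be retained once the cube is written.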
  • FIG. 4G depicts a flow diagram 460 illustrating an example process database 304 processing technique.
  • Flow diagram 460 begins with step 462 and can continue immediately with step 464.
  • process database 304 can load user demographics as described further already with respect to FIG. 4A, above. From step 464, flow diagram 460 can continue with step 466. In step 466, process database 304 can load user actions as described further already with respect to FIG. 4B, above. From step 466, flow diagram 460 can continue with step 468.
  • steps 464 and 466 can be performed in parallel. It should be appreciated that the order of the steps of the process can be varied within the spirit of the invention, as would be apparent to those skilled in the art, so long as any data required as an input to a process step is available at the time of performance of the given step, i.e., so long as there are no time dependencies requiring output of a particular step to be used as an input to the other step.
  • process database 304 can detect atypical users, such as, e.g., a robot, and can remove their actions as described further already with respect to FIG. 4C, above. From step 468, flow diagram 460 can continue with step 470.
  • process database 304 can determine interest groups including, e.g., behavioral analysis as described further already with respect to FIG. 4D, above. From step 470, flow diagram 460 can continue with step 472.
  • step 472 process database 304 can determine user profiles as described further already with respect to FIG. 4E, above. From step 472, flow diagram 460 can continue with step 474.
  • step 474 process database 304 can build inventory-centric cubes as described further already with respect to FIG. 4F, above. From step 474, flow diagram 460 can end with step 476.
  • The data merger 306 component of core technology 106 is now described in detail with reference to FIG. 5.
  • FIG. 5 depicts a flow diagram of data merger 306 illustrating an example of cube merger in an exemplary data merger of the present invention.
  • the example hyper-cube merger includes a demographic validity merger in the example embodiment.
  • data merger 306 enables merging a plurality of hyper-cubes as described briefly above with reference to FIG. 3.
  • FIG. 5 depicts an exemplary flow diagram 500 of exemplary data merger 306 including, e.g., a plurality of ICDHC cubes 450 (including, e.g., cubes 450a, 450b, 450c, 450d, 450e, and 450n), and a file of demographic dates 502.
  • Cubes 450 a-n could be generated from a plurality of different data sets.
  • Data sets of raw data 102 can usually include captured log data from, e.g., different days, weeks, months, or years.
  • the plurality of data sets could also be the result of dividing a large data set into multiple process databases 304 by splitting the large data set into several smaller ones.
  • a "demographic" can mean a particular demographic attribute (i.e., pure demographic or behavioral information) about a given user.
  • Demographic dates 502 can include a file containing, for each demographic, the date on which that demographic was first valid. As demographics are deleted or changed, the values previously recorded for the deleted or changed demographic may no longer be valid. Tracking the validity of each demographic is important to maintain data integrity.
  • Flow diagram 500 includes merging the plurality of cubes 450 along with demographic dates 502 as represented by line 504 and a profile merging process in step 506 to obtain merged ICDHC cubes 510a, 510b, and 510c.
  • profile merger step 506 for each location, for each profile type, the portions of the profile that are still valid are merged.
  • Profile merger step 506 of the exemplary flow diagram of FIG. 5 of data merger 306 can read in the different cubes 450a-n and build a new merged cube 510a-510c based on the demographic information in cubes 450a-n.
  • Profile merger 506 goes through each location and finds so-called "buckets" that are the same and then averages the values of the buckets. Averaging/merging of demographic bucket values can eliminate daily fluctuations in usage of locations.
  • merging can be done by averaging the demographics from each of the valid cubes.
  • rolling averages or averaging in the previous year's data can be done to take into account seasonality differences in data.
  • Profile merger 506 can account for the fact that not all profiles may be up to date by merging only valid portions. For example, if a demographic is added today, then profile merger 506 will not merge in yesterday's invalid values of the demographic.
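  • Merging with demographic validity can be sketched as averaging each bucket value across only those cubes whose timestamp is not earlier than the first valid date of the demographic. The cube layout, the unweighted average, and the sample values are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date


def merge_cubes(cubes, demographic_dates):
    """Average matching bucket values across cubes, skipping invalid values."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cube in cubes:
        for location, buckets in cube["locations"].items():
            for profile, demographics in buckets.items():
                for name, value in demographics.items():
                    first_valid = demographic_dates.get(name)
                    # Skip values recorded before the demographic became valid.
                    if first_valid is not None and cube["timestamp"] < first_valid:
                        continue
                    key = (location, profile, name)
                    sums[key] += value
                    counts[key] += 1
    merged = {}
    for (location, profile, name), total in sums.items():
        bucket = merged.setdefault(location, {}).setdefault(profile, {})
        bucket[name] = total / counts[(location, profile, name)]
    return merged


LOCATION = "http://www.somewhere.com/sports/football.htm"
MALE = ("male", "18-21")
yesterday = {
    "timestamp": date(2000, 6, 1),
    "locations": {LOCATION: {MALE: {"football": 0.4, "golf": 0.9}}},
}
today = {
    "timestamp": date(2000, 6, 2),
    "locations": {LOCATION: {MALE: {"football": 0.6, "golf": 0.2}}},
}
# Hypothetical validity dates: "golf" was redefined on 2000-06-02.
DEMOGRAPHIC_DATES = {"football": date(2000, 1, 1), "golf": date(2000, 6, 2)}

merged = merge_cubes([yesterday, today], DEMOGRAPHIC_DATES)
```

  • In this example the football value is averaged over both days, while the golf value comes from today's cube alone because yesterday's value predates the demographic's validity date. Weighting each cube by its user count would be a natural variation.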
  • Profile merger 506 of data merger 306 permits creation of, e.g., multi-day, multi-week, multi-month, and multi-year cubes and permits analysis to be performed on a combination of the cubes.
  • Merged cube 510a-c is an inventory-centric demographic hyper-cube that includes the location-specific demographic information generated by combining the demographics from cubes 450a-n.
  • a requirement of profile merger step 506 is that the cubes to be merged must have been generated in a consistent manner. While each location can use a different clustering of users, the particular clustering that each location uses must be used during the generation of all of the ICDHC cubes being merged. Consistent clusters enable the profiles to be merged accurately since a profile from one cube can be combined with other profiles that have the same cluster membership. For example, if for a certain location we split the users by sex, male and female, we would generate two profiles where the demographics were averaged for males in one profile, and for females in the other profile. When we later generated a second ICDHC, we would get another pair of profiles, one each for male and female.
  • Merged cube 510a-c can then be analyzed by users 110 using user interface 108 in conjunction with data analyzer 308 of core technology 106.
  • users 110 can perform in-depth analysis in a rapid manner.
  • log file analysis provided only shallow, single level data, with limited searching capability and no ability to drill down.
  • Conventional data mining, on the other hand, enabled some in-depth analysis, but required extensive time and costly processing, e.g., queries could take several minutes, and expensive, high-performance supercomputers might be needed to perform the analysis.
  • inexpensive, in-depth analysis can be performed in a near real-time (or even real-time) manner.
  • Data analyzer 308 provides near real-time or better in-depth reporting and search capabilities on the processed data set. Queries on individual locations, target audience versus individual locations, and searches based on target audience are all supported. As a result, sales staff of users 110 can plan targeted ad campaigns and report on previous campaign results in near real-time. Demographic and behavioral information is available on a per location basis. Demographic and behavioral information is also available on target audience subsets at a location.
  • the architecture is configured to interact with confidential data of users 110, e.g., ISPs, etc., and can ensure that the information remains confidential.
  • no confidential information is stored on a web server that users 110 interact with.
  • the web server can forward all user 110 requests to the architecture that can handle all interactions with the confidential information.
  • the architecture can check and verify that users 110 are correctly logged in before handling a request.
  • the architecture can work with a firewall of a user 110.
  • the architecture can be behind the firewall where it can remain well protected. All interactions by user 110 with user interface 108 can be logged in and verified, in one embodiment of the invention.
  • User interface 108 can run on a computer as described further below, with reference to FIG. 6, following the description of FIGs. 8-11, and 7.
  • FIG. 7 is discussed further below, following the description of FIG. 11.
  • An exemplary reporting tool is now further described with reference to FIGs. 8-11.
  • FIG. 8 depicts block diagram 800 including an Internet browser 802, an activity monitor 804, a report server 806, and a report display 808.
  • Internet browser 802 monitors client user activity such as, e.g., observing a location browsed by a client user, and the locations linked to by that location.
  • a typical client user could include, e.g., a producer of content researching an audience for the location browsed.
  • Other client users can include, e.g., an advertising sales person looking for a specific target audience and an advertiser looking for a specific target audience.
  • Activity monitor 804 can monitor the Internet browser 802, using, e.g., a separate browser window, a separate application or separate applet, a plug-in module installed into the browser, and a module incorporated into the browser.
  • Activity monitor 804 can then forward a query including, e.g., the location browsed and the locations linked to by the location browsed, to the report server 806.
  • Report server 806 can then perform, e.g., processing functions, such as, e.g., generating a report for display by the report display 808.
  • Report server 806 can provide a demographic and behavioral breakdown of an audience by the location.
  • Report server 806 can also provide a targeted demographic and behavioral breakdown of an audience subset of the location.
  • Report server 806 can also provide historical traffic levels for the location.
  • Report server 806 can also provide a predicted future traffic availability for the location.
  • Report server 806 can also provide audience analysis for the location and the locations to which it links.
  • Report server 806 can then send, e.g., a report based on the query, to the report display 808 for display.
  • a report query can include the steps of sending the location from the internet browser 802, to activity monitor 804, and on to report server 806, sending a plurality of preferences of the requested information, generating a report on the report server 806 and receiving the report at the report display 808 for display, from the report server 806.
  • Report display 808 can display various statistical summary results of client user activity of the location browsed by the user of internet browser 802.
  • report display 808 can provide detailed tracked activity statistics summarized to the level of the location being viewed using the internet browser 802.
  • Report display 808 can display the results, e.g., using such tools as, e.g., a frame of the internet browser 802, a separate internet browser 802 window, and a separate applet.
  • the summarized user behavior information can be obtained using the processes outlined in the second cross-referenced application.
  • the demographic and behavioral analysis system architecture of the second cross-referenced application is reviewed below with reference to FIG. 1 of the present invention.
  • the demographic and behavior analysis system of FIG. 1 provides analyzed information tracking client usage on a per location basis, including, e.g., identifying, tracking and understanding user behavior on the Internet and in traditional stores.
  • FIG. 9 depicts an example report display 808 according to the present invention.
  • Report display 808 can include, e.g., a control panel portion 902 and a report panel portion 904.
  • Other panel portions can also be included in report display 808 such as, e.g., buttons, graphical charts, statistics including totals, subtotals, percentages, categories, demographics, target demographics, location identifiers, confidence ratings, filters, title bars, control, target, traffic and help icons.
  • FIG. 10 depicts an example embodiment of a demographic report 808a of report display 808 according to the present invention.
  • Demographic report 808a illustratively depicts control panel 902a and report panel 904a.
  • Control panel 902a can enable selecting a location universal resource locator (URL) and controlling several example parameters associated with the report.
  • Report panel 904a can display the statistical usage information for the location and parameters chosen using control panel 902a.
  • Control panel 902a can include in one embodiment, title 1002, demographic report, one or more buttons, such as, e.g., target button 1004 (for specifying a target audience), traffic report button 1006 (for viewing traffic statistics), and help button 1008.
  • Control panel 902a can also include a copy button 1010 which can enable a user to store a location's URL for later use.
  • Control panel 902a can include a demographic filter 1012 field that can narrow the range of demographic attributes to be displayed in the report panel 904a.
  • demographics can be available for analysis for a given location.
  • demographic filter 1012 can narrow the displayed list to, e.g., only values of greater than 5%.
  • a simple data entry pull-down field permits the user to easily perform ad hoc trial and entry selections by merely selecting a value in filter field 1012 and then selecting the apply button 1014.
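As a minimal sketch of the filtering behavior described above, the following keeps only those demographics whose percentage exceeds the chosen threshold. The function name and the sample percentages are illustrative assumptions, not values from the specification.

```python
def apply_demographic_filter(rows, min_percent):
    """Keep only demographics whose percentage is strictly greater than min_percent."""
    return {name: pct for name, pct in rows.items() if pct > min_percent}


# Illustrative percentages for one location.
rows = {"male": 61.0, "female": 39.0, ".gov": 2.5, ".com": 48.0, ".org": 5.0}
filtered = apply_demographic_filter(rows, 5.0)
```

With a threshold of 5%, the ".gov" row (2.5%) and the ".org" row (exactly 5%) are dropped, since the filter is described as showing only values greater than the threshold.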
  • Control panel 902a can also include a location field, collectively illustrated as location fields 1016a and 1016b.
  • The location fields 1016a and 1016b can automatically be filled in according to the current location being viewed by the user, using input from the Internet browser 802 and activity monitor 804.
  • a location of interest to a user can be entered directly into, e.g., a location field 1016b, to view statistics on the location of interest.
  • the location of interest entered into field 1016b can automatically cause internet browser 802 to open a browser window to view content at that location.
  • Control panel 902a can include confidence fields 1018a and 1018b which can provide information regarding a confidence level in the data provided in report panel 904a for the given location in field 1016b.
  • confidence data can be based on the size of audience having visited the location. The data can be scaled or normalized based on other similar representative sites, based on pure observed page hits, or based on other criteria. For example, if only 10 total persons have viewed a given location, this would be indicative of a lower level of confidence in the demographic data provided, as compared to a location where 1000 client users have viewed the site.
  • confidence field 1018b can be used to indicate the confidence in the search results.
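The specification says only that confidence can be based on audience size and can be scaled or normalized. One way this could be sketched is a logarithmic score that saturates at a reference audience size. The formula, the function name, and the default reference of 1000 are assumptions chosen for illustration.

```python
import math


def confidence_level(audience_size, reference_size=1000):
    """Return a score in [0, 1] that grows with audience size.

    Assumed scaling: log10(1 + n) / log10(1 + reference), capped at 1.0.
    """
    if audience_size <= 0:
        return 0.0
    score = math.log10(1 + audience_size) / math.log10(1 + reference_size)
    return min(1.0, score)


low = confidence_level(10)     # a location seen by only 10 persons
high = confidence_level(1000)  # a location seen by 1000 client users
```

Under this assumed scaling, the 10-visitor location scores lower than the 1000-visitor location, matching the example in the text.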
  • Control panel 902a can also include target demographics fields 1020 and 1022.
  • Target demographics field 1022 can display a list of targeted demographic attribute types, for which subtotal data can be provided.
  • Report display 808a includes no selected target audience. If a user wanted to target a specific type of audience, the user could select one of the listed demographic attributes ("demographics") in report panel 904a, such as, e.g., 1032a, 1034a, 1036a, and so on, through 1052a. In one embodiment of the invention, whichever targeted demographics were selected could be displayed in field 1022. Selected targeted demographics can also be deselected, i.e. removed from the targeted demographics list, by selecting a targeted demographic in field 1022, in one embodiment.
  • any and all fields can provide display functionality of report panel 904a and any and all fields can also be used to provide control functions of control panel 902a.
  • data associated with the targeted demographics selected can be displayed in field 1022 and thus field 1022 can be thought of as part of report panel 904a, as well as part of control panel 902a.
  • Report panel 904a can include display of, e.g., other demographics 1024, which can, in an embodiment of the invention, display demographics in column 1026 and the percentage of client users tracked as visiting the location in columns 1028 and 1030.
  • Column 1028 can display the percentage information, e.g., in the form of a histogram, a bar graph, a pie graph, and other graphical, numerical or other iconic representation of relative value.
  • Column 1030 although illustrating a numerical representation of the value of the demographic percentages, can also illustrate the data in another form, such as, e.g., in the form of a histogram, a bar graph, a pie graph, and other numerical and other graphical or other iconic representation of relative value.
  • Demographics can be grouped according to related types of demographics; e.g., age based or gender based demographics can be listed together and sorted for ease of comparative review.
  • Gender demographics for male 1032a and female 1034a can be placed adjacent to one another to improve readability and analysis, as shown by related percentage data 1032b, 1032c and 1034b, 1034c.
  • age based demographics 1036a through 1042a can be placed adjacent one another in one embodiment, and can be sorted in numerical order.
  • demographics groupings can be organized adjacent to one another for ease of viewing.
  • An example is the high level Internet domain of client users, such as, e.g., ".com," ".gov," ".net," ".org."
  • Other large demographic populations such as, e.g., client users from Internet service providers or online service providers, such as, e.g., America Online, i.e. aol.com can also be listed as a separate category.
  • Gender based and age based demographics 1032a-1042a can be placed at the top of the other demographics 1024 list for ease of reading.
  • the values of report panel 904a are automatically sorted before display by the value of column 1030.
  • the data is sorted including adjacent groupings such as gender and age based demographics groups 1032a-1042a, above.
  • the data can be sorted by the selected column.
  • The list of demographics is fixed and not necessarily in alphabetical order.
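One of the ordering embodiments above, in which gender and age groups stay together at the top and the remaining rows are sorted by percentage, can be sketched as follows. The group labels, the tuple layout, and the sample rows are assumptions made for illustration.

```python
PINNED_GROUPS = ("gender", "age")  # assumed labels for groups kept at the top


def order_rows(rows):
    """Order (label, group, percent) rows for display.

    Pinned groups come first, in PINNED_GROUPS order, keeping their input
    order within each group. All other rows follow, sorted by descending percent.
    """
    pinned = [r for r in rows if r[1] in PINNED_GROUPS]
    others = [r for r in rows if r[1] not in PINNED_GROUPS]
    pinned.sort(key=lambda r: PINNED_GROUPS.index(r[1]))  # stable sort
    others.sort(key=lambda r: r[2], reverse=True)
    return pinned + others


# Illustrative rows only.
rows = [
    (".com", "domain", 48.0),
    ("male", "gender", 61.0),
    ("18-24", "age", 22.0),
    (".net", "domain", 12.0),
    ("female", "gender", 39.0),
    ("25-34", "age", 30.0),
]
ordered_labels = [label for label, _, _ in order_rows(rows)]
```

Because Python's sort is stable, age ranges supplied in numerical order remain in that order after grouping.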
  • a user can select one or more demographic categories and can then select the target audience button 1004.
  • The user can select a demographic group by another method of selection, such as, e.g., selecting a demographic and double clicking on it, or clicking with a right mouse button on a demographic and selecting target audience based on demographic, or selecting a demographic and dragging it to the target demographics fields 1020, 1022, or selecting several demographics and similarly selecting a targeted audience.
  • a user selects a demographic group including all user activity at a /rec/woodworking location 1016b, that is from Internet domain ".com" 1044a. The results of such a selection are illustrated below with reference to FIG. 11.
  • FIG. 11 depicts an example targeted demographic report 808b of report display 808 according to the present invention.
  • a user of the present invention could reach the screen as described, e.g., in the preceding paragraph.
  • Targeted demographic report 808b illustratively depicts control panel 902b and report panel 904b.
  • Control panel 902b can enable selecting a location universal resource locator (URL) and controlling several example parameters associated with the report.
  • control panel 902b includes only the title bar area including title 1102 and buttons 1104, 1106 and 1108.
  • control panel 902b can include any area of report 808b which can be used to control the data output in report panel 904b.
  • Report panel 904b can display the targeted demographic statistical usage information for the location and parameters chosen using control panel 902b.
  • report panel 904b can include portions of report 808b which are also included as portions of 902b.
  • Control panel 902b can include in one embodiment, title 1102, demographic report, one or more buttons, such as, e.g., target button 1104 (for specifying a target audience, used to reach targeted demographic page 808b), traffic report button 1106 (for viewing traffic statistics), and help button 1108.
  • Control panel 902b can also include a copy button 1110 which can enable a user to store a location's URL for later use.
  • Control panel 902b can include a demographic filter 1112 field which can narrow the range of demographic attributes to be displayed in the report panel 904b.
  • demographics can be available for analysis for a given location.
  • demographic filter 1112 can narrow the displayed list to, e.g., only values of greater than 5%.
  • a simple data entry pull-down field permits the user to easily perform ad hoc trial and entry selections by merely selecting a value in filter field 1112 and then selecting the apply button 1114.
  • Control panel 902b can also include a location field, collectively illustrated as location fields 1116a and 1116b.
  • The location fields 1116a and 1116b can automatically be filled in according to the current location being viewed by the user, using input from the Internet browser 802 and activity monitor 804.
  • a location of interest to a user can be entered directly into, e.g., a location field 1116b, to view statistics on the location of interest.
  • the location of interest entered into field 1116b can automatically cause internet browser 802 to open a browser window to view content at that location.
  • Control panel 902b can include confidence fields 1118a and 1118b which can provide information regarding a confidence level in the data provided in report panel 904b for the given location in field 1116b.
  • confidence data can be based on the size of audience having visited the location. The data can be scaled or normalized based on other similar representative sites, based on pure observed page hits, or based on other criteria. For example, if only 10 total persons have viewed a given location, this would be indicative of a lower level of confidence in the demographic data provided, as compared to a location where 1000 client users have viewed the site.
  • confidence field 1118b can be used to indicate the confidence in the search results.
  • Control panel 902b can also include target demographics fields 1120 and 1122a through 1122d.
  • Target demographics field 1122a-1122d can provide similar column headings for targeted demographics 1144a, 1144b, 1144c and 1144d for targeted demographic ".com" and can display data for the targeted demographic attribute including subtotaled data and graphical or numerical information about the targeted demographic.
  • Demographic data field 1144c can indicate the percentage of total users for the location which fall within the targeted demographic group.
  • Target demographic data field 1144d can include the percentage of total users for the location which also fall within the targeted demographic group 1144a, which is, in this case, the same as the data in field 1144c.
  • Report display 808b includes the ".com" target audience. If a user wanted to target a specific type of audience, the user could select one of the other listed demographic attributes ("demographics") in report panel 904b, such as, e.g., 1132a, 1134a, 1136a, and so on, through 1152a, in addition to targeted demographic 1144a.
  • Targeted demographics previously selected can be displayed below field 1122a-1122d.
  • Selected targeted demographics can also be deselected, i.e. removed from the targeted demographics list, by deselecting a selected targeted demographic in field 1122, in one embodiment.
  • any and all fields of report 808b can provide display functionality of report panel 904b and any and all fields of report 808b can also be used to provide control functions of control panel 902b.
  • data associated with the targeted demographics selected can be displayed below field 1122 and thus field 1122 can be thought of as part of report panel 904b, as well as part of control panel 902b.
  • report panel 904b can include display of, e.g., other demographics 1124, which can in an embodiment of the invention, display, e.g., demographics in column 1126, the percentage of client users tracked as visiting the location (including an additional indication of those users who also are members of the targeted demographic or demographics) in columns 1128, 1130 and 1131.
  • Column 1128 can display the percentage information indicating the portion of users in the demographic only, and the portion of users in both the demographic and the targeted demographics.
  • The data can be provided in two separate forms (not shown) or integrated (as shown) using different colors, e.g., in the form of a histogram, a bar graph, a pie graph, and other graphical, numerical or other iconic representation of relative value.
  • Column 1130 although illustrating a numerical representation of the value of the demographic percentages, can also illustrate the data in another form, such as, e.g., in the form of a histogram, a bar graph, a pie graph, and other numerical and other graphical or other iconic representation of relative value.
  • Column 1131 can include similar data/information showing the percentage of users of the location 1116b which are members of demographic 1126 and targeted demographic 1144a.
  • A percentage of total client users of location 1116b is shown numerically in field 1132c and is graphed as part (the longer histogram) of field 1132b.
  • A percentage of the total users of location 1116b meeting both targeted and group demographics is listed in field 1132d and is graphed as the shorter graph in field 1132b.
  • The graphical representations of column 1128 can include multiple colors, such as, e.g., blue for the shorter and yellow for the longer bar in field 1132b.
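The two percentages described above (share of all users in a demographic, and share of all users in both that demographic and the targeted demographic) can be sketched as follows. Representing each visitor as a set of attribute labels is an assumption, as are the function name and sample data.

```python
from collections import Counter


def overlap_breakdown(visitors, target):
    """Return {demographic: (pct_of_total, pct_of_total_also_in_target)}.

    visitors is a list of attribute sets, one per tracked client user.
    """
    total = len(visitors)
    in_demo = Counter()
    in_both = Counter()
    for attrs in visitors:
        for demo in attrs:
            in_demo[demo] += 1
            if target in attrs:
                in_both[demo] += 1
    return {
        demo: (100.0 * in_demo[demo] / total, 100.0 * in_both[demo] / total)
        for demo in in_demo
    }


# Four illustrative visitors, targeted on the ".com" domain.
visitors = [
    {"male", ".com"},
    {"male", ".net"},
    {"female", ".com"},
    {"female", ".com"},
]
table = overlap_breakdown(visitors, ".com")
```

For the targeted demographic itself the two values coincide, which mirrors the remark that fields 1144c and 1144d show the same data.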
  • Demographics can be grouped according to related types of demographics; e.g., age based or gender based demographics can be listed together and sorted for ease of comparative review.
  • Gender demographics for male 1132a and female 1134a can be placed adjacent to one another to improve readability and analysis, as shown by related percentage data 1132b, 1132c, 1132d and 1134b, 1134c, and 1134d.
  • age based demographics 1136a through 1142a can be placed adjacent one another in one embodiment, and can be sorted in numerical order.
  • Other demographics groupings can be organized adjacent to one another for ease of viewing.
  • An example is the high level Internet domain of client users, such as, e.g., ".com," ".gov," ".net," ".org."
  • Other large demographic populations such as, e.g., client users from Internet service providers or online service providers, such as, e.g., America Online, i.e. aol.com can also be listed as a separate category in field 1148a, for example. Where individual ".com" or ".net" types are listed, other ".com" 1150b and other ".net" 1152b demographics can also be provided.
  • gender based and age based demographics 1132a-1142a can be placed at the top of the other demographics 1124 list for ease of reading.
  • the values of report panel 904b are automatically sorted before display by the value of column 1130.
  • The data is sorted including adjacent groupings such as gender and age based demographics groups 1132a-1142a, above.
  • the data can be sorted by the selected column.
  • The list of demographics is fixed and not necessarily in alphabetical order.
  • a user can select one or more demographic categories and can then select the target audience button 1104.
  • The user can select a demographic group by another method of selection, such as, e.g., selecting a demographic and double clicking on it, or clicking with a right mouse button on a demographic and selecting target audience based on demographic, or selecting a demographic and dragging it to the target demographics fields 1120, 1122 and 1124, or selecting several demographics and similarly selecting a targeted audience.
  • A user can select multiple demographics to target by clicking on the demographics in column 1126. If multiple members of a mutually exclusive group are selected, they are logically OR'ed together and then AND'ed with the remaining target demographics.
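The combination rule above (OR within a mutually exclusive group, AND across groups) can be sketched as follows. The `group_of` mapping and the attribute labels are hypothetical; the specification does not define a data model for groups.

```python
def matches_target(attrs, selected, group_of):
    """Return True if a user with attribute set attrs is in the targeted audience.

    Selected demographics from the same mutually exclusive group are OR'ed;
    the resulting per-group conditions are AND'ed together.
    """
    by_group = {}
    for demo in selected:
        by_group.setdefault(group_of[demo], set()).add(demo)
    # Every group must be satisfied by at least one of its selected members.
    return all(attrs & members for members in by_group.values())


# Assumed grouping of demographic labels.
group_of = {
    "18-24": "age", "25-34": "age", "35-44": "age",
    "male": "gender", "female": "gender",
    ".com": "domain", ".net": "domain",
}
selected = ["18-24", "25-34", ".com"]  # (18-24 OR 25-34) AND .com

in_target = matches_target({"25-34", ".com", "male"}, selected, group_of)
wrong_age = matches_target({"35-44", ".com"}, selected, group_of)
wrong_domain = matches_target({"18-24", ".net"}, selected, group_of)
```

With no demographics selected, `all()` over an empty collection is true, so every user counts as part of the audience, which corresponds to a report with no selected target audience.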
  • a user can also select to display a traffic report by selecting button 1106.
  • select can include the use of, e.g., a mouse pointer, button, touchpad, pointing device, touchscreen, key, cursor or other known selection device.
  • a user selects to display traffic report statistics using button 1106. The results of such a selection are illustrated below with reference to FIG. 7.
  • FIG. 7 depicts an exemplary traffic report 808c of report display 808 illustrating an example range of traffic summarized on an example weekly basis in an embodiment of the present invention.
  • Traffic report 808c illustratively depicts control panel 902c and report panel 904c.
  • Control panel 902c can enable selecting a location universal resource locator (URL) and controlling several example parameters associated with the report.
  • control panel 902c includes only the title bar area including title 702 and buttons 704, 706 and 708.
  • control panel 902c can include any area of report 808c which can be used to control the data output in report panel 904c.
  • Report panel 904c can display the targeted demographic statistical usage information for the location and parameters chosen using control panel 902c.
  • report panel 904c can include portions of report 808c which are also included as portions of 902c.
  • Control panel 902c can include in one embodiment, title 702, traffic report, one or more buttons, such as, e.g., demographics button 704 (for viewing a demographic report 808a) and help button 708.
  • Control panel 902c can include time span field 1112 which can narrow the range of traffic to be displayed in the report panel 904c.
  • traffic can be totaled, e.g., in daily, weekly, monthly, yearly increments.
  • A simple data entry pull-down field permits the user to select a time span in field 1112 and then select the apply button 714.
  • a starting date 706 and ending date 710 for the range can also be selected.
  • Calendar buttons 705 and 709 permit graphical selection of start and end dates, in one embodiment.
  • Control panel 902c can also include a location field, collectively illustrated as location fields 716a and 716b.
  • location fields 716a and 716b can automatically be filled in, according to the current location being viewed by the user using input from the Internet browser 802 and activity monitor 804.
  • a location of interest to a user can be entered directly into, e.g., a location field 716b, to view statistics on the location of interest.
  • the location of interest entered into field 716b can automatically cause internet browser 802 to open a browser window to view content at that location.
  • Control panel 902c can also include traffic fields 720 and 722a through 722d.
  • Traffic fields include an include field 718, a date field 720, a traffic field 722, total fields 724 and 726.
  • Each time span can then be provided, in the illustrated example, each week, shown in fields 728b through 736b, and can be selected for inclusion in a separate total field such as, for example, total 738 and 740, using selection fields 728a-736a.
  • Weekly data can appear in fields 728c through 736c.
  • Partial time windows can be indicated in one embodiment as shown with text 742. Timestamp information can be included, such as, e.g., report time 744.
  • The date range is inclusive and based on US date systems; other date systems can be used.
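Weekly totals over an inclusive date range, with a flag for partial time windows, can be sketched as follows. Anchoring each week at the selected start date and storing hits per day in a dictionary are assumptions made for illustration.

```python
from datetime import date, timedelta


def weekly_traffic(hits_by_day, start, end):
    """Return [(week_start, total_hits, is_partial)] for the inclusive range start..end.

    Weeks are seven-day windows counted from start (an assumed convention);
    the final window is flagged partial if the range ends before it is full.
    """
    rows = []
    week_start = start
    while week_start <= end:
        week_end = min(week_start + timedelta(days=6), end)
        total = 0
        day = week_start
        while day <= week_end:
            total += hits_by_day.get(day, 0)
            day += timedelta(days=1)
        is_partial = (week_end - week_start).days < 6
        rows.append((week_start, total, is_partial))
        week_start = week_end + timedelta(days=1)
    return rows


# Illustrative hit counts; the last entry falls outside the selected range.
hits = {
    date(2000, 1, 3): 5,
    date(2000, 1, 9): 7,
    date(2000, 1, 12): 2,
    date(2000, 1, 13): 100,
}
rows = weekly_traffic(hits, date(2000, 1, 3), date(2000, 1, 12))
grand_total = sum(total for _, total, _ in rows)
```

Here the second window covers only three days, so it is marked partial, and the hit recorded on the day after the end date is excluded.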
  • scroll bars can appear.
  • a total field can be listed at the top and bottom of a long list as shown.
  • audience analysis can have its own control panel 902 and report panel 904.
  • The audience analysis control panel can include a method for selecting an analysis type, such as, e.g., a correlation analysis or a Bayesian analysis.
  • the audience analysis report panel can include, e.g., the locations that score highest (or lowest) using the currently selected analysis method.
  • the locations can be, e.g., the list of locations linked to by the browsed location, or the list of all locations.
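The specification does not say what quantities the correlation analysis compares. As one possible reading, the sketch below scores each candidate location by the Pearson correlation between its demographic profile and that of the browsed location, then ranks the candidates. The choice of profiles as the compared data, the function names, and the sample values are all assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y) if dev_x and dev_y else 0.0


def rank_locations(profile, candidates):
    """Rank candidate locations by correlation with the browsed location's profile."""
    keys = sorted(profile)
    base = [profile[k] for k in keys]
    scored = [
        (loc, pearson(base, [cand.get(k, 0.0) for k in keys]))
        for loc, cand in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Illustrative profiles (percent of audience per demographic).
browsed = {"male": 60.0, "female": 40.0, ".com": 50.0}
linked = {
    "/rec/crafts": {"male": 30.0, "female": 70.0, ".com": 50.0},
    "/rec/tools": {"male": 62.0, "female": 38.0, ".com": 50.0},
}
ranked = rank_locations(browsed, linked)
```

The highest-scoring entries correspond to the locations a report panel might list first. A Bayesian analysis would replace `pearson` with a different scoring function.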
  • FIG. 6 depicts an exemplary computer system.
  • FIG. 6 illustrates an example computer 600, which in a preferred embodiment is a personal computer (PC) system running an operating system such as Windows 98, OS/2, Mac/OS, or UNIX.
  • the invention is not limited to these platforms. Instead, the invention can be implemented on any appropriate computer system running any appropriate operating system, such as Solaris, Irix, Linux, HPUX, OSF, Windows 98, Windows NT, OS/2, Mac/OS, and any others that can support Internet access.
  • the present invention is implemented on a computer system operating as discussed herein.
  • An exemplary computer system, computer 600 is shown in FIG. 6.
  • Other components, such as client workstations, proxy servers, network communication servers, remote access devices, client computers, server computers, routers, web servers, and data, media, audio, video, telephony or streaming technology servers, could also be implemented using a computer such as that shown in FIG. 6.
  • the computer system 600 includes one or more processors, such as processor 602.
  • the processor 602 is connected to a communication bus 604.
  • the computer system 600 also includes a main memory 606, preferably random access memory (RAM), and a secondary memory 608.
  • the secondary memory 608 includes, e.g., a hard disk drive 610 and/or a removable storage drive 612, representing a floppy diskette drive, a magnetic tape drive, a compact disk drive, etc.
  • the removable storage drive 612 reads from and/or writes to a removable storage unit 614 in a well known manner.
  • Removable storage unit 614 also called a program storage device or a computer program product, represents a floppy disk, magnetic tape, compact disk, etc.
  • the removable storage unit 614 includes a computer usable storage medium having stored therein computer software and/or data, such as an object's methods and data.
  • Computer 600 also includes an input device such as (but not limited to) a mouse 616 or other pointing device such as a digitizer, and a keyboard 618 or other data entry device.
  • Computer 600 can also include output devices, such as, e.g., display 620.
  • Computer 600 can include input/output (I/O) devices such as, e.g., network interface cards 622 and modems 150 and 152.
  • Computer programs (also called computer control logic), including object oriented computer programs, are stored in main memory 606 and/or the secondary memory 608 and/or removable storage units 614, also called computer program products.
  • Such computer programs when executed, enable the computer system 600 to perform the features of the present invention as discussed herein.
  • the computer programs when executed, enable the processor 602 to perform the features of the present invention. Accordingly, such computer programs represent controllers of the computer system 600.
  • the invention is directed to a computer program product comprising a computer readable medium having control logic (computer software) stored therein.
  • control logic when executed by the processor 602, causes the processor 602 to perform the functions of the invention as described herein.
  • the invention is implemented primarily in hardware using, e.g., one or more state machines. Implementation of these state machines so as to perform the functions described herein will be apparent to persons skilled in the relevant arts.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Transfer Between Computers (AREA)
EP00939705A 1999-06-09 2000-06-09 System, method and computer program product for generating an inventory-centric demographic hyper-cube Withdrawn EP1277141A2 (de)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US379587 1982-05-19
US32889899A 1999-06-09 1999-06-09
US328898 1999-06-09
US37958799A 1999-08-24 1999-08-24
PCT/US2000/015823 WO2000079449A2 (en) 1999-06-09 2000-06-09 System, method and computer program product for generating an inventory-centric demographic hyper-cube

Publications (1)

Publication Number Publication Date
EP1277141A2 (de) 2003-01-22

Family

ID=26986560

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00939705A Withdrawn EP1277141A2 (de) 1999-06-09 2000-06-09 System, method and computer program product for generating an inventory-centric demographic hyper-cube

Country Status (3)

Country Link
EP (1) EP1277141A2 (de)
AU (1) AU5475300A (de)
WO (1) WO2000079449A2 (de)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2367464A (en) * 2000-07-19 2002-04-03 Hewlett Packard Co Web traffic analysis
DE10139787A1 (de) * 2000-09-25 2002-04-18 Mythink Technology Co Ltd Verfahren und System zur Echtzeitanalyse und Echtzeitverarbeitung von Daten über das Internet
GB2370888B (en) * 2001-01-09 2003-03-19 Searchspace Ltd A method and system for combating robots and rogues
BE1014347A3 (fr) * 2001-08-22 2003-09-02 Mythink Technologie Co Ltd Methode et systeme d'analyse et de traitement en temps reel de donnees sur l'internet.
FR2829258A1 (fr) * 2001-09-03 2003-03-07 Profile For You Ltd Procede et systeme de surveillance et d'analyse de la frequentation d'une ou plusieurs plateformes de mise a disposition d'informations
US8812012B2 (en) 2008-12-16 2014-08-19 The Nielsen Company (Us), Llc Methods and apparatus for associating media devices with a demographic composition of a geographic area
AU2009345651B2 (en) 2009-05-08 2016-05-12 Arbitron Mobile Oy System and method for behavioural and contextual data analytics
CA3020551C (en) 2010-06-24 2022-06-07 Arbitron Mobile Oy Network server arrangement for processing non-parametric, multi-dimensional, spatial and temporal human behavior or technical observations measured pervasively, and related method for the same
US8340685B2 (en) 2010-08-25 2012-12-25 The Nielsen Company (Us), Llc Methods, systems and apparatus to generate market segmentation data with anonymous location data
US10154076B2 (en) 2011-10-11 2018-12-11 Entit Software Llc Identifying users through a proxy
US9363323B2 (en) 2013-08-29 2016-06-07 Paypal, Inc. Systems and methods for implementing access control based on location-based cookies

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0079449A2 *

Also Published As

Publication number Publication date
WO2000079449A2 (en) 2000-12-28
WO2000079449A8 (en) 2002-11-07
AU5475300A (en) 2001-01-09

Similar Documents

Publication Publication Date Title
US9514479B2 (en) System and method for estimating prevalence of digital content on the world-wide-web
Eirinaki et al. Web mining for web personalization
US8356097B2 (en) Computer program product and method for estimating internet traffic
US10447564B2 (en) Systems for and methods of user demographic reporting usable for identifiying users and collecting usage data
US7890451B2 (en) Computer program product and method for refining an estimate of internet traffic
US20080183664A1 (en) Presenting web site analytics associated with search results
US20030187677A1 (en) Processing user interaction data in a collaborative commerce environment
EP1061465A2 (de) Verfahren und Apparat zum Bereitstellen eines kostenreduzierten Online-Dienstes und von adaptiv gezielten Werbungen
US20150154632A1 (en) Determining a number of view-through conversions for an online advertising campaign
CN101321138A (zh) 用另一个广告替换一个广告的网络设备
CA2579312A1 (en) Methods and apparatus for automatic generation of recommended links
CA2381719A1 (en) Distributing promotional and advertising material based upon internet usage
US20100049791A1 (en) System and method of associating events with requests
US20060212349A1 (en) Method and system for delivering targeted banner electronic communications
WO2001044984A1 (en) Internet tool
WO2000079449A2 (en) System, method and computer program product for generating an inventory-centric demographic hyper-cube
US7277926B1 (en) Business method and user interface for representing business analysis information side-by-side with product pages of an online store
JP6329015B2 (ja) 広告配信サーバ
JP4025489B2 (ja) ポータルサイト提供端末装置
AU2001296367B2 (en) System and method for facilitating information requests
JP2000242626A (ja) 電子商取引履歴分析方法
Alves et al. Clickstreams, the basis to establish user navigation patterns on web sites
JP2003015996A (ja) サイト閲覧状況情報収集方法、この方法に用いられるファイル、及び、サイト閲覧状況情報収集システム
WO2000065422A2 (en) Method and system for facilitating establishment of economic marketplaces with improved content
JP2002073458A (ja) 履歴情報収集システム、履歴情報収集方法、閲覧用端末、履歴情報収集サーバ及び記録媒体

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20020104

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20021231