WO2001054480A2 - System and method for inferring demographic profiles - Google Patents

System and method for inferring demographic profiles

Info

Publication number
WO2001054480A2
Authority
WO
WIPO (PCT)
Prior art keywords
user
demographic
computer
activity
unknown
Prior art date
Application number
PCT/US2001/003214
Other languages
French (fr)
Inventor
Daniel B. Jaye
Original Assignee
Engage Technologies
Priority date
Filing date
Publication date
Application filed by Engage Technologies
Priority to AU2001238002A (AU2001238002A1)
Publication of WO2001054480A2

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising

Definitions

  • This application relates to the field of user behavioral data, and more particularly to Internet browsing habits and to the purchasing and demographic characteristics of users.
  • Each Web site has associated therewith a unique identifier, that can be represented as a URL (Uniform Resource Locator).
  • the user can connect to the site by, for example, typing the URL (such as "www.yahoo.com") or selecting a site from a predefined menu.
  • the user can review the content presented on the site, and can provide information and instructions for directing the site to provide services and goods.
  • the providers of such commercial Web sites are generally interested in the demographic profile or attributes of a user in order to be better able to target advertising, products and/or services to specific users or groups of users.
  • the user may provide his or her demographic profile by responding to prompts or by filling out a registration form.
  • the user demographic profile may include gender, age, marital status, education, profession, income and/or the geographic region as indicated, for example, by the user's ZIP code.
  • a number of Web sites providing information, goods and services may attract a cross-section of users sharing at least some demographic characteristics. For example, the Web site "www.bettycrocker.com" may be preferred by women between the ages of 30 and 55 who have an interest in the culinary arts.
  • the Web site "www.harley-davidson.com" may preferably be visited by men between the ages of 35 and 44 who are interested in the outdoors.
  • Web sites may be frequented by many different users, many of which share certain demographic characteristics, leading to a distribution of user profiles.
  • a user may be reluctant to provide detailed personal information, in which case the Web site providers have employed other means to generate an approximate profile of the user, for example, based on the user's click stream.
  • these techniques may be employed in a way that refrains from associating user identity information, such as name and address information, with information descriptive of the user's profile.
  • the problem with this approach is that the provider basically has to start with a "blank" sheet, i.e., zero data, for each new user and build the profile from sometimes random click stream information. This process may be slower than desired.
  • a web site that experiences sporadic but intense user contact, such as a web site that sells toys during the holiday season, may need profile information more quickly than is presently available. It is therefore desirable to have methods and systems for providing a demographic user profile more rapidly, in particular for new users who have not visited a Web site before. It is further desirable to provide a measure of the demographic user profile based on the Web site visited by the user and the user's click stream.
  • the systems and methods described herein include systems and methods for inferring the demographic properties of users visiting a web site.
  • systems are provided wherein a set of discriminating web sites are identified. These discriminating web sites are identified by examining the browsing activity of certain known users that have associated with them profile information that is representative of demographic information for these users. For these known users a set of discriminating web sites can be identified, wherein a discriminating web site is understood as being prototypical of the characteristics of these known users.
  • the demographic information of these discriminating web sites may be employed to infer the demographic properties of other users visiting these web sites.
  • the invention provides methods for generating a demographic profile for an unknown user, comprising recording computer activity of the unknown user in response to the information provided to the user by at least one of the digital processors, and combining the recorded computer activity of the unknown user with a computer demographic score of the at least one of the digital processors, the computer demographic score being based on demographic information obtained from known users, to generate the demographic profile of the unknown user.
  • the invention may provide a process for generating a demographic profile of an unknown user accessing a server having a server profile.
  • the process may include recording computer activity by the unknown user in response to information provided by the server; determining whether the recorded computer activity by the unknown user is greater than a predetermined activity value, and combining the recorded computer activity by the unknown user with the server profile to form the unknown user demographic profile.
  • the recorded computer activity by the unknown user can be checked to determine if it is less than a predetermined activity value, and if so, the process may set the unknown user demographic profile equal to the server profile.
  • a weighting function may be applied to the recorded computer activity by the unknown user based on a duration of the computer activity. The applied weighting function may be selected to reduce the significance of a computer activity having a long duration.
  • the systems and methods described herein thus provide a process for generating a user demographic profile of an unknown user accessing at least one server.
  • These processes may comprise identifying a user accessing the at least one server and recording user activity on the server, and determining, as a function of the user identification, whether the user is an unknown user or a known user.
  • the process may monitor at least a duration of the user activity and assign a demographic score to the unknown user based on the monitored user activity and a server profile of the at least one server accessed by the unknown user.
  • the demographic score may be combined with an existing demographic score of the unknown user, and the demographic profile of the unknown user may be set equal to the combined demographic score.
  • the invention may be realized as computer programs having instructions for causing a computer to record computer activity of a user responding to information provided by at least one of the computers.
  • the program may identify the user as one of a known or an unknown user.
  • the program may compare the computer activity, for an unknown user, with a predetermined activity and assign to the unknown user a demographic score which is based on the computer activity and a computer demographic profile characteristic of the computer if the computer activity exceeds the predetermined activity, and on the computer demographic profile alone, if the computer activity is less than the predetermined activity.
  • the program may then combine the demographic score with another existing demographic score for the same unknown user generated during a previous session by the unknown user with the same computer or with another computer; and provide from the combination of the demographic scores a user demographic profile of the unknown user.
  • FIG. 1 is a functional block diagram of a computer network
  • FIG. 2A is a flow diagram of a process for inferring an unknown user profile according to the invention.
  • FIG. 2B is a flow diagram of a process for updating a URL profile according to the invention
  • FIG. 3 is a data flow diagram of the process of FIGS. 2A and 2B;
  • FIG. 4 shows the organization of a hash table for known users
  • FIG. 5 is a flow chart for computing URL profiles of known users
  • FIGS. 6A-6C and 7 are detailed flow charts of the process of FIG. 5
  • FIGS. 8A and 8B show the organization of a hash table for unknown users
  • FIG. 9 is a flow chart for computing URL profiles of unknown users.
  • a demographic profile of an unknown user (hereinafter referred to as "unknown user profile") interacting with one or more computers, such as Web servers, may be compiled based on the computer activity of the unknown user and a computer demographic profile (hereinafter referred to as "URL profile") associated with the computer or Web server accessed.
  • the URL profile may be derived from and updated in response to the browsing activity of known users and/or from demographic information provided by the known users.
  • Known users can be distinguished from unknown users by the cookies exchanged between the user's PC and the Web server.
  • the demographic profile of the known users can be based on their activity on several Web servers.
  • a complete set of demographic information may be available for some users. These users will hereinafter be referred to as "known" users.
  • the browsing activity of the known users is obtained from the Web sites (hereinafter referred to as "URL") they have visited over a period of time along with the number of visits (hits) and the browsing duration for each URL.
  • This information is used to identify certain URLs, which are prototypical of the characteristics of these users. Such URLs are referred to as “discriminating" URLs.
  • users whose demographic characteristics are not known are referred to as "unknown" users.
  • the process and system of the invention attempt to establish a reliable demographic profile of an unknown user based on the browsing activity of the unknown user and the demographic properties of the discriminating URLs visited.
  • a measure in the form of a demographic score vector (dscore) is assigned to each URL.
  • Each element of this vector represents an attribute value of the demographic information.
  • This dscore element is a representation of the probability that a user visiting this URL can be associated with that particular attribute value.
  • the demographic profile of a user can also be expressed as a dscore vector.
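The dscore vector described above can be pictured as a small data structure. The following is only a sketch: it assumes each attribute holds a probability distribution over its values, and the numeric values and the helper name `is_valid_dscore` are hypothetical, not taken from the source or from Table 1.

```python
import math

# Hypothetical dscore for a URL: attribute -> {attribute value -> probability}.
# The numbers are invented for illustration only.
url_dscore = {
    "gender": {"male": 0.35, "female": 0.65},
    "profession": {"technical": 0.20, "homemaker": 0.50, "student": 0.30},
}


def is_valid_dscore(dscore):
    """Check that every attribute holds a probability distribution.

    Assumption: the probabilities for one attribute sum to 1.
    """
    for values in dscore.values():
        if any(p < 0.0 or p > 1.0 for p in values.values()):
            return False
        if not math.isclose(sum(values.values()), 1.0, abs_tol=1e-9):
            return False
    return True
```

A user dscore could use the same shape, so that URL and user profiles can be combined element by element.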
  • the dscore vector for either the user or the URL may include the following data: Gender (male/female)
  • Profession (broad categories, e.g.: Technical, Professional, Homemaker, student, Trade, Management, Sales and Marketing, Service).
  • An exemplary URL dscore may have the form shown in Table 1 :
  • a part 10 of a computer network includes a profile server 12, personal computers (PC) 14, 15, and Web servers 16 and 17.
  • the PCs 14 and 15 may be any one of a variety of conventional, commercially available, hardware and software combinations configured to access Internet servers by any one of a variety of suitable means.
  • the Profile Server 12 and the Web servers 16 and 17 may also be any one of a variety of conventional, commercially available, hardware and software combinations configured to provide conventional Internet services to users.
  • the conventional server software is supplemented to provide the functionality discussed herein.
  • the PCs 14, 15 and the servers 12, 16 and 17 communicate with each other via communication links 22, 24-27 which are all connected to a communication channel 21, such as the Internet.
  • each of the servers 16, 17 has a unique identification, generally referred to as a URL (Uniform Resource Locator) serving as the server's network address.
  • the Web server 16 may, for example, provide advertisements to PC 14 or 15 accessing the server 16. Alternatively, the Web server 16 may provide the advertisements, for example, in banner form to another server 17 with which the PC 14 is communicating.
  • the Web servers 16, 17 may also be used in electronic commerce applications offering goods and services to the users 14, 15, as is known in the art.
  • One of the servers 12 is designated as a "Profile Server" 12 capable of monitoring user interactions with the Web servers 16, 17.
  • the Profile Server 12 can communicate with one or more Web servers 16, 17 which run profiling software that allows the Web servers 16, 17 to monitor and record user input in the PCs 14, 15, such as the users' click stream.
  • the Profile Server 12 may monitor the user input in real time or download user input data offline.
  • the Web servers 16, 17 can also query the Profile Server 12 to obtain information about user interaction with their own Web server 16 and 17, respectively, or with another Web server or other Web servers.
  • the Profile Server 12 can control and limit the user information made available to the Web server 16 or 17 from another Web server.
  • an unknown user logs on to a Web site, step 32.
  • the unknown user is most likely drawn to this Web site because of its displayed or assumed content.
  • the Web server of the Web site returns to the user's PC a cookie ID containing at least the URL of the Web server and a time stamp recording the time and duration of the user's click on information provided by the Web server, step 34.
  • a "cookie" generally refers to a packet of information sent by a server to a browser and then returned by the browser to the server each time the browser accesses the server. Cookies may contain any information the server chooses and are used to maintain state between otherwise stateless transactions, such as HTTP transactions.
  • this is used to authenticate or identify a registered user of a web site without requiring the user to sign in again every time he/she accesses that site.
  • Other uses are, e.g. maintaining a "shopping basket" of goods selected for purchase during a session at a site, site personalization (presenting different pages to different users), and/or tracking a particular user's access to a site.
  • the software running on the visited Web server records the unknown user's browsing activity on that Web site, step 36.
  • the browsing activity is likely to reflect the user's interests and can be used to compile a user's interest score (iscore) which may be retained in a database of the Profile Server 12 or the visited Web server, where the iscore can be updated when the user logs on to the Web site again.
  • the compilation of interest scores is described, for example, in the above mentioned and commonly assigned US patent application entitled "SYSTEM AND METHOD FOR BUILDING USER PROFILES".
  • the software computes the demographic profile of the unknown user (unknown_user_dscore) based on the user's browsing activity monitored in step 36 and the dscore of the visited Web site (URL_dscore), step 38.
  • the URL_dscore is obtained based on information of known users having an established user profile for the visited Web site using process 50, which will be described below with reference to FIG. 2B.
  • the unknown_user_dscore is then stored in a history table to be used for a subsequent login of the same unknown user, step 42.
  • Referring now to FIG. 2B, process 50 computes and updates the URL_dscores of discriminating Web sites (URLs) based on the browsing activity of known users.
  • the software running on the visited Web server records the known user's browsing activity on that Web site and updates a browsing log of the known user, step 56.
  • the updated browsing log is then compared with existing records of known users. If the updated browsing log is statistically identical to the existing records, then the URL_dscore of the Web site remains unchanged. If, on the other hand, the updated browsing log is statistically different from the existing records, then the URL_dscore of the Web site is updated to reflect the altered browsing patterns of known users, step 58.
  • the updated URL_dscore can then be stored in a file or database, step 60, and is used to compute the unknown_user_dscore in step 38 of FIG. 2A, as described above.
  • the log record is processed by log record processing means, step 64, wherein the users are segregated according to the received cookies into known users for which a demographic profile has been established, and unknown users for which a dscore is to be computed using the system and method of the invention.
  • Interest scores (iscores) based on the user browsing activity may also be compiled in step 64, as mentioned above.
  • the records 78 of known users will be used for the URL_dscore computation 80 to update a URL file of known users 82 based on the new browsing activity of the known users.
  • the process 70 also maintains a database 84 with the dscores of other Web sites (URLs) profiled by the Profile Server 12.
  • the database 84 is also updated as needed or on a periodic basis.
  • the updated URL dscores are then supplied to the log record processing means 64 which computes a session dscore of the unknown user for the current session on the visited Web site (URL) based on the user's browsing activity, in particular the browsing duration, and the URL_dscore of the visited Web site, step 66.
  • the session dscore is then merged with dscores compiled for the unknown user during previous sessions and stored in a history file 74, providing a new dscore of the unknown user, step 72.
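The merge of a session dscore with the stored history can be sketched as a weighted average. The source refers to equations in an Appendix that is not reproduced here, so the formula below is an assumption, and the function name `merge_dscores` and the weights are hypothetical.

```python
def merge_dscores(old, old_weight, new, new_weight):
    """Merge two dscore vectors for one attribute by weighted average.

    Assumption: the weights stand in for (aged) duration counts; the
    actual merge formula of the source is not shown in this section.
    """
    total = old_weight + new_weight
    if total == 0:
        return dict(old)
    keys = set(old) | set(new)
    return {
        k: (old.get(k, 0.0) * old_weight + new.get(k, 0.0) * new_weight) / total
        for k in keys
    }


# Hypothetical history and session values for the gender attribute.
history = {"male": 0.6, "female": 0.4}
session = {"male": 0.2, "female": 0.8}
merged = merge_dscores(history, 3.0, session, 1.0)
```

With these invented numbers the history carries three times the weight of the session, so the merged profile moves only part of the way toward the session values.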
  • the dscore and other browsing attributes, such as the URLs of the visited Web sites and browsing duration, in the history file may also be updated.
  • the history file of an unknown user may have the following form:
  • a flow chart 200 describes the browsing activity of an unknown user.
  • the time duration for which the unknown user browses a Web site (URL) is recorded, step 202.
  • Duration counts have proven to represent a better measure for calculating an unknown user's dscore than merely the number of visits (hits).
  • Storing all browsing activity, including all hits, for all visits of unknown users to all URLs is computationally expensive, since both the number of unknown users and the number of discriminating URLs can be very large.
  • a non-linear function is applied to the duration values which caps large duration values, step 204 (see Eq. 12 in the Appendix).
  • the non-linear function is selected so that its value increases rapidly for small but significant values of duration, but remains constant for large duration values.
  • the non-linear function also weights the discriminating ability of the URL browsed so that the dscores of the URLs that are more discriminating, are scaled by a larger amount.
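A function with the behavior described above might look like the following sketch. It is not Eq. 12 of the Appendix, which is not reproduced here; the exponential form, the function name `capped_duration_weight`, and the parameters `scale_s` and `discrimination` are all assumptions chosen only to show the rapid rise, the cap, and the scaling by discriminating ability.

```python
import math


def capped_duration_weight(duration_s, scale_s=60.0, discrimination=1.0):
    """Saturating weight for a browsing duration given in seconds.

    Rises quickly for short durations and levels off for long ones.
    `discrimination` (hypothetical) scales the result so that more
    discriminating URLs contribute more.
    """
    if duration_s <= 0:
        return 0.0
    return discrimination * (1.0 - math.exp(-duration_s / scale_s))
```

Under these assumptions, a visit of one hour and a visit of two hours receive practically the same weight, which keeps a browser window left open from dominating the profile.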
  • If it is decided in step 206 that the browsing activity of the user is "significant", i.e. more extensive than a predetermined threshold value, then the duration count (DC) is computed and updated, step 210, using Eq. 2 of the Appendix, and the dscore of the unknown user for the URL browsed is computed for each attribute value from the weighted and aged duration count and the URL dscore of that URL, step 212.
  • the duration counts are "aged" in a manner known in the art to reduce the effect of old duration counts on the new dscore values.
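The source does not state which aging method is used. One common choice is exponential decay with a half-life, sketched below; the function name `age_duration_count` and the 30-day half-life are assumptions for illustration.

```python
def age_duration_count(count, elapsed_days, half_life_days=30.0):
    """Decay a duration count by the time elapsed since it was recorded.

    Assumption: exponential decay, so the count halves every
    `half_life_days` days.
    """
    if elapsed_days < 0:
        raise ValueError("elapsed_days must be non-negative")
    return count * 0.5 ** (elapsed_days / half_life_days)
```

With this choice a duration count recorded two half-lives ago contributes a quarter of its original value.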
  • the dscore values of all URLs browsed by the unknown user are computed, step 214, using Eqs. 12 and 13 given in the Appendix.
  • the dscore value of the unknown user will be more heavily weighted towards the more discriminating URL (see Table A5 in the Appendix).
  • the new dscore values computed in step 214 are then merged with the old dscore values of the unknown user stored in the history table 74, step 216.
  • the old duration counts and the old dscore values are first aged, using Eqs. 11-18 listed in the Appendix.
  • the merged and aged dscore values represent the current dscore values of the unknown user, step 218.
  • the predicted dscore for the unknown user is set equal to a typical user profile for the URL browsed, step 208.
  • the demographic profile of the user showing insignificant browsing activity will be set to the profile shown, for example, in Table 1 if the user accesses this exemplary Web site.
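The decision between steps 208 and 212 can be sketched as follows. This is a simplified reading, not the formulas of the Appendix: the weights are assumed to be capped, aged duration weights, the combination is assumed to be a weighted average of URL dscores, and the fallback uses the most recently browsed URL. The function name `unknown_user_dscore` is hypothetical.

```python
def unknown_user_dscore(visits, threshold):
    """Compute a dscore for one attribute of an unknown user.

    visits: non-empty list of (weight, url_dscore) pairs, in browsing order.
    threshold: minimum weight for a visit to count as significant.
    """
    significant = [(w, d) for w, d in visits if w > threshold]
    if not significant:
        # Insignificant activity: use the typical profile of the URL
        # browsed (assumed here to be the most recent one).
        return dict(visits[-1][1])
    total = sum(w for w, _ in significant)
    keys = set()
    for _, d in significant:
        keys |= set(d)
    return {k: sum(w * d.get(k, 0.0) for w, d in significant) / total for k in keys}
```

A user who only glances at one page therefore inherits that site's typical profile, while a user with substantial activity receives a blend weighted toward the sites browsed the longest.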
  • Log processing process 64 has a list of cookies of the known users. If the cookie exists in the list, all records in the session will be identified as those of known users. This comparison could be made more efficient by hashing the cookies as illustrated in FIG. 4.
  • a hash table 90 may be implemented in the form of an array of pointers 62. All pointers in the hash table initially have NULL values.
  • a known user cookie ID 94, 96, 98 hashes to a hash bucket 96', 98'
  • the array is indexed by bucket number and assigned a corresponding pointer, thereby linking the cookie to the pointer.
  • the number of hash buckets 96', 98' may be set equal to the number of known users.
  • the hash table 90 of known users may, in some embodiments, always be present in memory. An incoming hashed cookie is checked to determine whether it points to an existing hash bucket. With a suitable hash function and a suitable number of buckets, known and unknown users may be distinguished after very few comparisons.
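A minimal sketch of such a lookup table is shown below. The class name `CookieHashTable`, the use of CRC-32 as the hash function, and the cookie ID strings are assumptions; the source does not specify a hash function.

```python
import zlib


class CookieHashTable:
    """Array of buckets indexed by a hash of the cookie ID."""

    def __init__(self, num_buckets):
        # All pointers initially have NULL (None) values.
        self.buckets = [None] * num_buckets

    def _bucket(self, cookie_id):
        # CRC-32 is used only as an example of a hash function.
        return zlib.crc32(cookie_id.encode("utf-8")) % len(self.buckets)

    def add(self, cookie_id):
        i = self._bucket(cookie_id)
        if self.buckets[i] is None:
            self.buckets[i] = []
        if cookie_id not in self.buckets[i]:
            self.buckets[i].append(cookie_id)

    def is_known(self, cookie_id):
        chain = self.buckets[self._bucket(cookie_id)]
        return chain is not None and cookie_id in chain
```

A log record whose cookie is found in the table would be routed to the known user processing, and all other records to the unknown user processing.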
  • a flow chart 100 shows the process step 80 of FIG. 3 for computing the URL dscores for known users.
  • a duration count (DC) is computed for each user and each visited Web site (URL), step 102.
  • the duration count represents the time during which a user interacts with the URL.
  • an activity count is computed which is then "aged” to reflect the elapsed time since the user's last access to the URL, step 104.
  • the new dscore of the URL for known users is then computed from the aged activity count, step 106.
  • the URL_dscores are then updated and the temporary files created during the process 100 are deleted, step 180.
  • FIGS. 6A-6C illustrate the process 100 of FIG. 5 in more detail.
  • the read pointer of the first record in the known user log 78 and of the first file in the URL_known user file 82 are initialized, step 112.
  • URL_known user file 82 is already sorted according to the URLs, for example in ascending order.
  • the first record in the known user log 78 is read, step 114, and that URL is compared with the URL in the current record of the URL_known user file 82, step 116.
  • One of the following three process steps is then executed. If it is determined in step 116 that the URL in the user browsing log record is less than the URL in the URL_known user matrix file 82, a buffer for storing duration counts of all users is allocated and initialized to zero, step 118.
  • the records for that URL are read one by one from the user browsing log record, step 120, and the duration is normalized (see Eq. 1 in the Appendix) to give duration counts, step 122.
  • the duration counts are then aged up to the current date-time (Eq. 5), step 124, and added to the duration counts of that user in the buffer (Eq. 2), step 126.
  • the duration counts of all known users are normalized to produce activity counts (Eq. 3), step 128.
  • the dscores for the respective URL are computed using the known user dscores and the accumulated activity counts (Eq. 7), step 130.
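Steps 128 and 130 can be illustrated with a small sketch. Eqs. 3 and 7 are not reproduced in this section, so the normalization (dividing by the total duration count) and the activity-weighted sum below are assumptions; the function names and the sample data are hypothetical.

```python
def activity_counts(duration_counts):
    """Normalize per-user duration counts for one URL so they sum to 1."""
    total = sum(duration_counts.values())
    if total == 0:
        return {user: 0.0 for user in duration_counts}
    return {user: dc / total for user, dc in duration_counts.items()}


def url_dscore(duration_counts, user_dscores):
    """Combine known user dscores into a URL dscore for one attribute.

    Assumption: each known user contributes in proportion to that
    user's activity count on the URL.
    """
    score = {}
    for user, activity in activity_counts(duration_counts).items():
        for value, p in user_dscores[user].items():
            score[value] = score.get(value, 0.0) + activity * p
    return score


# Hypothetical data: two known users with declared gender.
durations = {"u1": 30.0, "u2": 10.0}
profiles = {
    "u1": {"female": 1.0, "male": 0.0},
    "u2": {"female": 0.0, "male": 1.0},
}
result = url_dscore(durations, profiles)
```

In this invented example user u1 accounts for three quarters of the browsing time, so the URL leans toward u1's declared attribute value.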
  • If the dscores so computed show that the URL is discriminating, then the dscores of the URL are appended to a temporary file 'insert_file', and the activity counts for all users for that URL are also appended to a new file of the URL_known user file 82 in the form of a 'newUrlAcFile', along with a bit denoting whether that URL is discriminating or not, step 132.
  • URLs that track the browsing activity of known users over a period of time along with the number of sites and duration of browsing for the URL are referred to as discriminating URLs.
  • If it is determined in step 116 that the URL in the user browsing log record is equal to the URL in the URL_known user file record, then the activity counts for all known users for that URL are read from the URL_known user file 82 into a memory buffer, step 140. The activity counts are denormalized to get duration counts, step 142. The duration counts are then aged up to the current date-time, step 144, and the aged duration counts are added to the duration counts of that user in the memory buffer, step 146.
  • If it is determined in step 148 that the URL was previously discriminating and is now no longer discriminating (as is denoted by the discriminating status bit in the URL_known user file record), then the duration counts are appended to a temporary delete file 'del_file', step 150. Otherwise, the process 100 goes to step 152. If it is determined in step 152 that the URL was previously discriminating and still remains discriminating, then the URL dscore is computed as in steps 128 and 130 and appended to a temporary update file 'update_file', step 154.
  • If, on the other hand, it is determined in step 152 that the URL was not discriminating before and has now become discriminating, then the duration counts are appended to a temporary file 'insert_file', and the activity counts for all users for that URL along with the updated discriminating status bit are also appended to the 'newUrlAcFile', step 132.
  • If it is determined in step 116 that the URL in the user browsing log record is greater than the URL in the URL_known user matrix record, then the activity counts for all known users for that URL are read from the URL_known user matrix record into a memory buffer, step 160. The activity counts are denormalized to get duration counts, step 162. The duration counts are then aged up to the current date-time, step 164, and new activity counts and URL dscores are computed, step 166. If it is determined in step 168 that the URL was previously discriminating and is now no longer discriminating (as is denoted by the discriminating status bit in the URL_known user matrix record), then the activity counts are appended to a temporary file 'del_file', step 170.
  • the URL dscore and the associated new activity counts along with the current discriminating status bit are appended to the 'newUrlAcFile', step 172. If, on the other hand, it is determined in step 168 that the URL was and still is discriminating, then the URL dscore is appended to a temporary 'update_file', step 154.
  • process 180 next updates the URL dscore database 84 and the URL_known user file 82.
  • the updated URL_known user file 82 now resides in the 'newUrlAcFile'.
  • the other files produced by the process 100 are the temporary del_file, update_file and insert_file.
  • the del_file is processed first, and all non-discriminating URLs are deleted from the URL demographics file, step 182.
  • the update_file is processed next, step 184, wherein the 'newUrlAcFile' updates the URL_known user file.
  • the insert_file is processed last, step 186, where all new discriminating URLs are inserted.
  • the three temporary delete, update and insert URL dscore files are then deleted, step 188.
  • the old URL_known user file is deleted, step 190, and the 'newUrlAcFile' is renamed as the new URL_known user matrix file, step 192.
  • the processes 100 and 180 maintain the sorted order of the URL_known user file.
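Because both inputs are sorted by URL, the three-way comparison of step 116 amounts to a merge of two sorted sequences. The sketch below shows only that comparison; the function name `classify_urls`, the case labels, and the assumption that each list contains each URL once are hypothetical simplifications.

```python
def classify_urls(log_urls, file_urls):
    """Walk two ascending URL lists and classify every URL.

    "log_only":  log URL is less than the file URL (new URL, steps 118-132)
    "both":      URLs are equal (existing URL with new activity, steps 140-154)
    "file_only": log URL is greater than the file URL (no new activity,
                 steps 160-172)
    """
    out = []
    i = j = 0
    while i < len(log_urls) or j < len(file_urls):
        if j == len(file_urls) or (i < len(log_urls) and log_urls[i] < file_urls[j]):
            out.append((log_urls[i], "log_only"))
            i += 1
        elif i == len(log_urls) or log_urls[i] > file_urls[j]:
            out.append((file_urls[j], "file_only"))
            j += 1
        else:
            out.append((log_urls[i], "both"))
            i += 1
            j += 1
    return out
```

Since the output is produced in ascending URL order, writing it to a new file keeps that file sorted, which is consistent with the sorted order maintained by processes 100 and 180.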
  • the main components of the memory cache are a hash table 220 and an aging queue.
  • the hash table 220 is similar to the hash table 92 for known users.
  • the URL such as URL 1
  • the hash bucket 222' contains a pointer to the chain of URLs 222, 224 that hash to that hash bucket 222'.
  • the URLs can be distributed uniformly across all hash buckets by selecting a suitable hash function.
  • the aging queue is provided to identify URLs that can be replaced, because they have not been accessed for some time.
  • the memory cache may, for example, be already full when a URL is fetched from the database to be stored in the memory cache.
  • a queue of all URLs is then formed in the memory cache.
  • Each URL has an aging queue pointer 230 pointing to the previous URL in the chain and an aging queue pointer 232 pointing to the next URL in the chain.
  • a newly fetched URL is added to the tail of the queue.
  • the previous pointer of URL 1 (222) points to URL 2 (224), whereas the next pointer of URL 1 points to URL n.
  • the previous pointer of URL 2 points to URL n (226), whereas the next pointer of URL 2 points to URL 1.
  • the aging queue order is therefore URL 1, URL 2, URL n.
  • a URL for example URL n
  • the URL is moved from its current position and placed at the tail of the queue, as indicated by the arrows in FIG. 8B.
  • the URLs at the tail of the aging queue are the most recently accessed URLs, whereas those at the head of the aging queue have been accessed least recently.
  • the URL at the head of the queue is replaced, if necessary.
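The combination of a hash table and an aging queue behaves like a least-recently-used cache. The sketch below uses Python's `OrderedDict`, which provides both the hashed lookup and the ordering in one structure, instead of the explicit pointer chains of the source; the class name `UrlCache` and the capacity are assumptions.

```python
from collections import OrderedDict


class UrlCache:
    """Memory cache of URL dscores with an aging queue.

    The head of `entries` is the least recently accessed URL and the
    tail is the most recently accessed one.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, url):
        if url not in self.entries:
            return None
        self.entries.move_to_end(url)  # accessed URL moves to the tail
        return self.entries[url]

    def put(self, url, dscore):
        if url in self.entries:
            self.entries.move_to_end(url)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # replace the URL at the head
        self.entries[url] = dscore
```

A cache miss in `get` would be followed by a fetch from the database and a call to `put`, which evicts the URL at the head only when the cache is full.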
  • the known users provide all necessary attribute values of all attributes.
  • some known users may choose not to supply values for some demographic attributes, such as their age or household income.
  • Such values will be referred to as missing values.
  • missing values should be incorporated to reduce the error in the URL_dscores and thereby minimize the error in the demographic profile predicted for unknown users.
  • one of the following strategies may be adopted:
  • the missing attribute value field may be replaced by NULL values which are ignored in the dscore computation.
  • the missing demographic information for known users may be predicted statically by creating an auxiliary function, called a "bridging agent" in ML terminology, that can predict the value of the missing attribute.
  • Conventional data mining algorithms, such as the association rule finding algorithm, can be applied to the existing known user demographics file and used to predict missing demographic information based on the demographic attributes from other known users. This may need to be done only once, with confidence values for the missing demographic attributes permanently stored in the known user demographics file.
  • the missing demographic information for known users may be predicted dynamically by treating the known user as an unknown user and finding the dscore in a manner similar to the dscore computation for unknown users discussed above. This method would take into account the browsing patterns of the known user along with the demographic attributes which other known users have provided.
  • Each of the aforedescribed methods for predicting the missing demographic information for known users has disadvantages.
  • If the unknown attribute values of the known users are discarded, then vital relationships between different attributes may be lost. Also, extra information will have to be recorded about the total activity count of known users who have not declared a particular attribute. This extra information will be required because the dscore computation formulas use the total activity count of all known users for a URL and the total activity count for all URLs. The activity counts of users who have missing values can therefore not be considered.
  • the known users are treated differently for different attributes in that the attributes, for which the known users have provided information, are included in the URL_dscore computation, whereas the attributes which are missing are not included in the dscore computation.
  • inter-dependencies already existing between different attributes can provide more accurate predictions.
  • Apparatus of the invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method steps of the invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output.
  • the invention can advantageously be implemented in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device.
  • Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language.
  • Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory.
  • Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).
  • the invention can be implemented on a computer system having a display device such as a monitor or LCD screen for displaying information to the user and a keyboard and a pointing device such as a mouse or a trackball by which the user can provide input to the computer system.
  • the computer system can be programmed to provide a graphical user interface through which computer programs interact with users. While the invention has been disclosed in connection with certain illustrated embodiments shown and described in detail, various modifications and improvements thereon will become readily apparent to those skilled in the art. Moreover, the systems and methods described herein may be employed for a plurality of different applications including for generating profiles of unknown users on a computer network.
  • systems and methods described herein may be employed for determining the success of a web site for attracting users of a selected profile, or demographic. Additionally, it will be understood that the systems and methods described herein may be operated in a way to generate meaningful user profile information, without having to provide user identity information, such as user name, or address, as part of the profile. Accordingly, the spirit and scope of the invention is to be limited only by the following claims.
  • $A_1, A_2, \ldots, A_s$ represent the demographic attributes.
  • Each attribute $A_i$ has a number of attribute values $(v_{i1}, v_{i2}, \ldots, v_{ij_i})$, where $j_i$ differs for each $i$.
  • $K_1, K_2, \ldots, K_t$ are the known users.
  • $D$ is the dscore of URL $R$ for attribute value $v$ of attribute $A_i$.
  • a URL is called discriminating if its dscore for any attribute value differs from a typical distribution of that attribute value by a certain tunable threshold value, as described below in Section 2 of the Appendix.
  • the computation of the Dscore for a URL will consider predominantly the browsing activity of known users rather than the number of known users.
  • the browsing activity includes the duration for which the URL was browsed and the number of hits to the URL.
  • the duration values should be significant enough to indicate the interest of the user in that URL.
  • a scaling function is applied to the duration values to remove undue importance to a single visit having a large duration.
  • is the tunable parameter for capping the duration for which a URL is visited by a user.
  • the duration count may be updated by adding the new scaled duration count to the old duration count $DC_{old}$ to give the new updated duration count $DC_{new}$.
  • Each of multiple visits by a certain user may be treated as equivalent to a visit by another user having the same profile.
  • a large number of multiple visits from the same user does not provide much additional information.
  • the duration count is therefore scaled by a non-linear function which increases rapidly for a relatively small number of visits, but remains approximately constant for a relatively large number of visits. The result is called a final activity count AC.
  • ⁇ 2 is a tunable scaling parameter for capping the number of visits to a URL by a user and DC is the duration count determined above. The value of this function is less than one.
  • a short duration and/or a small number of visits have reduced effect on activity count.
  • the activity count increases with longer duration and/or the number of visits.
  • the activity count increases less rapidly for long durations and a large number of visits.
  • the effect of duration and number of visits on activity count can be adjusted by changing the values of ⁇ 1 and ⁇ 2 .
  • the change of a user's browsing pattern should be reflected in the dscores.
  • the activity counts of the users are "aged" to decrease the effect of old activity counts on the new dscore values.
  • the activity count AC of a known user is first de-normalized.
  • the de-normalized activity count AC is calculated from the equation AC
  • the de-normalized activity count AC is then aged to give AC' aged by applying the aging formula
  • the de-normalized aged activity count is then again normalized to give an aged normalized activity count
  • the dscore D for an attribute value v for attribute A, for a URL R v can now be computed.
  • the following equation can be used to define the dscore D :
  • n is the sum of activity counts for all known users K, for URL R 3 i.e.
  • a URL is discriminating if its dscore for any attribute value v y differs from the normal distribution of a known user having that attribute value (p ) by a certain defined threshold value ⁇ , so that
  • the dscore of an unknown user will be such that if the browsing activity is sufficiently large, the dscores are a combination of the URLs browsed. But if the browsing activity is insignificant, the dscore will weigh down to that of a typical user.
  • duration counts for unknown user visits are computed by a method identical to that used for computing duration counts for known users using equation 2.
  • the duration count for an unknown user is aged using a formula similar to equation 5.
  • the dscores for each demographic attribute $A_i$ of the unknown user can now be computed from the duration count of the unknown user and the dscores of the URL obtained from equation 7.
  • the discriminating ability of the browsed URL and the duration count obtained from the browsing activity of the unknown user should be suitably weighted, for example, by the following function $f(x)$.
  • the function $f(x)$ assumes a high value for those values of $x$ that are far away from the original probability distribution $p$, and a low value for those values of $x$ that are close to $p$. Hence the dscores of the URLs that are more discriminating are scaled by a larger amount. As a result, the dscore values of the unknown user reflect the discriminating measure $(x - p)$ of a URL.
  • the value of $f(\text{dscore})$ is first computed for each attribute value.
  • the so computed values of $f(\text{dscore})$ are then weighted and averaged, with the weights being the duration counts for the URLs.
  • the inverse function $f^{-1}$ of the resulting value is calculated to provide the new dscore value for the unknown user:
  • D 2 is the dscore of a URL R 2 browsed by the unknown user in that session
  • DC 2 is the duration count for URL R 2 , and so on.
  • the dscores for $R_1$ and $R_2$ for a particular attribute value are assumed to be 0.9 and 0.6, respectively, and the probability $p$ of a known user having an attribute value $v_{ij}$ in the original distribution is assumed to be 0.
  • the newly calculated dscore calc for a session of the unknown user may be merged with existing dscores for the unknown user. Furthermore, the duration counts also need to be updated.
  • the process therefor is as follows:
  • D old and DC old are zero
  • the new duration count DC ⁇ ew is obtained by adding DC old Aged to D,.
  • a current unknown user dscore $D_{curr}$ and a current duration count $DC_{curr}$ are related to the sum $X$ of the products of old $f(\text{dscores})$ and duration counts $DC$ by the following relation computed at time $t_c$:
  • $D_{curr}$ ages to $D_{curr}^{Aged}$.
  • both DC curr and X age according to equation 5 with an aging time ⁇ then the aged dscore D curr Aged for unknown users can be calculated as:

Description

SYSTEM AND METHOD FOR INFERRING DEMOGRAPHIC PROFILES
Field of the Invention
This application relates to the field of user behavioral data and more particularly to the field of Internet browsing habits and the purchasing and demographic user characteristics.
Background of The Invention
Users connected to the Internet can access an ever increasing number of Web sites to obtain information or to conduct business. Each Web site has associated therewith a unique identifier, that can be represented as a URL (Uniform Resource Locator). The user can connect to the site by, for example, typing the URL (such as "www.yahoo.com") or selecting a site from a predefined menu. Once at the site, the user can review the content presented on the site, and can provide information and instructions for directing the site to provide services and goods.
The providers of such commercial Web sites are generally interested in the demographic profile or attributes of a user in order to be better able to target advertising, products and/or services to specific users or groups of users. In many cases, the user may provide his or her demographic profile by responding to prompts or by filling out a registration form. The user demographic profile may include gender, age, marital status, education, profession, income and/or the geographic region as indicated, for example, by the user's ZIP code. A number of Web sites providing information, goods and services may attract a cross-section of users sharing at least some demographic characteristics. For example, the Web site "www.bettycrocker.com" may be preferred by women between the ages of 30 and 55 and having an interest in the culinary arts. Conversely, the Web site "www.harley-davidson.com" may preferably be visited by men between the ages of 35 and 44 interested in the outdoors. In other words, Web sites may be frequented by many different users, many of whom share certain demographic characteristics, leading to a distribution of user profiles. Frequently, however, a user may be reluctant to provide detailed personal information, in which case the Web site providers have employed other means to generate an approximate profile of the user, for example, based on the user's click stream. Optionally, these techniques may be employed in a way that refrains from associating user identity information, such as name and address information, with information descriptive of the user's profile. Although these profiling techniques may work well, the problem with this approach is that the provider basically has to start with a "blank" sheet, i.e., zero data, for each new user and build the profile from sometimes random click stream information. This process may be slower than desired.
Thus, a web site that experiences sporadic but intense user contact, such as a web site that sells toys during the Holiday season, may need profile information more quickly than presently available. It is therefore desirable to have methods and systems for providing a demographic user profile more rapidly, in particular for new users who have not visited a Web site before. It is further desirable to provide a measure of the demographic user profile based on the Web site visited by the user and the user's click stream.
Summary of the Invention
The systems and methods described herein include systems and methods for inferring the demographic properties of users visiting a web site. In one aspect, systems are provided wherein a set of discriminating web sites are identified. These discriminating web sites are identified by examining the browsing activity of certain known users that have associated with them profile information that is representative of demographic information for these users. For these known users a set of discriminating web sites can be identified, wherein a discriminating web site is understood as being prototypical of the characteristics of these known users. The demographic information of these discriminating web sites may be employed to infer the demographic properties of other users visiting these web sites. More specifically, the invention provides methods for generating a demographic profile for an unknown user, comprising recording computer activity of the unknown user in response to the information provided to the user by at least one of the digital processors, and combining the recorded computer activity of the unknown user with a computer demographic score of the at least one of the digital processors, the computer demographic score being based on demographic information obtained from known users, to generate the demographic profile of the unknown user. Further, the invention may provide a process for generating a demographic profile of an unknown user accessing a server having a server profile. The process may include recording computer activity by the unknown user in response to information provided by the server; determining whether the recorded computer activity by the unknown user is greater than a predetermined activity value, and combining the recorded computer activity by the unknown user with the server profile to form the unknown user demographic profile. 
The recorded computer activity by the unknown user can be checked to determine if it is less than a predetermined activity value, and if so, the process may set the unknown user demographic profile equal to the server profile. In an optional practice a weighting function may be applied to the recorded computer activity by the unknown user based on a duration of the computer activity. The applied weighting function may be selected to reduce the significance of a computer activity having a long duration.
The systems and methods described herein thus provide a process for generating a user demographic profile of an unknown user accessing at least one server. These processes may comprise identifying a user accessing the at least one server and recording user activity on the server, and determining, as a function of the user identification, whether the user is an unknown user or a known user. For an unknown user, the process may monitor at least a duration of the user activity and assign a demographic score to the unknown user based on the monitored user activity and a server profile of the at least one server accessed by the unknown user. The demographic score may be combined with an existing demographic score of the unknown user, and the demographic profile of the unknown user may be set equal to the combined demographic score.
In a further aspect, the invention may be realized as computer programs having instructions for causing a computer to record computer activity of a user responding to information provided by at least one of the computers. The program may identify the user as one of a known or an unknown user. The program may compare the computer activity, for an unknown user, with a predetermined activity and assign to the unknown user a demographic score which is based on the computer activity and a computer demographic profile characteristic of the computer if the computer activity exceeds the predetermined activity, and on the computer demographic profile alone, if the computer activity is less than the predetermined activity. The program may then combine the demographic score with another existing demographic score for the same unknown user generated during a previous session by the unknown user with the same computer or with another computer; and provide from the combination of the demographic scores a user demographic profile of the unknown user.
Further features and advantages of the present invention will be apparent from the following description of certain illustrated embodiments and from the claims.
Brief Description of the Drawings
FIG. 1 is a functional block diagram of a computer network;
FIG. 2A is a flow diagram of a process for inferring an unknown user profile according to the invention;
FIG. 2B is a flow diagram of a process for updating a URL profile according to the invention;
FIG. 3 is a data flow diagram of the process of FIGS. 2A and 2B;
FIG. 4 shows the organization of a hash table for known users;
FIG. 5 is a flow chart for computing URL profiles of known users;
FIGS. 6A-6C and 7 are detailed flow charts of the process of FIG. 5;
FIGS. 8A and 8B show the organization of a hash table for unknown users; and
FIG. 9 is a flow chart for computing URL profiles of unknown users.
Detailed Description of Certain Illustrated Embodiments
To provide an overall understanding of the invention, certain illustrative embodiments will now be described. However, it will be understood by one of ordinary skill in the art that the systems described herein can be adapted and modified to provide systems for other suitable applications and that other additions and modifications can be made to the invention without departing from the scope hereof.
A demographic profile of an unknown user (hereinafter referred to as "unknown user profile") interacting with one or more computers, such as Web servers, may be compiled based on the computer activity of the unknown user and a computer demographic profile (hereinafter referred to as "URL profile") associated with the computer or Web server accessed. The URL profile may be derived from and updated in response to the browsing activity of known users and/or from demographic information provided by the known users. Known users can be distinguished from unknown users by the cookies exchanged between the user's PC and the Web server. The demographic profile of the known users can be based on their activity on several Web servers.
To provide a better understanding of the invention, certain terms should first be defined. A complete set of demographic information may be available for some users. These users will hereinafter be referred to as "known" users. The browsing activity of the known users is obtained from the Web sites (hereinafter referred to as "URL") they have visited over a period of time along with the number of visits (hits) and the browsing duration for each URL. This information is used to identify certain URLs, which are prototypical of the characteristics of these users. Such URLs are referred to as "discriminating" URLs. In contrast, users whose demographic characteristics are not known are referred to as "unknown" users. The process and system of the invention attempt to establish a reliable demographic profile of an unknown user based on the browsing activity of the unknown user and the demographic properties of the discriminating URLs visited.
A measure in the form of a demographic score vector (dscore) is assigned to each URL. Each element of this vector represents an attribute value of the demographic information. This dscore element is a representation of the probability that a user visiting this URL can be associated with that particular attribute value. The demographic profile of a user can also be expressed as a dscore vector.
Typically, the dscore vector for either the user or the URL may include the following data:
Gender (male/female)
Age (2-11, 12-17, 18-24, 25-34, 35-44, 45-54, 55+)
Marital Status (single, married, divorced)
Children (by age: i.e., 1+ children 0-5, 1+ 6-11, 1+ 12+)
Education (some high school, high school degree, some college, college degree, advanced degree, professional degree)
Profession (broad categories, e.g.: Technical, Professional, Homemaker, Student, Trade, Management, Sales and Marketing, Service)
Income (US$ < $25k, $25k-$35k, $35k-$45k, $45k-$75k, $75k-$100k, >$100k)
Geography (Country, State, Town, ZIP code or Area code)
The broad categories are the attributes, and the sub-categories enclosed in parentheses are the attribute values.
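The dscore vector described above can be pictured as a nested mapping from attribute to attribute value to probability. The following is a minimal sketch under stated assumptions: the type alias `DScore`, the helper `is_valid_dscore`, and all numbers are hypothetical and are not taken from Table 1, and the check that the values of one attribute sum to one assumes the attribute values are mutually exclusive, which the text does not state for every attribute.

```python
from typing import Dict

# A dscore vector: attribute -> attribute value -> probability.
DScore = Dict[str, Dict[str, float]]

# Hypothetical numbers for illustration only; not taken from Table 1.
example_url_dscore: DScore = {
    "gender": {"male": 0.3, "female": 0.7},
    "marital_status": {"single": 0.2, "married": 0.7, "divorced": 0.1},
}


def is_valid_dscore(dscore: DScore, tol: float = 1e-9) -> bool:
    """Check that every element is a probability and that the values of
    each attribute sum to one (assumes mutually exclusive values)."""
    for values in dscore.values():
        if any(p < 0.0 or p > 1.0 for p in values.values()):
            return False
        if abs(sum(values.values()) - 1.0) > tol:
            return False
    return True
```

The same structure may serve for both a URL dscore and a user dscore, since the text expresses both as vectors over the same attribute values.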
An exemplary URL dscore may have the form shown in Table 1:
[Table 1: exemplary URL dscore; original figure not reproduced]
As seen in Table 1, a typical user logging on to the exemplary Web site is expected to be a married woman between the ages of 20 and 40. It is therefore not unreasonable to expect that an unknown user logging on to the same Web site will have a statistically significant probability to fit this profile.
Referring now to FIG. 1 , a part 10 of a computer network includes a profile server 12, personal computers (PC) 14, 15, and Web servers 16 and 17. The PCs 14 and 15 may be any one of a variety of conventional, commercially available, hardware and software combinations configured to access Internet servers by any one of a variety of suitable means. Similarly, the Profile Server 12 and the Web servers 16 and 17 may also be any one of a variety of conventional, commercially available, hardware and software combinations configured to provide conventional Internet services to users. In some instances, such as those described below, the conventional server software is supplemented to provide the functionality discussed herein. The PCs 14, 15 and the servers 12, 16 and 17 communicate with each other via communication links 22, 24-27 which are all connected to a communication channel 21, such as the Internet.
For the system described herein, each of the servers 16, 17 has a unique identification, generally referred to as a URL (Uniform Resource Locator) serving as the server's network address. The Web server 16 may, for example, provide advertisements to PC 14 or 15 accessing the server 16. Alternatively, the Web server 16 may provide the advertisements, for example, in banner form to another server 17 with which the PC 14 is communicating. The Web servers 16, 17 may also be used in electronic commerce applications offering goods and services to the users 14, 15, as is known in the art. One of the servers 12 is designated as a "Profile Server" 12 capable of monitoring user interactions with the Web servers 16, 17. The Profile Server 12 can communicate with one or more Web servers 16, 17 which run profiling software that allows the Web servers 16, 17 to monitor and record user input in the PCs 14, 15, such as the users' click stream. The Profile Server 12 may monitor the user input in real time or download user input data offline. The Web servers 16, 17 can also query the Profile Server 12 to obtain information about user interaction with their own Web server 16 and 17, respectively, or with another Web server or other Web servers. The Profile Server 12 can control and limit the user information made available to the Web server 16 or 17 from another Web server.
Referring now to FIG. 2A, in process 30 an unknown user logs on to a Web site, step 32. The unknown user is most likely drawn to this Web site because of its displayed or assumed content. The Web server of the Web site returns to the user's PC a cookie ID containing at least the URL of the Web server and a time stamp recording the time and duration of the user's click on information provided by the Web server, step 34. A "cookie" is generally referred to as a packet of information sent by a server to a browser and then returned by the browser to the server each time the browser accesses the server. Cookies may contain any information the server chooses and are used to maintain state between otherwise stateless transactions, such as HTTP transactions. Typically this is used to authenticate or identify a registered user of a web site without requiring the user to sign in again every time he/she accesses that site. Other uses are, e.g. maintaining a "shopping basket" of goods selected for purchase during a session at a site, site personalization (presenting different pages to different users), and/or tracking a particular user's access to a site.
The software running on the visited Web server records the unknown user's browsing activity on that Web site, step 36. The browsing activity is likely to reflect the user's interests and can be used to compile a user's interest score (iscore) which may be retained in a database of the Profile Server 12 or the visited Web server, where the iscore can be updated when the user logs on to the Web site again. The compilation of interest scores is described, for example, in the above mentioned and commonly assigned US patent application entitled "SYSTEM AND METHOD FOR BUILDING USER PROFILES".
The software computes the demographic profile of the unknown user (unknown_user_dscore) based on the user's browsing activity monitored in step 36 and the dscore of the visited Web site (URL_dscore), step 38. The URL_dscore is obtained based on information of known users having an established user profile for the visited Web site using process 50, which will be described below with reference to FIG. 2B. The unknown_user_dscore is then stored in a history table to be used for a subsequent login of the same unknown user, step 42.
Referring now to FIG. 2B, process 50 computes and updates the URL_dscores of discriminating Web sites (URLs) based on the browsing activity of known users. A known user logs on to a discriminating Web site, step 52, and the user PC sends a cookie with user information from prior interactions to the Web server, step 54. The software running on the visited Web server records the known user's browsing activity on that Web site and updates a browsing log of the known user, step 56. The updated browsing log is then compared with existing records of known users. If the updated browsing log is statistically identical to the existing records, then the URL_dscore of the Web site remains unchanged. If, on the other hand, the updated browsing log is statistically different from the existing records, then the URL_dscore of the Web site is updated to reflect the altered browsing patterns of known users, step 58. The updated URL_dscore can then be stored in a file or database, step 60, and is used to compute the unknown_user_dscore in step 38 of FIG. 2A, as described above.
Referring now to FIG. 3, the processes 30 and 50 of FIGS. 2A and 2B will now be described with reference to a data flow diagram 70. The software running on, for example, Web server 16 which may be monitored by the Profile Server 12, receives a user's cookie, the URL visited and a time stamp indicating, for example, the time of the visit and the duration of the browsing activity, step 62, creating a log record. The log record is processed by log record processing means, step 64, wherein the users are segregated according to the received cookies into known users for which a demographic profile has been established, and unknown users for which a dscore is to be computed using the system and method of the invention. Interest scores (iscores) based on the user browsing activity may also be compiled in step 64, as mentioned above.
The records 78 of known users will be used for the URL_dscore computation 80 to update a URL file of known users 82 based on the new browsing activity of the known users. The process 70 also maintains a database 84 with the dscores of other Web sites (URLs) profiled by the Profile Server 12. The database 84 is also updated as needed or on a periodic basis. The updated URL dscores are then supplied to the log record processing means 64 which computes a session dscore of the unknown user for the current session on the visited Web site (URL) based on the user's browsing activity, in particular the browsing duration, and the URL_dscore of the visited Web site, step 66. The session dscore is then merged with dscores compiled for the unknown user during previous sessions and stored in a history file 74, providing a new dscore of the unknown user, step 72. The dscore and other browsing attributes, such as the URLs of the visited Web sites and browsing duration, in the history file may also be updated. The history file of an unknown user may have the following form:
[Table: history file of an unknown user; original figure not reproduced]
Referring now to FIG. 9, a flow chart 200 describes the browsing activity of an unknown user. First, the time duration for which the unknown user browses a Web site (URL) is recorded, step 202. Duration counts have proven to represent a better measure for calculating an unknown user's dscore than merely the number of visits (hits). Storing all browsing activity, including all hits, for all visits of unknown users to all URLs is computationally expensive, since both the number of unknown users and discriminating URLs can be very large.
Moreover, only those duration values which are significant enough to indicate the interest of the user in that URL should be taken into consideration. However, undue importance should not be given to a single visit having a large duration. For this reason, a non-linear function is applied to the duration values which caps large duration values, step 204 (see Eq. 12 in the Appendix). The value of this non-linear function is selected to increase rapidly for small but significant values of duration, but remains constant for large duration values. The non-linear function also weights the discriminating ability of the URL browsed so that the dscores of the URLs that are more discriminating, are scaled by a larger amount.
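The capping behavior described above can be illustrated with a saturating function. This is only a sketch: the equation referenced in the Appendix is not reproduced here, so the exponential form, the function name `scaled_duration`, and the parameters `min_significant_s` and `tau_s` are assumptions chosen to match the stated shape (zero for insignificant durations, rapid growth for small significant durations, nearly constant and below one for large durations). The weighting by the discriminating ability of the URL is not modeled.

```python
import math


def scaled_duration(duration_s: float,
                    min_significant_s: float = 5.0,
                    tau_s: float = 120.0) -> float:
    """Map a raw browsing duration (seconds) to a capped value in [0, 1).

    Assumed form: 1 - exp(-d / tau). Durations below the (hypothetical)
    significance threshold contribute nothing.
    """
    if duration_s < min_significant_s:
        return 0.0
    return 1.0 - math.exp(-duration_s / tau_s)
```

With these assumed defaults, a ten-minute visit and a one-hour visit yield almost the same value, so a single very long visit does not dominate the duration count.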
If it is decided in step 206 that the browsing activity of the user is "significant", i.e. more extensive than a predetermined threshold value, then the duration count (DC) is computed and updated, step 210, using Eq. 2 of the Appendix, and the dscore of the unknown user for the URL browsed is computed for each attribute value from the weighted and aged duration count and the URL dscore of that URL, step 212. The duration counts are "aged" in a manner known in the art to reduce the effect of old duration counts on the new dscore values. At the end of a session, the dscore values of all URLs browsed by the unknown user are computed, step 214, using Eqs. 12 and 13 given in the Appendix. The dscore value of the unknown user will be more heavily weighted towards the more discriminating URL (see Table A5 in the Appendix). The new dscore values computed in step 214 are then merged with the old dscore values of the unknown user stored in the history table 74, step 216. Before merging, the old duration counts and the old dscore values are first aged, using Eqs. 11-18 listed in the Appendix. The merged and aged dscore values represent the current dscore values of the unknown user, step 218.
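The merge of a session dscore with the aged history can be sketched as follows. This is a simplification under stated assumptions: it uses a plain duration-weighted average rather than the weighting function of the Appendix, it assumes exponential aging with a hypothetical half-life (the aging formula itself is not reproduced here), and the names `merge_dscore` and `half_life_days` are illustrative.

```python
from typing import Tuple


def merge_dscore(d_old: float, dc_old: float,
                 d_session: float, dc_session: float,
                 elapsed_days: float,
                 half_life_days: float = 30.0) -> Tuple[float, float]:
    """Merge one attribute value's session dscore into the stored history.

    Returns (merged dscore, updated duration count). The old duration
    count is aged first so that stale activity carries less weight.
    """
    decay = 0.5 ** (elapsed_days / half_life_days)  # assumed aging form
    dc_old_aged = dc_old * decay
    dc_new = dc_old_aged + dc_session
    if dc_new == 0.0:
        return d_session, 0.0
    merged = (d_old * dc_old_aged + d_session * dc_session) / dc_new
    return merged, dc_new
```

For a first session, where the old dscore and old duration count are zero, the merged value is simply the session dscore.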
Alternatively, if it is determined in step 206 that the unknown user's browsing activity is not "significant" by being smaller than the predetermined value, then the predicted dscore for the unknown user is set equal to a typical user profile for the URL browsed, step 208. In other words, the demographic profile of the user showing insignificant browsing activity will be set to the profile shown, for example, in Table 1 if the user accesses this exemplary Web site.
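The two branches of step 206 can be sketched for a single attribute value. The weighting function below is an assumption, not the function given in the Appendix: a signed square of the distance from the baseline probability `p` is used because it grows with distance from `p` and is invertible. The names `f`, `f_inverse`, `session_dscore`, `fallback`, and `min_activity` are hypothetical.

```python
import math
from typing import List, Tuple


def f(x: float, p: float) -> float:
    """Assumed weighting: signed square of the distance from baseline p."""
    d = x - p
    return d * abs(d)


def f_inverse(y: float, p: float) -> float:
    """Inverse of f for the same baseline p."""
    return p + math.copysign(math.sqrt(abs(y)), y)


def session_dscore(visits: List[Tuple[float, float]], p: float,
                   fallback: float, min_activity: float = 1.0) -> float:
    """visits holds (url_dscore, duration_count) pairs for one attribute value.

    If total activity is below the threshold, return the fallback (the
    typical profile for the URL browsed). Otherwise average f(dscore)
    weighted by duration count and map the result back through f_inverse.
    """
    total = sum(dc for _, dc in visits)
    if total < min_activity:
        return fallback
    weighted = sum(f(d, p) * dc for d, dc in visits) / total
    return f_inverse(weighted, p)
```

Under these assumptions, two equally weighted URLs with dscores 0.9 and 0.6 and a baseline of 0.5 combine to a value above their plain mean of 0.75, so the more discriminating URL pulls the result toward itself.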
As mentioned above, known users may be distinguished from unknown users based on the cookie attributes. Log processing process 64 has a list of cookies of the known users. If the cookie exists in the list, all records in the session will be identified as those of known users. This comparison could be made more efficient by hashing the cookies as illustrated in FIG. 4.
Referring now to FIG. 4, a hash table 90 may be implemented in the form of an array of pointers 62. All pointers in the hash table initially have NULL values. When a known user cookie ID 94, 96, 98 hashes to a hash bucket 96', 98', the array is indexed by bucket number and assigned a corresponding pointer, thereby linking the cookie to the pointer. To resolve conflicts between two pointers 94, 96 pointing to the same hash bucket 96', the cookies 94, 96 will be chained. The number of hash buckets 96', 98' may be set equal to the number of known users. The hash table 90 of known users may, in some embodiments, always be present in a memory. An incoming hashed cookie is checked whether it points to an existing hash bucket. With a suitable hash function and a suitable number of buckets, known and unknown users may be identified after a very few comparisons.
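The cookie lookup described above can be sketched as a chained hash table. This is only a minimal illustration under stated assumptions: the class name `KnownUserTable`, the cookie strings, and the use of Python's built-in `hash` are hypothetical and are not taken from the specification.

```python
class KnownUserTable:
    """Chained hash table keyed by cookie ID (illustrative sketch only)."""

    def __init__(self, num_buckets):
        # Every bucket starts empty, mirroring the initial NULL pointers.
        self.buckets = [None] * num_buckets

    def _bucket(self, cookie_id):
        # Map a cookie ID to a bucket number (the hash function is an assumption).
        return hash(cookie_id) % len(self.buckets)

    def add(self, cookie_id):
        index = self._bucket(cookie_id)
        if self.buckets[index] is None:
            self.buckets[index] = []
        if cookie_id not in self.buckets[index]:
            # Cookies that collide on the same bucket are chained together.
            self.buckets[index].append(cookie_id)

    def is_known(self, cookie_id):
        chain = self.buckets[self._bucket(cookie_id)]
        return chain is not None and cookie_id in chain


# One bucket per known user, as suggested in the text.
table = KnownUserTable(num_buckets=3)
for cookie in ("cookie-94", "cookie-96", "cookie-98"):
    table.add(cookie)
```

With a reasonable hash function, `is_known` inspects only the short chain of one bucket, which is the "very few comparisons" property mentioned above.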
For computing the URL_dscores in step 80, all incoming known user records are first sorted according to the URLs visited. If the log records of the known user are stored in more than one file, each file will be sorted individually with respect to the URLs. The sorted files will then be merged and organized for each URL in, for example, ascending order. These output files are advantageously read in the same order in which they were created. Referring now to FIG. 5, a flow chart 100 shows the process step 80 of FIG. 3 for computing the URL dscores for known users. First, a duration count (DC) is computed for each user and each visited Web site (URL), step 102. The duration count represents the time during which a user interacts with the URL. From the duration count, an activity count is computed which is then "aged" to reflect the elapsed time since the user's last access to the URL, step 104. The new dscore of the URL for known users is then computed from the aged activity count, step 106. The URL_dscores are then updated and the temporary files created during the process 100 are deleted, step 180.
FIGS. 6A - 6C illustrate the process 100 of FIG. 5 in more detail. Referring first to FIG. 6A, the read pointer of the first record in the known user log 78 and of the first file in the URL_known user file 82 are initialized, step 112. As mentioned above, the URL_known user file 82 is already sorted according to the URLs, for example in ascending order. The first record in the known user log 78 is read, step 114, and that URL is compared with the URL in the current record of the URL_known user file 82, step 116. Depending upon whether the URL from the user browsing log record is less than, equal to, or greater than the URL from the record of the URL_known user file 82, one of the following three process steps is executed:
If it is determined in step 116 that the URL in the user browsing log record is less than the URL in the URL_known user matrix file 82, a buffer for storing duration counts of all users is allocated and initialized to zero, step 118. The records for that URL are read one by one from the user browsing log record, step 120, and the duration is normalized (see Eq. 1 in the Appendix) to give duration counts, step 122. The duration counts are then aged up to the current date-time (Eq. 5), step 124, and added to the duration counts of that user in the buffer (Eq. 2), step 126. After all log records for that URL are processed, the duration counts of all known users are normalized to produce activity counts (Eq. 3), step 128. The dscores for the respective URL are computed using the known user dscores and the accumulated activity counts (Eq. 7), step 130. If the dscores so computed show that the URL is discriminating, then the dscores of the URL are appended to and inserted in a temporary file 'insert_file', and the activity counts for all users for that URL are also appended to a new file of the URL_known user file 82 in the form of a 'newUrlAcFile' along with a bit denoting whether that URL is discriminating or not, step 132. As mentioned above, URLs that track the browsing activity of known users over a period of time along with the number of sites and duration of browsing for the URL are referred to as discriminating URLs.
If it is determined in step 116 that the URL in the user browsing log record is equal to the URL in the URL_known user file record, then the activity counts for all known users for that URL are read from the URL_known user file 82 into a memory buffer, step 140. The activity counts are denormalized to get duration counts, step 142. The duration counts are then aged up to the current date-time, step 144, and the aged duration counts are added to the duration counts of that user in the memory buffer, step 146. If it is determined in step 148 that the URL was previously discriminating and is now no longer discriminating (as is denoted by the discriminating status bit in the URL_known user file record), then the duration counts are appended to a temporary delete file 'del_file', step 150. Otherwise, the process 100 goes to step 152. If it is determined in step 152 that the URL was previously discriminating and still remains discriminating, then the URL dscore is computed as in steps 128 and 130 and appended to a temporary update file 'update_file', step 154. If, on the other hand, it is determined in step 152 that the URL was not discriminating before and has now become discriminating, then the duration counts are appended to a temporary file 'insert_file', and the activity counts for all users for that URL along with the updated discriminating status bit are also appended to the 'newUrlAcFile', step 132.
If it is determined in step 116 that the URL in the user browsing log record is greater than the URL in the URL_known user matrix record, then the activity counts for all known users for that URL are read from the URL_known user matrix record into a memory buffer, step 160. The activity counts are denormalized to get duration counts, step 162. The duration counts are then aged up to the current date-time, step 164, and new activity counts and URL dscores are computed, step 166. If it is determined in step 168 that the URL was previously discriminating and is now no longer discriminating (as is denoted by the discriminating status bit in the URL_known user matrix record), then the activity counts are appended to a temporary file 'del_file', step 170. The URL dscore and the associated new activity counts along with the current discriminating status bit are appended to the 'newUrlAcFile', step 172. If, on the other hand, it is determined in step 168 that the URL was and still is discriminating, then the URL dscore is appended to a temporary 'update_file', step 154.
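The three-way comparison of step 116 amounts to a merge over two lists sorted by URL. The sketch below only labels which case applies to each URL; it assumes that each list holds distinct URLs in ascending order, and the function name `classify_urls` and the case labels are hypothetical.

```python
def classify_urls(log_urls, file_urls):
    """Walk two ascending URL lists and label each URL by its comparison case."""
    cases = []
    i = j = 0
    while i < len(log_urls) or j < len(file_urls):
        if j == len(file_urls) or (i < len(log_urls) and log_urls[i] < file_urls[j]):
            # Log URL sorts first: a URL not yet in the file (steps 118-132).
            cases.append((log_urls[i], "log only"))
            i += 1
        elif i == len(log_urls) or file_urls[j] < log_urls[i]:
            # File URL sorts first: no new log records, counts are only aged (steps 160-172).
            cases.append((file_urls[j], "file only"))
            j += 1
        else:
            # Same URL in both inputs: stored counts are updated (steps 140-154).
            cases.append((log_urls[i], "both"))
            i += 1
            j += 1
    return cases


result = classify_urls(["a.com", "c.com"], ["b.com", "c.com", "d.com"])
```

Because both inputs are consumed strictly in sorted order, the output is itself sorted, which is consistent with the statement that the processes maintain the sorted order of the file.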
Referring now to FIG. 7, process 180 next updates the URL dscore database 84 and the URL_known user file 82. After both the input files, i.e., the known user browsing log 78 and the URL_known user file 82, have been processed in the manner described above, the updated URL_known user file 82 now resides in the 'newUrlAcFile'. The other files produced by the process 100 are the temporary del_file, update_file and insert_file. The del_file is processed first, and all non-discriminating URLs are deleted from the URL demographics file, step 182. The update_file is processed next, step 184, wherein the 'newUrlAcFile' updates the URL_known user file. The insert_file is processed last, step 186, where all new discriminating URLs are inserted. The three temporary delete, update and insert URL dscore files are then deleted, step 188. Finally, the old URL_known user file is deleted, step 190, and the 'newUrlAcFile' is renamed as the new URL_known user matrix file, step 192. The processes 100 and 180 maintain the sorted order of the URL_known user file.
Referring now to FIGS. 8A and 8B, the computation speed of dscores and duration counts of unknown users can be increased by providing a memory cache. The main components of the memory cache are a hash table 220 and an aging queue. The hash table 220 is similar to the hash table 92 for known users. The URL, such as URL 1, is hashed to provide the bucket number 222' of the hash bucket. The hash bucket 222' contains a pointer to the chain of URLs 222, 224 that hash to that hash bucket 222'. For example, URL 1 and URL 2 and the associated dscores hash to the same bucket 222'. The URLs can be distributed uniformly across all hash buckets by selecting a suitable hash function.
The aging queue is provided to identify URLs that can be replaced, because they have not been accessed for some time. The memory cache may, for example, be already full when a URL is fetched from the database to be stored in the memory cache. A queue of all URLs is then formed in the memory cache. Each URL has an aging queue pointer 230 pointing to the previous URL in the chain and an aging queue pointer 232 pointing to the next URL in the chain. A newly fetched URL is added to the tail of the queue. In the example illustrated in FIG. 8A, the previous pointer of URL 1 (222) points to URL 2 (224), whereas the next pointer of URL 1 points to URL n. Likewise, the previous pointer of URL 2 (224) points to URL n (226), whereas the next pointer of URL 2 points to URL 1. The aging queue order is therefore URL 1, URL 2, URL n. When a URL, for example URL n, from the cache is accessed, the URL is moved from its current position and placed at the tail of the queue, as indicated by the arrows in FIG. 8B. Thus the URLs at the tail of the aging queue are the most recently accessed URLs, whereas those at the head of the aging queue have been accessed least recently. The URL at the head of the queue is replaced, if necessary.
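The combination of a hash table and an aging queue behaves like a least-recently-used cache. The sketch below uses Python's `collections.OrderedDict`, which provides both the hashed lookup and the queue order; this is a substitute for the explicit previous/next pointers of FIGS. 8A and 8B, not a reproduction of them. The class name `UrlScoreCache` and the `fetch` callable standing in for the database read are assumptions.

```python
from collections import OrderedDict


class UrlScoreCache:
    """Least-recently-used cache of URL records (illustrative sketch only)."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch            # hypothetical callable: URL -> stored record
        self.entries = OrderedDict()  # first item = head of the aging queue

    def get(self, url):
        if url in self.entries:
            # An accessed URL moves to the tail (most recently accessed).
            self.entries.move_to_end(url)
            return self.entries[url]
        if len(self.entries) >= self.capacity:
            # Cache is full: replace the URL at the head of the queue.
            self.entries.popitem(last=False)
        # A newly fetched URL is added at the tail.
        self.entries[url] = self.fetch(url)
        return self.entries[url]


cache = UrlScoreCache(capacity=2, fetch=lambda url: url.upper())
for url in ("a", "b", "a", "c"):
    cache.get(url)
```

After these four accesses the cache holds "a" and "c": "b" was at the head of the queue when "c" arrived and was therefore replaced.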
In the foregoing, it is assumed that the known users provide all necessary attribute values of all attributes. However, some known users may choose not to supply values for some demographic attributes, such as their age or household income. Such values will be referred to as missing values. These missing values, however, should be incorporated to reduce the error in the URL_dscores and thereby minimize the error in the demographic profile predicted for unknown users. In such cases, one of the following strategies may be adopted: The missing attribute value field may be replaced by NULL values which are ignored in the dscore computation. Alternatively, the missing demographic information for known users may be predicted statically by creating an auxiliary function, called a "bridging agent" in ML terminology, that can predict the value of the missing attribute.
Conventional data mining algorithms, such as the association rule finding algorithm, can be applied to the existing known user demographics file and used to predict missing demographic information based on the demographic attributes from other known users. This may need to be done only once, with confidence values for the missing demographic attributes permanently stored in the known user demographics file. Alternatively, the missing demographic information for a known user may be predicted dynamically by treating the known user as an unknown user and finding the dscore in a manner similar to the dscore computation for unknown users discussed above. This method would take into account the browsing patterns of the known user along with the demographic attributes which other known users have provided. Each of the aforedescribed methods for predicting the missing demographic information for known users has disadvantages. For example, if the unknown attribute values of the known users are discarded, then vital relationships between different attributes may be lost. Also, extra information will have to be recorded about the total activity count of known users who have not declared a particular attribute. This extra information will be required because the dscore computation formulas use the total activity count of all known users for a URL and the total activity count for all URLs. The activity counts of users who have missing values can therefore not be considered. In other words, the known users are treated differently for different attributes in that the attributes for which the known users have provided information are included in the URL_dscore computation, whereas the attributes which are missing are not included in the dscore computation. However, interdependencies already existing between different attributes can provide more accurate predictions.
Considering the known user as an unknown user for the missing attributes and predicting the missing attribute values in the same manner as for unknown users tends to be more accurate, since the attributes are computed from disclosed demographic information and the browsing activity of the known user. However, this process is computation-intensive and may be justified if the results have to be very accurate.
The systems and methods described above may be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof. Apparatus of the invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method steps of the invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output. The invention can advantageously be implemented in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).
To provide for interaction with a user, the invention can be implemented on a computer system having a display device such as a monitor or LCD screen for displaying information to the user and a keyboard and a pointing device such as a mouse or a trackball by which the user can provide input to the computer system. The computer system can be programmed to provide a graphical user interface through which computer programs interact with users. While the invention has been disclosed in connection with certain illustrated embodiments shown and described in detail, various modifications and improvements thereon will become readily apparent to those skilled in the art. Moreover, the systems and methods described herein may be employed for a plurality of different applications including for generating profiles of unknown users on a computer network. Additionally, the systems and methods described herein may be employed for determining the success of a web site for attracting users of a selected profile, or demographic. Additionally, it will be understood that the systems and methods described herein may be operated in a way to generate meaningful user profile information, without having to provide user identity information, such as user name, or address, as part of the profile. Accordingly, the spirit and scope of the invention is to be limited only by the following claims.
Appendix
1. Notations
The following notations are used throughout the document:
$A_1, A_2, \ldots, A_s$ represent the demographic attributes.
Each attribute $A_i$ has a number of attribute values $\{v_{i1}, v_{i2}, \ldots, v_{ij_i}\}$, where $j_i$ differs for each $i$.
Examples are:
$A_1$ = gender. $v_{11}$ = male; $v_{12}$ = female. In this case, $j_1 = 2$.
$A_2$ = age. $v_{21}$ = 2-11; $v_{22}$ = 12-17; $v_{23}$ = 18-24; $v_{24}$ = 25-34; $v_{25}$ = 35-44; $v_{26}$ = 45-54; $v_{27}$ = 55+. In this case, $j_2 = 7$.
$K_1, K_2, \ldots, K_t$ are the known users.
$U_1, U_2, \ldots, U_z, \ldots$ are the unknown users.
$R_1, R_2, \ldots, R_y, \ldots$ are the discriminating URLs.
$D_{yij}$ is the dscore of URL $R_y$ for attribute value $v_{ij}$ of attribute $A_i$.
A URL is called discriminating if its dscore for any attribute value differs from a typical distribution of that attribute value by a certain tunable threshold value, as described below in Section 2 of the Appendix.
2. Design of Dscore Equations for URLs
2.1 Browsing activity of a known user
The computation of the Dscore for a URL will consider predominantly the browsing activity of known users rather than the number of known users. The browsing activity includes the duration for which the URL was browsed and the number of hits to the URL. The duration values should be significant enough to indicate the interest of the user in that URL. However, a scaling function is applied to the duration values to remove undue importance to a single visit having a large duration.
Duration Count (DC) (Equation 1 )
Figure imgf000019_0001
where $d_{yl}$ is the browsing duration.
β1 is the tunable parameter for capping the duration for which a URL is visited by a user. For every visit by a user, the duration count may be updated by adding the new duration count $\lambda(d_{yl})$ to the old duration count $DC_{old}$ to give the new updated duration count $DC_{new}$:
$DC_{new} = DC_{old} + \lambda(d_{yl})$ (Equation 2)
Each of multiple visits by a certain user may be treated as equivalent to a visit by another user having the same profile. However, a large number of multiple visits from the same user does not provide much additional information. The duration count is therefore scaled by a non-linear function which increases rapidly for a relatively small number of visits, but remains approximately constant for a relatively large number of visits. The result is called a final activity count AC.
The function is given by:
$AC = \dfrac{\beta_2 \cdot DC}{(\beta_2 \cdot DC) + 1}$ (Equation 3)
β2 is a tunable scaling parameter for capping the number of visits to a URL by a user and DC is the duration count determined above. The value of this function is less than one.
The effect of β1 and β2 on the activity count for different values of the number of visits and the duration of a visit can be seen from Tables A1-A4:

β1 = 0.1, β2 = 0.2
                    Browsing duration
Number of visits    1        10       100      1000
1                   0.018    0.091    0.154    0.165
10                  0.154    0.5      0.645    0.664
100                 0.645    0.909    0.948    0.952
1000                0.948    0.994    0.995    0.995
Table A1

β1 = 0.1, β2 = 0.5
                    Browsing duration
Number of visits    1        10       100      1000
1                   0.043    0.2      0.312    0.331
10                  0.312    0.714    0.812    0.832
100                 0.820    0.962    0.978    0.980
1000                0.974    0.996    0.998    0.998
Table A2

β1 = 0.5, β2 = 0.2
                    Browsing duration
Number of visits    1        10       100      1000
1                   0.062    0.143    0.164    0.166
10                  0.4      0.625    0.662    0.666
100                 0.870    0.943    0.951    0.952
1000                0.985    0.994    0.995    0.995
Table A3

β1 = 0.5, β2 = 0.5
                    Browsing duration
Number of visits    1        10       100      1000
1                   0.143    0.294    0.329    0.333
10                  0.625    0.806    0.831    0.833
100                 0.943    0.976    0.980    0.980
1000                0.994    0.997    0.998    0.998
Table A4
As seen from the tables, a short duration and/or a small number of visits have reduced effect on activity count. The activity count increases with longer duration and/or the number of visits. The activity count increases less rapidly for long durations and a large number of visits. The effect of duration and number of visits on activity count can be adjusted by changing the values of β1 and β2.
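The two-stage capping can be sketched numerically as follows. Equation 3 is taken from the text. The per-visit capping function of Equation 1 is not legible in this copy, so the sketch assumes it has the same saturating form, λ(d) = β1·d / (β1·d + 1); that form is an assumption, chosen because it reproduces several tabulated entries. The function names are hypothetical.

```python
def cap(x, beta):
    """Saturating function beta*x / (beta*x + 1); always less than one."""
    return beta * x / (beta * x + 1.0)


def activity_count(visits, duration, beta1, beta2):
    """Activity count for `visits` visits, each of the same `duration`."""
    # Equation 2 applied once per visit, with the ASSUMED Equation 1 form.
    duration_count = visits * cap(duration, beta1)
    # Equation 3: scale the accumulated duration count.
    return cap(duration_count, beta2)


one_short_visit = activity_count(1, 1, beta1=0.1, beta2=0.2)       # about 0.018
ten_medium_visits = activity_count(10, 10, beta1=0.1, beta2=0.2)   # 0.5
hundred_short = activity_count(100, 1, beta1=0.1, beta2=0.2)       # about 0.645
```

Under this assumption a single one-unit visit contributes very little, while the count approaches but never reaches one as visits and durations grow, which is the behaviour described in the paragraph above.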
The change of a user's browsing pattern should be reflected in the dscores. For this, the activity counts of the users are "aged" to decrease the effect of old activity counts on the new dscore values.
To age the activity count AC of a known user, the activity count AC is first de-normalized. The de-normalized activity count $\overline{AC}$ is calculated from the equation
$\overline{AC} = \dfrac{AC}{\beta_2 - AC \cdot \beta_2}$ (Equation 4)
wherein $AC < 1$.
For $AC = 1$, $\overline{AC}$ is set to a very high value H.
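The de-normalized activity count $\overline{AC}$ is then aged to give $\overline{AC}_{aged}$ by applying the aging formula
$\overline{AC}_{aged} = \overline{AC} \cdot 2^{-(t - t_c)/\lambda}$ (Equation 5)
where $t_c$ is the time at which the count was last calculated, $t$ is the current time, and λ is the aging time. It will be appreciated that any exponentially decaying function can be used in lieu of equation 5.
The de-normalized aged activity count is then again normalized to give an aged normalized activity count
$AC_{aged}$ (Equation 6)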
Figure imgf000022_0001
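The de-normalize, age, re-normalize sequence can be sketched as below. Two points are assumptions: the decay is written as a base-2 half-life decay, which is one instance of the "exponentially decaying function" the text permits, and the final normalization is taken to be Equation 3 applied to the aged value. The names `elapsed`, `aging_time` and `high` are hypothetical.

```python
def normalize(duration_count, beta2):
    """Equation 3: map a duration count to an activity count below one."""
    return beta2 * duration_count / (beta2 * duration_count + 1.0)


def denormalize(activity_count, beta2, high=1e12):
    """Equation 4: invert Equation 3; a count of one maps to a large value H."""
    if activity_count >= 1.0:
        return high
    return activity_count / (beta2 - activity_count * beta2)


def age_activity_count(activity_count, beta2, elapsed, aging_time):
    """De-normalize, decay by half per `aging_time` (assumed form), re-normalize."""
    raw = denormalize(activity_count, beta2)
    raw_aged = raw * 2.0 ** (-elapsed / aging_time)
    return normalize(raw_aged, beta2)


unchanged = age_activity_count(0.5, beta2=0.5, elapsed=0, aging_time=30)
one_period = age_activity_count(0.5, beta2=0.5, elapsed=30, aging_time=30)
```

With β2 = 0.5, an activity count of 0.5 de-normalizes to 2, halves to 1 after one aging period, and re-normalizes to one third, so aging acts on the underlying duration count rather than on the capped value.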
From the values of the activity count of known users computed above, the dscore $D_{yij}$ for an attribute value $v_{ij}$ of attribute $A_i$ for a URL $R_y$ can now be computed. In an exemplary embodiment, the following equation can be used to define the dscore $D_{yij}$:
$D_{yij} =$
Figure imgf000022_0002
$n_y + t \cdot \alpha$ (Equation 7)
wherein $n_{yij}$ is the sum of activity counts for all known users $K_l$ ($l = 1, \ldots, t$) for whom $A_i$ takes the value $v_{ij}$ for URL $R_y$, i.e.
$n_{yij} = \sum_{l} AC_{yl} \cdot C$ (Equation 8)
Figure imgf000022_0003
For each attribute $A_i$ and a known user $K_l$, only one of the $v_{ij}$ will have the value 1, whereas the remaining $v_{ij}$ for that attribute $A_i$ and user $K_l$ have the value zero. The parameter $C$ is defined as
$C =$ (Equation 9)
Figure imgf000022_0004
The activity counts are calculated from equations 4 and 6, respectively. $n_y$ is the sum of activity counts for all known users $K_l$ for URL $R_y$, i.e.
$n_y = \sum_{l=1}^{t} AC_{yl}$ (Equation 10)
In equation 7, $t$ is the total number of known users; $p_{ij}$ is the probability of known users having an attribute $A_i$ with attribute value $v_{ij}$; $\alpha$ is an adjustable parameter to distribute the dscore proportionately between 0 and 1. It was experimentally observed that $\alpha$ can suitably be set to $1/n_y$. However, other values of $\alpha$ may further improve the dscore distribution. Also, $n_{yij}$ will always be less than or equal to $t \cdot p_{ij}$.
A URL is discriminating if its dscore for any attribute value $v_{ij}$ differs from the normal distribution of a known user having that attribute value ($p_{ij}$) by a certain defined threshold value $\gamma$, so that $|D_{yij} - p_{ij}| > \gamma$.
3. Design of Dscore Equations for Unknown Users
According to our problem specification, we want to predict a demographic profile for an unknown user. The demographic profile is represented in the form of a dscore for each attribute value for the unknown user. This dscore value depends on the following factors:
Dscores of the discriminating URLs being browsed in the current session
Old dscore of the unknown user
Browsing activity of the unknown user for the current session
The dscore of an unknown user will be such that if the browsing activity is sufficiently large, the dscores are a combination of the URLs browsed. But if the browsing activity is insignificant, the dscore will weigh down to that of a typical user.
3.1 Browsing Activity for an Unknown User
The duration counts for unknown user visits are computed by a method identical to that used for computing duration counts for known users, using equation 2.
Like the de-normalized activity count AC for known users, the duration count for an unknown user is aged using a formula similar to equation 5:
DCaged =
Figure imgf000023_0001
(Equation 11)
The dscores for each demographic attribute $A_i$ of the unknown user can now be computed from the duration count of the unknown user and the dscores of the URL obtained from equation 7. To predict the dscores for each demographic attribute value of the unknown user, the discriminating ability of the browsed URL and the duration count obtained from the browsing activity of the unknown user should be suitably weighted, for example, by the following function f(x):
$f(x) = \dfrac{e^{c(x-p)} - e^{-c(x-p)}}{2}$ (Equation 12)
wherein c is an adjustable parameter, x is a variable, in this case the dscore of the URL browsed, and p is the probability of a known user having an attribute value $v_{ij}$ in the original distribution.
The function f(x) assumes a high value for those values of x that are far away from the original probability distribution p, and a low value for those values of x that are close to p. Hence the dscores of the URLs that are more discriminating are scaled by a larger amount. As a result, the dscore values of the unknown user reflect the discriminating measure (x-p) of a URL.
To compute the dscores for all URLs that an unknown user browses in a session, the value of f(dscore) is first computed for each attribute value. The so computed values of f(dscore) are then weighted and averaged, with the weights being the duration counts for the URLs. Finally, the inverse function $f^{-1}$ of the resulting value is calculated to provide the new dscore value for the unknown user:
$D_{new} = f^{-1}\left(\dfrac{f(D_1) \cdot DC_1 + f(D_2) \cdot DC_2 + \ldots}{\ldots}\right)$ (Equation 13)
Figure imgf000024_0001
where $D_1$ is the dscore of a URL $R_1$ browsed by the unknown user in that session,
$DC_1$ is the duration count for URL $R_1$,
$D_2$ is the dscore of a URL $R_2$ browsed by the unknown user in that session,
$DC_2$ is the duration count for URL $R_2$, and so on.
For example, an unknown user browses URLs $R_1$ and $R_2$ with $DC_1 = DC_2 = 1$. Also, the dscores for $R_1$ and $R_2$ for a particular attribute value are assumed to be 0.9 and 0.6, respectively, and the probability p of a known user having an attribute value $v_{ij}$ in the original distribution is assumed to be 0.
The following dscores are calculated from the above values:
Figure imgf000024_0003
Table A5
As seen from the values of the original dscore of $R_1$ and $R_2$, $R_1$ is more discriminating than $R_2$. Consequently, the $dscore_{calc}$ of the unknown user is more heavily weighted towards $R_1$. Moreover, as the value of c increases, the $dscore_{calc}$ of an unknown user is pulled more towards the dscore of the URL that is more discriminating.
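The pull towards the more discriminating URL can be sketched as follows. Several points are assumptions rather than the patent's confirmed formulas: f is taken to be the hyperbolic sine of c(x - p), the average is a plain duration-weighted mean, and p = 0.5 is a purely hypothetical value because the probability used in the example above is not legible in this copy. The function names are also hypothetical.

```python
import math


def f(x, p, c):
    """Assumed weighting function: (e^{c(x-p)} - e^{-c(x-p)}) / 2."""
    return math.sinh(c * (x - p))


def f_inverse(y, p, c):
    """Inverse of the assumed weighting function; f_inverse(0) equals p."""
    return p + math.asinh(y) / c


def session_dscore(url_dscores, duration_counts, p, c):
    """Duration-weighted average in f-space, mapped back with the inverse."""
    weighted_sum = sum(f(d, p, c) * dc for d, dc in zip(url_dscores, duration_counts))
    return f_inverse(weighted_sum / sum(duration_counts), p, c)


# URL dscores 0.9 and 0.6 with equal duration counts; p = 0.5 is hypothetical.
mild = session_dscore([0.9, 0.6], [1, 1], p=0.5, c=1)
strong = session_dscore([0.9, 0.6], [1, 1], p=0.5, c=10)
```

Under these assumptions the result lies between the two URL dscores, above their plain mean of 0.75, and moves closer to 0.9 as c grows. A constant factor in f cancels between the average and the inverse, so the result does not depend on the divisor chosen for f.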
The newly calculated $dscore_{calc}$ for a session of the unknown user may be merged with existing dscores for the unknown user. Furthermore, the duration counts also need to be updated. The process therefor is as follows:
For an attribute value $v_{ij}$, let the old dscore of the unknown user U be $D_{old}$ and the old duration count $DC_{old}$, both calculated at a time $t_c$. In the next session, at time t, a new dscore $D_{new}$ and a new duration count $DC_{new}$ are calculated for the same unknown user U, using equation 13. The old duration count $DC_{old}$ and the old dscore $D_{old}$ are first aged
Figure imgf000024_0002
Initially, $D_{old}$ and $DC_{old}$ are zero. k is a parameter related to a significance of a user activity. Since $f^{-1}(0) = p$, where p is the probability of a known user having an attribute value $v_{ij}$ in the original distribution, the dscore predicted for the unknown user will substantially be equal to a typical profile, if the activity count of the unknown user is small compared to k. Otherwise, the dscores are those predicted by the URLs browsed. The new duration count $DC_{new}$ is obtained by adding $DC_{old\_Aged}$ to the duration count calculated for the current session.
Like the dscores of known users, the dscores of unknown users may also be aged. According to equation 15, a current unknown user dscore $D_{curr}$ and a current duration count $DC_{curr}$ are related to the sum X of the products of old f(dscores) and duration counts DC by the following relation computed at time $t_c$:
$D_{curr} = f^{-1}\left(\dfrac{X}{DC_{curr} + k}\right)$ (Equation 16)
which is equivalent to
$X = f(D_{curr}) \cdot (DC_{curr} + k)$. (Equation 16a)
At a future time t, $D_{curr}$ ages to $D_{curr\_Aged}$. Assuming that both $DC_{curr}$ and X age according to equation 5 with an aging time λ, then the aged dscore $D_{curr\_Aged}$ for unknown users can be calculated as:
$D_{curr\_Aged} =$ (Equation 17)
Figure imgf000025_0001
or by substituting equation 16a
$D_{curr\_Aged} = f^{-1}\left(\dfrac{f(D_{curr}) \cdot (DC_{curr} + k) \cdot 2^{-(t - t_c)/\lambda}}{DC_{curr} \cdot 2^{-(t - t_c)/\lambda} + k}\right)$ (Equation 18)
4. Configuration Options
The following configuration parameters were found to provide satisfactory results for calculating dscores:
Parameter   Set by   Values                    Default value   Description
c           User     1, 1.5, 2, 2.5, ..., 20   10              Used in URL dscore calculations for weighting the discriminating behavior of a URL
β1          User     0, ..., 1                 0.1             Tunable parameter for capping the duration for which a URL is visited by a user
β2          User     0, ..., 1                 0.5             Tunable parameter for capping the number of visits to a URL by a user
γ           User     0, ..., 1                 0.1             Parameter to decide if a URL is discriminating

Claims

1. In a computer network formed of a communication channel and a plurality of digital processors coupled to the communication channel for communication thereon and providing information to a user, a method for generating a demographic profile for an unknown user comprising: recording computer activity of the unknown user in response to the information provided to the user by at least one of the digital processors, and combining the recorded computer activity of the unknown user with a computer demographic score of the at least one of the digital processors, the computer demographic score being based on demographic information obtained from known users, to generate the demographic profile of the unknown user.
2. A process for generating a demographic profile of an unknown user accessing a server, the server having a server profile, the process comprising: recording computer activity by the unknown user in response to information provided by the server; if the recorded computer activity by the unknown user is greater than a predetermined activity value, combining the recorded computer activity by the unknown user with the server profile to form the unknown user demographic profile, and if the recorded computer activity by the unknown user is less than a predetermined activity value, setting the unknown user demographic profile equal to the server profile.
3. The process of claim 2, wherein a weighting function is applied to the recorded computer activity by the unknown user based on a duration of the computer activity.
4. The process of claim 3, wherein the applied weighting function is selected to reduce the significance of a computer activity having a long duration.
5. The process of claim 2, wherein the unknown user demographic profile formed during a first session with the server is retained in a history table and merged with the unknown user demographic profile formed during a subsequent second session.
6. The process of claim 5, wherein the unknown user demographic profile formed in the first session with the server is aged before being merged with the unknown user demographic profile formed during the subsequent second session.
7. The process of claim 2, wherein at least a portion of the server profiles are stored in memory cache.
8. The process of claim 7, wherein the memory cache is a hash table.
9. The process of claim 2, wherein the unknown user accesses at least two servers and the unknown user demographic profile is formed by merging the user demographic profiles from all accessed servers that have a server profile.
10. The process of claim 2, wherein the server profile is updated in response to a computer activity by a known user.
11. A process for generating a user demographic profile of an unknown user accessing at least one server, comprising: identifying a user accessing the at least one server and recording user activity on the server; based on the user identification, determining if the user is an unknown user or a known user; for an unknown user: monitoring at least a duration of the user activity and assigning a demographic score to the unknown user based on the monitored user activity and a server profile of the at least one server accessed by the unknown user; combining the demographic score with an existing demographic score of the unknown user; and setting the demographic profile of the unknown user equal to the combined demographic score.
12. The process of claim 12, combining comprises merging the demographic scores obtained from the same server and from another server accessed by the unknown user.
13. The process of claim 12, further comprising for a known user: monitoring at least a duration of the known user activity and updating a browsing record of the known user; updating the server profile in response to the updated browsing record; and using the updated server profile to compute the demographic score of the unknown user.
14. The process of claim 12, wherein the user is identified based on a cookie returned by the at least one server.
15. In a computer network formed of a communication channel and a plurality of computers coupled to the communication channel for communication thereon, a computer program, residing on a computer-readable medium, comprising instructions for causing a computer to: record a computer activity of a user responding to information provided by at least one of the computers; identify the user as one of a known and an unknown user for the at least one computer; for an unknown user, compare the computer activity with a predetermined activity and assign to the unknown user a demographic score which is based on the computer activity and a computer demographic profile characteristic of the computer, if the computer activity exceeds the predetermined activity, and on the computer demographic profile alone, if the computer activity is less than the predetermined activity; combine the demographic score with another existing demographic score for the same unknown user generated during a previous session by the unknown user with the same computer or with another computer; and provide from the combination of the demographic scores a user demographic profile of the unknown user.
16. The computer program of claim 15, wherein the computer demographic profile is generated from the computer activity of the known users.
17. The computer program of claim 15, wherein the computer activity is generated in response to information provided to the known or unknown user by the computer.
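The scoring steps recited in claims 4, 5, 6, 11, and 15 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the claims give no formulas or numeric values, so the logarithmic weighting, the weighted-average merge, the decay factor, and every name and constant below (`Score`, `duration_weight`, `MIN_SECONDS`, `CAP_SECONDS`, `BASE_WEIGHT`) are assumptions chosen only to make the flow concrete.

```python
import math
from dataclasses import dataclass
from typing import Dict

# Hypothetical constants; the claims do not give numeric values.
MIN_SECONDS = 5.0    # stands in for the "predetermined activity" of claim 15
CAP_SECONDS = 300.0  # durations beyond this earn no additional weight
BASE_WEIGHT = 0.1    # weight used when the server profile is applied alone


@dataclass
class Score:
    values: Dict[str, float]  # demographic attribute -> estimated probability
    weight: float             # amount of evidence behind the estimate


def duration_weight(seconds: float) -> float:
    """Claim 4 (assumed form): a saturating weight, so long activities count less per second."""
    return math.log1p(min(seconds, CAP_SECONDS)) / math.log1p(CAP_SECONDS)


def score_activity(server_profile: Dict[str, float], seconds: float) -> Score:
    """Claim 15: below the threshold, use the server profile alone with a fixed weight."""
    if seconds < MIN_SECONDS:
        return Score(dict(server_profile), BASE_WEIGHT)
    return Score(dict(server_profile), duration_weight(seconds))


def age(score: Score, decay: float = 0.5) -> Score:
    """Claim 6 (assumed form): shrink the weight of an earlier session before merging."""
    return Score(dict(score.values), score.weight * decay)


def merge(old: Score, new: Score) -> Score:
    """Claims 5 and 11 (assumed form): weighted average over the union of attributes.

    An attribute missing from one score is treated as 0.0, which is a simplification.
    """
    total = old.weight + new.weight
    if total == 0:
        return Score(dict(new.values), 0.0)
    keys = old.values.keys() | new.values.keys()
    values = {
        k: (old.values.get(k, 0.0) * old.weight + new.values.get(k, 0.0) * new.weight) / total
        for k in keys
    }
    return Score(values, total)


# Usage: a two-minute first session, then a two-second visit to a different server.
history = score_activity({"age_18_34": 0.7}, seconds=120)
current = score_activity({"age_18_34": 0.3}, seconds=2)
profile = merge(age(history), current)
```

In this sketch the brief second visit still contributes through the server profile alone, as claim 15 requires, but with a small fixed weight, so the aged first-session score continues to dominate the merged profile.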
PCT/US2001/003214 2000-01-31 2001-01-31 System and method for inferring demographic profiles WO2001054480A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001238002A AU2001238002A1 (en) 2000-01-31 2001-01-31 System and method for inferring demographic profiles

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US49441200A 2000-01-31 2000-01-31
US09/494,412 2000-01-31

Publications (1)

Publication Number Publication Date
WO2001054480A2 true WO2001054480A2 (en) 2001-08-02

Family

ID=23964365

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/003214 WO2001054480A2 (en) 2000-01-31 2001-01-31 System and method for inferring demographic profiles

Country Status (2)

Country Link
AU (1) AU2001238002A1 (en)
WO (1) WO2001054480A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7260783B1 (en) 2003-07-08 2007-08-21 Falk Esolutions Gmbh System and method for delivering targeted content

Also Published As

Publication number Publication date
AU2001238002A1 (en) 2001-08-07

Similar Documents

Publication Publication Date Title
US11790396B2 (en) Preservation of scores of the quality of traffic to network sites across clients and over time
AU769336B2 (en) System and method for building user profiles
US11798034B1 (en) Directed content to anonymized users
AU2011204803B2 (en) Browser based user identification
US20190279230A1 (en) Online Content Delivery Based on Information from Social Networks
US8108245B1 (en) Method and system for web user profiling and selective content delivery
US11526905B1 (en) Systems and methods for preserving privacy
US8549163B2 (en) Passive parameter based demographics generation
Kazienko et al. AdROSA—Adaptive personalization of web advertising
Eirinaki et al. Web mining for web personalization
US7337127B1 (en) Targeted marketing system and method
US7734632B2 (en) System and method for targeted ad delivery
US7647387B2 (en) Methods and systems for rule-based distributed and personlized content delivery
US8768768B1 (en) Visitor profile modeling
RU2757546C2 (en) Method and system for creating personalized user parameter of interest for identifying personalized target content element
US20160117736A1 (en) Methods and apparatus for identifying unique users for on-line advertising
US10311124B1 (en) Dynamic redirection of requests for content
US20030023598A1 (en) Dynamic composite advertisements for distribution via computer networks
US20120054187A1 (en) Selection and delivery of invitational content based on prediction of user interest
US8756172B1 (en) Defining a segment based on interaction proneness
US20110022938A1 (en) Apparatus, method and system for modifying pages
US9098857B1 (en) Determining effectiveness of advertising campaigns
EP2478448A1 (en) Method and apparatus for data traffic analysis and clustering
US7574651B2 (en) Value system for dynamic composition of pages
WO2001054480A2 (en) System and method for inferring demographic profiles

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase in:

Ref country code: JP