WO2016194996A1 - User estimation device, user estimation method, and user estimation program - Google Patents
User estimation device, user estimation method, and user estimation program
- Publication number
- WO2016194996A1 (international application PCT/JP2016/066344)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- page
- website
- request
- estimation
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
- H04L67/025—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/2866—Architectures; Arrangements
- H04L67/30—Profiles
- H04L67/306—User profiles
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/535—Tracking the activity of the user
Definitions
- the present invention relates to a user estimation device, a user estimation method, and a user estimation program.
- Users are clustered and learned using combinations of transition information composed of the functions of the viewed pages (for example, search, product list, purchase), and each user's browsing behavior pattern is determined.
- There is a technique for predicting or presenting users with similar browsing behavior patterns (Patent Document 1).
- An object of the present invention is to solve the above problem and to accurately estimate which of the users who visited in the past a given user is, even when the number of pages viewed by the user and the number of links included in the pages are small.
- To solve the above problem, the present invention includes an extraction unit that extracts, from data representing requests to a website by users to be learned, at least one of the page transition order on the user's website and the transition time interval to each page as the feature amount of the user's page browsing, and that extracts, from data representing requests to the website by any user to be estimated, at least one of the page transition order on the website and the transition time interval to each page as the feature amount of that user's page browsing.
- The invention further includes a learning unit that learns the page-browsing feature amount of each user to be learned, extracted by the extraction unit, to create a model indicating each user's page-browsing characteristics, and an estimation unit that estimates which user the user to be estimated is, with reference to that user's extracted page-browsing feature amount and the model.
- According to the present invention, even if the number of pages viewed by a user and the number of links included in the pages are small, it is possible to accurately estimate which of the users who visited in the past the user is.
- FIG. 1 is a functional block diagram of the user estimation device.
- FIG. 2 is a diagram illustrating an example of data received by the input unit.
- FIG. 3 is a diagram illustrating an example of session information constructed by the session information construction unit.
- FIG. 4 is a diagram illustrating an example of a conversion table used by the feature amount extraction unit.
- FIG. 5 is a diagram illustrating an example of the feature amount of the user's page browsing extracted by the feature amount extraction unit.
- FIG. 6 is a flowchart illustrating a processing procedure when the user estimation device creates a model.
- FIG. 7 is a flowchart illustrating a processing procedure when the user estimation device estimates a user.
- FIG. 8 is a diagram for explaining truncation by hierarchy in the input unit.
- FIG. 9 is a diagram for explaining replacement by a regular expression in the input unit.
- FIG. 10 is a diagram illustrating a computer that executes a user estimation program.
- The user estimation device 10 of each embodiment extracts, from data representing requests to a website (for example, an access log), the feature amounts of each user's page browsing on the website (for example, the page transition order and the time required to transition to each page), and creates a profile (model) indicating the characteristics of each user's page browsing. When the user estimation device 10 then receives estimation target data (data representing requests to the website by any user), it estimates which user that user is by referring to the behavioral feature amount indicated in the data and the model.
- As shown in FIG. 1, the user estimation device 10 according to the first embodiment includes an extraction unit 11, a learning unit 12, a model storage unit 13, an estimation unit 14, and an output unit 15.
- When the extraction unit 11 receives data representing a user's requests to the website, it extracts the feature amount of the user's page browsing based on the request destination URL (Uniform Resource Locator) indicated in the data. Specifically, the extraction unit 11 extracts, as the feature amount of the user's page browsing, the page transition order on the website and the time required to transition to each page. The extraction unit 11 extracts both the page-browsing feature amounts used to create the model and those used for user estimation. It outputs the extracted feature amount to the learning unit 12 when it receives data for model creation, and to the estimation unit 14 when it receives data on a user to be estimated.
- To switch the output destination, the extraction unit 11 may rely on a flag, added to the input data, indicating whether the data is for model creation (data to be learned) or is data on a user to be estimated, and make the determination based on that flag.
- Alternatively, the user estimation device 10 may have two states, a learning mode and an estimation mode, switched manually or automatically (for example, by time of day).
- In that case, data input in the learning mode may be treated as data for model creation, and data input in the estimation mode as estimation target data.
- Input data may be given to the extraction unit 11 either by batch processing, in which multiple lines of data are delivered together, or by real-time processing, in which data is delivered and processed one line at a time.
- In the following, batch processing is described as an example, but real-time processing is also possible.
- the extraction unit 11 includes an input unit 111, a session information construction unit 112, and a feature amount extraction unit 113.
- the input unit 111 receives data representing a request from a user to a website.
- The data representing these requests is, for example, captured packets of the communication between the user's terminal device (not shown) and the server providing the website, or the server's access log (see FIG. 2).
- The data may be in any format as long as it includes the information needed to construct the user's session information, described later, such as the time, the request destination URL path, and the session ID or source IP address.
- the session information construction unit 112 constructs session information by picking up a request constituting the session from the data obtained by the input unit 111.
- the criteria for picking up will be described later.
- The session information arranges, for each session on the website, the user name (user ID) accessing in that session and the request destination URL paths and times included in the session, in time-series order (see FIG. 3).
- The URL path included in the session information may include the HTTP (HyperText Transfer Protocol) method (such as GET or POST), or may be the entire URL (http://domain/path/).
- a session is a sequence of requests made by the same user from the login of the user (user terminal) to the logout on the website.
- Alternatively, a session obtained by adding, to the above session, requests made by the same user on the website before or after that login period may be used as a session.
- a series of requests from the same source IP address that are equal to or smaller than a certain threshold time interval may be used as a session.
- the session information illustrated in FIG. 3 represents one session information as one row of a table.
- the session information expression method is not limited to the above table, but may be JSON (JavaScript (registered trademark) Object Notation) or the like.
- One possible criterion for picking up the requests constituting a session is to pick up requests that carry the same session ID in the cookie used by the web application. Alternatively, among requests from the same IP address, requests whose time interval from the immediately preceding request is at most a predetermined threshold may be picked up.
- The user name included in the session information can be obtained from the parameters the user enters on the login page, as in typical web applications. In an environment where source IP addresses are associated with users, the source IP address may be used instead of the user name.
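The IP-address criterion above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout `(timestamp, source_ip, path)`, the function name `build_sessions`, and the 1800-second default threshold are all hypothetical, and the source IP stands in for the user name as the text permits.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical request record: (timestamp in seconds, source IP address, URL path).
Request = Tuple[float, str, str]


def build_sessions(requests: List[Request], gap_threshold: float = 1800.0) -> List[Dict]:
    """Group requests into sessions per source IP address.

    A request joins the current session when its gap to the previous request
    from the same IP is at most gap_threshold seconds (a hypothetical default);
    otherwise a new session starts. The IP address is used as the user name.
    """
    by_ip: Dict[str, List[Request]] = defaultdict(list)
    for req in requests:
        by_ip[req[1]].append(req)

    sessions: List[Dict] = []
    for ip, reqs in by_ip.items():
        reqs.sort(key=lambda r: r[0])  # time-series order within each IP
        current: List[Tuple[float, str]] = []
        last_time = None
        for ts, _, path in reqs:
            if last_time is not None and ts - last_time > gap_threshold:
                sessions.append({"user": ip, "requests": current})
                current = []
            current.append((ts, path))
            last_time = ts
        if current:
            sessions.append({"user": ip, "requests": current})
    return sessions
```

A cookie-based variant would group by session ID instead of IP address and skip the gap check.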
- From each piece of session information (see FIG. 3), the feature amount extraction unit 113 extracts, as the feature amount of the user's page browsing, the page transition order on the website and the time required to transition to each page (page transition time interval).
- The feature amount extraction unit 113 extracts the page transition order as follows. First, using a conversion table (see FIG. 4), it maps each request destination URL path included in the session information (see FIG. 3) one-to-one to a number. The conversion table may be prepared in advance, or, if a request destination URL path being processed is not in the table, a new number may be assigned and added, so that the table is updated each time.
- Next, the feature amount extraction unit 113 replaces the sequence of request destination URL paths included in the session information (see FIG. 3) with a string of numbers. For example, it obtains “#1#2#3...” as the string corresponding to the session information of item number 1 shown in FIG. 3, and “#4#1#5...” as the string corresponding to the session information of item number 2 shown in FIG. 3.
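A conversion table that can be either prepared in advance or grown on the fly might look like the sketch below. The class name `ConversionTable` and its methods are hypothetical; the only behavior taken from the text is the one-to-one mapping from URL paths to numbers and the “#n” string encoding.

```python
from typing import Dict, List, Optional


class ConversionTable:
    """One-to-one mapping from request URL paths to numbers (cf. FIG. 4)."""

    def __init__(self, initial: Optional[Dict[str, int]] = None) -> None:
        # An optional prepared table; unknown paths are added later.
        self.path_to_id: Dict[str, int] = dict(initial or {})

    def lookup(self, path: str) -> int:
        """Return the path's number, assigning the next unused one if it is new."""
        if path not in self.path_to_id:
            next_id = max(self.path_to_id.values(), default=0) + 1
            self.path_to_id[path] = next_id
        return self.path_to_id[path]

    def encode(self, paths: List[str]) -> str:
        """Replace a session's path sequence with a number string such as '#1#2#3'."""
        return "".join(f"#{self.lookup(p)}" for p in paths)
```

Encoding a first session over three new paths yields “#1#2#3”; a second session that revisits the first path between two new ones then yields “#4#1#5”, mirroring the two example strings above.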
- The feature amount extraction unit 113 may extract the string obtained above as the page transition order as it is, or may extract pairs of adjacent requests in the session information (for example, “#1#2, #2#3, ...” from “#1#2#3...”), or more generally sets of n adjacent requests (n ≥ 2), as the page transition order. That is, the feature amount extraction unit 113 may extract the page transition order by applying the feature extraction method known as n-grams in string processing.
- The feature amount extraction unit 113 also extracts page transition time intervals from the time of each request in the session information (see FIG. 3). For example, if it finds from the session information (see FIG. 3) that the interval between requests #1 and #2 is 3 seconds, it obtains the feature amount “#1#2:3”.
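The n-gram variant of the page transition order can be sketched as below, assuming the session has already been converted into a list of page numbers. The function name `page_transition_ngrams` is hypothetical.

```python
from typing import List


def page_transition_ngrams(ids: List[int], n: int = 2) -> List[str]:
    """Return every run of n adjacent page numbers as a token such as '#1#2'."""
    if n < 2:
        raise ValueError("n must be 2 or more")
    # Slide a window of width n over the sequence; shorter sessions yield nothing.
    return ["".join(f"#{i}" for i in ids[k:k + n]) for k in range(len(ids) - n + 1)]
```

For the sequence 1, 2, 3 this gives “#1#2” and “#2#3” with n = 2, and the single token “#1#2#3” with n = 3.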
- As the page transition time interval, the time interval of each request shown in the page transition order may be used as it is; alternatively, when the same page transition occurs multiple times in the same session, the average of those transitions' time intervals may be used.
- When the time interval of each request shown in the page transition order is used as it is as the page transition time interval, the feature amount extraction unit 113 extracts, as shown under “when used as it is” in FIG. 5, the data {user: user, feature amount: #1#2:3, #2#3:4, #3#1:3, #1#2:5, ...} as the feature amount of the user's page browsing.
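Both options for the page transition time interval (raw and averaged) can be sketched as follows. The input format, a list of `(timestamp, page_number)` pairs for one session, and both function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def transition_time_features(session: List[Tuple[float, int]]) -> List[str]:
    """Emit one '#src#dst:seconds' feature per adjacent pair of requests."""
    feats = []
    for (t0, src), (t1, dst) in zip(session, session[1:]):
        feats.append(f"#{src}#{dst}:{t1 - t0:g}")
    return feats


def averaged_transition_times(session: List[Tuple[float, int]]) -> Dict[str, float]:
    """Average the interval when the same transition occurs several times in a session."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for (t0, src), (t1, dst) in zip(session, session[1:]):
        buckets[f"#{src}#{dst}"].append(t1 - t0)
    return {k: sum(v) / len(v) for k, v in buckets.items()}
```

With requests to pages 1, 2, 3, 1, 2 at seconds 0, 3, 7, 10, 15, the raw features are “#1#2:3”, “#2#3:4”, “#3#1:3”, “#1#2:5”, matching the FIG. 5 example; averaging merges the two “#1#2” transitions into 4.0 seconds.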
- The feature amount extraction unit 113 may instead extract, as the feature amount, the time taken to transition from the source page to any page. For example, if the interval between request #1 and request #2 is 3 seconds, it ignores the destination #2 and extracts the feature amount “#1:3”, which represents “transition from page #1 to some page in 3 seconds”. For example, it extracts the data {user: user, feature amount: #1:3, #2:4, #3:3, #1:5, ...} as the feature amount of the user's page browsing.
- As this feature amount, the time taken to transition from the source page to any page may be used as it is, or, when the same page transition occurs multiple times in the same session, the average of those time intervals may be used.
- Similarly, the feature amount extraction unit 113 may extract, as the feature amount, the time taken to transition from any page to the destination page. For example, if the interval between request #1 and request #2 is 3 seconds, it ignores the source #1 and extracts the feature amount “#2:3”, which represents “transition from some page to page #2 in 3 seconds”. For example, it extracts the data {user: user, feature amount: #2:3, #3:4, #1:3, #2:5, ...} as the feature amount of the user's page browsing. In this case too, the transition time may be used as it is, or, when the same page transition occurs multiple times in the same session, the average of those time intervals may be used.
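The source-only and destination-only variants differ only in which end of each transition is kept, so a single sketch with a hypothetical `keep` parameter covers both. The session format `(timestamp, page_number)` and the function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def _one_sided_pairs(session: List[Tuple[float, int]], keep: str) -> Iterator[Tuple[int, float]]:
    """Yield (kept page, interval) for each adjacent pair of requests."""
    if keep not in ("source", "destination"):
        raise ValueError("keep must be 'source' or 'destination'")
    for (t0, src), (t1, dst) in zip(session, session[1:]):
        yield (src if keep == "source" else dst), t1 - t0


def one_sided_features(session: List[Tuple[float, int]], keep: str = "source") -> List[str]:
    """Raw features such as '#1:3' (source kept) or '#2:3' (destination kept)."""
    return [f"#{page}:{dt:g}" for page, dt in _one_sided_pairs(session, keep)]


def averaged_one_sided(session: List[Tuple[float, int]], keep: str = "source") -> Dict[str, float]:
    """Average the intervals that share the same kept page within one session."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for page, dt in _one_sided_pairs(session, keep):
        buckets[f"#{page}"].append(dt)
    return {k: sum(v) / len(v) for k, v in buckets.items()}
```

On the same example session (pages 1, 2, 3, 1, 2 at seconds 0, 3, 7, 10, 15), the source variant reproduces “#1:3, #2:4, #3:3, #1:5” and the destination variant “#2:3, #3:4, #1:3, #2:5”.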
- When the data received by the input unit 111 is data for creating the model, the feature amount extraction unit 113 outputs the extracted feature amount of the user's page browsing to the learning unit 12; when the data is data on a user to be estimated, it outputs the extracted feature amount to the estimation unit 14.
- As for output timing, the feature amount extraction unit 113 may output all feature amounts obtained from an entire session when the session on the website ends (for example, by user logout), or may output them one at a time each time a page transition occurs and a new page-browsing feature amount is obtained. The output timing may also be determined by a timer, the number of extracted feature amounts, the data amount, or the like.
- When the learning unit 12 obtains the user's page-browsing feature amounts output from the feature amount extraction unit 113, it creates, according to a machine learning algorithm, a profile (model) indicating the page-browsing characteristics of each user, and stores it in the model storage unit 13.
- Any machine learning algorithm may be used by the learning unit 12, for example one implemented in an existing machine learning library such as Jubatus (http://jubat.us/) or scikit-learn (http://scikit-learn.org/).
- The learning unit 12 may use a multi-class classifier that labels which of a plurality of users a given feature amount belongs to, or a multi-label classifier that allows multiple labels to be assigned.
- a multi-class classifier or a multi-label classifier may be configured by arranging a plurality of binary classifiers.
- An anomaly detector may be used instead of a classifier. In that case, a feature amount is interpreted as being labeled with the user(s) for whom it is not determined to be anomalous.
- the model storage unit 13 stores the model created by the learning unit 12.
- When the estimation unit 14 obtains the user's page-browsing feature amount output from the feature amount extraction unit 113 (that is, the feature amount to be estimated), it estimates, according to the machine learning algorithm and using the model stored in the model storage unit 13, which user the feature amount belongs to. The estimation unit 14 outputs the user estimation result to the output unit 15.
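To show how the learning unit and the estimation unit fit together, here is a deliberately simple stand-in for a library classifier: each user's profile is the summed count of their feature tokens, and estimation picks the user whose profile has the highest cosine similarity to the query. This nearest-profile method is a substitute chosen for illustration, not the algorithm the patent prescribes (it leaves the choice open, e.g. Jubatus or scikit-learn); the class name and method names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple


class UserProfileModel:
    """Per-user profiles built from page-browsing feature tokens (e.g. '#1#2')."""

    def __init__(self) -> None:
        self.profiles: Dict[str, Counter] = {}

    def fit(self, samples: Iterable[Tuple[str, List[str]]]) -> "UserProfileModel":
        """Learning: accumulate token counts per labeled user."""
        for user, tokens in samples:
            self.profiles.setdefault(user, Counter()).update(tokens)
        return self

    @staticmethod
    def _cosine(a: Counter, b: Counter) -> float:
        dot = sum(count * b[tok] for tok, count in a.items())
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def predict(self, tokens: List[str]) -> str:
        """Estimation: return the user whose profile is most similar to the query."""
        query = Counter(tokens)
        return max(self.profiles, key=lambda u: self._cosine(query, self.profiles[u]))
```

In practice a trained classifier would replace the cosine comparison; the profile dictionary plays the role of the model storage unit 13 here.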
- the output unit 15 outputs the user estimation result output from the estimation unit 14 to an external device or the like.
- The session information construction unit 112 constructs session information (see FIG. 3) (S2).
- the feature amount extraction unit 113 extracts the feature amount of the user's page browsing from the session information (see FIG. 3) constructed in S2 (S3).
- the learning unit 12 creates a profile (model) indicating the page browsing feature of each user using the user's page browsing feature amount extracted in S3 (S4: model creation). The learning unit 12 stores the created model in the model storage unit 13.
- the session information construction unit 112 constructs session information (see FIG. 3) (S12).
- the feature quantity extraction unit 113 extracts the feature quantity of the user's page browsing from the session information (see FIG. 3) constructed in S12 (S13).
- The estimation unit 14 estimates the user with reference to the page-browsing feature amount extracted in S13 and the model stored in the model storage unit 13 (S14). The estimation unit 14 then outputs the user estimation result via the output unit 15 (S15).
- Although the user estimation device 10 described above extracts both the page transition order on the user's website and the transition time interval to each page as the feature amount of the user's page browsing, it may extract only one of them.
- Next, the user estimation device 10 of the second embodiment will be described.
- the same configurations as those of the above-described embodiment are denoted by the same reference numerals and description thereof is omitted.
- In the user estimation device 10 of the second embodiment, when the model is created and when the user is estimated, the feature amount extraction unit 113 extracts at least one of the session start page, the number of unique pages included in the session, and the session length as the feature amount of the user's page browsing.
- the session start page refers to the path of the request destination URL included at the beginning of the session information.
- For example, the feature amount extraction unit 113 extracts “start page: /index.html” from request 1, or the value converted by the conversion table (see FIG. 4) (“start page: #1”), as a feature amount.
- The number of unique pages included in the session refers to the number of distinct request destination URL paths included in the session information. For example, if the sequence of requests replaced with a number string using the conversion table (see FIG. 4) is “#1#2#1#3#2”, it contains three unique requests (#1, #2, and #3), so “number of unique pages: 3” is extracted as a feature amount.
- The session length refers to the number of requests included in the session information (see FIG. 3). For example, if the sequence of requests replaced with a number string using the conversion table (see FIG. 4) is “#1#2#3#4”, it contains four requests, so “session length: 4” is extracted as a feature amount.
- By additionally extracting the session start page, the number of unique pages included in the session, the session length, and the like as page-browsing feature amounts when creating the model and estimating the user, the user estimation device 10 can estimate the user with higher accuracy.
- Next, the user estimation device 10 of the third embodiment will be described.
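The three session-level features of the second embodiment can be computed from the converted number sequence in a few lines. This is a sketch under the assumption that the session is already a list of page numbers; the function name and the dictionary keys are illustrative.

```python
from typing import Dict, List, Union


def session_summary_features(ids: List[int]) -> Dict[str, Union[str, int]]:
    """Start page, number of unique pages, and session length of one session."""
    if not ids:
        raise ValueError("session has no requests")
    return {
        "start page": f"#{ids[0]}",             # first request in the session
        "number of unique pages": len(set(ids)),  # distinct URL paths
        "session length": len(ids),               # total number of requests
    }
```

For “#1#2#1#3#2” this returns a start page of “#1”, 3 unique pages, and a session length of 5; for “#1#2#3#4” the session length is 4, as in the text.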
- the same configurations as those of the above-described embodiment are denoted by the same reference numerals and description thereof is omitted.
- the user estimation device 10 according to the third embodiment is characterized in that the input unit 111 selects significant data from input data and passes it to the session information construction unit 112.
- A server access log contains, in addition to page requests, requests for acquiring images, JavaScript (registered trademark), and the like included in the pages. These requests are made automatically by the browser and do not directly reflect user behavior. Therefore, the input unit 111 of the user estimation device 10 according to the third embodiment removes requests matching a predesignated pattern from the input data representing requests to the website, and passes the remaining data to the session information construction unit 112.
- the pattern designation method is, for example, designation by regular expressions, but other methods may be used.
- The input unit 111 may also remove requests whose time interval from the immediately preceding request in the same session is at most a predetermined threshold, and pass the remaining requests to the session information construction unit 112.
- For example, the input unit 111 excludes requests for acquiring images (JPEG, PNG, GIF), JavaScript (registered trademark), or CSS, as well as requests whose time interval from the immediately preceding request in the same session is at most a predetermined threshold, and passes the remaining requests to the session information construction unit 112.
- The feature amount extraction unit 113 then extracts the feature amount of the user's page browsing based on the requests that remain after excluding requests for acquiring images, JavaScript (registered trademark), and CSS included in the request destination page, and requests whose time interval from the immediately preceding request in the same session is at most the predetermined threshold.
- Because the user estimation device 10 extracts the page-browsing feature amount and estimates the user from the requests that remain after excluding those likely to have been made automatically by the browser (that is, from requests likely to represent the user's direct actions), it can further improve the accuracy of user estimation.
- Next, the user estimation device 10 of the fourth embodiment will be described.
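The filtering step of the third embodiment can be sketched with a regular expression plus a minimum-gap check. The exact pattern, the function name `filter_requests`, and the 1-second default threshold are assumptions; the text only says the pattern is designated in advance (for example, as a regular expression). Here the gap is measured against the immediately preceding raw request, including ones that end up removed.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for browser-initiated resource requests (images, JS, CSS),
# optionally followed by a query string.
STATIC_RESOURCE = re.compile(r"\.(?:jpe?g|png|gif|js|css)(?:\?.*)?$", re.IGNORECASE)


def filter_requests(session: List[Tuple[float, str]], min_gap: float = 1.0) -> List[Tuple[float, str]]:
    """Drop static-resource requests and requests arriving within min_gap seconds
    of the immediately preceding request in the same session."""
    kept = []
    prev_time = None
    for ts, path in session:
        too_close = prev_time is not None and ts - prev_time <= min_gap
        prev_time = ts  # the gap is measured against the immediately preceding request
        if STATIC_RESOURCE.search(path) or too_close:
            continue
        kept.append((ts, path))
    return kept
```

Only the surviving requests would then be handed to session construction and feature extraction.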
- the same configurations as those of the above-described embodiment are denoted by the same reference numerals and description thereof is omitted.
- the user estimation device 10 according to the fourth embodiment is characterized in that the input unit 111 abstracts the URL path of the request destination of the input data and passes it to the session information construction unit 112.
- The input unit 111 truncates the “/”-delimited hierarchy of the request destination URL path in the input data at a predetermined level. For example, as shown in FIG. 8, the input unit 111 deletes the part of the URL path after the third level (the underlined part in FIG. 8). The input unit 111 then passes the requests whose URL path hierarchy has been truncated at the predetermined level to the session information construction unit 112. As a result, the feature amount extraction unit 113 extracts the feature amount of the user's page browsing based on the path up to the predetermined level of the “/”-delimited request destination URL path.
- This allows the user estimation device 10 to estimate users based on page-browsing feature amounts within the hierarchy (directory) of the website's content that the person using the user estimation device 10 wants to focus on.
- For example, suppose the URL paths of a website are structured as “/category/date/time/article”.
- If the user estimation device 10 truncates the request destination URL after the first level, “/category/”, users can be estimated based on page-browsing feature amounts that focus on the category.
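Truncation by hierarchy can be sketched as follows, assuming the input is a bare URL path without a query string. The function name and the choice to keep a trailing “/” after truncation (as in “/category/”) are illustrative assumptions.

```python
def truncate_path(path: str, depth: int) -> str:
    """Keep only the first `depth` '/'-separated levels of a URL path."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    segments = [s for s in path.split("/") if s]
    if len(segments) <= depth:
        return path  # already within the requested depth; leave unchanged
    return "/" + "/".join(segments[:depth]) + "/"
```

With a path shaped like “/category/date/time/article”, depth 1 keeps only “/category/”, so every article in a category maps to the same page for feature extraction.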
- The input unit 111 replaces parts of the request destination URL in the input data using a regular expression pattern specified in advance. For example, when the request destination URL contains a number of three or more digits, the input unit 111 replaces that number with “%NUM”. In this case, the input unit 111 is given the specification “before replacement: "[0-9]{3,}", after replacement: "%NUM"” and, as shown in FIG. 9, replaces the part of the request destination URL consisting of three or more digits (the underlined part in FIG. 9) with “%NUM”. The input unit 111 then passes the requests replaced in this way to the session information construction unit 112. As a result, when an ID unique to each session is assigned in the request destination URL, the feature amount extraction unit 113 extracts the feature amount of the user's page browsing based on the URL in which that ID part has been replaced using the regular expression (for example, with “%NUM”).
- The user estimation device 10 can thus treat accessed URLs that differ only in their ID as the same URL when extracting the feature amount of the user's page browsing. As a result, the user estimation device 10 can further improve the accuracy of user estimation.
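The replacement rule stated in the text (“[0-9]{3,}” replaced by “%NUM”) maps directly onto Python's `re.sub`. The function name `abstract_ids` and the example URLs are hypothetical.

```python
import re

# Rule from the text: any run of three or more digits is replaced with "%NUM".
NUMERIC_ID = re.compile(r"[0-9]{3,}")


def abstract_ids(url: str) -> str:
    """Collapse session-specific numeric IDs so that such URLs compare equal."""
    return NUMERIC_ID.sub("%NUM", url)
```

Shorter numbers, such as a two-digit item index, are left untouched, which is why the rule specifies a minimum of three digits.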
- Program
- A program describing, in a computer-executable language, the processing executed by the user estimation device 10 according to the above embodiments can also be created and executed.
- the same effect as the above-described embodiment can be obtained by the computer executing the program.
- Further, such a program may be recorded on a computer-readable recording medium, and the program recorded on the medium may be read and executed by a computer to implement the same processing as the above embodiments.
- An example of a computer that executes a user estimation program that implements the same function as the system will be described below.
- FIG. 10 is a diagram illustrating a computer that executes a user estimation program.
- A computer 1000 includes, for example, a memory 1010, a CPU (Central Processing Unit) 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.
- the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012.
- the ROM 1011 stores a boot program such as BIOS (Basic Input Output System).
- the hard disk drive interface 1030 is connected to the hard disk drive 1090.
- the disk drive interface 1040 is connected to the disk drive 1100.
- a removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100, for example.
- a mouse 1110 and a keyboard 1120 are connected to the serial port interface 1050.
- a display 1130 is connected to the video adapter 1060.
- the hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094.
- the information described in the above embodiment is stored in, for example, the hard disk drive 1090 or the memory 1010.
- the user estimation program is stored in the hard disk drive 1090 as a program module in which a command executed by the computer 1000 is described, for example.
- a program module describing each process executed by the system described in the above embodiment is stored in the hard disk drive 1090.
- data used for information processing by the user estimation program is stored as program data in, for example, the hard disk drive 1090.
- the CPU 1020 reads out the program module 1093 and the program data 1094 stored in the hard disk drive 1090 to the RAM 1012 as necessary, and executes the above-described procedures.
- the program module 1093 and the program data 1094 related to the user estimation program are not limited to being stored in the hard disk drive 1090.
- For example, the program module 1093 and the program data 1094 may be stored in a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like.
- Alternatively, the program module 1093 and the program data 1094 related to the user estimation program may be stored in another computer connected via a network such as a LAN (Local Area Network) or a WAN (Wide Area Network) and read by the CPU 1020 via the network interface 1070.
Abstract
Description
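As shown in FIG. 1, the user estimation device 10 of the first embodiment includes an extraction unit 11, a learning unit 12, a model storage unit 13, an estimation unit 14, and an output unit 15.
Next, the user estimation device 10 of the second embodiment will be described. Components identical to those of the above-described embodiment are denoted by the same reference numerals, and their description is omitted. In the user estimation device 10 of the second embodiment, when the model is created and when the user is estimated, the feature amount extraction unit 113 extracts at least one of the session start page, the number of unique pages included in the session, and the session length as a feature amount of the user's page browsing.
Next, the user estimation device 10 of the third embodiment will be described. Components identical to those of the above-described embodiment are denoted by the same reference numerals, and their description is omitted. The user estimation device 10 of the third embodiment is characterized in that the input unit 111 selects significant data from the input data and passes it to the session information construction unit 112.
Next, the user estimation device 10 of the fourth embodiment will be described. Components identical to those of the above-described embodiment are denoted by the same reference numerals, and their description is omitted. The user estimation device 10 of the fourth embodiment is characterized in that the input unit 111 abstracts the request destination URL path of the input data and passes it to the session information construction unit 112.
The input unit 111 truncates the “/”-delimited hierarchy of the request destination URL path in the input data at a predetermined level. For example, as shown in FIG. 8, the input unit 111 deletes the part of the URL path after the third level (the underlined part in FIG. 8). The input unit 111 then passes the requests whose URL path hierarchy has been truncated at the predetermined level to the session information construction unit 112. As a result, the feature amount extraction unit 113 extracts the feature amount of the user's page browsing based on the path up to the predetermined level of the “/”-delimited request destination URL path.
The input unit 111 replaces parts of the request destination URL in the input data using a regular expression pattern specified in advance. For example, when the request destination URL contains a number of three or more digits, the input unit 111 replaces that number with “%NUM”. In this case, the input unit 111 is given the specification “before replacement: "[0-9]{3,}", after replacement: "%NUM"” and, as shown in FIG. 9, replaces the part of the request destination URL consisting of three or more digits (the underlined part in FIG. 9) with “%NUM”. The input unit 111 then passes the requests replaced in this way to the session information construction unit 112. As a result, when an ID unique to each session is assigned in the request destination URL, the feature amount extraction unit 113 extracts the feature amount of the user's page browsing based on the URL in which that ID part has been replaced using the regular expression (for example, with “%NUM”).
A program describing, in a computer-executable language, the processing executed by the user estimation device 10 according to the above embodiments can also be created and executed. In this case, the computer executes the program and obtains the same effects as the above embodiments. Further, such a program may be recorded on a computer-readable recording medium, and the program recorded on the medium may be read and executed by a computer to implement the same processing as the above embodiments. An example of a computer that executes a user estimation program implementing the same functions as the system is described below.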
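11 Extraction unit
12 Learning unit
13 Model storage unit
14 Estimation unit
15 Output unit
111 Input unit
112 Session information construction unit
113 Feature amount extraction unit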
Claims (7)
- 学習対象となる、ユーザのウェブサイトへのリクエストを表すデータから、前記ユーザのページ閲覧の特徴量として、前記ユーザのウェブサイト上におけるページの遷移順序および各ページへの遷移時間間隔の少なくともいずれかを抽出し、また、推定対象となる、いずれかのユーザによるウェブサイトへのリクエストを表すデータから、当該ユーザのページ閲覧の特徴量として、前記ウェブサイト上におけるページの遷移順序および各ページへの遷移時間間隔の少なくともいずれかを抽出する抽出部と、
前記抽出部により抽出された、学習対象となる、ユーザそれぞれのページ閲覧の特徴量を学習することにより、前記ユーザごとのページ閲覧の特徴を示すモデルを作成する学習部と、
前記抽出部により抽出された、推定対象となる、前記ユーザのページ閲覧の特徴量と、前記モデルとを参照して、前記ユーザがどのユーザかを推定する推定部とを備えることを特徴とするユーザ推定装置。 - 前記抽出部は、前記ユーザのページ閲覧の特徴量として、さらに、前記ウェブサイト上での閲覧の開始ページ、前記閲覧のユニークページ数、および、前記ウェブサイトの閲覧に要したセッション長の少なくともいずれか1つ以上を抽出することを特徴とする請求項1に記載のユーザ推定装置。
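- The user estimation apparatus according to claim 1, wherein, when the data representing requests to the website include second data representing requests to retrieve images, JavaScript (registered trademark), or CSS (Cascading Style Sheets), the extraction unit extracts the page-browsing features of the user from the data from which the second data have been excluded.
- The user estimation apparatus according to claim 1, wherein the extraction unit extracts the page-browsing features of the user based on the request-destination URL (Uniform Resource Locator) in the data representing the requests, taken only up to a predetermined level of its hierarchy.
- The user estimation apparatus according to claim 1, wherein, when the request-destination URL in the data representing the requests contains an ID unique to each session, the extraction unit extracts the page-browsing features of the user based on the URL excluding the ID part.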
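- A user estimation method comprising: a first extraction step of extracting, from data representing requests made to a website by users serving as learning targets, at least one of the order of page transitions on the website and the time interval of the transition to each page as page-browsing features of each such user;
a learning step of creating a model representing the page-browsing characteristics of each user by learning the page-browsing features of each user extracted in the first extraction step;
a second extraction step of extracting, from data representing requests made to the website by a user serving as an estimation target, at least one of the order of page transitions on the website and the time interval of the transition to each page as page-browsing features of that user; and
an estimation step of estimating which user that user is by referring to the model and to the page-browsing features of the user extracted in the second extraction step.
- A user estimation program causing a computer to execute: a first extraction step of extracting, from data representing requests made to a website by users serving as learning targets, at least one of the order of page transitions on the website and the time interval of the transition to each page as page-browsing features of each such user;
a learning step of creating a model representing the page-browsing characteristics of each user by learning the page-browsing features of each user extracted in the first extraction step;
a second extraction step of extracting, from data representing requests made to the website by a user serving as an estimation target, at least one of the order of page transitions on the website and the time interval of the transition to each page as page-browsing features of that user; and
an estimation step of estimating which user that user is by referring to the model and to the page-browsing features of the user extracted in the second extraction step.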
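The claims leave the learning algorithm open, and this excerpt does not name one. As one possible instance of the extract, learn, and estimate structure of claim 1, the sketch below uses only the page-transition order, which the claim permits since it requires at least one of order or interval. It models each user as a first-order Markov chain with add-one smoothing, then picks the user whose model gives the test session the highest log-likelihood. The model choice, data format, and all names are assumptions for illustration, not the method disclosed in the patent.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical request record: (timestamp in seconds, requested page path).
Request = Tuple[float, str]


def extract_transitions(session: List[Request]) -> List[Tuple[str, str]]:
    """Page-transition order: consecutive (from_page, to_page) pairs in time order."""
    pages = [page for _, page in sorted(session, key=lambda r: r[0])]
    return list(zip(pages, pages[1:]))


class TransitionModel:
    """Per-user first-order Markov model of page transitions (an assumed model choice)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # add-alpha smoothing keeps unseen transitions above zero probability
        self.pair_counts: Dict[str, Counter] = {}
        self.out_counts: Dict[str, Counter] = {}
        self.pages: Set[str] = set()

    def fit(self, sessions_by_user: Dict[str, List[List[Request]]]) -> "TransitionModel":
        """Learning step: count every user's page transitions."""
        for user, sessions in sessions_by_user.items():
            pairs: Counter = Counter()
            outs: Counter = Counter()
            for session in sessions:
                for src, dst in extract_transitions(session):
                    pairs[(src, dst)] += 1
                    outs[src] += 1
                    self.pages.update((src, dst))
            self.pair_counts[user] = pairs
            self.out_counts[user] = outs
        return self

    def log_likelihood(self, user: str, transitions: List[Tuple[str, str]]) -> float:
        """Smoothed log-probability of the transitions under one user's model."""
        vocab = len(self.pages) + 1  # +1 slot for pages never seen in training
        pairs, outs = self.pair_counts[user], self.out_counts[user]
        return sum(
            math.log((pairs[(src, dst)] + self.alpha) / (outs[src] + self.alpha * vocab))
            for src, dst in transitions
        )

    def estimate(self, session: List[Request]) -> str:
        """Estimation step: return the user whose model scores the session highest."""
        transitions = extract_transitions(session)
        # A single-page session has no transitions, so every user scores 0.0 and
        # the first trained user is returned; a real system would need a reject rule.
        return max(self.pair_counts, key=lambda u: self.log_likelihood(u, transitions))
```

Using the transition intervals as well would change only `fit` and `log_likelihood`, for example by adding a per-user interval distribution. The extract, learn, and estimate split would stay the same.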
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/578,799 US10860669B2 (en) | 2015-06-05 | 2016-06-02 | User estimation apparatus, user estimation method, and user estimation program |
JP2017522234A JP6423529B2 (ja) | 2015-06-05 | 2016-06-02 | User estimation apparatus, user estimation method, and user estimation program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2015-114983 | 2015-06-05 | ||
JP2015114983 | 2015-06-05 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016194996A1 (ja) | 2016-12-08 |
Family ID: 57441270
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2016/066344 WO2016194996A1 (ja) | User estimation apparatus, user estimation method, and user estimation program | 2015-06-05 | 2016-06-02 |
Country Status (3)
Country | Link |
---|---|
US (1) | US10860669B2 (ja) |
JP (1) | JP6423529B2 (ja) |
WO (1) | WO2016194996A1 (ja) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
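JP2019067056A (ja) * | 2017-09-29 | 2019-04-25 | Fujitsu Limited | Message output control method, message output control program, and message output control apparatus |
JP2020126550A (ja) * | 2019-02-06 | 2020-08-20 | Yahoo Japan Corporation | Information processing apparatus, information processing method, and information processing program |
JP2021128553A (ja) * | 2020-02-13 | 2021-09-02 | Yahoo Japan Corporation | Information processing apparatus, information processing method, and information processing program |
JP2022521136A (ja) * | 2019-03-25 | 2022-04-06 | Bonewise Inc. | Apparatus, method, and recording medium storing instructions for determining bone age of teeth |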
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
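JP2006127321A (ja) * | 2004-10-29 | 2006-05-18 | Solid Technology Kk | Terminal attribute retrofitting apparatus and terminal attribute retrofitting method |
JP2014115952A (ja) * | 2012-12-12 | 2014-06-26 | Nippon Telegr & Teleph Corp <Ntt> | Interest field comparative analysis apparatus, system, method, and program |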
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8756342B1 (en) * | 2000-02-07 | 2014-06-17 | Parallel Networks, Llc | Method and apparatus for content synchronization |
US7698422B2 (en) * | 2007-09-10 | 2010-04-13 | Specific Media, Inc. | System and method of determining user demographic profiles of anonymous users |
US8255384B2 (en) * | 2009-09-30 | 2012-08-28 | Fujitsu Limited | Client-tier validation of dynamic web applications |
US8635334B2 (en) * | 2009-12-10 | 2014-01-21 | Riverbed Technology, Inc. | Web transaction analysis |
US20110191664A1 (en) * | 2010-02-04 | 2011-08-04 | At&T Intellectual Property I, L.P. | Systems for and methods for detecting url web tracking and consumer opt-out cookies |
US9665703B2 (en) * | 2010-11-29 | 2017-05-30 | Biocatch Ltd. | Device, system, and method of detecting user identity based on inter-page and intra-page navigation patterns |
US8566866B1 (en) * | 2012-05-09 | 2013-10-22 | Bluefin Labs, Inc. | Web identity to social media identity correlation |
JP2014106661A (ja) | 2012-11-27 | 2014-06-09 | Nippon Telegr & Teleph Corp <Ntt> | User state prediction apparatus, method, and program |
2016
- 2016-06-02 WO PCT/JP2016/066344 patent/WO2016194996A1/ja active Application Filing
- 2016-06-02 US US15/578,799 patent/US10860669B2/en active Active
- 2016-06-02 JP JP2017522234A patent/JP6423529B2/ja active Active
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
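JP2019067056A (ja) * | 2017-09-29 | 2019-04-25 | Fujitsu Limited | Message output control method, message output control program, and message output control apparatus |
JP2020126550A (ja) * | 2019-02-06 | 2020-08-20 | Yahoo Japan Corporation | Information processing apparatus, information processing method, and information processing program |
JP2022521136A (ja) * | 2019-03-25 | 2022-04-06 | Bonewise Inc. | Apparatus, method, and recording medium storing instructions for determining bone age of teeth |
JP7202739B2 (ja) | 2019-03-25 | 2023-01-12 | Bonewise Inc. | Apparatus, method, and recording medium storing instructions for determining bone age of teeth |
US11961235B2 (en) | 2019-03-25 | 2024-04-16 | Bonewise Inc. | Apparatus, method and recording medium storing instructions for determining bone age of teeth |
JP2021128553A (ja) * | 2020-02-13 | 2021-09-02 | Yahoo Japan Corporation | Information processing apparatus, information processing method, and information processing program |
JP7145901B2 (ja) | 2020-02-13 | 2022-10-03 | Yahoo Japan Corporation | Information processing apparatus, information processing method, and information processing program |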
Also Published As
Publication number | Publication date |
---|---|
US10860669B2 (en) | 2020-12-08 |
US20180165369A1 (en) | 2018-06-14 |
JPWO2016194996A1 (ja) | 2017-11-09 |
JP6423529B2 (ja) | 2018-11-14 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US8429177B2 (en) | Using exceptional changes in webgraph snapshots over time for internet entity marking | |
JP6423529B2 (ja) | User estimation apparatus, user estimation method, and user estimation program | |
US9300755B2 (en) | System and method for determining information reliability | |
US20080034279A1 (en) | Aggregate tag views of website information | |
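CN102436564A (zh) | Method and device for identifying tampered web pages | |
JP2012529688A (ja) | Update notification method and system | |
JP5493845B2 (ja) | Search support program, search support device, and search support method | |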
US10311120B2 (en) | Method and apparatus for identifying webpage type | |
CN104899219B (zh) | Method and system for filtering out pseudo-static URLs, and web page crawling method and system | |
WO2014194689A1 (en) | Method, server, browser, and system for recommending text information | |
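JP2011022705A (ja) | Audit trail management method, system, and program | |
JP5178219B2 (ja) | Access analysis device, access analysis method, and access analysis program | |
CN103793508B (zh) | Method, device, and system for loading recommendation information and detecting website addresses | |
CN104951566B (zh) | Method and device for determining keyword search ranking | |
CN108874870A (zh) | Data extraction method, device, and computer-storable medium | |
CN108280102A (zh) | Internet behavior recording method, device, and user terminal | |
JP2009301334A (ja) | Information processing device for analyzing network behavior, analysis system, network behavior analysis method, and program | |
JP2016062345A (ja) | Targeted advertisement distribution device, method, and program | |
CN104572874B (zh) | Method and device for extracting web page information | |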
Bhat et al. | Browser simulation-based crawler for online social network profile extraction | |
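JP5216654B2 (ja) | Importance determination device, importance determination method, and program | |
CN104008190B (zh) | Crawler system and method thereof | |
JP2015001795A (ja) | Personality analysis device and personality analysis program | |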
US20170177590A1 (en) | Natural classification of content using unsupervised learning | |
JP5183762B2 (ja) | Updated-portion reposting device and updated-portion reposting method | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
|  | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 16803430; Country of ref document: EP; Kind code of ref document: A1 |
|  | ENP | Entry into the national phase | Ref document number: 2017522234; Country of ref document: JP; Kind code of ref document: A |
|  | WWE | WIPO information: entry into national phase | Ref document number: 15578799; Country of ref document: US |
|  | NENP | Non-entry into the national phase | Ref country code: DE |
|  | 122 | EP: PCT application non-entry in European phase | Ref document number: 16803430; Country of ref document: EP; Kind code of ref document: A1 |