JP6703621B2

JP6703621B2 - Method for associating a domain name with website access behavior

Info

Publication number: JP6703621B2
Application number: JP2018554480A
Authority: JP
Inventors: ダーシュンジャン
Original assignee: Shanghai Yamu Communication Technology Co Ltd
Current assignee: Shanghai Yamu Communication Technology Co Ltd
Priority date: 2016-04-14
Filing date: 2016-08-17
Publication date: 2020-06-03
Anticipated expiration: 2036-08-17
Also published as: JP2019514137A; WO2017177590A1; RU2709647C1; GB2567749A; RU2709647C9; CN105763633B; CN105763633A

Description

The present invention relates to the fields of Internet DNS domain name resolution and web crawler technology, and in particular to a method for associating a domain name with website access behavior.

DNS (Domain Name System) is a distributed database on the Internet that maps domain names and IP addresses to each other. It lets users access the Internet more conveniently, without having to remember the numeric IP strings that machines read directly. "DNS domain name resolution" refers to the following process: to access a website, a user enters the site's domain name in a browser and presses the return key; the browser first issues a DNS request, through which it obtains the server IP address corresponding to the domain name, and can then send an HTTP request to that IP address.

A web crawler is a program or script that automatically collects World Wide Web information according to certain rules. It simulates a user by sending HTTP requests to a website, and records the DNS requests generated in the process.

The value of DNS data has not received much attention: it is regarded merely as a correspondence between IP addresses and domain names, so nothing currently on the market uses DNS data to perform this kind of association.

The present invention provides a method for associating a domain name with website access behavior that combines DNS log collection with web crawler technology, so that a user's Internet browsing behavior can also be analyzed from DNS logs.

The method of the present invention for associating a domain name with website access behavior is executed by a computer program and comprises: step S1 of simulating a user's site access behavior with a crawler and acquiring all DNS domain name requests made during the HTTP request, i.e., the captured DNS domain name request set; step S2 of splitting a DNS log to obtain n domain name request sets (n being an integer of 1 or more); and step S3 of performing set-to-set matching between the DNS domain name request set captured in step S1 and the n domain name request sets obtained in step S2, wherein, if a domain name request set split from the DNS log is equal to or contained in the captured DNS domain name request set, the DNS log is considered to indicate that the user clicked on the domain name of the URL that the crawler requested when capturing.

Preferably, in step S2, the DNS log is the DNS log for the day of the access behavior.
Preferably, in step S2, splitting the DNS log comprises two passes: first splitting by source IP, and then splitting by timestamp difference.
Preferably, splitting the DNS log by source IP means obtaining the consecutive DNS log entries of the same source IP within a certain period of time.
Preferably, splitting by timestamp difference means further splitting the log already split by source IP according to the timestamp differences between DNS log entries: if the timestamps of two DNS log entries differ by more than a predetermined length of time, the two entries are separated.
Preferably, the predetermined length of time is 3 seconds.

According to the method of the present invention for associating a domain name with site access behavior, a user's Internet browsing behavior can also be analyzed from DNS logs.

FIG. 1 is a schematic diagram of a DNS domain name request set collected by the crawler program.
FIG. 2 is a flowchart of the method of the present invention for associating a domain name with website access behavior.

The invention is described in detail below with reference to the drawings and embodiments. The following embodiments do not limit the invention. Any variations and advantages that a person skilled in the art could conceive of without departing from the spirit and scope of the inventive concept are included in the present invention.

As described above, DNS (Domain Name System) is a distributed database on the Internet that maps domain names and IP addresses to each other, letting users access the Internet more conveniently without having to remember the numeric IP strings that machines read directly. To access a site, a user first enters the site's domain name in a browser and presses the return key; the browser first issues a DNS request, through which it obtains the server IP address corresponding to the domain name, and can then send an HTTP request to that IP address. This is DNS domain name resolution.

DNS logs are generated during the domain name resolution process described above. A DNS log records the response content of every DNS request, and can therefore capture nearly all of the domain name information requested by users. The format of the DNS log is as follows.
14.***.***.10|www.baidu.com|20141211035932|180.***.***.107;180.***.***.108|0
Source IP|domain name|timestamp|resolved IP|status code
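The five-field format above can be read mechanically. The following is a minimal sketch, not the patent's implementation: the `DnsLogEntry` class and `parse_dns_log_line` function are hypothetical names introduced here, and the sketch assumes fields are separated by `|`, resolved IPs by `;`, and the timestamp is `YYYYMMDDhhmmss`, as the sample line suggests.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DnsLogEntry:
    """One DNS log entry (hypothetical container, not from the patent)."""
    source_ip: str
    domain: str
    timestamp: datetime
    resolved_ips: tuple
    status_code: int


def parse_dns_log_line(line: str) -> DnsLogEntry:
    """Parse 'source IP|domain name|timestamp|resolved IP|status code'."""
    source_ip, domain, stamp, resolved, status = line.strip().split("|")
    return DnsLogEntry(
        source_ip=source_ip,
        domain=domain,
        # Assumed timestamp layout: YYYYMMDDhhmmss
        timestamp=datetime.strptime(stamp, "%Y%m%d%H%M%S"),
        # Assumed: multiple resolved IPs are separated by ';'
        resolved_ips=tuple(resolved.split(";")),
        status_code=int(status),
    )


entry = parse_dns_log_line(
    "14.***.***.10|www.baidu.com|20141211035932|180.***.***.107;180.***.***.108|0"
)
```

The masked octets (`***`) are kept as plain strings, so the sketch does not validate IP addresses.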

That is, each DNS log entry consists of five fields: "source IP", "domain name", "timestamp", "resolved IP", and "status code".
The method of the present invention for associating a domain name with website access behavior is described in detail below with reference to FIG. 1.

First, a crawler program simulates a user's website access behavior and acquires all DNS domain name requests made during the HTTP request, i.e., the collected DNS domain name request set (step S1). For example, when a page is opened or a URL (link) is clicked, the crawler program collects all DNS domain name requests made during that HTTP request. When a user clicks a URL, several other domain names are requested in addition to the domain name of the URL itself, and crawler technology makes it possible to obtain all DNS domain name requests generated after the URL is clicked. Here, a uniform resource locator (URL) is a concise representation of the location of, and method of access to, a resource available on the Internet; it is the address of a standard resource on the Internet. Every file on the Internet has a unique URL, and the information it contains indicates the location of the file and how a browser should handle it.

For example, suppose a user clicks the following specific URL (link):
"http://baike.baidu.com/link?url=Lm-TkKUzV687IRoPCDVUAG5qslgMyZtNa6e6A3nPnWXorcXEAII50O6XHZWpTJat"
The crawler program collects all DNS domain name requests generated after the URL is clicked, i.e., the DNS domain name request set, as shown in FIG. 1.
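As a rough illustration of step S1, the sketch below derives a domain name request set from the resource URLs seen during one page load. This is a stand-in under stated assumptions: the patent has the crawler record the DNS requests themselves, whereas here the hostnames are taken from a list of resource URLs, which is hypothetical data. The function name `domain_request_set` is also an assumption.

```python
from urllib.parse import urlsplit


def domain_request_set(resource_urls):
    """Return the set of hostnames a page load would need to resolve."""
    hosts = set()
    for url in resource_urls:
        host = urlsplit(url).hostname  # None when the URL has no host part
        if host:
            hosts.add(host.lower())
    return frozenset(hosts)


# Hypothetical resources observed while loading one page.
observed = [
    "http://www.a.com/doc/1234",
    "http://www.b.com/static/app.js",
    "http://www.c.com/img/logo.png",
    "http://www.b.com/static/site.css",
]
captured_set = domain_request_set(observed)
```

A set is used because the later matching step compares membership only, so repeated requests to the same host collapse to one element.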

Next, the DNS log is split to obtain n domain name request sets (n being an integer of 1 or more) (step S2). Here, the DNS log is generally the log for the day of the access behavior. The splitting comprises two passes: first by source IP, and then by timestamp difference.

1) The DNS log is split by source IP: where consecutive log entries have different source IPs, the log is split between them. Splitting by source IP yields the consecutive DNS log entries of the same source IP within a certain period of time, as follows.
1.1.1.1|www.baidu.com|20141211035932|180.***.***.107;180.***.***.108|0
1.1.1.1|www.qq.com|20141211035932|180.***.***.107;180.***.***.108|0
--------------------------------------- log split line ---------------------------------------
2.2.2.2|www.baidu.com|20141211035932|180.***.***.107;180.***.***.108|0
2.2.2.2|www.qq.com|20141211035932|180.***.***.107;180.***.***.108|0
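The first splitting pass can be sketched as follows. This is one possible reading, not the patent's implementation: it groups consecutive lines that share a source IP using `itertools.groupby`, and the function name `split_by_source_ip` is an assumption.

```python
from itertools import groupby


def split_by_source_ip(lines):
    """Group consecutive log lines that share the same source IP."""
    def source_ip(line):
        # The source IP is the first '|'-separated field.
        return line.split("|", 1)[0]

    return [list(group) for _, group in groupby(lines, key=source_ip)]


log_lines = [
    "1.1.1.1|www.baidu.com|20141211035932|180.***.***.107;180.***.***.108|0",
    "1.1.1.1|www.qq.com|20141211035932|180.***.***.107;180.***.***.108|0",
    "2.2.2.2|www.baidu.com|20141211035932|180.***.***.107;180.***.***.108|0",
    "2.2.2.2|www.qq.com|20141211035932|180.***.***.107;180.***.***.108|0",
]
segments = split_by_source_ip(log_lines)
```

`groupby` only merges adjacent lines, which matches the "consecutive log entries" wording. If entries from different clients were interleaved, the log would have to be sorted by source IP and timestamp first.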

2) Splitting by timestamp difference means further splitting the log already split by source IP according to the timestamp differences between DNS log entries. If the timestamps of two consecutive log entries differ by more than a predetermined length of time, the log is split between them (the reason being that entries separated by too long an interval are regarded as belonging to two different actions). The predetermined length of time can be adjusted as needed. In this embodiment it is 3 seconds; that is, the log is split wherever the timestamp difference exceeds 3 seconds.

For example, the DNS log of source IP 2.2.2.2 is further split by timestamp difference as follows. (The timestamp 20141211035932 denotes 03:59:32 on December 11, 2014.)

Source IP|domain name|timestamp|resolved IP|status code
2.2.2.2|www.baidu.com|20141211000001|180.***.***.107;180.***.***.108|0
2.2.2.2|a.qq.com|20141211000002|180.***.***.107;180.***.***.108|0
2.2.2.2|b.baidu.com|20141211000003|180.***.***.107;180.***.***.108|0
2.2.2.2|c.tanx.com|20141211000004|180.***.***.107;180.***.***.108|0
2.2.2.2|c.allyes.com|20141211000005|180.***.***.107;180.***.***.108|0
--------------------------------------- log split line ---------------------------------------
2.2.2.2|www.sina.com|20141211000009|180.***.***.107;180.***.***.108|0

As shown above, the difference between the timestamps 20141211000005 (05 seconds) and 20141211000009 (09 seconds) is 4 seconds, which is greater than 3 seconds, so the log is split at that point.
www.baidu.com, a.qq.com, b.baidu.com, c.tanx.com, and c.allyes.com form one domain name request set in the DNS log.
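The second splitting pass can be sketched in the same spirit. The code below is an illustrative sketch only: it assumes the entries already belong to one source IP and are in chronological order, takes them as `(domain, timestamp)` pairs, and starts a new set whenever the gap to the previous entry exceeds the 3-second threshold of the embodiment. The names `split_by_time_gap` and `MAX_GAP` are assumptions.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(seconds=3)  # threshold used in the embodiment


def split_by_time_gap(records, max_gap=MAX_GAP):
    """Split (domain, timestamp) records wherever the gap exceeds max_gap."""
    groups = []
    previous = None
    for domain, stamp in records:
        current = datetime.strptime(stamp, "%Y%m%d%H%M%S")
        # Start a new set for the first record or after a long pause.
        if previous is None or current - previous > max_gap:
            groups.append(set())
        groups[-1].add(domain)
        previous = current
    return groups


records = [
    ("www.baidu.com", "20141211000001"),
    ("a.qq.com", "20141211000002"),
    ("b.baidu.com", "20141211000003"),
    ("c.tanx.com", "20141211000004"),
    ("c.allyes.com", "20141211000005"),
    ("www.sina.com", "20141211000009"),
]
request_sets = split_by_time_gap(records)
```

With the sample data, the 4-second pause before `www.sina.com` produces two domain name request sets. A gap of exactly 3 seconds would not trigger a split, since the text specifies "greater than".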

Next, set-to-set matching is performed between the DNS domain name request set collected by the crawler in step S1 and the domain name request sets obtained by splitting the DNS log in step S2 (step S3). The matching rule ignores the order of elements: [(a, b, c) = (b, c, a) = (a, c, b)].

After matching, if a domain name request set from the DNS log is contained in the domain name request set collected by the crawler, or if the two sets are identical, the DNS log is considered to indicate that the user clicked on that domain name (i.e., the domain name of the URL that the crawler requested when collecting). For example:
The URL collected by the crawler is www.a.com/doc/1234 (this URL represents a user's click action). The full domain name request set A collected for it is "www.a.com, www.b.com, www.c.com, www.d.com, www.e.com".
Domain name request set B, one of the sets obtained after splitting the DNS log, is "www.a.com, www.b.com, www.e.com, www.d.com".

As shown above, because set B is contained in set A, domain name request set B is considered to reflect the user access behavior www.a.com/doc/1234 to which domain name set A is mapped. In this way, a user's Internet browsing behavior can also be analyzed from DNS logs.
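The matching of step S3 amounts to an order-insensitive "equal or contained in" test, which maps directly onto a subset comparison. The sketch below is illustrative only: the `matches` function and the `captured` dictionary (mapping a crawled URL to its captured set) are hypothetical names, and the data are the sets A and B from the example.

```python
def matches(log_set, captured_set):
    """True if the log-derived set equals or is contained in the captured set."""
    return frozenset(log_set) <= frozenset(captured_set)


# Set A: domains captured by the crawler for www.a.com/doc/1234.
set_a = {"www.a.com", "www.b.com", "www.c.com", "www.d.com", "www.e.com"}
# Set B: one domain name request set split from the DNS log (order is irrelevant).
set_b = ["www.a.com", "www.b.com", "www.e.com", "www.d.com"]

captured = {"www.a.com/doc/1234": set_a}
hits = [url for url, domains in captured.items() if matches(set_b, domains)]
```

Here `hits` contains `www.a.com/doc/1234`, since every domain in B also appears in A. An empty log set would trivially match any captured set, so a practical pipeline would presumably discard empty sets before matching.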

The above is merely a preferred embodiment of the present invention and does not limit the invention. Any equivalent change or modification made on the basis of the content within the scope of the claims of the present application shall fall within the technical scope of the present invention.

Claims

A method for associating a domain name with site access behavior, executed by a computer program, the method comprising:
step S1 of simulating a user's site access behavior with a crawler and acquiring all DNS domain name requests made during the HTTP request, i.e., the captured DNS domain name request set;
step S2 of splitting a DNS log to obtain n domain name request sets (n being an integer of 1 or more); and
step S3 of performing set-to-set matching between the DNS domain name request set captured in step S1 and the n domain name request sets obtained in step S2, wherein, if a domain name request set split from the DNS log is equal to or contained in the captured DNS domain name request set, the DNS log is considered to indicate that the user clicked on the domain name of the URL that the crawler requested when capturing.

The method of associating a domain name with a site access act according to claim 1, wherein in step S2, the DNS log is a DNS log of the day of the access act.

The method for associating a domain name with site access behavior according to claim 1, wherein, in step S2, splitting the DNS log comprises two passes: first splitting by source IP, and then splitting by timestamp difference.

The method according to claim 3, wherein splitting the DNS log by source IP means obtaining the consecutive DNS log entries of the same source IP within a certain period of time.

The method according to claim 4, wherein splitting the log by timestamp difference means further splitting the log already split by source IP according to the timestamp differences between DNS log entries, such that two DNS log entries are separated if their timestamps differ by more than a predetermined length of time.

The association method according to claim 5, wherein the length of the predetermined time period is 3 seconds.