JP2019514137A

JP2019514137A - Method for associating a domain name with website access behavior

Info

Publication number: JP2019514137A
Application number: JP2018554480A
Authority: JP
Inventors: ダーシュンジャン
Original assignee: Shanghai Yamu Communication Technology Co Ltd
Current assignee: Shanghai Yamu Communication Technology Co Ltd
Priority date: 2016-04-14
Filing date: 2016-08-17
Publication date: 2019-05-30
Anticipated expiration: 2036-08-17
Also published as: JP6703621B2; WO2017177590A1; CN105763633B; RU2709647C1; GB2567749A; CN105763633A; RU2709647C9

Abstract

The present invention provides a method for associating a domain name with website access behavior, comprising: step S1 of simulating a user's website access behavior with a crawler program and acquiring all DNS domain name requests made during the resulting HTTP request, i.e. the collected DNS domain name request set; step S2 of splitting a DNS log to obtain n domain name request sets (n being an integer of 1 or more); and step S3 of performing set-to-set matching between the DNS domain name request set collected in step S1 and the domain name request sets obtained by splitting the DNS log in step S2 and, if one of the domain name request sets obtained by splitting the DNS log is equal to or contained in the collected DNS domain name request set, regarding the DNS log as indicating that the user clicked the domain name of the URL that the crawler program requested at collection time. With this method, a user's Internet browsing behavior can also be analyzed from DNS logs. [Selected figure] Figure 2

Description

The present invention relates to the field of Internet DNS domain name resolution and to web crawler technology, and in particular to a method for associating a domain name with website access behavior.

DNS (Domain Name System) is a distributed database on the Internet that maps domain names and IP addresses to each other, so that users can reach Internet resources conveniently without memorizing the numeric IP strings that machines read directly. DNS domain name resolution works as follows: to visit a website, the user enters the site's domain name in a browser and presses the return key; the browser first issues a DNS request, obtains through DNS the server IP address corresponding to that domain name, and can then send an HTTP request to that IP address.

A web crawler is a program or script that automatically collects World Wide Web information according to set rules. It simulates a user by sending HTTP requests to a website and records the DNS requests generated in the process.

The value of DNS data has received little attention: it is generally regarded as nothing more than a mapping between IP addresses and domain names, so nothing currently on the market uses DNS data to make this kind of association.

The present invention combines DNS log collection with web crawler technology to provide a method for associating a domain name with website access behavior, so that a user's Internet browsing behavior can also be analyzed from DNS logs.

The method of the present invention for associating a domain name with website access behavior comprises: step S1 of simulating a user's site access behavior with a crawler and acquiring all DNS domain name requests made during the resulting HTTP request, i.e. the collected DNS domain name request set; step S2 of splitting a DNS log to obtain n domain name request sets (n being an integer of 1 or more); and step S3 of performing set-to-set matching between the DNS domain name request set collected in step S1 and the domain name request sets obtained by splitting the DNS log in step S2 and, if one of the domain name request sets obtained by splitting the DNS log is equal to or contained in the collected DNS domain name request set, regarding the DNS log as indicating that the user clicked the domain name of the URL that the crawler requested at collection time.

Preferably, in step S2, the DNS log is the DNS log for the day of the access behavior.
Preferably, in step S2, splitting the DNS log comprises a two-pass split: first by source IP, and then by timestamp difference.
Preferably, splitting the DNS log by source IP means obtaining the consecutive DNS log records of the same source IP within a given period.
Preferably, splitting the log by timestamp difference means further splitting the log already split by source IP according to the timestamp differences between DNS log records: if the difference between the timestamps of two DNS log records is greater than a predetermined length of time, the two records are separated.
Preferably, the predetermined length of time is 3 seconds.

With the method of the present invention for associating a domain name with site access behavior, a user's Internet browsing behavior can also be analyzed from DNS logs.

FIG. 1 is a schematic diagram of a DNS domain name request set collected by the crawler program.
FIG. 2 is a flow chart of the method of the present invention for associating a domain name with website access behavior.

The invention is described in detail below with reference to the drawings and an embodiment. The following embodiment does not limit the invention. Any variations and advantages that a person skilled in the art could conceive of without departing from the spirit and scope of the inventive concept are included in the present invention.

As described above, DNS (Domain Name System) is a distributed database on the Internet that maps domain names and IP addresses to each other, so that users can reach Internet resources conveniently without memorizing the numeric IP strings that machines read directly. To visit a site, the user first enters the site's domain name in a browser and presses the return key; the browser first issues a DNS request, obtains through DNS the server IP address corresponding to that domain name, and can then send an HTTP request to that IP address. This is DNS domain name resolution.

DNS logs are generated in the course of the domain name resolution described above. A DNS log records the response content of every DNS request and can therefore capture almost all of the domain name information requested by users. The DNS log format is as follows:
14.***.***.10|www.baidu.com|20141211035932|180.***.***.107;180.***.***.108|0
Source IP|Domain name|Timestamp|Resolved IP|Status code
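The record layout above can be read with a few lines of Python. This is a minimal sketch, not code from the patent: the names `DnsLogEntry` and `parse_dns_log_line` are hypothetical, and the sketch assumes that fields are separated by `|`, that multiple resolved IPs are separated by `;`, and that the timestamp is `YYYYMMDDhhmmss`, as the sample record suggests.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DnsLogEntry:
    """One DNS log record (hypothetical container, not defined by the patent)."""
    source_ip: str
    domain: str
    timestamp: datetime
    resolved_ips: tuple
    status: int


def parse_dns_log_line(line: str) -> DnsLogEntry:
    """Parse 'source IP|domain|timestamp|resolved IPs|status code'."""
    source_ip, domain, ts, resolved, status = (f.strip() for f in line.split("|"))
    return DnsLogEntry(
        source_ip=source_ip,
        domain=domain,
        # Assumed timestamp layout: YYYYMMDDhhmmss
        timestamp=datetime.strptime(ts, "%Y%m%d%H%M%S"),
        resolved_ips=tuple(ip for ip in resolved.split(";") if ip),
        status=int(status),
    )


entry = parse_dns_log_line(
    "14.***.***.10|www.baidu.com|20141211035932|180.***.***.107;180.***.***.108|0"
)
print(entry.domain, entry.timestamp.isoformat())
# prints: www.baidu.com 2014-12-11T03:59:32
```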

That is, each DNS log record consists of five fields: "source IP", "domain name", "timestamp", "resolved IP" and "status code".
The method of the present invention for associating a domain name with website access behavior is described in detail below with reference to FIG. 1.

First, a crawler program simulates a user's website access behavior and acquires all DNS domain name requests made during the resulting HTTP request, i.e. the collected DNS domain name request set (step S1). For example, the crawler program opens a page or clicks a URL (link) and collects all DNS domain name requests made during that HTTP request. When a user clicks a URL, the browser requests not only the domain name of that URL but also several other domain names; crawler technology makes it possible to obtain every DNS domain name request generated after the URL is clicked. Here, a uniform resource locator (URL) is a concise representation of the location of a resource available on the Internet and of the method for accessing it; it is the address of a standard resource on the Internet. Every file on the Internet has a unique URL, and the information it contains indicates where the file is located and how a browser should handle it.

For example, suppose a user clicks the following specific URL (link):
"http://baike.baidu.com/link?url=Lm-TkKUzV687IRoPCDVUAG5qslgMyZtNa6e6A3nPnWXorcXEAIl50O6XHZWpTJat".
The crawler program collects all DNS domain name requests generated after this URL is clicked, i.e. the DNS domain name request set, as shown in FIG. 1.
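Step S1 can be approximated offline as sketched below. This is only a stand-in under stated assumptions: the patent's crawler records the DNS requests that are actually issued, whereas this sketch statically scans a page's HTML for resources a browser would fetch automatically and collects their hostnames, so it misses any request triggered by scripts at run time. The names `ResourceHostCollector` and `collect_domain_set`, the choice of tags, and the `example` domains are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ResourceHostCollector(HTMLParser):
    """Collect hostnames of resources referenced by a page (static approximation)."""

    # Tags whose attribute normally triggers an automatic fetch (assumed list).
    FETCHING_ATTRS = {"script": "src", "img": "src", "iframe": "src", "link": "href"}

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        # The page's own domain is always part of the request set.
        self.hosts = {urlsplit(page_url).hostname}

    def handle_starttag(self, tag, attrs):
        wanted = self.FETCHING_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative and protocol-relative references first.
                host = urlsplit(urljoin(self.page_url, value)).hostname
                if host:
                    self.hosts.add(host)


def collect_domain_set(page_url: str, html: str) -> frozenset:
    """Return the set of domain names the page would cause to be resolved."""
    collector = ResourceHostCollector(page_url)
    collector.feed(html)
    return frozenset(collector.hosts)


sample_html = (
    '<html><head><script src="http://b.example.com/a.js"></script>'
    '<link href="//c.example.net/s.css" rel="stylesheet"></head>'
    '<body><img src="/logo.png"><a href="http://other.example.org/">x</a></body></html>'
)
domains = collect_domain_set("http://www.example.com/doc/1234", sample_html)
print(sorted(domains))
# prints: ['b.example.com', 'c.example.net', 'www.example.com']
```

The plain `<a>` link is deliberately ignored, because a browser does not resolve its domain until the user clicks it.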

Next, the DNS log is split to obtain n domain name request sets (n being an integer of 1 or more) (step S2). The DNS log used here is generally the log for the day of the access behavior. The split is a two-pass split: first by source IP, and then by timestamp difference.

1) The DNS log is split by source IP: wherever the source IP of consecutive log records changes, the log is cut. Splitting by source IP yields the consecutive DNS log records of the same source IP within a given period, as follows.
1.1.1.1|www.baidu.com|20141211035932|180.***.***.107;180.***.***.108|0
1.1.1.1|www.qq.com|20141211035932|180.***.***.107;180.***.***.108|0
-------------------------------Log division line-------------------------------
2.2.2.2|www.baidu.com|20141211035932|180.***.***.107;180.***.***.108|0
2.2.2.2|www.qq.com|20141211035932|180.***.***.107;180.***.***.108|0
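The first pass can be sketched with `itertools.groupby`, which groups consecutive items that share a key and so cuts the log exactly where the source IP changes. The function name `split_by_source_ip` is hypothetical, and the sketch assumes raw `|`-separated records with the source IP as the first field.

```python
from itertools import groupby


def split_by_source_ip(lines):
    """Cut the log wherever the source IP of consecutive records changes."""
    def source_ip(line):
        return line.split("|", 1)[0].strip()

    # groupby only merges *adjacent* records, matching the "consecutive
    # records of the same source IP" wording of the description.
    return [list(group) for _, group in groupby(lines, key=source_ip)]


log = [
    "1.1.1.1|www.baidu.com|20141211035932|180.***.***.107;180.***.***.108|0",
    "1.1.1.1|www.qq.com|20141211035932|180.***.***.107;180.***.***.108|0",
    "2.2.2.2|www.baidu.com|20141211035932|180.***.***.107;180.***.***.108|0",
    "2.2.2.2|www.qq.com|20141211035932|180.***.***.107;180.***.***.108|0",
]
groups = split_by_source_ip(log)
for group in groups:
    print(len(group), group[0].split("|")[0])
# prints:
# 2 1.1.1.1
# 2 2.2.2.2
```

If the raw log interleaves records from many clients, sorting by source IP and timestamp before grouping would be one option; whether the patent intends that is not stated.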

2) Splitting by timestamp difference means further splitting the log already split by source IP according to the timestamp differences between DNS log records. If the difference between the timestamps of two consecutive log records is greater than a predetermined length of time, the log is cut between them (the reason being that records separated by too long an interval are regarded as belonging to two different actions). The predetermined length of time can be adjusted as needed. In this embodiment it is 3 seconds, i.e. the log is cut whenever the timestamp difference is greater than 3 seconds.

For example, the DNS log of source IP 2.2.2.2 is further split by timestamp difference as follows. (A timestamp such as 20141211035932 denotes 03:59:32 on December 11, 2014.)

Source IP|Domain name|Timestamp|Resolved IP|Status code
2.2.2.2|www.baidu.com|20141211000001|180.***.***.107;180.***.***.108|0
2.2.2.2|a.qq.com|20141211000002|180.***.***.107;180.***.***.108|0
2.2.2.2|b.baidu.com|20141211000003|180.***.***.107;180.***.***.108|0
2.2.2.2|c.tanx.com|20141211000004|180.***.***.107;180.***.***.108|0
2.2.2.2|c.allyes.com|20141211000005|180.***.***.107;180.***.***.108|0
-------------------------------Log division line-------------------------------
2.2.2.2|www.sina.com|20141211000009|180.***.***.107;180.***.***.108|0

As shown above, the log is cut because the difference between second 05 of timestamp 20141211000005 and second 09 of timestamp 20141211000009 is 4 seconds (greater than 3 seconds).
www.baidu.com, a.qq.com, b.baidu.com, c.tanx.com and c.allyes.com form one domain name request set in the DNS log.
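The second pass can be sketched as follows, assuming the records of a single source IP are already in time order and are given as (domain, timestamp) pairs. The names `split_by_time_gap` and `max_gap` are hypothetical. Timestamps are parsed into `datetime` values rather than subtracted as integers, so that gaps are still computed correctly across minute and hour boundaries.

```python
from datetime import datetime, timedelta


def split_by_time_gap(records, max_gap=timedelta(seconds=3)):
    """Cut the records wherever two consecutive timestamps differ by more than max_gap.

    records: time-ordered (domain, 'YYYYMMDDhhmmss') pairs of one source IP.
    Returns a list of domain name request sets (kept as lists to preserve order).
    """
    sets, current, previous = [], [], None
    for domain, ts in records:
        moment = datetime.strptime(ts, "%Y%m%d%H%M%S")
        if previous is not None and moment - previous > max_gap:
            sets.append(current)  # gap too long: treat as a different action
            current = []
        current.append(domain)
        previous = moment
    if current:
        sets.append(current)
    return sets


records = [
    ("www.baidu.com", "20141211000001"),
    ("a.qq.com", "20141211000002"),
    ("b.baidu.com", "20141211000003"),
    ("c.tanx.com", "20141211000004"),
    ("c.allyes.com", "20141211000005"),
    ("www.sina.com", "20141211000009"),
]
request_sets = split_by_time_gap(records)
print(len(request_sets), request_sets[1])
# prints: 2 ['www.sina.com']
```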

Next, set-to-set matching is performed between the DNS domain name request set collected by the crawler in step S1 and the domain name request sets obtained by splitting the DNS log in step S2 (step S3). The matching rule ignores the order of the elements: [(a, b, c) = (b, c, a) = (a, c, b)].

After matching, if a domain name request set from the DNS log is contained in the domain name request set collected by the crawler, or if the two sets are identical, the DNS log is regarded as indicating that the user clicked that domain name (i.e. the domain name of the URL that the crawler requested at collection time). For example:
The URL collected by the crawler is www.a.com/doc/1234 (this URL represents a click by some user). The complete domain name request set A collected for it is "www.a.com, www.b.com, www.c.com, www.d.com, www.e.com".
One of the domain name request sets obtained by splitting the DNS log, set B, is "www.a.com, www.b.com, www.e.com, www.d.com".

As shown above, because set B is contained in set A, domain name request set B is regarded as reflecting the user access behavior www.a.com/doc/1234 to which domain name set A is mapped. In this way, a user's Internet browsing behavior can also be analyzed from DNS logs.
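The matching in step S3 can be sketched with Python sets, whose equality and subset tests ignore element order and therefore satisfy the rule (a, b, c) = (b, c, a). The function name `associate` and the dictionary that maps each crawled URL to its collected domain set are hypothetical; skipping empty log sets is an added assumption, since an empty set is trivially contained in every set.

```python
def associate(log_sets, crawled):
    """Pair each log-derived set with every crawled URL whose set equals or contains it.

    log_sets: iterable of domain name collections obtained by splitting the DNS log.
    crawled:  dict mapping a crawled URL to the domain names collected for it.
    """
    matches = []
    for log_set in log_sets:
        observed = frozenset(log_set)
        if not observed:
            continue  # assumption: an empty set carries no evidence
        for url, collected in crawled.items():
            # '<=' is the subset-or-equal test: "equal to or contained in".
            if observed <= frozenset(collected):
                matches.append((sorted(observed), url))
    return matches


crawled = {
    "www.a.com/doc/1234": ["www.a.com", "www.b.com", "www.c.com", "www.d.com", "www.e.com"],
}
log_sets = [
    ["www.a.com", "www.b.com", "www.e.com", "www.d.com"],  # set B from the example
    ["www.a.com", "www.x.com"],                            # not contained in set A
]
for domains, url in associate(log_sets, crawled):
    print(url, domains)
# prints: www.a.com/doc/1234 ['www.a.com', 'www.b.com', 'www.d.com', 'www.e.com']
```

Note that the subset criterion as stated is permissive: a very small log set, such as a single shared domain, would match every crawled page that includes that domain, so a practical system might want a minimum set size or an overlap threshold. That refinement is not part of the description.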

The above is merely a preferred embodiment of the present invention and does not limit the invention. Any equivalent change or modification made on the basis of the content within the scope of the claims of the present invention falls within the technical scope of the present invention.

Claims

1. A method for associating a domain name with website access behavior, comprising:
step S1 of simulating a user's website access behavior with a crawler program and acquiring all DNS domain name requests made during the resulting HTTP request, i.e. the collected DNS domain name request set;
step S2 of splitting a DNS log to obtain n domain name request sets (n being an integer of 1 or more); and
step S3 of performing set-to-set matching between the DNS domain name request set collected in step S1 and the n domain name request sets obtained by splitting the DNS log in step S2 and, if one of the domain name request sets obtained by splitting the DNS log is equal to or contained in the collected DNS domain name request set, regarding the DNS log as indicating that the user clicked the domain name of the URL that the crawler program requested at collection time.

2. The method for associating a domain name with website access behavior according to claim 1, wherein the DNS log in step S2 is the DNS log for the day of the access behavior.

3. The method for associating a domain name with website access behavior according to claim 1, wherein splitting the DNS log in step S2 comprises a two-pass split, first by source IP and then by timestamp difference.

4. The method for associating a domain name with website access behavior according to claim 3, wherein splitting the DNS log by source IP means obtaining the consecutive DNS log records of the same source IP within a given period.

5. The method for associating a domain name with website access behavior according to claim 4, wherein splitting the log by timestamp difference means further splitting the log already split by source IP according to the timestamp differences between DNS log records, two DNS log records being separated if the difference between their timestamps is greater than a predetermined length of time.

6. The method for associating a domain name with website access behavior according to claim 5, wherein the predetermined length of time is 3 seconds.