CN103324673A - Method for acquiring internet user behavior data - Google Patents

Publication number
CN103324673A
Authority
CN
China
Prior art keywords
focus window
browser
data
current
window
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013101958136A
Other languages
Chinese (zh)
Other versions
CN103324673B (en)
Inventor
刘冰
王利军
周鑫
王常青
周煜程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Internet Network Information Center
Original Assignee
Computer Network Information Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-05-23
Filing date
2013-05-23
Publication date
2013-09-25
Application filed by Computer Network Information Center of CAS
Priority to CN201310195813.6A
Publication of CN103324673A
Application granted
Publication of CN103324673B
Legal status: Active

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a method for acquiring Internet user behavior data. The method comprises: (1) establishing a focus window sequence table for the current focus window, and obtaining the process name and process ID of the current focus window through an operating system application programming interface (API); (2) after determining from the process name and process ID that the current focus window is a browser, obtaining the handle of the current browser tab and the address bar uniform resource locator (URL); (3) comparing the obtained process name, process ID, browser tab handle and address bar URL of the current focus window with the last record in the focus window sequence table to determine whether they are consistent; (4) adding any inconsistent data obtained in the previous step to the focus window sequence table as a complete record, and obtaining one item of user operation data from the time difference between two adjacent records in the table. The focus window reflects the software and websites the user is actually attending to and using, together with the times at which the focus window switches, and no noise from non-user behavior is mixed in.

Description

A method for acquiring Internet user behavior data
Technical field
The present invention relates to methods for acquiring Internet user behavior data, in particular to a client-based method for acquiring user behavior data, and belongs to the field of Internet data acquisition.
Background art
Research on Internet users depends on the collection of user behavior data, and whether the collected data is objective and comprehensive directly affects the quality of that research. Existing methods for acquiring Internet user behavior data fall into two broad classes: server-side data acquisition and client-side data acquisition. Server-side acquisition is centered on the website: the server records how all of the pages and functions of one website are used. Client-side acquisition is centered on the user: the client records all of the actions and behavior of one user. The acquisition method proposed here belongs to client-side data acquisition.
At present, existing client-side data acquisition methods include:
First, network packet capture: when a user visits a website, the browser sends data requests to the web server. These requests can be monitored by capturing network packets, which yields a record of the user's website visits. Although packet capture can collect a large amount of user data, it suffers from problems such as heavy data noise and an inability to track how long the user spends on a site.
First of all, not every data request is produced by an active user behavior. For example, an advertising slot in a web page may request ad content in the background, but this does not mean that the user actively clicked the advertisement. Because there is currently no good way to tell which data requests were actively issued by the user, the user behavior data collected by packet capture is very noisy and cannot objectively reflect the user's real behavior.
Secondly, the sending of a data request only marks the beginning of a user behavior; when the behavior ends cannot be determined. As a result, the user behavior data collected by packet capture is rather one-dimensional.
Second, memory scanning: while the user is using a browser or other software, a list of the currently running program processes is held in the computer's memory. By scanning this list at a certain frequency, one can learn which programs the user was using and which websites the user was visiting at a given time, as well as when each began and ended. Aggregated, these data form a detailed record of the user's behavior. However, the data obtained by memory scanning is not fine-grained enough to faithfully reconstruct the user's behavior: the user may have several programs or websites open at the same moment, so there is no way to know which one the user is actually using.
As can be seen, existing client-side data acquisition methods suffer from problems such as heavy noise and insufficient granularity. These problems make research on Internet users considerably more difficult, and they are the problems the present invention sets out to solve.
Summary of the invention
The object of the invention is to provide a client-side data acquisition method that has almost no data noise, records in detail the objects the user is actually attending to, and can faithfully reconstruct Internet user behavior.
When a user uses a browser or other software on a computer, several program windows may be open at once, but at any given moment the user can interact with only one of them. The window currently interacting with the user is called the focus window. The focus window is the recipient of user actions and reflects what the user is actually attending to, and switches of the focus window reflect shifts in the user's attention. Therefore, monitoring the focus window on the user's computer and its changes can faithfully reflect the user's behavior. The acquisition method proposed here is a client-side data acquisition method based on the focus window.
The technical solution of the present invention is as follows. A method for acquiring Internet user behavior data comprises the following steps:
1) establishing a focus window sequence table for the current focus window, and capturing the process name and process ID of the current focus window through an operating system API;
2) after determining from the process name and process ID that the focus window is a browser, capturing the handle of the current browser tab and the address bar URL;
3) comparing the obtained process name, process ID, browser tab handle and address bar URL of the current focus window with the last record in the focus window sequence table to determine whether they are consistent;
4) adding any inconsistent data captured above to the focus window sequence table as a complete record, and obtaining one item of user usage data from the time difference between two adjacent records in the focus window sequence table;
5) repeating steps 1) to 4) to collect user behavior data.
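The five steps above can be sketched as a polling loop. This is a minimal illustration, not the patented implementation: the names `FocusRecord`, `same_window` and `collect` are hypothetical, and the platform-specific capture of steps 1) and 2) is assumed to be supplied by the caller as a `probe` function.

```python
import time
from typing import Callable, List, NamedTuple, Optional


class FocusRecord(NamedTuple):
    """One row of the focus window sequence table (hypothetical layout)."""
    detected_at: float          # detection time, in seconds
    process_name: str
    process_id: int
    tab_handle: Optional[int]   # None when the focus window is not a browser
    url: Optional[str]          # None when the focus window is not a browser


def same_window(a: FocusRecord, b: FocusRecord) -> bool:
    """Step 3: compare every field except the detection time."""
    return a[1:] == b[1:]


def collect(probe: Callable[[], FocusRecord], samples: int,
            interval: float = 0.0) -> List[FocusRecord]:
    """Poll `probe` `samples` times and keep only records that differ
    from the last record already in the table (steps 1 to 5)."""
    table: List[FocusRecord] = []
    for _ in range(samples):
        current = probe()                                   # steps 1 and 2
        if not table or not same_window(table[-1], current):
            table.append(current)                           # step 4
        time.sleep(interval)                                # pause between polls
    return table
```

Passing the probe in as a parameter keeps the deduplication logic independent of any particular operating system, so it can be exercised with canned records.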
Further, the focus window sequence table for the current focus window is established as follows:
A data acquisition program is installed on the user's computer to detect the focus window with which the user is currently interacting in the operating system, and a focus window sequence table is initialized to record focus window data.
Further, the initialized focus window sequence table comprises five fields: detection time, focus window process name, process ID, browser tab handle, and browser URL.
Further, the data acquisition program samples once every 1-3 seconds.
Further, to determine from the process name and process ID whether the focus window is a browser, a list of browser window processes is defined in advance. After the acquisition program captures the current focus window process through the operating system API, it looks the process up in this list, and a focus window whose process is present in the list is judged to be a browser.
Further, the browser window process list records the following variable assignment information: time elapsed since system startup, process name, process ID, product name, product version, and company name.
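The lookup against a predefined browser list might look like the following sketch. It is a simplification: it matches on the executable file name only, and the contents of `BROWSER_PROCESSES` and the function name `is_browser` are assumptions made for illustration, not values given in the text.

```python
import ntpath

# Hypothetical predefined list; the actual entries are not given in the text.
BROWSER_PROCESSES = frozenset({"iexplore.exe", "chrome.exe", "firefox.exe"})


def is_browser(process_path: str, known=BROWSER_PROCESSES) -> bool:
    """Return True when the executable name of the focus window's process
    appears in the predefined browser list (case-insensitive)."""
    # ntpath handles Windows-style paths regardless of the host platform.
    return ntpath.basename(process_path).lower() in known
```

For example, `is_browser(r"C:\Program Files\Internet Explorer\IEXPLORE.EXE")` evaluates to `True`, while a path ending in `notepad.exe` evaluates to `False`.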
Further, when the operating system is Windows, the current focus window can be obtained by calling GetForegroundWindow(), the window's process ID by calling GetWindowThreadProcessId(), and the window's process name by calling GetModuleFileNameEx().
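The three Windows calls named above can be chained roughly as follows. This is a hedged sketch using Python's `ctypes`: the function name `foreground_process`, the use of the wide-character variant `GetModuleFileNameExW`, the access rights passed to `OpenProcess`, and the 260-character buffer are all assumptions of this example. On platforms other than Windows the function simply returns `None`.

```python
import sys
from typing import Optional, Tuple

PROCESS_QUERY_INFORMATION = 0x0400
PROCESS_VM_READ = 0x0010


def foreground_process() -> Optional[Tuple[int, str]]:
    """Return (process ID, executable path) of the foreground window,
    or None if it cannot be determined."""
    if sys.platform != "win32":
        return None  # the Win32 calls below exist only on Windows
    import ctypes
    from ctypes import wintypes

    user32 = ctypes.WinDLL("user32", use_last_error=True)
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    psapi = ctypes.WinDLL("psapi", use_last_error=True)

    # Declare signatures so handles are not truncated on 64-bit systems.
    user32.GetForegroundWindow.restype = wintypes.HWND
    user32.GetWindowThreadProcessId.argtypes = [
        wintypes.HWND, ctypes.POINTER(wintypes.DWORD)]
    user32.GetWindowThreadProcessId.restype = wintypes.DWORD
    kernel32.OpenProcess.argtypes = [
        wintypes.DWORD, wintypes.BOOL, wintypes.DWORD]
    kernel32.OpenProcess.restype = wintypes.HANDLE
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]
    psapi.GetModuleFileNameExW.argtypes = [
        wintypes.HANDLE, wintypes.HMODULE, wintypes.LPWSTR, wintypes.DWORD]
    psapi.GetModuleFileNameExW.restype = wintypes.DWORD

    hwnd = user32.GetForegroundWindow()
    if not hwnd:
        return None
    pid = wintypes.DWORD(0)
    user32.GetWindowThreadProcessId(hwnd, ctypes.byref(pid))
    handle = kernel32.OpenProcess(
        PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, False, pid.value)
    if not handle:
        return None  # e.g. access denied for a protected process
    try:
        buffer = ctypes.create_unicode_buffer(260)
        length = psapi.GetModuleFileNameExW(handle, None, buffer, len(buffer))
        return (pid.value, buffer.value) if length else None
    finally:
        kernel32.CloseHandle(handle)
```

Opening the process is needed because `GetModuleFileNameEx` takes a process handle rather than a process ID; the handle is closed in all cases.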
Further, when the browser is Internet Explorer (IE), the current browser tab handle and address bar URL are captured as follows:
1) enumerating all child windows of the browser by calling EnumChildWindows();
2) finding the tab and address bar child windows among them by matching window class names, the window class name of an IE tab being Frame Tab and that of the IE address bar being Address Band Root;
3) for the address bar child window, obtaining the URL in the browser address bar by calling GetWindowText().
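The class-name matching in step 2) can be illustrated on an already enumerated list of child windows. In this sketch the enumeration and text retrieval (EnumChildWindows() and GetWindowText() in the text) are assumed to have been done elsewhere; `ChildWindow` and `find_tab_and_url` are hypothetical names, and taking the first matching tab window is an assumption, since the text does not say how the current tab is chosen among several.

```python
from typing import Iterable, NamedTuple, Optional, Tuple

TAB_CLASS = "Frame Tab"                  # IE tab window class named in the text
ADDRESS_BAR_CLASS = "Address Band Root"  # IE address bar window class


class ChildWindow(NamedTuple):
    """Snapshot of one enumerated child window (hypothetical structure)."""
    handle: int
    class_name: str
    text: str


def find_tab_and_url(
        children: Iterable[ChildWindow]) -> Tuple[Optional[int], Optional[str]]:
    """Return (tab handle, address bar text) found by class-name matching."""
    tab_handle: Optional[int] = None
    url: Optional[str] = None
    for child in children:
        if child.class_name == TAB_CLASS and tab_handle is None:
            tab_handle = child.handle      # first tab window wins (assumption)
        elif child.class_name == ADDRESS_BAR_CLASS and url is None:
            url = child.text               # text of the address bar window
    return tab_handle, url
```

Either element of the result is `None` when no child window with the corresponding class name is present.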
Further, if the data captured in step 4) is fully consistent with the previous record, this indicates that the user is still on the same software or website page as before, and no data is recorded.
Further, the current browser tab handle in step 2) comprises the address bar handle and the current tab handle.
Advantages of the present invention:
The data acquisition method proposed here faithfully records user behavior as a focus window sequence, which directly reflects the whole course of interaction between the user and the computer.
Compared with existing client-side data acquisition methods, it has the following advantages:
1) the focus window faithfully reflects the software and websites the user is actually attending to and using;
2) the switching times of the focus window correspond directly to the start and end times of each use of a piece of software or a website, and reflect them intuitively;
3) the focus window sequence consists entirely of data centered on user behavior and is not mixed with noise from non-user behavior.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the method for acquiring Internet user behavior data according to the present invention;
Fig. 2 is a schematic diagram of data acquisition in one embodiment of the method for acquiring Internet user behavior data.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. It should be understood that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 2 is a schematic diagram of data acquisition in one embodiment of the method for acquiring Internet user behavior data according to the present invention.
The technical solution adopted in one embodiment of the invention is as follows: a data acquisition program is installed on the user's computer and kept running continuously over a long period. The program detects the current focus window of the user's computer at a certain frequency (a suggested interval is 1-3 seconds) and captures the corresponding data of the current focus window. The captured data is then recorded item by item in chronological order, forming the sequence of focus windows the user interacted with over a period of time. This sequence directly reflects the whole course of interaction between the user and the computer.
1) A data acquisition program is installed on the user's computer to detect the focus window with which the user is currently interacting in the operating system, and a focus window sequence table is initialized to record focus window data. The table comprises five fields: detection time, focus window process name, process ID, browser tab handle, and browser URL.
2) Capture information such as the process name and process ID of the current focus window through the operating system API.
3) Determine from the process name whether the current focus window is a browser. If so, continue; otherwise skip to step 6.
4) Detect the current tab page of this browser (in IE 7.0 and later, also referred to as a "tab").
5) Capture information such as the handle of the current browser tab and the address bar URL through the operating system API.
6) Determine whether the focus window process name, process ID, browser tab handle and browser URL captured this time are fully consistent with the last record in the focus window sequence. If so, skip to step 8; otherwise continue.
7) Add the time of this detection, the focus window process name, process ID, browser tab handle and browser URL to the focus window sequence table as one record.
8) Wait for a certain interval (a suggested interval is 1-3 seconds), then resume execution from step 2.
In an embodiment of the present invention, steps 1 and 2 must be implemented for the specific operating system, since the method of obtaining the focus window and its process information may differ between operating systems. For example, on Windows, the current focus window can be obtained by calling GetForegroundWindow(), the window's process ID by calling GetWindowThreadProcessId(), and the window's process name by calling GetModuleFileNameEx().
In an embodiment of the present invention, step 3 requires a predefined list of browser window processes. After the program captures the current focus window process, it looks the process up in this list; if the process is present, the focus window is judged to be a browser.
In another embodiment of the present invention, steps 4 and 5 must be implemented for the specific browser's interface, since the interface for obtaining the tab and the address bar URL may differ between browsers. For example, for IE, all of its child windows can be enumerated by calling EnumChildWindows(), and the tab and address bar child windows can then be found by matching class names: the window class name of an IE tab is "Frame Tab" and that of the IE address bar is "Address Band Root". For the address bar child window, a further call to GetWindowText() obtains the URL in the browser address bar.
In another embodiment of the present invention, if step 6 determines that the captured data is fully consistent with the previous record, the user is still on the same software or website page as before, so no duplicate record is needed. For this reason, the time difference between two adjacent records in the focus window sequence reflects how long the user stayed on the earlier software or website.
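The dwell time described above follows directly from adjacent records. The sketch below assumes each record has been reduced to a `(detection time in seconds, label)` pair; the function name `dwell_times` and that reduced record shape are illustrative choices, not part of the described method.

```python
from typing import List, Tuple


def dwell_times(records: List[Tuple[float, str]]) -> List[Tuple[str, float]]:
    """Pair each record with the time until the next record.

    The last record has no successor, so its duration is still open
    and it is omitted from the result.
    """
    return [
        (label, next_time - start)
        for (start, label), (next_time, _) in zip(records, records[1:])
    ]


sequence = [
    (0.0, "winword.exe"),
    (4.0, "iexplore.exe http://example.com/"),
    (10.0, "notepad.exe"),
]
stays = dwell_times(sequence)
# stays == [("winword.exe", 4.0), ("iexplore.exe http://example.com/", 6.0)]
```

Because unchanged samples are never written to the sequence, every pair of adjacent records marks a real switch, and the resolution of each duration is bounded by the polling interval.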
Fig. 1 is a schematic flow chart of the method for acquiring Internet user behavior data according to the present invention. The steps are as follows. One: detect the system's current focus window; the focus window reflects the software and websites the user is actually attending to and using, and a predefined browser list ensures that the user's online behavior in any browser can be faithfully recorded. Two: capture information such as the handle of the current browser tab, to reflect the details of the user's online behavior at a finer grain. Three: before adding a record to the focus window sequence, first determine whether the current record is consistent with the previous one; this lets the focus window sequence directly reflect window switches and also reduces data redundancy.
A. Define the log record variable, initialize it to empty, and obtain the focus window handle.
B. After the handle is obtained successfully, obtain the window title and the focus window's process ID.
C. Compare the focus process ID with the focus process ID obtained last time to see whether the focus process has changed.
D. If the focus process ID has changed, clear the variable recording the last URL object, clear the log record variable, and update the variable recording the last process ID.
E. Obtain the product version, product name and company name corresponding to the process.
F. Assign the following to the log record variable: time elapsed since system startup, process name, process ID, product name, product version, company name.
G. Determine whether the process is a browser. If it is, update the variable recording the last process ID and obtain the URL, the address bar handle and the current tab handle.
H. If the above parameters have not changed, assign the following to the log record variable: time elapsed since system startup, process name, process ID, URL, address bar handle, tab handle.

Claims (10)

1. A method for acquiring Internet user behavior data, comprising the steps of:
1) establishing a focus window sequence table for the current focus window, and capturing the process name and process ID of the current focus window through an operating system API;
2) after determining from the process name and process ID that the focus window is a browser, capturing the handle of the current browser tab and the address bar URL;
3) comparing the obtained process name, process ID, browser tab handle and address bar URL of the current focus window with the last record in the focus window sequence table to determine whether they are consistent;
4) adding any inconsistent data captured above to the focus window sequence table as a complete record, and obtaining one item of user usage data from the time difference between two adjacent records in the focus window sequence table;
5) repeating steps 1) to 4) to collect user behavior data.
2. The method for acquiring Internet user behavior data according to claim 1, characterized in that the focus window sequence table for the current focus window is established as follows:
a data acquisition program is installed on the user's computer to detect the focus window with which the user is currently interacting in the operating system, and a focus window sequence table is initialized to record focus window data.
3. The method for acquiring Internet user behavior data according to claim 2, characterized in that the initialized focus window sequence table comprises five fields: detection time, focus window process name, process ID, browser tab handle, and browser URL.
4. The method for acquiring Internet user behavior data according to claim 2, characterized in that the data acquisition program samples once every 1-3 seconds.
5. The method for acquiring Internet user behavior data according to claim 1, characterized in that, to determine from the process name and process ID whether the focus window is a browser, a list of browser window processes is defined in advance; after the acquisition program captures the current focus window process through the operating system API, it looks the process up in this list, and a focus window whose process is present in the list is judged to be a browser.
6. The method for acquiring Internet user behavior data according to claim 5, characterized in that the browser window process list records the following variable assignment information: time elapsed since system startup, process name, process ID, product name, product version, and company name.
7. The method for acquiring Internet user behavior data according to claim 1, characterized in that, when the operating system is Windows, the current focus window can be obtained by calling GetForegroundWindow(), the window's process ID by calling GetWindowThreadProcessId(), and the window's process name by calling GetModuleFileNameEx().
8. The method for acquiring Internet user behavior data according to claim 1, characterized in that, when the browser is Internet Explorer (IE), the current browser tab handle and address bar URL are captured as follows:
1) enumerating all child windows of the browser by calling EnumChildWindows();
2) finding the tab and address bar child windows among them by matching window class names, the window class name of an IE tab being Frame Tab and that of the IE address bar being Address Band Root;
3) for the address bar child window, obtaining the URL in the browser address bar by calling GetWindowText().
9. The method for acquiring Internet user behavior data according to claim 1, characterized in that, if the data captured in step 4) is fully consistent with the previous record, this indicates that the user is still on the same software or website page as before, and no data is recorded.
10. The method for acquiring Internet user behavior data according to claim 1, characterized in that the current browser tab handle in step 2) comprises the address bar handle and the current tab handle.
CN201310195813.6A 2013-05-23 2013-05-23 A kind of acquisition method of Internet user's behavioral data Active CN103324673B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310195813.6A CN103324673B (en) 2013-05-23 2013-05-23 A kind of acquisition method of Internet user's behavioral data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310195813.6A CN103324673B (en) 2013-05-23 2013-05-23 A kind of acquisition method of Internet user's behavioral data

Publications (2)

Publication Number Publication Date
CN103324673A true CN103324673A (en) 2013-09-25
CN103324673B CN103324673B (en) 2016-08-31

Family

ID=49193416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310195813.6A Active CN103324673B (en) 2013-05-23 2013-05-23 A kind of acquisition method of Internet user's behavioral data

Country Status (1)

Country Link
CN (1) CN103324673B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101252401A (en) * 2008-02-27 2008-08-27 华为技术有限公司 Terminal equipment, system and method for downloading program data
US20120210215A1 (en) * 2011-02-16 2012-08-16 Rovi Technologies Corporation Method and apparatus for providing networked assistance and feedback control for consumer electronic devices
US20120232951A1 (en) * 2011-03-08 2012-09-13 Alibaba Group Holding Limited Sending product information based on determined preference values
CN102509233A (en) * 2011-11-29 2012-06-20 汕头大学 User online action information-based recommendation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
向坚持等: "基于用户行为的Web使用挖掘数据采集技术研究", 《计算机与现代化》, no. 12, 31 December 2007 (2007-12-31) *
李莺等: "新一代WLAN网络监控与用户行为分析系统", 《重庆邮电大学学报(自然科学版)》, vol. 22, no. 4, 31 August 2010 (2010-08-31) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103957245A (en) * 2014-04-22 2014-07-30 北京微众文化传媒有限公司 Method and device for obtaining Internet data
CN103957245B (en) * 2014-04-22 2017-11-28 北京微众文化传媒有限公司 Internet data acquisition methods and device
CN107968960A (en) * 2016-10-20 2018-04-27 中兴通讯股份有限公司 A kind of backstage audio and video playing control method and device
CN107797906A (en) * 2017-10-09 2018-03-13 四川巧夺天工信息安全智能设备有限公司 A kind of method for monitoring a variety of browsing device net pages in real time and browsing record
CN107797906B (en) * 2017-10-09 2020-10-13 四川巧夺天工信息安全智能设备有限公司 Method for monitoring webpage browsing records of various browsers in real time
CN107704174A (en) * 2017-10-12 2018-02-16 威创集团股份有限公司 A kind of window grab method and system, computer installation and memory
CN109684590A (en) * 2018-12-25 2019-04-26 威创集团股份有限公司 A kind of browsing device net page data sharing method and device
CN110674438A (en) * 2019-08-16 2020-01-10 中国平安财产保险股份有限公司 Advertisement putting method, device, computer system and readable storage medium
CN110674438B (en) * 2019-08-16 2024-07-02 中国平安财产保险股份有限公司 Advertisement putting method and device, computer system and readable storage medium

Also Published As

Publication number Publication date
CN103324673B (en) 2016-08-31


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20210209

Address after: 100190 room 506, building 2, courtyard 4, South 4th Street, Zhongguancun, Haidian District, Beijing

Patentee after: CHINA INTERNET NETWORK INFORMATION CENTER

Address before: 100190 No. four, 4 South Street, Haidian District, Beijing, Zhongguancun

Patentee before: Computer Network Information Center, Chinese Academy of Sciences