WO2013137982A1 - Method and apparatus for intelligent capture of Document Object Model events - Google Patents

Method and apparatus for intelligent capture of Document Object Model events

Info

Publication number
WO2013137982A1
WO2013137982A1 PCT/US2013/023636 US2013023636W WO2013137982A1 WO 2013137982 A1 WO2013137982 A1 WO 2013137982A1 US 2013023636 W US2013023636 W US 2013023636W WO 2013137982 A1 WO2013137982 A1 WO 2013137982A1
Authority
WO
WIPO (PCT)
Prior art keywords
dom
webpage
events
changes
web session
Prior art date
Application number
PCT/US2013/023636
Other languages
English (en)
Other versions
WO2013137982A4 (fr
Inventor
Travis Spence POWELL
Nadav Caspi
Ashwin Singhania
Robert I. Wenig
Original Assignee
International Business Machines Corporation ('ibm')
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US13/419,179 (US8868533B2)
Application filed by International Business Machines Corporation ('ibm')
Publication of WO2013137982A1
Publication of WO2013137982A4

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/958: Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Definitions

  • Known monitoring systems may capture and analyze web sessions.
  • The captured web sessions can be replayed at a later time to identify problems in web applications and obtain website analytics.
  • The monitoring systems may insert extensive instrumentation in the web application that logs user session events, user actions, and webpage metadata (performance, etc.). This style of logging might be performed by a client device, a server, or both.
  • The challenge from a replay perspective involves accurately stitching together the user experience from the log files obtained across multiple tiers. For example, some events may not be observable and therefore might not be captured during the web session. If events are not captured, the replayed web session may not reproduce the same states that occurred during the original web session. As a result, the replayed web session may not identify problems that happened during the original web session or may generate errors that never actually happened during the original web session.
  • The challenge from a physics perspective includes generating log files and moving the log files into a central repository without adversely affecting the original web session. Capturing and storing web session data uses client computer bandwidth and network bandwidth. The additional bandwidth usage might slow down the web session and cause the user to take evasive actions, such as aborting the web session.
  • FIG. 1 depicts an example of a system for capturing Document Object Model (DOM) events.
  • DOM Document Object Model
  • FIG. 2 depicts an example of an intelligent capture agent configured to capture Document Object Model (DOM) items for a webpage.
  • FIG. 3 depicts an example of an intelligent capture agent configured to capture DOM changes.
  • FIG. 4 depicts an example of a process for capturing DOM events.
  • FIG. 5 depicts an example of an intelligent capture agent configured to dynamically capture DOM events.
  • FIG. 6 depicts an example of a process for capturing a webpage DOM state based on checkpoint events.
  • FIG. 7 depicts an example of a process for capturing different DOM events based on webpage metadata.
  • FIG. 8 depicts an example of a DOM tree structure for a webpage.
  • FIG. 9 depicts an example of objects configured to capture DOM changes in a webpage.
  • FIG. 10 depicts an example of a process for capturing DOM events in a webpage.
  • FIG. 11 depicts an example of a DOM structure configured to generate network requests.
  • FIG. 12 depicts an example of a system for capturing DOM events for different network requests.
  • FIG. 13 depicts an example of a process for capturing DOM events based on the network requests.
  • FIG. 14 depicts an example of a process for replaying captured DOM events.
  • FIG. 1 depicts an example of an intelligent capture system 100 configured to capture a web session 105.
  • a website 118 may comprise a web server 120 configured to operate a web application 122.
  • Web application 122 may be configured to conduct web session 105 with a client 102.
  • Web application 122 may comprise software, and a database containing multiple webpages and data for exchanging with client 102 during web session 105.
  • web application 122 may contain webpages and data for products available for purchasing from an on-line shopping website.
  • a network 101 may connect client 102 to web server 120 and also may connect client
  • Network 101 may comprise any combination of Local Area Networks (LANs), Wide Area Networks (WANs), Internet Protocol (IP) networks, phone networks, Public Switched Telephone Networks (PSTN), wireless networks, cellular networks, Wi-Fi networks, Bluetooth networks, cable networks, data buses, or the like, used for transferring information between client 102 and web server 120.
  • Client 102 may operate on any computing device configured to participate in web session 105 with web server 120.
  • client 102 may comprise a tablet computer, hand-held device, smart telephone, mobile telephone, personal digital assistant (PDA), laptop computer, personal computer, computer terminal, voice over internet protocol (VoIP) phones, or the like, or any combination thereof.
  • PDA personal digital assistant
  • VoIP voice over internet protocol
  • a user 126 may open a web browser 104 within a screen of client 102 and send a Hypertext Transfer Protocol (HTTP) request 128 over network 101 to web application 122.
  • Web application 122 may send a webpage 130 back to web browser 104 in response to HTTP request 128.
  • Web browser 104 may load and render webpage 130 on the computer screen of client 102.
  • web application 122 may operate an application server that communicates with an application running on client 102.
  • web browser 104 may not be used and the application running on client 102 may communicate and exchange data directly with the application server operating within website 118.
  • webpage 130 may comprise Hypertext Markup Language (HTML), Extensible Markup Language (XML), Cascading Style Sheets (CSS), JavaScript, Asynchronous JavaScript and XML (or other data provider) (AJAX), or the like, or any combination thereof.
  • Webpage 130 may be configured into a Document Object Model (DOM) 108 that defines a logical structure of an electronic document and the way the electronic document is accessed and manipulated.
  • webpage 130 may be part of an airline website and user 126 may enter text characters into a destination field displayed within webpage 130 for an airline destination.
  • JavaScript within webpage 130 may send a request 132 to web application 122 that contains the user input 124.
  • request 132 may include the letters "SFO" entered by user 126 into the airline destination field displayed in webpage 130.
  • Web application 122 may respond to request 132 with a reply 134 that contains updates for webpage 130. For example, web application 122 may send back data in reply 134 identifying San Francisco International Airport. Web browser 104 then may display the data from reply 134 within webpage 130. For example, webpage 130 may display a dropdown menu that identifies San Francisco International Airport. Any combination of requests 128 and 132, and replies 130 and 134, may be exchanged between web browser 104 and web application 122 during web session 105.
  • Requests 128 and 132, and replies 130 and 134, are alternatively referred to as network events 135.
  • User inputs 124 are alternatively referred to as user events 124. Any network events 135, user events 124, or any other logic that may change a state of the DOM 108 within webpage 130 may be referred to as a DOM event 136.
  • An intelligent capture agent 110 may be configured to capture DOM events 136 during web session 105 between client 102 and web application 122.
  • Capture agent 110 may comprise JavaScript added into the webpages sent by web application 122.
  • Capture agent 110 may send captured DOM events 136 to archive server 138.
  • Archive server 138 may store captured DOM events 136 as captured web session data 142.
  • Captured web session data 142 then may be replayed and analyzed at a later time. For example, an administrator of website 118 may replay captured web session data 142 to generate analytics for web session 105 and identify any problems with web application 122.
  • Capture agent 110 more efficiently captures DOM events 136 for web session 105.
  • Capture agent 110 could try to capture every network event 135 and every user event 124 during web session 105.
  • captured user events 124 could be applied to captured webpages in network events 135 to reproduce the prior states of webpage 130.
  • using capture agent 110 to capture all of the network events 135 and user events 124 may substantially slow down web session 105.
  • web browser 104 may take longer to receive and load webpage 130 and webpage updates 134. This may cause user 126 to respond differently during web session 105 or abort web session 105.
  • not all user events 124 entered by user 126 may be successfully captured.
  • hostile HTML code within webpage 130 may prevent inputs or webpage changes from percolating to a level observable by capture agent 110.
  • network problems may prevent some captured DOM events 136 from being successfully transmitted to archive server 138. Therefore, captured web session data 142 may not include all of the events from original web session 105. The missing events may prevent accurate replay of the original web session 105.
  • a first input 124 may cause webpage 130 to display a particular field and the user may enter a second input 124 into the newly displayed field. If the first input 124 is not successfully captured, replay may not generate the field. Since the field is not generated, replaying the second input 124 may create an error condition that never happened during original web session 105. Thus, failing to capture even one web session event may prevent accurate replay of original web session 105.
  • Intelligent capture agent 110 is configured to intelligently capture DOM events 136 for webpage 130.
  • the intelligent capture of DOM events 136 reduces an overall amount of data that needs to be captured and transferred to archive server 138. Thus, capture agent 110 is less likely to slow down web session 105 and adversely affect the user experience during web session 105.
  • Intelligent capture of DOM events 136 also increases accuracy while replaying web session 105.
  • capture agent 110 may capture an entire DOM state of webpage 130.
  • the captured DOM state of webpage 130 may contain the results from web session events that may have otherwise been undetected or unsuccessfully captured.
  • the captured DOM state of webpage 130 operates as a checkpoint and allows the replay engine to resynchronize to a previously existing state of original web session 105. This allows a replay engine to more accurately recreate original web session 105.
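The resynchronization role of a checkpoint can be illustrated with a minimal Python sketch. The `replay` function, the event tuple layout, and the dictionary-based page state are hypothetical and are not taken from the described system; they only show how a captured full state lets replay recover after a missed event.

```python
def replay(events):
    """Rebuild page state from captured events (hypothetical format).

    Each event is ("change", key, value) for an incremental DOM change or
    ("checkpoint", full_state) for a full DOM state capture.
    """
    state = {}
    for event in events:
        if event[0] == "checkpoint":
            # A checkpoint overwrites whatever replay has accumulated so far.
            state = dict(event[1])
        else:
            _, key, value = event
            state[key] = value
    return state


# The change that displayed "promo_field" was never captured, but the later
# checkpoint carries its result, so replay can still apply the next input.
captured = [
    ("change", "destination", "SFO"),
    ("checkpoint", {"destination": "SFO", "promo_field": "visible"}),
    ("change", "promo_field", "SAVE10"),
]
final_state = replay(captured)
```

In this sketch the checkpoint is simply a full copy of the state; a real capture would carry serialized HTML, CSS, and script state instead.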
  • the operating environment used during original web session 105 may be different from the operating environment used during replay.
  • a first web browser 104 may be used during original web session 105 and a second different web browser may be used to replay the captured web session.
  • Webpage 130 may operate differently on different web browsers.
  • Replaying captured DOM events 136 may provide a more consistent and accurate simulation of original web session 105 on a wider variety of different web browsers and operating conditions.
  • webpages may include links to multiple third party websites.
  • webpage 130 provided by website 118 may include multiple advertisements with links to other third party websites.
  • Some monitoring systems may only capture network events for one website.
  • a monitoring system might only capture network events 135 exchanged with website 118.
  • Capture agent 110 may be configured to selectively capture information exchanged with third party websites for more thorough capture of web session 105.
  • FIG. 2 depicts an example of a capture agent 110 configured to capture a web page 162.
  • User 126 may enter inputs 166 that cause web browser 104 to initiate a request 160 to a website. For example, user 126 may click on a link to a website that causes web browser 104 to send a Hypertext Transfer Protocol (HTTP) request 160 to a website that sells bicycles.
  • HTTP Hypertext Transfer Protocol
  • a web application on the bicycle website may send back webpage 162 in response to request 160.
  • Web browser 104 receives and renders webpage 162 on the screen of a client computing device.
  • webpage 162 may include multiple different DOM items 164.
  • a first DOM item 164A may comprise text prompting user 126 to purchase a particular product.
  • a second DOM item 164B may comprise an icon button configured to detect a mouse click, keystroke, and/or screen touch.
  • a third DOM item 164C may comprise an image 164C of the product for sale.
  • Capture agent 110 may need to capture all of DOM items 164 when webpage 162 is initially received and loaded by web browser 104. Some or all of DOM items 164 may be repeatedly and/or statically displayed on webpage 162.
  • webpage 162 may be the home page for the bicycle website.
  • Home webpage 162 may always contain the same known DOM items 164A-164C.
  • Some of DOM items 164A-164C also may be displayed within other webpages for the bicycle website.
  • Capture agent 110 may be configured to identify known DOM items within webpage
  • Capture agent 110 may be programmed to detect known DOM items 164 and generate content identifiers or code words 168 that represent the information in the known DOM items. For example, capture agent 110 may look for any text within webpage 162 that begins with the phrase SELECT BUY BUTTON. Based on empirical data, the identified phrase may always be associated with the text in DOM item 164A. Instead of capturing the entire text of DOM item 164A, capture agent 110 can then send a content identifier 168A to archive server 138 representing DOM item 164A.
  • Content identifier 168A can be used during replay of web session 158 to reproduce the entire text of DOM item 164A.
  • the replay engine may reference a table containing the text associated with content identifier 168A.
  • the replay engine may detect content identifier 168A within the captured web session data.
  • the replay engine may use content identifier 168A as an index to identify associated text SELECT BUY BUTTON TO PURCHASE ITEM in the table and display the identified text within webpage 162.
  • Text, JavaScript, images, control, or any other DOM item or data may be represented with content identifiers 168.
  • DOM item 164B may comprise an icon button and may be associated with a second content identifier 168B and DOM item 164C may comprise an image and may be associated with a third content identifier 168C.
  • DOM items 164B and 164C also may be pre-stored in the replay table along with DOM item 164A.
  • DOM items 164B and 164C may be accessed and displayed within webpage 162 during replay in response to the replay engine detecting the associated content identifiers 168B and 168C, respectively.
  • capture agent 110 may identify DOM items 164 in webpage 162 without having to capture and send the content of the DOM items 164 to the archive server.
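The content identifier scheme can be sketched as a lookup table shared by the capture side and the replay side. The table contents, the function names `encode_item` and `decode_item`, and the dictionary record format below are assumptions made for illustration only.

```python
# Hypothetical table shared by the capture agent and the replay engine.
KNOWN_CONTENT = {
    "168A": "SELECT BUY BUTTON TO PURCHASE ITEM",
}


def encode_item(text):
    """Capture side: return a content identifier for known text, else the text."""
    for identifier, known_text in KNOWN_CONTENT.items():
        if text == known_text:
            return {"content_id": identifier}
    return {"raw": text}


def decode_item(captured):
    """Replay side: expand a content identifier back into the original text."""
    if "content_id" in captured:
        return KNOWN_CONTENT[captured["content_id"]]
    return captured["raw"]


known = encode_item("SELECT BUY BUTTON TO PURCHASE ITEM")
unknown = encode_item("Balance: $1,024.00")
```

Known text travels as a short identifier, while text that is not in the table falls back to being captured in full.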
  • a portion of a DOM item 164 may be represented by a content identifier 168 and another portion of the same DOM item 164 may be captured.
  • text in DOM item 164A may also include a specific name of the product being offered for sale.
  • Capture agent 110 may capture and send the name of the product to archive server 138.
  • the remaining generic text in DOM item 164A may be represented by content identifier 168A and may also be sent to the archive server 138.
  • the replay engine then accesses the text from
  • FIG. 3 depicts an example of a capture agent 110 configured to identify DOM changes in webpage 162.
  • user 126 may have selected the icon button for DOM item 164B.
  • Web browser 104 may have sent a request 172 in response to selection of the icon button.
  • HTTP request 172 may request purchase of the bicycle displayed on webpage 162.
  • the web application on the bicycle website may send back a response 174 in response to request 172.
  • Response 174 may be an update to currently displayed webpage 162 or may be a completely new webpage.
  • Content in response 174 may comprise any text, images, data, control, fields, or the like, or any combination thereof.
  • the bicycle selected for purchase by user 126 may be out of stock.
  • the web application on the bicycle website may send back DOM item 164D in response 174 comprising text indicating that the bicycle selected for purchase by user 126 is currently out of stock.
  • Capture agent 110 may detect a DOM change within webpage 162. For example, capture agent 110 may detect replacement of DOM item 164A previously shown in FIG. 2 with the new text contained in DOM item 164D. Capture agent 110 also may determine that no other DOM items 164 have changed within webpage 162.
  • One technique for detecting DOM changes may comprise examining a DOMSUBTREE MODIFIED JavaScript message on webpage 162.
  • Capture agent 110 may be preprogrammed to look for any DOM items 164 that contain known content. For example, every time a product is out of stock, the web application for the bicycle website may generate the same message SORRY! WE ARE CURRENTLY OUT OF STOCK FOR THE ITEM YOU SELECTED. Capture agent 110 may determine DOM item 164D contains the known out of stock message. Instead of capturing the entire text message in DOM item 164D, capture agent 110 may generate a content identifier or codeword 178 representing the out of stock text message.
  • Capture agent 110 may send content identifier 178 back to archive server 138 in a message 176.
  • Message 176 also may include an action 179, a timestamp 180, and a location 182 associated with content identifier 178.
  • Action 179 may identify any selection, display, or other control associated with DOM item 164D.
  • Timestamp 180 may identify when DOM item 164D was detected within webpage 162 and location 182 may identify where DOM item 164D was displayed within webpage 162.
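One way to picture message 176 is as a small record holding the four fields named above. The `CaptureMessage` class, its field types, the JSON encoding, and the location path format are assumptions for illustration; the source does not specify a wire format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CaptureMessage:
    content_id: str   # e.g. identifier 178 for the out-of-stock text
    action: str       # what happened to the DOM item
    timestamp: float  # when the DOM item was detected (seconds since epoch)
    location: str     # where the item sits in the page (path format assumed)

    def to_json(self) -> str:
        """Serialize the message for transmission to the archive server."""
        return json.dumps(asdict(self), sort_keys=True)


message = CaptureMessage("178", "display", 1331683200.0, "body/p[0]")
payload = message.to_json()
```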
  • a webpage may only be displayed after a user logs into a website. For example, a user may have to enter a username and password in order to log into a bank account webpage.
  • Bank account information also may be unique to each user and also may constantly change over time.
  • other information on the bank webpage such as a bank banner or advertisements may be static and displayed on webpages for each user while the user views their bank accounts. All of the DOM elements of the bank webpage may need to be initially captured since the bank webpage cannot be reproduced during replay without an authorized user name and password.
  • User bank account information on the bank webpage might not be able to be represented by an associated content identifier 178, since the bank account information constantly changes over time.
  • capture agent 110 may capture the bank account information, assign a timestamp and location to the captured bank account information, and send the captured information to archive server 138.
  • the account information may be displayed in a particular format, such as a Graphics Interchange Format (GIF) image or a Joint Photographic Experts Group (JPEG) image.
  • GIF Graphics Interchange Format
  • JPEG Joint Photographic Experts Group
  • capture agent 110 may be configured to associate certain data formats with changing or unknown information. Accordingly, capture agent 110 may capture data having particular data formats.
  • Other information displayed on the bank webpage may comprise known data that is displayed to every user. Capture agent 110 may detect the known data and generate an associated content identifier, action, time stamp, and location. The content identifier and associated display information then may be sent to archive server 138.
  • Capture agent 110 may operate in conjunction with other monitoring devices.
  • capture agent 110 may operate in combination with a network session monitor that captures network data as described in U.S. Patent No. 8,127,000 entitled: METHOD AND APPARATUS FOR MONITORING AND SYNCHRONIZING USER INTERFACE EVENTS WITH NETWORK DATA, issued Feb 28, 2012, which is herein incorporated by reference.
  • capture agent 110 may capture the entire web session without using network data captured by a web session monitor.
  • Capture agent 110 may reduce the amount of data that needs to be captured and transmitted to archive server 138 for web session 158 by identifying and capturing the changes in webpage 162 instead of the entire webpage 162. Capture agent 110 can also reduce processing and network utilization by representing large amounts of data with content identifiers 178.
  • FIG. 4 depicts a process for intelligently capturing DOM events.
  • the capture agent may detect a DOM event.
  • the DOM event can be associated with any event, change, or data in a web session.
  • the DOM event may comprise loading a new webpage into a web browser, a change in information displayed in the webpage, a user input, a change in HTML code in the webpage, a JavaScript control operation, an asynchronous HTTP request or response, or the like, or any combination thereof.
  • the capture agent may identify the DOM changes in the webpage. For example, the web browser may have downloaded a new webpage from a website and the capture agent may determine that all of the DOM items in the new webpage need to be captured. In another example, the capture agent may determine that only one, or a few, DOM items changed within a particular DOM subtree of the webpage. The capture agent then may only need to capture the DOM changes in the identified DOM subtree. As explained above, a JavaScript DOMSUBTREE MODIFIED message may be examined in the webpage to identify the DOM changes.
  • the capture agent may determine if any of the identified DOM changes can be represented by a content identifier.
  • As explained above, many DOM changes in a web page may comprise known content that may be repeatedly displayed to different users.
  • the capture agent can be preprogrammed to identify the known DOM changes in operation 204. For example, the capture agent may look for a particular word combination, image name, data format, etc.
  • the capture agent may generate a content identifier or code word that represents the known DOM changes and send the content identifier to the archive server. During replay, the content identifier is replaced with the actual content for the DOM change that was previously displayed on the webpage during the web session.
  • the capture agent may not be able to represent the DOM change with a content identifier.
  • the webpage may display unique bank account information for the user.
  • the capture agent may capture the DOM change and send the captured DOM change to the archive server.
  • the capture agent may use the DOMSUBTREE MODIFIED object to identify a DOM value.
  • the identified DOM value may be copied and sent along with an associated action identifier, time stamp identifier, and location identifier to the archive server.
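The process of FIG. 4 can be approximated by comparing two snapshots of the page and encoding each change. This is a simplified sketch: the snapshot format (a mapping from location to text), the `capture_changes` function, and the record fields are hypothetical, and the comparison stands in for whatever change notification the browser provides.

```python
# Hypothetical table of known content and its content identifiers.
KNOWN_CONTENT_IDS = {
    "SORRY! WE ARE CURRENTLY OUT OF STOCK FOR THE ITEM YOU SELECTED": "178",
}


def capture_changes(before, after, now):
    """Compare two {location: text} snapshots and build capture records.

    Only added or modified items are handled; removed items are ignored
    to keep the sketch short.
    """
    records = []
    for location, text in after.items():
        if before.get(location) == text:
            continue  # unchanged DOM item: nothing to capture
        record = {"action": "update", "timestamp": now, "location": location}
        if text in KNOWN_CONTENT_IDS:
            # Known content is represented by its identifier.
            record["content_id"] = KNOWN_CONTENT_IDS[text]
        else:
            # Unknown content (e.g. account data) is captured in full.
            record["value"] = text
        records.append(record)
    return records


before = {"164A": "SELECT BUY BUTTON TO PURCHASE ITEM", "164B": "[BUY]"}
after = {
    "164A": "SORRY! WE ARE CURRENTLY OUT OF STOCK FOR THE ITEM YOU SELECTED",
    "164B": "[BUY]",
}
records = capture_changes(before, after, now=12.5)
```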
  • FIG. 5 depicts an example of a capture system configured to dynamically capture
  • user 126 may generate inputs 170 that initiate network events 135.
  • web browser 104 may send different network requests 160 and 172 to web application 122 and receive back responses 162 and 174, respectively.
  • User 126 may generate other inputs 170 that may only result in local changes in webpage 162 without initiating network events 135.
  • Web session 158 may normally proceed in a particular sequence with an associated timing.
  • a first DOM event in web session 158 may start with a user initiating HTTP request 160.
  • a second DOM event may comprise web browser 104 loading web page 162. There may be a delay while web browser 104 loads web page 162 and user 126 reviews webpage 162.
  • a third event may comprise web browser 104 receiving another user input 170.
  • a fourth event may comprise sending network request 172 from web browser 104 to web application 122 requesting additional information.
  • a fifth DOM event may comprise web browser 104 receiving response 174 back from web application 122 containing webpage updates. Capture agent 110 and/or session analyzer 140 may monitor the sequence and timing of these DOM events 220.
  • capture agent 110 may capture a current state of webpage 162.
  • capture agent 110 may capture all of DOM items 164 within webpage 162 and send the captured DOM items to archive server 138. The state for webpage 162 can then be reproduced during replay to better identify possible problems that might have happened during original web session 158 when the irregularity was originally detected.
  • capture agent 110 may send captured DOM events 220 to archive server 138.
  • Session analyzer 140 may monitor the sequence and timing for DOM events 220. Based on empirical data obtained from prior web sessions, session analyzer 140 may identify irregularities in the sequence and/or timing of DOM events 220. If an irregularity is detected, session analyzer 140 may send a control message 222 instructing capture agent 110 to capture a current DOM state of webpage 162. Control message 222 may dynamically direct capture agent 110 to capture specific
  • control message 222 may direct capture agent 110 to capture content in a particular data field when the irregularity is associated with a user response.
  • session analyzer 140 may direct capture agent 110 to capture network or client computer processing or memory capacity information when the irregularity is associated with a slower than normal sequence of DOM events.
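Timing-based irregularity detection of this kind can be sketched as a comparison between observed gaps and expected limits. The event names, the per-event limits (assumed to come from prior sessions), and the shape of the control message below are all hypothetical.

```python
def find_irregularities(events, expected_max_gap):
    """Return names of events that arrived later than expected.

    events: list of (name, timestamp_seconds) in arrival order.
    expected_max_gap: {name: max seconds since the previous event},
    assumed to be derived from prior web sessions.
    """
    late = []
    for (_, prev_time), (name, time) in zip(events, events[1:]):
        limit = expected_max_gap.get(name)
        if limit is not None and time - prev_time > limit:
            late.append(name)
    return late


def control_message_for(late_events):
    """Build a hypothetical control message asking for a full DOM capture."""
    if late_events:
        return {"command": "capture_dom_state", "reason": late_events}
    return None


events = [
    ("request_160", 0.0),
    ("page_load_162", 0.8),
    ("user_input_170", 5.0),
    ("request_172", 5.1),
    ("response_174", 19.1),  # arrives 14 seconds after the request
]
limits = {"page_load_162": 3.0, "response_174": 3.0}
message = control_message_for(find_irregularities(events, limits))
```

User think time is deliberately given no limit here, so a long pause before `user_input_170` would not by itself trigger a capture.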
  • capture agent 110 may send metadata 224 along with captured DOM events 220.
  • Metadata 224 provides additional descriptive information about webpage 162. Typical metadata 224 might include keywords, a description, author, date of update, or other information describing webpage 162. Other metadata may identify different types of data, such as images within webpage 162.
  • Session analyzer 140 may dynamically determine what DOM events 220 to capture in webpage 162 based on metadata 224.
  • metadata 224 may identify an image in webpage 162 that repeatedly changes. The image may comprise a face where the eyes on the face continue to change directions. Session analyzer 140 may determine that these changes in the image data do not need to be repeatedly captured and may send control message 222 directing capture agent 110 not to capture the image changes.
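A metadata-driven filter like the one in the image example might look as follows. The metadata fields (`type`, `change_count`), the threshold, and both function names are assumptions introduced for this sketch.

```python
def build_capture_filter(metadata, max_changes=3):
    """Return the set of item ids whose changes should not be captured.

    An image that has already changed more than max_changes times is
    treated as an animation whose further changes add no replay value.
    """
    return {
        item_id
        for item_id, info in metadata.items()
        if info["type"] == "image" and info["change_count"] > max_changes
    }


def filter_events(events, ignored):
    """Drop DOM events that belong to ignored items."""
    return [event for event in events if event["item"] not in ignored]


metadata = {
    "img_face": {"type": "image", "change_count": 40},
    "img_logo": {"type": "image", "change_count": 0},
    "p_price": {"type": "text", "change_count": 12},
}
ignored = build_capture_filter(metadata)
events = [
    {"item": "img_face", "value": "eyes-left.png"},
    {"item": "p_price", "value": "$499"},
]
kept = filter_events(events, ignored)
```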
  • capture agent 110 may monitor the sequence and timing of DOM events 220 and autonomously determine what DOM events to capture based on the sequence and timing of DOM events 220. Similarly, capture agent 110 may autonomously determine what DOM events to capture based on metadata 224 for webpage 162.
  • session analyzer 140 may send an artificial stimulation of the DOM, herein referred to as a tickle, in control message 222.
  • the DOM tickle may force a DOM change in webpage 162.
  • Session analyzer 140 could send a first control message 222 directing capture agent 110 to capture a current DOM state of webpage 162.
  • Session analyzer 140 could then send the DOM tickle forcing a known DOM change in webpage 162.
  • the DOM tickle may comprise a user input, content, or logic for insertion into webpage 162 that should generate a known response in webpage 162.
  • session analyzer 140 may send a control message 222 directing capture agent 110 to capture a complete DOM state for webpage 162 and/or may attach an error message to captured web session data 142.
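The DOM tickle can be sketched as applying a known stimulus and checking for the known response. The page model (a state dictionary plus a handler function), the `apply_tickle` function, and both handlers are hypothetical stand-ins for a real webpage.

```python
def apply_tickle(page, tickle, expected):
    """Apply a known stimulus and report whether the page reacted as expected."""
    before = dict(page["state"])
    page["handler"](page["state"], tickle)
    # Collect only the entries that the stimulus actually changed.
    changed = {k: v for k, v in page["state"].items() if before.get(k) != v}
    return changed == expected


def healthy_handler(state, stimulus):
    state["greeting"] = stimulus.upper()


def broken_handler(state, stimulus):
    pass  # simulates a page whose script no longer responds


healthy = {"state": {"greeting": "HELLO"}, "handler": healthy_handler}
broken = {"state": {"greeting": "HELLO"}, "handler": broken_handler}

ok = apply_tickle(healthy, "world", {"greeting": "WORLD"})
needs_full_capture = not apply_tickle(broken, "world", {"greeting": "WORLD"})
```

When the expected change does not appear, the sketch flags the page for a complete DOM state capture, mirroring the control message described above.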
  • FIG. 6 depicts an example of a process for capturing DOM events for a web session.
  • Either the capture agent and/or the session analyzer may perform the operations described below.
  • In operation 240, a sequence of DOM events is monitored, and in operation 242, the timing between the DOM events is monitored. For example, network events, user events, and state changes in the webpage are monitored.
  • certain DOM events may prompt a checkpoint operation.
  • a new webpage loaded into the web browser may need to be captured and therefore identified for a checkpoint operation.
  • a webpage rendered by the web browser for some period of time may be identified as a checkpoint operation.
  • the capture agent may capture a current DOM state of the webpage in response to an identified check point operation in operation 244.
  • Capturing the current DOM state may comprise capturing some or all of the HTML, CSS and/or JavaScript within the webpage. Capturing the entire DOM state of the webpage allows resynchronization of the web session during replay. For example, missed DOM events may prevent the replay engine from reproducing the same states that happened during the original web session. The DOM state captured during the checkpoint operation can be replayed to force a previous state of the original web session. Otherwise, a missed DOM event could prevent the replay engine from accurately reproducing any subsequent web session states.
  • the sequence and timing of DOM events may be analyzed to identify any other unusual web session behavior. For example, an unusually long time gap may exist between two DOM events. In another example, no transition may exist between a first DOM event and a normally expected second DOM event.
  • the unusual sequence or timing of DOM events may be caused by client computer problems, network problems, web application problems, or user problems. If an unusual web session sequence or timing is detected in operation 246, the current DOM state also may be captured in its entirety in operation 248.
  • FIG. 7 depicts an example of a process for capturing DOM events based on webpage metadata.
  • a DOM event may be detected. For example, an image may change within the webpage.
  • the metadata for the webpage may be analyzed and DOM events may be captured based on the metadata.
  • operation 264 may capture the entire webpage based on the metadata.
  • the metadata may indicate that the webpage has been rendered for a particular amount of time or that a particular network condition happened that requires capture or recapture of the entire webpage.
  • FIG. 8 depicts an example of a DOM tree structure 300 for a webpage 290 displayed by a web browser 104.
  • DOM tree structure 300 may comprise a top level window 302 and a sublevel document 304.
  • Document 304 may comprise a body 306 with paragraphs 308 and 314.
  • Paragraph 308 may comprise the text HELLO and paragraph 314 may comprise the text BUY CDS.
  • Control code 310 may monitor for a selection of paragraph 308. For example, control code 310 may detect a user mouse click or keystroke selection 320 on the HELLO text of paragraph 308.
  • Capture code 312 may be inserted into webpage 290 to capture the user input 320. Capture code 312 and other JavaScript code may be embedded in multiple different sections of DOM tree structure 300. Hostile code in webpage 290 may prevent user input 320 from propagating to a top level of DOM tree structure 300 and prevent capture code 312 from capturing user input 320. Accordingly, no captured events 322 are sent to archive server 138. Missed user input 320 may prevent the replay engine from generating the correct states for webpage 290 and prevent accurate simulation of the web session.
  • FIG. 9 depicts an example of code used by a capture agent to more effectively capture DOM events.
  • a DOMSUBTREE MODIFIED JavaScript message 326 may be located in DOM tree structure 300 and may be configured to detect DOM changes. Instead of monitoring for input 320, object 326 detects changes in DOM tree structure 300. This allows the capture agent to capture DOM events that may not have otherwise been captured.
  • User input 320 may cause text in paragraph 308 to change from HELLO to WORLD. Even without capturing user input 320, object 326 still may capture the result of user input 320, namely, the change in paragraph 308 from HELLO to WORLD. The change in paragraph 308 is captured as DOM event 324 and sent to archive server 138. During replay, the replay engine may come to the web session state where user input 320 was previously entered by the user.
  • The DOMSUBTREE MODIFIED JavaScript message 326 is also located at a higher level of the DOM tree structure 300 and therefore may be less invasive and easier to examine within webpage 290.
  • Object 326 may be used in combination with capture code 312. For example, code 312 still may try to capture user input 320 and the replay engine still may try to generate the next webpage state by applying captured user input 320 to webpage 290. Object 326 may operate as a backup mechanism in case user input 320 is not successfully captured. In another example, some or all of the original web session may be captured using only object 326 and code 312 may not be embedded into associated portions of some webpages. In this example, the replay engine may simulate the different webpage states solely by replaying captured DOM events 324.
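  • Replaying a web session solely from captured DOM changes may be illustrated with a plain-object stand-in for the DOM tree of FIG. 8. The state layout, the `{ path, value }` change format, and the function names below are assumptions made for this sketch; a real replay engine would operate on an actual browser DOM.

```javascript
// Applies one captured DOM change to a nested plain-object model of the page.
// change.path is a non-empty list of keys leading to the changed node (assumed format).
// Returns a new state object and leaves the input state untouched.
function applyDomChange(state, change) {
  const [head, ...rest] = change.path;
  if (rest.length === 0) {
    return { ...state, [head]: change.value };
  }
  const child = applyDomChange(state[head] || {}, { path: rest, value: change.value });
  return { ...state, [head]: child };
}

// Replays captured DOM changes in order, without any captured user input.
function replayDomChanges(initialState, changes) {
  return changes.reduce(applyDomChange, initialState);
}

// Stand-in for the webpage of FIG. 8: paragraph 308 reads HELLO, paragraph 314 reads BUY CDS.
const originalPage = { body: { p308: "HELLO", p314: "BUY CDS" } };

// The click itself was never captured; only its effect on paragraph 308 was.
const capturedChanges = [{ path: ["body", "p308"], value: "WORLD" }];

const replayedPage = replayDomChanges(originalPage, capturedChanges);
```

  • Here `replayedPage.body.p308` holds WORLD even though the user input that caused the change is absent from the captured data, which is the backup behavior described above.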
  • FIG. 10 depicts an example process for capturing DOM changes.
  • an object monitors a DOM tree structure for an electronic document.
  • DOM changes are identified within the DOM tree structure.
  • the DOMSUBTREE MODIFIED JavaScript message may identify any changes in the DOM tree structure of the webpage and identify the specific subtree and value for the DOM change.
  • the identified portion of the DOM tree structure is captured and in operation 356 the captured DOM change is sent to the archive server.
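  • The process of FIG. 10 may be approximated outside a browser by comparing two snapshots of a tree and reporting the subtree path and new value of each difference. This is a sketch under stated assumptions: the snapshots are nested plain objects, and `diffTrees`, `captureDomChanges`, and the `send` callback are hypothetical names. In a browser, a capture agent would more likely rely on a mutation event or the standard `MutationObserver` interface than on snapshot comparison.

```javascript
// True for nested nodes, false for leaf values such as text.
function isBranch(value) {
  return value !== null && typeof value === "object";
}

// Returns a list of { path, value } records, one per changed subtree.
function diffTrees(before, after, path = []) {
  if (isBranch(before) && isBranch(after)) {
    const keys = new Set([...Object.keys(before), ...Object.keys(after)]);
    let changes = [];
    for (const key of keys) {
      changes = changes.concat(diffTrees(before[key], after[key], [...path, key]));
    }
    return changes;
  }
  // Leaves (or a leaf replaced by a branch, or a removed node): record the new value.
  return before === after ? [] : [{ path, value: after }];
}

// Identifies the DOM changes and hands each one to `send`, which stands in
// for the transfer to the archive server.
function captureDomChanges(before, after, send) {
  const changes = diffTrees(before, after);
  changes.forEach((change) => send(change));
  return changes.length;
}
```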
  • FIG. 11 depicts an example of a webpage 370 that exchanges network events with different websites.
  • a user may initiate a request 380 to a first website.
  • the first website may provide a response that includes webpage 370.
  • Portions of webpage 370 may include links to third party websites.
  • An XML HTTP command 374 may initiate a request 382 to a second website.
  • the second website may display an advertisement within webpage 370.
  • Web browser 104 may send request 382 to the second website in response to selection of the advertisement and may receive responses back from the second website in response to request 382.
  • the responses may comprise additional information for displaying within webpage 370 or may comprise a new home webpage for the second website.
  • a network session monitor server may capture network events 380 exchanged between web browser 104 and the first website.
  • the monitor server may reduce an amount of processing required by a capture agent embedded in webpage 370 for capturing DOM events. For example, the processing required for capturing request 380 and subsequent response from the first website can be offloaded to the session monitor.
  • a network session monitor server is described in U.S. Patent No. 8,127,000 entitled: METHOD AND APPARATUS FOR MONITORING AND SYNCHRONIZING USER INTERFACE EVENTS WITH NETWORK DATA, issued Feb 28, 2012, which has been incorporated by reference.
  • monitoring servers may not be able to capture the third party network events, such as request 382 and responses to request 382.
  • some monitoring servers may be located at the first website that supplies webpage 370 and may not have authorization or the ability to monitor network traffic to and from the third party website receiving request 382.
  • FIG. 12 depicts an example of how a local capture agent may more efficiently capture DOM events associated with third party websites.
  • Network request monitoring object 384 may be configured to detect network requests 380 and 382. For example, a PROTOTYPE XML HTTP REQUEST object may identify HTTP requests to different websites.
  • Changes to webpage 370 from the responses to requests 380 and 382 may be captured by DOMSUBTREE MODIFIED object 326.
  • DOM events may be filtered based on the identified network events 380 and 382.
  • object 384 may identify the Universal Resource Locator (URL), protocol, and/or payload contained in network requests 380 and 382.
  • DOM events 390 may be captured based on which requests and responses are associated with third party websites. For example, only network events 382 associated with third party websites may be captured and sent to archive server 138. Other network events associated with the primary website associated with webpage 370 and network request 380 may be captured by a network session monitor server as described above. This reduces the amount of processing and network bandwidth used by the capture agent, because only the network events associated with the third party websites are captured.
  • DOM events may be captured for both the primary website and the third party websites.
  • The capture agent may selectively choose which DOM events to capture based on requests 380 and 382.
  • empirical data may indicate some DOM events associated with third party websites may not be significant when replaying the web session. Accordingly, some of the requests, responses, and other webpage content exchanged with the third party website may be filtered and not captured by the capture agent as part of captured DOM events 390.
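  • Detecting network requests and separating third party traffic from primary website traffic may be sketched by wrapping whatever function actually sends a request. The wrapper below is illustrative only: `monitorRequests`, `isThirdParty`, the `(url, payload)` signature, and the example hostnames are assumptions, and the comparison of URL origins is one possible definition of "third party".

```javascript
// Returns true when requestUrl targets a different origin than the primary website.
// Relative URLs are resolved against the primary origin and therefore count as primary.
function isThirdParty(requestUrl, primaryOrigin) {
  return new URL(requestUrl, primaryOrigin).origin !== new URL(primaryOrigin).origin;
}

// Wraps a request-sending function so that third party requests are reported to
// onThirdParty before being sent. Requests to the primary website pass through
// unreported, on the assumption that a network session monitor captures them.
function monitorRequests(sendRequest, primaryOrigin, onThirdParty) {
  return function monitoredSend(url, payload) {
    if (isThirdParty(url, primaryOrigin)) {
      onThirdParty({ url, payload });
    }
    return sendRequest(url, payload);
  };
}
```

  • Every request is still sent unchanged; only the decision about what to capture differs between the primary website and third party websites.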
  • FIG. 13 depicts an example of a process for capturing data based on network events.
  • network requests may be detected by a capture agent.
  • an object may identify network requests sent to a primary website and detect network requests sent to third party websites.
  • The object may identify responses to the network requests, such as the webpage and updates provided by the primary website and the additional webpage information and other webpages provided by the third party websites.
  • the capture agent may filter the content in the network requests and network responses.
  • the object may identify URLs, protocols, and/or payloads in the network requests and network responses.
  • the URLs may identify the websites associated with the network requests and network responses and the protocols and payloads may identify the types of data contained in the network requests and network responses.
  • The DOM events are captured by the capture agent based on the associated website and the types of associated data.
  • the selected network data is captured and sent to the archive server.
  • The capture agent selectively captures not only network traffic exchanged with the primary website but also selectively captures network traffic from third party websites.
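  • The filtering step of FIG. 13 may be sketched as a function that derives the website and protocol from each URL and then keeps only the network events that match a set of rules. The rule format (`protocols`, `ignoredHosts`, `payloadTypes`), the `payloadType` field, and the hostnames are hypothetical; how such rules would be derived, for example from empirical data, is outside this sketch.

```javascript
// Adds the host and protocol identified from the URL to a network event.
// Each event is assumed to carry an absolute `url` and a `payloadType` string.
function describeNetworkEvent(event) {
  const parsed = new URL(event.url);
  return { ...event, host: parsed.hostname, protocol: parsed.protocol };
}

// Keeps only the events whose protocol and payload type are of interest and
// whose website is not on the ignore list.
function selectNetworkEvents(events, rules) {
  return events
    .map(describeNetworkEvent)
    .filter(
      (event) =>
        rules.protocols.has(event.protocol) &&
        !rules.ignoredHosts.has(event.host) &&
        rules.payloadTypes.has(event.payloadType)
    );
}

// Example rules: capture JSON and HTML over HTTPS, but skip a tracking host.
const exampleRules = {
  protocols: new Set(["https:"]),
  ignoredHosts: new Set(["tracker.example.org"]),
  payloadTypes: new Set(["application/json", "text/html"]),
};
```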
  • FIG. 14 depicts an example of a replay operation performed by a replay engine.
  • captured web session data may be identified for replaying a previous web session.
  • an operator for a website may select a file of previously captured DOM events for the original web session.
  • the replay engine may identify content identifiers in the captured web session data.
  • a capture agent may have detected known DOM events during the original web session and sent content identifiers to the archive server instead of the actual DOM events.
  • the replay engine in operation 424 may locate the DOM events associated with the content identifiers.
  • the replay engine may reference a table that associates the content identifiers with the text, images, control data, etc. that was previously identified during the original web session.
  • The replay engine may replay the DOM events in the same manner in which they occurred during the original web session.
  • the DOM events may have associated actions, time stamps, and locations within the webpage.
  • the replay engine may replay the DOM events according to the associated actions, in a sequence according to the associated time stamps, and at the locations in a webpage according to the associated locations.
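  • The replay steps above may be sketched as building an ordered schedule from the captured data: content identifiers are resolved through a lookup table and the resulting DOM events are sorted by time stamp. The event fields (`action`, `location`, `timestamp`, `content`, `contentId`) and the use of a `Map` as the table are assumptions for illustration.

```javascript
// Resolves content identifiers and orders the DOM events for replay.
// captured: events that carry either literal `content` or a `contentId`.
// contentTable: Map from content identifier to the content it represents.
function buildReplaySchedule(captured, contentTable) {
  return captured
    .map((event) => {
      if (event.contentId === undefined) {
        return event; // the actual DOM event content was captured directly
      }
      if (!contentTable.has(event.contentId)) {
        throw new Error(`Unknown content identifier: ${event.contentId}`);
      }
      const { contentId, ...rest } = event;
      return { ...rest, content: contentTable.get(contentId) };
    })
    .sort((a, b) => a.timestamp - b.timestamp);
}
```

  • A replay engine could then walk the schedule in order, applying each action at its location in the webpage and waiting for the difference between consecutive time stamps.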
  • Replay of captured web sessions is described in U.S. Patent No. 8,042,055 entitled: REPLAYING CAPTURED NETWORK INTERACTIONS, issued October 18, 2011; and U.S. Patent No. 8,127,000 entitled: METHOD AND APPARATUS FOR MONITORING AND SYNCHRONIZING USER INTERFACE EVENTS WITH NETWORK DATA, issued Feb 28, 2012, which are both herein incorporated by reference.
  • Embodiments of this disclosure may be implemented in a digital computing system, for example a CPU or similar processor. More specifically, the term "digital computing system" can mean any system that includes at least one digital processor and associated memory, wherein the digital processor can execute instructions or "code" stored in that memory. (The memory may store data as well.)
  • A digital processor includes, but is not limited to, a microprocessor, multi-core processor, Digital Signal Processor (DSP), Graphics Processing Unit (GPU), processor array, network processor, etc.
  • a digital processor (or many of them) may be embedded into an integrated circuit. In other arrangements, one or more processors may be deployed on a circuit board (motherboard, daughter board, rack blade, etc.).
  • Embodiments of the present disclosure may be variously implemented in a variety of systems such as those just mentioned and others that may be developed in the future. In a presently preferred embodiment, the disclosed methods may be implemented in software stored in memory, further defined below.
  • Digital memory may be integrated together with a processor, for example Random Access Memory (RAM) or FLASH memory embedded in an integrated circuit Central Processing Unit (CPU), network processor or the like.
  • the memory comprises a physically separate device, such as an external disk drive, storage array, or portable FLASH device.
  • the memory becomes "associated" with the digital processor when the two are operatively coupled together, or in communication with each other, for example by an I/O port, network connection, etc. such that the processor can read a file stored on the memory.
  • Associated memory may be "read only" by design (ROM) or by virtue of permission settings, or not.
  • Other examples include but are not limited to WORM, EPROM, EEPROM, FLASH, etc.
  • Computer-readable storage medium includes all of the foregoing types of memory, as well as new technologies that may arise in the future, as long as they are capable of storing digital information in the nature of a computer program or other data, at least temporarily, in such a manner that the stored information can be "read" by an appropriate digital processor.
  • The term "computer-readable" is not intended to limit the phrase to the historical usage of "computer" to imply a complete mainframe, mini-computer, desktop or even laptop computer.
  • Such media may be any available media that is locally and/or remotely accessible by a computer or processor, and it includes both volatile and non-volatile media, removable and non-removable media, embedded or discrete.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A monitoring system intelligently captures Document Object Model (DOM) events. The DOM events may provide state information that cannot usually be captured during a web session. To reduce processing bandwidth, content identifiers may be used to represent some DOM events. Checkpoints may be identified during the web session and a current state of the webpage may be captured to provide replay synchronization. Different data may be captured based on a sequence and timing of the DOM events during the original web session. Data exchanged with third party websites may also be selectively captured to provide a more direct simulation of the original web session.
PCT/US2013/023636 2012-03-13 2013-01-29 Method and apparatus for intelligent capture of document object model events WO2013137982A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/419,179 2012-03-13
US13/419,179 US8868533B2 (en) 2006-06-30 2012-03-13 Method and apparatus for intelligent capture of document object model events

Publications (2)

Publication Number Publication Date
WO2013137982A1 (fr) 2013-09-19
WO2013137982A4 (fr) 2013-11-07

Family

ID=47790491

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2013/023636 WO2013137982A1 (fr) 2012-03-13 2013-01-29 Method and apparatus for intelligent capture of document object model events

Country Status (1)

Country Link
WO (1) WO2013137982A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010019258A1 (fr) * 2008-08-14 2010-02-18 Tealeaf Technology, Inc. Method and system for communication between a client system and a server system
US8042055B2 (en) 2007-08-31 2011-10-18 Tealeaf Technology, Inc. Replaying captured network interactions

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10498842B2 (en) 2015-07-13 2019-12-03 SessionCam Limited Methods for recording user interactions with a website
US10324600B2 (en) 2015-07-27 2019-06-18 Adp, Llc Web page generation system
US10417317B2 (en) 2015-07-27 2019-09-17 Adp, Llc Web page profiler
US10742764B2 (en) 2015-07-27 2020-08-11 Adp, Llc Web page generation system
CN110098955A (zh) * 2019-04-01 2019-08-06 烽火通信科技股份有限公司 Method and system for flexible configuration of simulation modes for simulated services
CN112887381A (zh) * 2021-01-15 2021-06-01 中国地质大学(武汉) Method and device for detecting and aggregating new content for specific network portals
US11487931B1 (en) * 2021-10-18 2022-11-01 International Business Machines Corporation Replaying a webpage based on virtual document object model
WO2023066063A1 (fr) * 2021-10-18 2023-04-27 International Business Machines Corporation Replaying a webpage based on virtual document object model

Also Published As

Publication number Publication date
WO2013137982A4 (fr) 2013-11-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13707471

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13707471

Country of ref document: EP

Kind code of ref document: A1