WO2017131745A1 - Extracting task flows from application usage records

Info

Publication number: WO2017131745A1
Application number: PCT/US2016/015655
Authority: WO
Inventors: Abhinav PARATE; Kyu-Han Kim
Original assignee: Hewlett Packard Enterprise Development Lp
Priority date: 2016-01-29
Filing date: 2016-01-29
Publication date: 2017-08-03

Abstract

In one example, a method to extract task flows is described. The method may include determining a navigation graph of an application, calculating at least one navigation hub in the navigation graph, identifying task starter pages in the navigation graph based upon links from the at least one navigation hub in the navigation graph, segmenting usage records of the application into sequences based upon the at least one navigation hub and the task starter pages, determining task finisher pages in the sequences, and extracting task flows from the sequences based upon the task finisher pages that are determined.

Description

EXTRACTING TASK FLOWS FROM APPLICATION USAGE RECORDS

BACKGROUND

[0001] Many applications collect large amounts of data capturing how users interact with such applications. For example, an Internet browser application or analytics software may collect log records regarding web pages retrieved from the Internet and may mine the log records to determine frequent user navigation patterns.

BRIEF DESCRIPTION OF THE DRAWINGS

[0002] FIG. 1 illustrates a block diagram of an example navigation graph of an application;

[0003] FIG. 2 illustrates an example of extracting task flows from usage records of an application;

[0004] FIG. 3 illustrates an example method to determine task flows from session sequence records;

[0005] FIG. 4 illustrates an additional example method to separate task flows from session sequence records;

[0006] FIG. 5 illustrates an example network or system of the present disclosure;

[0007] FIG. 6 illustrates a flowchart of an example method to extract task flows;

[0008] FIG. 7 illustrates a flowchart of an example method to generate a plurality of task flows of a user session; and

[0009] FIG. 8 illustrates a high-level block diagram of an example computer that can be transformed into a machine capable of performing the functions described herein.

DETAILED DESCRIPTION

[0010] In one example, the present disclosure describes a device, method, and non-transitory computer-readable medium to extract task flows. For example, a device may determine a navigation graph of an application, calculate at least one navigation hub in the navigation graph, identify task starter pages in the navigation graph based upon links from the at least one navigation hub in the navigation graph, segment usage records of the application into sequences based upon the at least one navigation hub and the task starter pages, determine task finisher pages in the sequences, and extract task flows from the sequences based upon the task finisher pages that are determined.

[0011] In another example, the present disclosure describes a device, method, and non-transitory computer-readable medium to extract task flows. For example, a device may identify at least one navigation hub in a navigation graph of an application based upon a betweenness centrality metric of the at least one navigation hub, identify task starter pages in the navigation graph based upon links from the at least one navigation hub in the navigation graph, segment usage records of the application into sequences based upon the at least one navigation hub and the task starter pages, determine task finisher pages in the sequences, and extract task flows from the sequences based upon the task finisher pages that are determined.

[0012] In another example, the present disclosure describes a device, method, and non-transitory computer-readable medium to generate a plurality of task flows of a user session. For example, a device may receive usage records comprising a sequence of pages of an application utilized during a user session, segment the sequence of pages based upon at least one navigation hub of the application to generate a first plurality of intermediate sequences, segment at least one of the first plurality of intermediate sequences based upon at least one task starter page of the application, the at least one task starter page accessible via a direct link from the at least one navigation hub to generate a second plurality of intermediate sequences, and segment at least one of the second plurality of intermediate sequences based upon a page backtracking in the at least one of the second plurality of intermediate sequences to generate a plurality of task flows of the user session.

[0013] Application usage records may include time-series data comprising various actions that users perform on an application, e.g., an application for a mobile device. Examples of the present disclosure extract sequences from the application usage records where each sequence captures a task flow. Thus, each task flow represents actions taken by a user from the beginning of a task to the end of the task. In one example, the present disclosure may generate a navigation graph of an application, and determine navigation hub pages, task starter pages, and task finisher pages in an automated manner. In addition, in one example, the present disclosure may separate different task flows within a sequence of usage records by segmenting the usage records based upon navigation hub pages and task starter pages, and further based upon when the usage records are indicative of a backtracking within the navigation graph. Thus, examples of the present disclosure clearly identify the point where a user begins a task and the point where a user finishes or abandons a task.

[0014] Examples of the present disclosure may also be utilized for various purposes. For example, a user interface efficiency may be evaluated by recording the average time spent and the average number of actions performed for a user to complete a particular task as compared to the average time spent and the average number of actions performed for other tasks, or as compared to some other benchmark. In another example, task popularities may be determined by counting the number of times particular tasks are found in the usage records as compared to other tasks. In addition, examples of the present disclosure may provide more accurate records in an automated manner insofar as task flows comprising partial task execution paths are properly counted as instances of a particular task. In another example, a rate of task abandonment may be determined based upon a number of sequences in the usage records that do not include a task finisher page of a particular task execution path. These and other aspects of the present disclosure are described in greater detail below in connection with the example FIGs. 1-8.

[0015] FIG. 1 illustrates an example navigation graph 100 of an example application, e.g., a mobile application. As illustrated in FIG. 1, there are a number of nodes, or vertices 110, representing pages, or screens, of the application that may be visited by a user. The vertices 110 are interconnected with a number of links, or edges 120. Each edge 120 may represent a permissible transition between two pages of the application. For ease of illustration, only a few of the edges 120 in FIG. 1 are specifically labeled. In one example, the navigation graph 100 may represent a partial navigation graph of the application.

[0016] In accordance with the present disclosure, pages in an application may be organized in a hierarchical structure where a user starts from a home page or the first page of the application and navigates from the links available on that page. In this organization, some pages act as navigation hubs, providing links to various pages from where a user can begin executing one of a plurality of tasks. Such navigation hubs are often at the top of the hierarchy. Any page linked from the navigation hub is a "task starter page," a screen from where a user task begins. On the other hand, there are "task finisher pages" where a user task may end. A task finisher page is identified as the last page in the longest navigation path from the task starter page. Navigation graph 100 of FIG. 1 represents one example of such a hierarchy. For instance, the "Menu" and "Main Activity" vertices 110 may represent navigation hub pages, the "Flight Search," "Booked Flights," "Hotel Search," and "Flight Tracker" vertices 110 may represent task starter pages, and so on.

[0017] In one example, the present disclosure may automatically determine a navigation graph, or a partial navigation graph from usage records of the application, identify navigation hubs in the navigation graph, identify starter pages by following links from the navigation hubs, and identify task finisher pages and extract task flows from the usage records. For example, application usage records may comprise time-series data where each record is represented as a tuple <t, s, a> where "s" gives the name of the page, "a" gives the action taken on that page, and "t" gives the time when the action was taken. A user may take several actions on a page such as giving text input, selecting a check box, clicking on a button, and so on. In one example, the application usage records may also include an application launch action and an application exit action. In one example the application launch action is associated with a special page "Device Home" which is not part of the pages/screens of the application. In one example, for the application usage records of a particular user device, the application exit action may be followed by an application launch action, e.g., the next time the user utilizes the application.

[0018] In one example, a user session may be represented by the usage records of a user device between an application launch action and an application exit action. Since a user may take several actions on a page, there may be several usage records indicating actions on the same page before usage records indicating actions on a different page appear. Accordingly, in one example, duplicate screen records may first be removed. After this transformation, the modified usage records may comprise a time-ordered sequence of pages or screens, e.g., "s(1), s(2), s(3), ... , s(n)" where s(i) ≠ s(i+1) for each user in the application usage records. In one example, this sequence may be referred to as a "session sequence."
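The record-collapsing step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the disclosed implementation: the function name `to_session_sequence`, the action labels, and the sample records are hypothetical, and each record is assumed to be a (t, s, a) tuple belonging to a single user session.

```python
from itertools import groupby


def to_session_sequence(records):
    """Collapse consecutive records on the same page into a single entry.

    `records` is assumed to be a list of (t, s, a) tuples for one session:
    t = time of the action, s = page name, a = action taken on that page.
    """
    # Order by time so that "consecutive" means consecutive in time.
    ordered = sorted(records, key=lambda record: record[0])
    # groupby merges runs of adjacent records that share the same page name.
    return [page for page, _ in groupby(ordered, key=lambda record: record[1])]


# Hypothetical sample records for one session.
records = [
    (1, "Main Activity", "click"),
    (2, "Flight Search", "text_input"),
    (3, "Flight Search", "click"),
    (4, "Menu", "click"),
    (5, "Flight Search", "click"),
]
session = to_session_sequence(records)
print(session)
```

A page may still appear more than once in the result; only adjacent repeats are merged, so that s(i) ≠ s(i+1) holds.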

[0019] To determine a navigation graph, a data set "D" comprising a number of session sequences of one or more users may be utilized. For instance, in one example, a navigation graph may comprise a directed graph "G," represented as G(V,E) where "V" represents the set of vertices in the navigation graph, and "E" represents the set of directed edges in the navigation graph. Each vertex may correspond to a unique page/screen in the mobile application. In addition, a directed edge, e(u,v), is found from vertex "u" to vertex "v" when there exists an action "a" that leads to a direct transition from the page corresponding to the vertex "u" to the page corresponding to the vertex "v." An example session sequence is represented in FIG. 2, sequence 210, and is described in greater detail below.

[0020] In one example, data set "D" may be traversed to determine a navigation graph for the application. For instance, for each pair of sequential usage records in a session sequence in the data set "D," if it is the first time the page/screen indicated in the first usage record in the pair is encountered, a vertex "v1" for the page may be added to the set of vertices "V." If it is the first time the page/screen indicated in the second usage record in the pair is encountered, a vertex "v2" for the page may also be added to the set of vertices "V." In addition, if there is no edge, e(v1,v2), in the set of edges "E," then the edge "e" may be added to the set of edges "E." The process may be repeated for all or a portion of sequential pairs of usage records in one or more session sequences in the data set "D." For instance, after a first pair of sequential usage records is parsed, the second usage record in the first pair may become the first usage record in the next pair, and so on. After processing all or a portion of data set "D" in this manner, a navigation graph G(V,E) may be determined. An example navigation graph 100 may be visually represented as illustrated in FIG. 1. However, at this point it may not be clear which vertices 110 are navigation hubs, task starter pages, task finisher pages, and so forth.
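The pairwise traversal just described can be sketched in Python as below. This is an illustrative sketch only; the function name `build_navigation_graph` and the two sample session sequences are hypothetical, and the graph is assumed to be stored as a set of page names plus a set of (u, v) tuples.

```python
def build_navigation_graph(sessions):
    """Derive G(V, E) from session sequences by scanning sequential pairs."""
    vertices = set()
    edges = set()
    for sequence in sessions:
        # zip pairs each page with its successor; the second page of one
        # pair becomes the first page of the next pair.
        for first_page, second_page in zip(sequence, sequence[1:]):
            vertices.add(first_page)   # no effect if already present
            vertices.add(second_page)  # no effect if already present
            edges.add((first_page, second_page))  # directed edge e(v1, v2)
    return vertices, edges


# Hypothetical data set "D" with two session sequences.
sessions = [
    ["Main Activity", "Flight Search", "Menu", "Booked Flights"],
    ["Main Activity", "Booked Flights", "Flight Details"],
]
V, E = build_navigation_graph(sessions)
print(sorted(V))
print(sorted(E))
```

Using sets makes the "first time encountered" checks implicit, since adding an existing vertex or edge leaves the set unchanged.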

[0021] In one example, navigation hubs may be determined from a navigation graph using a "betweenness centrality metric" and a closeness centrality metric. Navigation hubs may be implemented in various ways in a user interface of an application. For example, there may be a home page with navigation links, there may be pull out navigation menus, e.g., a "navigation drawer," navigation tabs, and so forth. Some mobile applications may also provide a full or partial navigation menu that is accessible from all or a significant portion of the pages of the mobile application. Design patterns for navigation hubs may also evolve and change over time as new applications are launched and as applications are revised and updated. The present disclosure can detect navigation hubs of various types, without making any assumptions as to the particular implementation.

[0022] Notably, navigation hubs may tend to have a higher number of outgoing and incoming edges. However, this may be true for intermediate pages as well. For example, the "Flight Details" vertex 110 in FIG. 1 indicates that this page also has a high number of outgoing and incoming edges (and therefore has links to a large number of other pages). Navigation hubs may be distinguishable from intermediate screens with many edges insofar as navigation hub screens are among the most used pages and appear to be central in the navigation graph. The centrality of the navigation hubs may be understood from the following two observations: there may exist a path from a navigation hub to most of the vertices in the navigation graph, and any path from a vertex in a task flow to a vertex in some other task flow may go through a navigation hub. Thus, for a navigation hub, the average distance to each vertex in the navigation graph may be less than the same metric computed for other vertices in the navigation graph. In other words, navigation hubs are closest to all the vertices. The second implication of these observations is that if a navigation hub is removed from the navigation graph, it will remove a large number of shortest paths between any pair of vertices in the remaining navigation graph. These two defining features of a navigation hub may be quantified as a "closeness centrality metric" and a "betweenness centrality metric," respectively.

The betweenness centrality metric of a vertex "v" may be represented as:

$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \qquad \text{(Equation 1)}$$

Where C_B(v) is the betweenness centrality metric for vertex "v," σ_st(v) is the number of shortest paths from vertex "s" to vertex "t" that go through vertex "v," σ_st is the total number of shortest paths from vertex "s" to vertex "t," and where σ_st(v)/σ_st is summed over all possible pairs of vertices (s,t) in the navigation graph "G."

[0023] The closeness centrality metric of a vertex "v" may be represented as:

$$C_C(v) = \frac{|V|}{\sum_{y \neq v} d(y, v)} \qquad \text{(Equation 2)}$$

Where C_C(v) is the closeness centrality metric for vertex "v," |V| represents the total number of vertices in the navigation graph "G," d(y,v) is the distance of a vertex "y" from the vertex "v," and where |V| is divided by the sum of d(y,v) over all vertices "y" other than "v" in the navigation graph "G."
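Equations 1 and 2 can be evaluated on a small graph as in the Python sketch below. Several choices here are assumptions rather than part of the disclosure: distances are unweighted hop counts, d(y,v) is measured along directed edges from "v" to "y," unreachable vertices are skipped in the closeness sum, and betweenness is computed by simple all-pairs path counting rather than a faster published algorithm. The helper names and the four-page toy graph are hypothetical.

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u lies on a shortest path to w
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(vertices, adj):
    """Equation 1: sum of sigma_st(v) / sigma_st over pairs with s != v != t."""
    info = {s: bfs_counts(adj, s) for s in vertices}
    cb = {v: 0.0 for v in vertices}
    for s in vertices:
        dist_s, sigma_s = info[s]
        for t in vertices:
            if t == s or t not in dist_s:
                continue
            for v in vertices:
                if v in (s, t) or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                # v is on a shortest s-t path iff d(s,v) + d(v,t) == d(s,t).
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    cb[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return cb


def closeness(vertices, adj):
    """Equation 2: |V| divided by the summed distance to the other vertices."""
    cc = {}
    for v in vertices:
        dist, _ = bfs_counts(adj, v)
        total = sum(d for y, d in dist.items() if y != v)
        cc[v] = len(vertices) / total if total else 0.0  # assumption: 0 if isolated
    return cc


# Hypothetical toy navigation graph.
adj = {
    "Menu": ["Flight Search", "Booked Flights"],
    "Flight Search": ["Menu"],
    "Booked Flights": ["Flight Details"],
    "Flight Details": ["Menu"],
}
vertices = list(adj)
cb = betweenness(vertices, adj)
cc = closeness(vertices, adj)
print(cb)
print(cc)
```

In this toy graph the "Menu" page scores highest on both metrics, which matches the intuition that a hub lies on most shortest paths and is close to every other page.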

[0024] In one example, vertices in navigation graph "G" with a betweenness centrality metric or a closeness centrality metric that exceeds a threshold may be determined to be navigation hubs. In one example, a joint score for a vertex combining the betweenness centrality metric and the closeness centrality metric may be calculated, e.g., C_B(v) + C_C(v). Accordingly, in one example, if the joint score of a vertex exceeds a threshold, the vertex may be determined to be a navigation hub. Returning to the example navigation graph 100 of FIG. 1, it may be determined that the "Main Activity" and "Menu" vertices 110 are the navigation hubs.
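The joint-score test can be illustrated with the short Python sketch below. The centrality values, the threshold of 4.0, and the function name `find_navigation_hubs` are all hypothetical and chosen only for illustration. The text does not state whether the two metrics are rescaled before being added, so this sketch adds them as given.

```python
def find_navigation_hubs(betweenness, closeness, threshold):
    """Return vertices whose joint score C_B(v) + C_C(v) exceeds the threshold."""
    return {
        vertex
        for vertex in betweenness
        if betweenness[vertex] + closeness[vertex] > threshold
    }


# Hypothetical precomputed centrality values for four pages.
betweenness = {
    "Main Activity": 6.0,
    "Menu": 5.0,
    "Flight Details": 2.0,
    "Flight Search": 0.5,
}
closeness = {
    "Main Activity": 1.0,
    "Menu": 0.9,
    "Flight Details": 0.6,
    "Flight Search": 0.5,
}
hubs = find_navigation_hubs(betweenness, closeness, threshold=4.0)
print(sorted(hubs))
```

Here "Flight Details" is not selected even though it has many edges, because its joint score stays below the assumed threshold.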

[0025] As described above, a task starter page is a screen/vertex in the navigation graph where a user begins executing some task. Task starter pages are generally directly accessible from one or more navigation hubs. As such, after navigation hubs are identified in a navigation graph, task starter pages can be identified by following outgoing edges from the navigation hubs. In one example, the destination vertices of these edges may comprise a set of task starter pages of the navigation graph. In the example navigation graph 100 of FIG. 1, the destination vertices 110 from the "Main Activity" navigation hub are {Flight Search, Booked Flights, Hotel Search, Flight Tracker}. A similar set may be obtained for the "Menu" navigation hub. A union of these two sets may provide the set "S" of task starter pages: {Flight Search, Booked Flights, Hotel Search, Flight Tracker}.
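Following outgoing edges from the hubs can be sketched in Python as below. The edge list is a hypothetical fragment, not the full graph of FIG. 1, and the function name `task_starter_pages` is illustrative. Excluding hub-to-hub edges (such as the assumed "Main Activity" to "Menu" link) is an assumption made so that the result matches the set "S" given in the text.

```python
def task_starter_pages(edges, hubs):
    """Union of destination vertices of edges leaving any navigation hub."""
    return {
        destination
        for source, destination in edges
        if source in hubs and destination not in hubs  # assumption: hubs excluded
    }


# Hypothetical fragment of the edge set "E".
edges = {
    ("Main Activity", "Flight Search"),
    ("Main Activity", "Booked Flights"),
    ("Main Activity", "Hotel Search"),
    ("Main Activity", "Flight Tracker"),
    ("Main Activity", "Menu"),
    ("Menu", "Flight Search"),
    ("Menu", "Booked Flights"),
    ("Booked Flights", "Flight Details"),
}
hubs = {"Main Activity", "Menu"}
starters = task_starter_pages(edges, hubs)
print(sorted(starters))
```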

[0026] It is possible that a set of task starter pages identified using the above approach may result in spurious pages being included in the set. For example, some user actions and page navigations may be missed during data collection or may be lost from a data set of usage records and/or session sequences for some other reason. In one example, to filter out such cases, a minimum support threshold may be implemented where a potential task starter page/vertex may be removed from the set (or not placed in the set) if a support metric for the vertex is below the minimum support threshold. For instance, the support metric may comprise the number of occurrences of the vertex in the data set "D" divided by the total number of occurrences of all vertices in the set "S" in the data set "D." In any case, once a navigation graph, navigation hubs, and task starter pages are determined, individual task flows may then be extracted from the usage records/session sequences.
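The support filter can be sketched in Python as follows. The threshold value of 0.15, the function name `filter_by_support`, and the sample sessions are hypothetical; in the sample, "Flight Details" stands in for a spurious candidate that appears to follow a hub only because an intermediate record was lost.

```python
from collections import Counter


def filter_by_support(candidates, sessions, min_support):
    """Keep candidates whose share of all candidate occurrences meets the threshold."""
    counts = Counter(
        page for sequence in sessions for page in sequence if page in candidates
    )
    total = sum(counts.values())  # occurrences of all vertices in "S" within "D"
    if total == 0:
        return set()
    return {page for page in candidates if counts[page] / total >= min_support}


# Hypothetical data set "D": ten short session sequences.
sessions = (
    [["Main Activity", "Flight Search"]] * 5
    + [["Menu", "Booked Flights"]] * 4
    + [["Menu", "Flight Details"]]
)
candidates = {"Flight Search", "Booked Flights", "Flight Details"}
kept = filter_by_support(candidates, sessions, min_support=0.15)
print(sorted(kept))
```

With these counts the support values are 0.5, 0.4, and 0.1, so only the last candidate falls below the assumed threshold.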

[0027] FIG. 2 illustrates an example process 200 of determining task finisher pages and extracting task flows from a session sequence. In one example, a task finisher page may comprise a page where a user task ends, or the last page in a longest navigation path from a task starter page in the navigation graph. In FIG. 2, stage 210 may represent an initial session sequence of the application associated with the navigation graph 100 of FIG. 1, e.g., after duplicate page/screen records have been discarded. Throughout the process 200, the box symbol "□" is used to denote a task completion. In the present example, it can be seen that a user accomplishes four tasks in a single session: searching for a flight, looking at passenger details for a booked flight, looking at baggage details for a booked flight, and making a hotel reservation. However, it should be noted that at the initial stage 210, the component tasks of the session sequence may not yet be determined.

[0028] For instance, all task execution paths start on a task starter page. However, it can be seen that to transition from "Passenger Details" to "Baggage Details," the user backtracks to the "Flight Details" page and follows "Baggage Details" from that page. In the present example, backtracking may be defined as a transition to a previously visited page that is neither a navigation hub nor a task starter page. This transition behavior is different from the transition via the "Menu" page, which does not involve backtracking but instead returns the user to a task starter page from the "Menu" page, e.g., "Menu → Booked Flights." In addition, the backtracking results in the sequence "→ Flight Details → Baggage Details □," which is only a partial task flow. The real task flow begins at "Booked Flights" and not at the "Flight Details" screen. In addition, in the last transition, the sequence passes directly from "Baggage Details" to "Hotel Search." This may be a consequence of the application including a shortcut in the "Baggage Details" page, for example. Such seamless transitions further complicate task flow extraction.

[0029] As mentioned above, the present disclosure considers that a task flow begins on either a navigation hub or a task starter page. In this regard, process 200 may comprise initially segmenting a session sequence based upon navigation hubs in the navigation graph. The result is shown in stage 220 of the process 200. For example, there are now two intermediate sequences, "Sub-Sequence 1" and "Sub-Sequence 2." Sub-Sequence 1 is a complete task flow for a flight search task, whereas Sub-Sequence 2 still has three task flows for "Passenger Details," "Baggage Details," and "Hotel Booking."

[0030] Next, the process 200 may include splitting the intermediate sequences of stage 220 across task starter pages. In the present example, the set of task starter pages "S" may include: {Flight Search, Booked Flights, Hotel Search, Flight Tracker}. Splitting across task starter pages does not impact Sub-Sequence 1, but results in two new sub-sequences for Sub-Sequence 2, e.g., Sub-Sequence 2A and Sub-Sequence 2B. Sub-Sequence 2B has a complete task flow for a "Hotel Booking" task, but Sub-Sequence 2A still includes two task flows. The result is shown in stage 230 of the process 200.
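The two splitting stages (220 and 230) can be sketched in Python as below. FIG. 2 is not reproduced in this text, so the session sequence is a hypothetical reconstruction: "Search Results" and the use of "Hotel Booking" as a page name are assumptions. Dropping hub pages from the resulting segments is also an assumption, and the function names are illustrative.

```python
def split_on_hubs(sequence, hubs):
    """Stage 220: cut the session wherever a navigation hub page appears."""
    segments, current = [], []
    for page in sequence:
        if page in hubs:
            if current:
                segments.append(current)
            current = []  # assumption: the hub itself is not kept in a segment
        else:
            current.append(page)
    if current:
        segments.append(current)
    return segments


def split_on_starters(segment, starters):
    """Stage 230: begin a new sub-sequence at every task starter page."""
    pieces, current = [], []
    for page in segment:
        if page in starters and current:
            pieces.append(current)
            current = []
        current.append(page)
    if current:
        pieces.append(current)
    return pieces


hubs = {"Main Activity", "Menu"}
starters = {"Flight Search", "Booked Flights", "Hotel Search", "Flight Tracker"}
# Hypothetical session sequence loosely following the example in the text.
session = [
    "Main Activity", "Flight Search", "Search Results",
    "Menu", "Booked Flights", "Flight Details", "Passenger Details",
    "Flight Details", "Baggage Details", "Hotel Search", "Hotel Booking",
]
stage_220 = split_on_hubs(session, hubs)
stage_230 = [piece for seg in stage_220 for piece in split_on_starters(seg, starters)]
for piece in stage_230:
    print(" -> ".join(piece))
```

After both stages, the middle sub-sequence still contains the revisit to "Flight Details," which is the case that the backtracking step is meant to resolve.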

[0031] It can be seen in Sub-Sequence 2A that a user completes a "Passenger Details" task, navigates back to a "Flight Details" page, and proceeds to complete another task of "Baggage Details." If Sub-Sequence 2A were to be split at the end of the "Passenger Details," the result would be a task flow of "Flight Details→ Baggage Details." However, this is not a complete task flow since a user cannot reach the "Flight Details" page without navigating through the "Booked Flights" page. Hence, the complete task flow is given by "Booked Flights→ Flight Details → Baggage Details."

[0032] Accordingly, in one example, process 200 may include segmenting the intermediate sequences of stage 230 based upon a detected page backtracking. As mentioned above, page backtracking may comprise a transition to a previously visited page that is neither a navigation hub nor a task starter page. Where page backtracking is detected, a sequence (or sub-sequence/intermediate sequence) may be partitioned into additional sub-sequences. In addition, in one example, additional pages may be prepended to the sub-sequence that begins with the backtracked page. In this case, Sub-Sequence 2A is affected, while Sub-Sequence 1 and Sub-Sequence 2B are not affected. The result is shown in stage 240 of FIG. 2. In one example, the transition from stage 230 to stage 240 may comprise a maximum forward reference traversal of Sub-Sequence 2A. In one example, this may also be referred to as a longest task flow traversal.

[0033] In the present example, backtracking may be detected from the "Passenger Details" page to the "Flight Details" page in Sub-Sequence 2A. Thus, Sub-Sequence 2A may be split as "Booked Flights → Flight Details → Passenger Details" and "Flight Details → Baggage Details," respectively. In addition, the latter sub-sequence may be prepended with pages of the former sub-sequence starting from a task starter page up to the common page/backtracked page. In this case, the missing task starter page of "Booked Flights" is added. The final result is Sub-Sequence 2A(i), "Booked Flights → Flight Details → Passenger Details," and Sub-Sequence 2A(ii), "Booked Flights → Flight Details → Baggage Details." In addition, the last set of sub-sequences in stage 240 may be determined to be the task flows of the session sequence.
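The backtracking split with prepending can be sketched as a maximum forward reference traversal in Python, as below. This is one possible reading of the step, not a transcription of FIG. 4, which is not reproduced in this text. The function name `extract_flows` is hypothetical, and the input is assumed to be a sub-sequence that already starts at a task starter page and contains no navigation hubs.

```python
def extract_flows(segment):
    """Split a sub-sequence at each backtrack, keeping the shared prefix."""
    flows = []
    path = []               # current forward path from the task starter page
    moving_forward = False  # True while the last step added a new page
    for page in segment:
        if page in path:
            # Backtrack: the path so far is a maximal forward reference.
            if moving_forward:
                flows.append(list(path))
            # Keep pages up to and including the backtracked page; this is
            # what "prepends" the starter page to the next flow.
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        flows.append(list(path))
    return flows


sub_sequence_2a = [
    "Booked Flights", "Flight Details", "Passenger Details",
    "Flight Details", "Baggage Details",
]
flows = extract_flows(sub_sequence_2a)
for flow in flows:
    print(" -> ".join(flow))
```

The `moving_forward` flag prevents a partial path from being emitted twice when a user backtracks several pages in a row before moving forward again.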

[0034] FIG. 3 represents example pseudo-code of an example method 300 formalizing the process 200. The input is the data set "D" containing a plurality of session sequences, where each session sequence may comprise application usage records for a separate user session, a set of navigation hubs "H," and a set of task starter pages "S." The output is the data set "DT" containing task flows and corresponding counts of occurrences of the task flows. The "If... Else" statements within the nested "For" loop are to partition a session sequence, or sub-sequence, based upon navigation hubs and task starter pages. A sub-routine, EXTRACT-FLOW, further partitions sub-sequences based upon a detection of backtracking, e.g., by performing a maximum forward reference traversal.

[0035] FIG. 4 represents example pseudo-code of an example method 400, referred to as EXTRACT-FLOW, to partition sub-sequences based upon a detection of backtracking, e.g., by performing a maximum forward reference traversal. In one example, the method 400, EXTRACT-FLOW, may be called as a sub-routine from within the method 300 of FIG. 3. The input is a sequence "C," which may be passed to the sub-routine/method 400 from method 300 of FIG. 3, and the output is data set "DT" containing task flows and corresponding counts of occurrences of the task flows.
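The output data set "DT" pairs each task flow with its number of occurrences. One simple way to represent it, sketched below in Python, is a counter keyed by the flow as a tuple of page names. The function name `count_task_flows` and the sample flows are hypothetical, and the flows are assumed to have been extracted already.

```python
from collections import Counter


def count_task_flows(flows):
    """Build a DT-like mapping from each task flow to its occurrence count."""
    # Lists are not hashable, so each flow is stored as a tuple of page names.
    return Counter(tuple(flow) for flow in flows)


# Hypothetical task flows extracted from several user sessions.
flows = [
    ["Booked Flights", "Flight Details", "Passenger Details"],
    ["Booked Flights", "Flight Details", "Baggage Details"],
    ["Booked Flights", "Flight Details", "Passenger Details"],
]
dt = count_task_flows(flows)
for flow, count in dt.most_common():
    print(count, " -> ".join(flow))
```

Counts in this form can feed the task popularity comparison mentioned earlier in the description.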

[0036] FIG. 5 is a block diagram depicting one example of a network, or system 500. The system 500 may comprise various types of networks that are interconnected to provide communications between devices in accordance with the present disclosure. As illustrated in FIG. 5, the system 500 includes a communication network 590, which may comprise a single network, or an integrated network that may include a packet network such as an Internet Protocol (IP) network (e.g., an IP Multimedia Subsystem (IMS) network), an asynchronous transfer mode (ATM) network, a wireless network, a cellular network (e.g., 2G, 3G, and the like), a long term evolution (LTE) network, and so forth. It should be noted that an IP network is broadly defined as a network that uses Internet Protocol to exchange data packets. In one example, the communication network 590 may comprise a core network of a telecommunications service provider and one or more access networks, e.g., an Internet service provider (ISP) network. In one example, the communication network 590 may be operated by a single telecommunications service provider or by a combination of several different telecommunications service providers. In one example, network 590 may include one or more local networks such as a residential or small office/home office (SOHO) network, a network of a business, academic institution, and so forth, and may comprise any one or more of an IP network, an Ethernet network, a local area network (LAN), a wireless local area network (WLAN), a combination of any of such types of networks, and the like. In one example, the communication network 590 may represent the Internet in general.

[0037] In one example, user devices 510, 520, and 530 may communicate with servers 540 and 550 over network 590, e.g., using one or more packet/datagram network protocols over a physical infrastructure. In one example, any one or more of user devices 510, 520, and 530, and servers 540 and 550 may comprise a computing device 800 as illustrated in FIG. 8 and discussed below. In one example, server 540 may comprise a web server for serving pages related to an application, e.g., a mobile application. In one example, server 550 may comprise an analytics server for receiving application usage records from user devices 510, 520, and 530 and/or from server 540.

[0038] To illustrate, in one example, each of user devices 510, 520, and 530 may comprise at least a processor, and a memory or non-transitory computer-readable medium, i.e., a non-transitory computer-readable storage device. For example, user devices 510, 520, and 530 may include processors 514, 524, and 534, respectively, and memories 515, 525, and 535, respectively. Each of the memories 515, 525, and 535 may store processor/machine/computer-readable code, programs and/or instructions which, when executed by one of the processors 514, 524, and 534, cause the respective processor/user device to provide an application for a user. For instance, in one example, each of user devices 510, 520, and 530 may comprise a mobile device, such as a smartphone, a networked tablet device, a laptop computer, a personal digital assistant, or the like. Thus, in one example, the application may comprise a mobile application for booking a flight, making a hotel reservation, searching for restaurants, playing games, communicating with other users, consuming entertainment media, and so forth. Alternatively, or in addition, any one or more of user devices 510, 520, or 530 may comprise a desktop computer, a personal computer, a server providing a virtual machine and/or cloud based desktop for the user, and so forth. In other words, the present disclosure is not limited to mobile applications, but is broadly applicable to various types of applications for various user device types.

[0039] In one example, the application may be supported by server 540. For instance, server 540 may comprise a web server to respond to requests for content of an application, and to deliver content, such as various pages of the application and/or text, images, videos, interactive forms, and so forth to a requesting user device. Accordingly, server 540 may include a processor 544 and a memory 545 storing processor/machine/computer-readable code, programs and/or instructions which when executed by the processor 544, cause the processor 544 and/or server 540 to provide services to user devices in connection with the application.

[0040] In addition, any one or more of user devices 510, 520, and 530, and server 540 may collect usage records regarding user actions with respect to the application and how users navigate through pages of the application. The usage records may take the form described above, e.g., with the name of a page, an action taken on that page, and the time when the action was taken. However, in other examples, the usage records may take a different form. For example, the usage records may simply include a page identifier and indicate the order in which the page was visited in relation to other pages of the application during a user session. It should be noted that for some applications, a full view of user actions may be collected by either the user device or a server supporting the application. However, for other applications, various page transitions may not trigger a user device-server interaction. Thus, in such examples the collection of usage records may be exclusively or predominantly by the user device.

[0041] In one example, memory 555 of server 550 may store processor/machine/computer-readable code, programs and/or instructions which when executed by the processor 554, cause the processor 554 and/or server 550 to perform analytics on the usage records. For example, the processor/machine/computer-readable code may cause the processor 554 and/or server 550 to perform operations as described in connection with either or both of the example methods 600 and 700 of FIGs. 6 and 7, respectively, or in connection with any of the examples described herein. User devices 510, 520, and 530 and/or server 540 may collect usage records and forward the usage records to server 550, e.g., periodically, in response to a request from server 550, and so forth.

[0042] It should be noted that the system 500 has been simplified. For example, the network 590 may include other network elements (not shown) such as border elements, routers, switches, policy servers, security devices, gateways, a content distribution network (CDN) and the like. In addition, although server 540 and server 550 are illustrated as individual devices, it should be noted that multiple servers may be employed to provide the same functions. For instance, the architecture of system 500 may include backup servers, overflow servers, mirror servers, and the like to provide functions redundant or supplemental to servers 540 and 550. In still another example, server 550 may be omitted, and the analytics functions of server 550 may instead be performed on server 540. Various other modifications to system 500 of a same or similar nature are also possible in accordance with the present disclosure.

[0043] FIG. 6 illustrates a flowchart of an example method 600 to extract task flows. The method 600 may be performed, for example, by any one or more of the components of the system 500 illustrated in FIG. 5. For example, the method 600 may be performed by one of the servers 540 or 550. However, the method 600 is not limited to implementation with the system 500 illustrated in FIG. 5, but may be applied in connection with any number of systems and communication networks. Alternatively, or in addition, one or more blocks of the method 600 may be implemented by a computing device having a processor, a memory, and input/output devices as illustrated below in FIG. 8, specifically programmed to perform the blocks of the method. Although any one of the elements in system 500 of FIG. 5, or in a similar system, may be configured to perform various blocks of the method 600, the method will now be described in terms of an example where blocks of the method 600 are performed by a processor, such as processor 802 in FIG. 8.

[0044] The method 600 begins in block 605 and proceeds to block 610. In block 610, the processor may determine a navigation graph of an application. In one example, the processor may receive application usage records from user devices and/or a server supporting an application. In one example, the usage records may be separated into session sequences, each session sequence representing usage records for a user device collected between launching the application and closing the application on the user device. In one example, the usage records may include the name of a page, an action taken on that page, and the time when the action was taken. In one example, duplicate records relating to a same page that do not indicate a transition to another page may be removed from the usage records to create a session sequence.
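
By way of illustration only, the creation of a session sequence from raw usage records may be sketched as follows. The `UsageRecord` type, its field names, the function `to_session_sequence`, and the sample page names are hypothetical and are not taken from the present disclosure. The sketch assumes that a record is a "duplicate" when it refers to the same page as the immediately preceding record.

```python
from itertools import groupby
from typing import NamedTuple


class UsageRecord(NamedTuple):
    """Hypothetical record layout: page name, action, and time of the action."""
    page: str
    action: str
    timestamp: float


def to_session_sequence(records):
    """Order records by time and collapse consecutive records for the same page."""
    ordered = sorted(records, key=lambda r: r.timestamp)
    # groupby merges runs of identical consecutive page names into one entry.
    return [page for page, _ in groupby(r.page for r in ordered)]


# Illustrative records for a single user session (made-up values).
records = [
    UsageRecord("Home", "launch", 0.0),
    UsageRecord("Home", "scroll", 1.5),
    UsageRecord("Search", "tap", 3.0),
    UsageRecord("Search", "type", 4.2),
    UsageRecord("Results", "tap", 6.0),
    UsageRecord("Search", "back", 9.0),
]
session_sequence = to_session_sequence(records)
print(session_sequence)
```

In this sketch, a later return to a previously visited page is retained, since it indicates a page transition rather than a duplicate record.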

[0045] In one example, block 610 may include examining pairs of usage records in a sequential order and, for each of a plurality of pairs of the usage records of the application: examining a first usage record and a second usage record of the pair, the first usage record associated with a first page of the application and the second usage record associated with a second page of the application, adding a first vertex to the navigation graph for the first page, when it is a first time that the first page is encountered, adding a second vertex to the navigation graph for the second page, when it is a first time that the second page is encountered, and adding an edge to the navigation graph connecting the first vertex to the second vertex, when it is a first time that a transition from the first page to the second page is encountered.
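
As a minimal sketch of the pairwise examination described above, and assuming that each session sequence is an ordered list of page names, a navigation graph may be accumulated as a set of vertices and a set of directed edges. The function name `build_navigation_graph` and the sample page names are hypothetical.

```python
def build_navigation_graph(session_sequences):
    """Return (vertices, edges) built from consecutive page pairs."""
    vertices = set()
    edges = set()
    for sequence in session_sequences:
        # zip pairs each page with the page visited immediately after it.
        for first_page, second_page in zip(sequence, sequence[1:]):
            # Set insertion has an effect only the first time an item is seen,
            # which mirrors the "first time encountered" conditions.
            vertices.add(first_page)
            vertices.add(second_page)
            edges.add((first_page, second_page))
    return vertices, edges


sessions = [
    ["Home", "Search", "Results"],
    ["Home", "Search", "Home"],
]
vertices, edges = build_navigation_graph(sessions)
print(sorted(vertices))
print(sorted(edges))
```

Because the sketch follows the pairwise description literally, a session consisting of a single page contributes no vertex; an implementation may choose to handle that case differently.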

[0046] In block 620, the processor may calculate at least one navigation hub in the navigation graph. In one example, the at least one navigation hub is calculated based upon a betweenness centrality metric. In one example, the betweenness centrality metric may be calculated in accordance with Equation 1 above. For example, the betweenness centrality metric may be calculated by calculating a number of shortest paths between a pair of vertices in the navigation graph that pass through the at least one navigation hub divided by the number of shortest paths between the pair of vertices summed over all pairs of vertices in the navigation graph. In one example, the at least one navigation hub is identified when the betweenness centrality metric is greater than a threshold value. In another example, the at least one navigation hub may be calculated based upon a closeness centrality metric. For instance, the closeness centrality metric may be calculated in accordance with Equation 2 above. In still another example, the at least one navigation hub may be calculated based upon a combination of the betweenness centrality metric and the closeness centrality metric.
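
For illustration, the betweenness centrality calculation described in words above may be sketched as follows. The sketch assumes a directed, unweighted navigation graph and sums over ordered pairs of vertices; the function names, the sample graph, and the threshold value of 1.0 are hypothetical. Normalization conventions may differ from Equation 1.

```python
from collections import deque


def shortest_path_counts(adjacency, source):
    """Breadth-first search returning distances and shortest-path counts."""
    distance = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adjacency.get(u, ()):
            if w not in distance:
                distance[w] = distance[u] + 1
                count[w] = 0
                queue.append(w)
            if distance[w] == distance[u] + 1:
                # Every shortest path to u extends to a shortest path to w.
                count[w] += count[u]
    return distance, count


def betweenness(vertices, edges):
    """For each vertex v, sum over pairs (s, t) the fraction of shortest
    s-to-t paths that pass through v."""
    adjacency = {v: [] for v in vertices}
    for a, b in edges:
        adjacency[a].append(b)
    info = {s: shortest_path_counts(adjacency, s) for s in vertices}
    scores = {v: 0.0 for v in vertices}
    for s in vertices:
        dist_s, count_s = info[s]
        for t in vertices:
            if t == s or t not in dist_s:
                continue
            for v in vertices:
                if v == s or v == t or v not in dist_s:
                    continue
                dist_v, count_v = info[v]
                # v lies on a shortest s-to-t path when the distances add up.
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    scores[v] += count_s[v] * count_v[t] / count_s[t]
    return scores


vertices = {"Home", "A", "B", "C"}
edges = {("Home", "A"), ("A", "Home"), ("Home", "B"),
         ("B", "Home"), ("Home", "C"), ("C", "Home")}
scores = betweenness(vertices, edges)
hubs = {v for v, score in scores.items() if score > 1.0}
print(scores, hubs)
```

In this star-shaped example every path between two outer pages passes through "Home", so only "Home" exceeds the threshold. The brute-force triple loop is chosen for readability; larger graphs would typically use a more efficient algorithm.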

[0047] For example, each vertex in the navigation graph may be a candidate navigation hub. Thus, the betweenness centrality metric and the closeness centrality metric may be calculated for each vertex in the graph. If either of the metrics, or the combined metric, exceeds a threshold, the vertex may be considered a navigation hub (or ruled out as a navigation hub, depending upon the nature of the threshold, e.g., a floor or ceiling, and/or depending upon a direction in which the threshold is exceeded). In one example, fewer than all of the vertices in the navigation graph may be evaluated as outlined above. For instance, vertices may initially be screened such that vertices with greater than a threshold number of connecting edges are eligible for further evaluation based upon the betweenness centrality metric and/or the closeness centrality metric, while vertices with fewer than the threshold number of connecting edges are ruled out as candidate navigation hubs.
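
The screening described above may be sketched, purely as an example, as a two-step filter: vertices are first screened by their number of connecting edges, and only the remaining candidates are compared against a centrality threshold. The function names, the edge-count threshold, the centrality threshold, and the centrality scores shown are illustrative assumptions; the scores are made-up values rather than computed results. The sketch counts both incoming and outgoing edges as "connecting edges".

```python
from collections import Counter


def screen_candidates(edges, min_edges):
    """Keep vertices with more than min_edges connecting edges (in plus out)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {v for v, d in degree.items() if d > min_edges}


def select_hubs(candidates, scores, threshold):
    """Among screened candidates, keep those whose metric exceeds the threshold."""
    return {v for v in candidates if scores.get(v, 0.0) > threshold}


edges = {("Home", "Search"), ("Search", "Home"), ("Home", "Profile"),
         ("Profile", "Home"), ("Search", "Results")}
candidates = screen_candidates(edges, min_edges=2)

# Made-up centrality scores, standing in for a metric computed elsewhere.
illustrative_scores = {"Home": 4.0, "Search": 2.0, "Profile": 0.0, "Results": 0.0}
hubs = select_hubs(candidates, illustrative_scores, threshold=3.0)
print(candidates, hubs)
```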

[0048] In block 630, the processor may identify task starter pages in the navigation graph based upon links from the at least one navigation hub in the navigation graph. For example, task starter pages can be identified by following outgoing edges from the at least one navigation hub in the navigation graph. Thus, a vertex may be determined to be a task starter page when it has a direct edge connecting to the at least one navigation hub. In addition, in one example, to eliminate false positive matches, a minimum support threshold may be implemented where a potential task starter page/vertex may be removed from an initial set of task starter pages found by following edges from the at least one navigation hub (or not placed in the set) if a support metric for the vertex is below the minimum support threshold. Conversely, the vertex may be placed in the set if the support metric exceeds the minimum support threshold.
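
One possible sketch of task starter identification with a minimum support threshold is given below. The support metric is not defined in this passage; the sketch assumes, for illustration only, that support is the fraction of session sequences containing a direct transition from a navigation hub to the candidate page. The function name, sample sessions, and threshold of 0.3 are hypothetical.

```python
def task_starters(edges, hubs, sessions, min_support):
    """Return pages reached directly from a hub whose support exceeds min_support."""
    # Initial set: follow outgoing edges from each navigation hub.
    candidates = {b for a, b in edges if a in hubs}
    starters = set()
    for page in candidates:
        # Count sessions containing at least one hub-to-page transition.
        hits = sum(
            any(a in hubs and b == page for a, b in zip(seq, seq[1:]))
            for seq in sessions
        )
        if sessions and hits / len(sessions) > min_support:
            starters.add(page)
    return starters


sessions = [
    ["Home", "Search", "Results"],
    ["Home", "Search", "Home", "Profile"],
    ["Home", "Search", "Results", "Home", "Help"],
    ["Home", "Search"],
]
edges = {(a, b) for seq in sessions for a, b in zip(seq, seq[1:])}
starters = task_starters(edges, hubs={"Home"}, sessions=sessions, min_support=0.3)
print(starters)
```

Here "Search" follows the hub in all four sessions, while "Profile" and "Help" each follow it in only one, so only "Search" remains in the set under the assumed threshold.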

[0049] In block 640, the processor may segment usage records of the application into sequences based upon the at least one navigation hub and the task starter pages. In one example, the usage records may comprise the same usage records as used to determine the navigation graph at block 610. In another example, different usage records may be used at block 640. For example, a first portion of usage records may comprise "test data" to create a navigation graph and to determine the at least one navigation hub and task starter pages, while a second portion of the usage records may be used as source data for extracting task flows, which may then be used for various analytics associated with the application. In one example, the usage records may be initially divided based upon user sessions into a plurality of session sequences, and then segmented into intermediate sequences based upon the at least one navigation hub and the task starter pages. In one example, the segmenting may be performed as described above in connection with the process 200 of FIG. 2 and/or the method 300 of FIG. 3.

[0050] In block 650, the processor may determine task finisher pages in the sequences. In one example, task finisher pages are identified in the sequences (or intermediate sequences) when a backtracking to a previously visited page is detected within the sequence. In one example, task finisher pages may be determined based upon a maximum forward reference traversal of the usage records within the respective sequences. In one example, task finisher pages may be determined as described above in connection with the process 200 of FIG. 2, the method 300 of FIG. 3, and/or the method 400 of FIG. 4.

[0051] In block 660, the processor may extract task flows from the sequences based upon the task finisher pages that are determined. For instance, one or more of the sequences (or intermediate sequences) may be further segmented into additional sequences where a task finisher page that indicates backtracking is detected. The resulting set of sequences may then comprise the task flows from within the usage records of one or more session sequences. In one example, for sequences/task flows that do not begin at a navigation hub or task starter page, the task flow may be prepended with a plurality of pages associated with the task execution path. For instance, as described above, when backtracking is detected within a sequence, the sequence may be further segmented. However, the sub-sequence that includes pages after the backtracking may comprise a partial task flow. As such, the sub-sequence after the backtracking may be prepended with pages of the sub-sequence prior to the backtracking, e.g., pages up to the first occurrence of the point of backtracking. The prepending may be performed as described above in connection with the process 200 of FIG. 2 and/or the method 400 of FIG. 4. Following block 660, the method 600 proceeds to block 695 where the method ends.
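
The backtracking-based segmentation and prepending described above may be sketched with a maximum forward reference style traversal, as follows. This is a minimal sketch under stated assumptions and is not the method 400 of FIG. 4: it assumes a sequence is a list of page names, treats a visit to a page already on the current path as backtracking, and treats the last page of each emitted flow as the task finisher page. The function name and the "Search" and "Results" page names are hypothetical.

```python
def extract_task_flows(sequence):
    """Split a page sequence into task flows at each point of backtracking."""
    flows = []
    path = []               # current forward path
    moving_forward = False
    for page in sequence:
        if page in path:
            # Backtracking: the path built so far ends at a task finisher page.
            if moving_forward:
                flows.append(list(path))
            # Keep pages up to the first occurrence of the revisited page, so
            # the next flow is effectively prepended with that prefix.
            del path[path.index(page) + 1:]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        flows.append(list(path))
    return flows


sequence = ["Search", "Results", "Passenger Details",
            "Results", "Flight Details", "Boarding Pass"]
flows = extract_task_flows(sequence)
finishers = [flow[-1] for flow in flows]
print(flows, finishers)
```

In this example the return to "Results" ends the first flow at "Passenger Details", and the second flow begins with the retained prefix "Search", "Results" rather than starting at "Flight Details".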

[0052] FIG. 7 illustrates a flowchart of an example method 700 to generate a plurality of task flows of a user session. The method 700 may be performed, for example, by any one or more of the components of the system 500 illustrated in FIG. 5. For example, the method 700 may be performed by one of the servers 540 or 550. However, the method 700 is not limited to implementation with the system 500 illustrated in FIG. 5, but may be applied in connection with any number of systems and communication networks. Alternatively, or in addition, one or more blocks of the method 700 may be implemented by a computing device having a processor, a memory, and input/output devices as illustrated below in FIG. 8, specifically programmed to perform the blocks of the method. Although any one of the elements in system 500 of FIG. 5, or in a similar system, may be configured to perform various blocks of the method 700, the method will now be described in terms of an example where blocks of the method 700 are performed by a processor, such as processor 802 in FIG. 8.

[0053] The method 700 begins in block 705 and proceeds to block 710. In block 710, the processor may receive usage records comprising a sequence of pages of an application utilized during a user session. For example, the processor may receive application usage records from user devices and/or a server supporting an application. In one example, the usage records may be separated into session sequences, each session sequence representing usage records for a user device collected between launching the application and closing the application on the user device. In other words, the session sequence may represent usage records of the user session.

[0054] In block 720, the processor may segment the sequence of pages based upon at least one navigation hub of the application to generate a first plurality of intermediate sequences. For instance, a navigation graph of the application and at least one navigation hub within the navigation graph may be determined in any manner as described above and/or provided to the processor to use in performing the method 700. In one example, the segmenting of block 720 may be performed as described above in connection with the process 200 of FIG. 2 and/or the method 300 of FIG. 3.

[0055] In block 730, the processor may segment at least one of the first plurality of intermediate sequences based upon at least one task starter page of the application into a second plurality of intermediate sequences. For instance, task starter pages may represent pages that are accessible via a direct link from at least one navigation hub. In one example, a navigation graph of the application and at least one task starter page may be determined in any manner as described above and/or provided to the processor to use in performing the method 700. In one example, the segmenting of block 730 may be performed as described above in connection with the process 200 of FIG. 2 and/or the method 300 of FIG. 3.
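
The two segmenting stages of blocks 720 and 730 may be sketched as two passes of the same splitting routine. The details of the process 200 and the method 300 are not reproduced here; the sketch assumes, for illustration only, that a new intermediate sequence begins at each occurrence of a boundary page (a navigation hub in the first pass, a task starter page in the second). The function name `split_at` and the sample pages are hypothetical.

```python
def split_at(sequence, boundary_pages):
    """Start a new segment at every occurrence of a boundary page."""
    segments = []
    current = []
    for page in sequence:
        if page in boundary_pages and current:
            segments.append(current)
            current = []
        current.append(page)
    if current:
        segments.append(current)
    return segments


session = ["Home", "Search", "Results", "Details",
           "Home", "Profile", "Settings", "Profile", "Orders"]
hubs = {"Home"}
starters = {"Search", "Profile"}

# First pass: segment the session at navigation hubs.
first = split_at(session, hubs)
# Second pass: segment each intermediate sequence at task starter pages.
second = [seg for inter in first for seg in split_at(inter, starters)]
print(first)
print(second)
```

Under this assumption the second pass leaves single-page segments containing only the hub; an implementation may discard such segments or keep the hub attached to the following task starter page.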

[0056] In block 740, the processor may segment at least one of the second plurality of intermediate sequences based upon a page backtracking in the at least one of the second plurality of intermediate sequences to generate a plurality of task flows of the user session. In one example, a task finisher page may be identified in the at least one of the second plurality of intermediate sequences when a backtracking to a previously visited page is detected. In addition, the at least one of the second plurality of intermediate sequences may be further divided at the point of backtracking to generate at least two of the plurality of task flows. In one example, a partial task flow may result from the dividing at the point of backtracking. Thus, in one example, block 740 may further include prepending pages to the task flow from after the backtracking with pages from the task flow prior to the backtracking, e.g., pages up to the first occurrence of the point of backtracking. In one example, segmenting at least one of the second plurality of intermediate sequences based upon a page backtracking may comprise a maximum forward reference traversal of the at least one of the second plurality of intermediate sequences. In one example, the segmenting of block 740 may be performed as described above in connection with the process 200 of FIG. 2 and/or the method 400 of FIG. 4. Following block 740, the method 700 proceeds to block 795 where the method ends.

[0057] It should be noted that although not explicitly specified, one or more blocks, functions, or operations of the methods 600 and 700 described above may include storing, displaying, and/or outputting. In other words, any data, records, fields, and/or intermediate results discussed in the methods can be stored, displayed, and/or outputted to another device depending on the particular application. Furthermore, blocks, functions, or operations in FIGs. 6 and 7 that recite a determining operation, or involve a decision, do not necessarily imply that both branches of the determining operation are practiced. In other words, one of the branches of the determining operation can be deemed as optional. In addition, various blocks of the respective methods 600 and 700 may be considered optional in various examples. For instance, in one example, method 600 may omit block 610. For example, a pre-determined navigation graph may be provided for use in connection with the method 600.

[0058] In addition, it should be noted that the respective methods 600 and 700 may also be expanded to include additional operations and functions as described above in connection with various examples. For instance, the methods 600 and 700 may further include presenting the task flows that are extracted, or generated. For instance, the task flows may be presented via a user interface in the form of a chart, table, as overlay data on a trie structure depicting the navigation graph, and so forth. The presentation of the task flows may include counts of the task flows, or other measures derived from a plurality of task flows. For example, either of the method 600 or the method 700 may further include determining an average number of page transitions to complete a task associated with a task flow and presenting the average number of page transitions that is determined. For example, the average number of page transitions may be based upon at least a first task flow of a user session and at least a second task flow of the same user session or a different user session that is associated with the same task. For instance, referring to the example application of the navigation graph 100 of FIG. 1, a first task flow may indicate that a first user reached a "Boarding Pass" page in three page transitions, while a second task flow may indicate that a second user reached the "Boarding Pass" page in two page transitions. For example, the second user may first visit a "Passenger Details" page and backtrack to a "Flight Details" page to reach the "Boarding Pass" page. Thus, this type of information may be aggregated over a number of task flows related to the task to determine an average number of page transitions to complete the task.
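
The aggregation described above may be sketched as follows. The sketch assumes, for illustration, that a task is identified by the final page of its task flow and that the number of page transitions in a flow is one less than its number of pages. The function name and the "Home" and "Check-In" page names are hypothetical.

```python
from collections import defaultdict


def average_transitions(task_flows):
    """Average number of page transitions per task, keyed by final page."""
    totals = defaultdict(lambda: [0, 0])  # task -> [transition sum, flow count]
    for flow in task_flows:
        if not flow:
            continue
        task = flow[-1]
        totals[task][0] += len(flow) - 1
        totals[task][1] += 1
    return {task: total / n for task, (total, n) in totals.items()}


task_flows = [
    # Three page transitions to reach the "Boarding Pass" page.
    ["Home", "Check-In", "Passenger Details", "Boarding Pass"],
    # Two page transitions to reach the "Boarding Pass" page.
    ["Home", "Flight Details", "Boarding Pass"],
]
averages = average_transitions(task_flows)
print(averages)
```

With one flow of three transitions and one of two, the average for the "Boarding Pass" task in this sketch is 2.5 page transitions.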

[0059] FIG. 8 depicts a high-level block diagram of a computing device suitable for use in performing the functions described herein. As depicted in FIG. 8, the computer 800 comprises a hardware processor element 802, e.g., a central processing unit (CPU), a microprocessor, or a multi-core processor, a memory 804, e.g., random access memory (RAM), a module 805 to extract task flows or to generate a plurality of task flows of a user session, and various input/output devices 806, e.g., storage devices, including but not limited to, a tape drive, a floppy drive, a hard disk drive or a compact disk drive, a receiver, a transmitter, a speaker, a display, a speech synthesizer, an output port, an input port and a user input device, such as a keyboard, a keypad, a mouse, a microphone, and the like. Although one processor element is shown, it should be noted that the general-purpose computer may employ a plurality of processor elements. Furthermore, although one computer is shown in the figure, if the method(s) as discussed above is implemented in a distributed or parallel manner for a particular illustrative example, i.e., the blocks of the above method(s) or the entire method(s) are implemented across multiple or parallel computers, then the computer of this figure is intended to represent each of those multiple computers.

[0060] It should be noted that the present disclosure can be implemented by machine readable instructions and/or in a combination of machine readable instructions and hardware, e.g., using application specific integrated circuits (ASIC), a programmable logic array (PLA), including a field-programmable gate array (FPGA), or a state machine deployed on a hardware device, a general purpose computer or any other hardware equivalents, e.g., computer readable instructions pertaining to the method(s) discussed above can be used to configure a hardware processor to perform the blocks, functions and/or operations of the above disclosed methods.

[0061] In one example, instructions and data for the present module or process 805 to extract task flows or to generate a plurality of task flows of a user session, e.g., machine readable instructions can be loaded into memory 804 and executed by hardware processor element 802 to implement the blocks, functions, or operations as discussed above in connection with the example methods 600 and 700. Furthermore, when a hardware processor executes instructions to perform "operations," this could include the hardware processor performing the operations directly and/or facilitating, directing, or cooperating with another hardware device or component, e.g., a co-processor and the like, to perform the operations.

[0062] For instance, in one example, the module 805 may include a plurality of computer-readable components, including a determine navigation graph component 811, a calculate navigation hub component 812, an identify task starter page component 813, a segment usage records component 814, a determine task finisher page component 815, and an extract task flows component 816. When executed by the hardware processor element 802, the determine navigation graph component 811 may cause the hardware processor element 802 to determine a navigation graph of an application, the calculate navigation hub component 812 may cause the hardware processor element 802 to calculate at least one navigation hub in the navigation graph, the identify task starter page component 813 may cause the hardware processor element 802 to identify task starter pages in the navigation graph based upon links from the at least one navigation hub in the navigation graph, the segment usage records component 814 may cause the hardware processor element 802 to segment usage records of the application into sequences based upon the at least one navigation hub and the task starter pages, the determine task finisher page component 815 may cause the hardware processor element 802 to determine task finisher pages in the sequences, and the extract task flows component 816 may cause the hardware processor element 802 to extract task flows from the sequences based upon the task finisher pages, e.g., as per the example method 600. When executed by the hardware processor element 802, the additional functions component 820 may cause the hardware processor element 802 to perform additional functions, e.g., in connection with the example method 600 as described above, and/or to perform any one or more additional functions as described in the present disclosure. The foregoing is just one example configuration of module 805 in accordance with the present disclosure.

[0063] In another example, the module 805 may include a different plurality of computer-readable components, including a receive usage records component 816, a segment sequence of pages component 817, a segment first plurality of intermediate sequences component 818, and a generate task flows component 819. When executed by the hardware processor element 802, the receive usage records component 816 may cause the hardware processor element 802 to receive usage records comprising a sequence of pages of an application utilized during a user session, the segment sequence of pages component 817 may cause the hardware processor element 802 to segment the sequence of pages based upon at least one navigation hub of the application to generate a first plurality of intermediate sequences, the segment first plurality of intermediate sequences component 818 may cause the hardware processor element 802 to segment at least one of the first plurality of intermediate sequences based upon at least one task starter page of the application to generate a second plurality of intermediate sequences, and the generate task flows component 819 may cause the hardware processor element 802 to segment at least one of the second plurality of intermediate sequences based upon a page backtracking in the at least one of the second plurality of intermediate sequences to generate a plurality of task flows of the user session, e.g., as per the example method 700. When executed by the hardware processor element 802, the additional functions component 820 may cause the hardware processor element 802 to perform additional functions, e.g., in connection with the example method 700 as described above, and/or to perform any one or more additional functions as described in the present disclosure. Again, the foregoing is just one example configuration of module 805 in accordance with the present disclosure.

[0064] The processor executing the machine readable instructions relating to the above described method(s) can be perceived as a programmed processor or a specialized processor. As such, the present module 805 to extract task flows or to generate a plurality of task flows of a user session, including associated data structures, of the present disclosure can be stored on a tangible or physical (broadly non-transitory) computer-readable storage device or medium, e.g., volatile memory, non-volatile memory, ROM memory, RAM memory, magnetic or optical drive, device or diskette and the like. Furthermore, the computer-readable storage device may comprise any physical devices that provide the ability to store information such as data and/or instructions to be accessed by a processor or a computing device such as a computer or an application server.

[0065] It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, or variations therein may be subsequently made, which are also intended to be encompassed by the following claims.

Claims

What is claimed is:

1. A method comprising:

determining, by a processor, a navigation graph of an application;

calculating, by the processor, at least one navigation hub in the navigation graph;

identifying, by the processor, task starter pages in the navigation graph based upon links from the at least one navigation hub in the navigation graph;

segmenting, by the processor, usage records of the application into sequences based upon the at least one navigation hub and the task starter pages;

determining, by the processor, task finisher pages in the sequences; and

extracting, by the processor, task flows from the sequences based upon the task finisher pages that are determined.

2. The method of claim 1, wherein a vertex in the navigation graph is identified as one of the task starter pages when the vertex has a direct edge to the at least one navigation hub that is identified.

3. The method of claim 1, wherein a vertex of the navigation graph is identified as one of the task starter pages when the vertex has a direct edge to the at least one navigation hub that is identified and when a support metric of the vertex exceeds a minimum support threshold.

4. The method of claim 1, comprising:

presenting the task flows that are extracted.

5. The method of claim 1, wherein the task finisher pages are determined based upon a maximum forward reference traversal of the usage records.

6. The method of claim 5, wherein for at least one of the task flows, the extracting comprises:

prepending a plurality of pages to the at least one task flow, when the at least one task flow does not begin at a task starter page.

7. The method of claim 1, wherein the navigation graph of the application is determined by, for each of a plurality of pairs of the usage records of the application:

examining a first usage record and a second usage record of the pair, the first usage record associated with a first page of the application and the second usage record associated with a second page of the application;

adding a first vertex to the navigation graph for the first page, when the first page is encountered for a first time;

adding a second vertex to the navigation graph for the second page, when the second page is encountered for a first time; and

adding an edge to the navigation graph connecting the first vertex to the second vertex, when a transition from the first page to the second page is encountered for a first time.

8. A device comprising:

a processor; and

a non-transitory computer-readable medium storing instructions which, when executed by the processor, cause the processor to:

identify at least one navigation hub in a navigation graph of an application based upon a betweenness centrality metric of the at least one navigation hub;

identify task starter pages in the navigation graph based upon links from the at least one navigation hub in the navigation graph;

segment usage records of the application into sequences based upon the at least one navigation hub and the task starter pages;

determine task finisher pages in the sequences; and

extract task flows from the sequences based upon the task finisher pages that are determined.

9. The device of claim 8, wherein the at least one navigation hub is identified based upon the betweenness centrality metric of the at least one navigation hub and a closeness centrality metric of the at least one navigation hub.

10. The device of claim 8, wherein the betweenness centrality metric is determined by:

calculating a number of shortest paths between a pair of vertices in the navigation graph that pass through the at least one navigation hub divided by the number of shortest paths between the pair of vertices summed over all pairs of vertices in the navigation graph, wherein the at least one navigation hub is identified when the betweenness centrality metric is greater than a threshold value.

11. The device of claim 8, wherein the non-transitory computer-readable medium stores additional instructions which, when executed by the processor, cause the processor to:

present the task flows that are extracted.

12. A method comprising:

receiving, by a processor, usage records comprising a sequence of pages of an application utilized during a user session;

segmenting, by the processor, the sequence of pages based upon at least one navigation hub of the application to generate a first plurality of intermediate sequences;

segmenting, by the processor, at least one of the first plurality of intermediate sequences based upon at least one task starter page of the application, the at least one task starter page accessible via a direct link from the at least one navigation hub to generate a second plurality of intermediate sequences; and

segmenting, by the processor, at least one of the second plurality of intermediate sequences based upon a page backtracking in the at least one of the second plurality of intermediate sequences to generate a plurality of task flows of the user session.

13. The method of claim 12, wherein the segmenting the at least one of the second plurality of intermediate sequences based upon the page backtracking comprises a maximum forward reference traversal of the at least one of the second plurality of intermediate sequences.

14. The method of claim 12, comprising:

determining an average number of page transitions to complete a task associated with a task flow, the average number of page transitions based upon at least one of the plurality of task flows of the user session; and

presenting the average number of page transitions to complete the task that is determined.

15. The method of claim 12, comprising:

presenting the plurality of task flows that is determined.