WO2021144680A1 - Interface classification system
- Publication number: WO2021144680A1 (PCT/IB2021/050169)
- Authority: WO, WIPO (PCT)
- Prior art keywords: interface, object hierarchy, type, interfaces, paths
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
Definitions
- FIG. 1 illustrates an example of a system for classifying interface pages in accordance with an embodiment
- FIG. 2 illustrates an example of one type of interface in accordance with an embodiment
- FIG. 3 illustrates an example of another type of interface in accordance with an embodiment
- FIG. 4 illustrates an example of still another type of interface in accordance with an embodiment
- FIG. 5 illustrates an example of object hierarchy paths in accordance with an embodiment
- FIG. 6 illustrates an example diagram of non-intersecting object hierarchy paths between types of interface pages in accordance with an embodiment
- FIG. 7 illustrates an example diagram of intersecting object hierarchy paths between interface pages of the same type among different interface providers in accordance with an embodiment
- FIG. 8 illustrates an example of categorizing an interface page by providing its feature vector to a machine learning algorithm in accordance with an embodiment
- FIG. 9 is a flowchart that illustrates an example of building a category dictionary in accordance with an embodiment
- FIG. 10 is a flowchart that illustrates an example of training a machine learning algorithm to categorize interfaces in accordance with an embodiment
- FIG. 11 is a flowchart that illustrates an example of determining an interface type in accordance with an embodiment.
- FIG. 12 illustrates a computing device that may be used in accordance with at least one embodiment.
- A system may obtain a pair of interfaces comprising a first interface of a first type and a second interface of a second type from a first interface provider.
- The system may further, through one or more processes, obtain source code of the first and second interfaces.
- The system may determine, based on an object model derived from a first portion of the source code that corresponds to the first interface, a first set of object hierarchy paths of the first interface, which may be a set of paths that may represent objects of the first interface.
- The system may further obtain an additional interface of the first type from a second interface provider, determine a second subset of object hierarchy paths of the additional interface based on source code of the additional interface, and compare the second subset of object hierarchy paths with a first subset of object hierarchy paths, namely the paths of the first set that are disjoint from the object hierarchy paths of the second interface.
- The system may generate a category dictionary that may comprise object hierarchy paths that are common to the first subset of object hierarchy paths and the second subset of object hierarchy paths.
- The system may generate a feature vector based on a third interface from a third interface provider and the category dictionary, in which the feature vector may correspond to object hierarchy paths of the third interface that may match object hierarchy paths of the category dictionary.
- The third interface may also be of the first type.
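One way to picture the feature vector described above is as a binary indicator over a fixed ordering of the category dictionary: position i is 1 when the interface contains the i-th dictionary path and 0 otherwise. This is a minimal sketch under that assumption; the function name `feature_vector`, the sorted ordering of the dictionary, and the example paths are hypothetical and not taken from the disclosure.

```python
from __future__ import annotations

from typing import Iterable


def feature_vector(interface_paths: Iterable[str], dictionary: Iterable[str]) -> list[int]:
    """Hypothetical binary encoding: 1 if a dictionary path occurs in the interface, else 0."""
    present = set(interface_paths)      # fast membership checks
    ordered = sorted(set(dictionary))   # fixed, reproducible feature order (an assumption)
    return [1 if path in present else 0 for path in ordered]


# Hypothetical category dictionary and the flattened paths of a third interface.
dictionary = {"window.hero.banner", "window.app.homeConfig", "window.grid.items", "window.item.price"}
third_interface = ["window.nav", "window.hero.banner", "window.app.homeConfig"]
print(feature_vector(third_interface, dictionary))  # [1, 0, 1, 0]
```

Paths of the interface that are absent from the dictionary (here `window.nav`) are ignored, so every interface maps to a vector of the same length regardless of how many paths it yields.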
- The feature vector may be utilized to train one or more machine learning algorithms; such one or more machine learning algorithms thereby may be configured to, upon receipt of a different feature vector as input, output an indication of the type of the interface from which the different feature vector was derived.
- The system may utilize the one or more machine learning algorithms to perform operations, specific to interfaces of the first type, on other interfaces of the first type.
- The above-mentioned interfaces may be interfaces of an interface provider, which may provide various services.
- The interface provider may be one of many library organizations that utilize one or more interfaces that users may interact with to access the services of the library organization.
- A system of the present disclosure may analyze the interfaces of the library organization, as well as interfaces of other library organizations, service organizations, and/or variations thereof, to determine a category dictionary for various types of interfaces (e.g., home pages, item pages, settings pages, queue pages, loading pages, and/or variations thereof).
- The system may utilize the category dictionary and the various interfaces to train a machine learning algorithm, which may comprise one or more classification and machine learning algorithms such as a recurrent neural network (RNN), a convolutional neural network (CNN), a random forest classifier, and/or variations or combinations thereof, to determine a type of interface for a given feature vector.
- A machine learning algorithm may, upon input of a feature vector, output a classification that may indicate the type of interface from which the feature vector was generated.
- In an example, a global object variable of a first interface corresponding to a first interface type and a global object variable of a second interface corresponding to a second interface type are obtained from a first interface provider.
- A first set of object hierarchy paths is determined based on the global object variable of the first interface.
- A second set of object hierarchy paths is determined based on the global object variable of the second interface.
- A first subset of the first set of object hierarchy paths that is disjoint from the second set of object hierarchy paths is determined. Also in the example, an additional global object variable of an additional interface corresponding to the first interface type is obtained from a second interface provider. Additionally in the example, a second subset of object hierarchy paths is determined based on the additional global object variable.
- Still further in the example, a category dictionary is generated based on an intersection between the first subset of object hierarchy paths and the second subset of object hierarchy paths. Also in the example, a first set of feature vectors corresponding to the first interface type is generated based on the category dictionary and the first set of object hierarchy paths.
- A second set of feature vectors corresponding to the second interface type is generated based on the category dictionary and the second set of object hierarchy paths.
- A machine learning algorithm is trained based on the first set of feature vectors with the first interface type as a ground truth value and the second set of feature vectors with the second interface type as the ground truth value.
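The disclosure names RNNs, CNNs, and random forest classifiers as candidate models. To stay dependency-free, the sketch below substitutes a much simpler nearest-centroid classifier, which is not one of those models; it only illustrates the supervised setup described above, in which feature vectors are paired with their interface type as the ground truth value and a new vector is mapped to a predicted type. The function names `train_centroids` and `predict_type` and the training data are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def train_centroids(vectors: list[list[int]], labels: list[str]) -> dict[str, list[float]]:
    """Average the feature vectors that share each ground-truth interface type."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = defaultdict(int)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_type(centroids: dict[str, list[float]], vec: list[int]) -> str:
    """Return the interface type whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid: list[float]) -> float:
        return sum((c - x) ** 2 for c, x in zip(centroid, vec))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical training data: vectors over a 4-path category dictionary.
vectors = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
labels = ["home page", "home page", "item page", "item page"]
model = train_centroids(vectors, labels)
print(predict_type(model, [1, 1, 0, 1]))  # home page
print(predict_type(model, [0, 0, 0, 1]))  # item page
```

Swapping in one of the models the disclosure actually names would change only the training and prediction functions; the inputs (feature vectors) and labels (interface types) stay the same.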
- Techniques described and suggested in the present disclosure improve the field of computing, especially the field of software development, by enabling software agents and other software or hardware tools to identify a particular type of interface and then determine how it can be interacted with. Additionally, techniques described and suggested in the present disclosure improve the speed and accuracy of systems that identify interface types using machine learning algorithms trained on feature vectors as described in the present disclosure. Moreover, techniques described and suggested in the present disclosure are necessarily rooted in computer technology in order to overcome problems that arise specifically when automating human interaction with interfaces.
- FIG. 1 illustrates an example 100 of a system for classifying interface pages, according to various embodiments.
- The example 100 may include an interface type classifier 102, which may classify a set of interfaces 104 to generate a set of classified interfaces 106A-06C, which may be usable by a client device 108.
- The interface type classifier 102 may receive the set of interfaces 104 and perform one or more processes to classify the set of interfaces 104 to produce the set of classified interfaces 106A-06C, which may be usable in one or more processes by the client device 108.
- The client device 108 may be any entity operable to access various systems and/or services, such as the interface type classifier 102.
- The client device 108 may be a physical device, such as a physical server computer, a mobile communication device, a laptop computer, a tablet computer, a personal computer, a mainframe, etc., or a virtual computing instance, such as a virtual machine hosted on one or more computing servers.
- The client device 108 may be operable by one or more clients that may utilize the interface type classifier 102.
- The client device 108 may run various software applications that may interact with the interface type classifier 102, the set of interfaces 104, and the set of classified interfaces 106A-06C.
- The set of interfaces 104 may be a set of interfaces provided by an interface provider, service provider, and/or variations thereof. Examples of services with which the set of interfaces 104 may be associated include data processing, data storage, software applications, security, encryption, library services, utility services, television services, entertainment services, and/or other such services.
- An interface provider may also be a service provider that may provide various interfaces to access one or more services.
- The set of interfaces 104 may be interfaces that allow an entity to access one or more services of a service provider that may provide the set of interfaces 104.
- The set of interfaces 104 may be one or more interfaces of various services that may be accessible through the Internet, and in some examples, the set of interfaces 104 may be identified with one or more uniform resource identifiers (URIs).
- The set of interfaces 104 may also be a set of web pages.
- The set of interfaces 104 may be of various types, such as home pages, item pages, collection pages, queue pages, search pages, profile pages, media player pages, news feed pages, blog pages, and so on. It should be noted that, in various embodiments, the set of interfaces 104 may be implemented as any interface, such as a graphical user interface or other type of interface provided to a user for interaction utilizing interface objects and/or elements.
- The interface type classifier 102 may receive the set of interfaces 104. In some examples, the interface type classifier 102 may obtain the set of interfaces 104 through one or more processes, such as interacting with a web browser or other entity. In some other examples, the interface type classifier 102 may receive the set of interfaces 104 from one or more other systems, such as the client device 108. The interface type classifier 102 may perform one or more processes on the set of interfaces 104 to generate the set of classified interfaces 106A-06C.
- The set of classified interfaces 106A-06C may comprise interfaces of the set of interfaces 104 that may be classified into three categories: a first type of the set of classified interfaces 106A, a second type of the set of classified interfaces 106B, and a third type of the set of classified interfaces 106C.
- The types of interfaces may correspond to various types of interface pages, such as a home page, collection page, item page, and/or variations thereof.
- The set of interfaces 104 may be classified such that each of the interfaces of the set of interfaces 104 is classified as a specific type of interface page (e.g., 106A, 106B, or 106C).
- The interface type classifier 102 may utilize any number of types of interfaces to classify a set of interfaces that may be obtained by the interface type classifier 102.
- The set of classified interfaces 106A-06C may be utilized by the client device 108, which may perform one or more processes that may interact with the interfaces using the determined classifications.
- FIG. 2 illustrates an example 200 of a type of interface page, according to various embodiments.
- FIG. 2 depicts an interface 206A, which may be of a first type, and tabs corresponding to an interface 206B, which may be of a second type and an interface 206C, which may be of a third type.
- A type of an interface page may refer to a desired functionality of the interface page, a classification of the interface page, a usage of the interface page, and/or variations thereof.
- The type of an interface page may refer to a use case of the interface page.
- A library organization may provide one or more interfaces accessible through an Internet website that entities may interact with to access various services of the library organization.
- The initial interface of the website that may be loaded when an entity first interacts with the website may be referred to as a home page, or classified as a home page as its interface page type.
- The interfaces 206A-06C may be interfaces of a service provider, such as a library organization.
- The interfaces 206A-06C may be interfaces with which entities may interact to access services of the service provider.
- The service provider may provide the interfaces 206A-06C through a web browser, such that entities may access the interfaces 206A-06C through the web browser.
- The interfaces 206A-06C may be pages of a website, which may be accessed through a uniform resource locator (URL).
- The service provider may provide the interfaces 206A-06C through one or more other interfaces over one or more communication networks, in which entities may perform one or more processes involving the one or more interfaces to interact with and/or obtain the interfaces 206A-06C.
- The interface 206A may be an interface that may be of a type referred to as a home page.
- The interface 206A may be an interface that may be classified as a home page.
- A home page may refer to an interface page that may be an initial interface page that may provide access to other interface pages.
- The source code of an interface may be represented as a hierarchical tree structure (e.g., an object model) composed of components and their properties (collectively referred to as “elements” or “nodes”) descending from a base (“root”) object or node.
- An example of a base object is the JavaScript “window” object or other global object variable, which represents the open window of the interface under which all other objects of the window fall in the hierarchy.
- The base object may be any global variable used by an application or framework to store state information of the application or framework.
- Each object hierarchy path may be a text representation of a sequence of objects or attributes from the root node (e.g., window object) to a leaf node (e.g., an object or attribute that has no child object or attribute) in the hierarchy.
- The process of deconstructing and representing the object model as a set of object hierarchy paths may be referred to in the present disclosure as “flattening” the object model of the interface.
- An illustrative portion of a set of object hierarchy paths flattened from an object model of an interface can be seen in FIG. 5.
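Flattening, as defined above, can be sketched as a depth-first walk that emits one root-to-leaf path per leaf. This minimal sketch assumes the object model has already been captured as nested dictionaries and lists (for example, a JSON-serialized snapshot of a global object); a live browser `window` object contains cycles (e.g., `window.window`), which this sketch does not handle. The function name, the dot separator, and the use of list indices as path segments are illustrative assumptions, not details from the disclosure.

```python
from __future__ import annotations

from typing import Any


def flatten_object_model(node: Any, prefix: str = "window") -> list[str]:
    """Return one dot-separated root-to-leaf path for every leaf of a nested object model."""
    # A leaf has no children: a scalar value, or an empty dict/list.
    if isinstance(node, dict) and node:
        children = list(node.items())
    elif isinstance(node, list) and node:
        children = [(str(i), child) for i, child in enumerate(node)]
    else:
        return [prefix]
    paths: list[str] = []
    for key, child in children:
        paths.extend(flatten_object_model(child, f"{prefix}.{key}"))
    return paths


# Hypothetical snapshot of a page's global object.
model = {
    "document": {"title": "Library", "body": {"div": [{"class": "item"}, {"class": "item"}]}},
    "cart": {},
}
for path in flatten_object_model(model):
    print(path)
# window.document.title
# window.document.body.div.0.class
# window.document.body.div.1.class
# window.cart
```

Keeping list indices distinguishes repeated elements; dropping them instead would collapse repeated items into a single shared path, which is a design choice a real implementation would need to make deliberately.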
- The interfaces 206A-06C may be provided by a library organization that may provide services.
- The library organization may provide the interfaces 206A-06C through a web browser, which entities may utilize to interact with the interfaces 206A-06C.
- The interfaces 206A-06C may be interfaces that entities may utilize to access the services of the library organization.
- The interfaces 206A-06C may be accessed by the entities through one or more URLs, which may identify the interfaces 206A-06C.
- An entity may desire to access the services of the library organization through the web browser.
- The entity may load the interface 206A by identifying the interface 206A through a URL.
- The interface 206A may be a home page that may indicate to the entity that the entity is accessing services of the library organization.
- The interface 206A may further provide one or more interface elements through which the entity may access the interfaces 206B-06C, as well as other interfaces and elements that may enable access to the services of the library organization.
- The interface 206B may be an interface that may be of a type referred to as a collection page.
- The interface 206B may be an interface that may be classified as a collection page. Further information regarding a collection page may be found in the description of FIG. 3.
- The interface 206C may be an interface that may be of a type referred to as an item page.
- The interface 206C may be an interface that may be classified as an item page. Further information regarding an item page may be found in the description of FIG. 4.
- FIG. 3 illustrates an example 300 of a different type of interface page, according to various embodiments.
- FIG. 3 depicts an interface 306A, which may be of a first type, an interface 306B, which may be of a second type, and an interface 306C, which may be of a third type.
- The interfaces 306A-06C may be the same as or different from the interfaces 206A-06C as described in connection with FIG. 2.
- A type of an interface page may refer to a desired functionality of the interface page, a classification of the interface page, a usage of the interface page, and/or variations thereof.
- The interfaces 306A-06C may be interfaces of a service provider, such as a library organization.
- The interfaces 306A-06C may be interfaces with which entities may interact to access services of the service provider.
- The service provider may provide the interfaces 306A-06C through a web browser, such that entities may access the interfaces 306A-06C through the web browser.
- The interfaces 306A-06C may be pages of a website, which may be accessed through one or more URLs.
- The service provider may provide the interfaces 306A-06C through one or more other interfaces over one or more communication networks, in which entities may perform one or more processes involving the one or more interfaces to interact with and/or obtain the interfaces 306A-06C.
- The interface 306B may be an interface that may be of a type referred to as a collection page.
- The interface 306B may be an interface that may be classified as a collection page.
- A collection page may refer to an interface page that may present a view of a collection of one or more items, objects, or elements.
- A service provider may provide various services and/or items that may be utilized by clients of the service.
- The collection page may provide a consolidated view of the various services and/or items.
- A collection page may refer to an interface page that may provide a catalog of items associated with services of a service provider, in which an entity may select an item of the catalog of items to access one or more services of the service provider.
- The interface 306B may provide one or more elements that may allow an entity to select one or more items that may be displayed in the interface 306B.
- The interface 306B depicts images of items in the collection, textual elements describing attributes of the items, and interactive control objects for adding an item to a queue.
- Some of the elements may be interactive to cause an interface page of the same or other type to be displayed; for example, interacting (e.g., through real or simulated human interaction, such as clicking or tapping) with an image of one of the items may cause a device displaying the interface 306B to load an interface of an item page corresponding to the image interacted with (e.g., interface 406C of FIG. 4).
- The interface 306B may be generated as a result of execution of interface source code written in one or more computer languages.
- The source code of the interface 306B may be expressed as an object model composed of a hierarchy of components that can be flattened into a set of object hierarchy paths, as further described in conjunction with FIG. 5.
- The interfaces 306A-06C may be provided by a library organization that may provide various services.
- The library organization may provide the interfaces 306A-06C through a web browser, which entities may utilize to interact with the interfaces 306A-06C.
- The interfaces 306A-06C may be interfaces that entities may utilize to access the services of the library organization, such as borrowing a book, returning a book, and/or variations thereof.
- The interfaces 306A-06C may be accessed by the entities through one or more URLs, which may identify the interfaces 306A-06C.
- An entity may desire to access the services of the library organization through the web browser.
- The entity may load the interface 306B by identifying the interface 306B through a URL.
- The interface 306B may be an interface page that may display a collection of books that may be selected to be borrowed.
- The interface 306B may be presented in response to a search query, and may present a collection of books matching the search criteria identified in the search query.
- The entity may select, through the interface 306B, one or more books to add to a queue of books to be borrowed.
- In another example, the interfaces 306A-06C may be provided by a cinema reservation service.
- The cinema reservation service may provide the interfaces 306A-06C through a web browser, which entities may utilize to interact with the interfaces 306A-06C.
- The interfaces 306A-06C may be interfaces that entities may utilize to access the services of the cinema reservation service, such as reserving a movie.
- The interface 306B may provide a consolidated view of potential movies that may be reserved to be watched.
- The interface 306B may comprise various interface elements, corresponding to different movies, that an entity may select to reserve a specific movie.
- FIG. 4 illustrates an example 400 of another type of interface page, according to various embodiments.
- FIG. 4 depicts an interface 406A, which may be of a first type, an interface 406B, which may be of a second type, and an interface 406C, which may be of a third type.
- The interfaces 406A-06C may be the same as or different from the interfaces 306A-06C as described in connection with FIG. 3.
- A type of an interface page may refer to a desired functionality of the interface page, a classification of the interface page, a usage of the interface page, and/or variations thereof.
- The service provider may provide the interfaces 406A-06C through one or more other interfaces over one or more communication networks, in which entities may perform one or more processes involving the one or more interfaces to interact with and/or obtain the interfaces 406A-06C.
- The interfaces 406A-06C may be provided by a library organization that may provide services.
- The library organization may provide the interfaces 406A-06C through a web browser, which entities may utilize to interact with the interfaces 406A-06C.
- The interfaces 406A-06C may be interfaces that entities may utilize to access the services of the library organization, such as borrowing a book, returning a book, and/or variations thereof.
- The interfaces 406A-06C may be accessed by the entities through one or more URLs, which may identify the interfaces 406A-06C.
- An entity may desire to access the services of the library organization through the web browser.
- The interface may be represented by an object model that may be structured in a hierarchical format, in which elements/objects of the interface may be identified according to various attributes, functions, namespaces, values, and/or variations thereof.
- An element/object of the interface may be identified by an object hierarchy path, which may be a text string representing the element/object of the interface relative to various attributes of the element/object and the interface.
- An interface may comprise various interface elements and/or objects (e.g., various selectable elements, text boxes, images, and/or variations thereof).
- A specific object of the interface may be identified by an object hierarchy path, which may represent various attributes that the object may have.
- For example, a specific object may be a selectable text box that may be presented in the color red; the object hierarchy path of the specific object may denote that the specific object is a selectable object, is a selectable text box object, and is a selectable text box object that is red.
- The set of object hierarchy paths 510 may be a set of paths that represent all of the objects of a particular interface page. It may not be unusual for a single interface to yield 100,000 to 500,000 (or more) object hierarchy paths. Therefore, for ease of illustration, only a subset of the set of object hierarchy paths of an example interface is depicted in FIG. 5.
- The base object of the interface may be utilized to determine the set of object hierarchy paths 510, such as by traversing each branch of the hierarchical tree structure of the interface to determine a path for each element/object of the interface.
- An object hierarchy path of a specific object of an interface, which may in some embodiments be referred to as a word, may be formatted as a plurality of text strings, in which each text string of the plurality of text strings may represent a sequence of attributes from a base node of an object model of the interface to an end node of the object model, in which the end node may represent the specific object of the interface.
- FIG. 6 illustrates an example 600 of a first stage of analyzing object hierarchy paths of various interface pages of a single interface provider, according to various embodiments. Specifically, FIG. 6 depicts disjoint sets of object hierarchy paths 610A-10D and intersecting object hierarchy paths 626.
- A set of interfaces may be obtained. The set of interfaces may be associated with each other, such that interacting with one interface of the set of interfaces may generate and/or affect a different interface of the set of interfaces.
- The set of interfaces depicted in the example 600 may be obtained from a single interface provider, but it is contemplated that the process of determining the intersecting and disjoint sets may be performed separately (e.g., in sequence or in parallel) for various interface providers.
- Object hierarchy paths may be determined for each interface of the set of interfaces.
- The set of interfaces may comprise various types of interfaces, such as home pages, item pages, collection pages, queue pages, and so on.
- Each interface of the set of interfaces may be processed such that object hierarchy paths may be determined for each interface, in which the object hierarchy paths may represent each object and/or element of the interface.
- A set of object hierarchy paths may be determined for each interface page type (e.g., a set of object hierarchy paths may be determined for all of the home pages of the set of interfaces, a different set of object hierarchy paths may be determined for all of the item pages of the set of interfaces, and so on).
- For example, a set of object hierarchy paths may be determined for all of the home pages, collection pages, item pages, and queue pages of the set of interfaces.
- The Venn diagram circle labeled home pages 610A represents a first set of object hierarchy paths obtained by flattening the object model of one or more home pages of a particular interface provider.
- The Venn diagram circle labeled collection pages 610B represents a second set of object hierarchy paths obtained by flattening the object model of one or more collection pages of the particular interface provider.
- The Venn diagram circle labeled item pages 610C represents a third set of object hierarchy paths obtained by flattening the object model of one or more item pages of the particular interface provider.
- The Venn diagram circle labeled queue pages 610D represents a fourth set of object hierarchy paths obtained by flattening the object model of one or more queue pages of the particular interface provider.
- The shaded area of the intersecting object hierarchy paths 626 represents one or more object hierarchy paths that are present in at least two types of interface pages.
- The subsets of object hierarchy paths that are disjoint from the intersecting object hierarchy paths 626 are sets of non-intersecting object hierarchy paths unique to their respective interface types for the particular interface provider.
- The object hierarchy paths that intersect between categories (i.e., the intersecting object hierarchy paths 626) may be ignored/discarded.
- Object hierarchy paths that intersect between fewer than a threshold number (e.g., two or less, three or less, etc.) …
- The sets of object hierarchy paths for each interface type may be compared.
- The sets of object hierarchy paths may be compared utilizing a Venn diagram structure as depicted in FIG. 6.
- Various other data structures may be utilized to compare the sets of object hierarchy paths.
- The sets of object hierarchy paths may be compared to determine commonalities between the sets of object hierarchy paths.
- For example, an object of a home page may be present in an item page, and the object may share the same object hierarchy path across the home page and the item page.
- The intersecting object hierarchy paths 626 may represent object hierarchy paths that are common to at least two types of interfaces of the set of interfaces.
- The disjoint sets of object hierarchy paths 610A-10D may represent object hierarchy paths that are not shared between interface types.
- The disjoint set of object hierarchy paths 610A may represent object hierarchy paths that are only present in home pages, and not in other interface types of the set of interfaces.
- The disjoint set of object hierarchy paths 610B may represent object hierarchy paths that are only present in collection pages, and not in other interface types of the set of interfaces.
- The disjoint set of object hierarchy paths 610C may represent object hierarchy paths that are only present in item pages, and not in other interface types of the set of interfaces.
- The disjoint set of object hierarchy paths 610D may represent object hierarchy paths that are only present in queue pages, and not in other interface types of the set of interfaces.
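The Venn-diagram comparison of FIG. 6 amounts to set arithmetic on the flattened paths of a single interface provider. The sketch below keeps, for each interface type, only the paths that appear in no other type (the disjoint regions such as 610A-10D) and also returns the paths shared by at least two types (the shaded region 626). It is a minimal illustration; the function name `split_by_type` and the sample paths are hypothetical.

```python
from __future__ import annotations


def split_by_type(paths_by_type: dict[str, set[str]]) -> tuple[dict[str, set[str]], set[str]]:
    """Return (paths unique to each interface type, paths shared by two or more types)."""
    unique: dict[str, set[str]] = {}
    for page_type, paths in paths_by_type.items():
        # Union of every *other* type's paths from the same interface provider.
        others = set().union(*(p for t, p in paths_by_type.items() if t != page_type))
        unique[page_type] = paths - others
    all_paths = set().union(*paths_by_type.values())
    shared = all_paths - set().union(*unique.values())
    return unique, shared


# Hypothetical flattened paths for one interface provider, grouped by page type.
paths_by_type = {
    "home": {"window.nav", "window.footer", "window.hero.banner"},
    "collection": {"window.nav", "window.footer", "window.grid.items"},
    "item": {"window.nav", "window.item.price", "window.item.addToQueue"},
    "queue": {"window.nav", "window.item.price", "window.queue.list"},
}
unique, shared = split_by_type(paths_by_type)
print(unique["home"])   # {'window.hero.banner'}
print(sorted(shared))   # ['window.footer', 'window.item.price', 'window.nav']
```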
- FIG. 7 illustrates an example 700 of comparing sets of object hierarchy paths between multiple interface providers, according to various embodiments. Specifically, FIG. 7 depicts a first set of disjoint object hierarchy paths 710 originating from an interface provider A 716A, a second set of disjoint object hierarchy paths 712 originating from an interface provider B 716B, and a third set of disjoint object hierarchy paths 714 originating from an interface provider C 716C. Although only three different interface providers are depicted for ease of illustration, it is contemplated that sets of object hierarchy paths of any number of interface providers (e.g., 10, 100, 1,000, etc.) may be compared.
- the interface providers A-C 716A-16C may each provide different services that may be accessible through interfaces provided by the interface providers A-C 716A-16C.
- the interface provider A 716A may be a library organization service, and may provide interfaces with which entities may interact to access the various services of the library organization service (e.g., borrowing a book, returning a book, and so on).
- alternatively, the interface provider A 716A may be a cinema reservation service, and may provide interfaces with which entities may interact to access the various services of the cinema reservation service (e.g., reserving a seat, reserving a film, and so on).
- the intersecting set of object hierarchy paths 718 may comprise object hierarchy paths that are common to at least two of the first set of disjoint object hierarchy paths 710, the second set of disjoint object hierarchy paths 712, or the third set of disjoint object hierarchy paths 714. This process may be repeated for each different interface type to determine intersecting sets of object hierarchy paths 718 for the different interface types.
- an intersection refers to an object hierarchy path that is found in at least two of the sets being compared.
- an intersecting set of object hierarchy paths may be a subset that includes object hierarchy paths that match another object hierarchy path in at least one other set of the sets being compared.
- the interface provider A 716A may be a library organization service, and may provide a first set of interfaces, in which the first set of interfaces may be categorized into interface types and processed into object hierarchy paths; the object hierarchy paths of the interface types may be compared with each other to determine the first set of disjoint object hierarchy paths 710, which may represent object hierarchy paths that are unique to interfaces of the library organization service that are of a particular interface type.
- the interface provider C 716C may be a cinema reservation service, and may provide a third set of interfaces, in which the third set of interfaces may be categorized into interface types and processed into object hierarchy paths; the object hierarchy paths of the interface types may be compared with each other to determine the third set of disjoint object hierarchy paths 714, which may represent object hierarchy paths that are unique to interfaces of the cinema reservation service that likewise are of the same particular interface type (e.g., home pages).
- the first set of disjoint object hierarchy paths 710, the second set of disjoint object hierarchy paths 712, and the third set of disjoint object hierarchy paths 714 may be compared with each other to determine the intersecting set of object hierarchy paths 718, which may represent object hierarchy paths that are common to the home page interface type within interfaces provided by the interface providers.
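The cross-provider comparison of FIG. 7 applies the stated definition of an intersection: a path found in at least two of the sets being compared. The sketch below counts that definition directly. It is a minimal illustration that assumes each provider's disjoint set for one interface type is a Python set of dot-joined path strings. The variable names mirror the reference numerals, but the path values are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def intersecting_paths(provider_sets: Iterable[Set[str]], min_sets: int = 2) -> Set[str]:
    """Return paths that occur in at least `min_sets` of the provider sets."""
    counts: Counter = Counter()
    for paths in provider_sets:
        counts.update(set(paths))  # count each path at most once per provider
    return {path for path, n in counts.items() if n >= min_sets}


# Hypothetical disjoint home-page paths from providers A, B and C.
set_710 = {"html.body.banner", "html.body.login"}
set_712 = {"html.body.banner", "html.body.search"}
set_714 = {"html.body.search", "html.body.showtimes"}
set_718 = intersecting_paths([set_710, set_712, set_714])
```

Running the same function once per interface type yields one intersecting set for each type.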
- an intersecting set of disjoint object hierarchy paths may be determined for various interface types (e.g., an intersecting set of disjoint object hierarchy paths may be determined for sets of object hierarchy paths for an item page, as well as for a collection page, and so on).
- the various intersecting sets of disjoint object hierarchy paths may be utilized to form a collection of intersecting sets of disjoint object hierarchy paths, which may also be referred to in the present disclosure as a dictionary.
- a dictionary may be constructed.
- a dictionary may refer to a collection of sets of object hierarchy paths that may be unique to particular interface types across various interface providers. Details about determining object hierarchy paths for the dictionary may be found in the present disclosure in conjunction with FIGS. 6, 7, and 9.
- a plurality of interface providers may be identified, in which each of the plurality of interface providers may provide a set of interfaces which may comprise interfaces of various types.
- the set of interfaces may be categorized into types.
- object hierarchy paths may be identified that are unique to each type of interface. The unique object hierarchy paths for a particular type of interface from a particular interface provider may then be compared to other unique object hierarchy paths for the particular type of interface from the other interface providers; the set of object hierarchy paths for the particular type of interface that appears most frequently across the plurality of interface providers may be utilized to construct a category dictionary for the particular interface type.
- a category dictionary may be constructed for various interface types, such as a home page, a collection page, an item page, or a queue page.
- a category dictionary may comprise a list of object hierarchy paths that are common to particular interface types across various interface providers (e.g., a home page dictionary may comprise a list of object hierarchy paths that are common across various home pages provided by various interface providers).
- the category dictionary may comprise a concatenation of the most frequently found object hierarchy paths for each interface type.
- the category dictionaries may be concatenated to form a dictionary, and may be concatenated in any order.
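The frequency ranking and concatenation described above can be sketched as follows. This is a minimal, hypothetical version with several stated assumptions. The input maps each category to a list of per-provider sets of type-unique paths. Paths shared by fewer than `min_providers` providers are dropped, following the discard step of process 900. The `top_k` cutoff and the alphabetical tie-break are illustrative choices that the source does not specify.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set

# Any fixed order works, as long as it is reused when building feature vectors.
CATEGORY_ORDER = ["home", "collection", "item", "queue"]


def category_dictionary(
    unique_sets: Sequence[Set[str]], top_k: int, min_providers: int = 2
) -> List[str]:
    """Rank type-unique paths by the number of providers that share them."""
    counts: Counter = Counter()
    for paths in unique_sets:
        counts.update(paths)  # one count per provider (each input is a set)
    kept = [(p, n) for p, n in counts.items() if n >= min_providers]
    kept.sort(key=lambda pn: (-pn[1], pn[0]))  # most frequent first, then by name
    return [p for p, _ in kept[:top_k]]


def build_dictionary(
    unique_by_category: Dict[str, Sequence[Set[str]]], top_k: int
) -> List[str]:
    """Concatenate the per-category dictionaries in CATEGORY_ORDER."""
    dictionary: List[str] = []
    for category in CATEGORY_ORDER:
        dictionary.extend(category_dictionary(unique_by_category.get(category, []), top_k))
    return dictionary


# Hypothetical type-unique paths from three providers (home) and two providers (item).
unique_by_category = {
    "home": [{"body.hero", "body.banner"}, {"body.hero", "body.carousel"}, {"body.hero", "body.banner"}],
    "item": [{"main.price"}, {"main.price", "main.sku"}],
}
dictionary = build_dictionary(unique_by_category, top_k=2)
```

Here `body.carousel` and `main.sku` each occur at only one provider, so neither reaches the dictionary.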
- the feature vector 820 for a given interface may be constructed by looping through each object hierarchy path entry in the dictionary and determining whether the object hierarchy path entry matches an object hierarchy path in the object model of the given interface (e.g., “1” indicating a match and “0” indicating no match for a given object hierarchy path entry; although it is contemplated that in some implementations, this may be reversed).
- the feature vector 820 appears to have been constructed using a category dictionary whose categories are ordered as home page, collection page, item page, and queue page, although any order may be utilized.
- the machine learning algorithm 822 may comprise one or more neural networks or other machine learning algorithm that may be configured to classify a given interface.
- the dictionary may be utilized to train the machine learning algorithm 822.
- an interface may be obtained from an interface provider.
- the interface may already be classified.
- the type of the interface may be provided to identify the interface.
- the interface may be analyzed to determine a data representation of the interface, which may be an object model representing a hierarchical tree structure of the interface objects and elements.
- the object model may be processed to determine a plurality of the object hierarchy paths of each object/item/element of the interface.
- the plurality of the object hierarchy paths may then be utilized to generate a feature vector, which may be a vector that is expressed in a base-2 (binary) numeric format as shown in FIG. 8.
- the feature vector may be generated by comparing the plurality of the object hierarchy paths to each entry of the dictionary, in which if an entry of the dictionary matches an object hierarchy path of the plurality of the object hierarchy paths, a “1” may be added to the feature vector, and if there is not a match, a “0” may be added to the feature vector. For example, if the first entry of the dictionary corresponds to a path “A.B.C,” and the object hierarchy path “A.B.C” is contained in the plurality of object hierarchy paths, the first entry of the feature vector may be “1.”
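The dictionary lookup just described, including the “A.B.C” example, reduces to one membership test per dictionary entry. The sketch below assumes the dictionary is an ordered list of path strings and uses 1 for a match. As the text notes, an implementation could reverse the encoding.

```python
from typing import Iterable, List


def feature_vector(dictionary: List[str], interface_paths: Iterable[str]) -> List[int]:
    """One binary feature per dictionary entry: 1 if the interface contains that path."""
    present = set(interface_paths)  # set gives constant-time membership tests
    return [1 if entry in present else 0 for entry in dictionary]


dictionary = ["A.B.C", "A.B.D", "A.E"]
vec = feature_vector(dictionary, ["A.B.C", "A.E", "A.F.G"])
```

The interface path “A.F.G” has no dictionary entry, so it does not affect the vector. The vector's length always equals the dictionary's length.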
- the feature vector as well as a ground truth indication of the type of the interface that the feature vector was generated from may be utilized as an input to the machine learning algorithm 822 to train the machine learning algorithm 822.
- the machine learning algorithm 822 may calculate a loss function by comparing a predicted classification of a given feature vector to the ground truth indication that may indicate the type of interface the feature vector was generated from, and may be trained such that the loss is minimized.
- the process may be repeated for a plurality of interfaces of a plurality of types of interfaces from a plurality of interface providers such that the machine learning algorithm 822 may identify the type of an interface based on a feature vector generated from the interface. Further details on training the machine learning algorithm 822 can be found in the present disclosure in conjunction with FIG. 10.
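The training loop described above (predict, compare with the ground truth, minimize the loss) can be illustrated with multinomial logistic regression, one of the algorithm families the disclosure lists. The source does not say which model or optimizer is used. This pure-Python sketch uses per-sample gradient descent on cross-entropy loss, and the learning rate and epoch count are hypothetical. A production system would more likely use an established library.

```python
import math
from typing import List, Sequence


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxClassifier:
    """Multinomial logistic regression over binary feature vectors."""

    def __init__(self, n_features: int, classes: Sequence[str]):
        self.classes = list(classes)
        self.w = [[0.0] * n_features for _ in self.classes]
        self.b = [0.0] * len(self.classes)

    def probabilities(self, x: Sequence[int]) -> List[float]:
        scores = [sum(wj * xj for wj, xj in zip(row, x)) + bk for row, bk in zip(self.w, self.b)]
        return softmax(scores)

    def loss(self, X: Sequence[Sequence[int]], y: Sequence[str]) -> float:
        """Mean cross-entropy between predictions and ground truth types."""
        total = 0.0
        for x, label in zip(X, y):
            total -= math.log(self.probabilities(x)[self.classes.index(label)])
        return total / len(X)

    def fit(self, X, y, epochs: int = 200, lr: float = 0.5) -> "SoftmaxClassifier":
        for _ in range(epochs):
            for x, label in zip(X, y):
                p = self.probabilities(x)
                target = self.classes.index(label)
                for k in range(len(self.classes)):
                    # Gradient of cross-entropy w.r.t. score k is p_k - [k == target].
                    grad = p[k] - (1.0 if k == target else 0.0)
                    self.b[k] -= lr * grad
                    for j, xj in enumerate(x):
                        self.w[k][j] -= lr * grad * xj
        return self

    def predict(self, x: Sequence[int]) -> str:
        p = self.probabilities(x)
        return self.classes[p.index(max(p))]


# Hypothetical feature vectors with their ground truth interface types.
X = [[1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0],
     [0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 0, 1]]
y = ["home", "home", "item", "item", "queue", "queue"]
model = SoftmaxClassifier(n_features=6, classes=["home", "item", "queue"])
loss_before = model.loss(X, y)
model.fit(X, y)
loss_after = model.loss(X, y)
```

`predict` returns the most probable type, which corresponds to the determined category 824.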
- an interface may be obtained, and analyzed to determine object hierarchy paths of the interface.
- the object hierarchy paths may be compared to the dictionary to generate the feature vector 820.
- the object hierarchy paths may be compared to a category dictionary portion corresponding to the home page interface type. Entries of the category dictionary may be compared with the object hierarchy paths, such that if an entry of the category dictionary is found in the object hierarchy paths, the feature vector 820 may be appended with the value “1.”
- if the entry is not found, the feature vector 820 may be appended with the value “0.”
- the object hierarchy paths may be compared to a category dictionary portion corresponding to the collection page interface type.
- the object hierarchy paths may be compared to a category dictionary portion corresponding to the item page interface type.
- the object hierarchy paths may be compared to a category dictionary portion corresponding to the queue page interface type. Note that, in some embodiments, a single dictionary may include portions of object hierarchy paths corresponding to different categories (interface types), but it is also contemplated that, in some implementations, each category may have a separate dictionary of object hierarchy paths specific to that category.
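A single dictionary with per-category portions and a set of separate per-category dictionaries produce the same vector, provided the portions are processed in a fixed order. The sketch below is a minimal illustration of this point. It also records which slice of the vector belongs to each category. The category names and paths are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def segmented_feature_vector(
    category_dicts: Dict[str, List[str]], order: List[str], interface_paths: Set[str]
) -> Tuple[List[int], Dict[str, range]]:
    """Append one block of binary features per category, in the given order."""
    vector: List[int] = []
    segments: Dict[str, range] = {}
    for category in order:
        start = len(vector)
        vector.extend(1 if entry in interface_paths else 0 for entry in category_dicts[category])
        segments[category] = range(start, len(vector))  # positions of this portion
    return vector, segments


category_dicts = {
    "home": ["h.hero", "h.banner"],
    "collection": ["c.grid"],
    "item": ["i.price", "i.buy"],
    "queue": ["q.checkout"],
}
order = ["home", "collection", "item", "queue"]
vector, segments = segmented_feature_vector(category_dicts, order, {"i.price", "i.buy", "h.banner"})
```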
- the feature vector 820 may be input to the machine learning algorithm 822.
- the machine learning algorithm 822 may comprise one or more neural networks or other machine learning algorithm that may be configured to identify the type of a given interface based on a feature vector generated based on the given interface.
- the machine learning algorithm 822 may comprise various machine learning structures/algorithms, such as a gradient boosted decision tree, a logistic regression algorithm, an artificial neural network, and/or variations thereof.
- the machine learning algorithm 822 may comprise one or more classification algorithms, and may be trained through the usage of various loss functions, in which input feature vectors and corresponding ground truth classifications and predicted classifications may be utilized to minimize loss from the various loss functions.
- the machine learning algorithm 822 may perform one or more processes to determine the determined category 824 for the feature vector 820.
- the determined category 824 may indicate the most likely type of interface the feature vector 820 was generated from and/or based upon.
- FIG. 9 is a flowchart illustrating an example of a process 900 for building a category dictionary, in accordance with various embodiments.
- Some or all of the process 900 may be performed under the control of one or more computer systems configured with executable instructions and/or other data, and may be implemented as executable instructions executing collectively on one or more processors.
- the executable instructions and/or other data may be stored on a non-transitory computer-readable storage medium (e.g., a computer program persistently stored on magnetic, optical, or flash media).
- some or all of process 900 may be performed by any suitable system, such as the computing device 1200 of FIG. 12.
- the process 900 includes a series of operations wherein multiple interfaces from different interface providers are obtained, and, for each interface provider and interface, the interface category is determined, the object model of the interface is flattened, and overlapping object hierarchy paths are discarded. For each non-overlapping object hierarchy path, the number of occurrences of that non-overlapping object hierarchy path across different providers is counted, the least frequent ones are discarded, and a category dictionary is generated from the remaining object hierarchy paths.
- the system performing the process 900 may obtain a set of interface providers.
- the system may be provided with a predetermined set of interface providers.
- the system may perform one or more processes to identify the set of interface providers, such as by crawling the Internet using a Web crawler (“bot”) to identify candidate interface providers for the set of interface providers.
- the different interface providers may each have their own set of interfaces, or pages, that may allow other entities to interact with services of the different interface providers.
- the system performing the process 900 may label multiple interfaces from different interface providers.
- “labelling” in this context may refer to identifying and selecting exemplary interfaces to use for generating the category dictionaries and determining their interface type (which may be further used as a ground truth value when training a machine learning algorithm).
- the categories/types of the interfaces for the purposes of generating the category dictionary may initially be classified by operators of the system.
- Example categories/types of interfaces may include, but are not limited to, home pages, collection pages, item pages, queue pages, and/or variations thereof.
- the system may utilize one or more classification algorithms to label the multiple interfaces.
- the system performing the process 900 may process interfaces of each interface provider of the set of interface providers.
- the system may process a set of interfaces from each interface provider.
- the system may, in 906, process an interface of the set of interfaces for a particular interface provider.
- the system may determine the interface category, which may be referred to as the interface type, type, or classification, of the interface.
- the system may obtain the interface category of the interface.
- the system may retrieve a data object that may represent the interface, and determine the interface category of the interface based on analysis of the data object. In other examples, the system may utilize a previously determined category or label to determine the interface category of the interface.
- the interface may be represented by an interface model, which may be denoted as an object model.
- the interface model may provide a representation of the elements/objects of the interface, and may be formatted in a tree structure, in which the leaves of the tree may represent the elements/objects of the interface.
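Flattening such a tree into object hierarchy paths can be sketched with a short recursive walk. This is a minimal illustration under the following assumptions. The object model is a nested Python dict, which resembles a global object variable where child keys are unique. Only leaves receive a path, consistent with leaves representing the interface's elements/objects. Path segments are joined with “.”, following the “A.B.C” example elsewhere in the disclosure. The model contents are hypothetical.

```python
from typing import Any, Dict, List


def flatten(model: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return the dot-joined hierarchy path of every leaf in a nested-dict object model."""
    paths: List[str] = []
    for key, child in model.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(child, dict) and child:
            paths.extend(flatten(child, path))  # non-empty dict: descend
        else:
            paths.append(path)  # empty dict or scalar value: treat as a leaf
    return paths


model = {
    "document": {
        "body": {
            "header": {"search": {}},
            "main": {"price": "9.99", "addToCart": {}},
        }
    }
}
paths = flatten(model)
```

A real DOM can contain sibling elements with the same tag. A list-of-children representation would then be needed, and duplicate paths would typically be collapsed into a set before the comparisons above.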
- the system performing the process 900 may discard non-overlapping object hierarchy paths between interface providers.
- the system may, for each category, compare object hierarchy paths determined for the category for an interface provider with object hierarchy paths determined for the category for other interface providers of the set of interface providers.
- the system may, for each category, discard any object hierarchy paths that are not present in at least two or more interface providers of the set of interface providers.
- the system may determine a set of overlapping object hierarchy paths for each category, in which a set of overlapping object hierarchy paths for a particular category may comprise object hierarchy paths that are present in interfaces of the particular category from at least two or more interface providers of the set of interface providers.
- the process 1000 includes a series of operations wherein for each of a plurality of interface categories, DOMs for interfaces corresponding to the category are flattened, feature vectors are generated for each interface, and the feature vectors with their corresponding category are used to train a machine learning algorithm to categorize interfaces.
- the system performing the process 1000 may train the machine learning algorithm using the set of feature vectors with the category/type of interfaces from which the feature vectors were derived being the ground truth value.
- a ground truth value may refer to a value that is an expected output of a machine learning algorithm. For example, for a given feature vector that has been generated from a home interface type interface, the ground truth value for the given feature vector may indicate that the type of interface the given feature vector was generated from is a home page.
- the machine learning algorithm may be trained by inputting a feature vector to the machine learning algorithm, and comparing a classification output by the machine learning algorithm to the ground truth value of the feature vector to calculate loss. The machine learning algorithm may then be optimized such that loss is minimized.
- the machine learning algorithm may be considered trained when, if given a feature vector, the output classification is the same as the ground truth value for the feature vector.
- the system performing the process 1000 may determine if the category being processed is the last category of the determined categories. If further categories need to be processed, the system performing the process 1000 may return to 1004 to repeat operations 1004-1012 to process interfaces corresponding to the next category. The system may continue to generate additional feature vectors for interfaces of each category using the category dictionary to train the machine learning algorithm. Otherwise, if interfaces for all categories have been processed, the machine learning algorithm may be trained and the process 1000 may end. Thereafter, the trained machine learning algorithm may be used in conjunction with the process 1100 of FIG. 11 to categorize a given interface.
- the process 1100 includes a series of operations wherein an interface is flattened, object hierarchy paths of the interface are obtained, a feature vector is generated using a category dictionary, and the feature vector is provided to a machine learning algorithm (such as the one trained using the process 1000 of FIG. 10) that outputs the most likely interface type/category to which the interface belongs.
- the system performing the process 1100 may compare the object hierarchy paths of the interface to each entry of the dictionary.
- the system may iterate through the list of object hierarchy paths, which may be denoted as object words, of the dictionary to generate a feature vector.
- the system may determine if any of the object hierarchy paths of the interface match an object hierarchy path of the dictionary. If so, the system may, in 1112, append the value of 1 (or otherwise indicate that a match was found, depending on implementation) to the feature vector. If none of the object hierarchy paths of the interface match the object hierarchy path of the dictionary, the system may, in 1114, append the value of 0 (or otherwise indicate that a match was not found, depending on implementation) to the feature vector.
- the system may perform an operation specific to the determined interface category.
- the system may interact with, or cause another device to interact with, the interface based on the determined interface category.
- the system may perform one or more processes, which may be based on the determined interface category, to cause one or more processes in connection with the interface page. That is, automated operations may be applicable with some types of interfaces and not others.
- a search page may be configured to perform searches, whereas a home page may not be; as a result, it may be futile for an automated software application to attempt to perform a search operation using such a home page.
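Gating automated operations on the determined interface type can be sketched as a lookup table. This is a hypothetical illustration only. The type names, the handlers, and the string results are placeholders, and a real client would act on the interface itself rather than return text.

```python
from typing import Callable, Dict


# Hypothetical handlers standing in for real interactions with an interface.
def run_search(query: str) -> str:
    return f"typed {query!r} into the search box and pressed Search"


def add_to_queue(item: str) -> str:
    return f"added {item!r} to the queue"


# Operations that are applicable to each (hypothetical) interface type.
SUPPORTED: Dict[str, Dict[str, Callable[[str], str]]] = {
    "search": {"search": run_search},
    "item": {"add_to_queue": add_to_queue},
    "home": {},
}


def perform(interface_type: str, operation: str, argument: str) -> str:
    handler = SUPPORTED.get(interface_type, {}).get(operation)
    if handler is None:
        # Skip futile operations, e.g. a search attempted on a home page.
        return f"'{operation}' is not applicable to a {interface_type} page"
    return handler(argument)
```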
- the techniques of the present disclosure may improve the field of software development by providing enhanced functionality to automated software applications.
- the automated software application may be a software application to aid visually impaired persons to navigate the Internet via a browser.
- the visually impaired person may give a verbal command to a device running the software application to perform an
- the software application may respond with an audio error message to this effect, whereas if the software application utilizes the techniques to determine that the interface type is a “search” interface, the software application may perform an operation specific to a search interface (e.g., locating the search box, inputting the search query, and simulating human interaction with a “Search” button object in the interface).
- the computing device 1200 may be implemented as a hardware device, a virtual computer system, or one or more programming modules executed on a computer system, and/or as another device configured with hardware and/or software to receive and respond to communications (e.g., web service application programming interface (API) requests) over a network.
- the bus subsystem 1204 may provide a mechanism for enabling the various components and subsystems of computing device 1200 to communicate with each other as intended. Although the bus subsystem 1204 is shown schematically as a single bus, alternative embodiments of the bus subsystem utilize multiple buses.
- the network interface subsystem 1216 may provide an interface to other computing devices and networks.
- the network interface subsystem 1216 may serve as an interface for receiving data from and transmitting data to other systems from the computing device 1200.
- the bus subsystem 1204 is utilized for communicating data such as details, search terms, and so on.
- the network, in an embodiment, is a local area network, a wide-area network, a virtual private network, the Internet, an intranet, an extranet, a public switched telephone network, a cellular network, an infrared network, a wireless network, a satellite network, or any other such network and/or combination thereof, and components used for such a system may depend at least in part upon the type of network and/or system selected.
- a connection-oriented protocol is used to communicate between network endpoints such that the connection-oriented protocol (sometimes called a connection-based protocol) is capable of transmitting data in an ordered stream.
- a connection-oriented protocol can be reliable or unreliable.
- the TCP protocol is a reliable connection-oriented protocol.
- Asynchronous Transfer Mode (ATM) and Frame Relay are unreliable connection-oriented protocols.
- Connection-oriented protocols are in contrast to packet-oriented protocols such as UDP that transmit packets without a guaranteed ordering.
- communication via the network interface subsystem 1216 is enabled by wired and/or wireless connections and combinations thereof.
- the storage subsystem 1206 provides a computer-readable storage medium for storing the basic programming and data constructs that provide the functionality of at least one embodiment of the present disclosure.
- the storage subsystem 1206 additionally provides a repository for storing data used in accordance with the present disclosure.
- the storage subsystem 1206 comprises a memory subsystem 1208 and a file/disk storage subsystem 1210.
- the memory subsystem 1208 includes a number of memories, such as a main random access memory (RAM) 1218 for storage of instructions and data during program execution and/or a read only memory (ROM) 1220, in which fixed instructions can be stored.
- the file/disk storage subsystem 1210 provides a non-transitory persistent (non-volatile) storage for program and data files and can include a hard disk drive, a floppy disk drive along with associated removable media, a Compact Disk Read Only Memory (CD-ROM) drive, an optical drive, removable media cartridges, or other like storage media.
- the computing device 1200 includes at least one local clock 1224.
- the at least one local clock 1224, in some embodiments, is a counter that represents the number of ticks that have transpired from a particular starting date and, in some embodiments, is located integrally within the computing device 1200.
- the at least one local clock 1224 is used to synchronize data transfers in the processors for the computing device 1200 and the subsystems included therein at specific clock pulses and can be used to coordinate synchronous operations between the computing device 1200 and other systems in a data center.
- the local clock is a programmable interval timer.
- the computing device 1200 could be of any of a variety of types, including a portable computer device, tablet computer, a workstation, or any other device described below. Additionally, the computing device 1200 can include another device that, in some embodiments, can be connected to the computing device 1200 through one or more ports (e.g., USB, a headphone jack, Lightning connector, etc.). In embodiments, such a device includes a port that accepts a fiber-optic connector. Accordingly, in some embodiments, this device converts optical signals to electrical signals that are transmitted through the port connecting the device to the computing device 1200 for processing. Due to the ever-changing nature of computers and networks, the description of the computing device 1200 depicted in FIG. 12 is intended only as a specific example for purposes of illustrating the preferred embodiment of the device. Many other configurations having more or fewer components than the system depicted in FIG. 12 are possible.
- data may be stored in a data store (not depicted).
- a “data store” refers to any device or combination of devices capable of storing, accessing, and retrieving data, which may include any combination and number of data servers, databases, data storage devices, and data storage media, in any standard, distributed, virtual, or clustered system.
- a data store, in an embodiment, communicates with block-level and/or object-level interfaces.
- the computing device 1200 may include any appropriate hardware, software and firmware for integrating with a data store as needed to execute aspects of one or more software applications for the computing device 1200 to handle some or all of the data access and business logic for the one or more software applications.
- the data store includes several separate data tables, databases, data documents, dynamic data storage schemes, and/or other data storage mechanisms and media for storing data relating to a particular aspect of the present disclosure.
- the computing device 1200 includes a variety of data stores and other memory and storage media as discussed above. These can reside in a variety of locations, such as on a storage medium local to (and/or resident in) one or more of the computers or remote from any or all of the computers across a network.
- the information resides in a storage-area network (SAN) familiar to those skilled in the art, and, similarly, any necessary files for performing the functions attributed to the computers, servers or other network devices are stored locally and/or remotely, as appropriate.
- the computing device 1200 may provide access to content including, but not limited to, text, graphics, audio, video, and/or other content that is provided to a user in the form of HTML, XML, JavaScript, CSS, JavaScript Object Notation (JSON), and/or another appropriate language.
- the computing device 1200 may provide the content in one or more forms including, but not limited to, forms that are perceptible to the user audibly, visually, and/or through other senses.
- in an embodiment, requests and responses, as well as the delivery of content, are handled by the computing device 1200 using PHP: Hypertext Preprocessor (PHP), Python, Ruby, Perl, Java, HTML, XML, JSON, and/or another appropriate language.
- operations described as being performed by a single device are performed collectively by multiple devices that form a distributed and/or virtual system.
- the computing device 1200 typically will include an operating system that provides executable program instructions for the general administration and operation of the computing device 1200 and includes a computer-readable storage medium (e.g., a hard disk, random access memory (RAM), read only memory (ROM), etc.) storing instructions that if executed (e.g., as a result of being executed) by a processor of the computing device 1200 cause or otherwise allow the computing device 1200 to perform its intended functions (e.g., the functions are performed as a result of one or more processors of the computing device 1200 executing instructions stored on a computer-readable storage medium).
- the computing device 1200 operates as a web server that runs one or more of a variety of server or mid-tier software applications, including Hypertext Transfer Protocol (HTTP) servers, FTP servers, Common Gateway Interface (CGI) servers, data servers, Java servers, Apache servers, and business application servers.
- computing device 1200 is also capable of executing programs or scripts in response to requests from user devices, such as by executing one or more web applications that are implemented as one or more scripts or programs written in any programming language, such as Java®, C, C# or C++, or any scripting language, such as Ruby, PHP, Perl, Python, or TCL, as well as combinations thereof.
- the computing device 1200 is capable of storing, retrieving, and accessing structured or unstructured data.
- computing device 1200 additionally or alternatively implements a database, such as one of those commercially available from Oracle®, Microsoft®, Sybase®, and IBM® as well as open-source servers such as MySQL, Postgres, SQLite, MongoDB.
- the database includes table-based servers, document-based servers, unstructured servers, relational servers, non-relational servers, or combinations of these and/or other database servers.
- a computer-implemented method comprising: generating, from sets of object hierarchy paths corresponding to interfaces associated with a plurality of interface providers, a category dictionary that categorizes object hierarchy paths according to interface type; generating, based on the category dictionary and a global object variable of an interface of an interface provider, a feature vector that corresponds to the interface, the feature vector indicating matches between the object hierarchy paths of the category dictionary and elements of the global object variable of the interface; providing the feature vector as input to a machine learning algorithm that is trained to identify types of interfaces based on feature vectors; determining, based on a response obtained from the machine learning algorithm, that the interface corresponds to a particular interface type; and as a result of determining that the interface corresponds to the particular interface type, causing a client device to perform an operation to the interface specific to the particular interface type.
- generating the category dictionary further comprises: obtaining, from a first interface provider: a global object variable of a first interface corresponding to a first interface type; and a global object variable of a second interface corresponding to a second interface type; determining, based on the global object variable of the first interface, a first set of object hierarchy paths; determining, based on the global object variable of the second interface, a second set of object hierarchy paths; determining a first subset of the first set of object hierarchy paths that is disjoint from the second set of object hierarchy paths; obtaining, from a second interface provider, an additional global object variable of an additional interface corresponding to the first interface type; determining, based on the additional global object variable, a second subset of object hierarchy paths; and generating the category dictionary based on an intersection between the first subset of object hierarchy paths and the second subset of object hierarchy paths.
- determining the second subset further comprises: obtaining, from the second interface provider: a global object variable of a fourth interface corresponding to the first interface type; and a global object variable of a fifth interface corresponding to the second interface type; determining, based on the global object variable of the fourth interface, a third set of object hierarchy paths; determining, based on the global object variable of the fifth interface, a fourth set of object hierarchy paths; and determining the second subset of object hierarchy paths from a subset of the third set of object hierarchy paths that is disjoint from the fourth set of object hierarchy paths.
- a system comprising: one or more processors; and memory including executable instructions that, if executed by the one or more processors, cause the system to: determine a first object hierarchy path associated with a first interface of a first provider, the first interface being of a first type; determine a second object hierarchy path associated with a second interface of the first provider, the second interface being of a second type different from the first type; determine that the first object hierarchy path is a mismatch from the second object hierarchy path; obtain, from a second provider, a third object hierarchy path corresponding to another interface of the first type; generate a category dictionary based on a determination that the third object hierarchy path matches the first object hierarchy path; generate, based on the category dictionary, a vector that corresponds to a third interface; determine, based on the vector, whether the third interface is of the first type or of the second type; and as a result of the third interface being determined to be of the first type, cause a device to perform an operation against the third interface, the operation being applicable to the first type of interface but inapplic
- executable instructions further include instructions that further cause the system to, as a result of the third interface being determined to be of the second type, cause the device to perform a different operation applicable to the second type against the third interface.
- executable instructions further include instructions that further cause the system to: generate, based on the category dictionary and the first object hierarchy path, an additional feature vector; and train, based on the additional feature vector and the first type, a machine learning algorithm.
- executable instructions that cause the system to determine that the third interface is of the first type further include instructions that further cause the system to: provide the vector as input to the machine learning algorithm; and obtain, from the machine learning algorithm, an indication that the third interface is of the first type.
- the machine learning algorithm includes at least one of: a gradient boosted decision tree, a logistic regression, or an artificial neural network.
- a non-transitory computer-readable storage medium having stored thereon executable instructions that, if executed by one or more processors of a computer system, cause the computer system to at least: obtain source code of a pair of interfaces provided by a first interface provider, a first interface of the pair corresponding to a first interface type and a second interface of the pair corresponding to a second interface type; determine, based on the source code, a first set of object hierarchy paths that: descend from a first global object variable that corresponds to the first interface; and do not descend from a second global object variable that corresponds to the second interface; determine, based on different source code of a different pair of interfaces provided by a second interface provider, a second set of object hierarchy paths; determine a set of intersecting object hierarchy paths between the first set of object hierarchy paths and the second set of object hierarchy paths; generate, based on an interface of a third interface provider and the set of intersecting object hierarchy paths, a feature vector corresponding to the interface of the third interface provider; determine, based on the feature vector,
- executable instructions further include instructions that further cause the computer system to: generate, based on the first set of object hierarchy paths and the set of intersecting object hierarchy paths, a set of feature vectors; and train a machine learning algorithm based on: the set of feature vectors; and the first interface type as a ground truth value.
- the first set of object hierarchy paths comprises a plurality of text strings, whereby each text string of the plurality of text strings represents a sequence of attributes from a base node of an object model of the source code to an end node of the object model.
- a computer-implemented method comprising: obtaining, from a first interface provider: a global object variable of a first interface corresponding to a first interface type; and a global object variable of a second interface corresponding to a second interface type; determining, based on the global object variable of the first interface, a first set of object hierarchy paths; determining, based on the global object variable of the second interface, a second set of object hierarchy paths; determining a first subset of the first set of object hierarchy paths that is disjoint from the second set of object hierarchy paths; obtaining, from a second interface provider, an additional global object variable of an additional interface corresponding to the first interface type; determining, based on the additional global object variable, a second subset of object hierarchy paths; generating a category dictionary based on an intersection between the first subset of object hierarchy paths and the second subset of object hierarchy paths; generating, based on the category dictionary and the first set of object hierarchy paths, a first set of feature vectors corresponding to the first interface type; generating
- a system comprising: one or more processors; memory including executable instructions that, if executed by the one or more processors, cause the system to: determine a first object hierarchy path associated with a first interface of a first provider, the first interface being of a first type; determine a second object hierarchy path associated with a second interface of the first provider, the second interface being of a second type different from the first type; determine that the first object hierarchy path is a mismatch to the second object hierarchy path; obtain, from a second provider, a third object hierarchy path corresponding to an additional interface of the first type; generate a category dictionary based on a determination that the third object hierarchy path matches the first object hierarchy path; generate, based on the category dictionary and the first object hierarchy path, a first vector that corresponds to the first type; generate, based on the category dictionary and the second object hierarchy path, a second vector that corresponds to the second type; and train a machine learning algorithm to categorize interfaces based on: the first vector with the first type as a ground truth value; and the
- executable instructions further include instructions that further cause the system to: generate, based on the category dictionary, an additional vector that corresponds to an additional interface; obtain, as a result of providing the additional vector as input to the machine learning algorithm, an indication that the additional interface corresponds to the first type; and cause, based on the indication, a device to perform an operation specific to the first type against the additional interface.
- the operation specific to the first type is a first operation.
- the executable instructions further include instructions that further cause the system to: generate, based on the category dictionary, a second additional vector that corresponds to a second additional interface; obtain, as a result of providing the second additional vector as another input to the machine learning algorithm, a second indication that the second additional interface corresponds to the second type; and cause, based on the second indication, the device to perform a second operation different from the first operation.
- executable instructions that cause the system to generate the category dictionary further include instructions that further cause the system to: determine a set of disjoint object hierarchy paths that descend from a global object variable of the first interface and do not descend from a global object variable of the second interface; and generate the category dictionary to comprise a plurality of intersections between the set of disjoint object hierarchy paths and object hierarchy paths of the additional interface, the plurality of intersections including the first object hierarchy path.
- the category dictionary includes a set of object hierarchy paths, a first subset of the object hierarchy paths corresponding to the first type and a second subset of the object hierarchy paths corresponding to the second type.
- a non-transitory computer-readable storage medium having stored thereon executable instructions that, if executed by one or more processors of a computer system, cause the computer system to at least: determine a set of intersecting object hierarchy paths between a first set of object hierarchy paths of a first interface of a first interface provider and a second set of object hierarchy paths of a second interface provided by a second interface provider, the first interface and the second interface corresponding to a first interface type; generate a first set of feature vectors based on: the set of intersecting object hierarchy paths; and the first set of object hierarchy paths; generate a second set of feature vectors based on: the set of intersecting object hierarchy paths; and a third set of object hierarchy paths corresponding to a third interface of the first interface provider, the third interface corresponding to a second interface type; and train, using the first set of feature vectors and the second set of feature vectors, a machine learning algorithm to distinguish between feature vectors corresponding to the first interface type and feature vectors corresponding to a second interface type different from the first interface
- executable instructions further include instructions that further cause the computer system to: obtain source code of the first interface and of the third interface; determine, based on the source code, the first set of object hierarchy paths that: descend from a first global object variable that corresponds to the first interface; and do not descend from a second global object variable that corresponds to the second interface; and determine, based on different source code of a different interface provided by the second interface provider and the second portion of the source code, the second set of object hierarchy paths.
- executable instructions further include instructions that further cause the computer system to: generate, based on the dictionary, a vector that corresponds to an additional interface; provide the vector as input to the machine learning algorithm; and obtain, from the machine learning algorithm, an indication that the additional interface is of the first interface type.
- non-transitory computer-readable storage medium of clause 39, wherein the operation includes at least one of: storing a Uniform Resource Identifier corresponding to the additional interface, extracting, from the additional interface, a value from an interface object associated with interfaces of the first type, or simulating human interaction with the interface object associated with the interfaces of the first interface type.
- the conjunctive phrases “at least one of A, B, and C” and “at least one of A, B and C” refer to any of the following sets: {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, {A, B, C}.
- Processes described can be performed under the control of one or more computer systems configured with executable instructions and can be implemented as code (e.g., executable instructions, one or more computer programs or one or more software applications) executing collectively on one or more processors, by hardware or combinations thereof.
- the code can be stored on a computer-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors.
- the computer-readable storage medium is non-transitory.
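The dictionary-building and feature-vector steps summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes each interface's global object variable has already been serialized into nested Python dicts, represents each object hierarchy path as a dotted text string from the base node to an end node, and encodes the feature vector as binary match indicators. All function names and the provider data are hypothetical.

```python
from typing import Any, Dict, List, Set


def object_hierarchy_paths(obj: Any, prefix: str = "") -> Set[str]:
    """Return every base-to-end-node path in a nested object as a dotted string.

    Assumption (hypothetical): the global object variable was serialized into
    nested dicts; any non-dict value or empty dict is treated as an end node.
    """
    if isinstance(obj, dict) and obj:
        paths: Set[str] = set()
        for key, child in obj.items():
            child_prefix = f"{prefix}.{key}" if prefix else str(key)
            paths |= object_hierarchy_paths(child, child_prefix)
        return paths
    return {prefix} if prefix else set()


def build_category_dictionary(providers: List[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Map each interface type to paths that are type-specific at every provider.

    Each element of `providers` maps an interface type name to the global object
    variable of one interface of that type from a single provider.
    """
    dictionary: Dict[str, Set[str]] = {}
    for interface_type in providers[0]:
        per_provider = []
        for pages in providers:
            own = object_hierarchy_paths(pages[interface_type])
            # Paths found on this provider's interfaces of any other type.
            others = set().union(
                *(object_hierarchy_paths(obj) for t, obj in pages.items() if t != interface_type)
            )
            per_provider.append(own - others)  # disjoint subset for this provider
        # Keep only the paths that are type-specific at every provider.
        dictionary[interface_type] = set.intersection(*per_provider)
    return dictionary


def feature_vector(dictionary: Dict[str, Set[str]], global_object: Any) -> List[int]:
    """Binary vector: 1 where a dictionary path also occurs in the interface's global object."""
    paths = object_hierarchy_paths(global_object)
    ordered = [p for t in sorted(dictionary) for p in sorted(dictionary[t])]
    return [1 if p in paths else 0 for p in ordered]


# Hypothetical global object variables for a product page and a cart page at two providers.
provider_a = {
    "product": {"page": {"type": "pdp", "product": {"sku": "A1", "price": 10}}, "analytics": {"id": 1}},
    "cart": {"page": {"type": "cart", "cart": {"items": [], "total": 0}}, "analytics": {"id": 1}},
}
provider_b = {
    "product": {"page": {"type": "item", "product": {"sku": "B7", "price": 25}, "reviews": {"count": 3}}},
    "cart": {"page": {"type": "basket", "cart": {"items": [], "total": 5}}},
}
dictionary = build_category_dictionary([provider_a, provider_b])

# An interface from a third (hypothetical) provider that was not used to build the dictionary.
unseen = {"page": {"product": {"sku": "C3", "price": 99, "color": "red"}}, "user": {"id": 7}}
print(feature_vector(dictionary, unseen))  # [0, 0, 1, 1]
```

Paths shared by both page types at a provider (here `page.type` and `analytics.id`) drop out in the disjointness step, and provider-specific extras such as `page.reviews.count` drop out in the cross-provider intersection, leaving only paths that signal the interface type.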
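The claims list logistic regression among the candidate machine learning algorithms (alongside gradient boosted decision trees and artificial neural networks). The sketch below shows, under stated assumptions, how binary feature vectors with interface types as ground truth values could train such a classifier for the two-type case. It uses plain stochastic gradient descent on the log loss so that it stays dependency-free; the feature layout, labels, learning rate, and epoch count are all hypothetical, and a production system would more likely use an established library.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(
    vectors: Sequence[Sequence[int]], labels: Sequence[int], lr: float = 0.5, epochs: int = 500
) -> Tuple[List[float], float]:
    """Fit weights and a bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # derivative of the log loss with respect to the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def probability_first_type(weights: List[float], bias: float, x: Sequence[int]) -> float:
    """Estimated probability that feature vector x belongs to the first interface type."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical feature vectors over four dictionary paths:
# [cart.items, cart.total, product.price, product.sku]; label 1 = product page, 0 = cart page.
training_vectors = [[0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0]]
ground_truth = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic_regression(training_vectors, ground_truth)
p = probability_first_type(weights, bias, [0, 0, 1, 1])
print("product" if p > 0.5 else "cart")  # product
```

The predicted type could then select which type-specific operation a client device performs, for example extracting a value from the interface or storing its Uniform Resource Identifier.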
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021207357A AU2021207357A1 (en) | 2020-01-15 | 2021-01-11 | Interface classification system |
EP21700463.9A EP4091074A1 (en) | 2020-01-15 | 2021-01-11 | Interface classification system |
CA3162860A CA3162860A1 (en) | 2020-01-15 | 2021-01-11 | Interface classification system |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/744,017 | 2020-01-15 | | |
US16/744,017 US11409546B2 (en) | 2020-01-15 | 2020-01-15 | Interface classification system |
US16/744,021 US11386356B2 (en) | 2020-01-15 | 2020-01-15 | Method of training a learning system to classify interfaces |
US16/744,021 | 2020-01-15 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021144680A1 | 2021-07-22 |
Family ID: 74184684
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IB2021/050169 WO2021144680A1 (en) | 2020-01-15 | 2021-01-11 | Interface classification system |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP4091074A1 (en) |
AU (1) | AU2021207357A1 (en) |
CA (1) | CA3162860A1 (en) |
WO (1) | WO2021144680A1 (en) |
2021
- 2021-01-11 AU AU2021207357A patent/AU2021207357A1/en active Pending
- 2021-01-11 WO PCT/IB2021/050169 patent/WO2021144680A1/en unknown
- 2021-01-11 EP EP21700463.9A patent/EP4091074A1/en active Pending
- 2021-01-11 CA CA3162860A patent/CA3162860A1/en active Pending
Non-Patent Citations (1)
Title |
---|
HOU Y T ET AL: "Malicious web content detection by machine learning", EXPERT SYSTEMS WITH APPLICATIONS, OXFORD, GB, vol. 37, no. 1, 2010, pages 55 - 60, XP026666851, ISSN: 0957-4174, [retrieved on 20090515], DOI: 10.1016/J.ESWA.2009.05.023 (cited by examiner) |
Also Published As
Publication number | Publication date |
---|---|
AU2021207357A1 (en) | 2022-09-01 |
CA3162860A1 (en) | 2021-07-22 |
EP4091074A1 (en) | 2022-11-23 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11386356B2 (en) | Method of training a learning system to classify interfaces | |
US11550602B2 (en) | Real-time interface classification in an application | |
US9621624B2 (en) | Methods and apparatus for inserting content into conversations in on-line and digital environments | |
US20190384762A1 (en) | Computer-implemented method of querying a dataset | |
US20210141497A1 (en) | Dynamic location and extraction of a user interface element state in a user interface that is dependent on an event occurrence in a different user interface | |
US20210141652A1 (en) | Location and extraction of item elements in a user interface | |
US20180268053A1 (en) | Electronic document generation using data from disparate sources | |
US20210312124A1 (en) | Method and system for determining sentiment of natural language text content | |
US11593343B1 (en) | User interface structural clustering and analysis | |
US11726752B2 (en) | Unsupervised location and extraction of option elements in a user interface | |
WO2023073496A1 (en) | System for identification and autofilling of web elements in forms on web pages using machine learning | |
CN110275938B (en) | Knowledge extraction method and system based on unstructured document | |
US11409546B2 (en) | Interface classification system | |
US20220366264A1 (en) | Procedurally generating realistic interfaces using machine learning techniques | |
US20230137487A1 (en) | System for identification of web elements in forms on web pages | |
US11610047B1 (en) | Dynamic labeling of functionally equivalent neighboring nodes in an object model tree | |
US20230306071A1 (en) | Training web-element predictors using negative-example sampling | |
EP4174795A1 (en) | Multiple input machine learning framework for anomaly detection | |
WO2021144680A1 (en) | Interface classification system | |
US20210141498A1 (en) | Unsupervised location and extraction of quantity and unit value elements in a user interface | |
JP5581339B2 (en) | Retrieve and display information from unstructured electronic document collections | |
US20240037131A1 (en) | Subject-node-driven prediction of product attributes on web pages | |
US20230325598A1 (en) | Dynamically generating feature vectors for document object model elements | |
AU2022203715B2 (en) | Extracting explainable corpora embeddings | |
EP4058888A1 (en) | Dynamic location and extraction of a user interface element state in a user interface that is dependent on an event occurrence in a different user interface |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21700463; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 3162860; Country of ref document: CA |
NENP | Non-entry into the national phase | Ref country code: DE |
ENP | Entry into the national phase | Ref document number: 2021700463; Country of ref document: EP; Effective date: 20220816 |
ENP | Entry into the national phase | Ref document number: 2021207357; Country of ref document: AU; Date of ref document: 20210111; Kind code of ref document: A |