WO2022037818A1

WO2022037818A1 - Device and method for interaction with a graphical user interface and for testing an application

Info

Publication number: WO2022037818A1
Application number: PCT/EP2021/065301
Authority: WO
Inventors: Andreas Rau; Jenny Rau; Andreas Zeller
Original assignee: CISPA - Helmholtz-Zentrum für Informationssicherheit gGmbH
Priority date: 2020-08-17
Filing date: 2021-06-08
Publication date: 2022-02-24
Also published as: DE102020121565A1

Abstract

The invention relates to a device (1) and a method for interaction with a graphical user interface and for testing an application. A voice command is provided, from which a text element is identified. A lexical content is determined for an interactive element (22, 23, 24) of the user interface. A semantic degree of similarity between the text element and the lexical content is determined, and the interactive element (22, 23, 24) is then interacted with as a function of the degree of similarity.

Description

Device and method for interacting with a graphical user interface and for testing an application

The invention relates to a method for interacting with a graphical user interface, the graphical user interface having an interaction element.

This is to be understood in such a way that the user interface has exactly one or more than one interaction element. As a rule, the user interface will have a large number of interaction elements. An interaction element is an interactive element with a graphical representation, i.e. an element that can be displayed and interacted with to trigger an action. The interaction element can therefore in particular be a so-called "widget". An interaction element can be clickable, for example. An action is triggered when the interaction element is clicked. There are interaction elements that require input, drop-down menus and many other types of interaction elements.

Provision can be made for the interaction with the user interface to control functionalities of an application that uses the user interface as an interface. In particular, provision can be made for a program flow of the application to be controlled by triggering an action by way of an interaction with the interaction element. In particular, an action can be triggered by an interaction with an interaction element, which transfers the state of the application and/or the user interface to a different state.

An interaction can also be referred to as an action taken. An application can be characterized in particular by the fact that it is a software application or a computer program that performs a function that is useful for the user.

Graphical user interfaces, often abbreviated as GUIs, are ubiquitous. Interaction often takes place in that the user clicks on graphically displayed interactive elements with a computer mouse or a finger and makes entries using a keyboard. This is cumbersome.

There is therefore a need for more ergonomic interaction options.

In recent years, voice controls have been developed that can improve operation. Voice assistants usually take a different approach here: they do not interact with a graphical user interface, but trigger functionalities of the application in a different way.

Direct voice control of graphical user interfaces is currently implemented only in a very rudimentary way. It is known to control graphical user interfaces remotely by means of voice commands. These voice commands merely replace manual operation with spoken commands such as "move the cursor to" or "click on the button labeled <name>", where <name> must be specified exactly. Flexible voice control is therefore not possible, since the user must speak the correct terms and navigate through the graphical user interface in the same way as with manual interaction.

The invention also relates to a method for testing an application using a user interface as an interface.

An interaction is facilitated in such an environment since the operating system with which the application is executed generally provides a software interface for interacting with the graphical user interface. Interactions that are normally performed manually can therefore be computer-assisted via the software interface.

Application testing is of great practical importance. In addition to tests that are carried out manually, automated test procedures are becoming more and more important because they can be carried out with less effort and create a high level of comparability.

It is known to use crawlers for this purpose, which essentially trigger randomized interactions with the graphical user interface and successively explore a large number of states of the user interface. Such an approach is slow and can only explore a fraction of the state space.

Recently, alternative methods have been developed in which a complete test run is initially programmed for a test application. This test run is then applied to an alternative application. Compared with a randomized crawler, such an approach can explore a larger area of the state space of the application to be tested in the same amount of time, but the method is very laborious, since a complete test run has to be programmed for each individual test case.

Against this background, the invention is based on the object of creating a method for interacting with a graphical user interface and for testing an application that facilitates navigation through the graphical user interface. Furthermore, the time required to develop a test protocol for an application is to be reduced, and it should be possible to test different applications with the same test protocol.

Insofar as variants of the invention are described below, they can be combined with one another as desired, provided that a combination is not ruled out for technical reasons.

In order to solve the aforementioned problem, the invention proposes the features of claim 1. In particular, in a method for interacting with a graphical user interface of the type described above, it is proposed according to the invention that a voice command is provided from which a text element is identified, that a lexical content is determined for the interaction element, that a semantic degree of similarity of the text element to the lexical content is determined, and that the interaction element is interacted with depending on the degree of similarity. This is to be understood in such a way that, in addition to the one identified text element, other text elements can also be identified. In addition to the one voice command, other voice commands can also be given. As a rule, both will be the case.

This greatly simplifies navigation through the graphical user interface. It enables voice control that is not bound to the terms given by the user interface. Because a semantic degree of similarity is determined between text elements of the voice command and interaction elements, terms that are syntactically completely dissimilar but semantically similar can be used. This makes it possible, for example, to use the voice command "Sign me in" on a graphical user interface whose corresponding button is labeled "Login". Natural language control is thus possible without having to use the terminology of the graphical user interface to be controlled.

This is useful not only for people who cannot recognize the terms due to poor eyesight, but also for sighted people, since the interaction elements can be outside the displayed area and therefore not visible to the user at all times.

The voice command can consist of a single word, but as a rule it will be formed by a sequence of text elements. A text element can consist of a single word or be formed by a sequence of words. The voice command is preferably formed from words in a natural language.

The voice command is preferably formed according to grammar rules. This is particularly preferably a formal grammar.

The grammar preferably divides text elements into different classes. Some possible classes are described in more detail below by way of example. The identified text element preferably belongs to a class whose text elements designate interaction elements. Provision can be made for the provided voice command to be broken down into a sequence of text elements with the aid of a computer. In this case, the sequence can also consist of just one text element, even if the sequence regularly includes a plurality of text elements. The breakdown can take place, for example, by means of a text parser. This is preferably done according to the rules of the aforementioned grammar.

For example, it can be provided that the grammar is used to identify whether a text element consists of a single word or a sequence of several words. For this purpose, for example, signal words or punctuation marks can be provided which cause several words to be combined into one text element.

A list is preferably created which contains those text elements which belong to a class whose text elements designate interaction elements, in the order in which they occur. Any existing macros are resolved beforehand.

The voice command is preferably formed according to the rules of the aforementioned grammar, but can otherwise be formed freely. However, it is advantageous if additional information about an application that has a similar functionality to the application whose user interface is to be interacted with is used to set up the voice command. For this purpose, description documents of such an application or results from testing such an application can be used. It is important to note here that this can be a different application whose user interface differs from the user interface that is the subject of the method according to the invention. This is a great advantage of the invention, since a voice command, once created, can be applied to a large number of different applications.

Provision can be made for the voice command to be provided by converting a spoken input into a sequence of text elements. In principle, any voice recording device and any voice recognition method that appear suitable to the person skilled in the art can be used for this purpose. Alternatively or additionally, it can be provided that the voice command is provided by providing a sequence of text elements. For example, this can be done by keyboard input or by reading out the content of a text file. In this regard, too, the sequence of text elements can have only a single text element, although it preferably has more than one text element.

In principle, any method known from computational linguistics for determining a semantic similarity can be used to determine the degree of semantic similarity between the text element and the lexical content. For example, the cosine similarity of the two text elements being compared can be calculated, normalized to the length of the strings being compared. Cosine similarity returns a numerical value in the interval [−1, 1], where values close to 1 denote a high degree of semantic similarity.

The basis for determining the semantic degree of similarity can be a model space for a set of words, with each word in the set of words being assigned an element in the model space. A metric exists in the model space with which a distance between the elements can be determined, the distance describing a semantic distance between the elements. For example, the model space can be an n-dimensional vector space with n >= 2. The elements can be vectors in the vector space and the metric can be given by an angle between vectors of the vector space, for example.

In order to assign elements of the model space, or vectors, to the words, relationships between words can be determined starting from a text corpus. Words with a high level of semantic similarity are then assigned to elements of the model space that are at a small distance from one another. For example, a well-known model is the word2vec model, which is trained with a neural network.

To execute the method, the text elements to be compared with one another are then compared with the model space, so that elements from the model space are assigned to the text elements. The semantic distance between the text elements can then be determined by means of the metric.
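The lookup in the model space and the angle-based metric can be illustrated with a minimal sketch. The three-dimensional vectors below are hand-picked, hypothetical values chosen purely for illustration; they are not taken from a trained word2vec model, and the names `TOY_MODEL`, `cosine_similarity` and `word_similarity` are assumptions of this sketch.

```python
import math

# Hypothetical toy model space: hand-picked vectors, not trained values.
TOY_MODEL = {
    "login": (0.9, 0.1, 0.0),
    "sign": (0.8, 0.3, 0.1),
    "basket": (0.0, 0.2, 0.9),
}


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def word_similarity(word_a, word_b, model=TOY_MODEL):
    """Map both words to their elements of the model space and compare them."""
    return cosine_similarity(model[word_a.lower()], model[word_b.lower()])
```

With these assumed vectors, "login" is closer to "sign" than to "basket", which mirrors the idea that semantically similar words lie at a small distance from one another.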

If the semantic similarity of a first text element to a second text element is to be determined, one or both of which consist of several words in the model space, it can be provided that the semantic similarities are each calculated in pairs and arranged in a matrix. Provision can then be made for the matrix entries determined in this way to be assigned a scalar which describes the semantic similarity. For this purpose it can be provided that a maximum value is determined for each row or for each column and the values determined in this way are added to one another. For normalization, the sum can then be divided by the number of added values. If cosine similarity is used for each entry in the matrix, the scalar determined in this way is again between -1 and 1. The basis for determining a semantic degree of similarity can therefore be a model of a set of words, with the model depicting relationships between the words. The relationships can consist, for example, in a joint occurrence in a context, for example in a sentence or some other unit of meaning. A large number of methods in this regard are known to the person skilled in the art.
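The reduction of the pairwise similarity matrix to a single scalar can be sketched as follows. This is one possible reading of the description, using the row maxima; the function names and the `pair_similarity` callback are assumptions of this sketch, and any word-level similarity measure could be passed in.

```python
def aggregate_similarity(matrix):
    """Average of the row maxima of a pairwise similarity matrix."""
    if not matrix or any(not row for row in matrix):
        raise ValueError("matrix must contain at least one non-empty row")
    row_maxima = [max(row) for row in matrix]
    # Dividing by the number of added values normalizes the sum.
    return sum(row_maxima) / len(row_maxima)


def phrase_similarity(words_a, words_b, pair_similarity):
    """Compare two multi-word text elements.

    pair_similarity is a caller-supplied function (an assumption of this
    sketch) that returns the similarity of two single words.
    """
    matrix = [[pair_similarity(a, b) for b in words_b] for a in words_a]
    return aggregate_similarity(matrix)
```

If every matrix entry lies between -1 and 1, as with cosine similarity, the average of the row maxima also lies in that interval.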

Such relationships between words are not taken into account in purely syntactic comparisons of text elements.

In order to be able to assign suitable lexical content to the interaction element, provision can be made for the lexical content to be determined for the interaction element by identifying lexical content arranged on the interaction element.

This approach fails when no lexical content is arranged on the interaction element.

However, a description characterizing the interaction element will generally be located in its vicinity. A graphically represented descriptive element is regularly spatially assigned to the interaction element.

It can therefore be advantageous if, in addition to or instead of identifying lexical content arranged on the interaction element, the lexical content of the interaction element is determined by determining distances to surrounding descriptive elements with lexical content and selecting the lexical content of the closest descriptive element. For this purpose it can be provided, for example, that Euclidean distances are determined from the interaction element to its neighboring descriptive elements, in which case the descriptive element with the smallest distance is selected. To determine the distance between two elements, the Euclidean distance between corners of the elements, for example their top left corners, can be determined. It can be provided here that a descriptive element which overlaps with the interaction element is preferably selected.

It is preferably first checked whether lexical content is arranged on the interaction element. If this check is negative, the immediate surroundings can then be examined for lexical content as described.
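The two-stage determination described above can be sketched as follows, assuming that every element is available as an axis-aligned rectangle with an optional text. The `Element` type and all function names are hypothetical and only serve this illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Element:
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float
    text: str = ""


def overlaps(a, b):
    """True if the rectangles of a and b share some area."""
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


def corner_distance(a, b):
    """Euclidean distance between the top left corners of a and b."""
    return math.hypot(a.x - b.x, a.y - b.y)


def lexical_content(element, descriptive_elements):
    # Stage 1: lexical content arranged on the element itself.
    if element.text.strip():
        return element.text
    # Stage 2: closest descriptive element that carries lexical content.
    labelled = [d for d in descriptive_elements if d.text.strip()]
    if not labelled:
        return None
    # Overlapping descriptive elements are preferred when present.
    pool = [d for d in labelled if overlaps(element, d)] or labelled
    return min(pool, key=lambda d: corner_distance(element, d)).text
```

For an unlabeled input field at (100, 50) with the labels "Username" at (20, 50) and "Password" at (20, 100), the sketch selects "Username", because its top left corner is closer.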

Provision can be made for only those elements with which no interaction is possible to qualify as descriptive elements. However, this is not mandatory, since an adjacent interaction element can also carry a common description. It can therefore also be provided that all graphically represented adjacent elements with lexical content come into consideration as descriptive elements.

In a further advantageous embodiment of the method it can be provided that the lexical content of the interaction element or of the description element is determined by being taken directly from a text field of a data structure describing a current state of the user interface. Such a data structure is often available via a software interface of the graphical user interface. The data structure can have information about the interaction element or the description element in the form of data fields. Here, the text field can in particular be a data field of the data structure that is assigned to the interaction element or the description element.

If no such data field is available, the lexical content of the interaction element or the descriptive element can also be determined by applying a text recognition and/or image recognition method to the graphical representation of the interaction element or the descriptive element. All common methods can be used for this. OCR (optical character recognition) methods, for example, can be used for text recognition.

It can happen that the text field of the data structure or the text determined by the text recognition contains a character such as an icon that is not covered by the grammar. Provision can be made for a method to be used which assigns lexical content to such a character or also to a graphic element determined by image recognition. For this purpose, for example, a list can be stored that contains an assignment of lexical content to the characters and/or graphic elements. In such a case, the method would essentially consist of accessing the corresponding list element.

Provision can be made for the interaction element and/or the description element and/or information about these elements to be identified by means of a data structure describing a current state of the user interface.

A text determined for an interaction element or a description element can still contain unwanted, meaningless Unicode characters, line breaks or the like. It can therefore be provided that, in order to determine the lexical content, a recognized text is first cleaned by removing a selection of characters or replacing them with spaces. For example, it can be provided that all special characters are replaced by spaces and all stop words are removed.

Provision can be made for the recognized and preferably already cleaned character string to be broken down into a list of words which then form the determined lexical content.
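The cleaning and splitting of a recognized text can be sketched as follows. The stop word list is a small hypothetical example, and treating every character that is neither alphanumeric nor whitespace as a special character is an assumption of this sketch.

```python
import re

# Hypothetical stop word list; a real list would be chosen per language.
STOP_WORDS = frozenset({"a", "an", "the", "to", "of", "your", "please"})


def extract_lexical_content(raw_text, stop_words=STOP_WORDS):
    """Clean a recognized text and break it down into a list of words."""
    # Replace special characters (including underscores) with spaces.
    cleaned = re.sub(r"[^\w\s]|_", " ", raw_text)
    # split() without arguments also discards line breaks and repeated spaces.
    words = cleaned.split()
    return [word for word in words if word.lower() not in stop_words]
```

Applied to a text such as "Add to cart!", the sketch yields the words "Add" and "cart".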

Navigation on the user interface can be made considerably easier if it is provided that the voice command includes a designation for a macro, the macro in turn including a sequence of text elements. The sequence of text elements for the macro is preferably stored in a text file. This has the advantage that the voice command can be designed much more simply and can contain more abstract instructions, while details can be stored in the macros. When processing the voice command, the macros are preferably resolved directly, in that the sequence of text elements formed by the macro replaces the designation of the macro in the voice command. The macro itself can contain macros, which are then resolved recursively.
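Recursive macro resolution can be sketched as follows, assuming that the macros have already been read into a dictionary mapping each designation to its sequence of text elements. The function name, the dictionary representation and the check against self-referencing macros are assumptions of this sketch.

```python
def resolve_macros(elements, macros, _active=()):
    """Replace every macro designation by its sequence of text elements.

    elements: sequence of text elements of the voice command
    macros:   dict mapping a macro designation to its text elements
    """
    resolved = []
    for element in elements:
        if element in macros:
            if element in _active:
                # Guard against macros that contain themselves.
                raise ValueError(f"macro {element!r} refers to itself")
            resolved.extend(
                resolve_macros(macros[element], macros, _active + (element,))
            )
        else:
            resolved.append(element)
    return resolved
```

A macro "Buy" (a hypothetical example) that itself contains the macro "Sign me in" is thereby expanded into the individual interactions of both macros, in order.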

It can be provided that, in the event that an interaction requires an input that is not contained in the voice command with resolved macros, the method is paused until an input is made. Provision can be made here for a graphically or acoustically perceptible indication to be given to the user of the method. It can be provided that the input is made via voice control or via the usual input paths required by the user interface.

Provision can be made for such an interaction element, if it is outside a current display of the graphical user interface, to only be interacted with after it has been displayed within the current display by scrolling the display. In addition, it can be provided that an interaction element is always displayed before an interaction with it.

In an advantageous embodiment of the method, provision can be made for the interaction element to be identified by selecting an interactive element of a current state of the graphical user interface. For the identification, for example, functionalities can be used that are known from conventional crawlers that can access a data structure that describes the graphical user interface.

In order to exclude existing but not visible interactive elements from an interaction, it can be provided that only such an interactive element is selected that has a finite extent. It can also be expedient to select only such an interactive element that is not completely covered, or is only partially covered, by other displayed elements.

In order to avoid incorrect interactions, in a further advantageous embodiment of the method it can be provided that the interaction element is only interacted with if the ascertained degree of similarity exceeds a threshold value. For example, experiments have shown that when using cosine similarity, a threshold with the value 0 is particularly suitable. An excessively high threshold should also be avoided so that intrinsically good candidates are not ultimately left unselected.

If the threshold value is not reached or no similarity is found for some other reason, and if the threshold value is likewise not reached when determining further degrees of similarity between text elements of the voice command and lexical contents of interaction elements, or no similarity is found there for some other reason either, it can be provided that an interaction is selected at random. Interactions that have already taken place previously in the course of executing the method or the voice command are preferably excluded here. Such configurations of the method make it possible to ensure that even larger deviations between the voice command and the functionality of the application to which the method is applied do not represent an obstacle.
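The threshold check with a random fallback can be sketched as follows. The function name and its parameters are hypothetical; the threshold is passed in by the caller, and interactions are represented simply by their labels.

```python
import random


def choose_interaction(scored, available, already_done, threshold, rng=random):
    """Pick the best interaction above the threshold, else a random one.

    scored:       dict mapping an interaction to its degree of similarity
    available:    all interactions possible in the current state
    already_done: interactions that have already taken place
    """
    above = {name: score for name, score in scored.items() if score > threshold}
    if above:
        return max(above, key=above.get)
    # Fallback: random choice among interactions not yet carried out.
    fallback = [name for name in available if name not in already_done]
    return rng.choice(fallback) if fallback else None
```

Passing a seeded `random.Random` instance as `rng` makes the fallback reproducible, which can be convenient when the method is used for testing.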

In a further advantageous embodiment of the method, it can be provided that degrees of similarity of text elements of the voice command to lexical contents determined for interaction elements of the graphical user interface are determined, and that the interaction element for which the determined degree of similarity is highest is interacted with first. What is decisive in this embodiment of the method is that more than one degree of similarity is determined before the interaction in question. For this purpose, degrees of similarity can be determined from one text element to a plurality of lexical contents, from a plurality of text elements to one lexical content, and from a plurality of text elements to a plurality of lexical contents.

"Several" in this specification means at least two. The determination of several degrees of similarity and the selection of a best pair can lead to considerable flexibility in navigating a user interface using voice commands.

This becomes particularly evident when it is provided that the order of the text elements of the voice command is not taken into account. Thus it can be provided that a number of text elements for which a degree of similarity is determined, in particular all text elements of the list which contains the text elements which belong to a class which describes the interaction elements, are treated equally. This also makes it possible to successfully execute those voice commands which specify interactions in a specific order, even if the application to which the method is applied requires a different order of interactions or some of the interactions contained in the voice command are not provided at all.
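The selection of a best pair without regard to the order of the text elements can be sketched as follows. The `similarity` callback stands for any measure of semantic similarity and, like the function name and the identifiers of the interaction elements, is an assumption of this sketch.

```python
def best_pair(text_elements, lexical_contents, similarity):
    """Return (text element, element id, score) with the highest similarity.

    text_elements:    text elements of the voice command, treated equally
    lexical_contents: dict mapping an interaction element id to its
                      lexical content
    similarity:       caller-supplied function (text, content) -> score
    """
    best = None
    for text in text_elements:
        for element_id, content in lexical_contents.items():
            score = similarity(text, content)
            if best is None or score > best[2]:
                best = (text, element_id, score)
    return best
```

Because every text element is compared with every lexical content, a text element that appears late in the voice command can still be executed first if it matches the current state of the user interface best.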

If the order is not to be completely ignored, but also not necessarily to be adhered to, this can be done by introducing a weighting that lowers the degree of similarity the later the text element appears in the voice command. Alternatively or additionally, it can be provided that the order of the text elements has priority in the case of the same degrees of similarity or small differences.
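One conceivable form of such a weighting is a fixed deduction per position, sketched below. The linear deduction and its default value of 0.05 are assumptions chosen for illustration only; the description does not prescribe a particular weighting function.

```python
def order_weighted(similarities, penalty_per_position=0.05):
    """Lower each degree of similarity according to its position.

    similarities: degrees of similarity in the order in which the
                  corresponding text elements occur in the voice command
    """
    return [s - penalty_per_position * i for i, s in enumerate(similarities)]


def pick_index(similarities, penalty_per_position=0.05):
    """Index of the text element to execute next."""
    weighted = order_weighted(similarities, penalty_per_position)
    # max() returns the first maximum, so earlier text elements win ties.
    return max(range(len(weighted)), key=weighted.__getitem__)
```

With a deduction of zero, the sketch reduces to a pure tie-break in favor of the earlier text element.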

It can be provided that at least one action is assigned to the text element of the voice command and that the at least one action assigned to the text element is compared with an action assigned to the interaction element. The degree of similarity of the text element to the interaction element is preferably determined as a function of the result of the comparison. Particularly preferably, the degree of similarity is only determined if the actions are compatible. If an interaction element only allows a click as an interaction and an input is linked to the text element, then the actions are not compatible and it would be superfluous to determine a degree of similarity. Provision can preferably also be made for the degree of similarity to be weighted even in the case of mutually compatible actions. An example in which a lowering can be useful is described further below in the description of the method illustrated in Fig. 1.

For better control of the execution of a voice command, it can be provided that it is stored whether and/or how often the interaction element was interacted with during the execution of the method. Alternatively or additionally, it can be provided that a state model of the user interface is created and/or updated from interactions with the user interface and from the states of the user interface achieved as a result of the interactions. The information collected can be used to further influence the method in an advantageous manner, as the following exemplary embodiment also shows.

Provision can thus be made for the degree of similarity to be reduced if it is recognized that the interaction element has already been interacted with in the course of the execution of the method. For this purpose, a weighting factor is preferably provided, which is multiplied by the previously determined degree of similarity or subtracted from it. The degree of similarity is preferably reduced further the more frequently an interaction element has been interacted with. Such refinements of the method have the advantage, among other things, that the voice command can be processed more quickly and reliably, since in particular it is avoided that an action leading to the goal is triggered several times or even in a recurring manner.
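The reduction for interaction elements that have already been interacted with can be sketched with the subtractive variant. The class name, the element identifiers and the penalty value are assumptions of this sketch.

```python
from collections import Counter


class InteractionHistory:
    """Counts interactions and lowers degrees of similarity accordingly."""

    def __init__(self, penalty=0.2):
        self.penalty = penalty      # deduction per previous interaction
        self.counts = Counter()     # element id -> number of interactions

    def record(self, element_id):
        self.counts[element_id] += 1

    def adjusted(self, element_id, similarity):
        # The more often the element was used, the larger the deduction.
        return similarity - self.penalty * self.counts[element_id]
```

An element that has not been interacted with keeps its degree of similarity unchanged, while each recorded interaction lowers it by a further step.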

It can also be provided that in the case of states of the user interface that are reached for the first time in the course of the execution of the method, those text elements of the voice command that require or enable an input as an interaction are executed first. It can be provided that this only takes place if a threshold value for the ascertained degree of similarity is exceeded.

Alternatively, it can also be provided that the ascertained degree of similarity for such text elements is increased.

The state of the user interface and/or the application can be saved after the method or the voice command has been executed. This can be useful, for example, for evaluation or testing purposes.

To achieve the above object, the invention also proposes the use of a method for interacting with a graphical user interface in order to execute a function of an application that uses the user interface as an interface, the method being designed according to the invention, in particular as described above or according to one of the claims directed to a method for interacting with a graphical user interface. The voice command is preferably provided by the user, for example by way of a voice input.

In order to solve the aforementioned problem, the invention also proposes the features of the independent claim directed to a method for testing an application. In particular, it is therefore proposed according to the invention, in a method for testing an application that uses a user interface as an interface, that at least one voice command is specified, that a method for interacting with a graphical user interface is performed for each of the at least one voice command, this method being designed according to the invention, in particular as described above or according to one of the claims directed to a method for interacting with a graphical user interface, and that a state model of the user interface is generated and/or updated and stored in a retrievable manner, the state model comprising the interactions carried out and the states of the user interface achieved thereby.

Several or a large number of voice commands are preferably specified. In this case, the voice commands preferably test different functions of the application.

The state model can then be used to evaluate the test result. For example, it can be determined which functions the tested application has.
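A state model of this kind can be sketched as a set of states together with the transitions between them. The class name, the string labels for states and interactions and the JSON storage format are assumptions of this sketch.

```python
import json


class StateModel:
    """Records interactions and the user interface states they lead to."""

    def __init__(self):
        self.states = set()
        self.transitions = []   # (source state, interaction, target state)

    def record(self, source, interaction, target):
        """Add a transition; repeated transitions are stored only once."""
        self.states.update((source, target))
        transition = (source, interaction, target)
        if transition not in self.transitions:
            self.transitions.append(transition)

    def to_json(self):
        """Serialize the model so that it can be stored and retrieved."""
        return json.dumps(
            {"states": sorted(self.states), "transitions": self.transitions}
        )
```

The recorded transitions show which interactions were possible in which state, which gives an indication of the functions the tested application provides.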

The method steps mentioned above are each preferably carried out with the aid of a computer. All method steps are preferably carried out with the aid of a computer. However, it is preferably provided that at least the voice input takes place by a user speaking a voice command. Alternatively, this can also be read out from a file using a computer.

In order to solve the aforementioned problem, the invention also proposes the features of the independent claim directed to a technical device. In particular, it is therefore proposed according to the invention, in a technical device, that the technical device is set up to perform a method for interacting with a graphical user interface and/or a method for testing an application that uses a user interface, for example the aforementioned user interface, as an interface, the respective method being designed according to the invention, in particular as described above or according to one of the claims directed to a corresponding method.

In order to achieve the aforementioned object, the invention also proposes an arrangement with a technical device and with a technical facility on which an application using the user interface as an interface is stored ready for use and access. The technical device is designed according to the invention, in particular as described above or below. Furthermore, at least one interface is formed which enables data to be transmitted from one of the two to the other. Preferably, the at least one interface also enables data to be transmitted in the opposite direction.

The interfaces can in particular be hardware interfaces or software interfaces.

The technical device and/or the technical facility is preferably an electronic device such as a computer or a smartphone. They can be the same or separate devices.

Input means for inputting a voice command are formed on the technical device and/or on the technical facility. The technical device preferably has at least one data memory and one data processing unit. A computer program can be stored in the data memory, which is executed to carry out the methods described above.

The technical device preferably has a display on which elements of the graphical user interface to be displayed can be displayed.

A sensor, preferably an optical sensor such as a video camera, can be provided with which what is shown on the display can be captured. The sensor can be connected to the technical device in order to transmit the recorded data to it.

The technical device and the technical facility can use common resources such as a shared data memory or a shared processor.

The invention will now be described in more detail using a few exemplary embodiments, but is not limited to these few exemplary embodiments. Further exemplary embodiments result from the combination of the features of individual or multiple claims with one another and/or with individual or multiple features of the exemplary embodiments.

The figures show:

Fig. 1 a flow chart of an exemplary embodiment of a method according to the invention for interacting with a graphical user interface,

Fig. 2 a flow chart of an exemplary embodiment of a method for testing an application that has a user interface,

Fig. 3 a display of a graphical user interface that is in a specific state,

Fig. 4 to Fig. 6 in each case an exemplary embodiment of an arrangement according to the invention.

In the following description of the invention, elements that have the same function are given the same reference numbers, even if the design or shape differs.

Fig. 1 shows an exemplary embodiment of a method 100 according to the invention.

In method step 101, the method is initialized by providing a voice command that is to be applied to an application with a graphical user interface. The application can be an online shop, for example. For example, the voice command can be "Sign me in".

A macro can be stored for the designation "Sign me in", which reads as follows, for example:

For Sign me in Do

Click Sign In

And write John Doe in username

And write 123456 in password

The voice command can be provided by voice input. To do this, the user speaks the words "Sign me in". Using a microphone and a speech recognition method, the voice input is converted into the voice command, i.e. into a sequence of words.

In step 102, the voice command is then broken down into its text elements using a grammar and a parser, with the text elements being divided into classes.

The text element "Sign me in" belongs to a first class of words. The first class describes a macro. The parser recognizes this from a list of labels, which also includes the text element "Sign me in". To execute the voice command "Sign me in", the macro "Sign me in" is then executed. As a result, the name of the macro is replaced by its content.
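The replacement of a macro label by its content can be sketched as follows. This is a minimal illustration and not the implementation of the invention; the table `MACROS` and the function `expand_macro` are hypothetical names, and the macro body is the example given above.

```python
# Hypothetical macro table: label -> stored sequence of commands.
MACROS = {
    "sign me in": [
        "Click Sign In",
        "And Write John Doe In Username",
        "And Write 123456 In Password",
    ],
}


def expand_macro(voice_command: str) -> list[str]:
    """Return the macro body if the command is a known label, else the command itself."""
    key = voice_command.strip().lower()
    return list(MACROS.get(key, [voice_command]))


steps = expand_macro("Sign me in")
```

A command that is not a macro label is passed through unchanged, so the same function can sit in front of the parser for every voice command.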

The text elements "For", "Do" and "And" belong to a second class. The second class describes words that control the voice command. The words "For" and "Do" indicate that the text element between them is the label of a macro. The word "And" means, for example, that in addition to the interaction before the "And" there is a further interaction, which is described after the "And".

The text element "Click" and the word combination "Write" and "In" belong to a third class. The third class relates to the type of interaction. For example, the word "Click" can be interpreted to mean that the interaction element "Login" is to be clicked. This can be a short or a long click. The text element consisting of the word combination "Write" and "In" means that the text between the words "Write" and "In" is to be entered into the input belonging to the interaction element named after the word "In".

The text elements "Sign In", "Username" and "Password" belong to a fourth class. The fourth class describes words that designate interaction elements. As will be explained in more detail below, it is not necessary for the application to use exactly these terms. It is sufficient that there is a semantic similarity.

A list is then created that contains the text elements of the fourth class in the order of their occurrence. Each element of the list is associated with all actions that are compatible with the type of interaction taken from the text elements of the third class, i.e. the actions that are possible for compatible interaction elements. To return to the previous example, the actions "click" and "long-click" are associated with the word "Click" if the graphical user interface allows such actions.
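The breakdown of the expanded command into classes and the resulting list can be sketched as follows. The sketch uses two regular expressions instead of a full grammar and parser, which is a simplification; the names `Target`, `parse_steps`, `enter_text`, `click` and `long_click` are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Target:
    label: str       # fourth class: word designating an interaction element
    actions: tuple   # actions compatible with the third-class word
    text: str = ""   # text to enter for "Write ... In ..."


# Third class: "Write <text> In <label>" and "Click <label>".
WRITE = re.compile(r"^write\s+(?P<text>.+?)\s+in\s+(?P<label>.+)$", re.IGNORECASE)
CLICK = re.compile(r"^click\s+(?P<label>.+)$", re.IGNORECASE)
# Second class: a leading control word such as "And".
CONTROL = re.compile(r"^and\s+", re.IGNORECASE)


def parse_steps(lines):
    """Return the fourth-class text elements in order, each with its actions."""
    targets = []
    for line in lines:
        line = CONTROL.sub("", line.strip())
        if m := WRITE.match(line):
            targets.append(Target(m["label"], ("enter_text",), m["text"]))
        elif m := CLICK.match(line):
            targets.append(Target(m["label"], ("click", "long_click")))
    return targets


targets = parse_steps([
    "Click Sign In",
    "And Write John Doe In Username",
    "And Write 123456 In Password",
])
```

For the example macro, the list contains the labels "Sign In", "Username" and "Password" in this order, with the text to be entered already attached to the two input actions.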

In step 103 the current state of the graphical user interface is analyzed. As a rule, the operating system on which the application is running provides an interface that can be used to analyze the current state of the graphical user interface. For example, Android and websites on desktop PCs offer the possibility of accessing a structure that hierarchically lists all graphical elements of the current state of the graphical user interface and provides some basic information about these elements, such as their type, location, horizontal extent and vertical plane. It is also possible to see whether the elements currently appear on the display or whether they first have to be reached by scrolling.

From the structure, in particular the interaction elements and the options for interacting with them, such as clicking, entering text or selecting from a drop-down menu, can therefore be derived, as can any designation that is directly linked to the interaction element and graphically displayed on it. It is also possible to interact with the interaction elements by using a peripheral device or by means of a programmed instruction that replaces it.

Non-interactive elements, such as purely descriptive elements, can also be derived, for which properties such as location, horizontal extent and vertical plane can likewise be stored. The descriptive elements can in particular have a text-based or an image-based description (such as an icon).

In the exemplary embodiment described here, in step 103 the interaction elements and the description elements of the current state of the graphical user interface are specifically determined, together with all available information about these elements that is required for the further process.
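A minimal sketch of step 103, assuming the hierarchical structure is available as a tree of nodes. The `Node` type, the mapping `INTERACTIONS` and the coordinates of the example screen are hypothetical; a real platform interface would supply its own node type and properties. Image-based descriptive elements without text are ignored here for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                 # e.g. "button", "input", "label", "container"
    x: int = 0
    y: int = 0
    width: int = 0
    height: int = 0
    text: str = ""
    children: list = field(default_factory=list)


# Assumed mapping from element type to the interactions it supports.
INTERACTIONS = {"button": ("click", "long_click"), "input": ("enter_text",)}


def collect(root):
    """Split a UI tree into interaction elements and descriptive elements."""
    interactive, descriptive = [], []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.kind in INTERACTIONS:
            interactive.append(node)
        elif node.text:
            descriptive.append(node)
        stack.extend(reversed(node.children))  # keep document order
    return interactive, descriptive


# Example loosely modelled on the login screen of Fig. 3.
screen = Node("container", children=[
    Node("label", 10, 10, 80, 20, "Username"),
    Node("input", 100, 10, 200, 20),
    Node("label", 10, 40, 80, 20, "Password"),
    Node("input", 100, 40, 200, 20),
    Node("button", 100, 70, 80, 24, "Login"),
])
interactive, descriptive = collect(screen)
```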

In step 104 the respective designation of the current interaction elements is determined. This is easy if a designation is directly linked to the interaction element, since it can then be taken from the information available on the interaction element. This is the case, for example, in Fig. 3 for the interaction element provided with the reference number 24, on which the designation "Login" is represented graphically.

Frequently, however, a graphically displayed designation is not directly linked to the interaction element, but there is a non-interactive descriptive element in the vicinity of the interaction element. This is the case in Fig. 3, for example, for the interaction elements 22 and 23, to which the descriptive elements 25 ("Username") and 26 ("Password"), respectively, are assigned.

In order to assign the correct descriptive element to these interaction elements, provision can be made to determine the distance between the surrounding descriptive elements and the respective interaction element, with the distances between the respective upper left corners being determined, for example. Then, for example, the descriptive element that has the smallest distance to the interaction element can be assigned to the interaction element. If there are overlapping elements, it can be provided that the overlapping element with the smallest distance is selected; otherwise the non-overlapping element with the smallest distance is selected.
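The assignment rule described above can be sketched as follows, using the distance between upper left corners and preferring overlapping candidates. The `Box` type, the function names and the coordinates are assumptions for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Box:
    x: float          # upper left corner
    y: float
    width: float
    height: float
    text: str = ""


def overlaps(a, b):
    """True if the two rectangles share some area."""
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


def corner_distance(a, b):
    """Euclidean distance between the upper left corners."""
    return hypot(a.x - b.x, a.y - b.y)


def nearest_description(element, descriptions):
    """Pick the closest overlapping description, else the closest one overall."""
    if not descriptions:
        return None
    # Sort key: overlapping candidates first (False < True), then by distance.
    return min(descriptions,
               key=lambda d: (not overlaps(element, d), corner_distance(element, d)))


labels = [Box(10, 10, 80, 20, "Username"), Box(10, 40, 80, 20, "Password")]
username_field = Box(100, 10, 200, 20)
password_field = Box(100, 40, 200, 20)
```

With these coordinates, `nearest_description(username_field, labels)` returns the "Username" label and `nearest_description(password_field, labels)` returns the "Password" label.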

The lexical content of the descriptive elements is then extracted. This can be linked directly to the descriptive element in the hierarchical structure. If this is not the case, for example because an image file is stored, the lexical content can be extracted, for example, using a text recognition method (e.g. using OCR software) or, in the case of icons, using a method that assigns lexical content to the icon.

As already described above, the recognized text can still be cleaned up and broken down into words for the final determination of the lexical content.

As a result, in step 104 a lexical content that designates them is determined for the interaction elements.

In step 105 a semantic degree of similarity between the words of the list created in step 102 and the lexical content determined in step 104 is then determined. The dashed lines indicate that data determined in steps 102 and 104 are processed in step 105. Possible methods for the determination have already been mentioned above. For example, if the list has N words and there are M interaction elements with associated lexical content, then a total of up to N*M degrees of similarity are determined.

In order to reduce the number of calculations required, it can be provided that only such degrees of similarity are determined in which the actions are compatible with one another. This can be done by checking whether the actions assigned to a text element of the list contain an action that can be carried out with the interaction element. If such an action is not possible, there is no need for a similarity calculation of the semantic content, since an interaction with the interaction element is ruled out.
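The following sketch illustrates step 105 together with the compatibility check. The semantic measure is passed in as a function; `toy_similarity` with its small synonym table is only a stand-in chosen for this example and is not one of the methods referred to in the description. All names are hypothetical.

```python
def similarity_table(targets, elements, similarity):
    """Compute degrees of similarity only for pairs with a common action.

    targets:  list of (word, actions) from the list created in step 102
    elements: list of (lexical_content, actions) determined in steps 103 and 104
    """
    scores = {}
    for i, (word, wanted) in enumerate(targets):
        for j, (content, possible) in enumerate(elements):
            if set(wanted) & set(possible):   # skip incompatible pairs
                scores[(i, j)] = similarity(word, content)
    return scores


# Toy stand-in for a real semantic measure (assumption, for illustration only).
SYNONYMS = {frozenset({"sign in", "login"})}


def toy_similarity(a, b):
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    return 0.8 if frozenset({a, b}) in SYNONYMS else 0.0


targets = [
    ("Sign In", ("click", "long_click")),
    ("Username", ("enter_text",)),
    ("Password", ("enter_text",)),
]
elements = [
    ("Username", ("enter_text",)),
    ("Password", ("enter_text",)),
    ("Login", ("click",)),
]
scores = similarity_table(targets, elements, toy_similarity)
```

With N = 3 words and M = 3 interaction elements, only 5 of the 9 possible pairs are evaluated here, because click actions are never compared with input fields and vice versa.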

In step 106 the calculated degree of similarity is modified by means of a weighting. If the degree of similarity is expressed by a real number, the weighting can be multiplicative. Alternatively, a number can be subtracted.

Firstly, provision can be made for certain actions to reduce the calculated degree of similarity. For example, if a word in the list contains an optional data element, a click action can be assigned to it, whereby the degree of similarity is reduced. In this way, for example, drop-down menus can be handled adequately, which only reveal an editable input when they are clicked on. Secondly, provision can be made for the calculated degree of similarity to be reduced if it is established that the interaction element has already been interacted with during the execution of the method and the voice command. Provision can be made for the degree of similarity to be reduced further and further as the number of interactions increases. In this way it can be avoided that undesired chains of interaction are repeated and it can be achieved that new paths of interaction are reached. In order to make this possible, it can be provided that the states reached during the execution of the voice command and the interactions carried out are stored.
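A multiplicative weighting of this kind could look as follows. The factor values 0.5 and 0.7 and the parameter names are assumptions chosen for the example; the description does not specify them.

```python
def weighted(score, times_interacted, click_instead_of_input=False,
             repeat_factor=0.5, click_factor=0.7):
    """Lower a degree of similarity multiplicatively (step 106).

    times_interacted:       how often the element was already interacted with
    click_instead_of_input: a click assigned to a word that carries input data
    """
    score *= repeat_factor ** times_interacted   # reduced further with each repeat
    if click_instead_of_input:
        score *= click_factor
    return score
```

Because the repeat factor is applied once per earlier interaction, an element that has been used twice ends up with a lower effective degree of similarity than one that has been used once.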

A suitable candidate for the next interaction is then selected in step 107. For this purpose, that interaction element is selected for which the highest degree of similarity has been determined, after any reduction in step 106.

If the highest degree of similarity determined is lower than a threshold value, the exemplary embodiment described here provides for an arbitrary interaction element to be selected, although only from those that have not yet been interacted with.
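The selection in step 107, including the fallback below the threshold, can be sketched as follows. The threshold of 0.5 and the names `select_element`, `scores` and `visited` are assumptions for illustration.

```python
import random


def select_element(scores, visited, threshold=0.5, rng=random):
    """Select the next interaction element.

    scores:  element id -> highest weighted degree of similarity
    visited: ids of elements that have already been interacted with
    """
    if scores:
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            return best
    # Below the threshold: any element that has not been interacted with yet.
    unvisited = [e for e in scores if e not in visited]
    return rng.choice(unvisited) if unvisited else None
```

Passing a seeded `random.Random` instance as `rng` makes the fallback reproducible, which can be useful when the method is used for testing.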

In step 108 the interaction element is then interacted with. If an association exists, the action associated with the corresponding text element from the list is applied. If an input is required, the content to be entered has already been linked to the corresponding action in step 102. If several actions have the same effective degree of similarity, one of them is selected.

A termination criterion is then checked in step 109. For example, it can be checked whether a maximum time has been reached and/or whether a desired target state has been reached. For example, the method can be terminated once the login has been successfully completed in the example discussed here.

Provision can also be made for the selected text element to be deleted from the list after an interaction. In this case, the method can also be ended when the list no longer contains any further elements.

If the termination criterion is met, the method ends in step 110. In particular when the method is used to test an application, it can be useful to save the states reached during the execution of the voice command and the interactions carried out.

If the termination criterion is not met, the method continues in step 103.
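The loop formed by steps 103 to 109 can be summarized in a short sketch. The individual steps are passed in as functions so that the control flow stands on its own; all names and the default time limit are assumptions, and the deletion of handled text elements follows the optional variant described above.

```python
import time


def run_command(targets, analyze_ui, choose, interact, goal_reached,
                max_seconds=30.0):
    """Repeat steps 103 to 109 until a termination criterion is met."""
    trace = []                                  # saved states and interactions
    remaining = list(targets)
    deadline = time.monotonic() + max_seconds
    while remaining and time.monotonic() < deadline:
        state = analyze_ui()                    # steps 103 and 104
        choice = choose(state, remaining)       # steps 105 to 107
        if choice is None:
            break
        element, target = choice
        interact(element, target)               # step 108
        trace.append((state, element, target))
        remaining.remove(target)                # delete the handled text element
        if goal_reached():                      # step 109
            break
    return trace                                # step 110


# Usage with trivial stand-ins for the individual steps.
log = []
trace = run_command(
    ["Username", "Password"],
    analyze_ui=lambda: "login screen",
    choose=lambda state, remaining: (remaining[0].lower(), remaining[0]),
    interact=lambda element, target: log.append(element),
    goal_reached=lambda: False,
)
```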

Fig. 2 shows an exemplary embodiment of a method 200 for testing an application that has a graphical user interface.

To carry out the method, a plurality of voice commands is first provided in step 201. This can be done by speaking or by providing a text file containing the voice commands.

In step 202 it is then checked whether all voice commands have already been processed or whether a voice command is still unprocessed. If a voice command is still unprocessed, the method 100 shown in Fig. 1 is carried out for it in step 203. The result of the method 100 is added to a state model that is initially empty and exists from the first iteration onward; the state model includes the interactions carried out for the respective voice command and the states of the graphical user interface reached as a result.

The method then continues in step 202 with the remaining voice commands.

If no more voice commands need to be executed, the method 200 ends in step 204, in which the created state model is stored so that it can be retrieved for further analysis.
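The method 200 can be sketched as a loop that feeds the result of each voice command into a common state model. The representation of the state model as a set of states plus a list of transitions, and the names used, are assumptions; persisting the model in step 204 is omitted.

```python
def build_state_model(voice_commands, run_command):
    """Run every voice command and collect the results in one state model.

    run_command(command) is assumed to return a list of
    (state_before, interaction, state_after) tuples.
    """
    state_model = {"states": set(), "transitions": []}   # initially empty
    for command in voice_commands:                        # steps 202 and 203
        for before, interaction, after in run_command(command):
            state_model["states"].update((before, after))
            state_model["transitions"].append((before, interaction, after, command))
    return state_model                                    # to be stored in step 204


def fake_run(command):
    """Stand-in for the method 100, used only to exercise the loop."""
    return [("start", "click", "done")]


model = build_state_model(["Sign me in"], fake_run)
```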

Fig. 3 shows a display 14 of a graphical user interface that is in a current state. In the state shown, a total of three interaction elements 22, 23 and 24 and two non-interactive descriptive elements 25 and 26 are displayed. In order to log in, it is first necessary to enter the user name and password in fields 22 and 23 before the login can take place via button 24. The lexical content "Login" is arranged on the interaction element 24. For the two input fields 22 and 23, the lexical content "Username" and "Password" assigned to them is located in the respective nearest descriptive elements 25 and 26. How a login takes place by means of the voice command "Sign me in" through interaction with the user interface is described in more detail elsewhere. The voice command can be executed successfully, and the user is logged in, despite the different order of the text elements provided in the voice command and despite considerable syntactic deviations from the text fields.

The methods 100, 200 described above can be carried out, for example, with the arrangements 12 or the technical devices 1 that are designed according to Figs. 4 to 6 and are described below. For this purpose, the method steps are carried out automatically by means of the technical device 1, which can be a server or a smartphone, for example.

The arrangement 12 includes the technical device 1 and the technical facility 2.

In Figs. 4 and 5, the technical device 1 has resources that are separate from those of the technical facility 2. Fig. 6 shows an example in which the technical device 1 and the technical facility 2 share resources such as the processor 15 and the data memory 16.

For example, the technical facility 2 can be an electronic device 13 such as a computer or a mobile device, preferably a smartphone. The electronic device 13 can also form the technical device 1 at the same time, as shown in Fig. 6. However, as shown in Fig. 4 or Fig. 5, the technical device 1 can also be a separate device such as a server.

The technical facility 2 and/or the technical device 1 have a display 14 on which the current state of graphical elements of the graphical user interface can be displayed.

A software application that uses the user interface as an interface is stored on the technical facility 2 ready for execution and access. A computer program is stored on the technical device 1 ready for execution and access.

The technical facility 2 has a data input interface 6 via which data can be received.

The technical facility 2 can receive data from the data output interface 3 of the technical device 1 via the data input interface 6. The received data can include, for example, specific actions to be performed on interaction elements of the user interface.

The technical device 1 has a data input interface 9 via which data can be received. As shown in Fig. 4, the data input interface 9 can be connected to a data output interface 17 of the technical facility 2 via a wireless and/or wired data line 10 such as an Internet connection.

In Figs. 4 and 6, the lexical content of elements of the user interface is determined via a data structure or other memory content stored in the form of data. In Fig. 5, the graphical representation of the user interface currently displayed on the display 14 is recorded by means of a sensor 8 such as a video camera, and the recordings are transmitted to the technical device 1.

The technical device 1 and the technical facility 2 each have at least one data memory 16 and one processor 15. For reasons of clarity, these are not shown explicitly in Figs. 4 and 5. In Figs. 4 and 5 the interfaces 3, 6, 9 and 17 are designed as hardware interfaces, and in Fig. 6 as software interfaces.

In summary, the invention relates to a device 1 and a method for interacting with a graphical user interface and for testing an application. A voice command is provided from which a text element is identified. A lexical content is determined for an interactive element 22, 23, 24 of the user interface. A semantic degree of similarity of the text element to the lexical content is determined and the interactive element 22, 23, 24 is then interacted with depending on the degree of similarity.

List of reference symbols

1 Technical device
2 Technical facility
3 Data output interface of 1
6 Data input interface of 2
8 Sensor
9 Data input interface of 1
10 Data line
12 Test arrangement
13 Electronic device
14 Display of a graphical user interface
15 Processor
16 Data memory
17 Data output interface of 2
22 Interaction element
23 Further interaction element
24 Further interaction element
25 Descriptive element
26 Further descriptive element

Claims

1. Method for interacting with a graphical user interface, the graphical user interface having an interaction element, characterized in that a voice command is provided from which a text element is identified, that a lexical content for the interaction element is determined with computer assistance, that a semantic degree of similarity of the text element to the lexical content is determined in a computer-aided manner and that the interaction element is interacted with depending on the degree of similarity.

2. Method according to the preceding claim, characterized in that the lexical content for the interaction element is determined by identifying lexical content arranged on the interaction element and/or by determining distances from surrounding descriptive elements with lexical content and selecting the lexical content of the closest descriptive element.

3. Method according to one of the preceding claims, characterized in that the lexical content of the interaction element or the descriptive element is determined by taking it directly from a text field of a data structure describing a current state of the user interface and/or by applying a text recognition and/or image recognition method to a graphical representation of the interaction element or the descriptive element, in particular wherein, in the case of an initially identified graphic element and/or an initially identified character that is not included in the grammar, a method is used which assigns lexical content to the graphic element and/or the character.

4. Method according to one of the preceding claims, characterized in that the voice command is provided by converting a spoken input into a sequence of text elements and/or that the voice command is provided as a sequence of text elements, in particular by a keyboard input or by a text file.

5. Method according to one of the preceding claims, characterized in that the voice command is broken down into a sequence of text elements with the aid of a computer, preferably by means of a parser.

6. Method according to one of the preceding claims, characterized in that the voice command includes a designation for a macro, the macro for its part including a sequence of text elements preferably stored in a text file.

7. Method according to one of the preceding claims, characterized in that the interaction element is identified by selecting an interactive element of a current state of the graphical user interface, in particular wherein the interactive element has a finite extent and/or is unobscured.

8. Method according to one of the preceding claims, characterized in that the interaction element is only interacted with if the determined degree of similarity exceeds a threshold value.

9. Method according to one of the preceding claims, characterized in that semantic degrees of similarity of one or more than one text element of the voice command to one or more than one lexical content, which is determined for one or more than one corresponding interaction element, are determined and that the interaction element for which the determined degree of similarity is highest is interacted with first.

10. Method according to one of the preceding claims, characterized in that the text element of the voice command is assigned at least one action and that the at least one action assigned to the text element is compared with an action assigned to the interaction element, in particular wherein the determination of the degree of similarity of the text element to the interaction element takes place in dependence on the result of the comparison.

11. Method according to one of the preceding claims, characterized in that it is stored whether and/or how often the interaction element has been interacted with during the execution of the method and/or that a state model of the user interface is created and/or updated from interactions with the user interface and from the states of the user interface reached as a result of the interactions.

12. Method according to one of the preceding claims, characterized in that the degree of similarity is reduced, preferably by means of a weighting factor, if it is recognized that the interaction element has already been interacted with during the execution of the method.

13. Method according to one of the preceding claims, characterized in that in the case of states of the user interface which are reached for the first time in the course of the execution of the method, those text elements of the voice command which require or enable an input as an interaction are executed first.

14. Use of the method according to any one of the preceding claims for executing a function of an application using the user interface as an interface, in particular wherein the user provides the voice command.

15. Method for testing an application using a user interface as an interface, characterized in that at least one voice command is specified, that the method according to one of the preceding claims is carried out for each of the at least one voice command and that a state model of the user interface is generated and/or updated and stored in a retrievable manner, the state model comprising the interactions carried out and the states of the user interface reached as a result.

16. Technical device (1), characterized in that the technical device (1) is set up to carry out the method for interacting with a graphical user interface according to one of claims 1 to 13 and/or the method for testing an application using a or the user interface as an interface according to claim 15, in particular wherein input means are designed for entering a voice command.

17. Arrangement (12) with a technical device (1) according to the preceding claim and with a technical facility (2) on which an application using the user interface as an interface is stored ready for execution and access, wherein at least one interface is formed which enables a transmission of data from the technical device (1) to the technical facility (2).
