CN111857635A - Interaction method, storage medium, operating system and device - Google Patents

Interaction method, storage medium, operating system and device

Info

Publication number
CN111857635A
Authority
CN
China
Prior art keywords
view component, attribute, keyword, view, determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910361806.6A
Other languages
Chinese (zh)
Inventor
杨扬
袁志俊
李晓鹏
王雷
王恺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Banma Zhixing Network Hongkong Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910361806.6A
Publication of CN111857635A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16: Sound input; Sound output
    • G06F3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Abstract

The embodiment of the invention provides an interaction method, a storage medium, an operating system and a device. The method includes: displaying an interface, where the interface includes at least one view component; receiving voice information; acquiring, from the voice information, a keyword used to describe a view component, where the view component is associated with at least one attribute; and determining, from the at least one view component, a view component matching the keyword. From the perspective of user perception, a view component in the interface presents visual features of multiple dimensions that the user can perceive and describe, and these visual features correspond to multiple attributes of the view component. For example, a certain view component may present a rectangular visual feature and a red visual feature, which correspond to its shape attribute and color attribute. Therefore, by expanding multiple attribute dimensions, the scheme lets a user operate view components through voice interaction using more flexible and more varied expressions.

Description

Interaction method, storage medium, operating system and device
Technical Field
The present invention relates to the field of internet technologies, and in particular, to an interaction method, a storage medium, an operating system, and an apparatus.
Background
Various human-computer interaction modes are widely applied in different scenarios, such as touch interaction with a view component displayed in an interface, voice interaction with an application program, and somatosensory or gesture interaction in scenarios such as virtual reality.
In the prior art, these human-computer interaction modes are independent of one another, and an application program often supports only a single interaction mode. For example, an application that supports touch interaction displays an interface on a screen; in response to a user's touch operation on a view component in the interface, the operating system notifies the application that the view component has been triggered, the application calls the corresponding callback function to respond, and the response result is embodied, for example, as displaying another interface on the screen.
Disclosure of Invention
The embodiment of the invention provides an interaction method, a storage medium, an operating system and a device, which expand the interaction modes available to a service object (such as an application program).
In a first aspect, an embodiment of the present invention provides an interaction method, where the method includes:
displaying an interface, wherein at least one view component is included in the interface;
receiving voice information;
acquiring a keyword used for describing a view component in the voice information, wherein the view component is associated with at least one attribute;
determining a view component matching the keyword from the at least one view component.
In a second aspect, an embodiment of the present invention provides an interaction apparatus, where the apparatus includes:
the display module is used for displaying an interface, and the interface comprises at least one view component;
the receiving module is used for receiving voice information;
the acquisition module is used for acquiring keywords used for describing a view component in the voice information, and the view component is associated with at least one attribute;
a determining module for determining a view component matching the keyword from the at least one view component.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory stores executable codes, and when the executable codes are executed by the processor, the processor is caused to implement at least the interaction method described in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which, when executed by a processor of an electronic device, causes the processor to implement at least the interaction method in the first aspect.
In a fifth aspect, an embodiment of the present invention provides an operating system, including:
the display control unit is used for controlling a display screen to display an interface, and the interface comprises at least one view component;
the input control unit is used for controlling the voice input equipment to receive voice information;
the interaction engine is used for acquiring keywords used for describing the view components in the voice information, and the keywords are used for describing the attributes of the view components; determining a view component matching the keyword from the at least one view component.
In a sixth aspect, an embodiment of the present invention provides an interaction method, where the method includes:
displaying an interface, wherein at least one view component is included in the interface;
receiving voice information;
determining a view component described by the voice information from the at least one view component according to the recognition result of the interface;
responding to the determined view component.
The embodiment of the invention provides a scheme for operating view components through voice interaction: a view component in the interface can be operated not only through the traditional touch interaction mode but also through voice. Specifically, when a user wants to operate a view component in the interface, the user can speak voice information for operating it. The voice information includes a keyword describing at least one attribute of the view component to be operated. The keyword contained in the voice information is identified, and a view component matching the identified keyword is then determined from the at least one view component contained in the interface. The determined view component is regarded as the one the user wants to operate, and execution of the response event that should be executed when that view component is triggered can then be controlled.
From the perspective of user perception, a view component in the interface presents visual features of multiple dimensions that the user can perceive and describe, and these visual features correspond to multiple attributes of the view component. For example, a certain view component may present a rectangular visual feature and a red visual feature, which correspond to its shape attribute and color attribute. Therefore, by expanding multiple attribute dimensions, the scheme lets a user operate view components through voice interaction using more flexible and more varied expressions.
In a seventh aspect, an embodiment of the present invention provides an interface control method, where the method includes:
displaying an interface, wherein at least one view component is included in the interface;
receiving voice information;
outputting prompt information for the view component described by the voice information.
In the interface control method, the user can operate view components through voice interaction. When the user wants to operate a certain view component in the interface, the user can speak voice information for operating it; when the view component described by the voice information is determined among the at least one view component in the interface, prompt information can be output to indicate which view component has been selected, so that the view component the user wants to operate by voice is accurately located.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of an interaction method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating the components and interactions of a view component in an interface according to an embodiment of the present invention;
FIG. 3 is a flow chart of another interaction method provided by the embodiments of the present invention;
FIG. 4 is a schematic diagram illustrating the composition and interaction of view components in another interface according to an embodiment of the present invention;
FIG. 5 is a flowchart of another interaction method provided by the embodiment of the present invention;
fig. 6 is a schematic diagram illustrating an operating principle of an operating system according to an embodiment of the present invention;
FIG. 7 is a flowchart of another interaction method provided by the embodiments of the present invention;
fig. 8 is a flowchart of an interface control method according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an interaction apparatus according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of an electronic device corresponding to the interaction apparatus provided in the embodiment shown in fig. 9.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well. "plurality" generally includes at least two unless the context clearly dictates otherwise.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It is also noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a product or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a product or system. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of additional identical elements in the product or system that comprises the element.
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
Before describing the interaction method provided by the embodiment of the present invention in detail, a core idea of the interaction method is briefly described.
Taking an application program as an example: while the application runs, it displays an interface including at least one view component. When it is inconvenient for the user to operate a view component by touch, the operation can instead be accomplished by voice. In short, the view component to be operated is selected in a "what you see is what you say" manner.
"What you see is what you say" means that, for interactive view components in the interface such as buttons and check boxes, the user can speak instructions to manipulate them. For example: the interface includes two buttons, A and B, arranged from left to right; the user can speak the voice information "select the left one" or "select the first one", whereupon the interaction engine hits button A and triggers the response event for clicking button A.
Because view components are displayed in the interface, they present visual features of multiple dimensions, such as color, shape and position, and different view components differ in at least one visual feature. In the embodiment of the invention, these visual features of multiple dimensions are expanded so that a user can describe, by voice, the view component to be operated through visual features of multiple dimensions. This accommodates the facts that different users have different expression habits and different sensitivities to different visual features, and it improves the flexibility of operating view components by voice.
In other words, when a user wants to manipulate a view component, the user can speak one or more visual features of that component based on the perceived differences, in those features, between it and the other view components in the interface, so that the interaction engine can hit the component the user wants to manipulate from among the view components displayed in the interface.
For example, with the solution provided by the embodiment of the present invention, the user may operate view components by speaking voice information such as "tap the middle button", "tap the red button for me", "press the round button", "tap the top button", "select the first item in the list", "tap the first button in the red area", and the like.
In summary, from the perspective of user perception, the view components in the interface present visual features of multiple dimensions for the user to perceive and describe; from the perspective of the view components, or of the interface structure, these visual features correspond to multiple attributes of the view components. For example, a view component presenting a rectangular visual feature and a red visual feature has those two features correspond to its shape attribute and color attribute. Therefore it can also be said that, in the embodiment of the present invention, by extending multiple attribute dimensions, a user can operate a view component in a more flexible and more diverse manner through voice interaction.
The implementation of the interaction method provided herein is described in detail below with reference to the following embodiments.
Fig. 1 is a flowchart of an interaction method according to an embodiment of the present invention. The interaction method may be executed by an operating system or by a service object, where the service object may be an application program, a cloud service, a page, or the like, and displays an interface including at least one view component while running. As shown in fig. 1, the method includes the following steps:
101. Displaying an interface, where at least one view component is included in the interface.
102. Receiving voice information.
103. Acquiring a keyword used for describing a view component in the voice information, where the view component is associated with at least one attribute.
104. Determining a view component matching the keyword from the at least one view component included in the interface.
In the embodiment of the present invention, an application program is taken as the example of the service object.
While the application program runs, an interface containing at least one view component is displayed. For ease of description and understanding, assume, as shown in fig. 2, an interface comprising the three buttons shown schematically in the figure (download, collect and comment) and a search box shown schematically in the upper right corner.
When the user wants to operate one of the view components, the user can speak voice information indicating the operation. For example, a user who wants to search for a video named XYZ can say "search XYZ". As another example, a user who wants to download the video played in the current video playing window may say "download this video", or "click the first button", or "click the left button".
Because the application program runs on some electronic device, the operating system of that device can control the audio acquisition device to pick up the voice information uttered by the user; after acquiring it, the audio acquisition device sends the voice information to the operating system, and the interaction engine in the operating system processes it to determine the user's current operation intention, that is, which view component the user wants to operate. Alternatively, an interaction engine may be integrated in the application program, in which case the operating system sends the voice information to the application program and the application program's interaction engine processes it.
In processing the voice information, a keyword used for describing the view component is first identified in it. Since at least one attribute is associated with a view component, the keyword can describe at least one attribute of the component. The view component matching the acquired keyword is then determined from the view components contained in the interface. Therefore, when the interaction method provided by this embodiment is executed by the operating system, the operating system may notify the application program that the determined view component has been triggered, so that the application program executes the response event that should be executed when the component is triggered; when the method is executed by the application program, the application program directly executes that response event.
It should be noted that view components often have multiple attribute dimensions. To operate a certain view component more accurately, the voice information uttered by the user generally includes at least one keyword describing the component, and that at least one keyword describes at least one attribute of it. That is, the user hits the view component to be operated by describing the visual features of one or more of its attributes. Accordingly, when at least one keyword describing the view component is acquired from the user's voice information, the view component matching at least part of the at least one keyword can be determined from the at least one view component contained in the interface. For example, when three keywords are obtained, a view component matching all three, or only two, or only one of them may be determined from the view components included in the interface.
In the embodiment of the present invention, the attributes of a view component that can be made available for voice interaction may include: a text attribute, a number attribute, a type attribute, a color attribute, a shape attribute, a graphic attribute, a location attribute, a containment relationship attribute, and the like.
The text attribute refers to the text information displayed in a view component such as a button or a text box; the user can hit the corresponding view component by speaking all or part of that text.
The number attribute refers to numbering the view components displayed in the interface in a certain order; the user can hit a view component by speaking its number.
The type attribute refers to the type of the view component, such as button, check box, search box, or list. The user can directly say, for example, "button" to hit a button in the interface.
The color attribute refers to the color in which the view component is rendered in the interface. For example, if the interface includes a red button and a blue button, the user may say "tap the blue button" to hit the blue one.
The shape attribute refers to the shape feature presented by the view component; common shapes include rectangles, circles and rounded squares. The user may directly speak a view component's shape feature to hit it.
The graphic attribute, which may also be called a pattern attribute or an image attribute, refers to a view component being represented by an image displayed on it or immediately adjacent to it. For example, printer and brush graphics represent view components corresponding to print and edit functions, respectively.
The location attribute, which may also be called an orientation attribute, refers to the relative location of the view component in the interface. Descriptions such as "the first on the left" and "upper right" correspond to the location attribute.
The containment relationship attribute, which may also be called a parent-child relationship attribute, describes parent-child relationships between different view components. A common description is "B in A", such as "select the first item in the list", where A corresponds to the type attribute (list) and B to the number attribute (the first item). That is, A and B tend to correspond to two different attributes. A structure that could carry these attribute dimensions is sketched after this list.
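As an illustration only, one way such per-component attribute information might be represented is a simple record per view component. This is a minimal sketch; all field names below (ViewComponent, reference_keywords, and so on) are assumptions for this sketch, not structures defined by the patent:
```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ViewComponent:
    """One view component and its attribute values across the dimensions
    listed above (text, number, type, color, shape, graphic, location,
    containment)."""
    component_id: str
    text: Optional[str] = None                     # text shown on the component
    number: Optional[int] = None                   # order within the interface
    type: Optional[str] = None                     # "button", "list", "search box", ...
    color: Optional[Tuple[int, int, int]] = None   # rendered color as an RGB value
    shape: Optional[str] = None                    # "circle", "rectangle", ...
    graphic: Optional[str] = None                  # name of the carried image, e.g. "printer"
    position: Optional[Tuple[int, int]] = None     # relative coordinates in the interface
    parent_id: Optional[str] = None                # containment (parent-child) relationship
    reference_keywords: List[str] = field(default_factory=list)  # designer-labeled keywords
```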
In connection with the example of fig. 2: when the voice information is "search XYZ", a keyword corresponding to the type attribute is recognized: search. When the voice information is "download this video", a keyword corresponding to the text attribute is recognized: download. When the voice information is "click the first button", keywords corresponding to the number and type attributes are recognized: first, button. When the voice information is "tap the left button", keywords corresponding to the location and type attributes are recognized: left, button.
In an alternative embodiment, the voice information is processed as follows: first, the voice information is converted into text; then, according to the provided attribute dimensions, semantic understanding is performed on the text to identify the keywords it contains that correspond to those dimensions; finally, according to the identified keywords, the view component matching them is determined from the at least one view component contained in the interface.
Optionally, determining the view component matching the identified keyword from the at least one view component included in the interface may be implemented as: determining the view component matching the identified keyword according to the at least one reference keyword labeled on each of the at least one view component.
In this approach, at the initial design stage, the designer of the application program can label each view component with reference keywords corresponding to the different attributes described above, and save the labeling results. For example, if N attributes are provided, any view component can be labeled with N reference keywords corresponding to the N attributes, where N is greater than or equal to 1.
Reference keywords are phrased to match the user's expression habits: they describe, in terms of visual features the user can directly perceive, what a view component is under a given attribute dimension.
For example, taking the color attribute, assume a certain button is rendered red. The labeling result for the button is then: color attribute, red. It is not labeled with color-value features of red such as RGB values, because RGB values do not conform to the user's expression habits.
For another example, assuming a button is circular, the labeling result for the button is: shape attribute, circular. It is not labeled with the radius of the circle.
For another example, for the graphic attribute, the labeling result may be a graphic name, such as printer, mail, or brush; for the location attribute, the interface may be divided into grid regions, and based on the division the labeling result may be orientation information such as left, right, upper left corner, lower right corner, or middle.
Based on the reference keywords labeled on each view component under the different attributes, after the keywords contained in the user's voice information are identified, the reference keywords corresponding to each of the at least one view component in the current interface can be queried. Each queried reference keyword is then compared with the keywords identified from the voice information; if the reference keywords of a certain view component include the identified keywords, that view component is determined to be the one the user wants to operate.
For any attribute, denote the keyword identified from the voice information for that attribute as keyword 1, and the queried reference keyword for that attribute as keyword 2. If keyword 1 is identical to keyword 2, the two are considered to match; alternatively, if the similarity between keyword 1 and keyword 2 is higher than a set threshold, they are also considered to match. Optionally, the similarity between keyword 1 and keyword 2 may be measured by computing the cosine distance between the vectors of the two keywords through a word2vec-style model.
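A minimal sketch of this matching step follows, assuming a ViewComponent record like the earlier one and some word-embedding lookup `embed` standing in for the word2vec-style model; the 0.8 threshold and all function names are illustrative, not specified by the patent:
```python
from typing import Callable, List

import numpy as np

def cosine_similarity(v1: np.ndarray, v2: np.ndarray) -> float:
    """Similarity between two keyword vectors (higher means closer)."""
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def keywords_match(k1: str, k2: str,
                   embed: Callable[[str], np.ndarray],
                   threshold: float = 0.8) -> bool:
    """Keywords match when identical, or when the cosine similarity of
    their word vectors exceeds a set threshold."""
    return k1 == k2 or cosine_similarity(embed(k1), embed(k2)) > threshold

def hit_components(identified: List[str], components: list,
                   embed: Callable[[str], np.ndarray]) -> list:
    """Return components whose labeled reference keywords match at least
    part of the identified keywords, best-covered components first."""
    scored = []
    for comp in components:
        matched = sum(
            any(keywords_match(k, ref, embed) for ref in comp.reference_keywords)
            for k in identified)
        if matched:
            scored.append((matched, comp))
    return [comp for _, comp in sorted(scored, key=lambda t: -t[0])]
```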
In another alternative embodiment, the voice information is processed as follows: first, the voice information is converted into text, so that the converted text includes a keyword describing at least one attribute of the view component; then, a target text template matching the converted text is determined from the provided text templates, and the view component corresponding to the target text template is determined to be the view component matching the keyword in the voice information, that is, the view component the user's voice information refers to.
In this embodiment, a number of text templates are provided, each corresponding to one way of expressing the selection of a particular view component. That is, for any view component, several text templates that uniquely hit it may be provided; if the voice information uttered by the user matches one of the text templates of a certain view component, that component is determined to be the one the user wants to operate.
In terms of the reference keywords described above, a text template may be understood as a set of reference keywords corresponding to the N attributes. Therefore, after the user's voice information is converted into text, the matching text template can be searched for among the text templates, and the view component corresponding to that template is the one the user wants to operate. Specifically, the converted text and each text template may be input in turn into a trained network model, which measures the similarity between the converted text and the template; if the similarity is higher than a set threshold, the two are considered to match. The network model may be any model capable of analyzing the similarity between sentences.
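A sketch of this template-matching path follows, with the sentence-similarity model abstracted as a callable; the mapping layout and the 0.8 threshold are assumptions for illustration:
```python
from typing import Callable, Dict, List, Optional

def match_template(utterance: str,
                   templates: Dict[str, List[str]],
                   similarity: Callable[[str, str], float],
                   threshold: float = 0.8) -> Optional[str]:
    """Return the id of the view component whose text template best
    matches the converted utterance, or None if no template clears the
    threshold. `templates` maps a component id to its templates;
    `similarity` stands in for the trained sentence-similarity model."""
    best_id, best_score = None, threshold
    for component_id, component_templates in templates.items():
        for template in component_templates:
            score = similarity(utterance, template)
            if score > best_score:
                best_id, best_score = component_id, score
    return best_id
```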
In summary, after the view component the user wants to operate is determined in the above manner, execution of the response event that should be executed when that component is triggered can be initiated. Taking fig. 2 as an example, if the user's voice information is "I want to comment", the "comment" button in the figure is hit based on the above processing, an event that the comment button is clicked is triggered, and based on that event the application responds by displaying a comment area in the interface of fig. 2.
Through this scheme, therefore, the user can interact with view components by voice using more flexible expressions.
Fig. 3 is a flowchart of another interaction method provided by an embodiment of the present invention, where the interaction method may be executed by an operating system. As shown in fig. 3, the method may include the steps of:
301. Displaying an interface, where at least one view component is included in the interface.
302. Receiving voice information.
303. Acquiring at least one keyword used for describing the view component in the voice information, where the at least one keyword includes a first keyword used for describing a first attribute of the view component, and the first attribute includes any one of the following: a shape attribute, a color attribute, a graphic attribute.
This embodiment takes the shape attribute, the color attribute and the graphic attribute as examples and describes how, when the voice information includes a keyword describing any one of these attributes, the view component the user wants to operate is determined based on that keyword.
For example, the voice information uttered by the user is "click that red circle". Speech recognition of this information identifies the keyword corresponding to the color attribute, red, and the keyword corresponding to the shape attribute, circle.
As another example, the voice information uttered by the user is "click the blue one"; speech recognition identifies the keyword corresponding to the color attribute: blue.
As another example, the voice information uttered by the user is "click the brush"; speech recognition identifies the keyword corresponding to the graphic attribute: brush.
When the voice information is recognized to contain the keywords describing any one of the three attributes, the view components matching the recognized keywords can be found in the view components contained in the interface in a manner illustrated by the following steps.
304. Determining a view component that matches at least a portion of the at least one keyword by identifying an image of at least one view component contained in the interface.
Optionally, the image of the at least one view component may be obtained by screen capture. Alternatively, each view component may be associated with a corresponding image in advance, and the image of the at least one view component obtained from those associations.
Step 304 may be implemented as: determining the first attribute value of each of the at least one view component under the first attribute by identifying the image of the at least one view component; and determining the view component matching the first keyword according to the provided classification labels corresponding to the first attribute, the feature information corresponding to each classification label, and the first attribute values of the at least one view component under the first attribute.
When the first attribute includes the color attribute, the first attribute value corresponding to the color attribute is a color-value feature of the component's image, such as an RGB value. Because the image of a view component contains many pixels, the first attribute value under the color attribute can be obtained by averaging the color values of those pixels. Alternatively, the interval formed by the minimum and maximum color values of the pixels can be taken as the component's first attribute value under the color attribute.
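As a sketch of that color extraction, assuming the component's image is available as a screenshot crop (the function and variable names are illustrative):
```python
import numpy as np
from PIL import Image

def color_attribute_value(component_image: Image.Image):
    """Compute both options described above: the mean RGB value over all
    pixels of the component's image, and the [min, max] color-value
    interval of those pixels."""
    pixels = np.asarray(component_image.convert("RGB"), dtype=float).reshape(-1, 3)
    mean_rgb = tuple(pixels.mean(axis=0))                      # averaged color value
    interval = (tuple(pixels.min(axis=0)), tuple(pixels.max(axis=0)))
    return mean_rgb, interval
```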
When the shape attribute is included in the first attribute, the first attribute value corresponding to the shape attribute may be a shape feature included in the image of the view component, such as a radius, an aspect ratio, a size, an outline, or the like.
When the graphic attribute is included in the first attribute, the first attribute value corresponding to the graphic attribute may be a contour feature or the like included in the image of the view component.
In this embodiment, for the color attribute, the shape attribute, and the graphic attribute, the classification labels respectively corresponding to the three attributes and the feature information corresponding to each classification label may be preset in combination with the component colors, shapes, and graphics that are commonly found in various interfaces.
For example, classification labels such as red, orange, blue, green, and white may be set for the color attribute, and the characteristic information corresponding to each classification label may be a color value or a color value interval corresponding to the corresponding color.
For another example, for the shape attribute, classification labels such as a circle, an ellipse, a rectangle, a triangle, etc. may be set, and the feature information corresponding to each classification label may be a shape feature corresponding to the corresponding shape. For example, the characteristic information corresponding to the circular classification label is the radius size and the like; the feature information corresponding to the elliptical classification label is the length of the long axis, the length of the short axis and the like; the characteristic information corresponding to the rectangular classification label is length-width ratio, size and the like; the feature information corresponding to the triangle classification label is the length of three sides, etc.
For another example, classification labels corresponding to a plurality of different images may be set for the graphic attributes, and the feature information corresponding to each classification label may be a contour feature corresponding to the corresponding image, or even any image corresponding to the corresponding classification label. For example, a classified label is set as a printer, and the corresponding characteristic information may be an image of the printer.
In the case that the classification label and the feature information corresponding to each classification label are set, the view component matching the first keyword may be determined by combining the obtained first attribute values of the at least one view component under the first attribute. That is, a view component whose first attribute value matches the first keyword may be found from at least one view component based on the setting of the classification tag and the feature information.
In an alternative, the step of determining the view component matching the first keyword may be implemented as:
for any view component in the at least one view component, determining the target feature information corresponding to that view component's first attribute value according to the feature information corresponding to each classification label;
if the classification label corresponding to the target feature information matches the first keyword, determining that the view component is the view component matching the first keyword.
Taking the color attribute as the first attribute, assume the identified first keyword is red. For any view component X in the interface, denote its color value under the color attribute as RGB_X. The classification labels corresponding to the color attribute include red, blue, white, orange and so on, and the feature information corresponding to each label is called its color-value feature information. The color value RGB_X of view component X is compared with the color-value feature information of each classification label to find the target color-value feature information containing RGB_X; it is then checked whether the classification label corresponding to that target feature information matches the first keyword, red, and if so, view component X is determined to be the view component corresponding to the first keyword.
Taking the graphic attribute as the first attribute, assume the identified first keyword is brush. For any view component X in the interface, its first attribute value under the graphic attribute is the image associated with it; in terms of display effect, the association may be embodied as the image being presented on the component, such as an icon, or being displayed in a position adjacent to it. Assume the classification labels corresponding to the graphic attribute include labels for several objects, and the feature information of each label is called its image feature information. The image of view component X may be input into a classification model, which extracts features from the image and compares them with the image feature information of each classification label to determine the target image feature information. If the classification label corresponding to the target image feature information matches the first keyword, brush, view component X is determined to be the view component corresponding to the first keyword.
The shape attribute is similar to the processing procedure of the color attribute and the graphic attribute, and is not described again.
It should be noted that in some practical scenarios, under a certain first attribute such as the color attribute, the feature information of different classification labels may partially overlap; for example, for two labels of similar colors, red and orange-red, the corresponding color-value intervals may overlap. In that case, the first attribute value (the color value) of a view component X may fall into the intervals of both labels at once. The matching degree between the color value of view component X and each of the two intervals can then be computed, and the interval with the highest matching degree is selected as the target color-value interval for X's color value.
That is, the aforementioned step of "determining the target feature information corresponding to the first attribute value of any view component" may be implemented as: determining the matching degree between the first attribute value of the view component and the feature information of each classification label, and taking the feature information with the highest matching degree as the target feature information for that first attribute value.
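The following sketch illustrates that label selection for the color attribute. The intervals and the distance-to-interval-center measure are placeholders for whatever matching degree a real system would define:
```python
# Illustrative (lo, hi) RGB intervals per classification label; real
# feature information would be preset by the designer, and intervals
# for similar colors (red vs. orange-red) may overlap.
COLOR_LABELS = {
    "red":    ((150, 0, 0),  (255, 90, 90)),
    "orange": ((200, 80, 0), (255, 180, 100)),
    "blue":   ((0, 0, 150),  (90, 90, 255)),
}

def classify_color(rgb):
    """Pick the classification label whose feature information best
    matches the component's color value; distance to the interval
    center stands in for the matching degree."""
    def matching_degree(interval):
        lo, hi = interval
        center = [(l + h) / 2.0 for l, h in zip(lo, hi)]
        distance = sum((c - v) ** 2 for c, v in zip(center, rgb)) ** 0.5
        return -distance  # smaller distance means a higher degree
    return max(COLOR_LABELS, key=lambda label: matching_degree(COLOR_LABELS[label]))
```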
Besides the manner above, the step of determining the view component matching the first keyword may optionally also be implemented as follows:
determining, among the classification labels corresponding to the first attribute, the target classification label matching the first keyword;
for any view component in the at least one view component, if that component's first attribute value matches the feature information corresponding to the target classification label, determining that it is the view component matching the first keyword.
In the previous implementation, the target feature information matching the first attribute value of a view component is found first, and whether the component matches the first keyword is then decided by comparing the first keyword with the classification label of that target feature information. In this implementation, the target classification label matching the first keyword is found first, and whether a view component matches the first keyword is then decided by whether its first attribute value matches the feature information of the target classification label. The target classification label matching the first keyword means that the first keyword is identical, or semantically similar, to the label's name.
The determination process of the view component matched with the first keyword can be realized locally in an operating system or in a cloud.
The foregoing describes how, for the color, graphic and shape attributes, the view component corresponding to a first keyword describing any of them is found among the at least one view component contained in the interface. Besides first keywords for those attributes, however, the voice information may also include keywords describing other attributes.
Optionally, if the at least one keyword in the user's voice information includes a second keyword describing a second attribute, where the second attribute is any one of the type attribute, the location attribute, the text attribute and the number attribute, determining the view component matching the second keyword may be implemented as: determining the view component matching the second keyword from the second attribute values of the at least one view component contained in the interface. That is, when the voice information includes a second keyword describing the second attribute, the matching view component can be determined by directly querying the attribute values of the view components in the current interface.
When an interface is initially designed, deploying a view component in the interface generates a structure corresponding to that component, and the structure describes the component's attribute values in the various attribute dimensions.
Taking the type attribute as the second attribute: based on the consistency between the type values of the at least one view component in the interface and the second keyword, a view component whose type value is consistent with the second keyword is determined from the at least one view component as the one matching the second keyword. For example, when the user says the voice information "click that button", view components of the button type in the interface are hit. Of course, multiple view components may be hit based on the type attribute alone; the conflict resolution mechanism for multiple hits is described below.
As another example, when the second attribute is the location attribute, the structure of a view component may describe its position coordinates in the interface, expressed relative to some reference position such as the top-left vertex of the interface. It will be appreciated that the second keyword in the user's voice information is usually not an exact coordinate but a fuzzy location description, such as "upper left corner" or "middle". Therefore, after the position coordinates of the at least one view component are obtained, the relative positional relationship of the components can be derived from those coordinates, and based on it the view component best matching the second keyword can be found. For example, when the user's voice information is "click the left button" and the interface contains two rows of buttons whose leftmost buttons share the same abscissa, the leftmost button of each row is hit.
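A sketch of resolving such a fuzzy location keyword, using the illustrative ViewComponent record from earlier (the field names remain assumptions):
```python
def resolve_left(components):
    """Resolve the fuzzy location keyword "left": among buttons with
    known coordinates, keep those sharing the minimum abscissa, as in
    the two-rows-of-buttons example above."""
    buttons = [c for c in components if c.type == "button" and c.position]
    if not buttons:
        return []
    min_x = min(c.position[0] for c in buttons)
    return [c for c in buttons if c.position[0] == min_x]
```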
The following example uses the interface structure shown in fig. 4. The interface is a video playing interface. At its upper right are three buttons, download, comment and collect, rendered red. At its upper left are two circular buttons representing the minimize and close functions, carrying the graphics illustrated in the figure. The middle area of the interface is a video playing window, in which some video is assumed to be playing. Below are two triangular buttons corresponding to the play-previous and play-next functions, and a circular button corresponding to the start/pause function, rendered red and filled with a graphic indicating the playing state, such as the two vertical bars (playing) or the right-pointing triangle (paused) shown in the figure.
Assume the current user utters the voice information "tap the red round button in the middle". Speech recognition of this voice information yields the following structure:
keyword of the location attribute: middle;
keyword of the color attribute: red;
keyword of the shape attribute: circle;
keyword of the type attribute: button.
Based on the analysis result of the structure, it is known that the attributes that need attention are a position attribute, a color attribute, a shape attribute, and a type attribute.
Thus, for example, the components in the interface may first be filtered by the type attribute; the result is that every view component illustrated in fig. 4 is a button. The candidates are then filtered by the shape attribute to keep the circular buttons; as fig. 4 shows, these are the minimize and close buttons at the upper left and the start/pause button below. Filtering further by the color attribute, only the start/pause button satisfies the condition red, so it is finally determined to be the view component the user wants to operate. The event that this button is clicked is sent to the application program, whose response, as shown in fig. 4, is to pause the video and update the graphic carried on the start/pause button to the one corresponding to the paused state.
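This successive narrowing can be sketched as a simple filter pipeline over the recognized keywords; the dictionary keys and the reuse of classify_color from the earlier sketch are illustrative assumptions:
```python
def resolve(components, keywords):
    """Successively narrow the candidates by each recognized attribute
    keyword, mirroring the type -> shape -> color walk-through above.
    `keywords` is e.g. {"type": "button", "shape": "circle",
    "color": "red", "position": "middle"}."""
    candidates = components
    if "type" in keywords:
        candidates = [c for c in candidates if c.type == keywords["type"]]
    if "shape" in keywords:
        candidates = [c for c in candidates if c.shape == keywords["shape"]]
    if "color" in keywords:
        candidates = [c for c in candidates
                      if c.color and classify_color(c.color) == keywords["color"]]
    return candidates  # multiple hits go to the conflict resolution described later
```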
Fig. 5 is a flowchart of another interaction method provided in an embodiment of the present invention, and as shown in fig. 5, the method may include the following steps:
501. Displaying an interface, where at least one view component is included in the interface.
502. Receiving voice information.
503. Acquiring at least one keyword used for describing the view component in the voice information, where the at least one keyword includes a third keyword and a fourth keyword used for describing the containment relationship attribute of the view component, and the containment relationship attribute describes the containment relationship between a third attribute and a fourth attribute.
Since different view components in an interface may bear containment or parent-child relationships, the user may also operate a view component by voice on the basis of this attribute.
Operating a view component based on the containment relationships among view components is possible because the structure of a view component can contain, besides its own attributes such as color, shape, type, position, graphic, text and number as described above, the description information of its parent component, including the parent component's identification (i.e., ID) and the component's relative position within the parent.
For the containment relationship between view components, the customary user expression is "B in A". In the step above, the containment relationship attribute is assumed to describe the containment relationship between the third attribute and the fourth attribute; the keyword describing the third attribute is the third keyword, and the keyword describing the fourth attribute is the fourth keyword.
For example, the user speaks the voice information "select the first item in the list", which is an expression of a containment relationship: the third attribute is the type attribute, with the corresponding third keyword "list"; the fourth attribute is the number attribute, with the corresponding fourth keyword "the first".
In practice, the third and fourth keywords describing the containment relationship attribute can be recognized in the voice information based on the customary "B in A" expression.
After the third keyword and the fourth keyword describing the containment relationship attribute are identified, the view component matching them is determined from the at least one view component contained in the interface, which may specifically be implemented by the following steps:
504. Selecting, from the at least one view component contained in the interface, a first candidate view component whose third attribute value matches the third keyword.
Still taking the voice information "select the first item in the list" as an example: the third attribute is the type attribute, so the type attribute of each view component in the interface is obtained, and the view component whose type matches the third keyword, list, is screened out as the first candidate view component.
505. Selecting, from the remaining view components, a second candidate view component whose parent component is the first candidate view component; if the fourth attribute value of the second candidate view component matches the fourth keyword, determining that the second candidate view component is the view component matching the third and fourth keywords.
After the first candidate view component is obtained, it can be determined, from the parent-component description information in the structures of the remaining view components, which of them have a parent of the list type; the view components so determined are called second candidate view components.
Then, the fourth attribute value of each second candidate view component under the fourth attribute is obtained and compared with the fourth keyword; if they match, the second candidate view component is determined to be the view component the user wants to operate.
For example, if the list contains three text boxes, all three are second candidate view components. Since the fourth keyword is "the first", the corresponding fourth attribute is the number attribute, and the first of the three text boxes is selected according to their numbers.
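A sketch of steps 504 and 505 over the illustrative ViewComponent records (the parsing of the number keyword into an integer is assumed to have already happened):
```python
def resolve_containment(components, third_keyword, fourth_number):
    """Resolve a "B in A" utterance such as "the first item in the
    list": step 504 picks parent candidates whose type value matches
    the third keyword; step 505 picks their children by number."""
    parents = [c for c in components if c.type == third_keyword]      # step 504
    parent_ids = {p.component_id for p in parents}
    children = [c for c in components if c.parent_id in parent_ids]   # step 505
    return [c for c in children if c.number == fourth_number]
```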
It is mentioned above that after obtaining at least one keyword in the voice information for describing at least one attribute of the view component and then determining a view component matching at least a part of the at least one keyword from the at least one view component included in the interface, there may be a plurality of view components matching at least a part of the at least one keyword, that is, the voice information of the user may hit the plurality of view components, and at this time, a target view component needs to be selected from the plurality of view components.
In an alternative embodiment, selecting the target view component from the plurality of view components may be implemented as follows: determining, according to the priority of the at least one attribute, the keyword corresponding to the high-priority attribute from the at least one keyword; determining the attribute values corresponding to the plurality of view components under the high-priority attribute; determining the matching degree between the keyword corresponding to the high-priority attribute and each of those attribute values; and selecting the view component corresponding to the attribute value with the highest matching degree as the target view component.
For example, assume that the voice information sent by the user includes keywords describing three attributes, color, type, and location, in decreasing order of priority, and that the keyword corresponding to the color attribute is "red". Assume the view components matching "red" are view component A, view component B, and view component C, whose attribute values under the color attribute, i.e., color values, are RGB_A, RGB_B, and RGB_C respectively. If the feature information corresponding to the classification label "red" is the color value interval s1-s2, the matching degree between each component's color value and the keyword "red" can be obtained from the degree of coincidence between that color value and the interval: the higher the coincidence, the higher the matching degree. If view component B has the highest matching degree, it is the finally determined target view component that the user wants to operate.
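A minimal sketch of this priority-based disambiguation, under the simplifying assumption that each color value is a single scalar and that the matching degree is derived from distance to the classification label's color value interval s1-s2 (the actual embodiment would compare full RGB values):

    from typing import Dict, Tuple

    # Hypothetical: each matching component's color value, reduced to one scalar.
    color_values: Dict[str, float] = {"A": 0.02, "B": 0.98, "C": 0.35}

    # Feature information of the classification label "red": an interval s1..s2.
    RED_INTERVAL: Tuple[float, float] = (0.95, 1.05)

    def matching_degree(value: float, interval: Tuple[float, float]) -> float:
        # Degree of coincidence: 1.0 inside the interval, decaying with distance.
        s1, s2 = interval
        if s1 <= value <= s2:
            return 1.0
        return 1.0 / (1.0 + min(abs(value - s1), abs(value - s2)))

    # Pick the component whose value under the high-priority attribute matches best.
    target = max(color_values,
                 key=lambda n: matching_degree(color_values[n], RED_INTERVAL))
    print(target)  # -> B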
In another alternative embodiment, selecting the target view component from the plurality of view components may be implemented as: selecting, according to the display level relation of the plurality of view components, the view component with the highest display level as the target view component. Each view component's structure may also include its corresponding display hierarchy.
In another alternative embodiment, selecting the target view component may be implemented as: selecting, according to the display area sizes corresponding to the plurality of view components, the view component with the largest display area as the target view component, where the display area size is the area the view component occupies in the interface.
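Both of these fallback rules reduce to a one-line selection over the conflicting components; in the sketch below, z_order and area are hypothetical field names for the display hierarchy and display area size kept in each component's structure:

    from dataclasses import dataclass

    @dataclass
    class Candidate:
        name: str
        z_order: int   # display hierarchy: larger means displayed further on top
        area: float    # display area size the component occupies in the interface

    hits = [Candidate("A", 1, 200.0), Candidate("B", 3, 50.0), Candidate("C", 2, 400.0)]

    top_most = max(hits, key=lambda c: c.z_order)   # highest display level -> B
    largest = max(hits, key=lambda c: c.area)       # largest display area  -> C
    print(top_most.name, largest.name)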
In yet another alternative embodiment, selecting the target view component may be implemented as: outputting prompt information in a voice mode and/or a non-voice mode, the prompt information prompting that a target view component is to be selected from the plurality of view components, and then responding to the selection operation made in reply to the prompt information.
For example, assume the plurality of view components are view component A, view component B, and view component C. Optionally, the three view components may be numbered, the numbers displayed next to the corresponding view components, and the query voice "which numbered component do you want?" output. Suppose view component A, view component B, and view component C are numbered 1, 2, and 3 in sequence. If the user then speaks a voice such as "the first one", the target view component is determined to be view component A.
Optionally, the three view components may instead be highlighted one by one, the user being asked by voice or text whether the currently highlighted component is the intended one. If the user answers yes, the currently highlighted view component is the target view component; if the user answers no, the next view component is highlighted and the inquiry repeated.
Optionally, the user may also be queried based on the difference of the view component a, the view component B, and the view component C in a certain attribute dimension, for example, if the location attributes of the three view components are different, the user may be queried whether to select a view component at a certain location, and a target view component is determined based on the answer of the user.
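The numbering variant of this prompt-and-answer flow might look like the following sketch, where speak and listen are hypothetical hooks into the voice output and input devices and the ordinal answer is assumed to arrive as a plain digit:

    from typing import Callable, List, Optional

    def disambiguate(names: List[str],
                     speak: Callable[[str], None],
                     listen: Callable[[], str]) -> Optional[str]:
        # Number the conflicting components, announce the numbering, and resolve
        # the user's answer (assumed to be parsed upstream into a plain digit).
        for i, name in enumerate(names, start=1):
            speak(f"Component {i}: {name}")
        speak("Which numbered component do you want?")
        index = int(listen()) - 1
        return names[index] if 0 <= index < len(names) else None

    # Usage with trivial stand-ins for the voice output and input devices:
    print(disambiguate(["A", "B", "C"], speak=print, listen=lambda: "1"))  # -> A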
The interaction method provided by the foregoing embodiments may be executed by an operating system in an electronic device, such as a vehicle-mounted device or a smart television. The operating principle of the interaction method as executed by the operating system is briefly described below with reference to fig. 6. As shown in fig. 6, the operating system may logically include an interaction engine, an input control unit, and a display control unit.
The display control unit is used for controlling a display screen to display an interface, and the interface comprises at least one view component.
The input control unit is used for controlling the voice input equipment to receive voice information. For example, when the operating system is started, the input control unit controls the voice input device to be started so as to collect voice information of the user. The voice input device may be a common microphone, sensor, etc.
The interactive engine is used for acquiring keywords used for describing the view components in the voice information, and the keywords are used for describing the attributes of the view components; a view component matching the keyword is determined from the at least one view component. In practice, the interaction engine may be further configured to send a notification that the determined view component is triggered.
To implement the above functions, the interaction engine may be subdivided; optionally, it may include an Automatic Speech Recognition (ASR) module, a Natural Language Understanding (NLU) module, and a speech/view matching processing module.
The ASR module converts the voice information into text information. The NLU module performs semantic understanding on the text information to identify the at least one keyword describing the at least one attribute. The speech/view matching processing module maps the at least one keyword contained in the voice information onto a view component, that is, it finds the view component matching at least part of the at least one keyword and sends the application program a notification that the determined view component has been triggered.
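Conceptually, the interaction engine chains the three modules into a pipeline. The sketch below wires hypothetical stand-ins together; the patent does not prescribe any particular ASR or NLU implementation:

    from typing import Callable, List, Optional

    def run_interaction_engine(audio: bytes,
                               asr: Callable[[bytes], str],
                               nlu: Callable[[str], List[str]],
                               match: Callable[[List[str]], Optional[str]],
                               notify: Callable[[str], None]) -> None:
        text = asr(audio)            # ASR: voice information -> text information
        keywords = nlu(text)         # NLU: text -> keywords describing attributes
        component = match(keywords)  # speech/view matching: keywords -> component
        if component is not None:
            notify(component)        # tell the application the component triggered

    # Trivial stand-ins, for illustration only:
    run_interaction_engine(
        b"...",
        asr=lambda a: "the red button",
        nlu=lambda t: ["red", "button"],
        match=lambda kws: "button_ok" if "button" in kws else None,
        notify=lambda c: print(f"triggered: {c}"),
    )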
Fig. 7 is a flowchart of another interaction method provided in an embodiment of the present invention, and as shown in fig. 7, the method may include the following steps:
701. and displaying an interface, wherein at least one view component is included in the interface.
702. And receiving voice information.
703. And determining a view component described by the voice information from at least one view component according to the recognition result of the interface.
704. The determined view component is responded to.
In this embodiment, the recognition result of the interface may be understood as a recognition result of which view components are included in the interface.
Keywords describing the view components may be included in the voice information, and thus, the view component matching the keywords may be determined as the view component described by the voice information.
The view component may have at least one attribute associated therewith, and thus, the above-mentioned keyword may be a keyword for describing the at least one attribute of the view component.
Based on the above, determining the view component described by the voice information from the at least one view component according to the recognition result of the interface may be implemented as follows: acquiring a keyword used for describing at least one attribute of the view component in the voice information; and determining the view components matched with the keywords according to the identification result of the at least one attribute respectively associated with the at least one view component.
Specifically, when keywords describing attribute A and attribute B are included in the voice information, attribute A and attribute B of the at least one view component included in the interface may be identified, and the view component matching the keywords describing attribute A and attribute B determined from the results.
The process of determining, in detail, a view component matching the keyword included in the voice message from at least one view component included in the interface may be implemented with reference to the foregoing embodiment, and is not described herein again.
When the view component described by the voice information has been determined, responding to it may be implemented as: executing the response event corresponding to the view component; or, sending a notification that the view component is triggered.
When the interaction method provided by this embodiment is executed by an operating system, the operating system may send a notification that the view component is triggered to the service object (such as an application program, a process, or a web page) corresponding to the view component, so that the service object executes the response event corresponding to the view component. When the interaction method is executed by the service object itself, the response event corresponding to the view component can be executed directly.
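Whether the executor is the operating system or the service object itself only changes which branch runs; execute_event and send_notification below are hypothetical callables, not names taken from the embodiments:

    from typing import Callable

    def respond(component_id: str,
                executed_by_os: bool,
                execute_event: Callable[[str], None],
                send_notification: Callable[[str], None]) -> None:
        if executed_by_os:
            # The operating system forwards a "triggered" notification to the
            # service object (application, process, web page) owning the component.
            send_notification(component_id)
        else:
            # The service object itself directly executes the response event
            # bound to the view component.
            execute_event(component_id)

    respond("confirm_button", executed_by_os=True,
            execute_event=lambda c: print(f"run handler for {c}"),
            send_notification=lambda c: print(f"notify owner of {c}"))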
Fig. 8 is a flowchart of an interface control method according to an embodiment of the present invention, and as shown in fig. 8, the method may include the following steps:
801. and displaying an interface, wherein at least one view component is included in the interface.
802. And receiving voice information.
803. And outputting prompt information aiming at the view component described by the voice information.
The process of determining the view component described in the voice information in this embodiment may refer to the description in the foregoing other embodiments, which is not described herein again.
Optionally, the output mode of the prompt message may be implemented as: and changing the corresponding display effect of the view component. For example, the view component is highlighted, or a highlighted border is added to the view component. Thus, the user is visually prompted as to which view component the voice information he or she outputs may hit.
Optionally, the output mode of the prompt message may also be implemented as: and outputting the voice prompting the view component to be selected. That is, the user may be prompted by voice which view component was selected.
Of course, the visual prompting mode and the voice prompting mode may be combined: while the view component is highlighted, a voice prompting that the highlighted view component has been selected is output.
Whatever the output mode, the prompt information lets the user know which view component his or her voice information hit, so the user can speak a voice indicating whether the determination is correct. For example, if the determined view component is the one the user wants to operate, the user may speak an affirmative voice such as "no error"; if it is not, the user may speak a negative voice such as "not this one". On this basis, if an affirmative voice is received, the determined view component is responded to; if a negative voice is received, the view component described by the voice information may be searched for again, or the view component the user wants to operate may be finally determined through multiple rounds of man-machine conversation.
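The confirm-or-retry loop described above can be sketched as follows; highlight, speak, and listen are again hypothetical device hooks, and the affirmative vocabulary is illustrative only:

    from typing import Callable, List, Optional

    AFFIRMATIVE = {"yes", "no error", "right"}   # illustrative vocabulary only

    def confirm(candidates: List[str],
                highlight: Callable[[str], None],
                speak: Callable[[str], None],
                listen: Callable[[], str]) -> Optional[str]:
        # Highlight each candidate in turn and ask for confirmation, moving on
        # to the next candidate whenever the user's answer is not affirmative.
        for name in candidates:
            highlight(name)
            speak(f"Is {name} the component you want?")
            if listen().lower() in AFFIRMATIVE:
                return name
        return None

    print(confirm(["A", "B"], highlight=lambda n: None,
                  speak=print, listen=lambda: "yes"))  # -> A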
In practical applications, a user usually outputs a negative voice in response to the prompt information when the view component described by the voice information is not unique; that situation may be handled with reference to the schemes in the foregoing embodiments.
The interaction means of one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these interactive devices may each be constructed using commercially available hardware components configured through the steps taught by the present solution.
Fig. 9 is a schematic structural diagram of an interaction apparatus according to an embodiment of the present invention, and as shown in fig. 9, the apparatus includes: the device comprises a display module 11, a receiving module 12, an obtaining module 13 and a determining module 14.
And the display module 11 is used for displaying an interface, and the interface comprises at least one view component.
And a receiving module 12, configured to receive the voice information.
An obtaining module 13, configured to obtain a keyword used to describe a view component in the voice information, where the view component is associated with at least one attribute.
A determining module 14, configured to determine, from the at least one view component, a view component matching the keyword.
In an optional embodiment, the obtaining module 13 may be configured to: and acquiring at least one keyword which is used for describing the view component in the voice information, wherein the at least one keyword is used for describing at least one attribute of the view component. Accordingly, the determination module 14 may be configured to: and determining the view component which is matched with at least part of the at least one keyword from the at least one view component according to the at least one attribute which is respectively associated with the at least one view component.
Wherein, optionally, the at least one attribute includes any one of the following attributes:
shape attribute, color attribute, graphic attribute, type attribute, position attribute, character attribute, number attribute, and inclusion relation attribute.
In an alternative embodiment, the determining module 14 may be configured to: and determining the view component matched with the keyword according to the at least one reference keyword marked on each view component.
In an optional embodiment, the obtaining module 13 may be configured to: and performing text conversion on the voice information, wherein the converted text comprises the keywords. At this point, the determination module 14 may be configured to:
determining a target text template matched with the converted text from the provided text templates, wherein each text template corresponds to an expression mode of the selected corresponding view component; and determining that the view component corresponding to the target text template is the view component matched with the keyword.
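As a toy illustration of this template matching (the embodiments do not fix a template format), one might represent each provided text template as a regular expression mapped to the expression mode it stands for:

    import re
    from typing import Optional

    # Hypothetical text templates: each regular expression stands for one
    # expression mode of selecting a view component.
    TEMPLATES = {
        r"(?:the )?red (\w+)": "match by color attribute",
        r"(?:the )?first (\w+) in the (\w+)": "match by inclusion relation",
    }

    def match_template(text: str) -> Optional[str]:
        for pattern, expression_mode in TEMPLATES.items():
            if re.fullmatch(pattern, text):
                return expression_mode
        return None

    print(match_template("the first item in the list"))  # -> match by inclusion relation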
In an optional embodiment, the at least one keyword includes a first keyword for describing a first attribute, where the first attribute is any one of the shape attribute, the color attribute, and the graphic attribute, and at this time, the determining module 14 may be configured to: determining a view component that matches at least a portion of the at least one keyword by identifying an image of the at least one view component.
Wherein, the determining module 14 may specifically be configured to: determining a first attribute value corresponding to each of the at least one view component under the first attribute by identifying an image of the at least one view component; and determining the view component matched with the first keyword according to the provided classification labels corresponding to the first attribute, the characteristic information corresponding to the classification labels respectively, and the first attribute value corresponding to the at least one view component under the first attribute respectively.
Optionally, in the process of determining the view component matching the first keyword, the determining module 14 may be specifically configured to: for any view component in the at least one view component, determining target feature information corresponding to a first attribute value of the any view component according to the feature information corresponding to each classification label; and if the classification label corresponding to the target characteristic information is matched with the first keyword, determining that any view component is the view component matched with the first keyword.
In the process of determining the target feature information corresponding to the attribute value of any view component, the determining module 14 may be specifically configured to: determining the matching degree between the first attribute value of any view component and the characteristic information corresponding to each classification label; and determining that the feature information with the highest matching degree is the target feature information corresponding to the first attribute value of any view component.
Optionally, in the process of determining the view component matching the first keyword, the determining module 14 may be specifically configured to: determining a target classification label matched with the first keyword in all the classification labels; for any view component in the at least one view component, if the first attribute value of the any view component matches the feature information corresponding to the target classification tag, determining that the any view component is a view component matching the first keyword.
In another optional embodiment, the at least one keyword includes a second keyword for describing the second attribute, where the second attribute is any one of the type attribute, the location attribute, the text attribute, and the number attribute, and at this time, the determining module 14 may be configured to: and determining the view component matched with the second keyword according to the second attribute value of the at least one view component.
In another optional embodiment, the at least one keyword includes a third keyword and a fourth keyword for describing the inclusion relation attribute, and the inclusion relation attribute is used for describing an inclusion relation between the third attribute and the fourth attribute. At this time, the determining module 14 may be configured to: selecting a first candidate view component with a third attribute value matched with the third keyword from the at least one view component; selecting a second candidate view component taking the first candidate view component as a parent component from the remaining view components; and if the fourth attribute value of the second candidate view component is matched with the fourth keyword, determining that the second candidate view component is a view component matched with the third keyword and the fourth keyword.
Optionally, the apparatus further comprises: and the conflict resolution module is used for selecting a target view component from the plurality of view components if the view components matched with at least part of the at least one keyword are a plurality of view components.
Optionally, the conflict resolution module may be specifically configured to: determining a keyword corresponding to the high-priority attribute from the at least one keyword according to the priority of the at least one attribute; determining a plurality of attribute values corresponding to the plurality of view components under the high priority attribute; determining the matching degree between the keywords corresponding to the high-priority attributes and the attribute values respectively; and selecting the view component corresponding to the attribute value with the highest matching degree as the target view component.
Optionally, the conflict resolution module may be specifically configured to: and selecting the view component with the highest display level from the plurality of view components as a target view component according to the display level relation of the plurality of view components.
Optionally, the conflict resolution module may be specifically configured to: and selecting the view component with the largest display area size from the plurality of view components as the target view component according to the display area sizes corresponding to the plurality of view components.
Optionally, the conflict resolution module may be specifically configured to: outputting prompt information in a voice mode and/or a non-voice mode, wherein the prompt information is used for prompting that a target view component is selected from the plurality of view components; responding to the selection operation aiming at the prompt information.
The apparatus shown in fig. 9 can perform the methods provided in the foregoing embodiments, and details of the portions of this embodiment that are not described in detail can refer to the related descriptions of the foregoing embodiments, which are not described herein again.
In one possible design, the structure of the interaction apparatus shown in fig. 9 may be implemented as an electronic device, as shown in fig. 10, which may include: a processor 21 and a memory 22. Wherein the memory 22 has stored thereon executable code, which when executed by the processor 21, causes the processor 21 to at least perform the interaction method as provided in the embodiments of fig. 1 to 7 described above.
In practice, the electronic device may also include a communication interface 23 and a display screen 24 for communicating with other devices.
In addition, the embodiment of the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which, when executed by a processor of an electronic device, causes the processor to perform at least the interaction method in the embodiments illustrated in fig. 1 to 7.
The above-described apparatus embodiments are merely illustrative, wherein the units described as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by adding a necessary general hardware platform, or by a combination of hardware and software. Based on this understanding, the above technical solutions, in essence or in the portions contributing over the prior art, may be embodied in the form of a computer program product on one or more computer-usable storage media (including, without limitation, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (27)

1. An interaction method, comprising:
displaying an interface, wherein at least one view component is included in the interface;
receiving voice information;
acquiring a keyword used for describing a view component in the voice information, wherein the view component is associated with at least one attribute;
determining a view component matching the keyword from the at least one view component.
2. The method according to claim 1, wherein the obtaining keywords in the voice message for describing the view component comprises:
acquiring at least one keyword used for describing a view component in the voice information, wherein the at least one keyword is used for describing at least one attribute of the view component;
the determining, from the at least one view component, a view component matching the keyword includes:
and determining the view component which is matched with at least part of the at least one keyword from the at least one view component according to the at least one attribute which is respectively associated with the at least one view component.
3. The method of claim 2, wherein the at least one attribute comprises any one of the following attributes:
shape attribute, color attribute, graphic attribute, type attribute, position attribute, character attribute, number attribute, and inclusion relation attribute.
4. The method according to claim 3, wherein the at least one keyword includes a first keyword for describing a first attribute, the first attribute being any one of the shape attribute, the color attribute and the graphic attribute;
the determining, from the at least one view component, a view component that matches at least a portion of the at least one keyword comprises:
determining a view component that matches at least a portion of the at least one keyword by identifying an image of the at least one view component.
5. The method of claim 4, wherein determining a view component that matches at least a portion of the at least one keyword by identifying an image of the at least one view component comprises:
determining a first attribute value corresponding to each of the at least one view component under the first attribute by identifying an image of the at least one view component;
and determining the view component matched with the first keyword according to the provided classification labels corresponding to the first attribute, the characteristic information corresponding to the classification labels respectively, and the first attribute value corresponding to the at least one view component under the first attribute respectively.
6. The method of claim 5, wherein the step of determining the view component matching the first keyword comprises:
for any view component in the at least one view component, determining target feature information corresponding to a first attribute value of the any view component according to the feature information corresponding to each classification label;
and if the classification label corresponding to the target characteristic information is matched with the first keyword, determining that any view component is the view component matched with the first keyword.
7. The method of claim 6, wherein the determining target feature information corresponding to the attribute value of any view component comprises:
determining the matching degree between the first attribute value of any view component and the characteristic information corresponding to each classification label;
and determining that the feature information with the highest matching degree is the target feature information corresponding to the first attribute value of any view component.
8. The method of claim 5, wherein the step of determining the view component matching the first keyword comprises:
determining a target classification label matched with the first keyword in all the classification labels;
For any view component in the at least one view component, if the first attribute value of the any view component matches the feature information corresponding to the target classification tag, determining that the any view component is a view component matching the first keyword.
9. The method according to claim 3, wherein the at least one keyword includes a second keyword for describing the second attribute, and the second attribute is any one of the type attribute, the location attribute, the text attribute, and the number attribute;
determining a view component matching the second keyword, comprising:
and determining the view component matched with the second keyword according to the second attribute value of the at least one view component.
10. The method according to claim 3, wherein the at least one keyword includes a third keyword and a fourth keyword for describing the inclusion relation attribute, and the inclusion relation attribute is used for describing an inclusion relation between the third attribute and the fourth attribute;
determining a view component that matches the third keyword and the fourth keyword, comprising:
Selecting a first candidate view component with a third attribute value matched with the third keyword from the at least one view component;
selecting a second candidate view component taking the first candidate view component as a parent component from the remaining view components;
and if the fourth attribute value of the second candidate view component is matched with the fourth keyword, determining that the second candidate view component is a view component matched with the third keyword and the fourth keyword.
11. The method according to any one of claims 1 to 3, wherein the determining, from the at least one view component, a view component that matches the keyword comprises:
and determining the view component matched with the keyword according to the at least one reference keyword marked on each view component.
12. The method according to any one of claims 1 to 3, wherein the obtaining of the keywords in the voice message for describing the view component comprises:
performing text conversion on the voice information, wherein the converted text comprises the keywords;
the determining, from at least one view component, a view component matching the keyword includes:
Determining a target text template matched with the converted text from the provided text templates, wherein each text template corresponds to an expression mode of the selected corresponding view component;
and determining that the view component corresponding to the target text template is the view component matched with the keyword.
13. A method according to claim 2 or 3, characterized in that the method further comprises:
and if the view components matched with at least part of the at least one keyword are multiple, selecting a target view component from the multiple view components.
14. The method of claim 13, wherein selecting the target view component from the plurality of view components comprises:
determining a keyword corresponding to the high-priority attribute from the at least one keyword according to the priority of the at least one attribute;
determining a plurality of attribute values corresponding to the plurality of view components under the high priority attribute;
determining the matching degree between the keywords corresponding to the high-priority attributes and the attribute values respectively;
and selecting the view component corresponding to the attribute value with the highest matching degree as the target view component.
15. The method of claim 13, wherein selecting the target view component from the plurality of view components comprises:
and selecting the view component with the highest display level from the plurality of view components as a target view component according to the display level relation of the plurality of view components.
16. The method of claim 13, wherein selecting the target view component from the plurality of view components comprises:
and selecting the view component with the largest display area size from the plurality of view components as the target view component according to the display area sizes corresponding to the plurality of view components.
17. The method of claim 13, wherein selecting the target view component from the plurality of view components comprises:
outputting prompt information in a voice mode and/or a non-voice mode, wherein the prompt information is used for prompting that a target view component is selected from the plurality of view components;
responding to the selection operation aiming at the prompt information.
18. An electronic device, comprising: a memory, a processor; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the interaction method of any one of claims 1 to 17.
19. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the interaction method of any one of claims 1 to 17.
20. An operating system, comprising:
the display control unit is used for controlling a display screen to display an interface, and the interface comprises at least one view component;
the input control unit is used for controlling the voice input equipment to receive voice information;
the interaction engine is used for acquiring keywords used for describing the view components in the voice information, and the keywords are used for describing the attributes of the view components; determining a view component matching the keyword from the at least one view component.
21. An interaction method, comprising:
displaying an interface, wherein at least one view component is included in the interface;
receiving voice information;
determining a view component described by the voice information from the at least one view component according to the recognition result of the interface;
the determined view component is responded to.
22. The method according to claim 21, wherein the determining a view component of the voice information description from the at least one view component according to the recognition result of the interface comprises:
acquiring a keyword used for describing at least one attribute of the view component in the voice information;
and determining the view component matched with the keyword from the at least one view component according to the identification result of the at least one attribute respectively associated with the at least one view component.
23. The method of claim 21, wherein said responding to the determined view component comprises:
executing a response event corresponding to the view component; or,
sending a notification that the view component is triggered.
24. An interface control method, comprising:
displaying an interface, wherein at least one view component is included in the interface;
receiving voice information;
and outputting prompt information aiming at the view component described by the voice information.
25. The method of claim 24, wherein outputting the prompt message comprises:
and changing the corresponding display effect of the view component.
26. The method of claim 25, wherein changing the corresponding display effect of the view component comprises:
and highlighting the view component, or adding a highlighted border to the view component.
27. The method of claim 24, wherein outputting the prompt message comprises:
and outputting voice prompting that the view component is selected.
CN201910361806.6A 2019-04-30 2019-04-30 Interaction method, storage medium, operating system and device Pending CN111857635A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910361806.6A CN111857635A (en) 2019-04-30 2019-04-30 Interaction method, storage medium, operating system and device

Publications (1)

Publication Number Publication Date
CN111857635A true CN111857635A (en) 2020-10-30


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030158736A1 (en) * 2002-02-15 2003-08-21 Frankie James Voice-controlled data entry
US20060136221A1 (en) * 2004-12-22 2006-06-22 Frances James Controlling user interfaces with contextual voice commands
CN103377028A (en) * 2012-04-20 2013-10-30 纽安斯通讯公司 Methods and systems for speech-enabling a human-to-machine interface
US20150039318A1 (en) * 2013-08-02 2015-02-05 Diotek Co., Ltd. Apparatus and method for selecting control object through voice recognition
US20150088524A1 (en) * 2013-09-24 2015-03-26 Diotek Co., Ltd. Apparatus and method for generating an event by voice recognition
CN104184890A (en) * 2014-08-11 2014-12-03 联想(北京)有限公司 Information processing method and electronic device
CN108279839A (en) * 2017-01-05 2018-07-13 阿里巴巴集团控股有限公司 Voice-based exchange method, device, electronic equipment and operating system
CN107832036A (en) * 2017-11-22 2018-03-23 北京小米移动软件有限公司 Sound control method, device and computer-readable recording medium
CN108877791A (en) * 2018-05-23 2018-11-23 百度在线网络技术(北京)有限公司 Voice interactive method, device, server, terminal and medium based on view
CN108877796A (en) * 2018-06-14 2018-11-23 合肥品冠慧享家智能家居科技有限责任公司 The method and apparatus of voice control smart machine terminal operation


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201216

Address after: Room 603, 6/F, Roche Plaza, 788 Cheung Sha Wan Road, Kowloon, China

Applicant after: Zebra smart travel network (Hong Kong) Limited

Address before: Fourth Floor, Capital Building, P.O. Box 847, Grand Cayman, Cayman Islands

Applicant before: Alibaba Group Holding Ltd.