CN117746859A - Man-machine interaction method, system, electronic equipment and storage medium - Google Patents

Man-machine interaction method, system, electronic equipment and storage medium

Info

Publication number
CN117746859A
CN117746859A (application CN202311769187.7A)
Authority
CN
China
Prior art keywords
control
information
target
generalization
identification
Prior art date
Legal status
Pending
Application number
CN202311769187.7A
Other languages
Chinese (zh)
Inventor
刘杰钦 (Liu Jieqin)
胡海 (Hu Hai)
张然 (Zhang Ran)
Current Assignee
Weilai Automobile Technology Anhui Co Ltd
Original Assignee
Weilai Automobile Technology Anhui Co Ltd
Priority date
Filing date
Publication date
Application filed by Weilai Automobile Technology Anhui Co Ltd filed Critical Weilai Automobile Technology Anhui Co Ltd
Priority to CN202311769187.7A priority Critical patent/CN117746859A/en
Publication of CN117746859A publication Critical patent/CN117746859A/en
Pending legal-status Critical Current

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a man-machine interaction method, system, electronic device and storage medium. The man-machine interaction method comprises the following steps: while a man-machine interaction application runs on the electronic device, obtaining at least one first control on the current interface, identification information corresponding to the at least one first control, and generalization information corresponding to the identification information, wherein the identification information is generated based on a first identifier or a second identifier, the first identifier is determined according to attribute information of the first control, and the second identifier is determined according to view layout information of the first control; acquiring a voice instruction of a user; matching the voice instruction against the generalization information to obtain target generalization information, and determining a target control based on the identification information corresponding to the target generalization information; and executing control of the target control in response to the voice instruction. The method improves the accuracy of control matching and the interaction experience, and requires no third-party adaptation.

Description

Man-machine interaction method, system, electronic equipment and storage medium
Technical Field
The present invention relates to the field of man-machine interaction technologies, and in particular, to a man-machine interaction method, system, electronic device, and storage medium.
Background
Controlling electronic devices via voice commands has become a popular way for users to express intent and operate electronic devices. A visible-and-speakable VUI (a voice user interface in which whatever is visible on the screen can be spoken to operate it) is a user interface technology for interacting with electronic devices by voice, and it provides users with a more convenient, intelligent and natural way of interacting.
Currently, in visible-and-speakable implementations, the control to be operated is generally determined from the user interface through the accessibility (barrier-free) mode provided by the operating system of the electronic device, and the control is then operated according to the user's voice input. Because the accessibility mode determines the control to be operated from the control's attribute information, a unique control cannot be locked onto when some controls' attribute information is missing; the user's voice input then cannot be responded to correctly, and effective control of the electronic device cannot be achieved.
In addition, although a control's missing attribute information can be supplemented by adding the relevant description on the application side that owns the control, this adaptation must be performed by the third party behind the application and requires considerable manpower and effort.
Disclosure of Invention
The invention provides a man-machine interaction method, system, electronic device and storage medium, and aims to solve the above technical problems effectively.
According to a first aspect of the present invention, there is provided a human-computer interaction method, the method being applied to an electronic device, the method comprising:
while a man-machine interaction application runs on the electronic device, obtaining at least one first control on the current interface, identification information corresponding to the at least one first control, and generalization information corresponding to the identification information, wherein the identification information is generated based on a first identifier or a second identifier, the first identifier is determined according to attribute information of the first control, and the second identifier is determined according to view layout information of the first control;
acquiring a voice instruction of a user;
matching the voice instruction against the generalization information to obtain target generalization information, and determining a target control based on the identification information corresponding to the target generalization information;
and executing control of the target control in response to the voice instruction.
Further, the second identifier is determined according to the resource ID of the first control and root element information, child element information and position index information of the first control in the view layout information.
Further, the first identifier includes one or more items of attribute information among the title information, description information and resource ID of the control.
Further, the identification information is generated by:
acquiring operation interfaces of the application program in different states and a plurality of first controls on the operation interfaces;
using the second identifiers as identification information when the first identifiers of the plurality of first controls are duplicated or missing;
using the first identifiers as identification information when the first identifiers of the plurality of first controls are neither duplicated nor missing;
and storing the identification information and the first control correspondingly.
Further, the first identifiers being duplicated means that all items of the attribute information are duplicated;
the step of using the first identifiers as identification information when the first identifiers of the plurality of first controls are neither duplicated nor missing includes:
judging, one by one in order of the attribute information's priority, whether each item of attribute information is duplicated, and using the current item of attribute information as the identification information when it is not duplicated.
Further, after acquiring the running interface of the application program in different states and the plurality of first controls on the running interface, the method further includes:
configuring the generalization information for the first control, and storing the configured generalization information in correspondence with the identification information of the first control.
Further, the step of configuring generalization information for the first control includes:
configuring the generalization information for the first control according to the type of the first control, wherein the type of the first control includes a button type, a text type, a switch type, a tab type, a progress bar type and a scroll type.
Further, after the step of configuring generalization information for the first control according to the type of the first control, the method further includes:
configuring feedback information and corresponding interaction information for the first control according to the type of the first control;
after the step of performing control of the target control in response to the voice instruction, the method further comprises:
acquiring the feedback information and corresponding interaction information of the target control, and responding to the voice instruction with the corresponding interaction information.
Further, when the type of the target control is a scroll type, the step of executing control of the target control includes:
and determining a first sliding distance of the target control according to the product of the display height of the target control and a preset sliding percentage, and controlling the target control to slide according to the first sliding distance.
Further, when the type of the target control is a progress bar type, the step of executing the control of the target control includes:
and calculating to obtain a second sliding distance according to the maximum progress and the minimum progress of the target control and a preset sliding percentage, and controlling the target control to slide according to the second sliding distance.
Further, the step of obtaining at least one first control on the current interface includes:
scanning a current interface through a focus window, and acquiring full control information of the current interface;
after the target control executes the operation, scanning the current interface through the focus window to obtain incremental control information corresponding to the target control.
Further, the step of executing control of the target control in response to the voice instruction includes:
when the target control monitors a click event, responding to the voice instruction, and clicking the target control in a click event mode;
and touching the target control in a touch event mode when the target control does not monitor the click event.
Further, the method further comprises:
in the process of running a man-machine interaction application in first electronic equipment or second electronic equipment, the man-machine interaction application acquires at least one first control, corresponding identification information and generalization information on a current interface in the first electronic equipment, and acquires at least one first control, corresponding identification information and generalization information on the current interface in the second electronic equipment, wherein the first electronic equipment is in communication connection with the second electronic equipment;
The step of matching the generalization information according to the voice command to obtain target generalization information and determining a target control based on the identification information corresponding to the target generalization information comprises the following steps:
the man-machine interaction application determines target generalization information from the generalization information of the first electronic device and the generalization information of the second electronic device according to the voice command, and determines target electronic devices and target controls based on identification information corresponding to the target generalization information;
the step of responding to the voice instruction and executing the control of the target control comprises the following steps:
and responding to the voice instruction, and executing control on the target electronic device.
Further, the method further comprises:
acquiring at least one second control on a current interface, and identity information and generalization information corresponding to the second control through a preset software development kit;
the step of matching the generalization information according to the voice command to obtain target generalization information and determining a target control based on the identification information corresponding to the target generalization information comprises the following steps:
and matching the generalization information corresponding to the first control with the generalization information corresponding to the second control according to the voice command to obtain target generalization information, and determining a target control based on the target generalization information, the identity information and the identification information.
Further, the step of scanning the current interface through the focus window includes:
and determining a focus window from a plurality of application programs of the electronic equipment, and acquiring control information of a first control on the focus window.
In a second aspect, the present invention further provides a man-machine interaction system included in the electronic device, the system comprising:
the control information acquisition module is used for acquiring at least one first control on a current interface and acquiring identification information corresponding to the at least one first control and generalization information corresponding to the identification information in the process of running a man-machine interaction application in the electronic equipment, wherein the identification information is generated based on a first identification or a second identification, the first identification is determined according to attribute information of the first control, and the second identification is determined according to view layout information of the first control;
the voice instruction acquisition module is used for acquiring a voice instruction of a user;
the control matching module is used for matching the voice instruction against the generalization information to obtain target generalization information, and determining a target control based on the identification information corresponding to the target generalization information;
and the control execution module is used for responding to the voice instruction and executing the control on the target control.
In a third aspect, the present invention also provides an electronic device, including:
one or more processors;
one or more memories;
a plurality of application programs installed thereon;
wherein the memory stores one or more application programs that, when executed by the processor, cause the electronic device to perform the steps of the man-machine interaction method described above.
In a fourth aspect, the present invention further provides a storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the man-machine interaction method described above.
Through one or more of the above embodiments of the present invention, at least the following technical effects can be achieved: the target generalization information is obtained by matching the voice instruction against the generalization information, and the target control is determined based on the identification information corresponding to the target generalization information. Because every first control has unique identification information, the target control can be matched by its identification information even when the first control's attribute information is missing or duplicated, improving the accuracy of control matching and the interaction experience. In addition, because the identification information is generated from the first control's native attribute information or view layout information, no third-party adaptation is required, so all native controls can be matched correctly while the third-party adaptation process is saved.
Drawings
The technical solution and other advantageous effects of the present invention will be made apparent by the following detailed description of the specific embodiments of the present invention with reference to the accompanying drawings.
FIG. 1 is a flowchart of a human-computer interaction method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of generating identification information according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a generalization information configuration interface according to an embodiment of the present invention;
fig. 4 is a schematic diagram of generalization information provided by an embodiment of the present invention;
FIG. 5 is a second flowchart of a man-machine interaction method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a current interface provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of cross-device interaction provided by an embodiment of the present invention;
fig. 8 is a schematic diagram of control information acquisition according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a visible-and-speakable service implementation according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
In the description of the present invention, it should be noted that, unless explicitly specified and defined otherwise, the term "and/or" herein is merely an association relationship describing associated objects, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. The character "/" herein generally indicates that the associated object is an "or" relationship unless otherwise specified.
The terms "first" and "second" are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature.
In the prior art, the accessibility (barrier-free) mode provided by the operating system matches native controls as follows: the generalization information of a control is matched according to the voice instruction, and the target control is then determined according to the attribute information (such as Title, Description, etc., which is used for identification) corresponding to the matched generalization information (generalization information - attribute information - native control are stored in correspondence). However, in many cases the attribute information of a native control is missing (for example, an icon-type native control); the native control corresponding to the generalization information then cannot be determined, and no response can be made.
Although a description can be added on the application side to supplement a native control's attribute information, this must be adapted by a third party; the number of controls involved in one application program is often large, and adapting every native control whose attribute information is missing or duplicated costs the third party much time and effort.
In the man-machine interaction method provided by the invention, every native control (namely, the first control) has identification information, and the identification information is generated based on the native control's own attribute information or view layout information without third-party adaptation, so all native controls can be matched correctly while the third-party adaptation process is saved.
The invention provides a man-machine interaction method, a man-machine interaction system, electronic equipment and a storage medium.
Fig. 1 is a flowchart of the man-machine interaction method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
s101, acquiring at least one first control on a current interface and acquiring identification information corresponding to the at least one first control and generalization information corresponding to the identification information in the process of running a man-machine interaction application in the electronic equipment.
The first control is a native control (such as Button, EditText, TextView, ImageView, etc.) under the operating system of the electronic device. The operating system in this embodiment is Android (Android is also taken as the example hereinafter); in other embodiments, the operating system may also be HarmonyOS or iOS, which the invention does not limit.
The identification information is generated based on a first identification or a second identification, the first identification is determined according to attribute information of the first control, and the second identification is determined according to view layout information of the first control. Each first control has unique identification information.
The generalization information refers to operation text, related to the first control, that may appear in the user's voice instruction. Illustratively, for a first control such as a Button, the generalization information can be "tap", "click", "select", and the like.
More specifically, the first identifier may be any item of attribute information used to describe and identify the first control, such as Title, Description, resource ID (ResourceID), Text, Visibility, Enabled state (Enabled), Background, etc.; one first control may correspond to a plurality of first identifiers.
The second identifier is determined according to the resource ID of the first control and the root element information, child element information and position index information of the first control in the view layout information.
The view layout information describes the hierarchical relationship and position of the various views (View), and generally includes root elements, child elements, attributes, the layout manager, resource references (applying resource values to certain attributes in the layout via resource IDs), position indexes (index), etc.
Among them, common root elements include LinearLayout, RelativeLayout, ConstraintLayout, FrameLayout, and the like.
A root element may contain multiple child elements; common child elements include TextView, ImageView, Button, etc., which represent different control types. These child elements may nest within each other, forming a layout hierarchy.
The position index is the index of the control's position among its siblings in the layout hierarchy.
As shown in fig. 2, the resource ID of the first control and its root element information, child element information and position index information in the view layout information are spliced into a string, and the spliced string is converted to MD5 to obtain the final second identifier. Taking the view layout information in fig. 2 as an example, the root element information is FrameLayout: the root element FrameLayout@id:root is the outermost view, and its child elements include FrameLayout@id:group and ImageView@id:image, where ImageView@id:image identifies a first control of the image view type. That first control's identifier, i.e. the second identifier (staticId), is obtained by splicing the root element information FrameLayout@id:root with the child element ImageView@id:image.
FrameLayout@id:group refers to a direct child view of the FrameLayout; it contains two child elements, both of which are first controls, namely Buttons. When the first controls among same-level child elements are of the same type and number two or more, the position index must be introduced on top of the root element and child elements in order to distinguish each first control uniquely. Specifically, the second identifier of one Button is FrameLayout$group$Button$1, where the number 1 is that Button's position index.
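As a concrete illustration of this splice-then-hash scheme, the following Kotlin sketch builds a second identifier from a path of view-layout nodes; the ViewNode shape, the "$" separator and the function names are illustrative assumptions, not the patent's actual classes. Only the overall scheme (splice the path, then convert to MD5) follows the text above.

```kotlin
import java.security.MessageDigest

// Illustrative node of the view layout path (assumed shape).
data class ViewNode(
    val element: String,     // root or child element, e.g. "FrameLayout", "Button"
    val resourceId: String?, // e.g. "group"; may be missing
    val index: Int           // position index among same-level siblings
)

fun secondIdentifier(pathFromRoot: List<ViewNode>): String {
    // Splice root element, child elements and, where needed to distinguish
    // same-type same-level siblings, the position index,
    // e.g. "FrameLayout$group$Button$1".
    val spliced = pathFromRoot.joinToString("$") { node ->
        (node.resourceId ?: node.element) +
            if (node.index > 0) "$" + node.index else ""
    }
    // Convert the spliced string to MD5 to obtain the final second identifier.
    val md5 = MessageDigest.getInstance("MD5").digest(spliced.toByteArray())
    return md5.joinToString("") { "%02x".format(it) }
}

// e.g. secondIdentifier(listOf(ViewNode("FrameLayout", null, 0),
//     ViewNode("FrameLayout", "group", 0), ViewNode("Button", null, 1)))
// splices "FrameLayout$group$Button$1" before hashing.
```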
S102, acquiring a voice instruction of a user.
In this step, the voice input by the user is recorded and received, and the voice is converted into text form to form a voice instruction. The voice instruction corresponds to content on the current interface; that is, what is visible can be spoken.
S103, matching the voice instruction against the generalization information to obtain target generalization information, and determining a target control based on the identification information corresponding to the target generalization information.
In this step, intent classification is performed on the voice instruction in text form to identify the user's intent or purpose, which may be implemented with a machine learning algorithm. After the intent is obtained, it is matched against the generalization information of all first controls on the current interface to determine the target generalization information; then, using the previously stored first control - identification information - generalization information correspondence, the identification information determines which control on the current interface needs to be operated. The generalization information includes operation behaviors related to the control, common expressions used when the control is mentioned in a voice instruction, and the like.
Illustratively, if the voice instruction is "open music", the intent is to tap the music application; the generalization information "tap music" is matched, the identification information is determined from the matched generalization information, and the control of the music application is thereby determined. A minimal sketch of this matching step is given below.
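In the sketch, the ControlEntry shape and the substring test are simplifying assumptions; as noted above, a real system may use machine-learning intent classification instead. It only shows how target generalization information leads to a target control via identification information.

```kotlin
// Assumed stored shape: identification information plus generalization texts.
data class ControlEntry(
    val identification: String,       // unique identification information
    val generalizations: Set<String>  // e.g. "tap music", "open music"
)

fun matchTargetControl(instruction: String, entries: List<ControlEntry>): ControlEntry? =
    // The target generalization information is the entry whose generalization
    // text the instruction covers; its identification information then
    // determines the target control.
    entries.firstOrNull { entry ->
        entry.generalizations.any { g -> instruction.contains(g) }
    }
```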
S104, executing control of the target control in response to the voice instruction.
In this step, the relevant operation is performed on the target control according to the intent of the voice instruction. For example, if the voice instruction is "play the song", the play button is matched on the current interface, and control is achieved by clicking the play button.
It should be noted that, the man-machine interaction method provided by the invention is applied to a scene including a single electronic device, and also can be applied to a scene including at least two electronic devices, wherein the electronic devices can be electronic devices such as a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality/virtual reality device and the like.
According to the man-machine interaction method provided by the embodiment of the invention, the target generalization information is obtained by matching the voice instruction against the generalization information, and the target control is determined based on the identification information corresponding to the target generalization information. Because every first control has unique identification information, the target control can be matched by its identification information even when the first control's attribute information is missing or duplicated, improving the accuracy of control matching and the interaction experience. In addition, because the identification information is generated from the first control's native attribute information or view layout information, no third-party adaptation is required, so all native controls can be matched correctly while the third-party adaptation process is saved.
Before the man-machine interaction application on the electronic device runs, the first control - identification information - generalization information correspondence is configured and stored in advance. In some embodiments of the present invention, the man-machine interaction method further includes the specific configuration process:
s201, operation interfaces of application programs on the electronic equipment in different states and a plurality of first controls on each operation interface are obtained.
The different application programs each have a plurality of running interfaces, and each running interface has a plurality of first controls. The application programs here are applications that can be controlled by the man-machine interaction application (i.e., the voice assistant), such as music, navigation, a video player, and the like. All first controls of each application program are acquired and stored separately per application program, i.e., one application program corresponds to one information storage file for its first controls.
S202, generating the second identifier according to the resource ID of the first control and the root element information, child element information and position index information of the first control in the view layout information, and acquiring all first identifiers of the first control. Every first control has a second identifier, and the second identifier is neither duplicated nor missing.
In this embodiment, the first identifier of the first control includes a Title (Title), a Description (Description), and a resource ID (ResourceID).
S203, if the first identifier is duplicated or missing, using the second identifier as the identification information; if the first identifier is neither duplicated nor missing, using the first identifier as the identification information; and storing the determined identification information in correspondence with the first control.
Duplication of the first identifier refers to the situation where the title, description and resource ID of a first control under the current application program duplicate the titles, descriptions or resource IDs of all the remaining first controls. In this case, the first identifier cannot guarantee the uniqueness of the first control, so first controls cannot be matched accurately.
Absence of the first identifier means that a first control has no title, description or resource ID; in this case, the first control cannot be matched at all.
Therefore, in both cases the second identifier is used as the identification information, ensuring the uniqueness of the first control's identification information.
When the first identifier is neither duplicated nor missing, whether each item of attribute information is duplicated is judged in order of the attribute information's priority: the current item is used as the identification information if it is not duplicated, and the next item is judged in priority order if it is.
Illustratively, the titles of the first controls are compared first; if no title is duplicated, the title is used as the first control's identification information, and if titles are duplicated, whether the resource IDs are duplicated is judged next. If no resource ID is duplicated, the resource ID is used as the identification information; if resource IDs are duplicated, whether the descriptions are duplicated is judged next. If no description is duplicated, the description is used as the identification information; if descriptions are duplicated, the second identifier is used as the identification information. In embodiments of the invention, the duplication judgment may also follow a different priority order of the attribute information to determine the first control's identification information.
It should be noted that the duplication judgment of a first control's first identifier is performed among all first controls of the current application program, not among the first controls of all application programs in the operating system mixed together. Judging duplication per application program improves the efficiency of generating identification information while still guaranteeing its uniqueness, as the sketch after this paragraph shows.
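The priority fallback described above can be sketched as follows; the Control shape, field names and uniqueness check are illustrative assumptions, and the title, then resource ID, then description, then second identifier order follows the example in the text.

```kotlin
// Assumed shape of a first control's candidate identifiers.
data class Control(
    val title: String?,
    val resourceId: String?,
    val description: String?,
    val secondIdentifier: String      // never missing, never duplicated
)

fun identificationInfo(control: Control, allInApp: List<Control>): String {
    // An attribute may serve as identification information only if it is
    // present and unique among all first controls of the same application
    // program (allInApp includes the control itself).
    fun unique(value: String?, of: (Control) -> String?): Boolean =
        value != null && allInApp.count { of(it) == value } == 1

    return when {
        unique(control.title) { it.title } -> control.title!!
        unique(control.resourceId) { it.resourceId } -> control.resourceId!!
        unique(control.description) { it.description } -> control.description!!
        else -> control.secondIdentifier  // fall back to the second identifier
    }
}
```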
S204, after the unique identification information of the first control is obtained, configuring the generalization information of the first control and storing the generalization information in correspondence with the identification information, so as to form stored information of first control - identification information - generalization information; the stored information is used to complete the matching of the target control during man-machine interaction.
Specifically, as shown in fig. 3, the generalization information configuration interface includes an operation interface area and a configuration information area.
The configuration information area includes page information, control information and generalization information. The page information includes the application program corresponding to the running interface, the version of the application program, the name of the running interface, and similar information. The control information is the information of the control to be configured as selected in the running interface, including the identification information, the control's ID, the control's bounding box information (describing the control's position and size), and the like. The generalization information is the content to be matched once the control to be configured has been determined from the page information and control information, and specifically includes the generalization name, generalization actions, interaction information (i.e., the text replied to the user), and the like.
Illustratively, if the control to be configured is an account (whose identification information is a title), the corresponding generalization name may be "my account", the generalization actions may be "switch", "open", "log in", etc., and the interaction information may be "OK", "logged in", etc.
In addition, to help the person configuring the generalization information determine which controls on the running interface need configuration, the controls to be configured may be highlighted on the configuration interface according to their bounding box information (for example, the controls' bounding boxes are displayed in a highlighted manner). A style change may also occur when a control to be configured is selected (e.g., its bounding box changes from color A to color B), showing which control in the running interface is currently being configured. This improves the efficiency of generalization information configuration.
The stored information of first control - identification information - generalization information for each application program is obtained through steps S201-S204. During actual man-machine interaction, the focus window is scanned to determine the current first controls, and the corresponding identification information and generalization information are screened out of the stored information. Illustratively, if application program C has n pieces of stored first control - identification information - generalization information, the identification information and generalization information corresponding to the m (m < n) first controls on the current interface are screened out of the n pieces.
Then, the screened identification information and generalization information are matched according to the voice instruction to obtain the target control. A minimal sketch of this screening step follows.
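In the sketch, the identifierOf accessor and the scanned identifier set are assumptions about how the focus-window scan reports on-screen controls.

```kotlin
// Screen the n stored entries of one application program down to the m
// controls actually present on the current interface.
fun <T> screenForCurrentInterface(
    stored: List<T>,               // n stored entries of the application program
    identifierOf: (T) -> String,   // identification information of an entry
    onScreen: Set<String>          // identifiers found by the focus-window scan
): List<T> = stored.filter { identifierOf(it) in onScreen }  // yields m (m < n) entries
```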
In addition, in the configuration process above, the generalization information is configured for the first control according to the first control's type (the control type is displayed in the control information). The types of the first control include the button type, text type, switch type, tab type, progress bar type and scroll type, i.e., the types of all native controls. Each control type corresponds to common basic operation behaviors, generalization names, and so on, which improves the efficiency of configuring generalization information.
Further, feedback information and corresponding interaction information are configured for the first control according to the type of the first control.
After the target control is operated in response to the voice instruction, the operating system acquires the feedback information of the target control, determines the related interaction information from the feedback information, and broadcasts the interaction information by voice to complete the response. Schematically, as shown in fig. 4, if the target control is a switch, then after the target control executes the operation, whether the switching is completed is judged from the control's checked state: if the switching is completed, interaction information such as "switched" or "switching completed" is replied; if not, interaction information such as "switching failed" is replied, so that the user clearly knows the result of the voice instruction, improving the interaction experience.
If no per-type configuration is done, uniform interaction information such as "success" or "done" is generally replied, and the user cannot tell whether the desired control was actually completed.
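As a sketch of such type-aware feedback for the switch type; the reply strings and the use of the checked state to judge completion are assumptions for illustration.

```kotlin
import android.widget.Switch

// After execution, the checked state decides which configured interaction
// text is spoken back, so the user learns the real outcome.
fun switchInteractionReply(target: Switch, wantedOn: Boolean): String =
    if (target.isChecked == wantedOn) "Switching completed"  // operation took effect
    else "Switching failed"                                  // operation did not take effect
```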
In some embodiments of the present invention, when the type of the target control is a scroll type, the step of performing control of the target control includes:
determining the first sliding distance of the target control as the product of the target control's display height and a preset sliding percentage, and controlling the target control to slide by the first sliding distance.
That is, the display height of the control is calculated from the control's bounding box information and multiplied by the set sliding percentage to obtain the first sliding distance; sliding is then performed by the first sliding distance through smoothScrollBy(). The sliding percentage may be 20%, 50%, 80%, 100%, etc., or finer values such as 1%, 2%, ..., 100%. The sliding percentage matching the user's intent is determined among the candidate percentages according to the voice instruction, and the sliding distance is calculated from the determined percentage.
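A minimal Kotlin sketch of the scroll-type path, assuming a vertical ScrollView target and a slide percentage already resolved from the voice instruction:

```kotlin
import android.widget.ScrollView

fun scrollByPercent(target: ScrollView, slidePercent: Int) {
    // First sliding distance = display height of the control x slide percentage.
    val firstSlidingDistance = target.height * slidePercent / 100
    target.smoothScrollBy(0, firstSlidingDistance)  // smooth vertical slide
}
```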
And when the type of the target control is a progress bar type, the step of executing the control of the target control comprises the following steps:
and calculating to obtain a second sliding distance according to the maximum progress and the minimum progress of the target control and a preset sliding percentage, and controlling the target control to slide according to the second sliding distance.
Schematically, if the progress-bar-type target control is a SeekBar, the progress to be set (i.e., the second sliding distance) is calculated from the maximum progress, the minimum progress and the specified sliding percentage, and the SeekBar's progress is then adjusted with setProgress().
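A corresponding sketch for the progress-bar type, assuming a SeekBar target (note that SeekBar.min requires API 26 and setProgress(int, boolean) requires API 24):

```kotlin
import android.widget.SeekBar

fun setProgressByPercent(target: SeekBar, slidePercent: Int) {
    // Second sliding distance from the maximum progress, the minimum
    // progress and the specified slide percentage.
    val secondSlidingDistance =
        target.min + (target.max - target.min) * slidePercent / 100
    target.setProgress(secondSlidingDistance, true)  // animate to the new progress
}
```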
Compared with the existing sliding mode whose sliding percentage is only 100%, this sliding mode is more flexible, offers the user more interaction choices, and improves the interaction experience.
The invention further provides an embodiment of the man-machine interaction method, shown in fig. 5, which specifically comprises the following steps:
s301, starting the electronic equipment, namely starting an operating system.
S302, the voice assistant (i.e., the man-machine interaction application) and the other applications in the operating system are started and all bind to the visible-and-speakable service.
S303, the voice assistant acquires the user's voice instruction for the first time and turns on the visible-and-speakable service.
S304, the visible-and-speakable service notifies each application that the visible-and-speakable function has been turned on; each application registers a window callback with the visible-and-speakable service and, at the same time, reports to the service whether it is the focus window.
S305, the visible-and-speakable service determines the target application corresponding to the focus window and collects the target application's interface information, which includes all the control information on the electronic device's current interface, i.e., the full control information. At the same time, the visible-and-speakable service sends the full control information to the voice assistant.
S306, the voice assistant matches the first controls according to the full control information and the voice instruction to obtain the target control, generates an execution instruction for the target control, and sends the execution instruction to the visible-and-speakable service, which forwards it to the corresponding target application.
S307, after the target control in the target application finishes executing the execution instruction, the target application scans the view of the current interface through the focus window and obtains the post-execution control information of the target control as incremental control information. The target application sends the incremental control information to the visible-and-speakable service, which sends it to the voice assistant; the voice assistant processes the received incremental control information to obtain voice feedback information and provides it to the user.
S308, after the voice assistant acquires a new voice instruction, S304-S307 are repeated.
S309, after the voice assistant acquires a voice instruction indicating the end of the dialogue, the voice assistant closes the visible-and-speakable service; the service notifies each application that it has been closed, each application unregisters its window callback from the service, and the state information is initialized.
According to the man-machine interaction method provided by this embodiment, the visible-and-speakable service is started when the electronic device boots, and the voice assistant and the other applications bind to it when they start. When the user wakes the voice assistant, the visible-and-speakable function switch is turned on; the service actively notifies the focus window to scan its views and collect control information, which is transmitted to the voice assistant in full. Afterwards, whenever the state of a target control in the focus window changes (switching, page sliding, button clicking, etc.), the focus window automatically collects the target control's information and transmits it to the visible-and-speakable service incrementally; the service processes the data and passes it to the voice assistant incrementally, until the dialogue between the voice assistant and the user ends and the visible-and-speakable function is turned off. The incremental mode improves response speed and reduces communication overhead, and the design ensures the visible-and-speakable function runs only while the user is interacting, enabling the service on demand.
In some embodiments of the present invention, when the target control is operated according to the voice instruction, the operation is differentiated according to whether the target control listens for click events. Specifically, when the target control listens for click events, the target control is clicked in click-event mode in response to the voice instruction; when the target control does not listen for click events, the target control is touched in touch-event mode.
Illustratively, taking the current interface shown in fig. 6 as an example, there are six first controls on the current interface: button 1, button 2, picture 1, picture 2, picture 3 and text 1. Click listeners are set on picture 1, picture 2 and picture 3; when picture 1, picture 2 or picture 3 is the target control, a simulated click on the target control can be performed directly through performClick() of the Android View class. Button 1, button 2 and text 1 have no click listeners, so a touch is simulated through MotionEvent. For example, for text 1, the view area in which text 1 lies (i.e., the media card in fig. 6) can be simulated-clicked through MotionEvent: a click listener is set on picture 1 within that view area, and text 1 is the text description of picture 1. The view area may also contain other first controls that listen for click events and have corresponding relations with the target control, for example where the target control is the text description or picture indication of those controls.
Simulated clicking in this way improves the hit rate of the target control and the accuracy of its execution.
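A minimal sketch of the two execution paths described above; dispatching the MotionEvent pair directly at the target view's center (rather than at the enclosing view area, as in the text 1 example) is a simplifying assumption.

```kotlin
import android.os.SystemClock
import android.view.MotionEvent
import android.view.View

fun executeClick(target: View) {
    if (target.hasOnClickListeners()) {
        target.performClick()            // click-event mode: a listener is set
        return
    }
    // Touch-event mode: simulate a down/up pair at the view's center
    // (coordinates in the view's own frame).
    val x = target.width / 2f
    val y = target.height / 2f
    val now = SystemClock.uptimeMillis()
    val down = MotionEvent.obtain(now, now, MotionEvent.ACTION_DOWN, x, y, 0)
    val up = MotionEvent.obtain(now, now, MotionEvent.ACTION_UP, x, y, 0)
    target.dispatchTouchEvent(down)
    target.dispatchTouchEvent(up)
    down.recycle()
    up.recycle()
}
```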
In other embodiments of the present invention, the man-machine interaction method also supports cross-device interaction; a schematic diagram of cross-device man-machine interaction is shown in fig. 7. The specific cross-device man-machine interaction method includes: first, establishing a direct communication connection between the first electronic device and the second electronic device.
Furthermore, while a man-machine interaction application (i.e., a voice assistant) runs on the first electronic device or the second electronic device, the man-machine interaction application obtains at least one first control and the corresponding identification information and generalization information on the current interface of the first electronic device, and obtains at least one first control and the corresponding identification information and generalization information on the current interface of the second electronic device.
The man-machine interaction application can obtain all first controls and their corresponding identification information and generalization information on the first electronic device's current interface through the visible-and-speakable service in the first electronic device, and all first controls and their corresponding identification information and generalization information on the second electronic device's current interface through the visible-and-speakable service in the second electronic device.
Correspondingly, the step of matching the target generalization information from the generalization information according to the voice command and determining the target control based on the identification information corresponding to the target generalization information comprises the following steps:
the man-machine interaction application determines the target generalization information from the generalization information of the first electronic device and the generalization information of the second electronic device according to the voice instruction, and determines the target electronic device and the target control based on the identification information corresponding to the target generalization information.
The step of responding to the voice instruction and executing the control of the target control comprises the following steps:
and responding to the voice instruction, and executing control on the target electronic device.
It should be noted that the first electronic device and the second electronic device may be of the same type or of different types; for example, the first electronic device is a vehicle-mounted electronic device (specifically, an electronic device that interacts with the in-vehicle system), and the second electronic device is a smartphone; or the first electronic device is a wearable device and the second electronic device is a smartphone, and so on in different combinations.
In other embodiments of the present invention, the man-machine interaction method is implemented based on the architecture shown in fig. 9. With the architecture of fig. 9, a focus window can be determined from the plurality of application programs of the electronic device, and the control information of the first controls on the focus window can be acquired.
Specifically, the man-machine interaction application (i.e., the voice assistant) acquires a voice instruction input by the user and notifies the visible-and-speakable service function to start.
After the visible-and-speakable service starts, it obtains from LayerService in the Android system the window or view scanned by the focus window for user input events. The visible-and-speakable service determines the focus window among the windows of the multiple application programs; once the focus window is obtained, the corresponding VuiWindowCallback (i.e., a window callback; each window of each application program has one VuiWindowCallback) can communicate with the visible-and-speakable service.
Furthermore, the control information of every first control on the focus window (including each View element) is obtained. Specifically, the View elements in the current window may be collected with the VuiWindowController, a module newly injected into ViewRootImpl (which is responsible for managing view drawing, event processing and communication with the window system); each window has one VuiWindowController.
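As an illustration of what such a per-window collector might gather, the following Kotlin sketch walks a view hierarchy depth-first and records basic control information; the ControlInfo shape is an assumption for illustration and is not the patent's ViewProperty class.

```kotlin
import android.graphics.Rect
import android.view.View
import android.view.ViewGroup

// Assumed per-control record: type, resource ID, bounding box, visibility.
data class ControlInfo(
    val className: String,
    val resourceId: Int,
    val bounds: Rect,      // bounding box: position and size on screen
    val visible: Boolean
)

fun collectControls(root: View, out: MutableList<ControlInfo> = mutableListOf()): List<ControlInfo> {
    val bounds = Rect()
    root.getGlobalVisibleRect(bounds)
    out += ControlInfo(
        className = root.javaClass.simpleName,
        resourceId = root.id,
        bounds = bounds,
        visible = root.visibility == View.VISIBLE
    )
    if (root is ViewGroup) {
        for (i in 0 until root.childCount) {
            collectControls(root.getChildAt(i), out)  // depth-first over the hierarchy
        }
    }
    return out
}
```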
After the man-machine interaction application obtains the control information from the visible-and-speakable service, the identification information and generalization information for the current interface are screened out of the first control - identification information - generalization information storage table, and the target control is then obtained by matching the identification information and generalization information against the voice instruction.
After the target control is determined, the man-machine interaction application sends an operation instruction for the target control to the visible-and-speakable service; the visible-and-speakable service executes the control operation on the target control through the VuiWindowController and, after execution is completed, sends feedback information and text information to the man-machine interaction application, completing the response to the user.
In addition, after each application program starts, it establishes communication with the visible-and-speakable service through the VuiConnectionManager, which also listens for the starting and stopping of the visible-and-speakable service; each VuiWindowCallback is registered with the visible-and-speakable service when the application program starts and unregistered when the application program closes.
It should be noted that fig. 9 shows man-machine interaction implemented with the classes of the Android system; in other embodiments, man-machine interaction may be implemented based on other operating systems, which the invention does not limit.
In addition, the visible-and-speakable service can be communicatively connected to other electronic devices, realizing cross-device man-machine interaction.
According to the man-machine interaction method provided by this embodiment, under the Android system the VuiWindowController module is injected into ViewRootImpl to ensure that the visible-and-speakable service can dynamically acquire view control information at runtime. At the same time, the field information of the accessibility service is parsed, keeping compatibility with Android's accessibility (barrier-free) mode (AccessibilityService). The information of each control is packaged into a ViewProperty (a class used to represent view attributes), inoperable ViewProperty instances are filtered out, and the ViewProperty data is then passed uniformly to the visible-and-speakable service.
In other embodiments of the present invention, the current interface further includes second controls (i.e., cross-platform or non-native controls under the current operating system). As shown in fig. 8, target-control matching for second controls is implemented as follows:
acquiring, through a preset software development kit (SDK), at least one second control on the current interface and the identity information and generalization information corresponding to the second control.
Specifically, the SDK provided by the visible-and-speakable service is used to obtain the current interface information, i.e., the second controls on the current interface and the identity information and generalization information corresponding to them. The SDK sends the second controls with their corresponding identity information and generalization information to the visible-and-speakable service and the man-machine interaction application for processing, so that the target control is obtained by matching. Here, second control - identity information - generalization information are stored in correspondence.
It should be noted that the identity information of a second control differs from the identification information of a first control: the identification information of a first control is determined directly from the first control's attribute information and view layout information and can be generated without relying on a third party, whereas the identity information of a second control must be filled in and uploaded by the application side, adding an adaptation step; it cannot be generated directly, and is not generated from the second control's attribute information and view layout information.
The step of matching the generalization information according to the voice command to obtain target generalization information and determining a target control based on the identification information corresponding to the target generalization information comprises the following steps:
and matching the generalization information corresponding to the first control with the generalization information corresponding to the second control according to the voice command to obtain target generalization information, and determining a target control based on the target generalization information, the identity information and the identification information.
That is, after obtaining the identification information and generalization information of the first control as well as the identity information and generalization information of the second control, the visible-and-speakable service determines the target generalization information according to the voice instruction, and then determines the target control according to the correspondence of generalization information - identification information - first control, or generalization information - identity information - second control.
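A minimal Kotlin sketch of this unified matching step follows, assuming a simple exact-phrase index; the patent does not disclose the actual matching algorithm, and all names here are illustrative.

```kotlin
// The service keys first controls by identification information and second
// controls by identity information, merges their generalization phrases
// into one index, and resolves a voice command to a single target key.
sealed interface TargetKey
data class ByIdentification(val value: String) : TargetKey // first control
data class ByIdentity(val value: String) : TargetKey       // second control

class ControlMatcher(
    firstControls: Map<String, List<String>>,  // identification -> phrases
    secondControls: Map<String, List<String>>  // identity -> phrases
) {
    private val index: Map<String, TargetKey> = buildMap {
        firstControls.forEach { (id, phrases) ->
            phrases.forEach { put(it.lowercase(), ByIdentification(id)) }
        }
        secondControls.forEach { (id, phrases) ->
            phrases.forEach { put(it.lowercase(), ByIdentity(id)) }
        }
    }

    // Returns the key of the matched target control, or null if the
    // command matches no generalization phrase.
    fun match(voiceCommand: String): TargetKey? =
        index[voiceCommand.trim().lowercase()]
}
```

The key type records which correspondence chain applies, so the service knows whether to resolve the target through identification information (first control) or identity information (second control).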
Based on any one of the above embodiments, another embodiment of the present invention further provides a man-machine interaction system, which is included in an electronic device and which comprises:
the control information acquisition module is used for acquiring at least one first control on a current interface and acquiring identification information corresponding to the at least one first control and generalization information corresponding to the identification information in the process of running the man-machine interaction application in the electronic equipment.
The identification information is generated based on a first identification or a second identification, the first identification is determined according to attribute information of the first control, and the second identification is determined according to view layout information of the first control.
And the voice instruction acquisition module is used for acquiring the voice instruction of the user.
And the control matching module is used for matching the voice command against the generalization information to obtain target generalization information, and determining a target control based on the identification information corresponding to the target generalization information.
And the control execution module is used for responding to the voice instruction and executing control of the target control.
The man-machine interaction system corresponds to the man-machine interaction method, and is not described in detail herein.
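To make the division of labor concrete, here is a minimal Kotlin sketch of the four modules as interfaces; the patent names the modules only, so every signature below is an assumption.

```kotlin
// Hypothetical interfaces for the four modules of the interaction system.
interface ControlInformationAcquisitionModule {
    // identification information -> generalization phrases, for the
    // current interface of the running man-machine interaction application
    fun acquire(): Map<String, List<String>>
}

interface VoiceInstructionAcquisitionModule {
    fun acquire(): String   // the user's voice instruction, as text
}

interface ControlMatchingModule {
    // returns the identification information of the target control, or
    // null when no generalization information matches
    fun match(instruction: String, info: Map<String, List<String>>): String?
}

interface ControlExecutionModule {
    fun execute(targetIdentification: String)
}
```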
Fig. 10 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device may include: a processor 1010, a communication interface 1020, a memory 1030 and a communication bus 1040, where the processor 1010, the communication interface 1020 and the memory 1030 communicate with each other via the communication bus 1040. The processor 1010 may invoke logic instructions in the memory 1030 to perform the man-machine interaction method.
Further, the logic instructions in the memory 1030 may be implemented in the form of software functional units and, when sold or used as a standalone product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied, essentially or in the part contributing to the prior art, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present application also provides a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the man-machine interaction method provided by the methods described above.
In yet another aspect, the present application further provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the man-machine interaction method provided above.
For the electronic device, the computer program product and the storage medium provided in the embodiments of the present application, the computer program stored on the storage medium enables a processor to implement all the method steps of the method embodiments and to achieve the same technical effects; detailed descriptions of the parts and beneficial effects that are the same as those of the method embodiments are omitted herein.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solution may be embodied, essentially or in the part contributing to the prior art, in the form of a software product, which may be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical schemes described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (18)

1. A human-machine interaction method, wherein the method is applied to an electronic device, and the method comprises:
in the process of running a man-machine interaction application in the electronic equipment, at least one first control on a current interface is obtained, identification information corresponding to the at least one first control and generalization information corresponding to the identification information are obtained, the identification information is generated based on a first identification or a second identification, the first identification is determined according to attribute information of the first control, and the second identification is determined according to view layout information of the first control;
acquiring a voice instruction of a user;
matching the voice command against the generalization information to obtain target generalization information, and determining a target control based on the identification information corresponding to the target generalization information;
and responding to the voice instruction, and executing control on the target control.
2. The human-machine interaction method of claim 1, wherein the second identification is determined according to a resource ID of the first control and root element information, child element information and position index information of the first control in the view layout information.
3. The human-machine interaction method of claim 1, wherein the first identification comprises one or more items of attribute information among title information, description information and the resource ID of a control.
4. The human-computer interaction method according to claim 2, wherein the identification information is generated by:
acquiring operation interfaces of the application program in different states and a plurality of first controls on the operation interfaces;
when the first identifications of the plurality of first controls are duplicated or missing, the second identifications are used as identification information;
taking the first identifications as identification information in the case that the first identifications of the plurality of first controls are neither duplicated nor missing;
and storing the identification information and the first control correspondingly.
5. The human-computer interaction method of claim 4, wherein the first identifications being duplicated means that all of the attribute information is repeated;
the step of taking the first identifications as identification information in the case that the first identifications of the plurality of first controls are neither duplicated nor missing comprises:
and sequentially judging whether the attribute information is repeated according to the priority of the attribute information, and taking the current attribute information as identification information under the condition that the current attribute information is not repeated.
6. The human-machine interaction method of claim 4, wherein after acquiring the running interface of the application program in different states and the plurality of first controls on the running interface, the method further comprises:
And configuring the generalization information for the first control, and correspondingly storing the configured generalization information and the identification information of the first control.
7. The method of human-machine interaction of claim 6, wherein the step of configuring generalization information for the first control comprises:
and configuring generalization information for the first control according to the type of the first control, wherein the type of the first control comprises a button type, a text type, a switch type, a tab type, a progress bar type and a scrolling type.
8. The human-machine interaction method of claim 7, wherein after the step of configuring generalization information for the first control according to a type of the first control, the method further comprises:
configuring feedback information and corresponding interaction information for the first control according to the type of the first control;
after the step of performing control of the target control in response to the voice instruction, the method further comprises:
and acquiring the feedback information and corresponding interaction information of the target control, and responding to the voice instruction with the corresponding interaction information.
9. The human-computer interaction method of claim 7, wherein the step of performing control of the target control when the type of the target control is a scroll type comprises:
And determining a first sliding distance of the target control according to the product of the display height of the target control and a preset sliding percentage, and controlling the target control to slide according to the first sliding distance.
10. The human-computer interaction method of claim 7, wherein the step of performing control of the target control when the type of the target control is a progress bar type comprises:
and calculating to obtain a second sliding distance according to the maximum progress and the minimum progress of the target control and a preset sliding percentage, and controlling the target control to slide according to the second sliding distance.
11. The human-computer interaction method of claim 1, wherein the step of obtaining at least one first control on the current interface comprises:
scanning a current interface through a focus window, and acquiring full control information of the current interface;
and after the target control executes the operation, scanning the current interface through the focus window to obtain the incremental control information corresponding to the target control.
12. The human-machine interaction method of claim 1, wherein the step of performing control of the target control in response to the voice command comprises:
When the target control is listening for a click event, responding to the voice instruction by clicking the target control in the form of a click event;
and when the target control is not listening for a click event, touching the target control in the form of a touch event.
13. The human-machine interaction method of claim 1, wherein the method further comprises:
in the process of running a man-machine interaction application in first electronic equipment or second electronic equipment, the man-machine interaction application acquires at least one first control, corresponding identification information and generalization information on a current interface in the first electronic equipment, and acquires at least one first control, corresponding identification information and generalization information on the current interface in the second electronic equipment, wherein the first electronic equipment is in communication connection with the second electronic equipment;
the step of matching the generalization information according to the voice command to obtain target generalization information and determining a target control based on the identification information corresponding to the target generalization information comprises the following steps:
the man-machine interaction application determines target generalization information from the generalization information of the first electronic device and the generalization information of the second electronic device according to the voice command, and determines a target electronic device and a target control based on the identification information corresponding to the target generalization information;
The step of responding to the voice instruction and executing the control of the target control comprises the following steps:
and in response to the voice instruction, executing control of the target control on the target electronic device.
14. The human-machine interaction method of claim 1, wherein the method further comprises:
acquiring, through a preset software development kit, at least one second control on a current interface together with identity information and generalization information corresponding to the second control;
the step of matching the generalization information according to the voice command to obtain target generalization information and determining a target control based on the identification information corresponding to the target generalization information comprises the following steps:
and matching the voice command against the generalization information corresponding to the first control and the generalization information corresponding to the second control to obtain target generalization information, and determining the target control based on the target generalization information, the identity information and the identification information.
15. The human-machine interaction method of claim 11, wherein the step of scanning the current interface through the focus window comprises:
and determining a focus window from a plurality of application programs of the electronic equipment, and acquiring control information of a first control on the focus window.
16. A human-machine interaction system, wherein an electronic device comprises the human-machine interaction system, the system comprising:
the control information acquisition module is used for acquiring at least one first control on a current interface and acquiring identification information corresponding to the at least one first control and generalization information corresponding to the identification information in the process of running a man-machine interaction application in the electronic equipment, wherein the identification information is generated based on a first identification or a second identification, the first identification is determined according to attribute information of the first control, and the second identification is determined according to view layout information of the first control;
the voice instruction acquisition module is used for acquiring a voice instruction of a user;
the control matching module is used for matching the voice command against the generalization information to obtain target generalization information, and determining a target control based on identification information corresponding to the target generalization information;
and the control execution module is used for responding to the voice instruction and executing the control on the target control.
17. An electronic device, comprising:
one or more processors;
one or more memories;
a module in which a plurality of application programs are installed;
The memory stores one or more application programs that, when executed by the processor, cause the electronic device to perform the steps of the human-machine interaction method of any of claims 1-15.
18. A storage medium having stored therein a plurality of instructions adapted to be loaded by a processor to perform the steps of the human interaction method of any of claims 1 to 15.
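For illustration, the following Kotlin sketch works through the computations of claims 9 and 10 and the event dispatch of claim 12; the function names and the center-point touch synthesis are assumptions, as the claims state only the arithmetic and the listener-based fallback.

```kotlin
import android.os.SystemClock
import android.view.MotionEvent
import android.view.View

// Claim 9: first sliding distance for a scroll-type control is the
// product of the control's display height and a preset sliding percentage.
fun firstSlidingDistance(displayHeightPx: Int, slidePercent: Float): Float =
    displayHeightPx * slidePercent

// Claim 10: second sliding distance for a progress-bar-type control is
// computed from the maximum and minimum progress and the percentage.
fun secondSlidingDistance(maxProgress: Int, minProgress: Int,
                          slidePercent: Float): Float =
    (maxProgress - minProgress) * slidePercent

// Claim 12: click the control if it listens for click events; otherwise
// fall back to synthesizing a touch event at the control's center.
fun actuate(target: View) {
    if (target.hasOnClickListeners()) {
        target.performClick()
    } else {
        val x = target.width / 2f
        val y = target.height / 2f
        val now = SystemClock.uptimeMillis()
        val down = MotionEvent.obtain(now, now, MotionEvent.ACTION_DOWN, x, y, 0)
        val up = MotionEvent.obtain(now, now + 50, MotionEvent.ACTION_UP, x, y, 0)
        target.dispatchTouchEvent(down)
        target.dispatchTouchEvent(up)
        down.recycle()
        up.recycle()
    }
}
```

For example, under these assumptions a scroll-type control 600 px tall with a preset sliding percentage of 0.5 would be scrolled by 300 px.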
CN202311769187.7A 2023-12-20 2023-12-20 Man-machine interaction method, system, electronic equipment and storage medium Pending CN117746859A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311769187.7A CN117746859A (en) 2023-12-20 2023-12-20 Man-machine interaction method, system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311769187.7A CN117746859A (en) 2023-12-20 2023-12-20 Man-machine interaction method, system, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117746859A true CN117746859A (en) 2024-03-22

Family

ID=90277245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311769187.7A Pending CN117746859A (en) 2023-12-20 2023-12-20 Man-machine interaction method, system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117746859A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination