WO2022052776A1 - Human-computer interaction method, and electronic device and system


Info

Publication number: WO2022052776A1
Authority: WO (WIPO, PCT)
Prior art keywords: controls, interface, electronic device, user, voice
Application number: PCT/CN2021/113542
Other languages: French (fr), Chinese (zh)
Inventors: 祝振凱, 张乐乐
Original assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2022052776A1

Classifications

    • G06F3/0481: Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04817: the foregoing using icons
    • G06F3/0482: Interaction with lists of selectable items, e.g. menus
    • G06F3/0484: Interaction techniques based on GUIs for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/0488: Interaction techniques based on GUIs using specific features provided by the input device, using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225: Feedback of the input speech

Definitions

  • the present application relates to the field of electronic technology, and in particular, to a method, electronic device and system for human-computer interaction.
  • the voice assistant is not integrated in the application, and the user cannot control the operations in the application through voice commands.
  • audio applications such as music, or media applications such as videos do not have the ability to interact with the user by voice, and the user cannot control the execution of such applications through voice commands.
  • the voice assistant of an electronic device is separated from the application, and it is impossible for different applications to access the same voice assistant.
  • the embodiments of the present application provide a human-computer interaction method, electronic device, and system, which can realize system-level voice interaction: for any application displayed on the interface, all visible buttons, pictures, icons, text, controls, and the like can be clicked or otherwise operated by the user through voice commands, which achieves precise human-computer interaction, generalizes the recognition of voice commands, and improves the accuracy of user intent recognition.
  • a human-computer interaction method is provided and applied to an electronic device, and the method includes: acquiring current interface content information while a human-computer interaction application is running on the electronic device; determining, according to the interface content information, one or more controls on the interface, the one or more controls including one or more of buttons, icons, pictures, and text; acquiring the user's voice command; matching a target control from the one or more controls according to the voice command; and determining the user's intention according to the voice command and, in response to the user's intention, performing an operation on the target control.
  • the interface content may include a user-visible portion of the currently displayed interface.
  • the user-visible part may include pictures, text, menus, options, icons, buttons, etc. displayed on the interface, which are collectively referred to as "controls" and the like in this embodiment of the present application.
  • an operation may be performed on the target control.
  • the operation may include input operations such as clicking, double-clicking, sliding, and right-clicking.
  • the voice command is matched with the target control on the interface, that is, the user's intention is recognized, and the click operation on the target control is further performed.
  • the method obtains the visible, user-clickable controls displayed on the interface, and the user can then input voice commands to perform operations such as clicking on any control on the interface; all applications and all visible content on the display can be controlled by the user with voice commands, as in the sketch below.
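
The disclosure does not tie this enumeration step to a particular API. As a minimal illustrative sketch only, assuming an Android accessibility tree as the source of the interface content information (ControlInfo and collectVisibleControls are hypothetical names, not part of the patent):

```kotlin
import android.view.accessibility.AccessibilityNodeInfo

// Hypothetical record for one visible, operable control on the interface.
data class ControlInfo(val label: String, val node: AccessibilityNodeInfo)

// Walk the accessibility tree of the current window and collect every
// control that is visible to the user and carries user-readable text or
// a content description, so a voice command can later be matched to it.
fun collectVisibleControls(root: AccessibilityNodeInfo?): List<ControlInfo> {
    val controls = mutableListOf<ControlInfo>()
    fun walk(node: AccessibilityNodeInfo?) {
        if (node == null) return
        val label = node.text ?: node.contentDescription
        if (node.isVisibleToUser && !label.isNullOrBlank()) {
            controls += ControlInfo(label.toString(), node)
        }
        for (i in 0 until node.childCount) walk(node.getChild(i))
    }
    walk(root)
    return controls
}
```

In an accessibility service, `root` would typically be `rootInActiveWindow`; pictures and icons without text would instead contribute description information derived from outline analysis, as described later.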
  • the acquired interface content information of the current interface is used as a parameter of the ASR analysis; that is, the application scenario in which the current user's voice command may occur is accurately inferred from the interface content information of the current interface.
  • the text of the recognized voice command is matched with the controls in the current possible application scenario, so as to obtain the user's intention more accurately and improve the accuracy of speech recognition in the voice interaction scenario.
  • the embodiment of the present application can analyze the user's voice command in combination with the current application scenario where the user's voice command may occur.
  • the accuracy of speech recognition is improved, which reduces the user's manual operations, avoids user distraction, and improves user safety in driving scenarios.
  • the text controls, pictures, text and icons included in the interface are identified, and the user's voice commands are matched to the controls of the screen content to achieve precise human-computer interaction.
  • the recognition of voice commands is generalized, and the accuracy of user intent recognition and ASR recognition is improved; in addition, the delay of voice interaction is reduced, so that the processing delay of a "visible and speakable" intent is within 200 ms, which greatly improves the efficiency of detecting voice commands and improves the user experience.
  • matching the target control from the one or more controls according to the voice command includes: determining the matching degree between the voice command and each of the one or more controls; and determining the control with the greatest matching degree as the target control.
  • the smart voice service module may correspond to the smart voice application installed on the mobile phone side; that is, the smart voice application of the mobile phone performs the voice command recognition process of the embodiment of the present application.
  • the service corresponding to the smart voice service module can be provided by the server.
  • the mobile phone can send the user's voice command to the server, and with the help of the server's voice analysis capability, the server analyzes the voice command and then returns the recognition result to the mobile phone, which will not be repeated here.
  • determining the matching degree between the voice command and each of the one or more controls includes: extracting one or more keywords included in the voice command; determining the matching degree between each of the one or more keywords and the description information of each of the one or more controls; and determining the control with the greatest matching degree as the target control.
  • when the one or more controls include icon controls, the method further includes: acquiring the outline of the icon control, and determining, according to the outline, an outline keyword describing the icon control; determining the matching degree between the one or more keywords included in the voice command and the outline keyword of the icon control; and determining the icon control with the greatest matching degree as the target control, as in the sketch below.
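
A minimal sketch of one way to compute such a matching degree, assuming keyword overlap as the score (the patent leaves the exact measure open; ControlDescription, matchingDegree, and pickTarget are hypothetical names):

```kotlin
// Hypothetical description information for a control: its visible text
// plus outline keywords derived from its icon shape (e.g. "heart" for a
// heart-shaped favorite button).
data class ControlDescription(val textWords: List<String>, val outlineKeywords: List<String>)

// Matching degree = fraction of command keywords that hit any word of the
// control's description information. The scoring rule is an assumption.
fun matchingDegree(commandKeywords: List<String>, desc: ControlDescription): Double {
    if (commandKeywords.isEmpty()) return 0.0
    val vocabulary = (desc.textWords + desc.outlineKeywords).map { it.lowercase() }
    val hits = commandKeywords.count { kw -> vocabulary.any { it.contains(kw.lowercase()) } }
    return hits.toDouble() / commandKeywords.size
}

// The control with the greatest matching degree becomes the target control.
fun pickTarget(commandKeywords: List<String>, controls: Map<String, ControlDescription>): String? =
    controls.entries.maxByOrNull { matchingDegree(commandKeywords, it.value) }?.key
```

Under this rule, the command "I like" with keyword "like" would score highest against a favorite button described by the words "like" and "favorite" and the outline keyword "heart", matching the example given later.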
  • the acquired information of the currently displayed interface content can be transferred to the ASR module in synchronization with step 908; that is, the information of the interface content is input into the ASR model.
  • the user's voice command is recognized according to the updated ASR model.
  • the user's voice command may include homophones and the like; for example, when the user inputs "variety show", recognition is affected by the pronunciation of different users and the ASR module may misanalyze it.
  • such homophones and words with similar pinyin may cause the mobile phone to fail to accurately obtain the user's operation intention from the voice command.
  • the current interface displays a lot of audio information, star photos, video information, etc.
  • when the ASR module performs its analysis, it will select, from possible recognition results that are near-homophones in Chinese, such as "Zhongyi", "traditional Chinese medicine", "loyalty", and "variety show", the result "variety show" that is more relevant to the currently displayed audio information, star photos, video information, and the like, and then determine that the voice command issued by the current user is "variety show".
  • that is, the information of the currently displayed interface content is introduced into the voice command recognition process of the existing ASR module, so that the usage scenario of the user's current voice command can be accurately analyzed according to the currently displayed interface content, the application scenario targeted by the current user's voice command can be accurately located, and the accuracy of recognizing the voice command is improved; a sketch of this idea follows.
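
A minimal sketch of this biasing, assuming the interface content is applied by re-ranking the recognizer's n-best hypotheses rather than inside the decoder itself (a simplification; rerankHypotheses and the weight are hypothetical):

```kotlin
// Given the n-best ASR hypotheses with their acoustic scores (e.g. the
// Chinese near-homophones for "traditional Chinese medicine", "loyalty",
// and "variety show"), add a bonus for every on-screen word a hypothesis
// contains and keep the best one.
fun rerankHypotheses(
    nBest: List<Pair<String, Double>>,   // (hypothesis text, acoustic score)
    screenWords: Set<String>,            // words taken from the current interface content
    bonusPerHit: Double = 1.0            // weight of the interface evidence (assumed)
): String? = nBest.maxByOrNull { (text, score) ->
    score + bonusPerHit * screenWords.count { text.contains(it) }
}?.first
```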
  • the above process can be understood as a process of determining the user's intention according to the voice command after acquiring the user's voice command, for example, determining which control on the current interface is to be clicked by the voice command currently input by the user.
  • the matching degree between the voice command and each of the one or more controls is determined, and the control with the greatest matching degree is determined as the target control on which the user intends to perform the click operation.
  • one or more keywords contained in the user's voice command may be extracted, and the matching degree between the one or more keywords and each control may be determined.
  • the keywords may include characters, words, the pinyin of some or all of the Chinese characters of the voice command, and the like, which is not limited in this embodiment of the present application.
  • the description information of each control may include outline information, text information, color information, position information, icon information, etc. of the control, which is not limited in this embodiment of the present application.
  • taking the music playing interface as an example, when the user inputs the voice command "I like", during the matching between the voice command and the controls included in the interface, if the description words of the favorite button on the music playing interface are "like" and "favorite", and the outline of the favorite button is a "peach heart" shape, then the heart-shaped control can be matched to "I like". This method can generalize the user's voice command and match the user's command with the controls on the interface more intelligently.
  • the method further includes: when the voice command is detected, adding a digital corner label to some of the one or more controls; and when it is detected that the voice command includes a first number, determining the control marked with the first number as the target control.
  • adding a digital corner label to some of the one or more controls includes: adding the digital label to some of the controls according to a preset order, where the preset order includes a left-to-right and/or top-to-bottom order.
  • the controls to which a digital corner label can be added include one or more of the following: the one or more controls are all picture-type controls; or the one or more controls have a grid-type arrangement; or the one or more controls have a list-type arrangement; or those controls among the one or more controls whose display size is greater than or equal to a preset value; a sketch of this labeling follows.
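
A minimal sketch of the corner-label assignment under the eligibility rules just listed (the size threshold and all names are invented for illustration):

```kotlin
import android.graphics.Rect

// Hypothetical view of a control: its screen bounds plus whether it is a
// picture-type control.
data class Candidate(val bounds: Rect, val isPicture: Boolean)

// Number the eligible controls 1, 2, 3, ... from top to bottom and then
// left to right, matching the preset order described above.
fun assignCornerLabels(controls: List<Candidate>, minSizePx: Int = 96): Map<Int, Candidate> {
    val eligible = controls.filter {
        it.isPicture || (it.bounds.width() >= minSizePx && it.bounds.height() >= minSizePx)
    }
    return eligible
        .sortedWith(compareBy({ it.bounds.top }, { it.bounds.left }))
        .mapIndexed { index, control -> (index + 1) to control }
        .toMap()
}
```

If a later voice command contains, say, the number 3, the control stored under key 3 would be taken as the target control.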
  • the interface corresponding to the interface content information is the interface of the application running in the foreground of the electronic device, and/or the interface of the application running in the background of the electronic device.
  • the method further includes: starting the human-computer interaction application on the electronic device.
  • starting the human-computer interaction application on the electronic device includes: obtaining a user's preset input, and starting the human-computer interaction application on the electronic device accordingly.
  • the preset input includes at least one of triggering an operation of a button, a preset human-computer interaction instruction of voice input, or a preset fingerprint input.
  • an electronic device is provided, comprising: one or more processors; one or more memories; and a module installed with a plurality of application programs; the memory stores one or more programs which, when executed by the processor, cause the electronic device to perform the following steps: in the process of running the human-computer interaction application, obtaining the current interface content information; determining, according to the interface content information, one or more controls on the interface, the one or more controls including one or more of buttons, icons, pictures, and text; obtaining the user's voice command; matching the target control from the one or more controls according to the voice command; and determining the user's intention according to the voice command and, in response to the user's intention, performing an operation on the target control.
  • when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: determining, according to the voice command, the matching degree between the voice command and each of the one or more controls; and determining the control with the greatest matching degree as the target control.
  • when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: extracting one or more keywords included in the voice command; determining the matching degree between each of the one or more keywords and the description information of each of the one or more controls; and determining the control with the greatest matching degree as the target control.
  • when the one or more controls include icon controls and the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: acquiring the outline of the icon control, and determining, according to the outline, an outline keyword describing the icon control; determining the matching degree between the one or more keywords included in the voice command and the outline keyword of the icon control; and determining the icon control with the greatest matching degree as the target control.
  • when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: when the voice command is detected, adding a digital corner label to some of the one or more controls; and when it is detected that the voice command includes a first number, determining the control marked with the first number as the target control.
  • when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: adding the digital corner label to some of the one or more controls in a preset order, where the preset order includes a left-to-right and/or top-to-bottom order.
  • the controls to which a digital corner label can be added include one or more of the following: the one or more controls are all picture-type controls; or the one or more controls have a grid-type arrangement; or the one or more controls have a list-type arrangement; or those controls among the one or more controls whose display size is greater than or equal to a preset value.
  • the interface corresponding to the interface content information is the interface of the application running in the foreground of the electronic device, and/or the interface of the application running in the background of the electronic device.
  • the electronic device when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: start the human-computer interaction application.
  • when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: obtaining a user's preset input and starting the human-computer interaction application, where the preset input includes at least one of triggering an operation of a button, a preset human-computer interaction instruction of voice input, or a preset fingerprint input.
  • the present application provides a system, the system including a connected electronic device and a display device, where the electronic device can perform any one of the possible human-computer interaction methods in the first aspect above, and the display device is used for displaying the application interface of the electronic device.
  • the present application provides an apparatus, the apparatus is included in an electronic device, and the apparatus has a function of implementing the behavior of the electronic device in the above-mentioned aspect and possible implementations of the above-mentioned aspect.
  • the functions can be implemented by hardware, or by executing corresponding software by hardware.
  • the hardware or software includes one or more modules or units corresponding to the above functions. For example, a display module or unit, a detection module or unit, a processing module or unit, and the like.
  • the present application provides an electronic device, comprising: a touch display screen, wherein the touch display screen includes a touch-sensitive surface and a display; a positioning chip; one or more cameras; one or more processors; a plurality of memories; a plurality of application programs; and one or more computer programs.
  • one or more computer programs are stored in the memory, the one or more computer programs comprising instructions. The instructions, when executed by one or more processors, cause an electronic device to perform any of the possible human-computer interaction methods described above.
  • the present application provides an electronic device including one or more processors and one or more memories.
  • the one or more memories are coupled to the one or more processors for storing computer program code, the computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform A method for human-computer interaction in any possible implementation of any of the above aspects.
  • the present application provides a computer storage medium, including computer instructions, when the computer instructions are executed on an electronic device, the electronic device can perform any of the possible human-computer interaction methods in any of the foregoing aspects.
  • the present application provides a computer program product that, when the computer program product runs on an electronic device, enables the electronic device to perform any of the possible human-computer interaction methods in any of the foregoing aspects.
  • FIG. 1 is a schematic diagram of an application scenario of an example of a method for human-computer interaction provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of an example of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is a software structural block diagram of an implementation process of an example of a method for human-computer interaction according to an embodiment of the present application.
  • FIG. 4 is a schematic interface diagram of an example of a process of implementing voice interaction on an in-vehicle device provided by an embodiment of the present application.
  • FIG. 5 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application.
  • FIG. 6 is a schematic interface diagram of another example of a process of implementing voice interaction on an in-vehicle device provided by an embodiment of the present application.
  • FIG. 7 is a schematic interface diagram of another example of a voice interaction process implemented on a vehicle-mounted device provided by an embodiment of the present application.
  • FIG. 8 is a schematic interface diagram of another example of a process of implementing voice interaction on an in-vehicle device provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of a voice interaction method provided by an embodiment of the present application.
  • the terms "first" and "second" are only used for descriptive purposes, and should not be construed as indicating or implying relative importance or implicitly indicating the number of indicated technical features.
  • a feature defined as “first” or “second” may expressly or implicitly include one or more of that feature.
  • the embodiments of the present application will provide a human-computer interaction method.
  • the following describes in detail how to implement system-level voice interaction through the human-computer interaction method with reference to the accompanying drawings and different embodiments.
  • FIG. 1 is a schematic diagram of an application scenario of an example of a method for human-computer interaction provided by an embodiment of the present application.
  • the human-computer interaction method provided by the embodiments of the present application may be applied to a scenario including a separate electronic device.
  • the smart screen 101 is used as the electronic device, and the human-computer interaction method is applied to a scenario where a user uses the smart screen 101 .
  • the smart screen 101 can acquire the user's voice command through the microphone, recognize the voice command, perform corresponding operations according to the user's voice command, display a corresponding interface, and the like.
  • the human-computer interaction method provided by the embodiments of the present application may also be applied to a scenario including two electronic devices, and the two electronic devices in the scenario may include a mobile phone, a tablet computer, and a wearable device. , vehicle equipment and other different types of electronic equipment.
  • the in-vehicle device 103 can be used as a display device, connected to the mobile phone 102, to display the running interface of the mobile phone 102.
  • the mobile phone 102 can acquire the user's voice command, recognize the voice command, and perform the corresponding operation in the background according to the user's voice command, and then display the screen after the corresponding operation is performed on the in-vehicle device 103 .
  • the in-vehicle device 103 can also obtain the user's voice command and transmit the voice command to the mobile phone 102; the mobile phone recognizes the voice command, performs the corresponding operation in the background according to the user's voice command, and then projects the interface after the corresponding operation is performed onto the in-vehicle device 103 for display.
  • the human-computer interaction method provided in the embodiments of the present application may also be applied to a scenario including at least one electronic device and a server.
  • exemplarily, as shown in (c) of FIG. 1, in the scenario including the mobile phone 102, the in-vehicle device 103, and the server 104, the mobile phone 102 or the in-vehicle device 103 can obtain the user's voice command and upload it to the server 104; with the help of the voice analysis capability of the server 104, the user's voice command is analyzed more quickly and accurately, and the analyzed result is then transmitted back to the mobile phone 102, where the corresponding operation is performed; a sketch of this round trip follows.
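
A minimal sketch of the phone-to-server round trip, assuming a plain HTTP upload of the recorded audio (the endpoint URL and payload format are hypothetical; the disclosure only states that the command is uploaded and the recognition result is returned):

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Upload the recorded voice command and read back the recognition result.
fun recognizeOnServer(audio: ByteArray): String {
    val conn = URL("https://voice.example.com/asr").openConnection() as HttpURLConnection
    return try {
        conn.requestMethod = "POST"
        conn.doOutput = true
        conn.setRequestProperty("Content-Type", "application/octet-stream")
        conn.outputStream.use { it.write(audio) }                  // send the voice command
        conn.inputStream.bufferedReader().use { it.readText() }   // server's analysis result
    } finally {
        conn.disconnect()
    }
}
```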
  • the method for human-computer interaction can be applied to mobile phones, smart screens, tablet computers, wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, personal digital assistants (PDA), and other electronic devices; the embodiments of the present application do not impose any limitation on the specific type of electronic device.
  • the smart screen 101 , the mobile phone 102 , and the in-vehicle device 103 listed in FIG. 1 are collectively referred to as “electronic device 100 ”, and possible structures of the electronic device 100 are described below.
  • FIG. 2 is a schematic structural diagram of an example of an electronic device 100 provided by an embodiment of the present application.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2 , mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone jack 170D, sensor module 180, buttons 190, motor 191, indicator 192, camera 193, display screen 194, and Subscriber identification module (subscriber identification module, SIM) card interface 195 and so on.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or use a different arrangement of components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100 .
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby increasing the efficiency of the system.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL).
  • the processor 110 may contain multiple sets of I2C buses.
  • the processor 110 can be respectively coupled to the touch sensor 180K, the charger, the flash, the camera 193 and the like through different I2C bus interfaces.
  • the processor 110 may couple the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate with each other through the I2C bus interface, so as to realize the touch function of the electronic device 100 .
  • the I2S interface can be used for audio communication.
  • the processor 110 may contain multiple sets of I2S buses.
  • the processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170 .
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communication, to sample, quantize, and encode analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • a UART interface is typically used to connect the processor 110 with the wireless communication module 160 .
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193 .
  • MIPI interfaces include camera serial interface (CSI), display serial interface (DSI), etc.
  • the processor 110 communicates with the camera 193 through a CSI interface, so as to realize the photographing function of the electronic device 100 .
  • the processor 110 communicates with the display screen 194 through the DSI interface to implement the display function of the electronic device 100 .
  • the GPIO interface can be configured by software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface may be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like.
  • the GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 130 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transmit data between the electronic device 100 and peripheral devices. It can also be used to connect headphones to play audio through the headphones.
  • the interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100.
  • the electronic device 100 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130 .
  • the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100 . While the charging management module 140 charges the battery 142 , it can also supply power to the electronic device through the power management module 141 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, battery health status (leakage, impedance).
  • the power management module 141 may also be provided in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 may provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low frequency baseband signal is processed by the baseband processor and passed to the application processor.
  • the application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or videos through the display screen 194 .
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110, and may be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2 .
  • the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite-based augmentation system (SBAS).
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
  • the electronic device 100 may include one or N display screens 194 , where N is a positive integer greater than one.
  • the electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • the ISP is used to process the data fed back by the camera 193 .
  • the shutter is opened, the light is transmitted to the camera photosensitive element through the lens, the light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin tone.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object is projected through the lens to generate an optical image onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • a digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy and so on.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos in various encoding formats, for example, moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100 .
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function, for example, saving files such as music and videos in the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing the instructions stored in the internal memory 121 .
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area may store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area may store data (such as audio data, phone book, etc.) created during the use of the electronic device 100 and the like.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playback, recording, etc.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • the speaker 170A, also referred to as a "horn", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
  • the receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be answered by placing the receiver 170B close to the human ear.
  • the microphone 170C, also called a "mike" or a "mic", is used to convert sound signals into electrical signals.
  • when making a sound, the user can speak close to the microphone 170C, inputting the sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the earphone jack 170D is used to connect wired earphones.
  • the earphone interface 170D may be the USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface, a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
  • the gyro sensor 180B can be used to determine the motion attitude of the electronic device 100.
  • the air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude through the air pressure value measured by the air pressure sensor 180C to assist in positioning and navigation.
  • the magnetic sensor 180D includes a Hall sensor. The electronic device 100 can detect the opening and closing of the flip holster using the magnetic sensor 180D.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes).
  • Distance sensor 180F for measuring distance. The electronic device 100 can measure the distance through infrared or laser.
  • Proximity light sensor 180G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the ambient light sensor 180L is used to sense ambient light brightness.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the temperature sensor 180J is used to detect the temperature.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the touch sensor 180K is also called a "touch panel".
  • the touch sensor 180K may be disposed on the display screen 194 , and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations may be provided through display screen 194 .
  • the keys 190 include a power-on key, a volume key, and the like; the keys 190 may be mechanical keys or touch keys.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • Motor 191 can generate vibrating cues.
  • the motor 191 can be used for vibrating alerts for incoming calls, and can also be used for touch vibration feedback.
  • touch operations acting on different applications can correspond to different vibration feedback effects.
  • the motor 191 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 194 .
  • different application scenarios (for example: time reminder, receiving information, alarm clock, games, etc.) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
• the indicator 192 may be an indicator light, which can be used to indicate the charging state and changes in battery level, and can also be used to indicate a message, a missed call, a notification, and the like.
  • the SIM card interface 195 is used to connect a SIM card.
• electronic devices such as the smart screen 101, the mobile phone 102, and the in-vehicle device 103 may all have the structure shown in FIG. 2, or a structure with more or fewer components than that shown in FIG. 2.
  • the embodiments of the present application do not limit the types of electronic devices included in the application scenario.
• when the electronic device 100 shown in FIG. 2 is a mobile phone, it may run a Harmony OS system or any other possible operating system, and may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture, etc.
• the mobile phone has a layered architecture. Taking the Android system as an example, the software structure of the mobile phone 102 is exemplarily described below.
  • FIG. 3 is a software structural block diagram of an implementation process of an example of a method for human-computer interaction according to an embodiment of the present application.
  • the in-vehicle device 103 can be used as a screen projection device (or “display device”) of the mobile phone 102 , and the application of the mobile phone 102 can be displayed on the display screen of the in-vehicle device 103 .
• in the layered architecture, the software is divided into several layers, each with a clear role and division of labor; the layers communicate with each other through software interfaces.
• the Android system is divided into four layers: from top to bottom, the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
• the application layer can include a series of application packages. As shown in FIG. 3, the application packages can include applications such as Visible to Speak 11, Smart Voice 13, Music, Navigation, and HiCar 15. The following mainly introduces the functional modules respectively corresponding to Visible to Speak 11 and Smart Voice 13 in the embodiments of the present application.
  • “visible” may refer to the part that the user can see during the human-computer interaction between the user and the electronic device.
  • the user-visible portion may include display content on the screen of the electronic device, such as the desktop, windows, menus, icons, buttons, and controls of the electronic device.
  • the visible portion may also include multimedia content such as text, pictures, and videos displayed on the screen of the electronic device, which is not limited in this embodiment of the present application.
• the display content on the screen of the electronic device can be an interface displayed by an application running in the foreground of the electronic device, or a virtual display interface of an application running in the background of the electronic device, which can be projected onto other electronic devices.
  • “speakable” means that the user can interact with the visible part through a voice command, thereby completing the interactive task.
• for user-visible parts such as the desktop, windows, menus, icons, buttons, and controls of an electronic device, the user can control them through voice commands, and then perform input operations such as clicking, double-clicking, and sliding on those visible parts.
• the Visible to Speak 11 may include an interface information acquisition module 111, an intent processing module 112, an interface module 113, a predefined action execution module 114, and the like.
  • the interface information acquisition module 111 may acquire interface content information of applications running in the foreground or background of the mobile phone.
  • the intent processing module 112 may receive the user's voice instruction returned by the smart voice 13, and determine the user's intent according to the user's voice instruction.
  • the interface module 113 is used to realize data and information exchange between various applications.
  • the predefined action execution module 114 is configured to execute corresponding operations according to voice commands, user intentions, and the like.
  • the smart voice 13 may correspond to a smart voice application installed on the side of the mobile phone 102 , that is, a service process of voice recognition provided by the smart voice application of the mobile phone 102 .
• the voice recognition service process provided by Smart Voice 13 can also be provided by a server, and this scenario can correspond to (c) in FIG. 1.
• the mobile phone 102 sends the user's voice command to the server 104, and after the server 104 analyzes the voice command, the server 104 returns the recognition result of the voice command to the mobile phone 102, which will not be repeated here.
• the Smart Voice 13 may include a semantic understanding (natural language understanding, NLU) module 131, a speech recognition (automatic speech recognition, ASR) module 132, a speech synthesis (text to speech, TTS) module 133, a session management (dialog management, DM) module 134, and so on.
• the ASR module 132 can convert the original voice signal input by the user into text information; the NLU module 131 can convert the recognized text information into semantics that can be understood by electronic devices such as mobile phones and in-vehicle devices; the DM module 134 can determine the action the system should take based on the dialogue state; the TTS module 133 can convert natural language text into speech and output it to the user.
• the Smart Voice 13 may also include a natural language generation (NLG) module, etc., which is not limited in this embodiment of the present application.
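• as a concrete illustration of the division of labor among these modules, the following is a minimal sketch of the recognition chain described above; all type and method names here are illustrative assumptions rather than interfaces defined by the present application.

```java
// Illustrative module interfaces for the ASR -> NLU -> DM -> TTS chain.
// All names are assumptions for this sketch, not APIs of the present application.
final class SemanticIntent { final String name; SemanticIntent(String name) { this.name = name; } }
final class DialogState { /* conversation history, slot values, ... */ }
final class SystemAction { final String description; SystemAction(String d) { this.description = d; } }

interface AsrModule { String transcribe(byte[] pcmAudio); }                        // speech -> text
interface NluModule { SemanticIntent understand(String text); }                    // text -> semantics
interface DmModule  { SystemAction decide(SemanticIntent intent, DialogState s); } // semantics -> action
interface TtsModule { byte[] synthesize(String replyText); }                       // text -> speech

final class VoicePipeline {
    static SystemAction handle(byte[] audio, AsrModule asr, NluModule nlu,
                               DmModule dm, DialogState state) {
        String text = asr.transcribe(audio);           // ASR module 132
        SemanticIntent intent = nlu.understand(text);  // NLU module 131
        return dm.decide(intent, state);               // DM module 134; TTS 133 would voice the reply
    }
}
```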
• the application of the mobile phone can be projected to the in-vehicle device through the HiCar application 15. During the projection process, the application actually runs on the side of the mobile phone, and the running may be foreground running or background running on the mobile phone.
  • the in-vehicle device can have an independent display system. After the application of the mobile phone is projected to the in-vehicle device through the HiCar application 15, there can be an independent display desktop and application quick entry on the in-vehicle device, while providing the ability to obtain voice commands.
  • the application framework layer includes a variety of service programs or some predefined functions, which can provide an application programming interface (API) and a programming framework for applications in the application layer.
• the application framework layer may include a content sensor 21, a multi-screen framework service module 23, a view system 25, and the like.
• the content sensor 21 can be used to store and obtain data, and make these data accessible to application programs.
• the data acquired by the content sensor 21 may include interface display data of the electronic device, videos, images, audio, user browsing history, bookmarks, and other data.
• the content sensor 21 may acquire the interface content displayed in the foreground or background of the mobile phone.
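• the content sensor itself is internal to the system; as a rough sketch of how interface text could be gathered, the following walks an Android accessibility node tree and collects the text of visible, clickable controls. This is an assumed stand-in for the content sensor, not its actual implementation.

```java
import android.view.accessibility.AccessibilityNodeInfo;
import java.util.ArrayList;
import java.util.List;

public final class InterfaceContentCollector {
    /** Recursively collects the text of visible, clickable nodes in the interface tree. */
    public static List<String> collectClickableTexts(AccessibilityNodeInfo node) {
        List<String> texts = new ArrayList<>();
        if (node == null) {
            return texts;
        }
        if (node.isVisibleToUser() && node.isClickable() && node.getText() != null) {
            texts.add(node.getText().toString());
        }
        for (int i = 0; i < node.getChildCount(); i++) {
            texts.addAll(collectClickableTexts(node.getChild(i)));
        }
        return texts;
    }
}
```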
• the multi-screen framework service module 23 may include a window manager, etc., for managing the window display of the electronic device.
  • the window manager may acquire the size of the display screen of the mobile phone 102 or the size of the window to be displayed, and acquire the content of the window to be displayed, and the like.
• the multi-screen framework service module 23 can also manage the screen projection display process of the electronic device, for example, obtain the interface content of one or more applications running in the background of the electronic device and transmit the interface content to other electronic devices, so that the interface content of the electronic device is displayed on the other electronic devices, which is not repeated here.
  • the view system 25 includes visual controls, such as controls for displaying text, controls for displaying pictures, and the like. View system 25 can be used to build applications.
  • a display interface can consist of one or more views.
• for example, a display interface including a short message notification icon may include a view for displaying text and a view for displaying pictures.
  • the Android runtime includes core libraries and a virtual machine. Android runtime is responsible for scheduling and management of the Android system.
• the core library consists of two parts: one part is the functional functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, safety and exception management, and garbage collection.
• a system library can include multiple functional modules, for example: a surface manager, a media library, a 3D graphics processing library (e.g., OpenGL ES), a 2D graphics engine (e.g., SGL), etc.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
  • a 2D graphics engine is a drawing engine for 2D drawing.
  • the image processing library can provide analysis of various image data and provide a variety of image processing algorithms, such as image cutting, image fusion, image blurring, image sharpening and other processing, which will not be repeated here.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer at least includes display drivers, audio drivers, sensor drivers, etc.
• various drivers can call hardware structures such as the microphone, speaker, or sensors of the mobile phone, for example, calling the microphone of the mobile phone to obtain the user's voice commands, and calling the speaker of the mobile phone for voice output, etc., which will not be repeated here.
• the in-vehicle device 103 as a display device may have the same software structure as, or a different software structure from, that of the mobile phone 102.
  • the vehicle-mounted device at least includes a display module 31 , a microphone/speaker 32 , and the like.
  • the display module 31 may be used to display the interface content currently running on the in-vehicle device 103 , or display the application interface projected by the mobile phone 102 .
  • the in-vehicle device 103 may have an independent display system.
• after the application of the mobile phone 102 is projected to the in-vehicle device 103 through the HiCar application 15, there may be an independent display desktop and application quick entry on the in-vehicle device 103.
• music, navigation, video, and other applications of the mobile phone 102 can be rearranged and displayed on the in-vehicle device 103 according to the display system of the in-vehicle device 103 after being projected through the HiCar application 15, which is not limited in this embodiment of the present application.
  • the microphone/speaker 32 is the hardware structure of the in-vehicle device, and can realize the same functions as the microphone/speaker of the mobile phone.
  • the input of the user's voice instruction may be through the microphone of the mobile phone 102 itself, or may be a remote virtual microphone.
• the remote virtual microphone can be understood as a voice command acquisition capability provided by means of the microphone of the in-vehicle device 103: the voice commands acquired by the in-vehicle device 103 are transmitted to the mobile phone 102, and the mobile phone 102 recognizes the voice commands, etc., which will not be repeated here.
  • the HiCar application 15 can rely on the multi-screen framework capability of the mobile phone to project the interfaces of multiple applications of the mobile phone to the interface of the in-vehicle device.
• the multiple applications themselves actually run on the side of the mobile phone, and their interfaces are displayed on the screen of the in-vehicle device.
• the screen content is extracted through the content sensor of the mobile phone system to obtain the application interface content of the interface projected to the in-vehicle device.
• Smart Voice can analyze user semantics more quickly and accurately through the combined analysis capabilities of the terminal side (relying on the powerful computing power of the mobile phone itself) and the cloud side, and the recognized results can be sent to Visible to Speak to match the interface content and identify the user's purpose.
• the interface is then operated to realize control operations such as clicking controls, sliding up, down, left, and right, and returning.
  • the in-vehicle device 103 and the mobile phone 102 are in a state of established connection.
  • connection between the mobile phone 102 and the in-vehicle device 103 may include various connection modes such as wired connection or wireless connection.
• the wired connection between the mobile phone 102 and the in-vehicle device 103 may be through a USB data cable; the wireless connection between the mobile phone 102 and the in-vehicle device 103 may be established by means of a Wi-Fi connection, by means of the near field communication (NFC) function supported by both the mobile phone 102 and the in-vehicle device 103 and a proximity connection through the "touch" function, or through Bluetooth code scanning between the mobile phone 102 and the in-vehicle device 103, etc.
• as communication bandwidth and rates gradually increase, data may be transmitted between the mobile phone 102 and the in-vehicle device 103 without establishing a near field communication connection.
  • the mobile phone 102 and the in-vehicle device 103 may be able to project the screen of the mobile phone to the in-vehicle device through 5G communication.
  • the mobile phone may not provide functions such as discovery and establishing a connection with the in-vehicle device.
  • the mobile phone 102 and the in-vehicle device 103 can be connected and communicated based on the relevant settings under the account by logging into the same account.
  • both the mobile phone 102 and the in-vehicle device 103 can register a Huawei account.
• the application of the mobile phone 102 can be displayed on the display screen of the in-vehicle device 103, while the application actually runs on the side of the mobile phone 102, which is not repeated here.
• the mobile phone 102 and the vehicle-mounted device 103 may also have different module division methods or include more functional modules, which is not limited in this embodiment of the present application.
  • the mobile phone 102 and the in-vehicle device 103 having the software structure shown in FIG. 3 are taken as examples for detailed description in conjunction with the accompanying drawings and application scenarios.
  • FIG. 4 is a schematic diagram of a graphical user interface (graphical user interface, GUI) for implementing a voice interaction process on an in-vehicle device provided by an embodiment of the present application.
• FIG. 4 shows that the screen display system of the in-vehicle device 103 displays the currently output interface; the content of the interface can be derived from the application actually running on the side of the mobile phone 102, obtained by the HiCar application and provided to the in-vehicle device 103.
• the interface content on the display screen of the in-vehicle device 103 can be arranged and filled based on its own display system, and the same content can have different display styles, icon sizes, arrangement orders, etc.
  • the content is arranged and filled on the display screen of the in-vehicle device 103 according to the requirements of the display system of the in-vehicle device 103 .
  • the screen display area of the in-vehicle device 103 may include a status display area at the top position, and a navigation menu area 401 and a content area 402 shown by dashed boxes.
  • the status display area displays the current time and date, Bluetooth icon, WIFI icon, etc.
• the navigation menu area 401 may include icons such as homepage, navigation, phone, and music; each icon corresponds to at least one application actually running on the mobile phone 102, and the user can click any icon to enter the corresponding interface of that application. The content area 402 displays the content provided to the in-vehicle device 103 by different applications.
  • the Huawei Music is installed on the mobile phone 102 , and the Huawei Music runs in the background, and the Huawei Music sends the content such as the playlist or song list displayed during the running process to the in-vehicle device 103 .
• the screen display system of the in-vehicle device 103 fills the content provided by Huawei Music into the content area of the display screen, as shown in (a) of FIG. 4, including daily recommendations, playlists, ranking lists, radio stations, searches, etc.; the display process will not be repeated in subsequent embodiments.
  • the interface of the in-vehicle device 103 may also display other more menus or contents of application programs, which are not limited in this embodiment of the present application.
  • FIG. 4 shows the interface after the user clicks on the music application in the navigation menu area 401, and the icon of the music application in the navigation menu area 401 is highlighted in gray.
  • the song name or song list provided by Huawei Music is displayed in the content area 402
• the play button 20 is displayed on the icon of song 1, while song 2, song 3, song 4, and song 5 are in the paused playback state, with a pause button 30 displayed on each.
• a voice ball icon 10 may also be included, as shown in (a) of FIG. 4.
  • the in-vehicle device 103 can display the interface 403 as shown in (b) of FIG. 4 .
  • the wake-up window 403 - 1 shown by the dotted box may be displayed on the interface 403 , and the wake-up window 403 - 1 includes the voice recognition icon 40 .
• the wake-up window 403-1 may not be embodied in the form of a window, but may only include the voice recognition icon 40, or include the voice recognition icon 40 and the voice commands recommended to the user, displayed in a floating manner on the display screen of the in-vehicle device 103.
• for convenience of description, in the embodiments of the present application the area including the voice recognition icon 40 is referred to as a "wake-up window", which should not limit the solution of the embodiments of the present application and will not be described in detail later.
  • the voice recognition icon 40 may be displayed dynamically, which is used to indicate that the in-vehicle device 103 is in a state of monitoring and acquiring the user's voice instruction.
  • the wake-up window 403-1 may also include some voice commands recommended to the user, such as voice commands such as "stop playing” and "continue playing". It should be understood that the recommended voice command may also accept a user's click operation, and execute the purpose corresponding to the response command, which will not be repeated in this embodiment of the present application.
• the voice command can be sent to the mobile phone 102, and the mobile phone 102 recognizes the user's voice command; in response to the voice command, a click operation on the play button 20 of song 1 is performed in the background, and the play button 20 on song 1 changes to a pause button 30.
• the mobile phone 102 can transfer the display interface after the click operation on the play button 20 of song 1 back to the in-vehicle device 103, and the in-vehicle device 103 can then display the interface 404 shown in (c) of FIG. 4, on which the pause button 30 is displayed.
• the above implementation process can be understood as follows: a user voice instruction can trigger a click operation on any control on the display screen of the in-vehicle device 103, and the interface after the click operation is performed is further displayed on the display screen of the in-vehicle device 103.
• the user can also turn on the voice monitoring function by pressing the car control voice button of the car; for example, the user presses the car control voice button on the steering wheel to turn on the function of the in-vehicle device 103 to monitor and obtain the user's voice command, and the wake-up window 403-1 shown by the dashed box is displayed.
• the operation of the user clicking the voice ball icon 10 can trigger the in-vehicle device 103 to enable the voice monitoring function; or, the voice interaction function between the in-vehicle device 103 and the user can be enabled, that is, the in-vehicle device 103 is always in the state of monitoring voice instructions; or, the user can turn on the in-vehicle device 103 through other shortcut operations so that it is always in the state of monitoring voice instructions, which is not limited in this embodiment of the present application.
• the wake-up window can always be displayed on the display screen of the in-vehicle device 103; or it may disappear after a period of time and, when a voice command issued by the user is detected again, be suspended and displayed on the display screen of the in-vehicle device 103 again; when no voice command of the user is detected within a preset time (for example, 2 minutes), the in-vehicle device 103 can automatically exit the monitoring function, which is not limited in this embodiment of the present application.
• buttons, switches, menus, options, pictures, lists, text, and other elements that are visible on the interface and can be clicked by the user are collectively referred to as "controls", which will not be repeated in subsequent embodiments.
  • the voice instruction recommended to the user displayed in the wake-up window 403-1 may be instruction content associated with a control on the currently displayed interface 403 that can be clicked by the user.
  • the content sensor of the mobile phone 102 can obtain the current state of each control and provide the user with a recommendation instruction according to the current state.
• the interface 403 includes the play button 20, and the recommended instruction in the wake-up window 403-1 may include "stop playing"; that is, "stop playing" can be understood as the state achieved after the play button 20 is clicked by the user.
• song 2 on the interface 403 also includes the pause button 30, so the recommended instruction in the wake-up window 403-1 may include "play song 2"; that is, "play song 2" can be understood as the state achieved after the user clicks the pause button 30 of song 2.
• when the pause button 30 is displayed on all of songs 1 to 5, the mobile phone 102 obtains that the current interface does not include the play button 20. In this case, after the user wakes up the voice ball, the "start playing" instruction may be displayed in the wake-up window 403-1, but the "stop playing" instruction will not be displayed, which will not be described in detail later. A sketch of this state-based recommendation follows.
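• the following is a minimal sketch of the state-based recommendation rule described above, assuming a simplified control model (the type strings here are invented for illustration): a visible play button yields "stop playing", and a visible pause button yields "start playing".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public final class RecommendationBuilder {
    /** Simplified control model: only the type field needed for this sketch. */
    static final class Control {
        final String type; // e.g. "play_button", "pause_button" (invented labels)
        Control(String type) { this.type = type; }
    }

    /** Recommends the command describing the state reached after clicking each visible control. */
    static List<String> recommend(List<Control> controlsOnScreen) {
        Set<String> commands = new LinkedHashSet<>(); // avoid duplicate recommendations
        for (Control c : controlsOnScreen) {
            if ("play_button".equals(c.type)) {
                commands.add("stop playing");  // per the text: play button visible -> "stop playing"
            } else if ("pause_button".equals(c.type)) {
                commands.add("start playing"); // pause button visible -> "start playing"
            }
        }
        return new ArrayList<>(commands);
    }
}
```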
  • the voice command recommended to the user displayed in the wake-up window 403-1 may also be a fixed recommended command of a certain application.
• the voice instruction recommended to the user displayed in the wake-up window 403-1 can be fixed as "stop playing", "start playing", etc., which is not limited in this embodiment of the present application.
  • controls on the interface that can be clicked by the user can be divided into the following categories:
• Text controls contain textual information that can be recognized; exemplarily, "daily recommendation", "song list", "top chart", "radio station", "song X", and "song list 1" shown in (a) of FIG. 4 are text controls.
• the text information included in the text control may be directly identified by the content sensor of the application framework layer of the mobile phone 102. It should be understood that the music application actually runs in the background of the mobile phone 102, and the mobile phone 102 can acquire, in the background, the text information of the text controls projected and displayed on the display screen of the in-vehicle device 103.
• the voice command recommended to the user displayed in the wake-up window 403-1 may be related to the text controls obtained above, such as "play song 2", etc., which will not be repeated here.
  • Common web controls can include text input boxes (TextBox), drop-down boxes (DropList), date/time controls (Date/TimePicker), and so on.
• the search control can be classified as a web control.
• the web controls on the interface can be recognized by the content sensor of the mobile phone 102, and the voice command recommended to the user displayed in the wake-up window 403-1 may be related to the web controls obtained above, such as "search for songs" and other recommended commands.
  • Picture controls are displayed as pictures on the interface, and each picture corresponds to a different descriptor. Exemplarily, as shown in (a) in FIG. 4 , the artist picture or album picture above the song 1, and the picture of “nostalgic classics” displayed above the song list 1 for identifying the list, etc.
  • the content sensor of the mobile phone 102 can generalize the meaning of the picture by obtaining the description word of each picture, and provide the user with a recommendation instruction.
• when the mobile phone 102 obtains a picture with the description word "Zhang XX's song 1", the voice instruction recommended to the user displayed in the wake-up window 403-1 may include a recommended instruction such as "play Zhang XX's song 1".
• the song list 1 may include a plurality of songs, and "song list 1" can be classified as a list control.
• the next-level interface entered may present to the user the multiple songs included in song list 1, without starting to play the music in song list 1.
  • a switch control can be understood as a control with a switch function on the interface.
• the play button 20 and the pause button 30 can be classified as switch controls.
  • the mobile phone 102 can obtain the controls on the current display interface 403, and further according to the obtained control types, descriptors and other information, determine the recommended instructions displayed in the wake-up window 403-1 to the user .
  • the embodiments of the present application may include more controls than the five types of controls listed above, and the embodiments of the present application will not exemplify them one by one.
  • some controls may be divided into multiple control types at the same time, which is not limited in this embodiment of the present application.
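• as a sketch, the five categories above could be modeled as follows; the classification by widget class name is an assumption for illustration, since the actual recognition is done by the content sensor.

```java
/** The five control categories named above. */
enum ControlType { TEXT, WEB, PICTURE, LIST, SWITCH }

final class ControlClassifier {
    // Assumed heuristic: classify by the widget class name reported for the control.
    static ControlType classify(String widgetClassName) {
        if (widgetClassName.contains("TextBox") || widgetClassName.contains("DropList")
                || widgetClassName.contains("DatePicker") || widgetClassName.contains("TimePicker")) {
            return ControlType.WEB;      // text input boxes, drop-down boxes, date/time controls
        }
        if (widgetClassName.contains("Image")) {
            return ControlType.PICTURE;  // pictures with description words
        }
        if (widgetClassName.contains("List") || widgetClassName.contains("Recycler")) {
            return ControlType.LIST;     // e.g. a song list containing multiple songs
        }
        if (widgetClassName.contains("Switch") || widgetClassName.contains("Button")) {
            return ControlType.SWITCH;   // controls with a switch function, e.g. play/pause
        }
        return ControlType.TEXT;         // default: plain text control
    }
}
```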
• Table 1 lists the controls on several common audio application pages. As shown in Table 1 below, for audio applications commonly used by users, such as NetEase Cloud Music, Kugou Music, Huawei Music, Himalaya, Baby Bus Story, and Xiaobanlong Children's Songs, different pages may include different controls, and the number and types of controls included in the first-level and second-level pages of each application are different.
  • the first-level page can be understood as the main interface of NetEase Cloud Music entered after the user clicks the NetEase Cloud Music application icon, including "daily recommendation", “My favorite music”, “Local Music”, “Private FM” and other page content
  • the second-level interface is the next-level page that the user clicks on any menu or control on the main interface of the NetEase Cloud Music to enter, such as the playlist page, play page, etc.
  • the page content on each page can be acquired by the mobile phone, and the text information included in each control can be recognized, which will not be repeated here.
  • FIG. 5 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application.
  • FIG. 5 shows that the screen display system of the in-vehicle device 103 displays an interface 501 currently output.
  • song 1, song 2, song 3, song 4, and song 5 are all in the state of being paused, and the pause button 30 is displayed.
• the user clicks the voice ball icon 10 to enable the voice monitoring function of the in-vehicle device 103, and the display screen of the in-vehicle device 103 displays the interface 502 shown in (b) of FIG. 5.
  • the wake-up window 502-1 may include the voice recognition icon 40 and recommended voice commands.
  • the recommended voice commands may be "start playing" and "next page", and so on.
• an interface 503 as shown in (c) of FIG. 5 can be displayed, which is the interface after the click operation on song list 1 is performed.
• the interface 503 may include the following controls: return to the previous level, song list 1—classic nostalgia, play all, and the names of song 6 and many other songs included in song list 1.
  • the wake-up window is always suspended and displayed on the display screen of the in-vehicle device 103 .
  • the interface 503 shown in (c) of FIG. 5 includes a wake-up window 503-1.
  • the instructions recommended to the user in the wake-up window 503-1 may be changed according to the controls included in the current interface 503, for example, voice instructions such as “play all” and “next page” are displayed.
  • an interface 504 can be displayed, which is the interface after the click operation on the "play all” control is performed.
  • the “play all” control is displayed as the playing state, and starts playing from the first song (song 6) arranged in the song list 1 , a sound icon 50 is displayed at the location of the first song, which is used to identify the source of the sound as song 6, that is, song 6 is the song currently being played, which is not limited in this embodiment of the present application.
• with the method for human-computer interaction provided by the embodiment of the present application, by obtaining the controls displayed on the interface that are visible and can be clicked by the user, the user can input a voice command to perform a click or other operation on any control on the interface. All applications and all visible content on the display can be controlled by the user with voice commands. In particular, in the driving scene, the manual operation of the user is reduced, thereby avoiding user distraction and improving the safety of the user in the driving scene.
  • FIG. 6 is a schematic interface diagram of another example of a process of implementing voice interaction on an in-vehicle device provided by an embodiment of the present application.
  • FIG. 6 shows that the screen display system of the in-vehicle device 103 displays an interface 601 currently output.
  • song 1, song 2, song 3, song 4, and song 5 are all in the state of being paused, and the pause button 30 is displayed.
  • the wake-up window 602-1 is shown in the figure.
  • the wake-up window 602-1 may include the voice recognition icon 40, recommended voice commands such as "start playing" and "next page”.
• the mobile phone 102 can obtain the content of the interface, determine whether the interface includes icons (or pictures), and add digital corner marks to the icons in a certain order.
• the user's operation of enabling the voice monitoring function of the in-vehicle device 103 may trigger the addition of digital corner marks to the icons in the interface; or, before the user enables the voice monitoring function of the in-vehicle device 103, other preset operations may be used to trigger the addition of digital corner marks to the icons in the interface, which is not limited in this embodiment of the present application.
  • the icons with added digital corner marks mentioned in the embodiments of the present application may also include application icons of different applications.
• for example, the home icon, navigation icon, phone icon, music icon, etc. in the navigation menu area of the interface 601 shown in (a) of FIG. 6.
• the icons to which digital corner marks are added, as mentioned in the embodiment of the present application, may also include pictures displayed on the interface 601.
• for example, the singer picture of song 1, the picture of song list 1, etc.; the pictures included on the interface are marked with digital corner marks. When a song or a song list in the content area of the interface 601 is displayed in a foreign language, the user may not be able to accurately issue a voice command including the song name. By marking the pictures of different songs or song lists with digital corner marks, the user can perform operations through voice commands containing the digital corner marks, which is convenient and quick, and improves user experience.
• the icons to which digital corner marks are added, as mentioned in the embodiment of the present application, may also include controls such as buttons displayed on the interface 601, which is not limited in the embodiment of the present application.
• the display size of the digital corner mark can be adapted to the size of the application icon displayed on the interface of the in-vehicle device 103.
• if the application icon on the interface is small, adding a digital corner mark may cause the mark to be too small for the user to read accurately. Therefore, when an application icon is small, for example when the pixels it occupies on the display screen of the in-vehicle device are less than or equal to a preset number of pixels, that application icon may not be marked; only application icons larger than the preset pixels are marked, which is not limited in this embodiment of the present application.
• when the mobile phone 102 acquires the interface 601 shown in (a) of FIG. 6, it determines that the interface 601 includes icons of different songs and icons of different song lists.
• the mobile phone 102 can add a digital corner mark 60 as shown in (b) of FIG. 6 to each icon according to the arrangement order of the icons on the interface, from left to right and from top to bottom; for example, add digital corner mark 1 to song 1, digital corner mark 2 to song 2, and so on, until a digital corner mark 60 has been added to all the icons on the music interface.
• in the above process, the content sensor of the mobile phone 102 obtains the content on the interface, the HiCar application 15 installed on the mobile phone 102 obtains the interface content from the content sensor, and the HiCar application 15 can judge whether the current interface has icons according to the interface content.
• if there are icons, a digital corner mark 60 is added to each icon in a certain order, which is not limited in this embodiment of the present application.
• the user can input a voice command including the digital corner mark, and use the voice command to perform a click operation on the picture marked with that digital corner mark. Exemplarily, as shown in (b) of FIG. 6, after the pictures included on the interface of the in-vehicle device 103 are marked with digital corner marks, the user can input a voice command containing the corresponding number, such as "1" or "play 1". In response to the voice command input by the user, the mobile phone 102 can perform a click operation on song 1 marked as 1 in the background, and display the interface 603 shown in (c) of FIG. 6; the pause button 30 of song 1 changes to the play button 20, and the in-vehicle device 103 starts to play song 1.
• the user speaks the corresponding number, such as the number 1, through a voice command; the Smart Voice 13 of the application layer recognizes the user's voice command and converts it into the text "1".
• the content sensor of the application framework layer extracts the content of the current interface, analyzes the visible control content, and obtains the text information of the controls; the recognized control information is then matched with the text "1" returned by Smart Voice.
• after the match succeeds, the click operation is performed on the icon of song 1, and the click event on the icon of song 1 is transmitted to the business logic of the music application itself, so as to realize the corresponding business logic jump.
• the HiCar application 15 then ends this round of voice recognition and exits the voice recognition function of Smart Voice; the voice ball icon 10 returns to the static state shown in (c) of FIG. 6, and the wake-up window 602-1 and the recommended voice commands disappear.
• in the process of adding digital corner marks, the digital corner marks may be added to some of the one or more controls on the current interface according to certain principles, rather than to all controls.
• the said part of the controls may include: controls identified as picture type among the one or more controls of the current interface; or controls identified as having a grid-type arrangement order; or controls identified as having a list-type arrangement order; or controls whose display size is identified as greater than or equal to a preset value.
• the digital corner marks may be added to some of the one or more controls according to a preset sequence.
  • the preset order includes a left-to-right and/or a top-to-bottom order.
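• a minimal sketch of this numbering rule follows: icons are sorted top-to-bottom and then left-to-right and numbered consecutively, skipping icons at or below a preset pixel size (whose marks would be unreadable). The control model is simplified for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class CornerMarkAssigner {
    static final class IconControl {
        final int x, y;     // top-left position on screen
        final int sizePx;   // rough display size in pixels
        int cornerMark;     // 0 = no mark assigned
        IconControl(int x, int y, int sizePx) { this.x = x; this.y = y; this.sizePx = sizePx; }
    }

    /**
     * Numbers icons left-to-right, top-to-bottom, skipping icons whose display
     * size is at or below the preset pixel threshold.
     */
    static List<IconControl> assignMarks(List<IconControl> icons, int presetPx) {
        List<IconControl> marked = new ArrayList<>();
        icons.sort(Comparator.comparingInt((IconControl c) -> c.y).thenComparingInt(c -> c.x));
        int next = 1;
        for (IconControl icon : icons) {
            if (icon.sizePx > presetPx) {   // too-small icons are left unmarked
                icon.cornerMark = next++;
                marked.add(icon);
            }
        }
        return marked;
    }
}
```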
• the outlines of the icon controls can be acquired, and the outline keywords describing the icon controls can be determined according to the outlines; the matching degree between one or more keywords included in the voice instruction and the outline keywords of each icon control is determined, and the icon control with the largest matching degree is determined as the target control.
• for example, on the music playing interface, when the user inputs the voice command "I like", during the matching process between the voice command and the controls included in the interface, if the description words of the favorite button on the music playing interface are "like" and "favorite", and the outline of the favorite button is the shape of a "peach heart", then the peach-heart shape can be matched with "I like"; this method can generalize the user's voice command and match the user's command with the controls on the interface more intelligently.
• strong matching is given priority, that is, the control information and the voice command text recognized by the smart voice need to correspond one-to-one. If the strong match is unsuccessful, a weak match is performed, that is, it is judged whether the control information contains the voice command text of the intelligent voice recognition; as long as it contains part of the voice command text, the match is judged successful, and a click operation is performed on the control corresponding to the control information, as sketched below.
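• the two-stage rule can be sketched as follows, assuming the control information has already been reduced to plain text strings.

```java
import java.util.List;

public final class ControlMatcher {
    /**
     * Strong match first: control text and recognized command text correspond one-to-one.
     * Only if no strong match exists, weak match: the control text merely has to
     * contain part of the recognized command text (or vice versa).
     */
    static String match(List<String> controlTexts, String commandText) {
        for (String text : controlTexts) {   // strong match: exact correspondence
            if (text.equals(commandText)) {
                return text;
            }
        }
        for (String text : controlTexts) {   // weak match: partial containment
            if (text.contains(commandText) || commandText.contains(text)) {
                return text;
            }
        }
        return null;                         // no control matched this command
    }
}
```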
• a digital corner mark is added to the clickable controls such as pictures and application icons displayed on the interface, and the user can issue a voice command including the number to perform a click or other operation on the control marked with that digital corner mark.
• when the user sees the digital corner marks on the interface, he sends out a voice command including a number; the voice command is converted through voice recognition, so as to determine the picture, application icon, or other control corresponding to the number, and the click operation is executed.
• the user does not need to memorize a variety of complex voice commands, and can realize the voice interaction process merely through digital voice commands, which is simpler and more convenient, reduces the difficulty of voice interaction, and improves user experience.
  • FIG. 7 is a schematic interface diagram of another example of a voice interaction process implemented on a vehicle-mounted device provided by an embodiment of the present application.
  • the navigation menu area 401 of the screen display system displays navigation menus such as home page, navigation, phone and music, and switching between different navigation menus can also be controlled by the user's voice commands.
  • the process of jumping from the music interface shown in (c) of FIG. 6 to the navigation interface from the screen display interface of the in-vehicle device 103 can also be implemented by voice commands.
  • the screen display system of the in-vehicle device 103 displays the currently output interface 701 .
• song 1 is displayed in the playing state, while song 2, song 3, song 4, and song 5 are all in the paused state, with a pause button 30 displayed on each.
  • the wake-up window 702-1 is shown in the figure.
  • the wake-up window 702-1 may include the voice recognition icon 40, recommended voice commands such as "start search" and "next page”.
• the voice commands recommended in the wake-up window 702-1 may be different from those displayed in the wake-up window 403-1 shown in (b) of FIG. 4, the wake-up window 502-1 shown in (b) of FIG. 5, and the wake-up window 503-1 shown in (c) of FIG. 5. The recommended voice commands displayed in the wake-up window can change correspondingly with the display content on the current interface, displaying voice commands related to the displayed content on the current interface; voice commands not related to the displayed content on the current interface may also be displayed, which is not limited in this embodiment of the present application.
  • the voice instruction can be sent to the mobile phone 102.
• the mobile phone 102 recognizes the user's voice command and, in response to the voice command, enables the voice interaction function between the in-vehicle device 103 and the user, that is, the in-vehicle device 103 is always in the state of monitoring voice commands, and the user does not need to repeatedly activate the in-vehicle device 103 to monitor and obtain voice commands.
  • the display interface of the display screen of the in-vehicle device 103 can jump from the music menu to the interface 703 of the navigation menu.
  • the user can be provided with various types of search options including "food”, “gas station”, “shopping mall”, etc. shown in the right area.
  • the interface content of the interface 703 of the navigation menu is not repeated here.
• the wake-up window may disappear briefly while the user's voice instruction is monitored in the background; when a voice command issued by the user is detected again, the wake-up window can be suspended and displayed on the display screen again.
  • the user starts to issue a voice command, and the wake-up window 704-1 appears.
  • the recommended instruction displayed in the wake-up window 704-1 may be adapted to the current interface content, or the recommended instruction may be associated with historical data with the highest search frequency when the user uses the navigation application.
  • the wake-up window 704-1 may include the voice recognition icon 40, and recommended voice commands such as “navigate to the company” and “navigate to the mall”, which are not limited in this embodiment of the present application.
• when the user inputs the voice command "search for food", after the in-vehicle device 103 obtains the user's command, it can send the voice command to the mobile phone 102; the mobile phone 102 recognizes the user's voice command and, in response to the voice command, simulates clicking the "food" option on the interface 704 shown in (d) of FIG. 7, and the search result interface 705 shown in (e) of FIG. 7 is displayed for the user.
• multiple searched restaurants are displayed on the interface 705; the restaurants can be sorted according to the distance from the user's current location, and the per capita unit price and distance of each restaurant are displayed for the user, which is not limited in this embodiment of the present application.
• the recommended instruction displayed in the wake-up window 705-1 on the interface 705 can be re-adapted to the current interface content; the wake-up window 705-1 can include the voice recognition icon 40 and recommended voice commands such as "start search" and "next page", which are not limited in this embodiment of the present application.
  • a search result interface 706 as shown in (f) of FIG. 7 is displayed for the user. It should be understood that the interface 706 is an interface displayed after performing a swipe on the interface 705 as indicated by the black arrow.
• when the user selects the target restaurant "5.XX light food restaurant", he can continue to input the voice command "navigate to 5", and the navigation route interface 707 shown in (g) of FIG. 7 can be displayed for the user; the interface 707 includes the route and distance to the 5.XX light food restaurant, etc., which is not limited in this embodiment of the present application.
  • the user's voice command includes "5".
  • the voice command can be sent to the mobile phone 102.
• the mobile phone 102 may perform interface matching according to the instruction, that is, extract the keywords in the instruction and match them with the keywords or description information contained in all the controls on the current interface.
• the keywords of the user instruction are "navigation" and "5", and the mobile phone detects that the keywords of the option "5.XX light food restaurant" on the interface are "5", "light food restaurant", etc.; the matching degree between the user instruction and this option is the highest, so a click on the "5.XX light food restaurant" option on the interface 706 is executed, and the interface 707 shown in (g) of FIG. 7 is displayed. A sketch of this keyword matching follows.
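• the keyword matching described above can be sketched as a simple overlap score; with the instruction keywords "navigation" and "5", the option described by "5" and "light food restaurant" scores highest and is clicked. This is an illustrative scoring rule, not the exact matching algorithm of the present application.

```java
import java.util.List;

public final class KeywordMatcher {
    /** Counts how many instruction keywords appear among a control's keywords. */
    static int score(List<String> instructionKeywords, List<String> controlKeywords) {
        int hits = 0;
        for (String kw : instructionKeywords) {
            if (controlKeywords.contains(kw)) {
                hits++;
            }
        }
        return hits;
    }

    /** Returns the index of the control with the highest keyword overlap, or -1 if none match. */
    static int bestMatch(List<String> instructionKeywords, List<List<String>> controls) {
        int bestIndex = -1;
        int bestScore = 0;
        for (int i = 0; i < controls.size(); i++) {
            int s = score(instructionKeywords, controls.get(i));
            if (s > bestScore) {
                bestScore = s;
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```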
• the above method obtains the text controls, picture controls, buttons, and icon controls that are visible on the interface and can be clicked by the user, then matches the target control on the interface according to the obtained user voice command, and performs the corresponding operation on the matched target control.
• Table 2 shows several common controls on pages of navigation applications. As shown in Table 2 below, for navigation applications such as Baidu Map and AutoNavi Map that are commonly used by users, different pages may include different controls, and the number and types of controls included in the first-level and second-level pages of each application are all different.
  • the first-level page can be understood as the main interface of Baidu map entered by the user after clicking the Baidu map application icon, including "zoom in”, “zoom out”, “positioning”, “road conditions”, “Search”, “More”, “Exit” and other controls
  • the second-level interface is the next-level page that the user clicks on any menu or control on the main interface of Baidu Maps to enter, such as the route preference setting page, etc.
  • the page content and controls on each page can be acquired by the mobile phone, and the text information included in each control can be recognized, which will not be repeated here.
  • the general instruction controls may include controls on the interface, such as return, turn left/turn right, turn up/down, page up/page down, and the like.
• the recognized text is sent to Visible to Speak.
• the click event (key event) of the return key is sent to the application to which the current interface belongs; by monitoring the return key event, that application receives the corresponding return event and processes the return service.
• for sliding operations, the corresponding sliding list control is identified from the interface controls returned by the content sensor.
• the sliding method of the control itself, such as the scrollBy method of RecyclerView, is called to implement up and down sliding.
• for left and right sliding, it depends on whether the control itself supports the left and right sliding feature.
• if the control supports left and right sliding, the distance moved in the horizontal direction is passed into the scrollBy method when it is called, and positive and negative values are used to judge sliding left or right.
• if the control supports sliding up and down, the distance moved in the vertical direction is passed in, and positive and negative values are used to determine whether to slide up or down.
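• a minimal sketch of voice-driven sliding on this basis follows; the fixed step distance is an assumption (a real implementation might derive it from the page size), while RecyclerView.scrollBy itself is the control's own sliding method mentioned above.

```java
import androidx.recyclerview.widget.RecyclerView;

public final class SlideExecutor {
    // Assumed distance per voice-triggered slide, in pixels.
    private static final int STEP_PX = 600;

    /** Slides a list control up/down or left/right via its own scrollBy method. */
    static void slide(RecyclerView list, String direction) {
        switch (direction) {
            case "up":    list.scrollBy(0, -STEP_PX); break; // negative y slides toward the top
            case "down":  list.scrollBy(0,  STEP_PX); break; // positive y slides downward
            case "left":  list.scrollBy(-STEP_PX, 0); break; // horizontal values only take effect
            case "right": list.scrollBy( STEP_PX, 0); break; // if the control supports it
            default: throw new IllegalArgumentException("unknown direction: " + direction);
        }
    }
}
```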
  • FIG. 8 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application. Switching between different navigation menus can also be controlled by a user's voice command.
  • FIG. 8 shows the process that the screen display interface of the in-vehicle device 103 jumps from the navigation route interface shown in (g) in FIG. 7 to the phone menu, and this process can also be implemented by the user's voice command.
  • the screen display system of the in-vehicle device 103 displays the currently output navigation route interface 801 .
  • the wake-up window on the interface 801 displays the voice recognition icon 40, recommended voice commands such as "exit navigation" and "search”.
  • a phone application interface 802 as shown in (b) in FIG. 8 is displayed for the user.
• the interface 802 may include submenus such as call records, contacts, and dialing; the interface 802 currently displays content such as the user's call records, which will not be repeated here.
  • the user can input voice commands to perform operations such as clicking on any control on the interface.
  • All apps and all visible content on the display can be controlled by the user with voice commands.
  • the manual operation of the user is reduced, thereby avoiding user distraction and improving the safety of the user in the driving scene.
  • FIG. 9 is a schematic flowchart of a method for voice interaction provided by an embodiment of the present application. As shown in FIG. 9 , the method 900 may include the following steps:
  • the user opens the first application.
  • the first application may be an application actually running on the side of the mobile phone 102 , for example, an application running in the foreground or an application running in the background of the mobile phone 102 .
• this step 901 can be performed by the user on the side of the in-vehicle device 103 and transmitted back to the mobile phone 102 by the in-vehicle device 103 to start the first application in the background of the mobile phone 102; or the user can perform it on the side of the mobile phone 102, and the screen is directly projected and displayed on the display screen of the in-vehicle device 103, which is not limited in this embodiment of the present application.
  • the first application performs interface refresh.
  • performing interface refresh by the first application may trigger the mobile phone 102 to perform interface identification through an algorithm service.
• the mobile phone 102 performs interface hot word recognition to obtain the information of the interface content. It should be understood that the time delay of the interface hot word recognition process in this step 904 is less than 500 milliseconds.
  • the interface content may include user-visible portions of the currently displayed interface.
  • the user-visible part may include pictures, text, menus, options, icons, buttons, etc. displayed on the interface, which are collectively referred to as "controls" and the like in this embodiment of the present application.
  • an operation may be performed on the target control.
• the operation may include input operations such as clicking, double-clicking, sliding, and right-clicking.
  • the voice command is matched with the target control on the interface, that is, the user's intention is recognized, and the click operation on the target control is further performed.
  • the user activates the voice recognition function.
• starting the voice recognition function may be starting the in-vehicle device 103 to begin monitoring the user's voice command; the acquired command is transmitted back to the mobile phone 102, and the mobile phone 102 analyzes the voice command, etc., which is not limited in this embodiment of the present application.
  • the user can activate the voice recognition function through a physical button of the vehicle-mounted device or through voice.
• the display interface of the in-vehicle device 103 may also include a voice ball icon, as shown in (a) of FIG. 4, and the user can click the voice ball icon to enable the voice monitoring function.
  • the in-vehicle device 103 may display a wake-up window 403-1 as shown in (b) of FIG. 4 , which will not be repeated here.
  • the user can also turn on the voice monitoring function by pressing the car control voice button of the car, for example, the user presses the car control voice button 50 on the steering wheel as shown in (b) in FIG. 1 to turn on the voice monitoring function.
  • the in-vehicle device 103 has a function of monitoring and acquiring a user's voice command, which is not limited in this embodiment of the present application.
  • the HiCar application of the mobile phone transmits the acquired information of the interface content to the smart voice service module.
• the smart voice service module may correspond to a smart voice application installed on the mobile phone 102; that is, the service process shown in FIG. 9 is executed by the smart voice application of the mobile phone 102.
  • the service corresponding to the smart voice service module may be provided by the server, and this scenario may correspond to (c) in FIG. 1.
• with the help of the voice analysis capability of the server 104, the mobile phone 102 sends the user's voice command to the server 104, and after the server 104 analyzes the voice command, it returns the recognition result of the voice command to the mobile phone 102, which will not be repeated here.
  • the user inputs a voice command.
  • the mobile phone sends the voice command to the smart voice service module.
• the process of steps 909 and 910 may be that the user inputs a voice command on the side of the in-vehicle device 103; after the microphone of the in-vehicle device 103 obtains the user's voice command, the voice command is sent to the HiCar application of the mobile phone, and then passed via the HiCar application of the mobile phone to the smart voice service module, which analyzes the user's voice command.
  • the smart voice service module transmits the acquired information of the user's voice command and interface content to the ASR module.
  • the ASR module enhances and recognizes the user's voice instruction according to the information of the interface content.
  • the acquired information of the currently displayed interface content can be transferred to the ASR module in synchronization with step 908, that is, the information of the interface content is entered in the ASR model.
  • the user's voice command is recognized according to the updated ASR model.
  • the user's voice command may include homophones; for example, the user inputs "variety show" (综艺, zong yi), and influenced by different users' pronunciations, the ASR module's analysis may produce candidates based on pinyin such as "zong yi" and "zhong yi".
  • such homophones and words with similar pinyin may cause the mobile phone to fail to accurately obtain the user's operation intention through the user's voice command.
  • according to the current interface content information of the in-vehicle device 103, for example when the current interface displays a large amount of audio information, star photos, video information, and so on, the ASR module will, when it analyzes the command, select from the possible recognition results such as "Zhongyi" (中意), "Traditional Chinese Medicine" (中医), "loyalty" (忠义), and "variety show" (综艺) the result most relevant to the currently displayed audio information, star photos, and video information, namely "variety show", and then determine that the voice command issued by the current user is "variety show".
  • through this updated algorithm, information about the currently displayed interface content is introduced into the recognition process of the existing ASR module, so that the usage scenario of the user's current voice command can be accurately analyzed according to the currently displayed interface content; the application scenario targeted by the current voice command can then be located precisely, improving the accuracy of recognizing the voice command.
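For illustration only, the following Python sketch shows one way such interface-aware rescoring of ASR candidates could look. The candidate list, the screen-label set, the character-overlap bonus, and the function names are assumptions made for the example, not the embodiment's actual algorithm.

    def rescore_candidates(candidates, screen_labels, acoustic_scores):
        # Pick the ASR candidate that best matches what is on screen.
        def context_bonus(cand):
            # Character-overlap similarity with any visible label, a simple
            # proxy for "relevance to the displayed interface content".
            if not screen_labels:
                return 0.0
            return max(
                len(set(cand) & set(label)) / max(len(set(label)), 1)
                for label in screen_labels
            )
        return max(
            candidates,
            key=lambda c: acoustic_scores.get(c, 0.0) + context_bonus(c),
        )

    # e.g. rescore_candidates(["中意", "中医", "综艺"],
    #                         ["综艺", "音乐", "视频"],
    #                         {"中意": 0.4, "中医": 0.35, "综艺": 0.3}) -> "综艺"

Under these assumptions, the homophone with the highest acoustic score alone ("中意") loses to the candidate that also appears on screen ("综艺").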
  • the ASR module returns the analyzed voice command text to the HiCar application of the mobile phone.
  • the HiCar application of the mobile phone sends a voice command text to the algorithm service module.
  • the mobile phone uses a certain algorithm service to match the text of the voice command with the information of the current interface content to determine the matching result.
  • the smart voice service module can perform steps 914-1 to 919-1 shown by the dotted box in FIG. 9:
  • the NLU module of the smart voice service can also obtain the text of the voice command.
  • the NLU module of the smart voice service performs intention recognition according to the voice command text, and determines the user's intention corresponding to the voice command text.
  • the DM module may process the returned user intent and determine the user intent of the user's voice command.
  • the smart voice service module returns the user intent to the HiCar application of the mobile phone.
  • steps 914-1 to 919-1 shown in the dotted box may be optional steps; this process can be understood as accurately analyzing the user's intention with the help of a powerful speech recognition capability, such as a server's, and then responding on the mobile phone side.
  • the HiCar application of the mobile phone determines whether to execute the operation corresponding to the user's intent, which improves the accuracy of voice command recognition.
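A rough sketch of these optional NLU/DM steps is shown below; the intent schema and the substring-based slot filling are illustrative assumptions, not the actual modules' behavior.

    def nlu_recognize(text, screen_labels):
        # Map recognized text to a structured intent: click a named on-screen control.
        for label in screen_labels:
            if label in text:
                return {"intent": "CLICK_CONTROL", "slot": label}
        return {"intent": "UNKNOWN", "slot": None}

    def dm_decide(intent):
        # Dialog management: execute the intent, or end the current conversation.
        return intent["intent"] != "UNKNOWN"

    # e.g. dm_decide(nlu_recognize("play variety show", ["variety show"])) -> True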
  • the above process can be understood as a process of determining the user's intention according to the voice command after acquiring the user's voice command, for example, determining which control on the current interface is to be clicked by the voice command currently input by the user.
  • the matching degree between the voice command and each of the one or more controls is determined, and the control with the highest matching degree is determined as the target control on which the user intends to perform the click operation.
  • one or more keywords contained in the user's voice command may be extracted, and the matching degree between each of the one or more keywords and the description information of each control may be determined; the control with the highest matching degree is determined as the target control.
  • the keywords may include characters, words, and the pinyin of some or all of the Chinese characters of the voice command, which is not limited in this embodiment of the present application.
  • the description information of each control may include outline information, text information, color information, position information, icon information, etc. of the control, which is not limited in this embodiment of the present application.
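As a minimal sketch of this keyword-to-control matching step, assuming each control carries a visible label plus description words (the dataclass fields and the hit-count scoring are invented for the example):

    from dataclasses import dataclass, field

    @dataclass
    class Control:
        text: str                                      # visible label, e.g. "收藏"
        keywords: list = field(default_factory=list)   # description words, e.g. ["喜欢", "收藏"]

    def match_target_control(voice_keywords, controls):
        # Count how many spoken keywords hit each control's description
        # vocabulary; the control with the highest non-zero count is the target.
        if not controls:
            return None
        def score(c):
            vocab = {c.text, *c.keywords}
            return sum(1 for kw in voice_keywords if kw in vocab)
        best = max(controls, key=score)
        return best if score(best) > 0 else None

    # e.g. match_target_control(["喜欢"], [Control("收藏", ["喜欢", "收藏"])])
    # matches the favorite button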
  • the HiCar application of the mobile phone determines whether to execute the operation corresponding to the user's intention.
  • the smart voice service module ends the current conversation according to the notification message that the user instruction is not executed.
  • the method obtains the controls displayed on the interface that are visible and can be clicked by the user, so that the user can input voice commands to perform operations such as clicking on any control on the interface. All applications and all visible content on the display can be controlled by the user with voice commands.
  • the acquired interface content information of the current interface is used as a parameter of the ASR analysis; that is, according to the interface content information of the current interface, the application scenario in which the current user's voice command may occur is accurately analyzed.
  • the text of the recognized voice command is then matched against the controls in the current possible application scenario, so as to obtain the user's intention more accurately and improve the accuracy of speech recognition in voice interaction scenarios.
  • the embodiment of the present application can analyze the user's voice command in combination with the current application scenario where the user's voice command may occur.
  • the accuracy of speech recognition is improved, which reduces the user's manual operations, avoids user distraction, and improves user safety in driving scenarios.
  • the electronic device includes corresponding hardware and/or software modules for executing each function.
  • the present application can be implemented in hardware, or in a combination of hardware and computer software, in conjunction with the algorithm steps of the examples described in the embodiments disclosed herein. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functionality for each particular application, but such implementations should not be considered beyond the scope of this application.
  • the electronic device can be divided into functional modules according to the above method examples.
  • each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and is only a logical function division; there may be other division manners in actual implementation.
  • the electronic device involved in the above embodiment may include: a display unit, a detection unit, and a processing unit.
  • the display unit, the detection unit, and the processing unit cooperate with each other, and may be used to support the electronic device to perform the technical process described in the above embodiments.
  • the electronic device provided in this embodiment is used to execute the above-mentioned method for human-computer interaction, and thus can achieve the same effect as the above-mentioned implementation method.
  • the electronic device may include a processing module, a memory module and a communication module.
  • the processing module may be used to control and manage the actions of the electronic device, for example, may be used to support the electronic device to perform the steps performed by the display unit, the detection unit and the processing unit.
  • the storage module may be used to support the electronic device in storing program code, data, and the like.
  • the communication module can be used to support the communication between the electronic device and other devices.
  • the processing module may be a processor or a controller. It may implement or execute the various exemplary logical blocks, modules and circuits described in connection with this disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of digital signal processing (DSP) and a microprocessor, and the like.
  • the storage module may be a memory.
  • the communication module may specifically be a device that interacts with other electronic devices, such as a radio frequency circuit, a Bluetooth chip, and a Wi-Fi chip.
  • the electronic device involved in this embodiment may be a device having the structure shown in FIG. 2 .
  • This embodiment also provides a computer-readable storage medium, where computer instructions are stored; when the computer instructions are run on the electronic device, the electronic device executes the above-mentioned related method steps to implement the human-computer interaction method in the above-mentioned embodiments.
  • This embodiment also provides a computer program product, which when the computer program product runs on a computer, causes the computer to execute the above-mentioned relevant steps, so as to realize the method for human-computer interaction in the above-mentioned embodiment.
  • the embodiments of the present application also provide an apparatus, which may specifically be a chip, a component, or a module; the apparatus may include a processor and a memory connected to each other, where the memory is used for storing computer-executable instructions, and when the apparatus runs, the processor can execute the computer-executable instructions stored in the memory, so that the chip performs the human-computer interaction method in the foregoing method embodiments.
  • the electronic device, computer-readable storage medium, computer program product, or chip provided in this embodiment are all used to execute the corresponding method provided above; therefore, for the beneficial effects that can be achieved, reference may be made to the beneficial effects of the corresponding method provided above, which will not be repeated here.
  • the disclosed apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of modules or units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, or as indirect coupling or communication connections between devices or units, and may be in electrical, mechanical, or other forms.
  • Units described as separate components may or may not be physically separated, and components shown as units may be one physical unit or multiple physical units, that is, may be located in one place, or may be distributed in multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a readable storage medium.
  • the computer software product is stored in a readable storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Abstract

A human-computer interaction method (900), an electronic device (100), and a system, wherein the method (900) can be applied to an electronic device (100) such as a smart screen (101), or to a system comprising a mobile phone (102) and a vehicle-mounted device (103). A text control, a picture control, buttons (20, 30), an icon control, etc. that are displayed on interfaces (403, 404, 501-504, 601-603, 701-707, 801, 802), are visible, and can be clicked by a user are acquired, so that operations such as clicking on any control on the interfaces are executed according to a user speech instruction. In addition, during the process of matching a speech instruction with a control, the application scenario in which the current user speech instruction may occur is accurately analyzed in combination with the content information on the interfaces, and a control in that application scenario is matched according to the recognized speech instruction, so as to acquire the user's intention more accurately, thereby improving the speech recognition accuracy in speech interaction scenarios.

Description

A method, electronic device and system for human-computer interaction
This application claims priority to Chinese Patent Application No. 202010950650.8, entitled "A method, electronic device and system for human-computer interaction", filed with the State Intellectual Property Office on September 10, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of electronic technology, and in particular, to a method, electronic device and system for human-computer interaction.
Background
With the development of technology, more and more electronic devices support voice interaction, which has gradually become a way for users to convey intentions and control electronic devices. Controlling an electronic device through voice commands frees the user's hands and makes the device easier to operate.
In one implementation of voice interaction, different applications usually rely on independently developed voice assistants to interact with users. For example, among navigation applications, Baidu Maps can rely on its own Xiaodu for voice interaction with users, while AutoNavi Maps can interact with users through its self-developed Xiaode. In this implementation, each application relies on an independently developed voice assistant, so users experience voice interaction differently across applications.
In addition, for other applications on electronic devices, there is currently no system-level voice interaction: the voice assistant is not integrated into the applications, and the user cannot control in-application operations through voice commands. For example, audio applications such as music players, or media applications such as video players, have no ability to interact with the user by voice, and the user cannot control the execution of such applications through voice commands.
In summary, the voice assistant of an electronic device is currently separated from applications, and different applications cannot access the same voice assistant.
Summary of the Invention
Embodiments of the present application provide a human-computer interaction method, electronic device, and system. The method enables system-level voice interaction: for all applications displayed on the interface and all visible buttons, pictures, icons, text, controls, and the like, the user can perform click and other operations through voice commands, achieving precise human-computer interaction, generalizing the recognition of voice commands, and improving the accuracy of user intention recognition.
In a first aspect, a human-computer interaction method is provided. The method is applied to an electronic device and includes: during the running of a human-computer interaction application in the electronic device, acquiring current interface content information; determining, according to the interface content information, one or more controls on the interface, the one or more controls including one or more of buttons, icons, pictures, and text; acquiring a voice command of a user; matching, according to the voice command, a target control from the one or more controls; and determining, according to the voice command, a user intention, and in response to the user intention, performing an operation on the target control.
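As a rough, non-limiting illustration of the order of these steps, the following self-contained Python sketch wires them together; the data structures, helper names, and the word-in-label matching rule are assumptions made for the example, not part of the claimed method.

    def extract_controls(interface_content):
        # Determine the controls (buttons, icons, pictures, text) on the interface.
        return [c for c in interface_content
                if c.get("kind") in ("button", "icon", "picture", "text")]

    def match_control(command, controls):
        # Score each control by how many words of the command appear in its label.
        scored = [(sum(w in c["label"] for w in command.split()), c) for c in controls]
        if not scored:
            return None
        best_score, best = max(scored, key=lambda s: s[0])
        return best if best_score > 0 else None

    def handle_voice_interaction(interface_content, command):
        controls = extract_controls(interface_content)   # interface content -> controls
        target = match_control(command, controls)        # voice command -> target control
        return ("click", target["label"]) if target else ("no-op", None)

    # Example: a visible "play" button and the spoken command "play music"
    print(handle_voice_interaction([{"kind": "button", "label": "play"}], "play music"))
    # -> ('click', 'play')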
Optionally, the interface content may include the user-visible part of the currently displayed interface. Exemplarily, the user-visible part may include pictures, text, menus, options, icons, buttons, and the like displayed on the interface, which are collectively referred to as "controls" in the embodiments of the present application.
It should be understood that, in the embodiments of the present application, when the user's voice command is matched to a target control on the interface, an operation may be performed on the target control. Optionally, the operation may include input operations such as single-clicking, tapping, double-clicking, sliding, and right-clicking.
It should also be understood that, in the embodiments of the present application, after the user's voice command is acquired and parsed, the command is matched against the target control on the interface; that is, the user's intention is recognized, and a click operation on the target control is then performed.
Through the above implementation, the method obtains the controls that are displayed on the interface, visible, and clickable by the user, so that the user can input voice commands to perform operations such as clicking on any control on the interface. All applications and all visible content on the display can be controlled by the user's voice commands.
Specifically, in the process of analyzing the user's voice command, the acquired interface content information of the current interface is used as a parameter of the ASR analysis; that is, the application scenario in which the current voice command may occur is accurately inferred from the interface content information of the current interface. After the user's voice command is recognized, the recognized text is matched against the controls in the possible current application scenario, so as to obtain the user's intention more accurately and improve the accuracy of speech recognition in voice interaction scenarios.
In particular, in driving scenarios and in noisy environments such as in-vehicle devices, when the voice command input by the user is accompanied by noise, the embodiments of the present application can analyze the command in combination with the application scenario in which it is likely to occur, improving the accuracy of speech recognition, reducing the user's manual operations, avoiding distraction, and improving safety in driving scenarios.
In addition, for voice interaction scenarios, different applications do not need to develop voice assistants separately; in other words, the user can control multiple different applications through the same voice interaction, the voice assistant is no longer separated from the applications, and the application ecosystem is enriched.
In summary, based on recognition and analysis of the user's voice commands, the text controls, pictures, text, and icons included in the interface are identified, and the user's voice commands are matched against the on-screen controls to achieve precise human-computer interaction. The method generalizes the recognition of voice commands and improves the accuracy of both user intention recognition and ASR recognition. It also reduces the latency of voice interaction, bringing "what you see is what you can say" intention processing within 200 ms, whereas recognizing voice commands through cloud-side ASR such as a server takes about 200 ms; this greatly improves the efficiency of voice command detection and the user experience.
For consumers, all applications and all on-screen content can be controlled by the user's voice commands, which reduces distraction and improves driving safety in vehicle and travel scenarios. For developers, different applications do not need dedicated interface adaptation for voice interaction, which enriches the application ecosystem of travel scenarios, supports projection from the mobile phone to the vehicle terminal, enables migration of the application ecosystem, and increases the ecological value of HiCar.
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, matching a target control from the one or more controls according to the voice command includes: determining, according to the voice command, the matching degree between the voice command and each of the one or more controls; and determining the control with the highest matching degree as the target control.
In one possible implementation, in the embodiments of the present application, the smart voice service module may correspond to a smart voice application installed on the mobile phone side; that is, the smart voice application of the mobile phone executes the voice command recognition service process of the embodiments of the present application.
In another possible implementation, the service corresponding to the smart voice service module may be provided by a server. In this scenario, drawing on the server's voice analysis capability, the mobile phone sends the user's voice command to the server; after the server analyzes the voice command, it returns the recognition result to the mobile phone, which will not be repeated here.
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, determining the matching degree between the voice command and each of the one or more controls according to the voice command includes: extracting one or more keywords contained in the voice command; determining the matching degree between each of the one or more keywords and the description information of each of the one or more controls; and determining the control with the highest matching degree as the target control.
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, when the one or more controls include an icon control, the method further includes: acquiring the outline of the icon control, and determining, according to the outline, outline keywords describing the icon control; determining the matching degree between the one or more keywords included in the voice command and the outline keywords of the icon control; and determining the icon control with the highest matching degree as the target control.
In one possible implementation, the ASR module contains an ASR model. In the embodiments of the present application, the acquired information of the currently displayed interface content may be passed to the ASR module in synchronization with step 908; that is, the interface content information is input into the ASR model as a parameter, and the user's voice command is then recognized according to the updated ASR model.
Exemplarily, the user's voice command may include homophones. For example, when the user inputs "variety show" (综艺), influenced by different users' pronunciations, the ASR analysis of the ASR module may, based on pinyin such as "zong yi" and "zhong yi", produce possible recognition results such as "中意" (Zhongyi), "中医" (Traditional Chinese Medicine), "忠义" (loyalty), and "综艺" (variety show). Such homophones and words with similar pinyin may cause the mobile phone to fail to accurately obtain the user's operation intention through the voice command. With the embodiments of the present application, according to the current interface content information of the in-vehicle device 103, for example when the current interface displays a large amount of audio information, star photos, and video information, the ASR module will select from these possible recognition results the one most relevant to the displayed content, namely "综艺" (variety show), and thus determine that the voice command issued by the current user is "variety show".
Through this updated algorithm, information about the currently displayed interface content is introduced into the recognition process of the existing ASR module, so that the usage scenario of the user's current voice command can be accurately inferred from the displayed interface content; the application scenario targeted by the current voice command can then be located precisely, improving the accuracy of voice command recognition.
It should be understood that the above process can be understood as follows: after the user's voice command is acquired, the user's intention is determined according to the voice command, for example, determining which control on the current interface the voice command currently input by the user is meant to click.
In one possible implementation, according to the user's voice command, the matching degree between the voice command and each of the one or more controls is determined, and the control with the highest matching degree is determined as the target control on which the user intends to perform the click operation.
Optionally, in the process of determining the matching degree between the voice command and each of the one or more controls, one or more keywords contained in the user's voice command may be extracted; the matching degree between each of the one or more keywords and the description information of each of the one or more controls is determined; and the control with the highest matching degree is determined as the target control.
Optionally, the keywords may include characters, words, and the pinyin of some or all of the Chinese characters of the voice command, which is not limited in the embodiments of the present application.
Optionally, the description information of each control may include the control's outline information, text information, color information, position information, icon information, and the like, which is not limited in the embodiments of the present application.
Exemplarily, on a music playing interface, when the user inputs the voice command "I like", and the command is matched against the controls included on the interface, suppose the description words of the favorite button are "like" and "favorite" and the button's outline is a heart shape; the heart shape can then be matched to "I like". This approach generalizes the user's voice commands and matches user commands to on-screen controls more intelligently.
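By way of a hedged illustration only, the sketch below shows how such outline keywords might be consulted during matching. The outline-to-keyword table and the function name are invented for the example; a real implementation would derive the outline keywords from the control's actual shape.

    ICON_SYNONYMS = {                 # illustrative assumption, not the patent's data
        "heart": ["like", "favorite", "love"],
        "magnifier": ["search", "find"],
    }

    def match_icon(voice_keywords, icon_outline):
        # True if any spoken keyword matches the outline's descriptive words.
        outline_words = ICON_SYNONYMS.get(icon_outline, [])
        return any(kw in outline_words for kw in voice_keywords)

    # e.g. match_icon(["like"], "heart") -> True:
    # "I like" matches the heart-shaped favorite button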
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, the method further includes: when the voice command is detected, adding numeric corner labels to some of the one or more controls; and when it is detected that the voice command includes a first number, determining the control marked with the first number as the target control.
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, adding numeric corner labels to some of the one or more controls includes: adding the numeric corner labels to some of the one or more controls in a preset order, the preset order including left-to-right and/or top-to-bottom.
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, the controls to which numeric corner labels can be added include one or more of the following: controls among the one or more controls that are all of the picture type; controls among the one or more controls arranged in a grid; controls among the one or more controls arranged in a list; or controls among the one or more controls whose display size is greater than or equal to a preset value.
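A minimal sketch of this numbering step follows, under the assumption that each control exposes its position, size, and type; the field names and the size threshold are illustrative, not specified by the embodiments.

    def assign_badges(controls, min_size=48):
        # Number eligible controls in reading order: top-to-bottom, then left-to-right.
        eligible = [
            c for c in controls
            if c["kind"] == "picture" or min(c["w"], c["h"]) >= min_size
        ]
        eligible.sort(key=lambda c: (c["y"], c["x"]))
        for i, c in enumerate(eligible, start=1):
            c["badge"] = i           # the user can then say this number to click it
        return eligible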
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, the interface corresponding to the interface content information is the interface of an application running in the foreground of the electronic device, and/or the interface of an application running in the background of the electronic device.
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, the method further includes: starting the human-computer interaction application on the electronic device.
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, starting the human-computer interaction application on the electronic device includes: acquiring a preset input of the user and starting the human-computer interaction application on the electronic device, the preset input including at least one of triggering a button, a preset human-computer interaction instruction input by voice, or a preset fingerprint input.
In a second aspect, an electronic device is provided, including: one or more processors; one or more memories; a module installed with a plurality of application programs; the memories storing one or more programs that, when executed by the processors, cause the electronic device to perform the following steps: during the running of a human-computer interaction application, acquiring current interface content information; determining, according to the interface content information, one or more controls on the interface, the one or more controls including one or more of buttons, icons, pictures, and text; acquiring a voice command of the user; matching, according to the voice command, a target control from the one or more controls; and determining, according to the voice command, a user intention, and in response to the user intention, performing an operation on the target control.
With reference to the second aspect, in some implementations of the second aspect, when the one or more programs are executed by the processors, the electronic device is caused to perform the following steps: determining, according to the voice command, the matching degree between the voice command and each of the one or more controls; and determining the control with the highest matching degree as the target control.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, when the one or more programs are executed by the processors, the electronic device is caused to perform the following steps: extracting one or more keywords contained in the voice command; determining the matching degree between each of the one or more keywords and the description information of each of the one or more controls; and determining the control with the highest matching degree as the target control.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, the one or more controls include an icon control, and when the one or more programs are executed by the processors, the electronic device is caused to perform the following steps: acquiring the outline of the icon control, and determining, according to the outline, outline keywords describing the icon control; determining the matching degree between the one or more keywords included in the voice command and the outline keywords of the icon control; and determining the icon control with the highest matching degree as the target control.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, when the one or more programs are executed by the processors, the electronic device is caused to perform the following steps: when the voice command is detected, adding numeric corner labels to some of the one or more controls; and when it is detected that the voice command includes a first number, determining the control marked with the first number as the target control.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, when the one or more programs are executed by the processors, the electronic device is caused to perform the following step: adding the numeric corner labels to some of the one or more controls in a preset order, the preset order including left-to-right and/or top-to-bottom.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, the controls to which numeric corner labels can be added include one or more of the following: controls among the one or more controls that are all of the picture type; controls among the one or more controls arranged in a grid; controls among the one or more controls arranged in a list; or controls among the one or more controls whose display size is greater than or equal to a preset value.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, the interface corresponding to the interface content information is the interface of an application running in the foreground of the electronic device, and/or the interface of an application running in the background of the electronic device.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, when the one or more programs are executed by the processors, the electronic device is caused to perform the following step: starting the human-computer interaction application.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, when the one or more programs are executed by the processors, the electronic device is caused to perform the following steps: acquiring a preset input of the user and starting the human-computer interaction application, the preset input including at least one of triggering a button, a preset human-computer interaction instruction input by voice, or a preset fingerprint input.
In a third aspect, the present application provides a system including a connected electronic device and a display device. The electronic device can perform any one of the possible human-computer interaction methods of the first aspect, and the display device is configured to display the application interface of the electronic device.
In a fourth aspect, the present application provides an apparatus included in an electronic device, the apparatus having functions for implementing the behavior of the electronic device in the foregoing aspect and the possible implementations of the foregoing aspect. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the foregoing functions, for example, a display module or unit, a detection module or unit, a processing module or unit, and the like.
In a fifth aspect, the present application provides an electronic device, including: a touch display screen, where the touch display screen includes a touch-sensitive surface and a display; a positioning chip; one or more cameras; one or more processors; one or more memories; a plurality of application programs; and one or more computer programs. The one or more computer programs are stored in the memories and include instructions that, when executed by the one or more processors, cause the electronic device to perform any one of the possible human-computer interaction methods of any of the foregoing aspects.
In a sixth aspect, the present application provides an electronic device including one or more processors and one or more memories. The one or more memories are coupled to the one or more processors and are configured to store computer program code including computer instructions that, when executed by the one or more processors, cause the electronic device to perform the human-computer interaction method in any possible implementation of any of the foregoing aspects.
In a seventh aspect, the present application provides a computer storage medium including computer instructions that, when run on an electronic device, cause the electronic device to perform any one of the possible human-computer interaction methods of any of the foregoing aspects.
In an eighth aspect, the present application provides a computer program product that, when run on an electronic device, causes the electronic device to perform any one of the possible human-computer interaction methods of any of the foregoing aspects.
Description of the Drawings
FIG. 1 is a schematic diagram of an application scenario of a human-computer interaction method provided by an embodiment of the present application.
FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
FIG. 3 is a software structural block diagram of the implementation process of a human-computer interaction method according to an embodiment of the present application.
FIG. 4 is a schematic interface diagram of a voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application.
FIG. 5 is a schematic interface diagram of another voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application.
FIG. 6 is a schematic interface diagram of another voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application.
FIG. 7 is a schematic interface diagram of another voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application.
FIG. 8 is a schematic interface diagram of another voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application.
FIG. 9 is a schematic flowchart of a voice interaction method provided by an embodiment of the present application.
Detailed Description
The following describes the human-computer interaction method provided by the embodiments of the present application in detail with reference to the accompanying drawings and application scenarios.
In the description of the embodiments of the present application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, in the description of the embodiments of the present application, "multiple" means two or more than two.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and shall not be construed as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features.
Embodiments of the present application provide a human-computer interaction method. The following describes in detail, with reference to the accompanying drawings and different embodiments, how system-level voice interaction is achieved through this method. First, before introducing the method provided by the embodiments of the present application, several possible application scenarios are listed.
FIG. 1 is a schematic diagram of an application scenario of a human-computer interaction method provided by an embodiment of the present application.
In one possible scenario, the human-computer interaction method provided by the embodiments of the present application may be applied to a scenario involving a single electronic device. Exemplarily, as shown in (a) of FIG. 1, with the smart screen 101 as the electronic device, the method is applied to a scenario in which a user uses the smart screen 101. Specifically, the smart screen 101 can acquire the user's voice command through a microphone, recognize the command, perform the corresponding operation according to the command, display the corresponding interface, and so on.
In another possible scenario, the human-computer interaction method provided by the embodiments of the present application may also be applied to a scenario involving two electronic devices, which may be of different types, such as mobile phones, tablet computers, wearable devices, and in-vehicle devices.
Exemplarily, as shown in (b) of FIG. 1, taking a scenario including the mobile phone 102 and the in-vehicle device 103 as an example, the in-vehicle device 103 may act as a display device connected to the mobile phone 102 to display and run the applications of the mobile phone 102. The mobile phone 102 may acquire the user's voice command, recognize it, perform the corresponding operation in the background according to the command, and then project the resulting interface onto the in-vehicle device 103. Alternatively, the in-vehicle device 103 may acquire the user's voice command and pass it to the mobile phone 102; the mobile phone recognizes the command, performs the corresponding operation in the background according to it, and then projects the resulting interface onto the in-vehicle device 103.
In yet another possible scenario, the human-computer interaction method provided by the embodiments of the present application may also be applied to a scenario involving at least one electronic device and a server. Exemplarily, as shown in (c) of FIG. 1, in a scenario including the mobile phone 102, the in-vehicle device 103, and the server 104, after the mobile phone 102 or the in-vehicle device 103 acquires the user's voice command, the command may be uploaded to the server 104; drawing on the voice analysis capability of the server 104, the command is analyzed more quickly and accurately, the analysis result is then passed back to the mobile phone 102, and the corresponding operation is performed on the mobile phone.
It should be understood that the embodiments of the present application do not limit the application scenarios of voice interaction.
In combination with the scenarios introduced above, the human-computer interaction method provided by the embodiments of the present application may be applied to electronic devices such as mobile phones, smart screens, tablet computers, wearable devices, in-vehicle devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDA); the embodiments of the present application place no limitation on the specific type of electronic device.
In the embodiments of the present application, the smart screen 101, the mobile phone 102, and the in-vehicle device 103 listed in FIG. 1 are collectively referred to as the "electronic device 100". A possible structure of the electronic device 100 is described below.
Exemplarily, FIG. 2 is a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), among others. Different processing units may be independent devices or may be integrated into one or more processors.
The controller may be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to the instruction operation code and timing signals, completing the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instruction or data again, it can be called directly from this memory. Repeated accesses are thus avoided, the waiting time of the processor 110 is reduced, and the efficiency of the system is improved.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, and the like.
The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple sets of I2C buses. The processor 110 may be separately coupled to the touch sensor 180K, a charger, a flash, the camera 193, and the like through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 communicates with the touch sensor 180K through the I2C bus interface, to implement the touch function of the electronic device 100.
The I2S interface may be used for audio communication. In some embodiments, the processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus, to implement communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the I2S interface, to implement the function of answering calls through a Bluetooth headset.
The PCM interface may also be used for audio communication, to sample, quantize, and encode an analog signal. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit an audio signal to the wireless communication module 160 through the PCM interface, to implement the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is usually used to connect the processor 110 to the wireless communication module 160. For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface, to implement the Bluetooth function. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the UART interface, to implement the function of playing music through a Bluetooth headset.
The MIPI interface may be used to connect the processor 110 to peripheral devices such as the display screen 194 and the camera 193. MIPI interfaces include a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor 110 communicates with the camera 193 through the CSI interface, to implement the photographing function of the electronic device 100, and the processor 110 communicates with the display screen 194 through the DSI interface, to implement the display function of the electronic device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal, or may be configured as a data signal. In some embodiments, the GPIO interface may be used to connect the processor 110 to the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, or the like.
The USB interface 130 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, or may be used to transmit data between the electronic device 100 and a peripheral device. It may also be used to connect headphones and play audio through the headphones. The interface may further be used to connect other electronic devices, such as AR devices.
It can be understood that the interface connection relationships between the modules illustrated in this embodiment of the present application are merely schematic illustrations, and do not constitute a structural limitation on the electronic device 100. In some other embodiments of the present application, the electronic device 100 may alternatively use interface connection manners different from those in the foregoing embodiment, or a combination of multiple interface connection manners.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger through the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. While charging the battery 142, the charging management module 140 may also supply power to the electronic device through the power management module 141.
The power management module 141 is configured to connect the battery 142 and the charging management module 140 to the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, and battery health status (leakage, impedance). In some other embodiments, the power management module 141 may alternatively be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may alternatively be disposed in the same device.
The wireless communication function of the electronic device 100 may be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas may also be multiplexed to improve antenna utilization. For example, the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In some other embodiments, an antenna may be used in combination with a tuning switch.
The mobile communication module 150 may provide wireless communication solutions applied to the electronic device 100, including 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 may receive electromagnetic waves through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 may also amplify a signal modulated by the modem processor, and convert it into an electromagnetic wave for radiation through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 may be disposed in the same device as at least some modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is configured to modulate a to-be-sent low-frequency baseband signal into a medium- or high-frequency signal. The demodulator is configured to demodulate a received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, and the like), or displays an image or video through the display screen 194. In some embodiments, the modem processor may be an independent device. In some other embodiments, the modem processor may be independent of the processor 110, and be disposed in the same device as the mobile communication module 150 or another functional module.
The wireless communication module 160 may provide wireless communication solutions applied to the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), the global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 may also receive a to-be-sent signal from the processor 110, perform frequency modulation and amplification on it, and convert it into an electromagnetic wave for radiation through the antenna 2.
In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include the global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, and the like. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The electronic device 100 implements the display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a mini-LED, a micro-LED, a micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the electronic device 100 may include one or N display screens 194, where N is a positive integer greater than 1.
The electronic device 100 may implement the photographing function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the photosensitive element of the camera through the lens, the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye. The ISP may also perform algorithm optimization on image noise, brightness, and skin tone. The ISP may further optimize parameters such as the exposure and color temperature of a shooting scene. In some embodiments, the ISP may be disposed in the camera 193.
The camera 193 is used to capture still images or videos. An optical image of an object is generated through the lens and projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 may include one or N cameras 193, where N is a positive integer greater than 1.
The digital signal processor is used to process digital signals; in addition to digital image signals, it may also process other digital signals. For example, when the electronic device 100 performs frequency point selection, the digital signal processor is used to perform a Fourier transform on the frequency point energy, and so on.
The video codec is used to compress or decompress digital video. The electronic device 100 may support one or more video codecs, so that the electronic device 100 can play or record videos in multiple encoding formats, for example, moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example, the transfer mode between neurons in the human brain, it quickly processes input information, and can also continuously perform self-learning. Applications such as intelligent cognition of the electronic device 100, for example, image recognition, face recognition, speech recognition, and text understanding, can be implemented through the NPU.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120, to implement a data storage function, for example, to save files such as music and videos in the external memory card.
The internal memory 121 may be used to store computer-executable program code, where the executable program code includes instructions. The processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, an application required by at least one function (for example, a sound playback function or an image playback function), and the like. The data storage area may store data (such as audio data and a phone book) created during the use of the electronic device 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a "loudspeaker", is used to convert an audio electrical signal into a sound signal. The electronic device 100 may be used to listen to music, or to listen to a hands-free call, through the speaker 170A.
The receiver 170B, also referred to as an "earpiece", is used to convert an audio electrical signal into a sound signal. When the electronic device 100 answers a call or receives a voice message, the voice can be heard by placing the receiver 170B close to the human ear.
The microphone 170C, also referred to as a "mic" or "sound transducer", is used to convert a sound signal into an electrical signal. When making a call or sending a voice message, the user may speak with the mouth close to the microphone 170C, to input the sound signal into the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In some other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In some other embodiments, the electronic device 100 may alternatively be provided with three, four, or more microphones 170C, to collect sound signals, reduce noise, identify sound sources, implement directional recording functions, and so on.
The headset jack 170D is used to connect wired headsets. The headset jack 170D may be the USB interface 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal, and can convert the pressure signal into an electrical signal. The gyroscope sensor 180B may be used to determine the motion posture of the electronic device 100. The barometric pressure sensor 180C is used to measure barometric pressure; in some embodiments, the electronic device 100 calculates the altitude based on the barometric pressure value measured by the barometric pressure sensor 180C, to assist in positioning and navigation. The magnetic sensor 180D includes a Hall sensor; the electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of a flip cover. The acceleration sensor 180E may detect the magnitude of the acceleration of the electronic device 100 in various directions (generally along three axes). The distance sensor 180F is used to measure distance; the electronic device 100 may measure distance by infrared or laser. The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The ambient light sensor 180L is used to sense ambient light brightness. The fingerprint sensor 180H is used to collect fingerprints. The temperature sensor 180J is used to detect temperature. The bone conduction sensor 180M may acquire vibration signals. The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194; the touch sensor 180K and the display screen 194 form a touchscreen, also referred to as a "touch screen". The touch sensor 180K is used to detect a touch operation performed on or near it. The touch sensor may pass the detected touch operation to the application processor, to determine the type of the touch event. Visual output related to the touch operation may be provided through the display screen 194.
The buttons 190 include a power button, volume buttons, and the like. The buttons 190 may be mechanical buttons, or may be touch-sensitive buttons. The electronic device 100 may receive button input, and generate button signal input related to user settings and function control of the electronic device 100.
The motor 191 may generate a vibration prompt. The motor 191 may be used for vibration prompts for incoming calls, and may also be used for touch vibration feedback. For example, touch operations acting on different applications (such as photographing and audio playback) may correspond to different vibration feedback effects. Touch operations acting on different areas of the display screen 194 may also correspond to different vibration feedback effects of the motor 191. Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) may also correspond to different vibration feedback effects. The touch vibration feedback effects may also be customized.
The indicator 192 may be an indicator light, and may be used to indicate a charging state and a change in battery level, and may also be used to indicate messages, missed calls, notifications, and the like.
The SIM card interface 195 is used to connect a SIM card.
It should be understood that the human-computer interaction method in the embodiments of the present application may be applied to any possible electronic device having all or part of the structure shown in FIG. 2.
In other words, in the possible scenarios shown in FIG. 1, electronic devices such as the smart screen 101, the mobile phone 102, and the in-vehicle device 103 may all have the structure shown in FIG. 2, or a structure with more or fewer components than that shown in FIG. 2. The embodiments of the present application do not limit the types of electronic devices included in the application scenario.
For ease of understanding, the following embodiments take the scenario shown in (b) of FIG. 1 as an example, and describe in detail the human-computer interaction method provided by the embodiments of the present application in a scenario that includes at least the mobile phone 102 and the in-vehicle device 103.
First, for the scenario shown in (b) of FIG. 1, a software structural block diagram of the implementation process of the human-computer interaction method provided by the embodiments of the present application is introduced.
In some embodiments, when the electronic device 100 shown in FIG. 2 is a mobile phone, it may run the Harmony OS system, the Android system, the iOS system, or any other possible operating system, and it may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, a cloud architecture, or the like. The embodiments of the present application take a mobile phone running the Android system with a layered architecture as an example to exemplarily describe the software structure of the mobile phone 102.
FIG. 3 is a software structural block diagram of the implementation process of an example human-computer interaction method according to an embodiment of the present application. After the mobile phone 102 and the in-vehicle device 103 establish a connection, the in-vehicle device 103 can serve as a screen projection device (or "display device") of the mobile phone 102, and applications of the mobile phone 102 can be displayed on the display screen of the in-vehicle device 103.
Specifically, the Android system has a layered architecture, in which the software is divided into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers: from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
1. Application layer
The application layer may include a series of application packages. As shown in FIG. 3, the application packages may include applications such as the visible-to-speak application 11, the smart voice application 13, music, navigation, and the HiCar application 15. The following focuses on the functional modules corresponding to the visible-to-speak application 11 and the smart voice application 13 in the embodiments of the present application.
(1) Visible-to-speak application 11
In the embodiments of the present application, "visible" may refer to the part that the user can see during the human-computer interaction between the user and the electronic device. Exemplarily, the user-visible part may include the display content on the screen of the electronic device, such as the desktop, windows, menus, icons, buttons, and controls of the electronic device.
It should be understood that the visible part may also include multimedia content such as text, pictures, and videos displayed on the screen of the electronic device, which is not limited in the embodiments of the present application.
It should also be understood that the display content on the screen of the electronic device may be an interface displayed by an application running in the foreground of the electronic device, or may be a virtual display interface of an application running in the background of the electronic device, and the virtual display interface may be projected and displayed on another electronic device.
In the embodiments of the present application, "speakable" means that the user can interact with the visible part through voice instructions, thereby completing interaction tasks. Exemplarily, for user-visible parts such as the desktop, windows, menus, icons, buttons, and controls of the electronic device, the user can manipulate them through voice instructions, and thereby perform input operations such as clicking, double-clicking, and sliding on the visible parts.
To implement the above functions, the visible-to-speak application 11 may include an interface information acquisition module 111, an intent processing module 112, an interface module 113, a predefined action execution module 114, and the like. The interface information acquisition module 111 may acquire the interface content information of applications running in the foreground or background of the mobile phone. The intent processing module 112 may receive the user voice instructions returned by the smart voice application 13, and determine the user intent according to the user voice instructions. The interface module 113 is used to implement data and information exchange between different applications. The predefined action execution module 114 is used to perform corresponding operations according to voice instructions, user intents, and the like.
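As an aid to understanding, the cooperation between these modules can be sketched in a few lines of Java. This is only an illustrative sketch; the class and method names (ControlInfo, acquireInterfaceInfo, resolveIntent, executeClick) are assumptions made for this example and are not the actual implementation of the application.

import java.util.ArrayList;
import java.util.List;

// One on-screen control as captured by the interface information acquisition module 111.
class ControlInfo {
    final String label;          // visible text of the control, e.g. "Song 1"
    final int centerX, centerY;  // screen coordinates used for a simulated tap
    ControlInfo(String label, int x, int y) { this.label = label; centerX = x; centerY = y; }
}

class VisibleToSpeak {
    // Module 111: collect the controls of the interface running in the foreground or background.
    List<ControlInfo> acquireInterfaceInfo() {
        return new ArrayList<>(); // would query the view hierarchy in a real system
    }

    // Module 112: match the recognized utterance against the captured controls.
    ControlInfo resolveIntent(String utterance, List<ControlInfo> controls) {
        for (ControlInfo c : controls) {
            if (utterance.contains(c.label)) return c; // simplest possible text match
        }
        return null; // no matching control on the current interface
    }

    // Module 114: perform the predefined action, here a simulated click.
    void executeClick(ControlInfo target) {
        // inject a tap at (target.centerX, target.centerY)
    }
}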
(2) Smart voice application 13
In the embodiments of the present application, the smart voice application 13 may correspond to a smart voice application installed on the mobile phone 102 side; that is, the speech recognition service process is provided by the smart voice application of the mobile phone 102.
Alternatively, the speech recognition service process provided by the smart voice application 13 may be provided by a server. This scenario may correspond to the one shown in (c) of FIG. 1: with the help of the speech analysis capability of the server 104, the mobile phone 102 sends the user's voice instruction to the server 104, and after analyzing the voice instruction, the server 104 returns the recognition result of the voice instruction to the mobile phone 102. Details are not repeated here.
Optionally, the smart voice application 13 may include a natural language understanding (NLU) module 131, an automatic speech recognition (ASR) module 132, a text-to-speech (TTS) module 133, a dialog management (DM) module 134, and the like.
The ASR module 132 may convert the original speech signal input by the user into text information; the NLU module 131 may convert the recognized text information into semantics that can be understood by electronic devices such as mobile phones and in-vehicle devices; the DM module 134 may determine, based on the dialog state, the action that the system should take; and the TTS module 133 may convert natural language text into speech and output it to the user.
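The four modules form a pipeline from speech input to speech output. The following Java interfaces are a minimal sketch of that pipeline; all type and method names here are assumptions for illustration only.

// ASR module 132: raw speech -> text
interface AsrModule { String transcribe(byte[] pcmAudio); }

// NLU module 131: text -> machine-understandable semantics
interface NluModule { Semantics parse(String text); }

// DM module 134: dialog state -> the action the system should take
interface DmModule { SystemAction decide(Semantics semantics, DialogState state); }

// TTS module 133: natural language text -> synthesized speech
interface TtsModule { byte[] synthesize(String replyText); }

record Semantics(String intent, String slotValue) {}
record DialogState(int turnCount) {}
record SystemAction(String type, String targetControl) {}

For example, the utterance "stop playing song 1" would pass through transcribe(), be parsed into an intent such as ("stop_play", "song 1"), and the DM module would then decide that a click on the corresponding control is the action to take.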
In addition, it should be understood that the smart voice application 13 may further include a natural language generation (NLG) module and the like, which is not limited in the embodiments of the present application.
(3) HiCar application 15
Through the HiCar application 15, an application of the mobile phone can be projected to the in-vehicle device. During the projection process, the application actually runs on the mobile phone side, and this running may include running in the foreground or in the background of the mobile phone; that is, the projection process can draw on the computing power of the mobile phone.
It should be understood that the in-vehicle device may have an independent display system. After the applications of the mobile phone are projected to the in-vehicle device through the HiCar application 15, the in-vehicle device may have an independent display desktop and quick application entries, while also providing the capability to acquire voice instructions.
2. Application framework layer
The application framework layer includes various service programs or some predefined functions, and may provide an application programming interface (API) and a programming framework for the applications in the application layer. As shown in FIG. 3, the application framework layer may include a content provider (content sensor) 21, a multi-screen framework service module 23, a view system 25, and the like.
The content provider 21 may be used to store and obtain data, and make the data accessible to applications. For example, the data obtained by the content provider 21 may include interface display data of the electronic device, videos, images, audio, the user's browsing history, bookmarks, and other data. Exemplarily, in the implementation of the present application, the content provider 21 may obtain the interface content displayed in the foreground or background of the mobile phone.
The multi-screen framework service module 23 may include a window manager and the like, used to manage the window display of the electronic device. Exemplarily, the window manager may obtain the display screen size of the mobile phone 102 or the size of a window to be displayed, obtain the content of the window to be displayed, and the like. In addition, the multi-screen framework service module 23 may also manage the screen projection display process of the electronic device, for example, obtain the interface content of one or more applications running in the background of the electronic device and transmit that interface content to another electronic device, so that the interface content of the electronic device is displayed on the other electronic device. Details are not repeated here.
The view system 25 includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system 25 may be used to construct applications. A display interface may be composed of one or more views. For example, a display interface including an SMS notification icon may include a view for displaying text and a view for displaying a picture.
3. System library
The Android runtime includes core libraries and a virtual machine. The Android runtime is responsible for scheduling and management of the Android system.
The core libraries include two parts: one part is the functional functions that the Java language needs to call, and the other part is the core libraries of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include multiple functional modules, for example, a surface manager, media libraries, a three-dimensional graphics processing library (for example, OpenGL ES), and a two-dimensional graphics engine (for example, SGL). The surface manager is used to manage the display subsystem, and provides fusion of two-dimensional and three-dimensional layers for multiple applications. The media libraries support playback and recording in multiple commonly used audio and video formats, as well as static image files. The media libraries may support multiple audio and video encoding formats, for example, MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG. The three-dimensional graphics processing library is used to implement three-dimensional graphics drawing, image rendering, compositing, layer processing, and the like. The two-dimensional graphics engine is a drawing engine for two-dimensional drawing. The image processing library may provide analysis of various image data and provide multiple image processing algorithms, for example, processing such as image cutting, image fusion, image blurring, and image sharpening. Details are not repeated here.
4. Kernel layer
The kernel layer is the layer between hardware and software. The kernel layer includes at least a display driver, an audio driver, and a sensor driver. The various drivers may invoke hardware structures such as the microphone, speaker, or sensors of the mobile phone, for example, invoke the microphone of the mobile phone to obtain the user's voice instructions, and invoke the speaker of the mobile phone for voice output. Details are not repeated here.
The possible software structure of the mobile phone is described above. The in-vehicle device 103, as a display device, may have a software structure that is the same as or different from that of the mobile phone 102. As shown in FIG. 3, in the embodiments of the present application, the in-vehicle device includes at least a display module 31, a microphone/speaker 32, and the like. The display module 31 may be used to display the interface content currently running on the in-vehicle device 103, or to display an application interface projected from the mobile phone 102.
It should be understood that the in-vehicle device 103 may have an independent display system. In the embodiments of the present application, after the applications of the mobile phone 102 are projected to the in-vehicle device 103 through the HiCar application 15, the in-vehicle device 103 may have an independent display desktop and quick application entries. In other words, after applications of the mobile phone 102 such as music, navigation, and video are projected to the in-vehicle device 103 through the HiCar application 15, they can be rearranged and displayed on the in-vehicle device 103 according to the display system of the in-vehicle device 103, which is not limited in the embodiments of the present application.
The microphone/speaker 32 is a hardware structure of the in-vehicle device, and can implement the same functions as the microphone/speaker of the mobile phone. In the embodiments of the present application, the user's voice instructions may be input through the microphone of the mobile phone 102 itself, or through a remote virtual microphone. The remote virtual microphone can be understood as follows: the microphone of the in-vehicle device 103 provides the capability to acquire voice instructions and transmits the acquired voice instructions to the mobile phone, and the mobile phone 102 can then recognize the voice instructions. Details are not repeated here.
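A remote virtual microphone of this kind can be pictured as an audio capture loop on the in-vehicle device that forwards raw samples to the phone. The sketch below uses the standard Android AudioRecord API for capture; the transport (a plain socket over the already-established connection) and all class names are assumptions for illustration.

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import java.io.OutputStream;
import java.net.Socket;

public class VirtualMicForwarder {
    // Capture PCM audio on the in-vehicle device and forward it to the
    // phone over an already-established connection (transport is assumed).
    public void forward(String phoneHost, int port) throws Exception {
        int sampleRate = 16000; // 16 kHz mono PCM, a common ASR input format
        int bufSize = AudioRecord.getMinBufferSize(sampleRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord rec = new AudioRecord(MediaRecorder.AudioSource.MIC,
                sampleRate, AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT, bufSize);
        try (Socket socket = new Socket(phoneHost, port)) {
            OutputStream out = socket.getOutputStream();
            byte[] buf = new byte[bufSize];
            rec.startRecording();
            int n;
            while ((n = rec.read(buf, 0, buf.length)) > 0) {
                out.write(buf, 0, n); // the phone side feeds these samples into ASR
            }
        } finally {
            rec.stop();
            rec.release();
        }
    }
}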
In the embodiments of the present application, the HiCar application 15 can rely on the multi-screen framework capability of the mobile phone to project the interfaces of multiple applications of the mobile phone onto the interface of the in-vehicle device. The multiple applications themselves actually run on the mobile phone side, while the interfaces are displayed on the screen of the in-vehicle device. The screen content is extracted through the content sensor of the mobile phone system, so as to obtain the application interface content of the interface projected to the in-vehicle device. The smart voice application can analyze the user's semantics more quickly and accurately by combining device-side capabilities (relying on the strong computing power of the mobile phone itself) with cloud-side analysis capabilities, and sends the recognized result to the visible-to-speak application for interface content matching, to identify the user's intent. Finally, the interface is operated through simulated clicks, implementing control operations such as clicking controls, sliding up, down, left, and right, and returning.
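On an Android-based system, one plausible way to realize the "extract screen content, match, and simulate a click" chain is the accessibility framework. The application does not state that this specific API is used, so the sketch below is an assumption for illustration; it relies only on standard Android accessibility calls.

import android.accessibilityservice.AccessibilityService;
import android.view.accessibility.AccessibilityEvent;
import android.view.accessibility.AccessibilityNodeInfo;
import java.util.List;

public class SimulatedClickService extends AccessibilityService {

    // Find a control whose visible text matches the recognized target
    // (for example "song 1") and perform a click on it.
    public boolean clickByLabel(String label) {
        AccessibilityNodeInfo root = getRootInActiveWindow();
        if (root == null) return false;
        List<AccessibilityNodeInfo> hits = root.findAccessibilityNodeInfosByText(label);
        for (AccessibilityNodeInfo node : hits) {
            // Walk up to the nearest clickable ancestor if the text node
            // itself is not clickable (common for labels inside list items).
            AccessibilityNodeInfo target = node;
            while (target != null && !target.isClickable()) target = target.getParent();
            if (target != null) return target.performAction(AccessibilityNodeInfo.ACTION_CLICK);
        }
        return false; // no control with this label on the current interface
    }

    @Override public void onAccessibilityEvent(AccessibilityEvent event) { }
    @Override public void onInterrupt() { }
}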
In addition, in the embodiments of the present application, during the process in which the mobile phone 102 projects multiple applications to the in-vehicle device 103 through the HiCar application 15, the in-vehicle device 103 and the mobile phone 102 are in a state in which a connection has already been established.
In a possible implementation, in the embodiments of the present application, the connection between the mobile phone 102 and the in-vehicle device 103 may be established in multiple different ways. For example, the connection between the mobile phone 102 and the in-vehicle device 103 may include multiple connection modes such as a wired connection or a wireless connection. Exemplarily, the wired connection between the mobile phone 102 and the in-vehicle device 103 may be through a USB data cable; the wireless connection between the mobile phone 102 and the in-vehicle device 103 may be established by means of a Wi-Fi connection, or, when the mobile phone 102 and the in-vehicle device 103 support the near field communication (NFC) function, through a proximity connection using the "touch" function, or through a Bluetooth code-scanning connection between the mobile phone 102 and the in-vehicle device 103, and so on.
Alternatively, as communication technology develops and communication bandwidth and rates gradually increase, data may possibly be transmitted between the mobile phone 102 and the in-vehicle device 103 without establishing a near field communication connection. Exemplarily, with the future popularization of high-rate communication modes such as the fifth generation (5G) mobile communication system, the mobile phone 102 and the in-vehicle device 103 may be able to project the mobile phone's screen to the in-vehicle device through 5G communication, for example, by installing different or the same applications on the mobile phone 102 and the in-vehicle device 103 and transmitting data by means of the 5G communication network. In this implementation, the mobile phone may not need to provide the functions of discovering and establishing a connection with the in-vehicle device.
Alternatively, the mobile phone 102 and the in-vehicle device 103 may log in to the same account, and connect and communicate based on the relevant settings under that account. Exemplarily, both the mobile phone 102 and the in-vehicle device 103 may register a Huawei account; under this account, the applications of the mobile phone 102 can be displayed on the display screen of the in-vehicle device 103, while the applications actually run on the mobile phone 102 side. Details are not repeated here.
It should be understood that the embodiments of the present application do not limit the manner of establishing a connection between the mobile phone 102 and the in-vehicle device 103. In the subsequent embodiments, it is assumed that a connection has already been established between the mobile phone 102 and the in-vehicle device 103 through the HiCar application 15.
It should also be understood that, in addition to the multiple layers and the multiple functional modules shown in FIG. 3, the mobile phone 102 and the in-vehicle device 103 may also be divided in different ways or include more functional modules, which is not limited in the embodiments of the present application.
The following provides a detailed description with reference to the accompanying drawings and application scenarios, taking the mobile phone 102 and the in-vehicle device 103 having the software structure shown in FIG. 3 as an example.
FIG. 4 is a schematic diagram of an example graphical user interface (GUI) for implementing a voice interaction process on an in-vehicle device according to an embodiment of the present application.
Exemplarily, (a) of FIG. 4 shows that the screen display system of the in-vehicle device 103 displays the currently output interface. The content of the interface may come from an application actually running on the mobile phone 102 side, and is obtained and provided to the in-vehicle device 103 by the HiCar application.
It should be understood that the interface content on the display screen of the in-vehicle device 103 may be arranged and filled based on its own display system, and the same content may have a display style, icon size, arrangement order, and so on that differ from those of the mobile phone 102. That is, the content provided by a given application is arranged and filled on the display screen of the in-vehicle device 103 according to the requirements of the display system of the in-vehicle device 103.
As shown in (a) of FIG. 4, the screen display area of the in-vehicle device 103 may include a status display area at the top, as well as a navigation menu area 401 and a content area 402 shown by dashed boxes. The status display area displays the current time and date, a Bluetooth icon, a Wi-Fi icon, and so on. The navigation menu area 401 may include icons such as home, navigation, phone, and music, where each icon corresponds to at least one application actually running on the mobile phone 102, and the user can tap any icon to enter the corresponding interface of that application. The content area 402 displays the content provided to the in-vehicle device 103 by different applications.
For example, Huawei Music is installed on the mobile phone 102 and runs in the background. Huawei Music sends content such as the playlists or song lists displayed during its running to the in-vehicle device 103. The screen display system of the in-vehicle device 103 fills the content provided by Huawei Music into the content area of the display screen, for example, song 1 to song 5, song list 1 to song list 5, daily recommendations, playlists, rankings, radio, and search as shown in (a) of FIG. 4. This display process is not described again in subsequent embodiments.
It should be understood that the interface of the in-vehicle device 103 may also display other menus or application content, which is not limited in the embodiments of the present application.
Exemplarily, (a) of FIG. 4 shows the interface after the user taps the music application in the navigation menu area 401, where the icon of the music application in the navigation menu area 401 is highlighted in gray. The content area 402 displays the song names or song lists provided by Huawei Music. Assuming that song 1 is currently in the playing state, a play button 20 is displayed on the icon of song 1, while song 2, song 3, song 4, and song 5 are in the paused state and display a pause button 30.
此外,在车载设备103的界面上,还可以包括语音球图标10,如图4中的(a)图所示,用户点击该语音球图标10可以开启车载设备103的语音监听功能,响应于用户的点击操作,车载设备103可以显示如图4中的(b)图所示的界面403。该界面403上可以显示虚线框示出的唤醒窗口403-1,该唤醒窗口403-1包括语音识别图标40。In addition, on the interface of the in-vehicle device 103, a voice ball icon 10 may also be included, as shown in (a) of FIG. The in-vehicle device 103 can display the interface 403 as shown in (b) of FIG. 4 . The wake-up window 403 - 1 shown by the dotted box may be displayed on the interface 403 , and the wake-up window 403 - 1 includes the voice recognition icon 40 .
这里需要说明的是,唤醒窗口403-1可以不以窗口的形式体现,仅仅包括语音识别图标40,或者,包括语音识别图标40和向用户推荐的语音指令,悬浮显示在车载设备103的显示屏上。本申请实施例为了便于描述。将包括语音识别图标40的区域称为“唤醒窗口”,该称呼不应对本申请实施例的方案造成限定,后续不再赘述。It should be noted here that the wake-up window 403 - 1 may not be embodied in the form of a window, but only includes the voice recognition icon 40 , or includes the voice recognition icon 40 and the voice command recommended to the user, and is displayed in a floating manner on the display screen of the in-vehicle device 103 superior. The embodiments of the present application are for convenience of description. The area including the voice recognition icon 40 is referred to as a "wake-up window", which should not limit the solution of the embodiment of the present application, and will not be described in detail later.
可选地,该语音识别图标40可以为动态显示,用于表示车载设备103处于监听并获取用户语音指令的状态。此外,该唤醒窗口403-1还可以包括一些向用户推荐的语音指令,例如“停止播放”和“继续播放”等语音指令。应理解,该推荐的语音指令也可以接受用户的点击操作,并执行响应的指令所对应的用途,本申请实施例对此不再赘述。Optionally, the voice recognition icon 40 may be displayed dynamically, which is used to indicate that the in-vehicle device 103 is in a state of monitoring and acquiring the user's voice instruction. In addition, the wake-up window 403-1 may also include some voice commands recommended to the user, such as voice commands such as "stop playing" and "continue playing". It should be understood that the recommended voice command may also accept a user's click operation, and execute the purpose corresponding to the response command, which will not be repeated in this embodiment of the present application.
如图4中的(b)图所示,如果用户输入“停止播放歌曲1”的语音指令,车载设备103获取到用户的指令之后,可以将该语音指令发送给手机102,手机102识别用户的语音指令,响应于该语音指令,后台执行对歌曲1的播放按钮20的点击操作,该歌曲1上的播放按钮20变化为暂停按钮30。手机102可以将对歌曲1的播放按钮20的点击操作的显示界面传递回车载设备103,进而车载设备103可以显示如图4中的(c)图所示的界面404,即歌曲1上变化为暂停按钮30。As shown in (b) in FIG. 4 , if the user inputs a voice command of “stop playing song 1”, after the in-vehicle device 103 obtains the user’s command, the voice command can be sent to the mobile phone 102, and the mobile phone 102 recognizes the user’s A voice command, in response to the voice command, a click operation on the play button 20 of the song 1 is performed in the background, and the play button 20 on the song 1 changes to a pause button 30 . The mobile phone 102 can transfer the display interface of the click operation on the play button 20 of the song 1 back to the in-vehicle device 103, and then the in-vehicle device 103 can display the interface 404 shown in (c) in FIG. Pause button 30.
上述实现过程,可以理解为用户指令可以对车载设备103的显示屏上的任意一个控件执行点击操作,并进一步在车载设备103的显示屏上显示执行点击操作之后的界面。The above implementation process can be understood as a user instruction can perform a click operation on any control on the display screen of the in-vehicle device 103 , and further display the interface after the click operation is performed on the display screen of the in-vehicle device 103 .
In a possible implementation, the user may also enable the voice monitoring function by pressing a vehicle-control voice button of the car. For example, the user presses the vehicle-control voice button on the steering wheel to enable the function of the in-vehicle device 103 of listening for and acquiring the user's voice commands, and the wake-up window 403-1 indicated by the dashed box is displayed.

It should be understood that, in the embodiments of this application, the user's operation of tapping the voice ball icon 10 may trigger the in-vehicle device 103 to enable the voice monitoring function; alternatively, it may enable the voice interaction function between the in-vehicle device 103 and the user, that is, place the in-vehicle device 103 in a state of continuously listening for voice commands; or the user may place the in-vehicle device 103 in the continuous listening state through another shortcut operation. This is not limited in the embodiments of this application.

Optionally, when the in-vehicle device 103 remains in the state of listening for voice commands, the wake-up window may be displayed on the display screen of the in-vehicle device 103 all the time; alternatively, the wake-up window may disappear briefly and float on the display screen of the in-vehicle device 103 again when it is detected that the user starts to issue a command. When no voice command from the user is detected within a preset time (for example, 2 minutes), the in-vehicle device 103 may automatically exit the monitoring function. This is not limited in the embodiments of this application.

It should also be understood that, in the embodiments of this application, keys, buttons, switches, menus, options, pictures, lists, text, and the like that are visible on an interface and on which the user can perform a tap operation are collectively referred to as "controls", which is not explained again in subsequent embodiments.
Optionally, the voice commands recommended to the user in the wake-up window 403-1 may be command content associated with the controls on the currently displayed interface 403 on which the user can perform a tap operation.

In a possible implementation process, the content sensor of the mobile phone 102 may obtain the current state of each control and provide recommended commands to the user according to those current states.

Exemplarily, as shown in (b) of FIG. 4, the interface 403 includes the play button 20, so the recommended commands in the wake-up window 403-1 may include "Stop playing"; that is, "Stop playing" may be understood as the state reached after the user taps the play button 20. Similarly, Song 2 on the interface 403 includes a pause button 30, so the recommended commands in the wake-up window 403-1 may include "Play Song 2"; that is, "Play Song 2" may be understood as the state reached after the user taps the pause button 30 of Song 2. Alternatively, as shown in (c) of FIG. 4, Song 1 to Song 5 all display the pause button 30, that is, music playback is paused. The mobile phone 102 learns that the current interface does not include the play button 20, so if the user wakes up the voice ball, the wake-up window 403-1 may display a "Start playing" command but not a "Stop playing" command. This is not explained again below.
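By way of illustration, the following is a minimal sketch of such state-based recommendation generation. The Control type, its type strings, and the command strings are hypothetical stand-ins for whatever structures the content sensor actually returns; they are not defined by this application.

```java
// Sketch: derive recommended voice commands from the controls visible on the
// current interface. A recommendation names the state a control would reach
// if it were tapped (a visible play button implies "Stop playing", etc.).
import java.util.ArrayList;
import java.util.List;

final class Control {
    final String type;   // hypothetical, e.g. "play_button" or "pause_button"
    final String label;  // hypothetical, e.g. "Song 2"
    Control(String type, String label) { this.type = type; this.label = label; }
}

final class CommandRecommender {
    static List<String> recommend(List<Control> visibleControls) {
        List<String> commands = new ArrayList<>();
        for (Control c : visibleControls) {
            if ("play_button".equals(c.type)) {         // something is playing
                commands.add("Stop playing");
            } else if ("pause_button".equals(c.type)) { // this item is paused
                commands.add("Play " + c.label);
            }
        }
        return commands;
    }
}
```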
Alternatively, the voice commands recommended to the user in the wake-up window 403-1 may be fixed recommended commands of a particular application. For example, for the music application interface shown in (a) of FIG. 4, the voice commands recommended to the user in the wake-up window 403-1 may be fixed as "Stop playing", "Start playing", and the like. This is not limited in the embodiments of this application.

In a possible manner, the controls on an interface on which the user can perform a tap operation may be divided into the following categories:
1. Text controls

A text control contains text information that can be recognized. Examples include "Daily Recommendations", "Playlists", "Charts", "Radio", "Song X", and "Song List 1" shown in (a) of FIG. 4.

In a possible implementation, the content sensor at the application framework layer of the mobile phone 102 may directly recognize the text information contained in a text control. It should be understood that the music application actually runs in the background on the mobile phone 102, so the mobile phone 102 can obtain in the background the text information of the text controls that are projected onto the display screen of the in-vehicle device 103.

Optionally, in the embodiments of this application, the voice commands recommended to the user in the wake-up window 403-1 may be related to the text controls obtained above, for example, "Play Song 2". The examples are not enumerated one by one here.
2. Web controls

Common web controls may include a text input box (TextBox), a drop-down box (DropList), a date/time picker (Date/TimePicker), and the like. Exemplarily, in (a) of FIG. 4, the "Search" control can be classified as a web control.

In a possible implementation, the content sensor of the mobile phone 102 may recognize the web controls on the interface. In the embodiments of this application, the voice commands recommended to the user in the wake-up window 403-1 may be related to the web controls obtained above, for example, recommended commands such as "Search for songs".
3. Picture controls

A picture control is displayed as a picture on the interface, and each picture corresponds to different descriptors. Examples include the artist picture or album picture above Song 1 shown in (a) of FIG. 4, and the "Nostalgic Classics" picture displayed above Song List 1 to identify that list.

In a possible implementation, the content sensor of the mobile phone 102 may obtain the descriptors of each picture, generalize the meaning of the picture, and provide recommended commands to the user. In the embodiments of this application, if the mobile phone 102 obtains the descriptor "Song 1 by Zhang XX" for a picture, the voice commands recommended to the user in the wake-up window 403-1 may include recommended commands such as "Play Song 1 by Zhang XX".
4. List controls

As shown in (a) of FIG. 4, Song List 1 may include multiple songs, so "Song List 1" can be classified as a list control. When the user taps Song List 1, the next-level interface that is entered may present the multiple songs included in Song List 1 to the user, without starting to play the music in Song List 1.
5. Switch controls

A switch control may be understood as a control with a switch function on the interface. Exemplarily, in (a) of FIG. 4, the play button 20 and the pause button 30 can be classified as switch controls.
Five possible control types have been described above. The mobile phone 102 may obtain the controls on the currently displayed interface 403, and further determine, according to information such as the types and descriptors of the obtained controls, the commands recommended to the user that are displayed in the wake-up window 403-1.

It should be understood that, for different applications and different interfaces, the embodiments of this application may involve more controls than the five control types listed above, which are not enumerated one by one here. In addition, some controls may be classified into multiple control types at the same time, which is not limited in the embodiments of this application.
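As a rough illustration of this classification, the sketch below maps view metadata to the five categories above. The class-name heuristics (EditText, RecyclerView, ImageView, and so on) are assumptions based on common Android view classes; the actual classification logic of the content sensor is not specified here.

```java
// Sketch: classify an extracted control into one of the five categories.
// The heuristics are illustrative only; a control may in practice belong
// to several categories at once.
enum ControlType { TEXT, WEB, PICTURE, LIST, SWITCH }

final class ControlClassifier {
    static ControlType classify(String className, boolean hasText, boolean togglesState) {
        if (togglesState) return ControlType.SWITCH;         // e.g. play/pause buttons
        if (className.contains("EditText")
                || className.contains("Picker")
                || className.contains("Spinner")) {
            return ControlType.WEB;                          // input boxes, pickers
        }
        if (className.contains("RecyclerView")
                || className.contains("ListView")) {
            return ControlType.LIST;                         // song lists and the like
        }
        if (className.contains("ImageView")) return ControlType.PICTURE; // cover art, icons
        if (hasText) return ControlType.TEXT;
        return ControlType.PICTURE;                          // fallback for unlabeled imagery
    }
}
```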
For the control types listed above, Table 1 lists the controls on the pages of several common audio applications. As shown in Table 1 below, for audio applications commonly used by users, such as NetEase Cloud Music, Kugou Music, Huawei Music, Himalaya, BabyBus Story, and Xiaobanlong Children's Songs, different pages may include different controls, and the number and types of controls included in the first-level page and second-level page of each application also differ.

Exemplarily, taking the NetEase Cloud Music application as an example, the first-level page may be understood as the main interface of NetEase Cloud Music entered after the user taps the NetEase Cloud Music application icon, including page content such as "Daily Recommendations", "My Favorite Music", "Local Music", and "Personal FM". A second-level interface is the next-level page entered when the user taps any menu or control on the main interface of NetEase Cloud Music, for example, a playlist page or a playback page. In the embodiments of this application, the page content on every page can be obtained by the mobile phone, and the text information included in every control can be recognized, which is not described again here.
Table 1

[Table 1 is provided as images Figure PCTCN2021113542-appb-000006 and Figure PCTCN2021113542-appb-000007 in the original document.]
FIG. 5 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device according to an embodiment of this application.

Exemplarily, (a) of FIG. 5 shows that the screen display system of the in-vehicle device 103 displays a currently output interface 501. In the content area of the interface 501, Song 1, Song 2, Song 3, Song 4, and Song 5 are all paused, and each displays a pause button 30.

As shown in (a) of FIG. 5, on the display interface of the in-vehicle device 103, the user taps the voice ball icon 10 to enable the voice monitoring function of the in-vehicle device 103, and the display screen of the in-vehicle device 103 displays a wake-up window 502-1 indicated by the dashed box shown in (b) of FIG. 5. The wake-up window 502-1 may include the voice recognition icon 40 and recommended voice commands. For example, as shown in (b) of FIG. 5, the recommended voice commands may be "Start playing", "Next page", and the like.
If the user inputs a voice command "Play Song List 1", after acquiring the command, the in-vehicle device 103 may display, in response to the voice command input by the user, an interface 503 shown in (c) of FIG. 5, which is the interface after a tap operation on Song List 1 is performed. Exemplarily, as shown in (c) of FIG. 5, the interface 503 may include the following controls: Back, "Song List 1 - Classic Nostalgia", "Play all", and the names of multiple songs included in Song List 1, such as Song 6.

In the embodiment shown in FIG. 5, it is assumed that the user's operation of tapping the voice ball icon 10 has placed the in-vehicle device 103 in a state of continuously listening for the user's commands. Then, after the user taps the voice ball icon 10 as shown in (a) of FIG. 5, the wake-up window remains floating on the display screen of the in-vehicle device 103. The interface 503 shown in (c) of FIG. 5 includes a wake-up window 503-1. Optionally, the commands recommended to the user in the wake-up window 503-1 may change according to the controls included on the current interface 503, for example, to display voice commands such as "Play all" and "Next page".

The user inputs a voice command "Play all". After acquiring the command, the in-vehicle device 103 may display, in response to the voice command input by the user, an interface 504, which is the interface after a tap operation on the "Play all" control is performed. Exemplarily, as shown in (d) of FIG. 5, on the interface 504, the "Play all" control is displayed in the playing state, and playback starts from the first song arranged in Song List 1 (Song 6). A sound icon 50 is displayed at the position of the first song to indicate that the sound source is Song 6, that is, Song 6 is the song currently being played. This is not limited in the embodiments of this application.
With reference to the different implementation processes of FIG. 4 and FIG. 5 above, in the human-computer interaction method provided in the embodiments of this application, by obtaining the controls displayed on the interface that are visible and on which the user can perform tap operations, the user can input voice commands to perform operations such as tapping any control on the interface. The user can control, through voice commands, all applications and all visible content displayed on the display screen. In particular, in a driving scenario, the user's manual operations are reduced, so that user distraction is avoided and safety in the driving scenario is improved.

In addition, for voice interaction scenarios, different applications do not need to develop voice assistants separately. In other words, the user can control multiple different applications through the same voice interaction manner; the voice assistant is no longer separated from the applications, which enriches the application ecosystem.
FIG. 6 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device according to an embodiment of this application.

Exemplarily, (a) of FIG. 6 shows that the screen display system of the in-vehicle device 103 displays a currently output interface 601. In the content area of the interface 601, Song 1, Song 2, Song 3, Song 4, and Song 5 are all paused, and each displays a pause button 30.

As shown in (a) of FIG. 6, on the display interface 601 of the in-vehicle device 103, the user taps the voice ball icon 10 to enable the voice monitoring function of the in-vehicle device 103, and the display screen of the in-vehicle device 103 displays a wake-up window 602-1 shown in (b) of FIG. 6. The wake-up window 602-1 may include the voice recognition icon 40 and recommended voice commands such as "Start playing" and "Next page".
In a possible implementation, when the user enables the voice monitoring function of the in-vehicle device 103, the mobile phone 102 may obtain the content of the interface, determine whether the interface includes icons (which may also be referred to as pictures), and add numeric badges to the icons in a certain order.

Optionally, the user's operation of enabling the voice monitoring function of the in-vehicle device 103 may trigger the addition of numeric badges to the icons on the interface; alternatively, before the user enables the voice monitoring function of the in-vehicle device 103, the addition of numeric badges to the icons on the interface may be triggered through another preset operation. This is not limited in the embodiments of this application.

Optionally, the icons to which numeric badges are added in the embodiments of this application may also include the application icons of different applications, for example, the Home icon, Navigation icon, Phone icon, and Music icon in the navigation menu area of the interface 601 shown in (a) of FIG. 6.

Alternatively, the icons to which numeric badges are added in the embodiments of this application may also include the pictures displayed on the interface 601, for example, the artist picture of Song 1 and the picture of Song List 1 in the content area of the interface 601 shown in (a) of FIG. 6; that is, numeric badges are marked on the pictures included on the interface.
Exemplarily, when a song or song list in the content area of the interface 601 is displayed in a foreign language, the user may be unable to accurately issue a voice command containing the song name. By adding numeric badges to the pictures of different songs or to the pictures of song lists, the user can perform operations through voice commands containing the numbers, which is convenient and quick and improves user experience.

Alternatively, the icons to which numeric badges are added in the embodiments of this application may also include controls such as buttons displayed on the interface 601. This is not limited in the embodiments of this application.

Exemplarily, taking application icons as an example, when a numeric badge is added to an application icon, the display size of the numeric badge may be adapted to the size at which the application icon is displayed on the interface of the in-vehicle device 103. For example, if an application icon on the interface is small, adding a numeric badge may make the badge too small for the user to perceive it accurately. Therefore, when an application icon is small, for example, when the pixels it occupies on the display screen of the in-vehicle device are less than or equal to a preset number of pixels, that application icon may be left unmarked, and only application icons larger than the preset number of pixels are marked. This is not limited in the embodiments of this application.
Exemplarily, when the mobile phone 102 obtains the interface 601 shown in (a) of FIG. 6, it determines that the interface 601 includes icons of different songs and icons of different song lists. The mobile phone 102 may add, to each icon, a numeric badge 60 shown in (b) of FIG. 6 in the order in which the icons are arranged on the interface from left to right and from top to bottom, for example, adding numeric badge 1 to Song 1, numeric badge 2 to Song 2, and so on, so that numeric badges 60 are added to all the icons on the music interface.

Specifically, in the foregoing process, the content sensor of the mobile phone 102 may obtain the content on the interface, and the HiCar application 15 installed on the mobile phone 102 obtains the interface content from the content sensor. The HiCar application 15 may determine, according to the interface content, whether the current interface has icons. When the interface includes icons, numeric badges 60 are added to the icons one by one in a certain order. This is not limited in the embodiments of this application.
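The following sketch illustrates one way the left-to-right, top-to-bottom badge assignment could be implemented, including the preset-size threshold discussed above. The Icon type, its fields, and the 48-pixel threshold are illustrative assumptions, not values defined by this application.

```java
// Sketch: assign numeric badges to icons in reading order (top-to-bottom,
// then left-to-right), skipping icons too small to carry a legible badge.
// Assumes icons in the same visual row share the same y coordinate.
import java.util.Comparator;
import java.util.List;

final class Icon {
    int x, y;        // top-left screen position, in pixels
    int widthPx;     // rendered width, in pixels
    int badge = -1;  // -1 means "no badge assigned"
}

final class BadgeAssigner {
    static final int MIN_WIDTH_PX = 48; // preset threshold (assumed value)

    static void assignBadges(List<Icon> icons) {
        icons.sort(Comparator.comparingInt((Icon i) -> i.y)
                             .thenComparingInt(i -> i.x));
        int next = 1;
        for (Icon icon : icons) {
            if (icon.widthPx >= MIN_WIDTH_PX) { // only badge icons large enough
                icon.badge = next++;
            }
        }
    }
}
```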
After the picture of each song or the picture of each song list on the interface of the in-vehicle device 103 has been marked with a numeric badge, the user can input a voice command that includes the number of a badge, and through that voice command perform a tap operation on the picture bearing that numeric badge. Exemplarily, as shown in (b) of FIG. 6, after the pictures included on the interface of the in-vehicle device 103 are marked with numeric badges, the user can input a voice command containing the corresponding number, such as "1" or "Play 1". In response to the voice command input by the user, the mobile phone 102 may perform, in the background, a tap operation on Song 1 marked with the number 1, and display an interface 603 shown in (c) of FIG. 6, in which the pause button 30 of Song 1 changes to the play button 20 and the in-vehicle device 103 starts to play Song 1.

Specifically, with reference to the software architecture and functional modules of FIG. 2, in this implementation process, the user speaks the corresponding number, for example the number 1, through a voice command. Smart Voice 13 at the application layer recognizes the user's voice command and converts it into the text "1". At the same time, the content sensor at the application framework layer extracts the content of the current interface, and the "visible and speakable" module 11 analyzes the control content and obtains the text information of the controls. The recognized control information is then matched against the text "1" returned by Smart Voice. After the matching succeeds, a tap operation is performed on the icon of Song 1, and the tap event on the icon of Song 1 is passed to the music application's own service logic, implementing the corresponding service-logic jump. At the same time, the HiCar application 15 ends this round of voice recognition and exits the voice recognition function of Smart Voice; the voice ball icon 10 returns to the static state shown in (c) of FIG. 6, and the wake-up window 602-1 and the recommended voice commands disappear.
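The dispatch path described above runs through the content sensor and the application's own service logic, which are internal components not detailed here. As a public-API approximation only, an Android accessibility service could locate and tap a control whose text matches the recognized command as follows.

```java
// Sketch: tap the first clickable on-screen node whose text or content
// description matches the recognized voice-command text. This is an
// approximation of the patent's dispatch path using public Android APIs.
import android.accessibilityservice.AccessibilityService;
import android.view.accessibility.AccessibilityEvent;
import android.view.accessibility.AccessibilityNodeInfo;

public class VoiceClickService extends AccessibilityService {
    boolean clickNodeMatching(String recognizedText) {
        AccessibilityNodeInfo root = getRootInActiveWindow();
        if (root == null) return false;
        // findAccessibilityNodeInfosByText matches text and content descriptions.
        for (AccessibilityNodeInfo node :
                root.findAccessibilityNodeInfosByText(recognizedText)) {
            if (node.isClickable()) {
                return node.performAction(AccessibilityNodeInfo.ACTION_CLICK);
            }
        }
        return false;
    }

    @Override public void onAccessibilityEvent(AccessibilityEvent event) {}
    @Override public void onInterrupt() {}
}
```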
In another possible implementation, in the process of adding numeric badges, numeric badges may be added, according to certain principles, to only some of the one or more controls on the current interface. The controls to which numeric badges can be added may include: all controls of the picture type among the one or more controls recognized on the current interface; or controls with a grid-type arrangement among the one or more controls recognized on the current interface; or controls with a list-type arrangement among the one or more controls recognized on the current interface; or controls whose display size is greater than or equal to a preset value among the one or more controls recognized on the current interface.

In yet another possible implementation, in the process of adding numeric badges, the numeric badges may be added to some of the one or more controls in a preset order. Optionally, the preset order includes a left-to-right order and/or a top-to-bottom order.
In yet another possible implementation, in the process of obtaining the controls included on the interface, if the controls include an icon control, the outline of the icon control may be obtained, and an outline keyword describing the icon control is determined according to the outline; a degree of matching between one or more keywords included in the voice command and the outline keywords of the icon controls is determined, and the icon control with the greatest degree of matching is determined as the target control.

Exemplarily, on a music playback interface, when the user inputs the voice command "I like this", in the process of matching the voice command against the controls included on the interface, suppose the descriptors of the favorite button on the music playback interface are "like" and "favorite", and the outline of the favorite button is a heart shape. The heart shape can then be matched to "I like this". This manner can generalize the user's voice commands and match user commands to the controls on the interface more intelligently.

In a possible implementation, in the process of matching the recognized control information against the voice command text recognized by Smart Voice, strong matching is attempted first, that is, the control information and the recognized voice command text must correspond exactly. If strong matching fails, weak matching is performed, that is, it is determined whether the control information contains the recognized voice command text; as long as the control information contains part of the recognized voice command text, the match is deemed successful, and a tap operation is performed on the control corresponding to that control information.
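A minimal sketch of this strong-then-weak matching order might look as follows; the label list stands in for the control information extracted by the content sensor.

```java
// Sketch: match recognized voice-command text against control labels.
// Strong match (exact equality) is tried first; only if no label matches
// exactly does the weaker substring-containment match run.
import java.util.List;

final class ControlMatcher {
    /** Returns the index of the matched control label, or -1 if none matches. */
    static int match(List<String> controlLabels, String recognizedText) {
        // Strong match: label and recognized text correspond one-to-one.
        for (int i = 0; i < controlLabels.size(); i++) {
            if (controlLabels.get(i).equals(recognizedText)) return i;
        }
        // Weak match: the label merely contains the recognized text.
        for (int i = 0; i < controlLabels.size(); i++) {
            if (controlLabels.get(i).contains(recognizedText)) return i;
        }
        return -1;
    }
}
```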
Through the foregoing method, in the embodiments of this application, numeric badges are added to clickable controls such as the pictures and application icons displayed on the interface, and the user can issue a voice command that includes a number to perform a tap operation on the control marked with that numeric badge. After seeing the numeric badges on the interface, the user issues a voice command including a number; the voice command is converted through voice recognition, so that the clickable control corresponding to that number, such as a picture or an application icon, is determined and the tap operation is performed. In this process, the user does not need to remember a variety of complex voice commands; the voice interaction process is accomplished merely through voice commands that include numbers, which is simpler and more convenient, lowers the difficulty of voice interaction, and improves user experience.

The foregoing, with reference to FIG. 4 to FIG. 6, describes the process of user voice interaction for audio applications such as music. In addition, the embodiments of this application can also be applied to navigation applications. The following describes, with reference to FIG. 7, the process of implementing voice interaction in a navigation application.
FIG. 7 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device according to an embodiment of this application.

It should be understood that, for the in-vehicle device 103, the navigation menu area 401 of the screen display system displays navigation menus such as Home, Navigation, Phone, and Music, and switching between different navigation menus can also be controlled through the user's voice commands.

Exemplarily, the process in which the screen display interface of the in-vehicle device 103 jumps from the music interface shown in (c) of FIG. 6 to the navigation interface can also be implemented through voice commands. Specifically, as shown in (a) of FIG. 7, the screen display system of the in-vehicle device 103 displays a currently output interface 701. In the content area of the interface 701, Song 1 is displayed in the playing state, while Song 2, Song 3, Song 4, and Song 5 are all paused and each displays a pause button 30.

As shown in (a) of FIG. 7, on the display interface 701 of the in-vehicle device 103, the user taps the voice ball icon 10 to enable the voice monitoring function of the in-vehicle device 103, and the display screen of the in-vehicle device 103 displays a wake-up window 702-1 shown in (b) of FIG. 7. The wake-up window 702-1 may include the voice recognition icon 40 and recommended voice commands such as "Start searching" and "Next page".

It should be understood that the voice commands recommended in the wake-up window 702-1 may differ from the recommended voice commands displayed in the wake-up window 403-1 shown in (b) of FIG. 4 and in the wake-up windows 502-1 and 503-1 shown in (b) and (c) of FIG. 5. The recommended voice commands displayed in a wake-up window may change correspondingly with the content displayed on the current interface, displaying voice commands related to the content displayed on the current interface; alternatively, voice commands unrelated to the content displayed on the current interface may also be displayed. This is not limited in the embodiments of this application.
Exemplarily, as shown in (b) of FIG. 7, when the user inputs a voice command "Enter dialogue mode, open navigation", after acquiring the command, the in-vehicle device 103 may send the voice command to the mobile phone 102. The mobile phone 102 recognizes the user's voice command and, in response to it, enables the voice interaction function between the in-vehicle device 103 and the user, that is, the in-vehicle device 103 remains in a state of listening for voice commands, so that the user does not need to repeatedly activate the in-vehicle device 103 to listen for and acquire voice commands.

In addition, in response to the voice command, as shown in (c) of FIG. 7, the display interface of the display screen of the in-vehicle device 103 may jump from the music menu to an interface 703 of the navigation menu. The interface 703 of the navigation menu may provide the user with multiple types of search options shown in the right-hand area, including "Food", "Gas station", and "Shopping mall", as well as a search area in the middle. The interface content of the interface 703 of the navigation menu is not described in detail here.

In a possible implementation, after the in-vehicle device 103 has acquired and executed one user command, the wake-up window may disappear briefly while the user's voice commands are monitored in the background. When it is detected again that the user issues a voice command, the wake-up window may float on the display screen again.

Exemplarily, as shown in (d) of FIG. 7, the user starts to issue a voice command, and a wake-up window 704-1 appears. Optionally, the recommended commands displayed in the wake-up window 704-1 may be adapted to the current interface content, or may be associated with the historical data searched most frequently when the user uses the navigation application. For example, the wake-up window 704-1 may include the voice recognition icon 40 and recommended voice commands such as "Navigate to the office" and "Navigate to the mall". This is not limited in the embodiments of this application.
When the user inputs a voice command "Search for food", after acquiring the command, the in-vehicle device 103 may send the voice command to the mobile phone 102. The mobile phone 102 recognizes the user's voice command, simulates, in response to it, a tap on the "Food" option on the interface 704 shown in (d) of FIG. 7, and displays a search result interface 705 shown in (e) of FIG. 7 for the user. Exemplarily, the interface 705 displays multiple restaurants found by the search; the restaurants may be sorted by distance from the user's current location, and content such as each restaurant's average price per person and distance is displayed for the user. This is not limited in the embodiments of this application.

Optionally, as shown in (e) of FIG. 7, the recommended commands displayed in a wake-up window 705-1 on the interface 705 may be re-adapted to the current interface content. For example, the wake-up window 705-1 may include the voice recognition icon 40 and recommended voice commands such as "Start searching" and "Next page". This is not limited in the embodiments of this application.

When the user inputs a voice command "Next page", after acquiring the command, the in-vehicle device 103 displays, in response to the voice command, a search result interface 706 shown in (f) of FIG. 7 for the user. It should be understood that the interface 706 is the interface displayed after the interface 705 is slid as indicated by the black arrow. After the user selects the target restaurant "5. XX Light Food Restaurant", the user can further input a voice command "Navigate to 5". After acquiring the command, the in-vehicle device 103 displays, in response to the voice command, a navigation route interface 707 shown in (g) of FIG. 7 for the user, where the interface 707 includes the route and distance to 5. XX Light Food Restaurant, and the like. This is not limited in the embodiments of this application.

In a possible implementation, as shown in (f) of FIG. 7, the user's voice command includes "5". After acquiring the user's command, the in-vehicle device 103 may send the voice command to the mobile phone 102, and the mobile phone 102 may perform interface matching according to the command, that is, extract the keywords in the command and match them against the keywords or description information contained in all the controls on the current interface. Exemplarily, if the keywords of the user's command are "navigate" and "5", and the mobile phone detects that the keywords of the option "5. XX Light Food Restaurant" on the interface are "5", "light food restaurant", and the like, the user's command matches that option to the greatest degree; therefore, a tap on the "5. XX Light Food Restaurant" option on the interface 706 is performed, and the interface 707 shown in (g) of FIG. 7 is displayed.
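The following sketch illustrates one plausible form of this degree-of-matching computation: counting how many command keywords appear in each control's description and picking the control with the highest count. The keyword extraction itself (yielding, for example, "navigate" and "5") is assumed to happen upstream.

```java
// Sketch: pick the on-screen control whose description best overlaps the
// keywords extracted from the user's voice command.
import java.util.List;

final class MatchScorer {
    static int score(List<String> commandKeywords, String controlDescription) {
        int hits = 0;
        for (String kw : commandKeywords) {
            if (controlDescription.contains(kw)) hits++; // keyword found in description
        }
        return hits;
    }

    /** Returns the best-matching description, or null when nothing overlaps. */
    static String bestMatch(List<String> commandKeywords, List<String> descriptions) {
        String best = null;
        int bestScore = 0;
        for (String d : descriptions) {
            int s = score(commandKeywords, d);
            if (s > bestScore) { bestScore = s; best = d; }
        }
        return best;
    }
}
```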
In the foregoing method, the text controls, picture controls, buttons, icon controls, and the like that are displayed on the interface, visible, and tappable by the user are obtained; then, according to the acquired user voice command, a target control is matched on the interface, and operations such as tapping the target control on the interface are performed.

With reference to the control types listed above, Table 2 shows the controls on the pages of several common navigation applications. As shown in Table 2 below, for navigation applications commonly used by users, such as Baidu Maps and AutoNavi Maps, different pages may include different controls, and the number and types of controls included in the first-level page and second-level page of each application also differ.

Exemplarily, taking the Baidu Maps application as an example, the first-level page may be understood as the main interface of Baidu Maps entered after the user taps the Baidu Maps application icon, including controls such as "Zoom in", "Zoom out", "Locate", "Traffic", "Search", "More", and "Exit". A second-level interface is the next-level page entered when the user taps any menu or control on the main interface of Baidu Maps, for example, a route preference settings page. In the embodiments of this application, the page content and controls on every page can be obtained by the mobile phone, and the text information included in every control can be recognized, which is not described again here.
Table 2

[Table 2 is provided as image Figure PCTCN2021113542-appb-000008 in the original document.]
The foregoing describes the voice interaction process provided in the embodiments of this application with reference to audio applications and navigation applications. It should be understood that the embodiments of this application can recognize the page content, controls, and the like of the different pages of the different applications listed above; in addition, generic-command controls can also be supported.

It should be understood that the generic-command controls may include controls on the interface such as back, turn left/turn right, turn up/turn down, and previous page/next page.

Exemplarily, after the user enables the voice recognition function, when Smart Voice recognizes one of the foregoing generic command texts, it sends the text to the "visible and speakable" module. For a back command, a click event (key event) of the back key is sent, through the system's inject key event method, to the application to which the current interface belongs; by listening for back-key events, the application to which the current interface belongs receives the corresponding back event and performs the back-navigation processing.
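A minimal sketch of dispatching these generic commands on standard Android APIs is shown below: the back command is injected as a key event, and list scrolling (detailed in the next paragraph) calls the list control's own scrollBy method with a signed distance. Instrumentation-based injection into another application would in practice require system-level privileges, which a platform service is assumed to have.

```java
// Sketch: dispatch the generic "back" and scrolling commands.
import android.app.Instrumentation;
import android.view.KeyEvent;
import androidx.recyclerview.widget.RecyclerView;

final class GenericCommandDispatcher {
    static void dispatchBack() {
        // sendKeyDownUpSync must not run on the main thread; injection into
        // other applications additionally requires system-level privileges.
        new Thread(() -> new Instrumentation()
                .sendKeyDownUpSync(KeyEvent.KEYCODE_BACK)).start();
    }

    static void scrollList(RecyclerView list,
                           boolean horizontal, boolean forward, int distancePx) {
        int signed = forward ? distancePx : -distancePx; // sign selects direction
        if (horizontal) list.scrollBy(signed, 0);        // left/right paging
        else            list.scrollBy(0, signed);        // up/down paging
    }
}
```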
For controls such as turn left or turn right, turn up or turn down, and previous page or next page, the corresponding slidable list control is identified from the interface controls returned by the content sensor. After the required sliding direction is analyzed, the control's own sliding method is called, for example, the scrollBy sliding method of RecyclerView, to implement up-and-down sliding. Left-and-right sliding is implemented according to whether the control itself supports it: when the control supports left-and-right sliding, the horizontal movement distance is passed into the called scrollBy sliding method, and whether to slide left or right is determined by the sign of the value. When the control supports up-and-down sliding, the vertical movement distance is passed in, and whether to slide up or down is determined by the sign of the value. The implementation process of these generic-command controls is not described further here.

FIG. 8 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device according to an embodiment of this application; switching between different navigation menus can also be controlled through the user's voice commands.

Exemplarily, FIG. 8 shows the process in which the screen display interface of the in-vehicle device 103 jumps from the navigation route interface shown in (g) of FIG. 7 to the phone menu, and this process can also be implemented through the user's voice commands. Specifically, as shown in (a) of FIG. 8, the screen display system of the in-vehicle device 103 displays a currently output navigation route interface 801. The wake-up window on the interface 801 displays the voice recognition icon 40 and recommended voice commands such as "Exit navigation" and "Search".

When the user inputs a voice command "Open the phone book", after acquiring the command, the in-vehicle device 103 displays, in response to the voice command, a phone application interface 802 shown in (b) of FIG. 8 for the user. The interface 802 may include submenus such as call records, contacts, and dialing; the interface 802 currently displays content such as the user's call records, which is not described again here.
Through the foregoing embodiments, by obtaining the controls displayed on the interface that are visible and on which the user can perform tap operations, the user can input voice commands to perform operations such as tapping any control on the interface. The user can control, through voice commands, all applications and all visible content displayed on the display screen. In particular, in a driving scenario, the user's manual operations are reduced, so that user distraction is avoided and safety in the driving scenario is improved.

In addition, for voice interaction scenarios, different applications do not need to develop voice assistants separately. In other words, the user can control multiple different applications through the same voice interaction manner; the voice assistant is no longer separated from the applications, which enriches the application ecosystem.

With reference to the foregoing embodiments and the related drawings, the display process on the interface of the voice interaction method provided in the embodiments of this application has been described. Taking the scenario of the mobile phone 102 and the in-vehicle device 103 shown in the foregoing drawings as an example, the following describes, with reference to FIG. 9, the specific implementation process of the voice interaction method.
FIG. 9 is a schematic flowchart of a voice interaction method according to an embodiment of this application. As shown in FIG. 9, the method 900 may include the following steps.

901: The user opens a first application.

Optionally, the first application may be an application actually running on the mobile phone 102 side, for example, an application running in the foreground or in the background of the mobile phone 102.

It should be understood that step 901 may be performed by the user on the in-vehicle device 103 side and transferred by the in-vehicle device 103 back to the mobile phone 102, so that the first application is started in the background of the mobile phone 102; alternatively, it may be performed by the user on the mobile phone 102 side and projected directly onto the display screen of the in-vehicle device 103. This is not limited in the embodiments of this application.
902: The first application refreshes its interface.

903: Interface recognition is triggered. Optionally, the interface refresh of the first application may trigger the mobile phone 102 to perform interface recognition through an algorithm service.

904: The mobile phone 102 performs interface hot-word recognition to obtain information about the interface content. It should be understood that the delay of the interface hot-word recognition process in step 904 is less than 500 milliseconds.

Optionally, the interface content may include the part of the currently displayed interface that is visible to the user. Exemplarily, the user-visible part may include the pictures, text, menus, options, icons, buttons, and the like displayed on the interface, which are collectively referred to as "controls" in the embodiments of this application.

It should be understood that, in the embodiments of this application, when the user's voice command is matched to a target control on the interface, an operation may be performed on the target control. Optionally, the operation may include input operations such as clicking, tapping, double-clicking, sliding, and right-clicking.

It should also be understood that, in the embodiments of this application, after the user's voice command is acquired and parsed, the voice command is matched to the target control on the interface, that is, the user's intention is recognized, and the tap operation on the target control is then performed.
905: The user activates the voice recognition function.

It should be understood that the user may activate the voice recognition function on the in-vehicle device 103. In the embodiments of this application, activating the voice recognition function may mean starting the in-vehicle device 103 to begin listening for the user's voice commands; alternatively, it may also be understood as starting the in-vehicle device 103 to listen for the user's voice commands and transfer the acquired voice commands back to the mobile phone 102, where the mobile phone 102 analyzes the voice commands. This is not limited in the embodiments of this application.

Optionally, the user may activate the voice recognition function through a physical button of the in-vehicle device or through voice.

In a possible implementation, the display interface of the in-vehicle device 103 may further include a voice ball icon, as shown in (a) of FIG. 4, and the user may tap the voice ball icon to enable the voice monitoring function of the in-vehicle device 103. Optionally, in response to the user's tap operation, the in-vehicle device 103 may display the wake-up window 403-1 shown in (b) of FIG. 4, which is not described again here.

In another possible implementation, the user may also enable the voice monitoring function by pressing a vehicle-control voice button of the car. For example, the user presses the vehicle-control voice button 50 on the steering wheel shown in (b) of FIG. 1 to enable the function of the in-vehicle device 103 of listening for and acquiring the user's voice commands. This is not limited in the embodiments of this application.
906, trigger the HiCar application of the mobile phone to request the information of the interface content.
907, return the information of the interface content to the HiCar application of the mobile phone.
908, the HiCar application of the mobile phone passes the obtained information of the interface content to the smart voice service module.
In one possible implementation, in the embodiments of this application, the smart voice service module may correspond to a smart voice application installed on the mobile phone 102 side; that is, the smart voice application of the mobile phone 102 performs the service process provided in FIG. 9.
In another possible implementation, the service corresponding to the smart voice service module may be provided by a server. This scenario may correspond to (c) of FIG. 1: drawing on the voice analysis capability of the server 104, the mobile phone 102 sends the user's voice command to the server 104, and after analyzing the command, the server 104 returns the recognition result to the mobile phone 102, which is not repeated here.
909, the user inputs a voice command.
910, the mobile phone sends the voice command to the smart voice service module.
Optionally, in steps 909 and 910, the user may input the voice command on the in-vehicle device 103 side. After the microphone of the in-vehicle device 103 captures the user's voice command, the command is sent to the HiCar application of the mobile phone and then passed on to the smart voice service module, and the smart voice service module analyzes the user's voice command.
911, the smart voice service module passes the obtained user voice command and the information of the interface content to the ASR module.
912, the ASR module performs enhanced recognition of the user's voice command according to the information of the interface content.
In one possible implementation, the ASR module contains an ASR model. In the embodiments of this application, the information of the currently displayed interface content may be passed to the ASR module in synchronization with step 908; that is, the information of the interface content is fed into the ASR model as a parameter, and the user's voice command is then recognized according to the updated ASR model.
By way of example, the user's voice command may include homophones. For example, when the user says "综艺" ("variety show"), depending on the user's pronunciation, the ASR analysis of the pinyin "zong yi" or "zhong yi" may yield possible recognition results such as "中意", "中医", "忠义", and "综艺". Such homophones and words with similar pinyin may prevent the mobile phone from accurately deriving the user's operation intention from the voice command. With the embodiments of this application, the ASR module also considers the current interface content information of the in-vehicle device 103 (for example, the current interface displays a large amount of audio information, celebrity photos, and video information); when performing its analysis, it selects, from the possible recognition results "中意", "中医", "忠义", and "综艺", the candidate "综艺" that is most relevant to the currently displayed audio information, celebrity photos, and video information, and thereby determines that the voice command issued by the user is "综艺".
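The contextual biasing described above can be sketched as a rescoring step over the ASR module's candidate hypotheses. The following is a minimal illustration, assuming the ASR model exposes an n-best list of (text, acoustic score) pairs; the boost weight and the simple substring-overlap bonus are illustrative assumptions, not the patent's actual algorithm.

```python
def rescore_with_interface_hotwords(candidates, hotwords, boost=0.5):
    """Pick the ASR hypothesis most consistent with the on-screen content.

    candidates: list of (text, acoustic_score) pairs from the ASR n-best list.
    hotwords:   words harvested from the currently displayed interface.
    """
    def context_bonus(text):
        # Count how many interface hot words overlap with the hypothesis.
        return sum(1 for w in hotwords if w in text or text in w)

    # Combine acoustic evidence with interface-context evidence.
    return max(candidates, key=lambda c: c[1] + boost * context_bonus(c[0]))

# Homophone example from the text: "zong yi" yields several candidates,
# but the interface shows variety-show content, so "综艺" wins.
nbest = [("中意", 0.26), ("中医", 0.25), ("忠义", 0.24), ("综艺", 0.25)]
screen_words = ["综艺", "音乐", "视频", "明星"]
print(rescore_with_interface_hotwords(nbest, screen_words))  # ('综艺', 0.25)
```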
Through the above updated algorithm implementation, the information of the currently displayed interface content is introduced into the voice command recognition process of the existing ASR module, so that the usage scenario of the user's current voice command can be accurately inferred from that information, the application scenario targeted by the current voice command can be accurately located, and the accuracy of voice command recognition is improved.
913, the ASR module returns the analyzed voice command text to the HiCar application of the mobile phone.
914, the HiCar application of the mobile phone sends the voice command text to the algorithm service module.
915, the mobile phone matches, through an algorithm service, the voice command text against the information of the current interface content and determines the matching result.
916, return the matching result to the HiCar application of the mobile phone.
917, return a simulated click instruction to the first application of the mobile phone, and the first application executes the simulated click operation.
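A simulated click of this kind is commonly dispatched at the centre of the matched control's bounds. The sketch below is a generic illustration only, assuming a platform-supplied injection callable `inject_tap(x, y)`; it is not HiCar's actual interface.

```python
def simulate_click(control_bounds, inject_tap):
    """Dispatch a synthetic tap at the centre of the target control.

    control_bounds: (left, top, right, bottom) of the matched control, in px.
    inject_tap:     platform-supplied callable that injects a tap event
                    (an assumption for this sketch, not a real HiCar API).
    """
    left, top, right, bottom = control_bounds
    x, y = (left + right) // 2, (top + bottom) // 2
    inject_tap(x, y)
    return x, y

# Usage with a stand-in injector that just logs the event:
print(simulate_click((100, 400, 300, 480), lambda x, y: print("tap", x, y)))
```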
918, feed the interface obtained after the click operation back to the display screen of the in-vehicle device 103 for presentation to the user.
919, determine the operation result after the first application performs the operation.
920, return the operation result to the HiCar application of the mobile phone.
At the same time, the smart voice service module may perform steps 914-1 to 919-1 shown in the dashed box in FIG. 9:
914-1, the NLU module of the smart voice service may also obtain the voice command text.
915-1, the NLU module of the smart voice service performs intention recognition according to the voice command text and determines the user intention corresponding to the text.
916-1, return the user intention.
917-1, send the user intention to the DM module.
918-1, the DM module may, according to the returned user intention, perform intention processing and determine the user intention of the user's voice command.
919-1 and 921, the smart voice service module returns the user intention to the HiCar application of the mobile phone. It should be understood that steps 914-1 to 919-1 shown in the dashed box may be optional steps. This process can be understood as accurately analyzing the user's intention with the help of a powerful voice recognition capability, such as that of a server: on the mobile phone side, in response to the user's voice command and in combination with the returned user intention, the HiCar application of the mobile phone determines whether to perform the operation corresponding to that intention, which improves the accuracy of voice command recognition.
It should be understood that the above process can be understood as obtaining the user's voice command and then determining the user's intention according to that command, for example, determining which control on the current interface the voice command just input by the user is meant to click.
In one possible implementation, according to the user's voice command, the degree of matching between the voice command and each of the one or more controls is determined, and the control with the highest degree of matching is determined as the target control on which the user intends to perform the click operation.
Optionally, in the process of determining the degree of matching between the voice command and each of the one or more controls, one or more keywords contained in the user's voice command may be extracted; the degree of matching between each of the one or more keywords and the description information of each of the one or more controls is determined; and the control with the highest degree of matching is determined as the target control. A minimal sketch of this scoring is given after the notes below.
Optionally, the keywords may include the characters or words of the voice command, or the pinyin of some or all of its Chinese characters, which is not limited in the embodiments of this application.
Optionally, the description information of each control may include the outline information, text information, color information, position information, icon information, and the like of that control, which is not limited in the embodiments of this application.
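The following is a minimal sketch of this matching-degree computation, assuming a plain keyword-overlap score over each control's description keywords; the actual algorithm service may instead use pinyin matching, fuzzy scores, or learned similarity.

```python
def match_target_control(command_text, controls):
    """Return the control whose description best matches the voice command.

    command_text: recognized voice command text, e.g. "打开综艺".
    controls: list of dicts, each with an 'id' and a list of description
              'keywords' (text label, outline word, colour, position, ...).
    """
    def score(control):
        # Overlap between the command text and the control's description keywords.
        return sum(1 for kw in control["keywords"] if kw and kw in command_text)

    best = max(controls, key=score)
    return best if score(best) > 0 else None  # no match: let the dialog re-prompt

controls = [
    {"id": "btn_variety", "keywords": ["综艺", "按钮"]},
    {"id": "btn_music",   "keywords": ["音乐", "按钮"]},
]
print(match_target_control("打开综艺", controls))  # -> the btn_variety record
```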
After the user's intention, that is, the target control that the user wants to click, has been determined through the user's voice command, the following process may continue:
922, the HiCar application of the mobile phone determines whether to perform the operation corresponding to the user intention.
923, when the HiCar application of the mobile phone determines not to perform the operation corresponding to the user intention, it sends a notification message of not executing the user instruction to the smart voice service module.
924, the smart voice service module ends the current conversation according to the notification message of not executing the user instruction.
925, notify the DM module to end the current dialogue mode, that is, to stop continuously capturing the user's voice commands.
Through the above implementation process, the method obtains the controls displayed on the interface that are visible and on which the user can perform a click operation, so that the user can input voice commands to perform operations such as clicking on any control on the interface. The user can control, by voice command, all applications and all visible content shown on the display screen.
Specifically, in the process of analyzing the user's voice command, the obtained interface content information of the current interface is used as a parameter of the ASR analysis; that is, the application scenario in which the current voice command may occur is accurately inferred from the interface content information of the current interface. After the user's voice command is recognized, the recognized voice command text is matched against the controls in the currently possible application scenario, so that the user's intention is obtained more accurately, which improves the accuracy of voice recognition in voice interaction scenarios.
In particular, in a driving scenario, where the environment of an in-vehicle device is noisy and the voice command input by the user is accompanied by noise, the embodiments of this application can analyze the user's voice command in combination with the application scenario in which the command may occur, improving the accuracy of voice recognition. This reduces the user's manual operations, avoids distracting the user, and improves safety in driving scenarios.
In addition, for voice interaction scenarios, different applications do not need to develop separate voice assistants. In other words, the user can control multiple different applications through the same voice interaction method; the voice assistant and the applications are no longer separated, which enriches the application ecosystem.
It can be understood that, in order to realize the above functions, the electronic device includes corresponding hardware and/or software modules for performing each function. In combination with the algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each particular application in combination with the embodiments, but such implementations should not be considered beyond the scope of this application.
In this embodiment, the electronic device may be divided into functional modules according to the above method examples. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The above integrated module may be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is illustrative, being merely a division by logical function, and other division manners may exist in actual implementation.
When each functional module is divided corresponding to each function, the electronic device involved in the above embodiments may include a display unit, a detection unit, and a processing unit.
The display unit, the detection unit, and the processing unit cooperate with one another and may be used to support the electronic device in performing the technical processes described in the above embodiments.
It should be noted that all relevant content of the steps involved in the above method embodiments can be cited in the functional descriptions of the corresponding functional modules, which is not repeated here.
The electronic device provided in this embodiment is used to perform the above method of human-computer interaction and can therefore achieve the same effects as the above implementation method.
Where an integrated unit is employed, the electronic device may include a processing module, a storage module, and a communication module. The processing module may be used to control and manage the actions of the electronic device, for example, to support the electronic device in performing the steps performed by the display unit, the detection unit, and the processing unit. The storage module may be used to support the electronic device in storing program code, data, and the like. The communication module may be used to support communication between the electronic device and other devices.
The processing module may be a processor or a controller. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a digital signal processor (digital signal processing, DSP) and a microprocessor. The storage module may be a memory. The communication module may specifically be a device that interacts with other electronic devices, such as a radio frequency circuit, a Bluetooth chip, or a Wi-Fi chip.
In one embodiment, when the processing module is a processor and the storage module is a memory, the electronic device involved in this embodiment may be a device having the structure shown in FIG. 2.
This embodiment further provides a computer-readable storage medium that stores computer instructions; when the computer instructions run on an electronic device, the electronic device performs the above related method steps to implement the method of human-computer interaction in the above embodiments.
This embodiment further provides a computer program product; when the computer program product runs on a computer, the computer performs the above related steps to implement the method of human-computer interaction in the above embodiments.
In addition, the embodiments of this application further provide an apparatus, which may specifically be a chip, component, or module; the apparatus may include a processor and a memory that are connected, the memory storing computer-executable instructions. When the apparatus runs, the processor may execute the computer-executable instructions stored in the memory, so that the chip performs the method of human-computer interaction in the above method embodiments.
The electronic device, computer-readable storage medium, computer program product, and chip provided in this embodiment are all used to perform the corresponding methods provided above; therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding methods provided above, which are not repeated here.
From the description of the above implementations, those skilled in the art can understand that, for convenience and brevity of description, only the division of the above functional modules is used as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division of modules or units is merely a division by logical function, and other division manners may exist in actual implementation; multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces; indirect couplings or communication connections between apparatuses or units may be in electrical, mechanical, or other forms.
Units described as separate components may or may not be physically separated, and components shown as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor (processor) to perform all or some of the steps of the methods in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (read only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disc.
The above content is merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (21)

  1. A method of human-computer interaction, wherein the method is applied to an electronic device, and the method comprises:
    during the running of a human-computer interaction application in the electronic device, obtaining current interface content information;
    determining, according to the interface content information, one or more controls on the interface, wherein the one or more controls comprise one or more of buttons, icons, pictures, and text;
    obtaining a voice command of a user;
    matching, according to the voice command, a target control from the one or more controls;
    and determining, according to the voice command, a user intention, and performing, in response to the user intention, an operation on the target control.
  2. The method according to claim 1, wherein the matching, according to the voice command, a target control from the one or more controls comprises:
    determining, according to the voice command, a degree of matching between the voice command and each of the one or more controls;
    determining the control with the highest degree of matching as the target control.
  3. The method according to claim 2, wherein the determining, according to the voice command, a degree of matching between the voice command and each of the one or more controls comprises:
    extracting one or more keywords contained in the voice command;
    determining a degree of matching between each of the one or more keywords and description information of each of the one or more controls;
    determining the control with the highest degree of matching as the target control.
  4. The method according to any one of claims 1 to 3, wherein, when the one or more controls include an icon control, the method further comprises:
    obtaining an outline of the icon control, and determining, according to the outline, an outline keyword describing the icon control;
    determining a degree of matching between the one or more keywords included in the voice command and the outline keyword of the icon control;
    determining the icon control with the highest degree of matching as the target control.
  5. The method according to any one of claims 1 to 3, wherein the method further comprises:
    when the voice command is detected, adding numeric badges to some of the one or more controls;
    when it is detected that the voice command includes a first number, determining the control marked with the first number as the target control.
  6. The method according to claim 5, wherein the adding numeric badges to some of the one or more controls comprises:
    adding the numeric badges to some of the one or more controls in a preset order, wherein the preset order comprises a left-to-right and/or top-to-bottom order.
  7. The method according to claim 5 or 6, wherein the some controls to which numeric badges can be added comprise one or more of the following:
    all picture-type controls among the one or more controls; or
    controls among the one or more controls that are arranged in a grid; or
    controls among the one or more controls that are arranged in a list; or
    controls among the one or more controls whose display size is greater than or equal to a preset value.
  8. The method according to any one of claims 1 to 7, wherein the interface corresponding to the interface content information is the interface of an application running in the foreground of the electronic device, and/or the interface of an application running in the background of the electronic device.
  9. The method according to any one of claims 1 to 8, wherein the method further comprises:
    starting the human-computer interaction application on the electronic device.
  10. The method according to claim 9, wherein the starting the human-computer interaction application on the electronic device comprises:
    obtaining a preset input of the user and starting the human-computer interaction application on the electronic device, wherein the preset input comprises at least one of an operation triggering a button, a preset voice-input human-computer interaction instruction, or a preset fingerprint input.
  11. An electronic device, comprising:
    one or more processors;
    one or more memories;
    a module in which multiple applications are installed;
    wherein the memory stores one or more programs that, when executed by the processor, cause the electronic device to perform the following steps:
    during the running of a human-computer interaction application, obtaining current interface content information;
    determining, according to the interface content information, one or more controls on the interface, wherein the one or more controls comprise one or more of buttons, icons, pictures, and text;
    obtaining a voice command of a user;
    matching, according to the voice command, a target control from the one or more controls;
    and determining, according to the voice command, a user intention, and performing, in response to the user intention, an operation on the target control.
  12. The electronic device according to claim 11, wherein, when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:
    determining, according to the voice command, a degree of matching between the voice command and each of the one or more controls;
    determining the control with the highest degree of matching as the target control.
  13. The electronic device according to claim 12, wherein, when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:
    extracting one or more keywords contained in the voice command;
    determining a degree of matching between each of the one or more keywords and description information of each of the one or more controls;
    determining the control with the highest degree of matching as the target control.
  14. The electronic device according to any one of claims 11 to 13, wherein the one or more controls include an icon control, and when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:
    obtaining an outline of the icon control, and determining, according to the outline, an outline keyword describing the icon control;
    determining a degree of matching between the one or more keywords included in the voice command and the outline keyword of the icon control;
    determining the icon control with the highest degree of matching as the target control.
  15. The electronic device according to any one of claims 11 to 13, wherein, when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:
    when the voice command is detected, adding numeric badges to some of the one or more controls;
    when it is detected that the voice command includes a first number, determining the control marked with the first number as the target control.
  16. The electronic device according to claim 15, wherein, when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:
    adding the numeric badges to some of the one or more controls in a preset order, wherein the preset order comprises a left-to-right and/or top-to-bottom order.
  17. The electronic device according to claim 15 or 16, wherein the some controls to which numeric badges can be added comprise one or more of the following:
    all picture-type controls among the one or more controls; or
    controls among the one or more controls that are arranged in a grid; or
    controls among the one or more controls that are arranged in a list; or
    controls among the one or more controls whose display size is greater than or equal to a preset value.
  18. The electronic device according to any one of claims 11 to 17, wherein the interface corresponding to the interface content information is the interface of an application running in the foreground of the electronic device, and/or the interface of an application running in the background of the electronic device.
  19. The electronic device according to any one of claims 11 to 18, wherein, when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:
    starting the human-computer interaction application.
  20. The electronic device according to claim 19, wherein, when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:
    obtaining a preset input of the user and starting the human-computer interaction application, wherein the preset input comprises at least one of an operation triggering a button, a preset voice-input human-computer interaction instruction, or a preset fingerprint input.
  21. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions that, when run on an electronic device, cause the electronic device to perform the method according to any one of claims 1 to 10.
PCT/CN2021/113542 2020-09-10 2021-08-19 Human-computer interaction method, and electronic device and system WO2022052776A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010950650.8A CN114255745A (en) 2020-09-10 2020-09-10 Man-machine interaction method, electronic equipment and system
CN202010950650.8 2020-09-10

Publications (1)

Publication Number Publication Date
WO2022052776A1 true WO2022052776A1 (en) 2022-03-17

Family

ID=80630251

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/113542 WO2022052776A1 (en) 2020-09-10 2021-08-19 Human-computer interaction method, and electronic device and system

Country Status (2)

Country Link
CN (1) CN114255745A (en)
WO (1) WO2022052776A1 (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103869931A (en) * 2012-12-10 2014-06-18 三星电子(中国)研发中心 Method and device for controlling user interface through voice
EP2851891A1 (en) * 2013-09-20 2015-03-25 Kapsys Mobile user terminal and method for controlling such a terminal
CN105161106A (en) * 2015-08-20 2015-12-16 深圳Tcl数字技术有限公司 Voice control method of intelligent terminal, voice control device and television system
CN107992587A (en) * 2017-12-08 2018-05-04 北京百度网讯科技有限公司 A kind of voice interactive method of browser, device, terminal and storage medium
CN108538291A (en) * 2018-04-11 2018-09-14 百度在线网络技术(北京)有限公司 Sound control method, terminal device, cloud server and system
CN108877791A (en) * 2018-05-23 2018-11-23 百度在线网络技术(北京)有限公司 Voice interactive method, device, server, terminal and medium based on view
CN108877796A (en) * 2018-06-14 2018-11-23 合肥品冠慧享家智能家居科技有限责任公司 The method and apparatus of voice control smart machine terminal operation
CN111383631A (en) * 2018-12-11 2020-07-07 阿里巴巴集团控股有限公司 Voice interaction method, device and system
CN109979446A (en) * 2018-12-24 2019-07-05 北京奔流网络信息技术有限公司 Sound control method, storage medium and device
CN110457105A (en) * 2019-08-07 2019-11-15 腾讯科技(深圳)有限公司 Interface operation method, device, equipment and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115562772A (en) * 2022-03-31 2023-01-03 荣耀终端有限公司 Scene recognition and preprocessing method and electronic equipment
CN115562772B (en) * 2022-03-31 2023-10-27 荣耀终端有限公司 Scene recognition and preprocessing method and electronic equipment
CN116707851A (en) * 2022-11-21 2023-09-05 荣耀终端有限公司 Data reporting method and terminal equipment
CN116707851B (en) * 2022-11-21 2024-04-23 荣耀终端有限公司 Data reporting method and terminal equipment
CN116229973A (en) * 2023-03-16 2023-06-06 润芯微科技(江苏)有限公司 Method for realizing visible and can-say function based on OCR
CN116229973B (en) * 2023-03-16 2023-10-17 润芯微科技(江苏)有限公司 Method for realizing visible and can-say function based on OCR
CN116578264A (en) * 2023-05-16 2023-08-11 润芯微科技(江苏)有限公司 Method, system, equipment and storage medium for using voice control in screen projection

Also Published As

Publication number Publication date
CN114255745A (en) 2022-03-29

Similar Documents

Publication Publication Date Title
RU2766255C1 (en) Voice control method and electronic device
CN110910872B (en) Voice interaction method and device
CN110111787B (en) Semantic parsing method and server
CN110138959B (en) Method for displaying prompt of human-computer interaction instruction and electronic equipment
WO2020192456A1 (en) Voice interaction method and electronic device
WO2021027476A1 (en) Method for voice controlling apparatus, and electronic apparatus
WO2022052776A1 (en) Human-computer interaction method, and electronic device and system
WO2020119455A1 (en) Method for repeating word or sentence during video playback, and electronic device
CN115240664A (en) Man-machine interaction method and electronic equipment
CN111970401B (en) Call content processing method, electronic equipment and storage medium
WO2022100221A1 (en) Retrieval processing method and apparatus, and storage medium
CN111881315A (en) Image information input method, electronic device, and computer-readable storage medium
CN112383664B (en) Device control method, first terminal device, second terminal device and computer readable storage medium
WO2022143258A1 (en) Voice interaction processing method and related apparatus
CN113806473A (en) Intention recognition method and electronic equipment
WO2022135157A1 (en) Page display method and apparatus, and electronic device and readable storage medium
CN113852714A (en) Interaction method for electronic equipment and electronic equipment
CN112740148A (en) Method for inputting information into input box and electronic equipment
WO2020181505A1 (en) Input method candidate content recommendation method and electronic device
CN112416984A (en) Data processing method and device
WO2022002213A1 (en) Translation result display method and apparatus, and electronic device
CN113380240B (en) Voice interaction method and electronic equipment
WO2022007757A1 (en) Cross-device voiceprint registration method, electronic device and storage medium
WO2022095983A1 (en) Gesture misrecognition prevention method, and electronic device
WO2022033432A1 (en) Content recommendation method, electronic device and server

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21865832

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21865832

Country of ref document: EP

Kind code of ref document: A1