WO2022052776A1 - Human-computer interaction method, and electronic device and system


Info

Publication number: WO2022052776A1
Authority: WO (WIPO, PCT)
Prior art keywords: controls, interface, electronic device, user, voice
Application number: PCT/CN2021/113542
Other languages: French (fr), Chinese (zh)
Inventors: 祝振凱, 张乐乐
Original assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2022052776A1

Classifications

    • G06F3/0481: Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04817: the foregoing using icons
    • G06F3/0482: Interaction with lists of selectable items, e.g. menus
    • G06F3/0484: Interaction techniques based on GUIs for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/0488: Interaction techniques based on GUIs using specific features provided by the input device, using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225: Feedback of the input speech

Definitions

  • the present application relates to the field of electronic technology, and in particular, to a method, electronic device and system for human-computer interaction.
  • the voice assistant is not integrated in the application, and the user cannot control the operations in the application through voice commands.
  • audio applications such as music, or media applications such as videos do not have the ability to interact with the user by voice, and the user cannot control the execution of such applications through voice commands.
  • the voice assistant of an electronic device is separated from the application, and it is impossible for different applications to access the same voice assistant.
  • the embodiments of the present application provide a human-computer interaction method, electronic device, and system, which can realize system-level voice interaction: for any application displayed on the interface, all visible buttons, pictures, icons, text, controls, and the like can be clicked or otherwise operated by the user through voice commands, which achieves precise human-computer interaction, generalizes the recognition of voice commands, and improves the accuracy of user intent recognition.
  • a human-computer interaction method is provided and applied to an electronic device, and the method includes: acquiring current interface content information while a human-computer interaction application is running on the electronic device; determining, according to the interface content information, one or more controls on the interface, the one or more controls including one or more of buttons, icons, pictures, and text; acquiring the user's voice command; matching a target control from the one or more controls according to the voice command; and determining the user's intention according to the voice command and, in response to the user's intention, performing an operation on the target control.
  • the interface content may include a user-visible portion of the currently displayed interface.
  • the user-visible part may include pictures, text, menus, options, icons, buttons, etc. displayed on the interface, which are collectively referred to as "controls" and the like in this embodiment of the present application.
  • an operation may be performed on the target control.
  • the operation may include input operations such as clicking, double-clicking, sliding, and right-clicking.
  • the voice command is matched with the target control on the interface, that is, the user's intention is recognized, and the click operation on the target control is further performed.
  • the method obtains the visible, user-clickable controls displayed on the interface, and the user can then input voice commands to perform operations such as clicking on any control on the interface; all applications and all visible content on the display can be controlled by the user with voice commands, as in the sketch below.
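
The disclosure does not tie this enumeration step to a particular API. As a minimal illustrative sketch only, assuming an Android accessibility tree as the source of the interface content information (ControlInfo and collectVisibleControls are hypothetical names, not part of the patent):

```kotlin
import android.view.accessibility.AccessibilityNodeInfo

// Hypothetical record for one visible, operable control on the interface.
data class ControlInfo(val label: String, val node: AccessibilityNodeInfo)

// Walk the accessibility tree of the current window and collect every
// control that is visible to the user and carries user-readable text or
// a content description, so a voice command can later be matched to it.
fun collectVisibleControls(root: AccessibilityNodeInfo?): List<ControlInfo> {
    val controls = mutableListOf<ControlInfo>()
    fun walk(node: AccessibilityNodeInfo?) {
        if (node == null) return
        val label = node.text ?: node.contentDescription
        if (node.isVisibleToUser && !label.isNullOrBlank()) {
            controls += ControlInfo(label.toString(), node)
        }
        for (i in 0 until node.childCount) walk(node.getChild(i))
    }
    walk(root)
    return controls
}
```

In an accessibility service, `root` would typically be `rootInActiveWindow`; pictures and icons without text would instead contribute description information derived from outline analysis, as described later.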
  • the acquired interface content information of the current interface is used as a parameter of the ASR analysis; that is, the application scenario in which the current user's voice command may occur is accurately inferred from the interface content information of the current interface.
  • the text of the recognized voice command is matched with the controls in the current possible application scenario, so as to obtain the user's intention more accurately and improve the accuracy of speech recognition in the voice interaction scenario.
  • the embodiment of the present application can analyze the user's voice command in combination with the current application scenario where the user's voice command may occur.
  • the accuracy of speech recognition is improved, which reduces the user's manual operations, avoids user distraction, and improves user safety in driving scenarios.
  • the text controls, pictures, text and icons included in the interface are identified, and the user's voice commands are matched to the controls of the screen content to achieve precise human-computer interaction.
  • the recognition of voice commands is generalized, and the accuracy of user intent recognition and ASR recognition is improved; in addition, the delay of voice interaction is reduced, so that the processing delay of a "visible and speakable" intent is within 200 ms, which greatly improves the efficiency of detecting voice commands and improves the user experience.
  • matching the target control from the one or more controls according to the voice command includes: determining the matching degree between the voice command and each of the one or more controls; and determining the control with the greatest matching degree as the target control.
  • the smart voice service module may correspond to the smart voice application installed on the mobile phone side; that is, the smart voice application of the mobile phone performs the voice command recognition process of the embodiment of the present application.
  • the service corresponding to the smart voice service module can be provided by the server.
  • the mobile phone can send the user's voice command to the server, and with the help of the server's voice analysis capability, the server analyzes the voice command and then returns the recognition result to the mobile phone, which will not be repeated here.
  • determining the matching degree between the voice command and each of the one or more controls includes: extracting one or more keywords included in the voice command; determining the matching degree between each of the one or more keywords and the description information of each of the one or more controls; and determining the control with the greatest matching degree as the target control.
  • when the one or more controls include icon controls, the method further includes: acquiring the outline of the icon control, and determining, according to the outline, an outline keyword describing the icon control; determining the matching degree between the one or more keywords included in the voice command and the outline keyword of the icon control; and determining the icon control with the greatest matching degree as the target control, as in the sketch below.
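
A minimal sketch of one way to compute such a matching degree, assuming keyword overlap as the score (the patent leaves the exact measure open; ControlDescription, matchingDegree, and pickTarget are hypothetical names):

```kotlin
// Hypothetical description information for a control: its visible text
// plus outline keywords derived from its icon shape (e.g. "heart" for a
// heart-shaped favorite button).
data class ControlDescription(val textWords: List<String>, val outlineKeywords: List<String>)

// Matching degree = fraction of command keywords that hit any word of the
// control's description information. The scoring rule is an assumption.
fun matchingDegree(commandKeywords: List<String>, desc: ControlDescription): Double {
    if (commandKeywords.isEmpty()) return 0.0
    val vocabulary = (desc.textWords + desc.outlineKeywords).map { it.lowercase() }
    val hits = commandKeywords.count { kw -> vocabulary.any { it.contains(kw.lowercase()) } }
    return hits.toDouble() / commandKeywords.size
}

// The control with the greatest matching degree becomes the target control.
fun pickTarget(commandKeywords: List<String>, controls: Map<String, ControlDescription>): String? =
    controls.entries.maxByOrNull { matchingDegree(commandKeywords, it.value) }?.key
```

Under this rule, the command "I like" with keyword "like" would score highest against a favorite button described by the words "like" and "favorite" and the outline keyword "heart", matching the example given later.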
  • the acquired information of the currently displayed interface content can be transferred to the ASR module in synchronization with step 908; that is, the information of the interface content is input into the ASR model.
  • the user's voice command is recognized according to the updated ASR model.
  • the user's voice command may include homophones and the like; for example, when the user inputs "variety show", recognition is affected by the pronunciation of different users and the ASR module may misanalyze it.
  • such homophones and words with similar pinyin may cause the mobile phone to fail to accurately obtain the user's operation intention from the voice command.
  • the current interface displays a lot of audio information, star photos, video information, etc.
  • when the ASR module performs its analysis, it will select, from possible recognition results that are near-homophones in Chinese, such as "Zhongyi", "traditional Chinese medicine", "loyalty", and "variety show", the result "variety show" that is more relevant to the currently displayed audio information, star photos, video information, and the like, and then determine that the voice command issued by the current user is "variety show".
  • that is, the information of the currently displayed interface content is introduced into the voice command recognition process of the existing ASR module, so that the usage scenario of the user's current voice command can be accurately analyzed according to the currently displayed interface content, the application scenario targeted by the current user's voice command can be accurately located, and the accuracy of recognizing the voice command is improved; a sketch of this idea follows.
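
A minimal sketch of this biasing, assuming the interface content is applied by re-ranking the recognizer's n-best hypotheses rather than inside the decoder itself (a simplification; rerankHypotheses and the weight are hypothetical):

```kotlin
// Given the n-best ASR hypotheses with their acoustic scores (e.g. the
// Chinese near-homophones for "traditional Chinese medicine", "loyalty",
// and "variety show"), add a bonus for every on-screen word a hypothesis
// contains and keep the best one.
fun rerankHypotheses(
    nBest: List<Pair<String, Double>>,   // (hypothesis text, acoustic score)
    screenWords: Set<String>,            // words taken from the current interface content
    bonusPerHit: Double = 1.0            // weight of the interface evidence (assumed)
): String? = nBest.maxByOrNull { (text, score) ->
    score + bonusPerHit * screenWords.count { text.contains(it) }
}?.first
```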
  • the above process can be understood as a process of determining the user's intention according to the voice command after acquiring the user's voice command, for example, determining which control on the current interface is to be clicked by the voice command currently input by the user.
  • the matching degree between the voice command and each of the one or more controls is determined, and the control with the greatest matching degree is determined as the target control on which the user intends to perform the click operation.
  • one or more keywords contained in the user's voice command may be extracted, and the matching degree between the one or more keywords and each control may be determined.
  • the keywords may include characters, words, the pinyin of some or all of the Chinese characters of the voice command, and the like, which is not limited in this embodiment of the present application.
  • the description information of each control may include outline information, text information, color information, position information, icon information, etc. of the control, which is not limited in this embodiment of the present application.
  • taking the music playing interface as an example, when the user inputs the voice command "I like", during the matching between the voice command and the controls included in the interface, if the description words of the favorite button on the music playing interface are "like" and "favorite", and the outline of the favorite button is a "peach heart" shape, then the heart-shaped control can be matched to "I like". This method can generalize the user's voice command and match the user's command with the controls on the interface more intelligently.
  • the method further includes: when the voice command is detected, adding a digital corner label to some of the one or more controls; and when it is detected that the voice command includes a first number, determining the control marked with the first number as the target control.
  • adding a digital corner label to some of the one or more controls includes: adding the digital label to some of the controls according to a preset order, where the preset order includes a left-to-right and/or top-to-bottom order.
  • the controls to which a digital corner label can be added include one or more of the following: the one or more controls are all picture-type controls; or the one or more controls have a grid-type arrangement; or the one or more controls have a list-type arrangement; or those controls among the one or more controls whose display size is greater than or equal to a preset value; a sketch of this labeling follows.
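
A minimal sketch of the corner-label assignment under the eligibility rules just listed (the size threshold and all names are invented for illustration):

```kotlin
import android.graphics.Rect

// Hypothetical view of a control: its screen bounds plus whether it is a
// picture-type control.
data class Candidate(val bounds: Rect, val isPicture: Boolean)

// Number the eligible controls 1, 2, 3, ... from top to bottom and then
// left to right, matching the preset order described above.
fun assignCornerLabels(controls: List<Candidate>, minSizePx: Int = 96): Map<Int, Candidate> {
    val eligible = controls.filter {
        it.isPicture || (it.bounds.width() >= minSizePx && it.bounds.height() >= minSizePx)
    }
    return eligible
        .sortedWith(compareBy({ it.bounds.top }, { it.bounds.left }))
        .mapIndexed { index, control -> (index + 1) to control }
        .toMap()
}
```

If a later voice command contains, say, the number 3, the control stored under key 3 would be taken as the target control.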
  • the interface corresponding to the interface content information is the interface of the application running in the foreground of the electronic device, and/or the interface of the application running in the background of the electronic device.
  • the method further includes: starting the human-computer interaction application on the electronic device.
  • starting the human-computer interaction application on the electronic device includes: obtaining a user's preset input, and starting the human-computer interaction application on the electronic device accordingly.
  • the preset input includes at least one of triggering an operation of a button, a preset human-computer interaction instruction of voice input, or a preset fingerprint input.
  • an electronic device is provided, comprising: one or more processors; one or more memories; and a module installed with a plurality of application programs; the memory stores one or more programs which, when executed by the processor, cause the electronic device to perform the following steps: in the process of running the human-computer interaction application, obtaining the current interface content information; determining, according to the interface content information, one or more controls on the interface, the one or more controls including one or more of buttons, icons, pictures, and text; obtaining the user's voice command; matching the target control from the one or more controls according to the voice command; and determining the user's intention according to the voice command and, in response to the user's intention, performing an operation on the target control.
  • when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: determining, according to the voice command, the matching degree between the voice command and each of the one or more controls; and determining the control with the greatest matching degree as the target control.
  • when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: extracting one or more keywords included in the voice command; determining the matching degree between each of the one or more keywords and the description information of each of the one or more controls; and determining the control with the greatest matching degree as the target control.
  • when the one or more controls include icon controls and the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: acquiring the outline of the icon control, and determining, according to the outline, an outline keyword describing the icon control; determining the matching degree between the one or more keywords included in the voice command and the outline keyword of the icon control; and determining the icon control with the greatest matching degree as the target control.
  • when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: when the voice command is detected, adding a digital corner label to some of the one or more controls; and when it is detected that the voice command includes a first number, determining the control marked with the first number as the target control.
  • when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: adding the digital corner label to some of the one or more controls in a preset order, where the preset order includes a left-to-right and/or top-to-bottom order.
  • the controls to which a digital corner label can be added include one or more of the following: the one or more controls are all picture-type controls; or the one or more controls have a grid-type arrangement; or the one or more controls have a list-type arrangement; or those controls among the one or more controls whose display size is greater than or equal to a preset value.
  • the interface corresponding to the interface content information is the interface of the application running in the foreground of the electronic device, and/or the interface of the application running in the background of the electronic device.
  • the electronic device when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: start the human-computer interaction application.
  • when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: obtaining a user's preset input and starting the human-computer interaction application, where the preset input includes at least one of triggering an operation of a button, a preset human-computer interaction instruction of voice input, or a preset fingerprint input.
  • the present application provides a system, the system including a connected electronic device and a display device, where the electronic device can perform any one of the possible human-computer interaction methods in the first aspect above, and the display device is used for displaying the application interface of the electronic device.
  • the present application provides an apparatus, the apparatus is included in an electronic device, and the apparatus has a function of implementing the behavior of the electronic device in the above-mentioned aspect and possible implementations of the above-mentioned aspect.
  • the functions can be implemented by hardware, or by executing corresponding software by hardware.
  • the hardware or software includes one or more modules or units corresponding to the above functions. For example, a display module or unit, a detection module or unit, a processing module or unit, and the like.
  • the present application provides an electronic device, comprising: a touch display screen, wherein the touch display screen includes a touch-sensitive surface and a display; a positioning chip; one or more cameras; one or more processors; a plurality of memories; a plurality of application programs; and one or more computer programs.
  • one or more computer programs are stored in the memory, the one or more computer programs comprising instructions. The instructions, when executed by one or more processors, cause an electronic device to perform any of the possible human-computer interaction methods described above.
  • the present application provides an electronic device including one or more processors and one or more memories.
  • the one or more memories are coupled to the one or more processors for storing computer program code, the computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform A method for human-computer interaction in any possible implementation of any of the above aspects.
  • the present application provides a computer storage medium, including computer instructions, when the computer instructions are executed on an electronic device, the electronic device can perform any of the possible human-computer interaction methods in any of the foregoing aspects.
  • the present application provides a computer program product that, when the computer program product runs on an electronic device, enables the electronic device to perform any of the possible human-computer interaction methods in any of the foregoing aspects.
  • FIG. 1 is a schematic diagram of an application scenario of an example of a method for human-computer interaction provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of an example of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is a software structural block diagram of an implementation process of an example of a method for human-computer interaction according to an embodiment of the present application.
  • FIG. 4 is a schematic interface diagram of an example of a process of implementing voice interaction on an in-vehicle device provided by an embodiment of the present application.
  • FIG. 5 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application.
  • FIG. 6 is a schematic interface diagram of another example of a process of implementing voice interaction on an in-vehicle device provided by an embodiment of the present application.
  • FIG. 7 is a schematic interface diagram of another example of a voice interaction process implemented on a vehicle-mounted device provided by an embodiment of the present application.
  • FIG. 8 is a schematic interface diagram of another example of a process of implementing voice interaction on an in-vehicle device provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of a voice interaction method provided by an embodiment of the present application.
  • the terms "first" and "second" are only used for descriptive purposes, and should not be construed as indicating or implying relative importance or implicitly indicating the number of indicated technical features.
  • a feature defined as “first” or “second” may expressly or implicitly include one or more of that feature.
  • the embodiments of the present application will provide a human-computer interaction method.
  • the following describes in detail how to implement system-level voice interaction through the human-computer interaction method with reference to the accompanying drawings and different embodiments.
  • FIG. 1 is a schematic diagram of an application scenario of an example of a method for human-computer interaction provided by an embodiment of the present application.
  • the human-computer interaction method provided by the embodiments of the present application may be applied to a scenario including a separate electronic device.
  • the smart screen 101 is used as the electronic device, and the human-computer interaction method is applied to a scenario where a user uses the smart screen 101 .
  • the smart screen 101 can acquire the user's voice command through the microphone, recognize the voice command, perform corresponding operations according to the user's voice command, display a corresponding interface, and the like.
  • the human-computer interaction method provided by the embodiments of the present application may also be applied to a scenario including two electronic devices, and the two electronic devices in the scenario may include a mobile phone, a tablet computer, and a wearable device. , vehicle equipment and other different types of electronic equipment.
  • the in-vehicle device 103 can be used as a display device, connected to the mobile phone 102, to display the running interface of the mobile phone 102.
  • the mobile phone 102 can acquire the user's voice command, recognize the voice command, and perform the corresponding operation in the background according to the user's voice command, and then display the screen after the corresponding operation is performed on the in-vehicle device 103 .
  • the in-vehicle device 103 can also obtain the user's voice command and transmit the voice command to the mobile phone 102; the mobile phone recognizes the voice command, performs the corresponding operation in the background according to the user's voice command, and then projects the interface after the corresponding operation is performed onto the in-vehicle device 103 for display.
  • the human-computer interaction method provided in the embodiments of the present application may also be applied to a scenario including at least one electronic device and a server.
  • exemplarily, as shown in (c) of FIG. 1, in the scenario including the mobile phone 102, the in-vehicle device 103, and the server 104, the mobile phone 102 or the in-vehicle device 103 can obtain the user's voice command and upload it to the server 104; with the help of the voice analysis capability of the server 104, the user's voice command is analyzed more quickly and accurately, and the analyzed result is then transmitted back to the mobile phone 102, where the corresponding operation is performed; a sketch of this round trip follows.
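
A minimal sketch of the phone-to-server round trip, assuming a plain HTTP upload of the recorded audio (the endpoint URL and payload format are hypothetical; the disclosure only states that the command is uploaded and the recognition result is returned):

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Upload the recorded voice command and read back the recognition result.
fun recognizeOnServer(audio: ByteArray): String {
    val conn = URL("https://voice.example.com/asr").openConnection() as HttpURLConnection
    return try {
        conn.requestMethod = "POST"
        conn.doOutput = true
        conn.setRequestProperty("Content-Type", "application/octet-stream")
        conn.outputStream.use { it.write(audio) }                  // send the voice command
        conn.inputStream.bufferedReader().use { it.readText() }   // server's analysis result
    } finally {
        conn.disconnect()
    }
}
```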
  • the method for human-computer interaction can be applied to mobile phones, smart screens, tablet computers, wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, personal digital assistants (PDA), and other electronic devices; the embodiments of the present application do not impose any limitation on the specific type of electronic device.
  • the smart screen 101 , the mobile phone 102 , and the in-vehicle device 103 listed in FIG. 1 are collectively referred to as “electronic device 100 ”, and possible structures of the electronic device 100 are described below.
  • FIG. 2 is a schematic structural diagram of an example of an electronic device 100 provided by an embodiment of the present application.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2 , mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone jack 170D, sensor module 180, buttons 190, motor 191, indicator 192, camera 193, display screen 194, and Subscriber identification module (subscriber identification module, SIM) card interface 195 and so on.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or use a different arrangement of components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100 .
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby increasing the efficiency of the system.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL).
  • the processor 110 may contain multiple sets of I2C buses.
  • the processor 110 can be respectively coupled to the touch sensor 180K, the charger, the flash, the camera 193 and the like through different I2C bus interfaces.
  • the processor 110 may couple the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate with each other through the I2C bus interface, so as to realize the touch function of the electronic device 100 .
  • the I2S interface can be used for audio communication.
  • the processor 110 may contain multiple sets of I2S buses.
  • the processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170 .
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communication, to sample, quantize, and encode analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • a UART interface is typically used to connect the processor 110 with the wireless communication module 160 .
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193 .
  • MIPI interfaces include camera serial interface (CSI), display serial interface (DSI), etc.
  • the processor 110 communicates with the camera 193 through a CSI interface, so as to realize the photographing function of the electronic device 100 .
  • the processor 110 communicates with the display screen 194 through the DSI interface to implement the display function of the electronic device 100 .
  • the GPIO interface can be configured by software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface may be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like.
  • the GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 130 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transmit data between the electronic device 100 and peripheral devices. It can also be used to connect headphones to play audio through the headphones.
  • the interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100.
  • the electronic device 100 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130 .
  • the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100 . While the charging management module 140 charges the battery 142 , it can also supply power to the electronic device through the power management module 141 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, battery health status (leakage, impedance).
  • the power management module 141 may also be provided in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 may provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low frequency baseband signal is processed by the baseband processor and passed to the application processor.
  • the application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or videos through the display screen 194 .
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110, and may be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2 .
  • the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite-based augmentation system (SBAS).
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
  • the electronic device 100 may include one or N display screens 194 , where N is a positive integer greater than one.
  • the electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • the ISP is used to process the data fed back by the camera 193 .
  • the shutter is opened, the light is transmitted to the camera photosensitive element through the lens, the light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin tone.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object is projected through the lens to generate an optical image onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • a digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy and so on.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos in various encoding formats, for example, moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100 .
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function, for example, saving files such as music and videos in the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing the instructions stored in the internal memory 121 .
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area may store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area may store data (such as audio data, phone book, etc.) created during the use of the electronic device 100 and the like.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playback, recording, etc.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • the speaker 170A, also referred to as a "horn", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
  • the receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be answered by placing the receiver 170B close to the human ear.
  • the microphone 170C, also called a "mike" or a "mic", is used to convert sound signals into electrical signals.
  • when making a sound, the user can speak close to the microphone 170C, inputting the sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the earphone jack 170D is used to connect wired earphones.
  • the earphone interface 170D may be the USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface, a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
  • the gyro sensor 180B can be used to determine the motion attitude of the electronic device 100.
  • the air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude through the air pressure value measured by the air pressure sensor 180C to assist in positioning and navigation.
  • the magnetic sensor 180D includes a Hall sensor. The electronic device 100 can detect the opening and closing of the flip holster using the magnetic sensor 180D.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes).
  • Distance sensor 180F for measuring distance. The electronic device 100 can measure the distance through infrared or laser.
  • Proximity light sensor 180G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the ambient light sensor 180L is used to sense ambient light brightness.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the temperature sensor 180J is used to detect the temperature.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the touch sensor 180K is also called a "touch panel".
  • the touch sensor 180K may be disposed on the display screen 194 , and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations may be provided through display screen 194 .
  • the keys 190 include a power-on key, a volume key, and the like; the keys 190 may be mechanical keys or touch keys.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • Motor 191 can generate vibrating cues.
  • the motor 191 can be used for vibrating alerts for incoming calls, and can also be used for touch vibration feedback.
  • touch operations acting on different applications can correspond to different vibration feedback effects.
  • the motor 191 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 194 .
  • different application scenarios (for example: time reminder, receiving information, alarm clock, games, etc.) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
• the indicator 192 may be an indicator light, which can be used to indicate the charging state and changes in battery level, and can also be used to indicate a message, a missed call, a notification, and the like.
  • the SIM card interface 195 is used to connect a SIM card.
• electronic devices such as the smart screen 101, the mobile phone 102, and the in-vehicle device 103 may all have the structure shown in FIG. 2, or a structure with more or fewer components than that shown in FIG. 2.
  • the embodiments of the present application do not limit the types of electronic devices included in the application scenario.
• when the electronic device 100 shown in FIG. 2 is a mobile phone, it may run a Harmony OS system or any other possible operating system, and may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture, etc.
• the mobile phone has a layered architecture. Taking the Android system as an example, the software structure of the mobile phone 102 is exemplarily described below.
  • FIG. 3 is a software structural block diagram of an implementation process of an example of a method for human-computer interaction according to an embodiment of the present application.
  • the in-vehicle device 103 can be used as a screen projection device (or “display device”) of the mobile phone 102 , and the application of the mobile phone 102 can be displayed on the display screen of the in-vehicle device 103 .
• in the layered architecture, the software is divided into several layers, each with a clear role and division of labor; the layers communicate with each other through software interfaces.
• the Android system is divided into four layers: from top to bottom, the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
• the application layer can include a series of application packages. As shown in FIG. 3, the application packages can include applications such as Visible to Speak 11, Smart Voice 13, Music, Navigation, and HiCar 15. The following mainly introduces the functional modules respectively corresponding to Visible to Speak 11 and Smart Voice 13 in the embodiments of the present application.
  • “visible” may refer to the part that the user can see during the human-computer interaction between the user and the electronic device.
  • the user-visible portion may include display content on the screen of the electronic device, such as the desktop, windows, menus, icons, buttons, and controls of the electronic device.
  • the visible portion may also include multimedia content such as text, pictures, and videos displayed on the screen of the electronic device, which is not limited in this embodiment of the present application.
• the display content on the screen of the electronic device can be an interface displayed by an application running in the foreground of the electronic device, or a virtual display interface of an application running in the background of the electronic device, which can be projected onto other electronic devices.
  • “speakable” means that the user can interact with the visible part through a voice command, thereby completing the interactive task.
• for user-visible parts such as the desktop, windows, menus, icons, buttons, and controls of an electronic device, the user can control them through voice commands, and then perform input operations such as clicking, double-clicking, and sliding on those visible parts.
• the Visible to Speak 11 may include an interface information acquisition module 111, an intent processing module 112, an interface module 113, a predefined action execution module 114, and the like.
  • the interface information acquisition module 111 may acquire interface content information of applications running in the foreground or background of the mobile phone.
  • the intent processing module 112 may receive the user's voice instruction returned by the smart voice 13, and determine the user's intent according to the user's voice instruction.
  • the interface module 113 is used to realize data and information exchange between various applications.
  • the predefined action execution module 114 is configured to execute corresponding operations according to voice commands, user intentions, and the like.
  • the smart voice 13 may correspond to a smart voice application installed on the side of the mobile phone 102 , that is, a service process of voice recognition provided by the smart voice application of the mobile phone 102 .
• the voice recognition service process provided by Smart Voice 13 can also be provided by a server, and this scenario can correspond to (c) in FIG. 1.
• the mobile phone 102 sends the user's voice command to the server 104, and after the server 104 analyzes the voice command, the server 104 returns the recognition result of the voice command to the mobile phone 102, which will not be repeated here.
• the Smart Voice 13 may include a semantic understanding (natural language understanding, NLU) module 131, a speech recognition (automatic speech recognition, ASR) module 132, a speech synthesis (text to speech, TTS) module 133, a session management (dialog management, DM) module 134, and so on.
• the ASR module 132 can convert the original voice signal input by the user into text information; the NLU module 131 can convert the recognized text information into semantics that can be understood by electronic devices such as mobile phones and in-vehicle devices; the DM module 134 can determine the action the system should take based on the dialogue state; the TTS module 133 can convert natural language text into speech and output it to the user.
• the Smart Voice 13 may also include a natural language generation (NLG) module, etc., which is not limited in this embodiment of the present application.
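• as a concrete illustration of the division of labor among these modules, the following is a minimal sketch of the recognition chain described above; all type and method names here are illustrative assumptions rather than interfaces defined by the present application.

```java
// Illustrative module interfaces for the ASR -> NLU -> DM -> TTS chain.
// All names are assumptions for this sketch, not APIs of the present application.
final class SemanticIntent { final String name; SemanticIntent(String name) { this.name = name; } }
final class DialogState { /* conversation history, slot values, ... */ }
final class SystemAction { final String description; SystemAction(String d) { this.description = d; } }

interface AsrModule { String transcribe(byte[] pcmAudio); }                        // speech -> text
interface NluModule { SemanticIntent understand(String text); }                    // text -> semantics
interface DmModule  { SystemAction decide(SemanticIntent intent, DialogState s); } // semantics -> action
interface TtsModule { byte[] synthesize(String replyText); }                       // text -> speech

final class VoicePipeline {
    static SystemAction handle(byte[] audio, AsrModule asr, NluModule nlu,
                               DmModule dm, DialogState state) {
        String text = asr.transcribe(audio);           // ASR module 132
        SemanticIntent intent = nlu.understand(text);  // NLU module 131
        return dm.decide(intent, state);               // DM module 134; TTS 133 would voice the reply
    }
}
```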
• the application of the mobile phone can be projected to the in-vehicle device through the HiCar application 15. During the projection process, the application actually runs on the side of the mobile phone, and the running may be foreground running or background running on the mobile phone.
  • the in-vehicle device can have an independent display system. After the application of the mobile phone is projected to the in-vehicle device through the HiCar application 15, there can be an independent display desktop and application quick entry on the in-vehicle device, while providing the ability to obtain voice commands.
  • the application framework layer includes a variety of service programs or some predefined functions, which can provide an application programming interface (API) and a programming framework for applications in the application layer.
• the application framework layer may include a content sensor 21, a multi-screen framework service module 23, a view system 25, and the like.
• the content sensor 21 can be used to store and obtain data, and make these data accessible to application programs.
• the data acquired by the content sensor 21 may include interface display data of the electronic device, videos, images, audio, user browsing history, bookmarks, and other data.
• the content sensor 21 may acquire the interface content displayed in the foreground or background of the mobile phone.
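• the content sensor itself is internal to the system; as a rough sketch of how interface text could be gathered, the following walks an Android accessibility node tree and collects the text of visible, clickable controls. This is an assumed stand-in for the content sensor, not its actual implementation.

```java
import android.view.accessibility.AccessibilityNodeInfo;
import java.util.ArrayList;
import java.util.List;

public final class InterfaceContentCollector {
    /** Recursively collects the text of visible, clickable nodes in the interface tree. */
    public static List<String> collectClickableTexts(AccessibilityNodeInfo node) {
        List<String> texts = new ArrayList<>();
        if (node == null) {
            return texts;
        }
        if (node.isVisibleToUser() && node.isClickable() && node.getText() != null) {
            texts.add(node.getText().toString());
        }
        for (int i = 0; i < node.getChildCount(); i++) {
            texts.addAll(collectClickableTexts(node.getChild(i)));
        }
        return texts;
    }
}
```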
• the multi-screen framework service module 23 may include a window manager, etc., for managing the window display of the electronic device.
  • the window manager may acquire the size of the display screen of the mobile phone 102 or the size of the window to be displayed, and acquire the content of the window to be displayed, and the like.
• the multi-screen framework service module 23 can also manage the screen projection display process of the electronic device, for example, obtain the interface content of one or more applications running in the background of the electronic device and transmit the interface content to other electronic devices, so that the interface content of the electronic device is displayed on the other electronic devices, which is not repeated here.
  • the view system 25 includes visual controls, such as controls for displaying text, controls for displaying pictures, and the like. View system 25 can be used to build applications.
  • a display interface can consist of one or more views.
• for example, a display interface including a short message notification icon may include a view for displaying text and a view for displaying pictures.
  • the Android runtime includes core libraries and a virtual machine. Android runtime is responsible for scheduling and management of the Android system.
• the core library consists of two parts: one part is the functional functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, safety and exception management, and garbage collection.
• a system library can include multiple functional modules, for example: a surface manager, a media library, a 3D graphics processing library (e.g., OpenGL ES), a 2D graphics engine (e.g., SGL), etc.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
  • a 2D graphics engine is a drawing engine for 2D drawing.
  • the image processing library can provide analysis of various image data and provide a variety of image processing algorithms, such as image cutting, image fusion, image blurring, image sharpening and other processing, which will not be repeated here.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer at least includes display drivers, audio drivers, sensor drivers, etc.
• various drivers can call hardware structures such as the microphone, speaker, or sensors of the mobile phone, for example, calling the microphone of the mobile phone to obtain the user's voice commands, and calling the speaker of the mobile phone for voice output, etc., which will not be repeated here.
• the in-vehicle device 103 as a display device may have the same software structure as, or a different software structure from, that of the mobile phone 102.
  • the vehicle-mounted device at least includes a display module 31 , a microphone/speaker 32 , and the like.
  • the display module 31 may be used to display the interface content currently running on the in-vehicle device 103 , or display the application interface projected by the mobile phone 102 .
  • the in-vehicle device 103 may have an independent display system.
• after the application of the mobile phone 102 is projected to the in-vehicle device 103 through the HiCar application 15, there may be an independent display desktop and application quick entry on the in-vehicle device 103.
• music, navigation, video, and other applications of the mobile phone 102 can be rearranged and displayed on the in-vehicle device 103 according to the display system of the in-vehicle device 103 after being projected through the HiCar application 15, which is not limited in this embodiment of the present application.
  • the microphone/speaker 32 is the hardware structure of the in-vehicle device, and can realize the same functions as the microphone/speaker of the mobile phone.
  • the input of the user's voice instruction may be through the microphone of the mobile phone 102 itself, or may be a remote virtual microphone.
• the remote virtual microphone can be understood as a voice command acquisition capability provided by means of the microphone of the in-vehicle device 103: the voice commands acquired by the in-vehicle device 103 are transmitted to the mobile phone 102, and the mobile phone 102 recognizes the voice commands, etc., which will not be repeated here.
  • the HiCar application 15 can rely on the multi-screen framework capability of the mobile phone to project the interfaces of multiple applications of the mobile phone to the interface of the in-vehicle device.
• the multiple applications themselves actually run on the side of the mobile phone, and their interfaces are displayed on the screen of the in-vehicle device.
• the screen content is extracted through the content sensor of the mobile phone system to obtain the application interface content of the interface projected to the in-vehicle device.
• Smart Voice can analyze user semantics more quickly and accurately through the combined analysis capabilities of the terminal side (relying on the powerful computing power of the mobile phone itself) and the cloud side, and the recognized results can be sent to Visible to Speak to match the interface content and identify the user's purpose.
• the interface is then operated to realize control operations such as clicking controls, sliding up, down, left, and right, and returning.
  • the in-vehicle device 103 and the mobile phone 102 are in a state of established connection.
  • connection between the mobile phone 102 and the in-vehicle device 103 may include various connection modes such as wired connection or wireless connection.
• the wired connection between the mobile phone 102 and the in-vehicle device 103 may be through a USB data cable; the wireless connection between the mobile phone 102 and the in-vehicle device 103 may be established by means of a Wi-Fi connection, by means of the near field communication (NFC) function supported by both the mobile phone 102 and the in-vehicle device 103 and a proximity connection through the "touch" function, or through Bluetooth code scanning between the mobile phone 102 and the in-vehicle device 103, etc.
• as communication bandwidth and rates gradually increase, data may be transmitted between the mobile phone 102 and the in-vehicle device 103 without establishing a near field communication connection.
  • the mobile phone 102 and the in-vehicle device 103 may be able to project the screen of the mobile phone to the in-vehicle device through 5G communication.
  • the mobile phone may not provide functions such as discovery and establishing a connection with the in-vehicle device.
  • the mobile phone 102 and the in-vehicle device 103 can be connected and communicated based on the relevant settings under the account by logging into the same account.
  • both the mobile phone 102 and the in-vehicle device 103 can register a Huawei account.
• the application of the mobile phone 102 can be displayed on the display screen of the in-vehicle device 103, while the application actually runs on the side of the mobile phone 102, which is not repeated here.
• the mobile phone 102 and the vehicle-mounted device 103 may also have different module division methods or include more functional modules, which is not limited in this embodiment of the present application.
  • the mobile phone 102 and the in-vehicle device 103 having the software structure shown in FIG. 3 are taken as examples for detailed description in conjunction with the accompanying drawings and application scenarios.
  • FIG. 4 is a schematic diagram of a graphical user interface (graphical user interface, GUI) for implementing a voice interaction process on an in-vehicle device provided by an embodiment of the present application.
• FIG. 4 shows that the screen display system of the in-vehicle device 103 displays the currently output interface; the content of the interface can be derived from the application actually running on the side of the mobile phone 102, obtained by the HiCar application and provided to the in-vehicle device 103.
• the interface content on the display screen of the in-vehicle device 103 can be arranged and filled based on its own display system, and the same content can have different display styles, icon sizes, arrangement orders, etc.
  • the content is arranged and filled on the display screen of the in-vehicle device 103 according to the requirements of the display system of the in-vehicle device 103 .
  • the screen display area of the in-vehicle device 103 may include a status display area at the top position, and a navigation menu area 401 and a content area 402 shown by dashed boxes.
  • the status display area displays the current time and date, Bluetooth icon, WIFI icon, etc.
• the navigation menu area 401 may include icons such as homepage, navigation, phone, and music; each icon corresponds to at least one application actually running on the mobile phone 102, and the user can click any icon to enter the corresponding interface of that application. The content area 402 displays the content provided to the in-vehicle device 103 by different applications.
  • the Huawei Music is installed on the mobile phone 102 , and the Huawei Music runs in the background, and the Huawei Music sends the content such as the playlist or song list displayed during the running process to the in-vehicle device 103 .
• the screen display system of the in-vehicle device 103 fills the content provided by Huawei Music into the content area of the display screen, as shown in (a) of FIG. 4, including daily recommendations, playlists, ranking lists, radio stations, searches, etc.; the display process will not be repeated in subsequent embodiments.
  • the interface of the in-vehicle device 103 may also display other more menus or contents of application programs, which are not limited in this embodiment of the present application.
  • FIG. 4 shows the interface after the user clicks on the music application in the navigation menu area 401, and the icon of the music application in the navigation menu area 401 is highlighted in gray.
  • the song name or song list provided by Huawei Music is displayed in the content area 402
• the play button 20 is displayed on the icon of song 1, while song 2, song 3, song 4, and song 5 are in the paused playback state, with a pause button 30 displayed on each.
• a voice ball icon 10 may also be included, as shown in (a) of FIG. 4.
  • the in-vehicle device 103 can display the interface 403 as shown in (b) of FIG. 4 .
  • the wake-up window 403 - 1 shown by the dotted box may be displayed on the interface 403 , and the wake-up window 403 - 1 includes the voice recognition icon 40 .
• the wake-up window 403-1 may not be embodied in the form of a window, but may only include the voice recognition icon 40, or include the voice recognition icon 40 and the voice commands recommended to the user, displayed in a floating manner on the display screen of the in-vehicle device 103.
• for convenience of description, in the embodiments of the present application the area including the voice recognition icon 40 is referred to as a "wake-up window", which should not limit the solution of the embodiments of the present application and will not be described in detail later.
  • the voice recognition icon 40 may be displayed dynamically, which is used to indicate that the in-vehicle device 103 is in a state of monitoring and acquiring the user's voice instruction.
  • the wake-up window 403-1 may also include some voice commands recommended to the user, such as voice commands such as "stop playing” and "continue playing". It should be understood that the recommended voice command may also accept a user's click operation, and execute the purpose corresponding to the response command, which will not be repeated in this embodiment of the present application.
• the voice command can be sent to the mobile phone 102, and the mobile phone 102 recognizes the user's voice command; in response to the voice command, a click operation on the play button 20 of song 1 is performed in the background, and the play button 20 on song 1 changes to a pause button 30.
• the mobile phone 102 can transfer the display interface after the click operation on the play button 20 of song 1 back to the in-vehicle device 103, and the in-vehicle device 103 can then display the interface 404 shown in (c) of FIG. 4, on which the pause button 30 is displayed.
• the above implementation process can be understood as follows: a user voice instruction can trigger a click operation on any control on the display screen of the in-vehicle device 103, and the interface after the click operation is performed is further displayed on the display screen of the in-vehicle device 103.
• the user can also turn on the voice monitoring function by pressing the car control voice button of the car; for example, the user presses the car control voice button on the steering wheel to turn on the function of the in-vehicle device 103 to monitor and obtain the user's voice command, and the wake-up window 403-1 shown by the dashed box is displayed.
• the operation of the user clicking the voice ball icon 10 can trigger the in-vehicle device 103 to enable the voice monitoring function; or, the voice interaction function between the in-vehicle device 103 and the user can be enabled, that is, the in-vehicle device 103 is always in the state of monitoring voice instructions; or, the user can turn on the in-vehicle device 103 through other shortcut operations so that it is always in the state of monitoring voice instructions, which is not limited in this embodiment of the present application.
• the wake-up window can always be displayed on the display screen of the in-vehicle device 103; or it may disappear after a period of time and, when a voice command issued by the user is detected again, be suspended and displayed on the display screen of the in-vehicle device 103 again; when no voice command of the user is detected within a preset time (for example, 2 minutes), the in-vehicle device 103 can automatically exit the monitoring function, which is not limited in this embodiment of the present application.
• buttons, switches, menus, options, pictures, lists, text, and other elements that are visible on the interface and can be clicked by the user are collectively referred to as "controls", which will not be repeated in subsequent embodiments.
  • the voice instruction recommended to the user displayed in the wake-up window 403-1 may be instruction content associated with a control on the currently displayed interface 403 that can be clicked by the user.
  • the content sensor of the mobile phone 102 can obtain the current state of each control and provide the user with a recommendation instruction according to the current state.
• the interface 403 includes the play button 20, and the recommended instruction in the wake-up window 403-1 may include "stop playing"; that is, "stop playing" can be understood as the state achieved after the play button 20 is clicked by the user.
• song 2 on the interface 403 also includes the pause button 30, so the recommended instruction in the wake-up window 403-1 may include "play song 2"; that is, "play song 2" can be understood as the state achieved after the user clicks the pause button 30 of song 2.
• when the pause button 30 is displayed on all of songs 1 to 5, the mobile phone 102 obtains that the current interface does not include the play button 20. In this case, after the user wakes up the voice ball, the "start playing" instruction may be displayed in the wake-up window 403-1, but the "stop playing" instruction will not be displayed, which will not be described in detail later. A sketch of this state-based recommendation follows.
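• the following is a minimal sketch of the state-based recommendation rule described above, assuming a simplified control model (the type strings here are invented for illustration): a visible play button yields "stop playing", and a visible pause button yields "start playing".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public final class RecommendationBuilder {
    /** Simplified control model: only the type field needed for this sketch. */
    static final class Control {
        final String type; // e.g. "play_button", "pause_button" (invented labels)
        Control(String type) { this.type = type; }
    }

    /** Recommends the command describing the state reached after clicking each visible control. */
    static List<String> recommend(List<Control> controlsOnScreen) {
        Set<String> commands = new LinkedHashSet<>(); // avoid duplicate recommendations
        for (Control c : controlsOnScreen) {
            if ("play_button".equals(c.type)) {
                commands.add("stop playing");  // per the text: play button visible -> "stop playing"
            } else if ("pause_button".equals(c.type)) {
                commands.add("start playing"); // pause button visible -> "start playing"
            }
        }
        return new ArrayList<>(commands);
    }
}
```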
  • the voice command recommended to the user displayed in the wake-up window 403-1 may also be a fixed recommended command of a certain application.
• the voice instruction recommended to the user displayed in the wake-up window 403-1 can be fixed as "stop playing", "start playing", etc., which is not limited in this embodiment of the present application.
  • controls on the interface that can be clicked by the user can be divided into the following categories:
• Text controls contain textual information that can be recognized; exemplarily, "daily recommendation", "song list", "top chart", "radio station", "song X", and "song list 1" shown in (a) of FIG. 4 are text controls.
• the text information included in the text control may be directly identified by the content sensor of the application framework layer of the mobile phone 102. It should be understood that the music application actually runs in the background of the mobile phone 102, and the mobile phone 102 can acquire, in the background, the text information of the text controls projected and displayed on the display screen of the in-vehicle device 103.
• the voice command recommended to the user displayed in the wake-up window 403-1 may be related to the text controls obtained above, such as "play song 2", etc., which will not be repeated here.
  • Common web controls can include text input boxes (TextBox), drop-down boxes (DropList), date/time controls (Date/TimePicker), and so on.
• the search control can be classified as a web control.
• the web controls on the interface can be recognized by the content sensor of the mobile phone 102, and the voice command recommended to the user displayed in the wake-up window 403-1 may be related to the web controls obtained above, such as "search for songs" and other recommended commands.
  • Picture controls are displayed as pictures on the interface, and each picture corresponds to a different descriptor. Exemplarily, as shown in (a) in FIG. 4 , the artist picture or album picture above the song 1, and the picture of “nostalgic classics” displayed above the song list 1 for identifying the list, etc.
  • the content sensor of the mobile phone 102 can generalize the meaning of the picture by obtaining the description word of each picture, and provide the user with a recommendation instruction.
• when the mobile phone 102 obtains a picture with the description word "Zhang XX's song 1", the voice instruction recommended to the user displayed in the wake-up window 403-1 may include a recommended instruction such as "play Zhang XX's song 1".
• the song list 1 may include a plurality of songs, and "song list 1" can be classified as a list control.
• the next-level interface entered may present to the user the multiple songs included in song list 1, without starting to play the music in song list 1.
  • a switch control can be understood as a control with a switch function on the interface.
• the play button 20 and the pause button 30 can be classified as switch controls.
  • the mobile phone 102 can obtain the controls on the current display interface 403, and further according to the obtained control types, descriptors and other information, determine the recommended instructions displayed in the wake-up window 403-1 to the user .
  • the embodiments of the present application may include more controls than the five types of controls listed above, and the embodiments of the present application will not exemplify them one by one.
  • some controls may be divided into multiple control types at the same time, which is not limited in this embodiment of the present application.
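• as a sketch, the five categories above could be modeled as follows; the classification by widget class name is an assumption for illustration, since the actual recognition is done by the content sensor.

```java
/** The five control categories named above. */
enum ControlType { TEXT, WEB, PICTURE, LIST, SWITCH }

final class ControlClassifier {
    // Assumed heuristic: classify by the widget class name reported for the control.
    static ControlType classify(String widgetClassName) {
        if (widgetClassName.contains("TextBox") || widgetClassName.contains("DropList")
                || widgetClassName.contains("DatePicker") || widgetClassName.contains("TimePicker")) {
            return ControlType.WEB;      // text input boxes, drop-down boxes, date/time controls
        }
        if (widgetClassName.contains("Image")) {
            return ControlType.PICTURE;  // pictures with description words
        }
        if (widgetClassName.contains("List") || widgetClassName.contains("Recycler")) {
            return ControlType.LIST;     // e.g. a song list containing multiple songs
        }
        if (widgetClassName.contains("Switch") || widgetClassName.contains("Button")) {
            return ControlType.SWITCH;   // controls with a switch function, e.g. play/pause
        }
        return ControlType.TEXT;         // default: plain text control
    }
}
```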
• Table 1 lists the controls on several common audio application pages. As shown in Table 1 below, for audio applications commonly used by users, such as NetEase Cloud Music, Kugou Music, Huawei Music, Himalaya, Baby Bus Story, and Xiaobanlong Children's Songs, different pages may include different controls, and the number and types of controls included in the first-level and second-level pages of each application are different.
  • the first-level page can be understood as the main interface of NetEase Cloud Music entered after the user clicks the NetEase Cloud Music application icon, including "daily recommendation", “My favorite music”, “Local Music”, “Private FM” and other page content
  • the second-level interface is the next-level page that the user clicks on any menu or control on the main interface of the NetEase Cloud Music to enter, such as the playlist page, play page, etc.
  • the page content on each page can be acquired by the mobile phone, and the text information included in each control can be recognized, which will not be repeated here.
  • FIG. 5 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application.
  • FIG. 5 shows that the screen display system of the in-vehicle device 103 displays an interface 501 currently output.
  • song 1, song 2, song 3, song 4, and song 5 are all in the state of being paused, and the pause button 30 is displayed.
• the user clicks the voice ball icon 10 to enable the voice monitoring function of the in-vehicle device 103, and the display screen of the in-vehicle device 103 displays the interface 502 shown in (b) of FIG. 5.
  • the wake-up window 502-1 may include the voice recognition icon 40 and recommended voice commands.
  • the recommended voice commands may be "start playing" and "next page", and so on.
• an interface 503 as shown in (c) of FIG. 5 can be displayed, which is the interface after the click operation on song list 1 is performed.
• the interface 503 may include the following controls: return to the previous level, song list 1—classic nostalgia, play all, and the names of song 6 and many other songs included in song list 1.
  • the wake-up window is always suspended and displayed on the display screen of the in-vehicle device 103 .
  • the interface 503 shown in (c) of FIG. 5 includes a wake-up window 503-1.
  • the instructions recommended to the user in the wake-up window 503-1 may be changed according to the controls included in the current interface 503, for example, voice instructions such as “play all” and “next page” are displayed.
  • an interface 504 can be displayed, which is the interface after the click operation on the "play all” control is performed.
  • the “play all” control is displayed as the playing state, and starts playing from the first song (song 6) arranged in the song list 1 , a sound icon 50 is displayed at the location of the first song, which is used to identify the source of the sound as song 6, that is, song 6 is the song currently being played, which is not limited in this embodiment of the present application.
• with the method for human-computer interaction provided by the embodiment of the present application, by obtaining the controls displayed on the interface that are visible and can be clicked by the user, the user can input a voice command to perform a click or other operation on any control on the interface. All applications and all visible content on the display can be controlled by the user with voice commands. In particular, in the driving scene, the manual operation of the user is reduced, thereby avoiding user distraction and improving the safety of the user in the driving scene.
  • FIG. 6 is a schematic interface diagram of another example of a process of implementing voice interaction on an in-vehicle device provided by an embodiment of the present application.
  • FIG. 6 shows that the screen display system of the in-vehicle device 103 displays an interface 601 currently output.
  • song 1, song 2, song 3, song 4, and song 5 are all in the state of being paused, and the pause button 30 is displayed.
  • the wake-up window 602-1 is shown in the figure.
  • the wake-up window 602-1 may include the voice recognition icon 40, recommended voice commands such as "start playing" and "next page”.
• the mobile phone 102 can obtain the content of the interface, determine whether the interface includes icons (or pictures), and add digital corner marks to the icons in a certain order.
• the user's operation of enabling the voice monitoring function of the in-vehicle device 103 may trigger the addition of digital corner marks to the icons in the interface; or, before the user enables the voice monitoring function of the in-vehicle device 103, other preset operations may be used to trigger the addition of digital corner marks to the icons in the interface, which is not limited in this embodiment of the present application.
  • the icons with added digital corner marks mentioned in the embodiments of the present application may also include application icons of different applications.
• for example, the home icon, navigation icon, phone icon, music icon, etc. in the navigation menu area of the interface 601 shown in (a) of FIG. 6.
• the icons to which digital corner marks are added, as mentioned in the embodiment of the present application, may also include pictures displayed on the interface 601.
• for example, the singer picture of song 1, the picture of song list 1, etc.; the pictures included on the interface are marked with digital corner marks. When a song or a song list in the content area of the interface 601 is displayed in a foreign language, the user may not be able to accurately issue a voice command including the song name. By marking the pictures of different songs or song lists with digital corner marks, the user can perform operations through voice commands containing the digital corner marks, which is convenient and quick, and improves user experience.
• the icons to which digital corner marks are added, as mentioned in the embodiment of the present application, may also include controls such as buttons displayed on the interface 601, which is not limited in the embodiment of the present application.
• the display size of the digital corner mark can be adapted to the size of the application icon displayed on the interface of the in-vehicle device 103.
• if the application icon on the interface is small, adding a digital corner mark may cause the mark to be too small for the user to read accurately. Therefore, when an application icon is small, for example when the pixels it occupies on the display screen of the in-vehicle device are less than or equal to a preset number of pixels, that application icon may not be marked; only application icons larger than the preset pixels are marked, which is not limited in this embodiment of the present application.
• when the mobile phone 102 acquires the interface 601 shown in (a) of FIG. 6, it determines that the interface 601 includes icons of different songs and icons of different song lists.
• the mobile phone 102 can add a digital corner mark 60 as shown in (b) of FIG. 6 to each icon according to the arrangement order of the icons on the interface, from left to right and from top to bottom; for example, add digital corner mark 1 to song 1, digital corner mark 2 to song 2, and so on, until a digital corner mark 60 has been added to all the icons on the music interface.
• in the above process, the content sensor of the mobile phone 102 obtains the content on the interface, the HiCar application 15 installed on the mobile phone 102 obtains the interface content from the content sensor, and the HiCar application 15 can judge whether the current interface has icons according to the interface content.
• if there are icons, a digital corner mark 60 is added to each icon in a certain order, which is not limited in this embodiment of the present application.
• the user can input a voice command including the digital corner mark, and use the voice command to perform a click operation on the picture marked with that digital corner mark. Exemplarily, as shown in (b) of FIG. 6, after the pictures included on the interface of the in-vehicle device 103 are marked with digital corner marks, the user can input a voice command containing the corresponding number, such as "1" or "play 1". In response to the voice command input by the user, the mobile phone 102 can perform a click operation on song 1 marked as 1 in the background, and display the interface 603 shown in (c) of FIG. 6; the pause button 30 of song 1 changes to the play button 20, and the in-vehicle device 103 starts to play song 1.
• the user speaks the corresponding number, such as the number 1, through a voice command; the Smart Voice 13 of the application layer recognizes the user's voice command and converts it into the text "1".
• the content sensor of the application framework layer extracts the content of the current interface, analyzes the visible control content, and obtains the text information of the controls; the recognized control information is then matched with the text "1" returned by Smart Voice.
• after the match succeeds, the click operation is performed on the icon of song 1, and the click event on the icon of song 1 is transmitted to the business logic of the music application itself, so as to realize the corresponding business logic jump.
• the HiCar application 15 then ends this round of voice recognition and exits the voice recognition function of Smart Voice; the voice ball icon 10 returns to the static state shown in (c) of FIG. 6, and the wake-up window 602-1 and the recommended voice commands disappear.
• in the process of adding digital corner marks, the digital corner marks may be added to some of the one or more controls on the current interface according to certain principles, rather than to all controls.
• the said part of the controls may include: controls identified as picture type among the one or more controls of the current interface; or controls identified as having a grid-type arrangement order; or controls identified as having a list-type arrangement order; or controls whose display size is identified as greater than or equal to a preset value.
• the digital corner marks may be added to some of the one or more controls according to a preset sequence.
  • the preset order includes a left-to-right and/or a top-to-bottom order.
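• a minimal sketch of this numbering rule follows: icons are sorted top-to-bottom and then left-to-right and numbered consecutively, skipping icons at or below a preset pixel size (whose marks would be unreadable). The control model is simplified for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class CornerMarkAssigner {
    static final class IconControl {
        final int x, y;     // top-left position on screen
        final int sizePx;   // rough display size in pixels
        int cornerMark;     // 0 = no mark assigned
        IconControl(int x, int y, int sizePx) { this.x = x; this.y = y; this.sizePx = sizePx; }
    }

    /**
     * Numbers icons left-to-right, top-to-bottom, skipping icons whose display
     * size is at or below the preset pixel threshold.
     */
    static List<IconControl> assignMarks(List<IconControl> icons, int presetPx) {
        List<IconControl> marked = new ArrayList<>();
        icons.sort(Comparator.comparingInt((IconControl c) -> c.y).thenComparingInt(c -> c.x));
        int next = 1;
        for (IconControl icon : icons) {
            if (icon.sizePx > presetPx) {   // too-small icons are left unmarked
                icon.cornerMark = next++;
                marked.add(icon);
            }
        }
        return marked;
    }
}
```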
• the outlines of the icon controls can be acquired, and the outline keywords describing the icon controls can be determined according to the outlines; the matching degree between one or more keywords included in the voice instruction and the outline keywords of each icon control is determined, and the icon control with the largest matching degree is determined as the target control.
• for example, on the music playing interface, when the user inputs the voice command "I like", during the matching process between the voice command and the controls included in the interface, if the description words of the favorite button on the music playing interface are "like" and "favorite", and the outline of the favorite button is the shape of a "peach heart", then the peach-heart shape can be matched with "I like"; this method can generalize the user's voice command and match the user's command with the controls on the interface more intelligently.
• strong matching is given priority, that is, the control information and the voice command text recognized by the smart voice need to correspond one-to-one. If the strong match is unsuccessful, a weak match is performed, that is, it is judged whether the control information contains the voice command text of the intelligent voice recognition; as long as it contains part of the voice command text, the match is judged successful, and a click operation is performed on the control corresponding to the control information, as sketched below.
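• the two-stage rule can be sketched as follows, assuming the control information has already been reduced to plain text strings.

```java
import java.util.List;

public final class ControlMatcher {
    /**
     * Strong match first: control text and recognized command text correspond one-to-one.
     * Only if no strong match exists, weak match: the control text merely has to
     * contain part of the recognized command text (or vice versa).
     */
    static String match(List<String> controlTexts, String commandText) {
        for (String text : controlTexts) {   // strong match: exact correspondence
            if (text.equals(commandText)) {
                return text;
            }
        }
        for (String text : controlTexts) {   // weak match: partial containment
            if (text.contains(commandText) || commandText.contains(text)) {
                return text;
            }
        }
        return null;                         // no control matched this command
    }
}
```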
• a digital corner mark is added to the clickable controls such as pictures and application icons displayed on the interface, and the user can issue a voice command including the number to perform a click or other operation on the control marked with that digital corner mark.
• when the user sees the digital corner marks on the interface, he sends out a voice command including a number; the voice command is converted through voice recognition, so as to determine the picture, application icon, or other control corresponding to the number, and the click operation is executed.
• the user does not need to memorize a variety of complex voice commands, and can realize the voice interaction process merely through digital voice commands, which is simpler and more convenient, reduces the difficulty of voice interaction, and improves user experience.
  • FIG. 7 is a schematic interface diagram of another example of a voice interaction process implemented on a vehicle-mounted device provided by an embodiment of the present application.
  • the navigation menu area 401 of the screen display system displays navigation menus such as home page, navigation, phone and music, and switching between different navigation menus can also be controlled by the user's voice commands.
  • the process of jumping from the music interface shown in (c) of FIG. 6 to the navigation interface from the screen display interface of the in-vehicle device 103 can also be implemented by voice commands.
  • the screen display system of the in-vehicle device 103 displays the currently output interface 701 .
• song 1 is displayed in the playing state, while song 2, song 3, song 4, and song 5 are all in the paused state, with a pause button 30 displayed on each.
  • the wake-up window 702-1 is shown in the figure.
  • the wake-up window 702-1 may include the voice recognition icon 40, recommended voice commands such as "start search" and "next page”.
• the voice commands recommended in the wake-up window 702-1 may be different from those displayed in the wake-up window 403-1 shown in (b) of FIG. 4, the wake-up window 502-1 shown in (b) of FIG. 5, and the wake-up window 503-1 shown in (c) of FIG. 5. The recommended voice commands displayed in the wake-up window can change correspondingly with the display content on the current interface, displaying voice commands related to the displayed content on the current interface; voice commands not related to the displayed content on the current interface may also be displayed, which is not limited in this embodiment of the present application.
  • the voice instruction can be sent to the mobile phone 102.
• the mobile phone 102 recognizes the user's voice command and, in response to the voice command, enables the voice interaction function between the in-vehicle device 103 and the user, that is, the in-vehicle device 103 is always in the state of monitoring voice commands, and the user does not need to repeatedly activate the in-vehicle device 103 to monitor and obtain voice commands.
  • the display interface of the display screen of the in-vehicle device 103 can jump from the music menu to the interface 703 of the navigation menu.
  • the user can be provided with various types of search options including "food”, “gas station”, “shopping mall”, etc. shown in the right area.
  • the interface content of the interface 703 of the navigation menu is not repeated here.
• the wake-up window may disappear briefly while the user's voice instruction is monitored in the background; when a voice command issued by the user is detected again, the wake-up window can be suspended and displayed on the display screen again.
  • the user starts to issue a voice command, and the wake-up window 704-1 appears.
  • the recommended instruction displayed in the wake-up window 704-1 may be adapted to the current interface content, or the recommended instruction may be associated with historical data with the highest search frequency when the user uses the navigation application.
  • the wake-up window 704-1 may include the voice recognition icon 40, and recommended voice commands such as “navigate to the company” and “navigate to the mall”, which are not limited in this embodiment of the present application.
• when the user inputs the voice command "search for food", after the in-vehicle device 103 obtains the user's command, it can send the voice command to the mobile phone 102; the mobile phone 102 recognizes the user's voice command and, in response to the voice command, simulates clicking the "food" option on the interface 704 shown in (d) of FIG. 7, and the search result interface 705 shown in (e) of FIG. 7 is displayed for the user.
• multiple searched restaurants are displayed on the interface 705; the restaurants can be sorted according to the distance from the user's current location, and the per capita unit price and distance of each restaurant are displayed for the user, which is not limited in this embodiment of the present application.
• the recommended instruction displayed in the wake-up window 705-1 on the interface 705 can be re-adapted to the current interface content; the wake-up window 705-1 can include the voice recognition icon 40 and recommended voice commands such as "start search" and "next page", which are not limited in this embodiment of the present application.
  • a search result interface 706 as shown in (f) of FIG. 7 is displayed for the user. It should be understood that the interface 706 is an interface displayed after performing a swipe on the interface 705 as indicated by the black arrow.
• when the user selects the target restaurant "5.XX light food restaurant", he can continue to input the voice command "navigate to 5", and the navigation route interface 707 shown in (g) of FIG. 7 can be displayed for the user; the interface 707 includes the route and distance to the 5.XX light food restaurant, etc., which is not limited in this embodiment of the present application.
  • the user's voice command includes "5".
  • the voice command can be sent to the mobile phone 102.
• the mobile phone 102 may perform interface matching according to the instruction, that is, extract the keywords in the instruction and match them with the keywords or description information contained in all the controls on the current interface.
• the keywords of the user instruction are "navigation" and "5", and the mobile phone detects that the keywords of the option "5.XX light food restaurant" on the interface are "5", "light food restaurant", etc.; the matching degree between the user instruction and this option is the highest, so a click on the "5.XX light food restaurant" option on the interface 706 is executed, and the interface 707 shown in (g) of FIG. 7 is displayed. A sketch of this keyword matching follows.
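• the keyword matching described above can be sketched as a simple overlap score; with the instruction keywords "navigation" and "5", the option described by "5" and "light food restaurant" scores highest and is clicked. This is an illustrative scoring rule, not the exact matching algorithm of the present application.

```java
import java.util.List;

public final class KeywordMatcher {
    /** Counts how many instruction keywords appear among a control's keywords. */
    static int score(List<String> instructionKeywords, List<String> controlKeywords) {
        int hits = 0;
        for (String kw : instructionKeywords) {
            if (controlKeywords.contains(kw)) {
                hits++;
            }
        }
        return hits;
    }

    /** Returns the index of the control with the highest keyword overlap, or -1 if none match. */
    static int bestMatch(List<String> instructionKeywords, List<List<String>> controls) {
        int bestIndex = -1;
        int bestScore = 0;
        for (int i = 0; i < controls.size(); i++) {
            int s = score(instructionKeywords, controls.get(i));
            if (s > bestScore) {
                bestScore = s;
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```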
• the above method obtains the text controls, picture controls, buttons, and icon controls that are visible on the interface and can be clicked by the user, then matches the target control on the interface according to the obtained user voice command, and performs the corresponding operation on the matched target control.
• Table 2 shows several common controls on pages of navigation applications. As shown in Table 2 below, for navigation applications such as Baidu Map and AutoNavi Map that are commonly used by users, different pages may include different controls, and the number and types of controls included in the first-level and second-level pages of each application are all different.
  • the first-level page can be understood as the main interface of Baidu map entered by the user after clicking the Baidu map application icon, including "zoom in”, “zoom out”, “positioning”, “road conditions”, “Search”, “More”, “Exit” and other controls
  • the second-level interface is the next-level page that the user clicks on any menu or control on the main interface of Baidu Maps to enter, such as the route preference setting page, etc.
  • the page content and controls on each page can be acquired by the mobile phone, and the text information included in each control can be recognized, which will not be repeated here.
  • the general instruction controls may include controls on the interface, such as return, turn left/turn right, turn up/down, page up/page down, and the like.
• the recognized text is sent to Visible to Speak.
• the click event (key event) of the return key is sent to the application to which the current interface belongs; by monitoring the return key event, that application receives the corresponding return event and processes the return service.
• for sliding operations, the corresponding sliding list control is identified from the interface controls returned by the content sensor.
• the sliding method of the control itself, such as the scrollBy method of RecyclerView, is called to implement up and down sliding.
• for left and right sliding, it depends on whether the control itself supports the left and right sliding feature.
• if the control supports left and right sliding, the distance moved in the horizontal direction is passed into the scrollBy method when it is called, and positive and negative values are used to judge sliding left or right.
• if the control supports sliding up and down, the distance moved in the vertical direction is passed in, and positive and negative values are used to determine whether to slide up or down.
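• a minimal sketch of voice-driven sliding on this basis follows; the fixed step distance is an assumption (a real implementation might derive it from the page size), while RecyclerView.scrollBy itself is the control's own sliding method mentioned above.

```java
import androidx.recyclerview.widget.RecyclerView;

public final class SlideExecutor {
    // Assumed distance per voice-triggered slide, in pixels.
    private static final int STEP_PX = 600;

    /** Slides a list control up/down or left/right via its own scrollBy method. */
    static void slide(RecyclerView list, String direction) {
        switch (direction) {
            case "up":    list.scrollBy(0, -STEP_PX); break; // negative y slides toward the top
            case "down":  list.scrollBy(0,  STEP_PX); break; // positive y slides downward
            case "left":  list.scrollBy(-STEP_PX, 0); break; // horizontal values only take effect
            case "right": list.scrollBy( STEP_PX, 0); break; // if the control supports it
            default: throw new IllegalArgumentException("unknown direction: " + direction);
        }
    }
}
```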
  • FIG. 8 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application. Switching between different navigation menus can also be controlled by a user's voice command.
  • FIG. 8 shows the process that the screen display interface of the in-vehicle device 103 jumps from the navigation route interface shown in (g) in FIG. 7 to the phone menu, and this process can also be implemented by the user's voice command.
  • the screen display system of the in-vehicle device 103 displays the currently output navigation route interface 801 .
  • the wake-up window on the interface 801 displays the voice recognition icon 40, recommended voice commands such as "exit navigation" and "search”.
  • a phone application interface 802 as shown in (b) in FIG. 8 is displayed for the user.
• the interface 802 may include submenus such as call records, contacts, and dialing; the interface 802 currently displays content such as the user's call records, which will not be repeated here.
  • the user can input voice commands to perform operations such as clicking on any control on the interface.
  • All apps and all visible content on the display can be controlled by the user with voice commands.
  • the manual operation of the user is reduced, thereby avoiding user distraction and improving the safety of the user in the driving scene.
  • FIG. 9 is a schematic flowchart of a method for voice interaction provided by an embodiment of the present application. As shown in FIG. 9 , the method 900 may include the following steps:
  • the user opens the first application.
  • the first application may be an application actually running on the side of the mobile phone 102 , for example, an application running in the foreground or an application running in the background of the mobile phone 102 .
• this step 901 can be performed by the user on the side of the in-vehicle device 103 and transmitted back to the mobile phone 102 by the in-vehicle device 103 to start the first application in the background of the mobile phone 102; or the user can perform it on the side of the mobile phone 102, and the screen is directly projected and displayed on the display screen of the in-vehicle device 103, which is not limited in this embodiment of the present application.
  • the first application performs interface refresh.
  • performing interface refresh by the first application may trigger the mobile phone 102 to perform interface identification through an algorithm service.
• the mobile phone 102 performs interface hot word recognition to obtain the information of the interface content. It should be understood that the time delay of the interface hot word recognition process in this step 904 is less than 500 milliseconds.
  • the interface content may include user-visible portions of the currently displayed interface.
  • the user-visible part may include pictures, text, menus, options, icons, buttons, etc. displayed on the interface, which are collectively referred to as "controls" and the like in this embodiment of the present application.
  • an operation may be performed on the target control.
• the operation may include input operations such as clicking, double-clicking, sliding, and right-clicking.
  • the voice command is matched with the target control on the interface, that is, the user's intention is recognized, and the click operation on the target control is further performed.
  • the user activates the voice recognition function.
• starting the voice recognition function may be starting the in-vehicle device 103 to begin monitoring the user's voice command; the acquired command is transmitted back to the mobile phone 102, and the mobile phone 102 analyzes the voice command, etc., which is not limited in this embodiment of the present application.
  • the user can activate the voice recognition function through a physical button of the vehicle-mounted device or through voice.
• the display interface of the in-vehicle device 103 may also include a voice ball icon, as shown in (a) of FIG. 4, and the user can click the voice ball icon to enable the voice monitoring function.
  • the in-vehicle device 103 may display a wake-up window 403-1 as shown in (b) of FIG. 4 , which will not be repeated here.
  • the user can also turn on the voice monitoring function by pressing the car control voice button of the car, for example, the user presses the car control voice button 50 on the steering wheel as shown in (b) in FIG. 1 to turn on the voice monitoring function.
  • the in-vehicle device 103 has a function of monitoring and acquiring a user's voice command, which is not limited in this embodiment of the present application.
  • the HiCar application of the mobile phone transmits the acquired information of the interface content to the smart voice service module.
• the smart voice service module may correspond to a smart voice application installed on the mobile phone 102; that is, the service process shown in FIG. 9 is executed by the smart voice application of the mobile phone 102.
  • the service corresponding to the smart voice service module may be provided by the server, and this scenario may correspond to (c) in FIG. 1.
• with the help of the voice analysis capability of the server 104, the mobile phone 102 sends the user's voice command to the server 104, and after the server 104 analyzes the voice command, it returns the recognition result of the voice command to the mobile phone 102, which will not be repeated here.
  • the user inputs a voice command.
  • the mobile phone sends the voice command to the smart voice service module.
• the process of steps 909 and 910 may be that the user inputs a voice command on the side of the in-vehicle device 103; after the microphone of the in-vehicle device 103 obtains the user's voice command, the voice command is sent to the HiCar application of the mobile phone, and then passed via the HiCar application of the mobile phone to the smart voice service module, which analyzes the user's voice command.
  • the smart voice service module transmits the acquired information of the user's voice command and interface content to the ASR module.
  • the ASR module enhances and recognizes the user's voice instruction according to the information of the interface content.
  • the acquired information of the currently displayed interface content can be transferred to the ASR module in synchronization with step 908, that is, the information of the interface content is entered in the ASR model.
  • the user's voice command is recognized according to the updated ASR model.
  • the user's voice command may include homophones; for example, the user inputs "variety show" (综艺, zong yi), and influenced by different users' pronunciations, the ASR module's analysis may produce candidates based on pinyin such as "zong yi" and "zhong yi".
  • such homophones and words with similar pinyin may cause the mobile phone to fail to accurately obtain the user's operation intention through the user's voice command.
  • according to the current interface content information of the in-vehicle device 103, for example when the current interface displays a large amount of audio information, star photos, video information, and so on, the ASR module will, when it analyzes the command, select from the possible recognition results such as "Zhongyi" (中意), "Traditional Chinese Medicine" (中医), "loyalty" (忠义), and "variety show" (综艺) the result most relevant to the currently displayed audio information, star photos, and video information, namely "variety show", and then determine that the voice command issued by the current user is "variety show".
  • through this updated algorithm, information about the currently displayed interface content is introduced into the recognition process of the existing ASR module, so that the usage scenario of the user's current voice command can be accurately analyzed according to the currently displayed interface content; the application scenario targeted by the current voice command can then be located precisely, improving the accuracy of recognizing the voice command.
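For illustration only, the following Python sketch shows one way such interface-aware rescoring of ASR candidates could look. The candidate list, the screen-label set, the character-overlap bonus, and the function names are assumptions made for the example, not the embodiment's actual algorithm.

    def rescore_candidates(candidates, screen_labels, acoustic_scores):
        # Pick the ASR candidate that best matches what is on screen.
        def context_bonus(cand):
            # Character-overlap similarity with any visible label, a simple
            # proxy for "relevance to the displayed interface content".
            if not screen_labels:
                return 0.0
            return max(
                len(set(cand) & set(label)) / max(len(set(label)), 1)
                for label in screen_labels
            )
        return max(
            candidates,
            key=lambda c: acoustic_scores.get(c, 0.0) + context_bonus(c),
        )

    # e.g. rescore_candidates(["中意", "中医", "综艺"],
    #                         ["综艺", "音乐", "视频"],
    #                         {"中意": 0.4, "中医": 0.35, "综艺": 0.3}) -> "综艺"

Under these assumptions, the homophone with the highest acoustic score alone ("中意") loses to the candidate that also appears on screen ("综艺").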
  • the ASR module returns the analyzed voice command text to the HiCar application of the mobile phone.
  • the HiCar application of the mobile phone sends a voice command text to the algorithm service module.
  • the mobile phone uses a certain algorithm service to match the text of the voice command with the information of the current interface content to determine the matching result.
  • the smart voice service module can perform steps 914-1 to 919-1 shown by the dotted box in FIG. 9:
  • the NLU module of the smart voice service can also obtain the text of the voice command.
  • the NLU module of the smart voice service performs intention recognition according to the voice command text, and determines the user's intention corresponding to the voice command text.
  • the DM module may process the returned user intent and determine the user intent of the user's voice command.
  • the smart voice service module returns the user intent to the HiCar application of the mobile phone.
  • steps 914-1 to 919-1 shown in the dotted box may be optional steps; this process can be understood as accurately analyzing the user's intention with the help of a powerful speech recognition capability, such as a server's, and then responding on the mobile phone side.
  • the HiCar application of the mobile phone determines whether to execute the operation corresponding to the user's intent, which improves the accuracy of voice command recognition.
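A rough sketch of these optional NLU/DM steps is shown below; the intent schema and the substring-based slot filling are illustrative assumptions, not the actual modules' behavior.

    def nlu_recognize(text, screen_labels):
        # Map recognized text to a structured intent: click a named on-screen control.
        for label in screen_labels:
            if label in text:
                return {"intent": "CLICK_CONTROL", "slot": label}
        return {"intent": "UNKNOWN", "slot": None}

    def dm_decide(intent):
        # Dialog management: execute the intent, or end the current conversation.
        return intent["intent"] != "UNKNOWN"

    # e.g. dm_decide(nlu_recognize("play variety show", ["variety show"])) -> True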
  • the above process can be understood as a process of determining the user's intention according to the voice command after acquiring the user's voice command, for example, determining which control on the current interface is to be clicked by the voice command currently input by the user.
  • the matching degree between the voice command and each of the one or more controls is determined, and the control with the highest matching degree is determined as the target control on which the user intends to perform the click operation.
  • one or more keywords contained in the user's voice command may be extracted, and the matching degree between each of the one or more keywords and the description information of each control may be determined; the control with the highest matching degree is determined as the target control.
  • the keywords may include characters, words, and the pinyin of some or all of the Chinese characters of the voice command, which is not limited in this embodiment of the present application.
  • the description information of each control may include outline information, text information, color information, position information, icon information, etc. of the control, which is not limited in this embodiment of the present application.
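As a minimal sketch of this keyword-to-control matching step, assuming each control carries a visible label plus description words (the dataclass fields and the hit-count scoring are invented for the example):

    from dataclasses import dataclass, field

    @dataclass
    class Control:
        text: str                                      # visible label, e.g. "收藏"
        keywords: list = field(default_factory=list)   # description words, e.g. ["喜欢", "收藏"]

    def match_target_control(voice_keywords, controls):
        # Count how many spoken keywords hit each control's description
        # vocabulary; the control with the highest non-zero count is the target.
        if not controls:
            return None
        def score(c):
            vocab = {c.text, *c.keywords}
            return sum(1 for kw in voice_keywords if kw in vocab)
        best = max(controls, key=score)
        return best if score(best) > 0 else None

    # e.g. match_target_control(["喜欢"], [Control("收藏", ["喜欢", "收藏"])])
    # matches the favorite button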
  • the HiCar application of the mobile phone determines whether to execute the operation corresponding to the user's intention.
  • the smart voice service module ends the current conversation according to the notification message that the user instruction is not executed.
  • the method obtains the controls displayed on the interface that are visible and can be clicked by the user, so that the user can input voice commands to perform operations such as clicking on any control on the interface. All applications and all visible content on the display can be controlled by the user with voice commands.
  • the acquired interface content information of the current interface is used as a parameter of the ASR analysis; that is, according to the interface content information of the current interface, the application scenario in which the current user's voice command may occur is accurately analyzed.
  • the text of the recognized voice command is then matched against the controls in the current possible application scenario, so as to obtain the user's intention more accurately and improve the accuracy of speech recognition in voice interaction scenarios.
  • the embodiment of the present application can analyze the user's voice command in combination with the current application scenario where the user's voice command may occur.
  • the accuracy of speech recognition is improved, which reduces the user's manual operations, avoids user distraction, and improves user safety in driving scenarios.
  • the electronic device includes corresponding hardware and/or software modules for executing each function.
  • the present application can be implemented in hardware, or in a combination of hardware and computer software, in conjunction with the algorithm steps of the examples described in the embodiments disclosed herein. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functionality for each particular application, but such implementations should not be considered beyond the scope of this application.
  • the electronic device can be divided into functional modules according to the above method examples.
  • each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and is only a logical function division; there may be other division manners in actual implementation.
  • the electronic device involved in the above embodiment may include: a display unit, a detection unit, and a processing unit.
  • the display unit, the detection unit, and the processing unit cooperate with each other, and may be used to support the electronic device to perform the technical process described in the above embodiments.
  • the electronic device provided in this embodiment is used to execute the above-mentioned method for human-computer interaction, and thus can achieve the same effect as the above-mentioned implementation method.
  • the electronic device may include a processing module, a memory module and a communication module.
  • the processing module may be used to control and manage the actions of the electronic device, for example, may be used to support the electronic device to perform the steps performed by the display unit, the detection unit and the processing unit.
  • the storage module may be used to support the electronic device in storing program code, data, and the like.
  • the communication module can be used to support the communication between the electronic device and other devices.
  • the processing module may be a processor or a controller. It may implement or execute the various exemplary logical blocks, modules and circuits described in connection with this disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of digital signal processing (DSP) and a microprocessor, and the like.
  • the storage module may be a memory.
  • the communication module may specifically be a device that interacts with other electronic devices, such as a radio frequency circuit, a Bluetooth chip, and a Wi-Fi chip.
  • the electronic device involved in this embodiment may be a device having the structure shown in FIG. 2 .
  • This embodiment also provides a computer-readable storage medium, where computer instructions are stored; when the computer instructions are run on the electronic device, the electronic device executes the above-mentioned related method steps to implement the human-computer interaction method in the above-mentioned embodiments.
  • This embodiment also provides a computer program product, which when the computer program product runs on a computer, causes the computer to execute the above-mentioned relevant steps, so as to realize the method for human-computer interaction in the above-mentioned embodiment.
  • the embodiments of the present application also provide an apparatus, which may specifically be a chip, a component, or a module; the apparatus may include a processor and a memory connected to each other, where the memory is used for storing computer-executable instructions, and when the apparatus runs, the processor can execute the computer-executable instructions stored in the memory, so that the chip performs the human-computer interaction method in the foregoing method embodiments.
  • the electronic device, computer-readable storage medium, computer program product, or chip provided in this embodiment are all used to execute the corresponding method provided above; therefore, for the beneficial effects that can be achieved, reference may be made to the beneficial effects of the corresponding method provided above, which will not be repeated here.
  • the disclosed apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of modules or units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, or as indirect coupling or communication connections between devices or units, and may be in electrical, mechanical, or other forms.
  • Units described as separate components may or may not be physically separated, and components shown as units may be one physical unit or multiple physical units, that is, may be located in one place, or may be distributed in multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a readable storage medium.
  • the computer software product is stored in a readable storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Abstract

A human-computer interaction method (900), an electronic device (100), and a system, wherein the method (900) can be applied to an electronic device (100) such as a smart screen (101), or to a system comprising a mobile phone (102) and a vehicle-mounted device (103). A text control, a picture control, buttons (20, 30), an icon control, etc. that are displayed on interfaces (403, 404, 501-504, 601-603, 701-707, 801, 802), are visible, and can be clicked by a user are acquired, so that operations such as clicking on any control on the interfaces are executed according to a user speech instruction. In addition, during the process of matching a speech instruction with a control, the application scenario in which the current user speech instruction may occur is accurately analyzed in combination with the content information on the interfaces, and a control in that application scenario is matched according to the recognized speech instruction, so as to acquire the user's intention more accurately, thereby improving the speech recognition accuracy in speech interaction scenarios.

Description

A method, electronic device and system for human-computer interaction
This application claims priority to Chinese Patent Application No. 202010950650.8, entitled "A method, electronic device and system for human-computer interaction", filed with the State Intellectual Property Office on September 10, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of electronic technology, and in particular, to a method, electronic device and system for human-computer interaction.
Background
With the development of technology, more and more electronic devices support voice interaction, which has gradually become a way for users to convey intentions and control electronic devices. Controlling an electronic device through voice commands frees the user's hands and makes the device easier to operate.
In one implementation of voice interaction, different applications usually rely on independently developed voice assistants to interact with users. For example, among navigation applications, Baidu Maps can rely on its own Xiaodu for voice interaction with users, while AutoNavi Maps can interact with users through its self-developed Xiaode. In this implementation, each application relies on an independently developed voice assistant, so users experience voice interaction differently across applications.
In addition, for other applications on electronic devices, there is currently no system-level voice interaction: the voice assistant is not integrated into the applications, and the user cannot control in-application operations through voice commands. For example, audio applications such as music players, or media applications such as video players, have no ability to interact with the user by voice, and the user cannot control the execution of such applications through voice commands.
In summary, the voice assistant of an electronic device is currently separated from applications, and different applications cannot access the same voice assistant.
Summary of the Invention
Embodiments of the present application provide a human-computer interaction method, electronic device, and system. The method enables system-level voice interaction: for all applications displayed on the interface and all visible buttons, pictures, icons, text, controls, and the like, the user can perform click and other operations through voice commands, achieving precise human-computer interaction, generalizing the recognition of voice commands, and improving the accuracy of user intention recognition.
In a first aspect, a human-computer interaction method is provided. The method is applied to an electronic device and includes: during the running of a human-computer interaction application in the electronic device, acquiring current interface content information; determining, according to the interface content information, one or more controls on the interface, the one or more controls including one or more of buttons, icons, pictures, and text; acquiring a voice command of a user; matching, according to the voice command, a target control from the one or more controls; and determining, according to the voice command, a user intention, and in response to the user intention, performing an operation on the target control.
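As a rough, non-limiting illustration of the order of these steps, the following self-contained Python sketch wires them together; the data structures, helper names, and the word-in-label matching rule are assumptions made for the example, not part of the claimed method.

    def extract_controls(interface_content):
        # Determine the controls (buttons, icons, pictures, text) on the interface.
        return [c for c in interface_content
                if c.get("kind") in ("button", "icon", "picture", "text")]

    def match_control(command, controls):
        # Score each control by how many words of the command appear in its label.
        scored = [(sum(w in c["label"] for w in command.split()), c) for c in controls]
        if not scored:
            return None
        best_score, best = max(scored, key=lambda s: s[0])
        return best if best_score > 0 else None

    def handle_voice_interaction(interface_content, command):
        controls = extract_controls(interface_content)   # interface content -> controls
        target = match_control(command, controls)        # voice command -> target control
        return ("click", target["label"]) if target else ("no-op", None)

    # Example: a visible "play" button and the spoken command "play music"
    print(handle_voice_interaction([{"kind": "button", "label": "play"}], "play music"))
    # -> ('click', 'play')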
Optionally, the interface content may include the user-visible part of the currently displayed interface. Exemplarily, the user-visible part may include pictures, text, menus, options, icons, buttons, and the like displayed on the interface, which are collectively referred to as "controls" in the embodiments of the present application.
It should be understood that, in the embodiments of the present application, when the user's voice command is matched to a target control on the interface, an operation may be performed on the target control. Optionally, the operation may include input operations such as single-clicking, tapping, double-clicking, sliding, and right-clicking.
It should also be understood that, in the embodiments of the present application, after the user's voice command is acquired and parsed, the command is matched against the target control on the interface; that is, the user's intention is recognized, and a click operation on the target control is then performed.
Through the above implementation, the method obtains the controls that are displayed on the interface, visible, and clickable by the user, so that the user can input voice commands to perform operations such as clicking on any control on the interface. All applications and all visible content on the display can be controlled by the user's voice commands.
Specifically, in the process of analyzing the user's voice command, the acquired interface content information of the current interface is used as a parameter of the ASR analysis; that is, the application scenario in which the current voice command may occur is accurately inferred from the interface content information of the current interface. After the user's voice command is recognized, the recognized text is matched against the controls in the possible current application scenario, so as to obtain the user's intention more accurately and improve the accuracy of speech recognition in voice interaction scenarios.
In particular, in driving scenarios and in noisy environments such as in-vehicle devices, when the voice command input by the user is accompanied by noise, the embodiments of the present application can analyze the command in combination with the application scenario in which it is likely to occur, improving the accuracy of speech recognition, reducing the user's manual operations, avoiding distraction, and improving safety in driving scenarios.
In addition, for voice interaction scenarios, different applications do not need to develop voice assistants separately; in other words, the user can control multiple different applications through the same voice interaction, the voice assistant is no longer separated from the applications, and the application ecosystem is enriched.
In summary, based on recognition and analysis of the user's voice commands, the text controls, pictures, text, and icons included in the interface are identified, and the user's voice commands are matched against the on-screen controls to achieve precise human-computer interaction. The method generalizes the recognition of voice commands and improves the accuracy of both user intention recognition and ASR recognition. It also reduces the latency of voice interaction, bringing "what you see is what you can say" intention processing within 200 ms, whereas recognizing voice commands through cloud-side ASR such as a server takes about 200 ms; this greatly improves the efficiency of voice command detection and the user experience.
For consumers, all applications and all on-screen content can be controlled by the user's voice commands, which reduces distraction and improves driving safety in vehicle and travel scenarios. For developers, different applications do not need dedicated interface adaptation for voice interaction, which enriches the application ecosystem of travel scenarios, supports projection from the mobile phone to the vehicle terminal, enables migration of the application ecosystem, and increases the ecological value of HiCar.
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, matching a target control from the one or more controls according to the voice command includes: determining, according to the voice command, the matching degree between the voice command and each of the one or more controls; and determining the control with the highest matching degree as the target control.
In one possible implementation, in the embodiments of the present application, the smart voice service module may correspond to a smart voice application installed on the mobile phone side; that is, the smart voice application of the mobile phone executes the voice command recognition service process of the embodiments of the present application.
In another possible implementation, the service corresponding to the smart voice service module may be provided by a server. In this scenario, drawing on the server's voice analysis capability, the mobile phone sends the user's voice command to the server; after the server analyzes the voice command, it returns the recognition result to the mobile phone, which will not be repeated here.
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, determining the matching degree between the voice command and each of the one or more controls according to the voice command includes: extracting one or more keywords contained in the voice command; determining the matching degree between each of the one or more keywords and the description information of each of the one or more controls; and determining the control with the highest matching degree as the target control.
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, when the one or more controls include an icon control, the method further includes: acquiring the outline of the icon control, and determining, according to the outline, outline keywords describing the icon control; determining the matching degree between the one or more keywords included in the voice command and the outline keywords of the icon control; and determining the icon control with the highest matching degree as the target control.
In one possible implementation, the ASR module contains an ASR model. In the embodiments of the present application, the acquired information of the currently displayed interface content may be passed to the ASR module in synchronization with step 908; that is, the interface content information is input into the ASR model as a parameter, and the user's voice command is then recognized according to the updated ASR model.
Exemplarily, the user's voice command may include homophones. For example, when the user inputs "variety show" (综艺), influenced by different users' pronunciations, the ASR analysis of the ASR module may, based on pinyin such as "zong yi" and "zhong yi", produce possible recognition results such as "中意" (Zhongyi), "中医" (Traditional Chinese Medicine), "忠义" (loyalty), and "综艺" (variety show). Such homophones and words with similar pinyin may cause the mobile phone to fail to accurately obtain the user's operation intention through the voice command. With the embodiments of the present application, according to the current interface content information of the in-vehicle device 103, for example when the current interface displays a large amount of audio information, star photos, and video information, the ASR module will select from these possible recognition results the one most relevant to the displayed content, namely "综艺" (variety show), and thus determine that the voice command issued by the current user is "variety show".
Through this updated algorithm, information about the currently displayed interface content is introduced into the recognition process of the existing ASR module, so that the usage scenario of the user's current voice command can be accurately inferred from the displayed interface content; the application scenario targeted by the current voice command can then be located precisely, improving the accuracy of voice command recognition.
It should be understood that the above process can be understood as follows: after the user's voice command is acquired, the user's intention is determined according to the voice command, for example, determining which control on the current interface the voice command currently input by the user is meant to click.
In one possible implementation, according to the user's voice command, the matching degree between the voice command and each of the one or more controls is determined, and the control with the highest matching degree is determined as the target control on which the user intends to perform the click operation.
Optionally, in the process of determining the matching degree between the voice command and each of the one or more controls, one or more keywords contained in the user's voice command may be extracted; the matching degree between each of the one or more keywords and the description information of each of the one or more controls is determined; and the control with the highest matching degree is determined as the target control.
Optionally, the keywords may include characters, words, and the pinyin of some or all of the Chinese characters of the voice command, which is not limited in the embodiments of the present application.
Optionally, the description information of each control may include the control's outline information, text information, color information, position information, icon information, and the like, which is not limited in the embodiments of the present application.
Exemplarily, on a music playing interface, when the user inputs the voice command "I like", and the command is matched against the controls included on the interface, suppose the description words of the favorite button are "like" and "favorite" and the button's outline is a heart shape; the heart shape can then be matched to "I like". This approach generalizes the user's voice commands and matches user commands to on-screen controls more intelligently.
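By way of a hedged illustration only, the sketch below shows how such outline keywords might be consulted during matching. The outline-to-keyword table and the function name are invented for the example; a real implementation would derive the outline keywords from the control's actual shape.

    ICON_SYNONYMS = {                 # illustrative assumption, not the patent's data
        "heart": ["like", "favorite", "love"],
        "magnifier": ["search", "find"],
    }

    def match_icon(voice_keywords, icon_outline):
        # True if any spoken keyword matches the outline's descriptive words.
        outline_words = ICON_SYNONYMS.get(icon_outline, [])
        return any(kw in outline_words for kw in voice_keywords)

    # e.g. match_icon(["like"], "heart") -> True:
    # "I like" matches the heart-shaped favorite button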
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, the method further includes: when the voice command is detected, adding numeric corner labels to some of the one or more controls; and when it is detected that the voice command includes a first number, determining the control marked with the first number as the target control.
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, adding numeric corner labels to some of the one or more controls includes: adding the numeric corner labels to some of the one or more controls in a preset order, the preset order including left-to-right and/or top-to-bottom.
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, the controls to which numeric corner labels can be added include one or more of the following: controls among the one or more controls that are all of the picture type; controls among the one or more controls arranged in a grid; controls among the one or more controls arranged in a list; or controls among the one or more controls whose display size is greater than or equal to a preset value.
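A minimal sketch of this numbering step follows, under the assumption that each control exposes its position, size, and type; the field names and the size threshold are illustrative, not specified by the embodiments.

    def assign_badges(controls, min_size=48):
        # Number eligible controls in reading order: top-to-bottom, then left-to-right.
        eligible = [
            c for c in controls
            if c["kind"] == "picture" or min(c["w"], c["h"]) >= min_size
        ]
        eligible.sort(key=lambda c: (c["y"], c["x"]))
        for i, c in enumerate(eligible, start=1):
            c["badge"] = i           # the user can then say this number to click it
        return eligible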
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, the interface corresponding to the interface content information is the interface of an application running in the foreground of the electronic device, and/or the interface of an application running in the background of the electronic device.
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, the method further includes: starting the human-computer interaction application on the electronic device.
With reference to the first aspect and the foregoing implementations, in some implementations of the first aspect, starting the human-computer interaction application on the electronic device includes: acquiring a preset input of the user and starting the human-computer interaction application on the electronic device, the preset input including at least one of triggering a button, a preset human-computer interaction instruction input by voice, or a preset fingerprint input.
In a second aspect, an electronic device is provided, including: one or more processors; one or more memories; a module installed with a plurality of application programs; the memories storing one or more programs that, when executed by the processors, cause the electronic device to perform the following steps: during the running of a human-computer interaction application, acquiring current interface content information; determining, according to the interface content information, one or more controls on the interface, the one or more controls including one or more of buttons, icons, pictures, and text; acquiring a voice command of the user; matching, according to the voice command, a target control from the one or more controls; and determining, according to the voice command, a user intention, and in response to the user intention, performing an operation on the target control.
With reference to the second aspect, in some implementations of the second aspect, when the one or more programs are executed by the processors, the electronic device is caused to perform the following steps: determining, according to the voice command, the matching degree between the voice command and each of the one or more controls; and determining the control with the highest matching degree as the target control.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, when the one or more programs are executed by the processors, the electronic device is caused to perform the following steps: extracting one or more keywords contained in the voice command; determining the matching degree between each of the one or more keywords and the description information of each of the one or more controls; and determining the control with the highest matching degree as the target control.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, the one or more controls include an icon control, and when the one or more programs are executed by the processors, the electronic device is caused to perform the following steps: acquiring the outline of the icon control, and determining, according to the outline, outline keywords describing the icon control; determining the matching degree between the one or more keywords included in the voice command and the outline keywords of the icon control; and determining the icon control with the highest matching degree as the target control.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, when the one or more programs are executed by the processors, the electronic device is caused to perform the following steps: when the voice command is detected, adding numeric corner labels to some of the one or more controls; and when it is detected that the voice command includes a first number, determining the control marked with the first number as the target control.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, when the one or more programs are executed by the processors, the electronic device is caused to perform the following step: adding the numeric corner labels to some of the one or more controls in a preset order, the preset order including left-to-right and/or top-to-bottom.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, the controls to which numeric corner labels can be added include one or more of the following: controls among the one or more controls that are all of the picture type; controls among the one or more controls arranged in a grid; controls among the one or more controls arranged in a list; or controls among the one or more controls whose display size is greater than or equal to a preset value.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, the interface corresponding to the interface content information is the interface of an application running in the foreground of the electronic device, and/or the interface of an application running in the background of the electronic device.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, when the one or more programs are executed by the processors, the electronic device is caused to perform the following step: starting the human-computer interaction application.
With reference to the second aspect and the foregoing implementations, in some implementations of the second aspect, when the one or more programs are executed by the processors, the electronic device is caused to perform the following steps: acquiring a preset input of the user and starting the human-computer interaction application, the preset input including at least one of triggering a button, a preset human-computer interaction instruction input by voice, or a preset fingerprint input.
In a third aspect, the present application provides a system including a connected electronic device and a display device. The electronic device can perform any one of the possible human-computer interaction methods of the first aspect, and the display device is configured to display the application interface of the electronic device.
In a fourth aspect, the present application provides an apparatus included in an electronic device, the apparatus having functions for implementing the behavior of the electronic device in the foregoing aspect and the possible implementations of the foregoing aspect. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the foregoing functions, for example, a display module or unit, a detection module or unit, a processing module or unit, and the like.
In a fifth aspect, the present application provides an electronic device, including: a touch display screen, where the touch display screen includes a touch-sensitive surface and a display; a positioning chip; one or more cameras; one or more processors; one or more memories; a plurality of application programs; and one or more computer programs. The one or more computer programs are stored in the memories and include instructions that, when executed by the one or more processors, cause the electronic device to perform any one of the possible human-computer interaction methods of any of the foregoing aspects.
In a sixth aspect, the present application provides an electronic device including one or more processors and one or more memories. The one or more memories are coupled to the one or more processors and are configured to store computer program code including computer instructions that, when executed by the one or more processors, cause the electronic device to perform the human-computer interaction method in any possible implementation of any of the foregoing aspects.
In a seventh aspect, the present application provides a computer storage medium including computer instructions that, when run on an electronic device, cause the electronic device to perform any one of the possible human-computer interaction methods of any of the foregoing aspects.
In an eighth aspect, the present application provides a computer program product that, when run on an electronic device, causes the electronic device to perform any one of the possible human-computer interaction methods of any of the foregoing aspects.
Description of the Drawings
FIG. 1 is a schematic diagram of an application scenario of a human-computer interaction method provided by an embodiment of the present application.
FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
FIG. 3 is a software structural block diagram of the implementation process of a human-computer interaction method according to an embodiment of the present application.
FIG. 4 is a schematic interface diagram of a voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application.
FIG. 5 is a schematic interface diagram of another voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application.
FIG. 6 is a schematic interface diagram of another voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application.
FIG. 7 is a schematic interface diagram of another voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application.
FIG. 8 is a schematic interface diagram of another voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application.
FIG. 9 is a schematic flowchart of a voice interaction method provided by an embodiment of the present application.
Detailed Description
The following describes the human-computer interaction method provided by the embodiments of the present application in detail with reference to the accompanying drawings and application scenarios.
In the description of the embodiments of the present application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, in the description of the embodiments of the present application, "multiple" means two or more than two.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and shall not be construed as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features.
Embodiments of the present application provide a human-computer interaction method. The following describes in detail, with reference to the accompanying drawings and different embodiments, how system-level voice interaction is achieved through this method. First, before introducing the method provided by the embodiments of the present application, several possible application scenarios are listed.
FIG. 1 is a schematic diagram of an application scenario of a human-computer interaction method provided by an embodiment of the present application.
In one possible scenario, the human-computer interaction method provided by the embodiments of the present application may be applied to a scenario involving a single electronic device. Exemplarily, as shown in (a) of FIG. 1, with the smart screen 101 as the electronic device, the method is applied to a scenario in which a user uses the smart screen 101. Specifically, the smart screen 101 can acquire the user's voice command through a microphone, recognize the command, perform the corresponding operation according to the command, display the corresponding interface, and so on.
In another possible scenario, the human-computer interaction method provided by the embodiments of the present application may also be applied to a scenario involving two electronic devices, which may be of different types, such as mobile phones, tablet computers, wearable devices, and in-vehicle devices.
Exemplarily, as shown in (b) of FIG. 1, taking a scenario including the mobile phone 102 and the in-vehicle device 103 as an example, the in-vehicle device 103 may act as a display device connected to the mobile phone 102 to display and run the applications of the mobile phone 102. The mobile phone 102 may acquire the user's voice command, recognize it, perform the corresponding operation in the background according to the command, and then project the resulting interface onto the in-vehicle device 103. Alternatively, the in-vehicle device 103 may acquire the user's voice command and pass it to the mobile phone 102; the mobile phone recognizes the command, performs the corresponding operation in the background according to it, and then projects the resulting interface onto the in-vehicle device 103.
In yet another possible scenario, the human-computer interaction method provided by the embodiments of the present application may also be applied to a scenario involving at least one electronic device and a server. Exemplarily, as shown in (c) of FIG. 1, in a scenario including the mobile phone 102, the in-vehicle device 103, and the server 104, after the mobile phone 102 or the in-vehicle device 103 acquires the user's voice command, the command may be uploaded to the server 104; drawing on the voice analysis capability of the server 104, the command is analyzed more quickly and accurately, the analysis result is then passed back to the mobile phone 102, and the corresponding operation is performed on the mobile phone.
It should be understood that the embodiments of the present application do not limit the application scenarios of voice interaction.
In combination with the scenarios introduced above, the human-computer interaction method provided by the embodiments of the present application may be applied to electronic devices such as mobile phones, smart screens, tablet computers, wearable devices, in-vehicle devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDA); the embodiments of the present application place no limitation on the specific type of electronic device.
In the embodiments of the present application, the smart screen 101, the mobile phone 102, and the in-vehicle device 103 listed in FIG. 1 are collectively referred to as the "electronic device 100". A possible structure of the electronic device 100 is described below.
Exemplarily, FIG. 2 is a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), among others. Different processing units may be independent devices or may be integrated into one or more processors.
The controller may be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to the instruction operation code and timing signals, completing the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instruction or data again, it can be called directly from this memory. Repeated accesses are thus avoided, the waiting time of the processor 110 is reduced, and the efficiency of the system is improved.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, and the like.
The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple sets of I2C buses. The processor 110 may be separately coupled to the touch sensor 180K, a charger, a flash, the camera 193, and the like through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 communicates with the touch sensor 180K through the I2C bus interface, to implement the touch function of the electronic device 100.
The I2S interface may be used for audio communication. In some embodiments, the processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus, to implement communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the I2S interface, to implement the function of answering calls through a Bluetooth headset.
The PCM interface may also be used for audio communication, to sample, quantize, and encode an analog signal. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit an audio signal to the wireless communication module 160 through the PCM interface, to implement the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is usually used to connect the processor 110 to the wireless communication module 160. For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface, to implement the Bluetooth function. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the UART interface, to implement the function of playing music through a Bluetooth headset.
The MIPI interface may be used to connect the processor 110 to peripheral devices such as the display screen 194 and the camera 193. MIPI interfaces include a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor 110 communicates with the camera 193 through the CSI interface, to implement the photographing function of the electronic device 100, and the processor 110 communicates with the display screen 194 through the DSI interface, to implement the display function of the electronic device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal, or may be configured as a data signal. In some embodiments, the GPIO interface may be used to connect the processor 110 to the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, or the like.
The USB interface 130 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, or may be used to transmit data between the electronic device 100 and a peripheral device. It may also be used to connect headphones and play audio through the headphones. The interface may further be used to connect other electronic devices, such as AR devices.
It can be understood that the interface connection relationships between the modules illustrated in this embodiment of the present application are merely schematic illustrations, and do not constitute a structural limitation on the electronic device 100. In some other embodiments of the present application, the electronic device 100 may alternatively use interface connection manners different from those in the foregoing embodiment, or a combination of multiple interface connection manners.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger through the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. While charging the battery 142, the charging management module 140 may also supply power to the electronic device through the power management module 141.
The power management module 141 is configured to connect the battery 142 and the charging management module 140 to the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, and battery health status (leakage, impedance). In some other embodiments, the power management module 141 may alternatively be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may alternatively be disposed in the same device.
The wireless communication function of the electronic device 100 may be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas may also be multiplexed to improve antenna utilization. For example, the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In some other embodiments, an antenna may be used in combination with a tuning switch.
The mobile communication module 150 may provide wireless communication solutions applied to the electronic device 100, including 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 may receive electromagnetic waves through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 may also amplify a signal modulated by the modem processor, and convert it into an electromagnetic wave for radiation through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 may be disposed in the same device as at least some modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is configured to modulate a to-be-sent low-frequency baseband signal into a medium- or high-frequency signal. The demodulator is configured to demodulate a received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, and the like), or displays an image or video through the display screen 194. In some embodiments, the modem processor may be an independent device. In some other embodiments, the modem processor may be independent of the processor 110, and be disposed in the same device as the mobile communication module 150 or another functional module.
The wireless communication module 160 may provide wireless communication solutions applied to the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), the global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 may also receive a to-be-sent signal from the processor 110, perform frequency modulation and amplification on it, and convert it into an electromagnetic wave for radiation through the antenna 2.
In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include the global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, and the like. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The electronic device 100 implements the display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a mini-LED, a micro-LED, a micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the electronic device 100 may include one or N display screens 194, where N is a positive integer greater than 1.
The electronic device 100 may implement the photographing function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the photosensitive element of the camera through the lens, the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye. The ISP may also perform algorithm optimization on image noise, brightness, and skin tone. The ISP may further optimize parameters such as the exposure and color temperature of a shooting scene. In some embodiments, the ISP may be disposed in the camera 193.
The camera 193 is used to capture still images or videos. An optical image of an object is generated through the lens and projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 may include one or N cameras 193, where N is a positive integer greater than 1.
The digital signal processor is used to process digital signals; in addition to digital image signals, it may also process other digital signals. For example, when the electronic device 100 performs frequency point selection, the digital signal processor is used to perform a Fourier transform on the frequency point energy, and so on.
The video codec is used to compress or decompress digital video. The electronic device 100 may support one or more video codecs, so that the electronic device 100 can play or record videos in multiple encoding formats, for example, moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example, the transfer mode between neurons in the human brain, it quickly processes input information, and can also continuously perform self-learning. Applications such as intelligent cognition of the electronic device 100, for example, image recognition, face recognition, speech recognition, and text understanding, can be implemented through the NPU.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120, to implement a data storage function, for example, to save files such as music and videos in the external memory card.
The internal memory 121 may be used to store computer-executable program code, where the executable program code includes instructions. The processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, an application required by at least one function (for example, a sound playback function or an image playback function), and the like. The data storage area may store data (such as audio data and a phone book) created during the use of the electronic device 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a "loudspeaker", is used to convert an audio electrical signal into a sound signal. The electronic device 100 may be used to listen to music, or to listen to a hands-free call, through the speaker 170A.
The receiver 170B, also referred to as an "earpiece", is used to convert an audio electrical signal into a sound signal. When the electronic device 100 answers a call or receives a voice message, the voice can be heard by placing the receiver 170B close to the human ear.
The microphone 170C, also referred to as a "mic" or "sound transducer", is used to convert a sound signal into an electrical signal. When making a call or sending a voice message, the user may speak with the mouth close to the microphone 170C, to input the sound signal into the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In some other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In some other embodiments, the electronic device 100 may alternatively be provided with three, four, or more microphones 170C, to collect sound signals, reduce noise, identify sound sources, implement directional recording functions, and so on.
The headset jack 170D is used to connect wired headsets. The headset jack 170D may be the USB interface 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal, and can convert the pressure signal into an electrical signal. The gyroscope sensor 180B may be used to determine the motion posture of the electronic device 100. The barometric pressure sensor 180C is used to measure barometric pressure; in some embodiments, the electronic device 100 calculates the altitude based on the barometric pressure value measured by the barometric pressure sensor 180C, to assist in positioning and navigation. The magnetic sensor 180D includes a Hall sensor; the electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of a flip cover. The acceleration sensor 180E may detect the magnitude of the acceleration of the electronic device 100 in various directions (generally along three axes). The distance sensor 180F is used to measure distance; the electronic device 100 may measure distance by infrared or laser. The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The ambient light sensor 180L is used to sense ambient light brightness. The fingerprint sensor 180H is used to collect fingerprints. The temperature sensor 180J is used to detect temperature. The bone conduction sensor 180M may acquire vibration signals. The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194; the touch sensor 180K and the display screen 194 form a touchscreen, also referred to as a "touch screen". The touch sensor 180K is used to detect a touch operation performed on or near it. The touch sensor may pass the detected touch operation to the application processor, to determine the type of the touch event. Visual output related to the touch operation may be provided through the display screen 194.
The buttons 190 include a power button, volume buttons, and the like. The buttons 190 may be mechanical buttons, or may be touch-sensitive buttons. The electronic device 100 may receive button input, and generate button signal input related to user settings and function control of the electronic device 100.
The motor 191 may generate a vibration prompt. The motor 191 may be used for vibration prompts for incoming calls, and may also be used for touch vibration feedback. For example, touch operations acting on different applications (such as photographing and audio playback) may correspond to different vibration feedback effects. Touch operations acting on different areas of the display screen 194 may also correspond to different vibration feedback effects of the motor 191. Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) may also correspond to different vibration feedback effects. The touch vibration feedback effects may also be customized.
The indicator 192 may be an indicator light, and may be used to indicate a charging state and a change in battery level, and may also be used to indicate messages, missed calls, notifications, and the like.
The SIM card interface 195 is used to connect a SIM card.
It should be understood that the human-computer interaction method in the embodiments of the present application may be applied to any possible electronic device having all or part of the structure shown in FIG. 2.
In other words, in the possible scenarios shown in FIG. 1, electronic devices such as the smart screen 101, the mobile phone 102, and the in-vehicle device 103 may all have the structure shown in FIG. 2, or a structure with more or fewer components than that shown in FIG. 2. The embodiments of the present application do not limit the types of electronic devices included in the application scenario.
For ease of understanding, the following embodiments take the scenario shown in (b) of FIG. 1 as an example, and describe in detail the human-computer interaction method provided by the embodiments of the present application in a scenario that includes at least the mobile phone 102 and the in-vehicle device 103.
First, for the scenario shown in (b) of FIG. 1, a software structural block diagram of the implementation process of the human-computer interaction method provided by the embodiments of the present application is introduced.
In some embodiments, when the electronic device 100 shown in FIG. 2 is a mobile phone, it may run the Harmony OS system, the Android system, the iOS system, or any other possible operating system, and it may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, a cloud architecture, or the like. The embodiments of the present application take a mobile phone running the Android system with a layered architecture as an example to exemplarily describe the software structure of the mobile phone 102.
FIG. 3 is a software structural block diagram of the implementation process of an example human-computer interaction method according to an embodiment of the present application. After the mobile phone 102 and the in-vehicle device 103 establish a connection, the in-vehicle device 103 can serve as a screen projection device (or "display device") of the mobile phone 102, and applications of the mobile phone 102 can be displayed on the display screen of the in-vehicle device 103.
Specifically, the Android system has a layered architecture, in which the software is divided into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers: from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
1. Application layer
The application layer may include a series of application packages. As shown in FIG. 3, the application packages may include applications such as the visible-to-speak application 11, the smart voice application 13, music, navigation, and the HiCar application 15. The following focuses on the functional modules corresponding to the visible-to-speak application 11 and the smart voice application 13 in the embodiments of the present application.
(1) Visible-to-speak application 11
In the embodiments of the present application, "visible" may refer to the part that the user can see during the human-computer interaction between the user and the electronic device. Exemplarily, the user-visible part may include the display content on the screen of the electronic device, such as the desktop, windows, menus, icons, buttons, and controls of the electronic device.
It should be understood that the visible part may also include multimedia content such as text, pictures, and videos displayed on the screen of the electronic device, which is not limited in the embodiments of the present application.
It should also be understood that the display content on the screen of the electronic device may be an interface displayed by an application running in the foreground of the electronic device, or may be a virtual display interface of an application running in the background of the electronic device, and the virtual display interface may be projected and displayed on another electronic device.
In the embodiments of the present application, "speakable" means that the user can interact with the visible part through voice instructions, thereby completing interaction tasks. Exemplarily, for user-visible parts such as the desktop, windows, menus, icons, buttons, and controls of the electronic device, the user can manipulate them through voice instructions, and thereby perform input operations such as clicking, double-clicking, and sliding on the visible parts.
To implement the above functions, the visible-to-speak application 11 may include an interface information acquisition module 111, an intent processing module 112, an interface module 113, a predefined action execution module 114, and the like. The interface information acquisition module 111 may acquire the interface content information of applications running in the foreground or background of the mobile phone. The intent processing module 112 may receive the user voice instructions returned by the smart voice application 13, and determine the user intent according to the user voice instructions. The interface module 113 is used to implement data and information exchange between different applications. The predefined action execution module 114 is used to perform corresponding operations according to voice instructions, user intents, and the like.
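As an aid to understanding, the cooperation between these modules can be sketched in a few lines of Java. This is only an illustrative sketch; the class and method names (ControlInfo, acquireInterfaceInfo, resolveIntent, executeClick) are assumptions made for this example and are not the actual implementation of the application.

import java.util.ArrayList;
import java.util.List;

// One on-screen control as captured by the interface information acquisition module 111.
class ControlInfo {
    final String label;          // visible text of the control, e.g. "Song 1"
    final int centerX, centerY;  // screen coordinates used for a simulated tap
    ControlInfo(String label, int x, int y) { this.label = label; centerX = x; centerY = y; }
}

class VisibleToSpeak {
    // Module 111: collect the controls of the interface running in the foreground or background.
    List<ControlInfo> acquireInterfaceInfo() {
        return new ArrayList<>(); // would query the view hierarchy in a real system
    }

    // Module 112: match the recognized utterance against the captured controls.
    ControlInfo resolveIntent(String utterance, List<ControlInfo> controls) {
        for (ControlInfo c : controls) {
            if (utterance.contains(c.label)) return c; // simplest possible text match
        }
        return null; // no matching control on the current interface
    }

    // Module 114: perform the predefined action, here a simulated click.
    void executeClick(ControlInfo target) {
        // inject a tap at (target.centerX, target.centerY)
    }
}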
(2) Smart voice application 13
In the embodiments of the present application, the smart voice application 13 may correspond to a smart voice application installed on the mobile phone 102 side; that is, the speech recognition service process is provided by the smart voice application of the mobile phone 102.
Alternatively, the speech recognition service process provided by the smart voice application 13 may be provided by a server. This scenario may correspond to the one shown in (c) of FIG. 1: with the help of the speech analysis capability of the server 104, the mobile phone 102 sends the user's voice instruction to the server 104, and after analyzing the voice instruction, the server 104 returns the recognition result of the voice instruction to the mobile phone 102. Details are not repeated here.
Optionally, the smart voice application 13 may include a natural language understanding (NLU) module 131, an automatic speech recognition (ASR) module 132, a text-to-speech (TTS) module 133, a dialog management (DM) module 134, and the like.
The ASR module 132 may convert the original speech signal input by the user into text information; the NLU module 131 may convert the recognized text information into semantics that can be understood by electronic devices such as mobile phones and in-vehicle devices; the DM module 134 may determine, based on the dialog state, the action that the system should take; and the TTS module 133 may convert natural language text into speech and output it to the user.
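The four modules form a pipeline from speech input to speech output. The following Java interfaces are a minimal sketch of that pipeline; all type and method names here are assumptions for illustration only.

// ASR module 132: raw speech -> text
interface AsrModule { String transcribe(byte[] pcmAudio); }

// NLU module 131: text -> machine-understandable semantics
interface NluModule { Semantics parse(String text); }

// DM module 134: dialog state -> the action the system should take
interface DmModule { SystemAction decide(Semantics semantics, DialogState state); }

// TTS module 133: natural language text -> synthesized speech
interface TtsModule { byte[] synthesize(String replyText); }

record Semantics(String intent, String slotValue) {}
record DialogState(int turnCount) {}
record SystemAction(String type, String targetControl) {}

For example, the utterance "stop playing song 1" would pass through transcribe(), be parsed into an intent such as ("stop_play", "song 1"), and the DM module would then decide that a click on the corresponding control is the action to take.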
In addition, it should be understood that the smart voice application 13 may further include a natural language generation (NLG) module and the like, which is not limited in the embodiments of the present application.
(3) HiCar application 15
Through the HiCar application 15, an application of the mobile phone can be projected to the in-vehicle device. During the projection process, the application actually runs on the mobile phone side, and this running may include running in the foreground or in the background of the mobile phone; that is, the projection process can draw on the computing power of the mobile phone.
It should be understood that the in-vehicle device may have an independent display system. After the applications of the mobile phone are projected to the in-vehicle device through the HiCar application 15, the in-vehicle device may have an independent display desktop and quick application entries, while also providing the capability to acquire voice instructions.
2. Application framework layer
The application framework layer includes various service programs or some predefined functions, and may provide an application programming interface (API) and a programming framework for the applications in the application layer. As shown in FIG. 3, the application framework layer may include a content provider (content sensor) 21, a multi-screen framework service module 23, a view system 25, and the like.
The content provider 21 may be used to store and obtain data, and make the data accessible to applications. For example, the data obtained by the content provider 21 may include interface display data of the electronic device, videos, images, audio, the user's browsing history, bookmarks, and other data. Exemplarily, in the implementation of the present application, the content provider 21 may obtain the interface content displayed in the foreground or background of the mobile phone.
The multi-screen framework service module 23 may include a window manager and the like, used to manage the window display of the electronic device. Exemplarily, the window manager may obtain the display screen size of the mobile phone 102 or the size of a window to be displayed, obtain the content of the window to be displayed, and the like. In addition, the multi-screen framework service module 23 may also manage the screen projection display process of the electronic device, for example, obtain the interface content of one or more applications running in the background of the electronic device and transmit that interface content to another electronic device, so that the interface content of the electronic device is displayed on the other electronic device. Details are not repeated here.
The view system 25 includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system 25 may be used to construct applications. A display interface may be composed of one or more views. For example, a display interface including an SMS notification icon may include a view for displaying text and a view for displaying a picture.
3. System library
The Android runtime includes core libraries and a virtual machine. The Android runtime is responsible for scheduling and management of the Android system.
The core libraries include two parts: one part is the functional functions that the Java language needs to call, and the other part is the core libraries of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include multiple functional modules, for example, a surface manager, media libraries, a three-dimensional graphics processing library (for example, OpenGL ES), and a two-dimensional graphics engine (for example, SGL). The surface manager is used to manage the display subsystem, and provides fusion of two-dimensional and three-dimensional layers for multiple applications. The media libraries support playback and recording in multiple commonly used audio and video formats, as well as static image files. The media libraries may support multiple audio and video encoding formats, for example, MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG. The three-dimensional graphics processing library is used to implement three-dimensional graphics drawing, image rendering, compositing, layer processing, and the like. The two-dimensional graphics engine is a drawing engine for two-dimensional drawing. The image processing library may provide analysis of various image data and provide multiple image processing algorithms, for example, processing such as image cutting, image fusion, image blurring, and image sharpening. Details are not repeated here.
4. Kernel layer
The kernel layer is the layer between hardware and software. The kernel layer includes at least a display driver, an audio driver, and a sensor driver. The various drivers may invoke hardware structures such as the microphone, speaker, or sensors of the mobile phone, for example, invoke the microphone of the mobile phone to obtain the user's voice instructions, and invoke the speaker of the mobile phone for voice output. Details are not repeated here.
The possible software structure of the mobile phone is described above. The in-vehicle device 103, as a display device, may have a software structure that is the same as or different from that of the mobile phone 102. As shown in FIG. 3, in the embodiments of the present application, the in-vehicle device includes at least a display module 31, a microphone/speaker 32, and the like. The display module 31 may be used to display the interface content currently running on the in-vehicle device 103, or to display an application interface projected from the mobile phone 102.
It should be understood that the in-vehicle device 103 may have an independent display system. In the embodiments of the present application, after the applications of the mobile phone 102 are projected to the in-vehicle device 103 through the HiCar application 15, the in-vehicle device 103 may have an independent display desktop and quick application entries. In other words, after applications of the mobile phone 102 such as music, navigation, and video are projected to the in-vehicle device 103 through the HiCar application 15, they can be rearranged and displayed on the in-vehicle device 103 according to the display system of the in-vehicle device 103, which is not limited in the embodiments of the present application.
The microphone/speaker 32 is a hardware structure of the in-vehicle device, and can implement the same functions as the microphone/speaker of the mobile phone. In the embodiments of the present application, the user's voice instructions may be input through the microphone of the mobile phone 102 itself, or through a remote virtual microphone. The remote virtual microphone can be understood as follows: the microphone of the in-vehicle device 103 provides the capability to acquire voice instructions and transmits the acquired voice instructions to the mobile phone, and the mobile phone 102 can then recognize the voice instructions. Details are not repeated here.
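A remote virtual microphone of this kind can be pictured as an audio capture loop on the in-vehicle device that forwards raw samples to the phone. The sketch below uses the standard Android AudioRecord API for capture; the transport (a plain socket over the already-established connection) and all class names are assumptions for illustration.

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import java.io.OutputStream;
import java.net.Socket;

public class VirtualMicForwarder {
    // Capture PCM audio on the in-vehicle device and forward it to the
    // phone over an already-established connection (transport is assumed).
    public void forward(String phoneHost, int port) throws Exception {
        int sampleRate = 16000; // 16 kHz mono PCM, a common ASR input format
        int bufSize = AudioRecord.getMinBufferSize(sampleRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord rec = new AudioRecord(MediaRecorder.AudioSource.MIC,
                sampleRate, AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT, bufSize);
        try (Socket socket = new Socket(phoneHost, port)) {
            OutputStream out = socket.getOutputStream();
            byte[] buf = new byte[bufSize];
            rec.startRecording();
            int n;
            while ((n = rec.read(buf, 0, buf.length)) > 0) {
                out.write(buf, 0, n); // the phone side feeds these samples into ASR
            }
        } finally {
            rec.stop();
            rec.release();
        }
    }
}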
In the embodiments of the present application, the HiCar application 15 can rely on the multi-screen framework capability of the mobile phone to project the interfaces of multiple applications of the mobile phone onto the interface of the in-vehicle device. The multiple applications themselves actually run on the mobile phone side, while the interfaces are displayed on the screen of the in-vehicle device. The screen content is extracted through the content sensor of the mobile phone system, so as to obtain the application interface content of the interface projected to the in-vehicle device. The smart voice application can analyze the user's semantics more quickly and accurately by combining device-side capabilities (relying on the strong computing power of the mobile phone itself) with cloud-side analysis capabilities, and sends the recognized result to the visible-to-speak application for interface content matching, to identify the user's intent. Finally, the interface is operated through simulated clicks, implementing control operations such as clicking controls, sliding up, down, left, and right, and returning.
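On an Android-based system, one plausible way to realize the "extract screen content, match, and simulate a click" chain is the accessibility framework. The application does not state that this specific API is used, so the sketch below is an assumption for illustration; it relies only on standard Android accessibility calls.

import android.accessibilityservice.AccessibilityService;
import android.view.accessibility.AccessibilityEvent;
import android.view.accessibility.AccessibilityNodeInfo;
import java.util.List;

public class SimulatedClickService extends AccessibilityService {

    // Find a control whose visible text matches the recognized target
    // (for example "song 1") and perform a click on it.
    public boolean clickByLabel(String label) {
        AccessibilityNodeInfo root = getRootInActiveWindow();
        if (root == null) return false;
        List<AccessibilityNodeInfo> hits = root.findAccessibilityNodeInfosByText(label);
        for (AccessibilityNodeInfo node : hits) {
            // Walk up to the nearest clickable ancestor if the text node
            // itself is not clickable (common for labels inside list items).
            AccessibilityNodeInfo target = node;
            while (target != null && !target.isClickable()) target = target.getParent();
            if (target != null) return target.performAction(AccessibilityNodeInfo.ACTION_CLICK);
        }
        return false; // no control with this label on the current interface
    }

    @Override public void onAccessibilityEvent(AccessibilityEvent event) { }
    @Override public void onInterrupt() { }
}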
In addition, in the embodiments of the present application, during the process in which the mobile phone 102 projects multiple applications to the in-vehicle device 103 through the HiCar application 15, the in-vehicle device 103 and the mobile phone 102 are in a state in which a connection has already been established.
In a possible implementation, in the embodiments of the present application, the connection between the mobile phone 102 and the in-vehicle device 103 may be established in multiple different ways. For example, the connection between the mobile phone 102 and the in-vehicle device 103 may include multiple connection modes such as a wired connection or a wireless connection. Exemplarily, the wired connection between the mobile phone 102 and the in-vehicle device 103 may be through a USB data cable; the wireless connection between the mobile phone 102 and the in-vehicle device 103 may be established by means of a Wi-Fi connection, or, when the mobile phone 102 and the in-vehicle device 103 support the near field communication (NFC) function, through a proximity connection using the "touch" function, or through a Bluetooth code-scanning connection between the mobile phone 102 and the in-vehicle device 103, and so on.
Alternatively, as communication technology develops and communication bandwidth and rates gradually increase, data may possibly be transmitted between the mobile phone 102 and the in-vehicle device 103 without establishing a near field communication connection. Exemplarily, with the future popularization of high-rate communication modes such as the fifth generation (5G) mobile communication system, the mobile phone 102 and the in-vehicle device 103 may be able to project the mobile phone's screen to the in-vehicle device through 5G communication, for example, by installing different or the same applications on the mobile phone 102 and the in-vehicle device 103 and transmitting data by means of the 5G communication network. In this implementation, the mobile phone may not need to provide the functions of discovering and establishing a connection with the in-vehicle device.
Alternatively, the mobile phone 102 and the in-vehicle device 103 may log in to the same account, and connect and communicate based on the relevant settings under that account. Exemplarily, both the mobile phone 102 and the in-vehicle device 103 may register a Huawei account; under this account, the applications of the mobile phone 102 can be displayed on the display screen of the in-vehicle device 103, while the applications actually run on the mobile phone 102 side. Details are not repeated here.
It should be understood that the embodiments of the present application do not limit the manner of establishing a connection between the mobile phone 102 and the in-vehicle device 103. In the subsequent embodiments, it is assumed that a connection has already been established between the mobile phone 102 and the in-vehicle device 103 through the HiCar application 15.
It should also be understood that, in addition to the multiple layers and the multiple functional modules shown in FIG. 3, the mobile phone 102 and the in-vehicle device 103 may also be divided in different ways or include more functional modules, which is not limited in the embodiments of the present application.
The following provides a detailed description with reference to the accompanying drawings and application scenarios, taking the mobile phone 102 and the in-vehicle device 103 having the software structure shown in FIG. 3 as an example.
FIG. 4 is a schematic diagram of an example graphical user interface (GUI) for implementing a voice interaction process on an in-vehicle device according to an embodiment of the present application.
Exemplarily, (a) of FIG. 4 shows that the screen display system of the in-vehicle device 103 displays the currently output interface. The content of the interface may come from an application actually running on the mobile phone 102 side, and is obtained and provided to the in-vehicle device 103 by the HiCar application.
It should be understood that the interface content on the display screen of the in-vehicle device 103 may be arranged and filled based on its own display system, and the same content may have a display style, icon size, arrangement order, and so on that differ from those of the mobile phone 102. That is, the content provided by a given application is arranged and filled on the display screen of the in-vehicle device 103 according to the requirements of the display system of the in-vehicle device 103.
As shown in (a) of FIG. 4, the screen display area of the in-vehicle device 103 may include a status display area at the top, as well as a navigation menu area 401 and a content area 402 shown by dashed boxes. The status display area displays the current time and date, a Bluetooth icon, a Wi-Fi icon, and so on. The navigation menu area 401 may include icons such as home, navigation, phone, and music, where each icon corresponds to at least one application actually running on the mobile phone 102, and the user can tap any icon to enter the corresponding interface of that application. The content area 402 displays the content provided to the in-vehicle device 103 by different applications.
For example, Huawei Music is installed on the mobile phone 102 and runs in the background. Huawei Music sends content such as the playlists or song lists displayed during its running to the in-vehicle device 103. The screen display system of the in-vehicle device 103 fills the content provided by Huawei Music into the content area of the display screen, for example, song 1 to song 5, song list 1 to song list 5, daily recommendations, playlists, rankings, radio, and search as shown in (a) of FIG. 4. This display process is not described again in subsequent embodiments.
It should be understood that the interface of the in-vehicle device 103 may also display other menus or application content, which is not limited in the embodiments of the present application.
Exemplarily, (a) of FIG. 4 shows the interface after the user taps the music application in the navigation menu area 401, where the icon of the music application in the navigation menu area 401 is highlighted in gray. The content area 402 displays the song names or song lists provided by Huawei Music. Assuming that song 1 is currently in the playing state, a play button 20 is displayed on the icon of song 1, while song 2, song 3, song 4, and song 5 are in the paused state and display a pause button 30.
此外,在车载设备103的界面上,还可以包括语音球图标10,如图4中的(a)图所示,用户点击该语音球图标10可以开启车载设备103的语音监听功能,响应于用户的点击操作,车载设备103可以显示如图4中的(b)图所示的界面403。该界面403上可以显示虚线框示出的唤醒窗口403-1,该唤醒窗口403-1包括语音识别图标40。In addition, on the interface of the in-vehicle device 103, a voice ball icon 10 may also be included, as shown in (a) of FIG. The in-vehicle device 103 can display the interface 403 as shown in (b) of FIG. 4 . The wake-up window 403 - 1 shown by the dotted box may be displayed on the interface 403 , and the wake-up window 403 - 1 includes the voice recognition icon 40 .
这里需要说明的是,唤醒窗口403-1可以不以窗口的形式体现,仅仅包括语音识别图标40,或者,包括语音识别图标40和向用户推荐的语音指令,悬浮显示在车载设备103的显示屏上。本申请实施例为了便于描述。将包括语音识别图标40的区域称为“唤醒窗口”,该称呼不应对本申请实施例的方案造成限定,后续不再赘述。It should be noted here that the wake-up window 403 - 1 may not be embodied in the form of a window, but only includes the voice recognition icon 40 , or includes the voice recognition icon 40 and the voice command recommended to the user, and is displayed in a floating manner on the display screen of the in-vehicle device 103 superior. The embodiments of the present application are for convenience of description. The area including the voice recognition icon 40 is referred to as a "wake-up window", which should not limit the solution of the embodiment of the present application, and will not be described in detail later.
可选地,该语音识别图标40可以为动态显示,用于表示车载设备103处于监听并获取用户语音指令的状态。此外,该唤醒窗口403-1还可以包括一些向用户推荐的语音指令,例如“停止播放”和“继续播放”等语音指令。应理解,该推荐的语音指令也可以接受用户的点击操作,并执行响应的指令所对应的用途,本申请实施例对此不再赘述。Optionally, the voice recognition icon 40 may be displayed dynamically, which is used to indicate that the in-vehicle device 103 is in a state of monitoring and acquiring the user's voice instruction. In addition, the wake-up window 403-1 may also include some voice commands recommended to the user, such as voice commands such as "stop playing" and "continue playing". It should be understood that the recommended voice command may also accept a user's click operation, and execute the purpose corresponding to the response command, which will not be repeated in this embodiment of the present application.
如图4中的(b)图所示,如果用户输入“停止播放歌曲1”的语音指令,车载设备103获取到用户的指令之后,可以将该语音指令发送给手机102,手机102识别用户的语音指令,响应于该语音指令,后台执行对歌曲1的播放按钮20的点击操作,该歌曲1上的播放按钮20变化为暂停按钮30。手机102可以将对歌曲1的播放按钮20的点击操作的显示界面传递回车载设备103,进而车载设备103可以显示如图4中的(c)图所示的界面404,即歌曲1上变化为暂停按钮30。As shown in (b) in FIG. 4 , if the user inputs a voice command of “stop playing song 1”, after the in-vehicle device 103 obtains the user’s command, the voice command can be sent to the mobile phone 102, and the mobile phone 102 recognizes the user’s A voice command, in response to the voice command, a click operation on the play button 20 of the song 1 is performed in the background, and the play button 20 on the song 1 changes to a pause button 30 . The mobile phone 102 can transfer the display interface of the click operation on the play button 20 of the song 1 back to the in-vehicle device 103, and then the in-vehicle device 103 can display the interface 404 shown in (c) in FIG. Pause button 30.
上述实现过程,可以理解为用户指令可以对车载设备103的显示屏上的任意一个控件执行点击操作,并进一步在车载设备103的显示屏上显示执行点击操作之后的界面。The above implementation process can be understood as a user instruction can perform a click operation on any control on the display screen of the in-vehicle device 103 , and further display the interface after the click operation is performed on the display screen of the in-vehicle device 103 .
In a possible implementation, the user may also enable the voice monitoring function by pressing a vehicle-control voice button of the car. For example, the user presses the vehicle-control voice button on the steering wheel to enable the function of the in-vehicle device 103 of listening for and acquiring the user's voice commands, and the wake-up window 403-1 indicated by the dashed box is displayed.

It should be understood that, in the embodiments of this application, the user's operation of tapping the voice ball icon 10 may trigger the in-vehicle device 103 to enable the voice monitoring function; alternatively, it may enable the voice interaction function between the in-vehicle device 103 and the user, that is, place the in-vehicle device 103 in a state of continuously listening for voice commands; or the user may place the in-vehicle device 103 in the continuous listening state through another shortcut operation. This is not limited in the embodiments of this application.

Optionally, when the in-vehicle device 103 remains in the state of listening for voice commands, the wake-up window may be displayed on the display screen of the in-vehicle device 103 all the time; alternatively, the wake-up window may disappear briefly and float on the display screen of the in-vehicle device 103 again when it is detected that the user starts to issue a command. When no voice command from the user is detected within a preset time (for example, 2 minutes), the in-vehicle device 103 may automatically exit the monitoring function. This is not limited in the embodiments of this application.

It should also be understood that, in the embodiments of this application, keys, buttons, switches, menus, options, pictures, lists, text, and the like that are visible on an interface and on which the user can perform a tap operation are collectively referred to as "controls", which is not explained again in subsequent embodiments.
Optionally, the voice commands recommended to the user in the wake-up window 403-1 may be command content associated with the controls on the currently displayed interface 403 on which the user can perform a tap operation.

In a possible implementation process, the content sensor of the mobile phone 102 may obtain the current state of each control and provide recommended commands to the user according to those current states.

Exemplarily, as shown in (b) of FIG. 4, the interface 403 includes the play button 20, so the recommended commands in the wake-up window 403-1 may include "Stop playing"; that is, "Stop playing" may be understood as the state reached after the user taps the play button 20. Similarly, Song 2 on the interface 403 includes a pause button 30, so the recommended commands in the wake-up window 403-1 may include "Play Song 2"; that is, "Play Song 2" may be understood as the state reached after the user taps the pause button 30 of Song 2. Alternatively, as shown in (c) of FIG. 4, Song 1 to Song 5 all display the pause button 30, that is, music playback is paused. The mobile phone 102 learns that the current interface does not include the play button 20, so if the user wakes up the voice ball, the wake-up window 403-1 may display a "Start playing" command but not a "Stop playing" command. This is not explained again below.
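By way of illustration, the following is a minimal sketch of such state-based recommendation generation. The Control type, its type strings, and the command strings are hypothetical stand-ins for whatever structures the content sensor actually returns; they are not defined by this application.

```java
// Sketch: derive recommended voice commands from the controls visible on the
// current interface. A recommendation names the state a control would reach
// if it were tapped (a visible play button implies "Stop playing", etc.).
import java.util.ArrayList;
import java.util.List;

final class Control {
    final String type;   // hypothetical, e.g. "play_button" or "pause_button"
    final String label;  // hypothetical, e.g. "Song 2"
    Control(String type, String label) { this.type = type; this.label = label; }
}

final class CommandRecommender {
    static List<String> recommend(List<Control> visibleControls) {
        List<String> commands = new ArrayList<>();
        for (Control c : visibleControls) {
            if ("play_button".equals(c.type)) {         // something is playing
                commands.add("Stop playing");
            } else if ("pause_button".equals(c.type)) { // this item is paused
                commands.add("Play " + c.label);
            }
        }
        return commands;
    }
}
```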
Alternatively, the voice commands recommended to the user in the wake-up window 403-1 may be fixed recommended commands of a particular application. For example, for the music application interface shown in (a) of FIG. 4, the voice commands recommended to the user in the wake-up window 403-1 may be fixed as "Stop playing", "Start playing", and the like. This is not limited in the embodiments of this application.

In a possible manner, the controls on an interface on which the user can perform a tap operation may be divided into the following categories:
1. Text controls

A text control contains text information that can be recognized. Examples include "Daily Recommendations", "Playlists", "Charts", "Radio", "Song X", and "Song List 1" shown in (a) of FIG. 4.

In a possible implementation, the content sensor at the application framework layer of the mobile phone 102 may directly recognize the text information contained in a text control. It should be understood that the music application actually runs in the background on the mobile phone 102, so the mobile phone 102 can obtain in the background the text information of the text controls that are projected onto the display screen of the in-vehicle device 103.

Optionally, in the embodiments of this application, the voice commands recommended to the user in the wake-up window 403-1 may be related to the text controls obtained above, for example, "Play Song 2". The examples are not enumerated one by one here.
2. Web controls

Common web controls may include a text input box (TextBox), a drop-down box (DropList), a date/time picker (Date/TimePicker), and the like. Exemplarily, in (a) of FIG. 4, the "Search" control can be classified as a web control.

In a possible implementation, the content sensor of the mobile phone 102 may recognize the web controls on the interface. In the embodiments of this application, the voice commands recommended to the user in the wake-up window 403-1 may be related to the web controls obtained above, for example, recommended commands such as "Search for songs".
3. Picture controls

A picture control is displayed as a picture on the interface, and each picture corresponds to different descriptors. Examples include the artist picture or album picture above Song 1 shown in (a) of FIG. 4, and the "Nostalgic Classics" picture displayed above Song List 1 to identify that list.

In a possible implementation, the content sensor of the mobile phone 102 may obtain the descriptors of each picture, generalize the meaning of the picture, and provide recommended commands to the user. In the embodiments of this application, if the mobile phone 102 obtains the descriptor "Song 1 by Zhang XX" for a picture, the voice commands recommended to the user in the wake-up window 403-1 may include recommended commands such as "Play Song 1 by Zhang XX".
4. List controls

As shown in (a) of FIG. 4, Song List 1 may include multiple songs, so "Song List 1" can be classified as a list control. When the user taps Song List 1, the next-level interface that is entered may present the multiple songs included in Song List 1 to the user, without starting to play the music in Song List 1.
5. Switch controls

A switch control may be understood as a control with a switch function on the interface. Exemplarily, in (a) of FIG. 4, the play button 20 and the pause button 30 can be classified as switch controls.
Five possible control types have been described above. The mobile phone 102 may obtain the controls on the currently displayed interface 403, and further determine, according to information such as the types and descriptors of the obtained controls, the commands recommended to the user that are displayed in the wake-up window 403-1.

It should be understood that, for different applications and different interfaces, the embodiments of this application may involve more controls than the five control types listed above, which are not enumerated one by one here. In addition, some controls may be classified into multiple control types at the same time, which is not limited in the embodiments of this application.
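As a rough illustration of this classification, the sketch below maps view metadata to the five categories above. The class-name heuristics (EditText, RecyclerView, ImageView, and so on) are assumptions based on common Android view classes; the actual classification logic of the content sensor is not specified here.

```java
// Sketch: classify an extracted control into one of the five categories.
// The heuristics are illustrative only; a control may in practice belong
// to several categories at once.
enum ControlType { TEXT, WEB, PICTURE, LIST, SWITCH }

final class ControlClassifier {
    static ControlType classify(String className, boolean hasText, boolean togglesState) {
        if (togglesState) return ControlType.SWITCH;         // e.g. play/pause buttons
        if (className.contains("EditText")
                || className.contains("Picker")
                || className.contains("Spinner")) {
            return ControlType.WEB;                          // input boxes, pickers
        }
        if (className.contains("RecyclerView")
                || className.contains("ListView")) {
            return ControlType.LIST;                         // song lists and the like
        }
        if (className.contains("ImageView")) return ControlType.PICTURE; // cover art, icons
        if (hasText) return ControlType.TEXT;
        return ControlType.PICTURE;                          // fallback for unlabeled imagery
    }
}
```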
For the control types listed above, Table 1 lists the controls on the pages of several common audio applications. As shown in Table 1 below, for audio applications commonly used by users, such as NetEase Cloud Music, Kugou Music, Huawei Music, Himalaya, BabyBus Story, and Xiaobanlong Children's Songs, different pages may include different controls, and the number and types of controls included in the first-level page and second-level page of each application also differ.

Exemplarily, taking the NetEase Cloud Music application as an example, the first-level page may be understood as the main interface of NetEase Cloud Music entered after the user taps the NetEase Cloud Music application icon, including page content such as "Daily Recommendations", "My Favorite Music", "Local Music", and "Personal FM". A second-level interface is the next-level page entered when the user taps any menu or control on the main interface of NetEase Cloud Music, for example, a playlist page or a playback page. In the embodiments of this application, the page content on every page can be obtained by the mobile phone, and the text information included in every control can be recognized, which is not described again here.
Table 1

[Table 1 is provided as images Figure PCTCN2021113542-appb-000006 and Figure PCTCN2021113542-appb-000007 in the original document.]
FIG. 5 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device according to an embodiment of this application.

Exemplarily, (a) of FIG. 5 shows that the screen display system of the in-vehicle device 103 displays a currently output interface 501. In the content area of the interface 501, Song 1, Song 2, Song 3, Song 4, and Song 5 are all paused, and each displays a pause button 30.

As shown in (a) of FIG. 5, on the display interface of the in-vehicle device 103, the user taps the voice ball icon 10 to enable the voice monitoring function of the in-vehicle device 103, and the display screen of the in-vehicle device 103 displays a wake-up window 502-1 indicated by the dashed box shown in (b) of FIG. 5. The wake-up window 502-1 may include the voice recognition icon 40 and recommended voice commands. For example, as shown in (b) of FIG. 5, the recommended voice commands may be "Start playing", "Next page", and the like.
If the user inputs a voice command "Play Song List 1", after acquiring the command, the in-vehicle device 103 may display, in response to the voice command input by the user, an interface 503 shown in (c) of FIG. 5, which is the interface after a tap operation on Song List 1 is performed. Exemplarily, as shown in (c) of FIG. 5, the interface 503 may include the following controls: Back, "Song List 1 - Classic Nostalgia", "Play all", and the names of multiple songs included in Song List 1, such as Song 6.

In the embodiment shown in FIG. 5, it is assumed that the user's operation of tapping the voice ball icon 10 has placed the in-vehicle device 103 in a state of continuously listening for the user's commands. Then, after the user taps the voice ball icon 10 as shown in (a) of FIG. 5, the wake-up window remains floating on the display screen of the in-vehicle device 103. The interface 503 shown in (c) of FIG. 5 includes a wake-up window 503-1. Optionally, the commands recommended to the user in the wake-up window 503-1 may change according to the controls included on the current interface 503, for example, to display voice commands such as "Play all" and "Next page".

The user inputs a voice command "Play all". After acquiring the command, the in-vehicle device 103 may display, in response to the voice command input by the user, an interface 504, which is the interface after a tap operation on the "Play all" control is performed. Exemplarily, as shown in (d) of FIG. 5, on the interface 504, the "Play all" control is displayed in the playing state, and playback starts from the first song arranged in Song List 1 (Song 6). A sound icon 50 is displayed at the position of the first song to indicate that the sound source is Song 6, that is, Song 6 is the song currently being played. This is not limited in the embodiments of this application.
With reference to the different implementation processes of FIG. 4 and FIG. 5 above, in the human-computer interaction method provided in the embodiments of this application, by obtaining the controls displayed on the interface that are visible and on which the user can perform tap operations, the user can input voice commands to perform operations such as tapping any control on the interface. The user can control, through voice commands, all applications and all visible content displayed on the display screen. In particular, in a driving scenario, the user's manual operations are reduced, so that user distraction is avoided and safety in the driving scenario is improved.

In addition, for voice interaction scenarios, different applications do not need to develop voice assistants separately. In other words, the user can control multiple different applications through the same voice interaction manner; the voice assistant is no longer separated from the applications, which enriches the application ecosystem.
FIG. 6 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device according to an embodiment of this application.

Exemplarily, (a) of FIG. 6 shows that the screen display system of the in-vehicle device 103 displays a currently output interface 601. In the content area of the interface 601, Song 1, Song 2, Song 3, Song 4, and Song 5 are all paused, and each displays a pause button 30.

As shown in (a) of FIG. 6, on the display interface 601 of the in-vehicle device 103, the user taps the voice ball icon 10 to enable the voice monitoring function of the in-vehicle device 103, and the display screen of the in-vehicle device 103 displays a wake-up window 602-1 shown in (b) of FIG. 6. The wake-up window 602-1 may include the voice recognition icon 40 and recommended voice commands such as "Start playing" and "Next page".
In a possible implementation, when the user enables the voice monitoring function of the in-vehicle device 103, the mobile phone 102 may obtain the content of the interface, determine whether the interface includes icons (which may also be referred to as pictures), and add numeric badges to the icons in a certain order.

Optionally, the user's operation of enabling the voice monitoring function of the in-vehicle device 103 may trigger the addition of numeric badges to the icons on the interface; alternatively, before the user enables the voice monitoring function of the in-vehicle device 103, the addition of numeric badges to the icons on the interface may be triggered through another preset operation. This is not limited in the embodiments of this application.

Optionally, the icons to which numeric badges are added in the embodiments of this application may also include the application icons of different applications, for example, the Home icon, Navigation icon, Phone icon, and Music icon in the navigation menu area of the interface 601 shown in (a) of FIG. 6.

Alternatively, the icons to which numeric badges are added in the embodiments of this application may also include the pictures displayed on the interface 601, for example, the artist picture of Song 1 and the picture of Song List 1 in the content area of the interface 601 shown in (a) of FIG. 6; that is, numeric badges are marked on the pictures included on the interface.
Exemplarily, when a song or song list in the content area of the interface 601 is displayed in a foreign language, the user may be unable to accurately issue a voice command containing the song name. By adding numeric badges to the pictures of different songs or to the pictures of song lists, the user can perform operations through voice commands containing the numbers, which is convenient and quick and improves user experience.

Alternatively, the icons to which numeric badges are added in the embodiments of this application may also include controls such as buttons displayed on the interface 601. This is not limited in the embodiments of this application.

Exemplarily, taking application icons as an example, when a numeric badge is added to an application icon, the display size of the numeric badge may be adapted to the size at which the application icon is displayed on the interface of the in-vehicle device 103. For example, if an application icon on the interface is small, adding a numeric badge may make the badge too small for the user to perceive it accurately. Therefore, when an application icon is small, for example, when the pixels it occupies on the display screen of the in-vehicle device are less than or equal to a preset number of pixels, that application icon may be left unmarked, and only application icons larger than the preset number of pixels are marked. This is not limited in the embodiments of this application.
Exemplarily, when the mobile phone 102 obtains the interface 601 shown in (a) of FIG. 6, it determines that the interface 601 includes icons of different songs and icons of different song lists. The mobile phone 102 may add, to each icon, a numeric badge 60 shown in (b) of FIG. 6 in the order in which the icons are arranged on the interface from left to right and from top to bottom, for example, adding numeric badge 1 to Song 1, numeric badge 2 to Song 2, and so on, so that numeric badges 60 are added to all the icons on the music interface.

Specifically, in the foregoing process, the content sensor of the mobile phone 102 may obtain the content on the interface, and the HiCar application 15 installed on the mobile phone 102 obtains the interface content from the content sensor. The HiCar application 15 may determine, according to the interface content, whether the current interface has icons. When the interface includes icons, numeric badges 60 are added to the icons one by one in a certain order. This is not limited in the embodiments of this application.
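The following sketch illustrates one way the left-to-right, top-to-bottom badge assignment could be implemented, including the preset-size threshold discussed above. The Icon type, its fields, and the 48-pixel threshold are illustrative assumptions, not values defined by this application.

```java
// Sketch: assign numeric badges to icons in reading order (top-to-bottom,
// then left-to-right), skipping icons too small to carry a legible badge.
// Assumes icons in the same visual row share the same y coordinate.
import java.util.Comparator;
import java.util.List;

final class Icon {
    int x, y;        // top-left screen position, in pixels
    int widthPx;     // rendered width, in pixels
    int badge = -1;  // -1 means "no badge assigned"
}

final class BadgeAssigner {
    static final int MIN_WIDTH_PX = 48; // preset threshold (assumed value)

    static void assignBadges(List<Icon> icons) {
        icons.sort(Comparator.comparingInt((Icon i) -> i.y)
                             .thenComparingInt(i -> i.x));
        int next = 1;
        for (Icon icon : icons) {
            if (icon.widthPx >= MIN_WIDTH_PX) { // only badge icons large enough
                icon.badge = next++;
            }
        }
    }
}
```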
After the picture of each song or the picture of each song list on the interface of the in-vehicle device 103 has been marked with a numeric badge, the user can input a voice command that includes the number of a badge, and through that voice command perform a tap operation on the picture bearing that numeric badge. Exemplarily, as shown in (b) of FIG. 6, after the pictures included on the interface of the in-vehicle device 103 are marked with numeric badges, the user can input a voice command containing the corresponding number, such as "1" or "Play 1". In response to the voice command input by the user, the mobile phone 102 may perform, in the background, a tap operation on Song 1 marked with the number 1, and display an interface 603 shown in (c) of FIG. 6, in which the pause button 30 of Song 1 changes to the play button 20 and the in-vehicle device 103 starts to play Song 1.

Specifically, with reference to the software architecture and functional modules of FIG. 2, in this implementation process, the user speaks the corresponding number, for example the number 1, through a voice command. Smart Voice 13 at the application layer recognizes the user's voice command and converts it into the text "1". At the same time, the content sensor at the application framework layer extracts the content of the current interface, and the "visible and speakable" module 11 analyzes the control content and obtains the text information of the controls. The recognized control information is then matched against the text "1" returned by Smart Voice. After the matching succeeds, a tap operation is performed on the icon of Song 1, and the tap event on the icon of Song 1 is passed to the music application's own service logic, implementing the corresponding service-logic jump. At the same time, the HiCar application 15 ends this round of voice recognition and exits the voice recognition function of Smart Voice; the voice ball icon 10 returns to the static state shown in (c) of FIG. 6, and the wake-up window 602-1 and the recommended voice commands disappear.
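The dispatch path described above runs through the content sensor and the application's own service logic, which are internal components not detailed here. As a public-API approximation only, an Android accessibility service could locate and tap a control whose text matches the recognized command as follows.

```java
// Sketch: tap the first clickable on-screen node whose text or content
// description matches the recognized voice-command text. This is an
// approximation of the patent's dispatch path using public Android APIs.
import android.accessibilityservice.AccessibilityService;
import android.view.accessibility.AccessibilityEvent;
import android.view.accessibility.AccessibilityNodeInfo;

public class VoiceClickService extends AccessibilityService {
    boolean clickNodeMatching(String recognizedText) {
        AccessibilityNodeInfo root = getRootInActiveWindow();
        if (root == null) return false;
        // findAccessibilityNodeInfosByText matches text and content descriptions.
        for (AccessibilityNodeInfo node :
                root.findAccessibilityNodeInfosByText(recognizedText)) {
            if (node.isClickable()) {
                return node.performAction(AccessibilityNodeInfo.ACTION_CLICK);
            }
        }
        return false;
    }

    @Override public void onAccessibilityEvent(AccessibilityEvent event) {}
    @Override public void onInterrupt() {}
}
```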
In another possible implementation, in the process of adding numeric badges, numeric badges may be added, according to certain principles, to only some of the one or more controls on the current interface. The controls to which numeric badges can be added may include: all controls of the picture type among the one or more controls recognized on the current interface; or controls with a grid-type arrangement among the one or more controls recognized on the current interface; or controls with a list-type arrangement among the one or more controls recognized on the current interface; or controls whose display size is greater than or equal to a preset value among the one or more controls recognized on the current interface.

In yet another possible implementation, in the process of adding numeric badges, the numeric badges may be added to some of the one or more controls in a preset order. Optionally, the preset order includes a left-to-right order and/or a top-to-bottom order.
In yet another possible implementation, in the process of obtaining the controls included on the interface, if the controls include an icon control, the outline of the icon control may be obtained, and an outline keyword describing the icon control is determined according to the outline; a degree of matching between one or more keywords included in the voice command and the outline keywords of the icon controls is determined, and the icon control with the greatest degree of matching is determined as the target control.

Exemplarily, on a music playback interface, when the user inputs the voice command "I like this", in the process of matching the voice command against the controls included on the interface, suppose the descriptors of the favorite button on the music playback interface are "like" and "favorite", and the outline of the favorite button is a heart shape. The heart shape can then be matched to "I like this". This manner can generalize the user's voice commands and match user commands to the controls on the interface more intelligently.

In a possible implementation, in the process of matching the recognized control information against the voice command text recognized by Smart Voice, strong matching is attempted first, that is, the control information and the recognized voice command text must correspond exactly. If strong matching fails, weak matching is performed, that is, it is determined whether the control information contains the recognized voice command text; as long as the control information contains part of the recognized voice command text, the match is deemed successful, and a tap operation is performed on the control corresponding to that control information.
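A minimal sketch of this strong-then-weak matching order might look as follows; the label list stands in for the control information extracted by the content sensor.

```java
// Sketch: match recognized voice-command text against control labels.
// Strong match (exact equality) is tried first; only if no label matches
// exactly does the weaker substring-containment match run.
import java.util.List;

final class ControlMatcher {
    /** Returns the index of the matched control label, or -1 if none matches. */
    static int match(List<String> controlLabels, String recognizedText) {
        // Strong match: label and recognized text correspond one-to-one.
        for (int i = 0; i < controlLabels.size(); i++) {
            if (controlLabels.get(i).equals(recognizedText)) return i;
        }
        // Weak match: the label merely contains the recognized text.
        for (int i = 0; i < controlLabels.size(); i++) {
            if (controlLabels.get(i).contains(recognizedText)) return i;
        }
        return -1;
    }
}
```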
Through the foregoing method, in the embodiments of this application, numeric badges are added to clickable controls such as the pictures and application icons displayed on the interface, and the user can issue a voice command that includes a number to perform a tap operation on the control marked with that numeric badge. After seeing the numeric badges on the interface, the user issues a voice command including a number; the voice command is converted through voice recognition, so that the clickable control corresponding to that number, such as a picture or an application icon, is determined and the tap operation is performed. In this process, the user does not need to remember a variety of complex voice commands; the voice interaction process is accomplished merely through voice commands that include numbers, which is simpler and more convenient, lowers the difficulty of voice interaction, and improves user experience.

The foregoing, with reference to FIG. 4 to FIG. 6, describes the process of user voice interaction for audio applications such as music. In addition, the embodiments of this application can also be applied to navigation applications. The following describes, with reference to FIG. 7, the process of implementing voice interaction in a navigation application.
FIG. 7 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device according to an embodiment of this application.

It should be understood that, for the in-vehicle device 103, the navigation menu area 401 of the screen display system displays navigation menus such as Home, Navigation, Phone, and Music, and switching between different navigation menus can also be controlled through the user's voice commands.

Exemplarily, the process in which the screen display interface of the in-vehicle device 103 jumps from the music interface shown in (c) of FIG. 6 to the navigation interface can also be implemented through voice commands. Specifically, as shown in (a) of FIG. 7, the screen display system of the in-vehicle device 103 displays a currently output interface 701. In the content area of the interface 701, Song 1 is displayed in the playing state, while Song 2, Song 3, Song 4, and Song 5 are all paused and each displays a pause button 30.

As shown in (a) of FIG. 7, on the display interface 701 of the in-vehicle device 103, the user taps the voice ball icon 10 to enable the voice monitoring function of the in-vehicle device 103, and the display screen of the in-vehicle device 103 displays a wake-up window 702-1 shown in (b) of FIG. 7. The wake-up window 702-1 may include the voice recognition icon 40 and recommended voice commands such as "Start searching" and "Next page".

It should be understood that the voice commands recommended in the wake-up window 702-1 may differ from the recommended voice commands displayed in the wake-up window 403-1 shown in (b) of FIG. 4 and in the wake-up windows 502-1 and 503-1 shown in (b) and (c) of FIG. 5. The recommended voice commands displayed in a wake-up window may change correspondingly with the content displayed on the current interface, displaying voice commands related to the content displayed on the current interface; alternatively, voice commands unrelated to the content displayed on the current interface may also be displayed. This is not limited in the embodiments of this application.
Exemplarily, as shown in (b) of FIG. 7, when the user inputs a voice command "Enter dialogue mode, open navigation", after acquiring the command, the in-vehicle device 103 may send the voice command to the mobile phone 102. The mobile phone 102 recognizes the user's voice command and, in response to it, enables the voice interaction function between the in-vehicle device 103 and the user, that is, the in-vehicle device 103 remains in a state of listening for voice commands, so that the user does not need to repeatedly activate the in-vehicle device 103 to listen for and acquire voice commands.

In addition, in response to the voice command, as shown in (c) of FIG. 7, the display interface of the display screen of the in-vehicle device 103 may jump from the music menu to an interface 703 of the navigation menu. The interface 703 of the navigation menu may provide the user with multiple types of search options shown in the right-hand area, including "Food", "Gas station", and "Shopping mall", as well as a search area in the middle. The interface content of the interface 703 of the navigation menu is not described in detail here.

In a possible implementation, after the in-vehicle device 103 has acquired and executed one user command, the wake-up window may disappear briefly while the user's voice commands are monitored in the background. When it is detected again that the user issues a voice command, the wake-up window may float on the display screen again.

Exemplarily, as shown in (d) of FIG. 7, the user starts to issue a voice command, and a wake-up window 704-1 appears. Optionally, the recommended commands displayed in the wake-up window 704-1 may be adapted to the current interface content, or may be associated with the historical data searched most frequently when the user uses the navigation application. For example, the wake-up window 704-1 may include the voice recognition icon 40 and recommended voice commands such as "Navigate to the office" and "Navigate to the mall". This is not limited in the embodiments of this application.
When the user inputs a voice command "Search for food", after acquiring the command, the in-vehicle device 103 may send the voice command to the mobile phone 102. The mobile phone 102 recognizes the user's voice command, simulates, in response to it, a tap on the "Food" option on the interface 704 shown in (d) of FIG. 7, and displays a search result interface 705 shown in (e) of FIG. 7 for the user. Exemplarily, the interface 705 displays multiple restaurants found by the search; the restaurants may be sorted by distance from the user's current location, and content such as each restaurant's average price per person and distance is displayed for the user. This is not limited in the embodiments of this application.

Optionally, as shown in (e) of FIG. 7, the recommended commands displayed in a wake-up window 705-1 on the interface 705 may be re-adapted to the current interface content. For example, the wake-up window 705-1 may include the voice recognition icon 40 and recommended voice commands such as "Start searching" and "Next page". This is not limited in the embodiments of this application.

When the user inputs a voice command "Next page", after acquiring the command, the in-vehicle device 103 displays, in response to the voice command, a search result interface 706 shown in (f) of FIG. 7 for the user. It should be understood that the interface 706 is the interface displayed after the interface 705 is slid as indicated by the black arrow. After the user selects the target restaurant "5. XX Light Food Restaurant", the user can further input a voice command "Navigate to 5". After acquiring the command, the in-vehicle device 103 displays, in response to the voice command, a navigation route interface 707 shown in (g) of FIG. 7 for the user, where the interface 707 includes the route and distance to 5. XX Light Food Restaurant, and the like. This is not limited in the embodiments of this application.

In a possible implementation, as shown in (f) of FIG. 7, the user's voice command includes "5". After acquiring the user's command, the in-vehicle device 103 may send the voice command to the mobile phone 102, and the mobile phone 102 may perform interface matching according to the command, that is, extract the keywords in the command and match them against the keywords or description information contained in all the controls on the current interface. Exemplarily, if the keywords of the user's command are "navigate" and "5", and the mobile phone detects that the keywords of the option "5. XX Light Food Restaurant" on the interface are "5", "light food restaurant", and the like, the user's command matches that option to the greatest degree; therefore, a tap on the "5. XX Light Food Restaurant" option on the interface 706 is performed, and the interface 707 shown in (g) of FIG. 7 is displayed.
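The following sketch illustrates one plausible form of this degree-of-matching computation: counting how many command keywords appear in each control's description and picking the control with the highest count. The keyword extraction itself (yielding, for example, "navigate" and "5") is assumed to happen upstream.

```java
// Sketch: pick the on-screen control whose description best overlaps the
// keywords extracted from the user's voice command.
import java.util.List;

final class MatchScorer {
    static int score(List<String> commandKeywords, String controlDescription) {
        int hits = 0;
        for (String kw : commandKeywords) {
            if (controlDescription.contains(kw)) hits++; // keyword found in description
        }
        return hits;
    }

    /** Returns the best-matching description, or null when nothing overlaps. */
    static String bestMatch(List<String> commandKeywords, List<String> descriptions) {
        String best = null;
        int bestScore = 0;
        for (String d : descriptions) {
            int s = score(commandKeywords, d);
            if (s > bestScore) { bestScore = s; best = d; }
        }
        return best;
    }
}
```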
In the foregoing method, the text controls, picture controls, buttons, icon controls, and the like that are displayed on the interface, visible, and tappable by the user are obtained; then, according to the acquired user voice command, a target control is matched on the interface, and operations such as tapping the target control on the interface are performed.

With reference to the control types listed above, Table 2 shows the controls on the pages of several common navigation applications. As shown in Table 2 below, for navigation applications commonly used by users, such as Baidu Maps and AutoNavi Maps, different pages may include different controls, and the number and types of controls included in the first-level page and second-level page of each application also differ.

Exemplarily, taking the Baidu Maps application as an example, the first-level page may be understood as the main interface of Baidu Maps entered after the user taps the Baidu Maps application icon, including controls such as "Zoom in", "Zoom out", "Locate", "Traffic", "Search", "More", and "Exit". A second-level interface is the next-level page entered when the user taps any menu or control on the main interface of Baidu Maps, for example, a route preference settings page. In the embodiments of this application, the page content and controls on every page can be obtained by the mobile phone, and the text information included in every control can be recognized, which is not described again here.
Table 2

[Table 2 is provided as image Figure PCTCN2021113542-appb-000008 in the original document.]
The foregoing describes the voice interaction process provided in the embodiments of this application with reference to audio applications and navigation applications. It should be understood that the embodiments of this application can recognize the page content, controls, and the like of the different pages of the different applications listed above; in addition, generic-command controls can also be supported.

It should be understood that the generic-command controls may include controls on the interface such as back, turn left/turn right, turn up/turn down, and previous page/next page.

Exemplarily, after the user enables the voice recognition function, when Smart Voice recognizes one of the foregoing generic command texts, it sends the text to the "visible and speakable" module. For a back command, a click event (key event) of the back key is sent, through the system's inject key event method, to the application to which the current interface belongs; by listening for back-key events, the application to which the current interface belongs receives the corresponding back event and performs the back-navigation processing.
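A minimal sketch of dispatching these generic commands on standard Android APIs is shown below: the back command is injected as a key event, and list scrolling (detailed in the next paragraph) calls the list control's own scrollBy method with a signed distance. Instrumentation-based injection into another application would in practice require system-level privileges, which a platform service is assumed to have.

```java
// Sketch: dispatch the generic "back" and scrolling commands.
import android.app.Instrumentation;
import android.view.KeyEvent;
import androidx.recyclerview.widget.RecyclerView;

final class GenericCommandDispatcher {
    static void dispatchBack() {
        // sendKeyDownUpSync must not run on the main thread; injection into
        // other applications additionally requires system-level privileges.
        new Thread(() -> new Instrumentation()
                .sendKeyDownUpSync(KeyEvent.KEYCODE_BACK)).start();
    }

    static void scrollList(RecyclerView list,
                           boolean horizontal, boolean forward, int distancePx) {
        int signed = forward ? distancePx : -distancePx; // sign selects direction
        if (horizontal) list.scrollBy(signed, 0);        // left/right paging
        else            list.scrollBy(0, signed);        // up/down paging
    }
}
```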
For controls such as turn left or turn right, turn up or turn down, and previous page or next page, the corresponding slidable list control is identified from the interface controls returned by the content sensor. After the required sliding direction is analyzed, the control's own sliding method is called, for example, the scrollBy sliding method of RecyclerView, to implement up-and-down sliding. Left-and-right sliding is implemented according to whether the control itself supports it: when the control supports left-and-right sliding, the horizontal movement distance is passed into the called scrollBy sliding method, and whether to slide left or right is determined by the sign of the value. When the control supports up-and-down sliding, the vertical movement distance is passed in, and whether to slide up or down is determined by the sign of the value. The implementation process of these generic-command controls is not described further here.

FIG. 8 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device according to an embodiment of this application; switching between different navigation menus can also be controlled through the user's voice commands.

Exemplarily, FIG. 8 shows the process in which the screen display interface of the in-vehicle device 103 jumps from the navigation route interface shown in (g) of FIG. 7 to the phone menu, and this process can also be implemented through the user's voice commands. Specifically, as shown in (a) of FIG. 8, the screen display system of the in-vehicle device 103 displays a currently output navigation route interface 801. The wake-up window on the interface 801 displays the voice recognition icon 40 and recommended voice commands such as "Exit navigation" and "Search".

When the user inputs a voice command "Open the phone book", after acquiring the command, the in-vehicle device 103 displays, in response to the voice command, a phone application interface 802 shown in (b) of FIG. 8 for the user. The interface 802 may include submenus such as call records, contacts, and dialing; the interface 802 currently displays content such as the user's call records, which is not described again here.
Through the foregoing embodiments, by obtaining the controls displayed on the interface that are visible and on which the user can perform tap operations, the user can input voice commands to perform operations such as tapping any control on the interface. The user can control, through voice commands, all applications and all visible content displayed on the display screen. In particular, in a driving scenario, the user's manual operations are reduced, so that user distraction is avoided and safety in the driving scenario is improved.

In addition, for voice interaction scenarios, different applications do not need to develop voice assistants separately. In other words, the user can control multiple different applications through the same voice interaction manner; the voice assistant is no longer separated from the applications, which enriches the application ecosystem.

With reference to the foregoing embodiments and the related drawings, the display process on the interface of the voice interaction method provided in the embodiments of this application has been described. Taking the scenario of the mobile phone 102 and the in-vehicle device 103 shown in the foregoing drawings as an example, the following describes, with reference to FIG. 9, the specific implementation process of the voice interaction method.
FIG. 9 is a schematic flowchart of a voice interaction method according to an embodiment of this application. As shown in FIG. 9, the method 900 may include the following steps.

901: The user opens a first application.

Optionally, the first application may be an application actually running on the mobile phone 102 side, for example, an application running in the foreground or in the background of the mobile phone 102.

It should be understood that step 901 may be performed by the user on the in-vehicle device 103 side and transferred by the in-vehicle device 103 back to the mobile phone 102, so that the first application is started in the background of the mobile phone 102; alternatively, it may be performed by the user on the mobile phone 102 side and projected directly onto the display screen of the in-vehicle device 103. This is not limited in the embodiments of this application.
902: The first application refreshes its interface.

903: Interface recognition is triggered. Optionally, the interface refresh of the first application may trigger the mobile phone 102 to perform interface recognition through an algorithm service.

904: The mobile phone 102 performs interface hot-word recognition to obtain information about the interface content. It should be understood that the delay of the interface hot-word recognition process in step 904 is less than 500 milliseconds.

Optionally, the interface content may include the part of the currently displayed interface that is visible to the user. Exemplarily, the user-visible part may include the pictures, text, menus, options, icons, buttons, and the like displayed on the interface, which are collectively referred to as "controls" in the embodiments of this application.

It should be understood that, in the embodiments of this application, when the user's voice command is matched to a target control on the interface, an operation may be performed on the target control. Optionally, the operation may include input operations such as clicking, tapping, double-clicking, sliding, and right-clicking.

It should also be understood that, in the embodiments of this application, after the user's voice command is acquired and parsed, the voice command is matched to the target control on the interface, that is, the user's intention is recognized, and the tap operation on the target control is then performed.
905: The user activates the voice recognition function.

It should be understood that the user may activate the voice recognition function on the in-vehicle device 103. In the embodiments of this application, activating the voice recognition function may mean starting the in-vehicle device 103 to begin listening for the user's voice commands; alternatively, it may also be understood as starting the in-vehicle device 103 to listen for the user's voice commands and transfer the acquired voice commands back to the mobile phone 102, where the mobile phone 102 analyzes the voice commands. This is not limited in the embodiments of this application.

Optionally, the user may activate the voice recognition function through a physical button of the in-vehicle device or through voice.

In a possible implementation, the display interface of the in-vehicle device 103 may further include a voice ball icon, as shown in (a) of FIG. 4, and the user may tap the voice ball icon to enable the voice monitoring function of the in-vehicle device 103. Optionally, in response to the user's tap operation, the in-vehicle device 103 may display the wake-up window 403-1 shown in (b) of FIG. 4, which is not described again here.

In another possible implementation, the user may also enable the voice monitoring function by pressing a vehicle-control voice button of the car. For example, the user presses the vehicle-control voice button 50 on the steering wheel shown in (b) of FIG. 1 to enable the function of the in-vehicle device 103 of listening for and acquiring the user's voice commands. This is not limited in the embodiments of this application.
906, trigger the HiCar application of the mobile phone to request the information of the interface content.
907, return the information of the interface content to the HiCar application of the mobile phone.
908, the HiCar application of the mobile phone passes the obtained information of the interface content to the smart voice service module.
In one possible implementation, in the embodiments of this application, the smart voice service module may correspond to a smart voice application installed on the mobile phone 102 side; that is, the smart voice application of the mobile phone 102 performs the service process provided in FIG. 9.
In another possible implementation, the service corresponding to the smart voice service module may be provided by a server. This scenario may correspond to (c) of FIG. 1: drawing on the voice analysis capability of the server 104, the mobile phone 102 sends the user's voice command to the server 104, and after analyzing the command, the server 104 returns the recognition result to the mobile phone 102, which is not repeated here.
909, the user inputs a voice command.
910, the mobile phone sends the voice command to the smart voice service module.
Optionally, in steps 909 and 910, the user may input the voice command on the in-vehicle device 103 side. After the microphone of the in-vehicle device 103 captures the user's voice command, the command is sent to the HiCar application of the mobile phone and then passed on to the smart voice service module, and the smart voice service module analyzes the user's voice command.
911, the smart voice service module passes the obtained user voice command and the information of the interface content to the ASR module.
912, the ASR module performs enhanced recognition of the user's voice command according to the information of the interface content.
In one possible implementation, the ASR module contains an ASR model. In the embodiments of this application, the information of the currently displayed interface content may be passed to the ASR module in synchronization with step 908; that is, the information of the interface content is fed into the ASR model as a parameter, and the user's voice command is then recognized according to the updated ASR model.
By way of example, the user's voice command may include homophones. For example, when the user says "综艺" ("variety show"), depending on the user's pronunciation, the ASR analysis of the pinyin "zong yi" or "zhong yi" may yield possible recognition results such as "中意", "中医", "忠义", and "综艺". Such homophones and words with similar pinyin may prevent the mobile phone from accurately deriving the user's operation intention from the voice command. With the embodiments of this application, the ASR module also considers the current interface content information of the in-vehicle device 103 (for example, the current interface displays a large amount of audio information, celebrity photos, and video information); when performing its analysis, it selects, from the possible recognition results "中意", "中医", "忠义", and "综艺", the candidate "综艺" that is most relevant to the currently displayed audio information, celebrity photos, and video information, and thereby determines that the voice command issued by the user is "综艺".
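The contextual biasing described above can be sketched as a rescoring step over the ASR module's candidate hypotheses. The following is a minimal illustration, assuming the ASR model exposes an n-best list of (text, acoustic score) pairs; the boost weight and the simple substring-overlap bonus are illustrative assumptions, not the patent's actual algorithm.

```python
def rescore_with_interface_hotwords(candidates, hotwords, boost=0.5):
    """Pick the ASR hypothesis most consistent with the on-screen content.

    candidates: list of (text, acoustic_score) pairs from the ASR n-best list.
    hotwords:   words harvested from the currently displayed interface.
    """
    def context_bonus(text):
        # Count how many interface hot words overlap with the hypothesis.
        return sum(1 for w in hotwords if w in text or text in w)

    # Combine acoustic evidence with interface-context evidence.
    return max(candidates, key=lambda c: c[1] + boost * context_bonus(c[0]))

# Homophone example from the text: "zong yi" yields several candidates,
# but the interface shows variety-show content, so "综艺" wins.
nbest = [("中意", 0.26), ("中医", 0.25), ("忠义", 0.24), ("综艺", 0.25)]
screen_words = ["综艺", "音乐", "视频", "明星"]
print(rescore_with_interface_hotwords(nbest, screen_words))  # ('综艺', 0.25)
```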
Through the above updated algorithm implementation, the information of the currently displayed interface content is introduced into the voice command recognition process of the existing ASR module, so that the usage scenario of the user's current voice command can be accurately inferred from that information, the application scenario targeted by the current voice command can be accurately located, and the accuracy of voice command recognition is improved.
913, the ASR module returns the analyzed voice command text to the HiCar application of the mobile phone.
914, the HiCar application of the mobile phone sends the voice command text to the algorithm service module.
915, the mobile phone matches, through an algorithm service, the voice command text against the information of the current interface content and determines the matching result.
916, return the matching result to the HiCar application of the mobile phone.
917, return a simulated click instruction to the first application of the mobile phone, and the first application executes the simulated click operation.
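A simulated click of this kind is commonly dispatched at the centre of the matched control's bounds. The sketch below is a generic illustration only, assuming a platform-supplied injection callable `inject_tap(x, y)`; it is not HiCar's actual interface.

```python
def simulate_click(control_bounds, inject_tap):
    """Dispatch a synthetic tap at the centre of the target control.

    control_bounds: (left, top, right, bottom) of the matched control, in px.
    inject_tap:     platform-supplied callable that injects a tap event
                    (an assumption for this sketch, not a real HiCar API).
    """
    left, top, right, bottom = control_bounds
    x, y = (left + right) // 2, (top + bottom) // 2
    inject_tap(x, y)
    return x, y

# Usage with a stand-in injector that just logs the event:
print(simulate_click((100, 400, 300, 480), lambda x, y: print("tap", x, y)))
```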
918, feed the interface obtained after the click operation back to the display screen of the in-vehicle device 103 for presentation to the user.
919, determine the operation result after the first application performs the operation.
920, return the operation result to the HiCar application of the mobile phone.
At the same time, the smart voice service module may perform steps 914-1 to 919-1 shown in the dashed box in FIG. 9:
914-1, the NLU module of the smart voice service may also obtain the voice command text.
915-1, the NLU module of the smart voice service performs intention recognition according to the voice command text and determines the user intention corresponding to the text.
916-1, return the user intention.
917-1, send the user intention to the DM module.
918-1, the DM module may, according to the returned user intention, perform intention processing and determine the user intention of the user's voice command.
919-1 and 921, the smart voice service module returns the user intention to the HiCar application of the mobile phone. It should be understood that steps 914-1 to 919-1 shown in the dashed box may be optional steps. This process can be understood as accurately analyzing the user's intention with the help of a powerful voice recognition capability, such as that of a server: on the mobile phone side, in response to the user's voice command and in combination with the returned user intention, the HiCar application of the mobile phone determines whether to perform the operation corresponding to that intention, which improves the accuracy of voice command recognition.
It should be understood that the above process can be understood as obtaining the user's voice command and then determining the user's intention according to that command, for example, determining which control on the current interface the voice command just input by the user is meant to click.
In one possible implementation, according to the user's voice command, the degree of matching between the voice command and each of the one or more controls is determined, and the control with the highest degree of matching is determined as the target control on which the user intends to perform the click operation.
Optionally, in the process of determining the degree of matching between the voice command and each of the one or more controls, one or more keywords contained in the user's voice command may be extracted; the degree of matching between each of the one or more keywords and the description information of each of the one or more controls is determined; and the control with the highest degree of matching is determined as the target control. A minimal sketch of this scoring is given after the notes below.
Optionally, the keywords may include the characters or words of the voice command, or the pinyin of some or all of its Chinese characters, which is not limited in the embodiments of this application.
Optionally, the description information of each control may include the outline information, text information, color information, position information, icon information, and the like of that control, which is not limited in the embodiments of this application.
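The following is a minimal sketch of this matching-degree computation, assuming a plain keyword-overlap score over each control's description keywords; the actual algorithm service may instead use pinyin matching, fuzzy scores, or learned similarity.

```python
def match_target_control(command_text, controls):
    """Return the control whose description best matches the voice command.

    command_text: recognized voice command text, e.g. "打开综艺".
    controls: list of dicts, each with an 'id' and a list of description
              'keywords' (text label, outline word, colour, position, ...).
    """
    def score(control):
        # Overlap between the command text and the control's description keywords.
        return sum(1 for kw in control["keywords"] if kw and kw in command_text)

    best = max(controls, key=score)
    return best if score(best) > 0 else None  # no match: let the dialog re-prompt

controls = [
    {"id": "btn_variety", "keywords": ["综艺", "按钮"]},
    {"id": "btn_music",   "keywords": ["音乐", "按钮"]},
]
print(match_target_control("打开综艺", controls))  # -> the btn_variety record
```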
After the user's intention, that is, the target control that the user wants to click, has been determined through the user's voice command, the following process may continue:
922, the HiCar application of the mobile phone determines whether to perform the operation corresponding to the user intention.
923, when the HiCar application of the mobile phone determines not to perform the operation corresponding to the user intention, it sends a notification message of not executing the user instruction to the smart voice service module.
924, the smart voice service module ends the current conversation according to the notification message of not executing the user instruction.
925, notify the DM module to end the current dialogue mode, that is, to stop continuously capturing the user's voice commands.
Through the above implementation process, the method obtains the controls displayed on the interface that are visible and on which the user can perform a click operation, so that the user can input voice commands to perform operations such as clicking on any control on the interface. The user can control, by voice command, all applications and all visible content shown on the display screen.
Specifically, in the process of analyzing the user's voice command, the obtained interface content information of the current interface is used as a parameter of the ASR analysis; that is, the application scenario in which the current voice command may occur is accurately inferred from the interface content information of the current interface. After the user's voice command is recognized, the recognized voice command text is matched against the controls in the currently possible application scenario, so that the user's intention is obtained more accurately, which improves the accuracy of voice recognition in voice interaction scenarios.
In particular, in a driving scenario, where the environment of an in-vehicle device is noisy and the voice command input by the user is accompanied by noise, the embodiments of this application can analyze the user's voice command in combination with the application scenario in which the command may occur, improving the accuracy of voice recognition. This reduces the user's manual operations, avoids distracting the user, and improves safety in driving scenarios.
In addition, for voice interaction scenarios, different applications do not need to develop separate voice assistants. In other words, the user can control multiple different applications through the same voice interaction method; the voice assistant and the applications are no longer separated, which enriches the application ecosystem.
It can be understood that, in order to realize the above functions, the electronic device includes corresponding hardware and/or software modules for performing each function. In combination with the algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each particular application in combination with the embodiments, but such implementations should not be considered beyond the scope of this application.
In this embodiment, the electronic device may be divided into functional modules according to the above method examples. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The above integrated module may be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is illustrative, being merely a division by logical function, and other division manners may exist in actual implementation.
When each functional module is divided corresponding to each function, the electronic device involved in the above embodiments may include a display unit, a detection unit, and a processing unit.
The display unit, the detection unit, and the processing unit cooperate with one another and may be used to support the electronic device in performing the technical processes described in the above embodiments.
It should be noted that all relevant content of the steps involved in the above method embodiments can be cited in the functional descriptions of the corresponding functional modules, which is not repeated here.
The electronic device provided in this embodiment is used to perform the above method of human-computer interaction and can therefore achieve the same effects as the above implementation method.
Where an integrated unit is employed, the electronic device may include a processing module, a storage module, and a communication module. The processing module may be used to control and manage the actions of the electronic device, for example, to support the electronic device in performing the steps performed by the display unit, the detection unit, and the processing unit. The storage module may be used to support the electronic device in storing program code, data, and the like. The communication module may be used to support communication between the electronic device and other devices.
The processing module may be a processor or a controller. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a digital signal processor (digital signal processing, DSP) and a microprocessor. The storage module may be a memory. The communication module may specifically be a device that interacts with other electronic devices, such as a radio frequency circuit, a Bluetooth chip, or a Wi-Fi chip.
In one embodiment, when the processing module is a processor and the storage module is a memory, the electronic device involved in this embodiment may be a device having the structure shown in FIG. 2.
This embodiment further provides a computer-readable storage medium that stores computer instructions; when the computer instructions run on an electronic device, the electronic device performs the above related method steps to implement the method of human-computer interaction in the above embodiments.
This embodiment further provides a computer program product; when the computer program product runs on a computer, the computer performs the above related steps to implement the method of human-computer interaction in the above embodiments.
In addition, the embodiments of this application further provide an apparatus, which may specifically be a chip, component, or module; the apparatus may include a processor and a memory that are connected, the memory storing computer-executable instructions. When the apparatus runs, the processor may execute the computer-executable instructions stored in the memory, so that the chip performs the method of human-computer interaction in the above method embodiments.
The electronic device, computer-readable storage medium, computer program product, and chip provided in this embodiment are all used to perform the corresponding methods provided above; therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding methods provided above, which are not repeated here.
From the description of the above implementations, those skilled in the art can understand that, for convenience and brevity of description, only the division of the above functional modules is used as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division of modules or units is merely a division by logical function, and other division manners may exist in actual implementation; multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces; indirect couplings or communication connections between apparatuses or units may be in electrical, mechanical, or other forms.
Units described as separate components may or may not be physically separated, and components shown as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor (processor) to perform all or some of the steps of the methods in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (read only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disc.
The above content is merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (21)

  1. A method of human-computer interaction, wherein the method is applied to an electronic device, and the method comprises:
    during the running of a human-computer interaction application in the electronic device, obtaining current interface content information;
    determining, according to the interface content information, one or more controls on the interface, wherein the one or more controls comprise one or more of buttons, icons, pictures, and text;
    obtaining a voice command of a user;
    matching, according to the voice command, a target control from the one or more controls;
    and determining, according to the voice command, a user intention, and performing, in response to the user intention, an operation on the target control.
  2. The method according to claim 1, wherein the matching, according to the voice command, a target control from the one or more controls comprises:
    determining, according to the voice command, a degree of matching between the voice command and each of the one or more controls;
    determining the control with the highest degree of matching as the target control.
  3. The method according to claim 2, wherein the determining, according to the voice command, a degree of matching between the voice command and each of the one or more controls comprises:
    extracting one or more keywords contained in the voice command;
    determining a degree of matching between each of the one or more keywords and description information of each of the one or more controls;
    determining the control with the highest degree of matching as the target control.
  4. The method according to any one of claims 1 to 3, wherein, when the one or more controls include an icon control, the method further comprises:
    obtaining an outline of the icon control, and determining, according to the outline, an outline keyword describing the icon control;
    determining a degree of matching between the one or more keywords included in the voice command and the outline keyword of the icon control;
    determining the icon control with the highest degree of matching as the target control.
  5. The method according to any one of claims 1 to 3, wherein the method further comprises:
    when the voice command is detected, adding numeric badges to some of the one or more controls;
    when it is detected that the voice command includes a first number, determining the control marked with the first number as the target control.
  6. The method according to claim 5, wherein the adding numeric badges to some of the one or more controls comprises:
    adding the numeric badges to some of the one or more controls in a preset order, wherein the preset order comprises a left-to-right and/or top-to-bottom order.
  7. The method according to claim 5 or 6, wherein the some controls to which numeric badges can be added comprise one or more of the following:
    all picture-type controls among the one or more controls; or
    controls among the one or more controls that are arranged in a grid; or
    controls among the one or more controls that are arranged in a list; or
    controls among the one or more controls whose display size is greater than or equal to a preset value.
  8. The method according to any one of claims 1 to 7, wherein the interface corresponding to the interface content information is the interface of an application running in the foreground of the electronic device, and/or the interface of an application running in the background of the electronic device.
  9. The method according to any one of claims 1 to 8, wherein the method further comprises:
    starting the human-computer interaction application on the electronic device.
  10. The method according to claim 9, wherein the starting the human-computer interaction application on the electronic device comprises:
    obtaining a preset input of the user and starting the human-computer interaction application on the electronic device, wherein the preset input comprises at least one of an operation triggering a button, a preset voice-input human-computer interaction instruction, or a preset fingerprint input.
  11. An electronic device, comprising:
    one or more processors;
    one or more memories;
    a module in which multiple applications are installed;
    wherein the memory stores one or more programs that, when executed by the processor, cause the electronic device to perform the following steps:
    during the running of a human-computer interaction application, obtaining current interface content information;
    determining, according to the interface content information, one or more controls on the interface, wherein the one or more controls comprise one or more of buttons, icons, pictures, and text;
    obtaining a voice command of a user;
    matching, according to the voice command, a target control from the one or more controls;
    and determining, according to the voice command, a user intention, and performing, in response to the user intention, an operation on the target control.
  12. The electronic device according to claim 11, wherein, when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:
    determining, according to the voice command, a degree of matching between the voice command and each of the one or more controls;
    determining the control with the highest degree of matching as the target control.
  13. The electronic device according to claim 12, wherein, when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:
    extracting one or more keywords contained in the voice command;
    determining a degree of matching between each of the one or more keywords and description information of each of the one or more controls;
    determining the control with the highest degree of matching as the target control.
  14. The electronic device according to any one of claims 11 to 13, wherein the one or more controls include an icon control, and when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:
    obtaining an outline of the icon control, and determining, according to the outline, an outline keyword describing the icon control;
    determining a degree of matching between the one or more keywords included in the voice command and the outline keyword of the icon control;
    determining the icon control with the highest degree of matching as the target control.
  15. The electronic device according to any one of claims 11 to 13, wherein, when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:
    when the voice command is detected, adding numeric badges to some of the one or more controls;
    when it is detected that the voice command includes a first number, determining the control marked with the first number as the target control.
  16. The electronic device according to claim 15, wherein, when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:
    adding the numeric badges to some of the one or more controls in a preset order, wherein the preset order comprises a left-to-right and/or top-to-bottom order.
  17. The electronic device according to claim 15 or 16, wherein the some controls to which numeric badges can be added comprise one or more of the following:
    all picture-type controls among the one or more controls; or
    controls among the one or more controls that are arranged in a grid; or
    controls among the one or more controls that are arranged in a list; or
    controls among the one or more controls whose display size is greater than or equal to a preset value.
  18. The electronic device according to any one of claims 11 to 17, wherein the interface corresponding to the interface content information is the interface of an application running in the foreground of the electronic device, and/or the interface of an application running in the background of the electronic device.
  19. The electronic device according to any one of claims 11 to 18, wherein, when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:
    starting the human-computer interaction application.
  20. The electronic device according to claim 19, wherein, when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps:
    obtaining a preset input of the user and starting the human-computer interaction application, wherein the preset input comprises at least one of an operation triggering a button, a preset voice-input human-computer interaction instruction, or a preset fingerprint input.
  21. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions that, when run on an electronic device, cause the electronic device to perform the method according to any one of claims 1 to 10.
PCT/CN2021/113542 2020-09-10 2021-08-19 Human-computer interaction method, and electronic device and system WO2022052776A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010950650.8A CN114255745A (en) 2020-09-10 2020-09-10 Man-machine interaction method, electronic equipment and system
CN202010950650.8 2020-09-10

Publications (1)

Publication Number Publication Date
WO2022052776A1 true WO2022052776A1 (en) 2022-03-17

Family

ID=80630251

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/113542 WO2022052776A1 (en) 2020-09-10 2021-08-19 Human-computer interaction method, and electronic device and system

Country Status (2)

Country Link
CN (1) CN114255745A (en)
WO (1) WO2022052776A1 (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103869931A (en) * 2012-12-10 2014-06-18 三星电子(中国)研发中心 Method and device for controlling user interface through voice
EP2851891A1 (en) * 2013-09-20 2015-03-25 Kapsys Mobile user terminal and method for controlling such a terminal
CN105161106A (en) * 2015-08-20 2015-12-16 深圳Tcl数字技术有限公司 Voice control method of intelligent terminal, voice control device and television system
CN107992587A (en) * 2017-12-08 2018-05-04 北京百度网讯科技有限公司 A kind of voice interactive method of browser, device, terminal and storage medium
CN108538291A (en) * 2018-04-11 2018-09-14 百度在线网络技术(北京)有限公司 Sound control method, terminal device, cloud server and system
CN108877791A (en) * 2018-05-23 2018-11-23 百度在线网络技术(北京)有限公司 Voice interactive method, device, server, terminal and medium based on view
CN108877796A (en) * 2018-06-14 2018-11-23 合肥品冠慧享家智能家居科技有限责任公司 The method and apparatus of voice control smart machine terminal operation
CN111383631A (en) * 2018-12-11 2020-07-07 阿里巴巴集团控股有限公司 Voice interaction method, device and system
CN109979446A (en) * 2018-12-24 2019-07-05 北京奔流网络信息技术有限公司 Sound control method, storage medium and device
CN110457105A (en) * 2019-08-07 2019-11-15 腾讯科技(深圳)有限公司 Interface operation method, device, equipment and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115562772A (en) * 2022-03-31 2023-01-03 荣耀终端有限公司 Scene recognition and preprocessing method and electronic equipment
CN115562772B (en) * 2022-03-31 2023-10-27 荣耀终端有限公司 Scene recognition and preprocessing method and electronic equipment
CN116707851A (en) * 2022-11-21 2023-09-05 荣耀终端有限公司 Data reporting method and terminal equipment
CN116707851B (en) * 2022-11-21 2024-04-23 荣耀终端有限公司 Data reporting method and terminal equipment
CN116229973A (en) * 2023-03-16 2023-06-06 润芯微科技(江苏)有限公司 Method for realizing visible and can-say function based on OCR
CN116229973B (en) * 2023-03-16 2023-10-17 润芯微科技(江苏)有限公司 Method for realizing visible and can-say function based on OCR
CN116578264A (en) * 2023-05-16 2023-08-11 润芯微科技(江苏)有限公司 Method, system, equipment and storage medium for using voice control in screen projection

Also Published As

Publication number Publication date
CN114255745A (en) 2022-03-29

Similar Documents

Publication Publication Date Title
RU2766255C1 (en) Voice control method and electronic device
CN110910872B (en) Voice interaction method and device
CN110111787B (en) Semantic parsing method and server
CN110138959B (en) Method for displaying prompt of human-computer interaction instruction and electronic equipment
WO2020192456A1 (en) Voice interaction method and electronic device
WO2021027476A1 (en) Method for voice controlling apparatus, and electronic apparatus
WO2022052776A1 (en) Human-computer interaction method, and electronic device and system
WO2020119455A1 (en) Method for repeating word or sentence during video playback, and electronic device
CN115240664A (en) Man-machine interaction method and electronic equipment
CN111970401B (en) Call content processing method, electronic equipment and storage medium
WO2022100221A1 (en) Retrieval processing method and apparatus, and storage medium
CN111881315A (en) Image information input method, electronic device, and computer-readable storage medium
CN112383664B (en) Device control method, first terminal device, second terminal device and computer readable storage medium
WO2022143258A1 (en) Voice interaction processing method and related apparatus
CN113806473A (en) Intention recognition method and electronic equipment
WO2022135157A1 (en) Page display method and apparatus, and electronic device and readable storage medium
CN113852714A (en) Interaction method for electronic equipment and electronic equipment
CN112740148A (en) Method for inputting information into input box and electronic equipment
WO2020181505A1 (en) Input method candidate content recommendation method and electronic device
CN112416984A (en) Data processing method and device
WO2022002213A1 (en) Translation result display method and apparatus, and electronic device
CN113380240B (en) Voice interaction method and electronic equipment
WO2022007757A1 (en) Cross-device voiceprint registration method, electronic device and storage medium
WO2022095983A1 (en) Gesture misrecognition prevention method, and electronic device
WO2022033432A1 (en) Content recommendation method, electronic device and server

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21865832

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21865832

Country of ref document: EP

Kind code of ref document: A1