WO2022052776A1 - Human-computer interaction method, electronic device, and system - Google Patents
Human-computer interaction method, electronic device, and system
- Publication number
- WO2022052776A1 (PCT/CN2021/113542)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- controls
- interface
- electronic device
- user
- voice
- Prior art date
Classifications
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/04817—Interaction techniques based on graphical user interfaces [GUI] using icons
- G06F3/0482—Interaction with lists of selectable items, e.g. menus
- G06F3/0484—Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
- G06F3/0488—Interaction techniques based on graphical user interfaces [GUI] using a touch-screen or digitiser, e.g. input of commands through traced gestures
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
Definitions
- The present application relates to the field of electronic technology, and in particular to a human-computer interaction method, electronic device, and system.
- The voice assistant is not integrated into the application, so the user cannot control operations within the application through voice commands.
- Audio applications such as music players, or media applications such as video players, do not have the ability to interact with the user by voice, so the user cannot control such applications through voice commands.
- The voice assistant of an electronic device is separate from its applications, and different applications cannot access the same voice assistant.
- The embodiments of the present application provide a human-computer interaction method, electronic device, and system that realize system-level voice interaction: for any application displayed on the interface, the user can operate (for example, click) all visible buttons, pictures, icons, text, and other controls through voice commands, achieving precise human-computer interaction, generalizing the recognition of voice commands, and improving the accuracy of user intent recognition.
- In a first aspect, a human-computer interaction method is provided. The method is applied to an electronic device and includes: acquiring current interface content information while a human-computer interaction application is running on the electronic device; determining, according to the interface content information, one or more controls on the interface, where the one or more controls include one or more of buttons, icons, pictures, and text; obtaining the user's voice command; matching a target control from the one or more controls according to the voice command; and determining the user's intention according to the voice command and, in response to the user's intention, performing an operation on the target control.
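To make the claimed flow easier to follow, here is a minimal sketch in Python; every helper name on `device` (get_interface_content, extract_controls, recognize_speech, match_target, perform_operation) is a hypothetical placeholder introduced for illustration, not an API defined by the application.

```python
def handle_voice_interaction(device):
    """One pass of the described method, built on hypothetical helpers of `device`."""
    interface_content = device.get_interface_content()          # step 1: current interface info
    controls = device.extract_controls(interface_content)       # step 2: buttons, icons, pictures, text
    command_text = device.recognize_speech(interface_content)   # step 3: user's voice command (ASR)
    target = device.match_target(command_text, controls)        # step 4: best-matching target control
    device.perform_operation(target, action="click")            # step 5: act on the user's intention
```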
- the interface content may include a user-visible portion of the currently displayed interface.
- the user-visible part may include pictures, text, menus, options, icons, buttons, etc. displayed on the interface, which are collectively referred to as "controls" and the like in this embodiment of the present application.
- an operation may be performed on the target control.
- The operation may include input operations such as clicking, double-clicking, sliding, and right-clicking.
- the voice command is matched with the target control on the interface, that is, the user's intention is recognized, and the click operation on the target control is further performed.
- The method obtains the visible, clickable controls displayed on the interface, after which the user can input voice commands to perform operations such as clicking on any control on the interface. All applications and all visible content on the display can thus be controlled by the user with voice commands.
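As an illustration of how the visible, clickable controls might be collected, the sketch below walks a made-up nested control tree; the Node structure and its fields are assumptions, and a real implementation would instead query the platform's view or accessibility hierarchy.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """Hypothetical node of the interface's control tree."""
    text: str = ""
    clickable: bool = False
    visible: bool = True
    children: List["Node"] = field(default_factory=list)

def collect_clickable_controls(root: Node) -> List[Node]:
    """Depth-first walk returning every visible control the user could click."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.visible:
            continue                      # invisible subtrees are skipped entirely
        if node.clickable:
            found.append(node)
        stack.extend(node.children)
    return found
```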
- The acquired interface content information of the current interface is used as a parameter of the ASR analysis; that is, the scenario in which the current user's voice command may occur is accurately determined according to the interface content information of the current interface.
- The text of the recognized voice command is then matched with the controls in the current possible application scenario, so as to obtain the user's intention more accurately and improve the accuracy of speech recognition in the voice interaction scenario.
- the embodiment of the present application can analyze the user's voice command in combination with the current application scenario where the user's voice command may occur.
- The accuracy of speech recognition is improved, which reduces manual operations, avoids user distraction, and improves user safety in driving scenarios.
- The text controls, pictures, text, and icons included in the interface are identified, and the user's voice commands are matched to the on-screen controls to achieve precise human-computer interaction.
- The recognition of voice commands is generalized, and the accuracy of user intent recognition and ASR is improved. In addition, the latency of voice interaction is reduced, so that a "visible and speakable" intent is processed within 200 ms, which greatly improves the efficiency of detecting voice commands and improves the user experience.
- Matching the target control from the one or more controls according to the voice command includes: determining the degree of matching between the voice command and each of the one or more controls; and determining the control with the greatest degree of matching as the target control.
- The smart voice service module may correspond to the smart voice application installed on the mobile phone side; that is, the smart voice application of the mobile phone performs the voice command recognition process of this embodiment of the present application.
- the service corresponding to the smart voice service module can be provided by the server.
- Alternatively, the mobile phone can send the user's voice command to the server and, with the help of the server's voice analysis capability, the server analyzes the voice command and returns the recognition result to the mobile phone; details are not repeated here.
- Determining the matching degree between the voice command and each of the one or more controls includes: extracting one or more keywords included in the voice command; determining the matching degree between each of the one or more keywords and the description information of each of the one or more controls; and determining the control with the largest matching degree as the target control.
- When the one or more controls include icon controls, the method further includes: acquiring the outline of an icon control and determining, according to the outline, an outline keyword describing the icon control; determining the matching degree between the one or more keywords included in the voice command and the outline keyword of the icon control; and determining the icon control with the largest matching degree as the target control.
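One simple way to realize this matching step, sketched under the assumption that each control carries description words and outline keywords, is a keyword-overlap score; the scoring rule below is illustrative only and is not the matching algorithm prescribed by the application.

```python
def match_target_control(command_keywords, controls):
    """Return the control whose description/outline keywords best match the command.

    `controls` is a list of dicts such as
    {"name": "favorite button", "description": ["like", "favorite"], "outline": ["peach heart"]}.
    """
    def score(control):
        vocabulary = set(control.get("description", [])) | set(control.get("outline", []))
        return sum(1 for kw in command_keywords if kw in vocabulary)

    best = max(controls, key=score)
    return best if score(best) > 0 else None   # no target if nothing overlaps

# Usage: keywords extracted from the voice command "play music"
controls = [
    {"name": "play button", "description": ["play"], "outline": ["triangle"]},
    {"name": "favorite button", "description": ["like", "favorite"], "outline": ["peach heart"]},
]
print(match_target_control(["play", "music"], controls))   # -> the "play button" entry
```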
- The acquired information about the currently displayed interface content can be transferred to the ASR module in synchronization with step 908; that is, the interface content information is incorporated into the ASR model.
- the user's voice command is recognized according to the updated ASR model.
- The user's voice command may include homophones. For example, when the user inputs "variety show", the pronunciation of different users may lead the ASR module to produce several candidate results with the same or similar pinyin.
- Such homophones and words with similar pinyin may prevent the mobile phone from accurately obtaining the user's operation intention from the voice command.
- In this example, the current interface displays a large amount of audio information, celebrity photos, video information, and so on.
- When the ASR module performs its analysis, it therefore selects, from possible recognition results such as "Zhongyi", "traditional Chinese medicine", "loyalty", and "variety show", the result "variety show" that is most relevant to the currently displayed audio information, celebrity photos, video information, and so on, and then determines that the voice command issued by the current user is "variety show".
- In other words, the information about the currently displayed interface content is introduced into the voice command recognition process of the existing ASR module, so that the usage scenario of the user's current voice command can be accurately analyzed according to that information, the application scenario targeted by the current voice command can be accurately located, and the accuracy of recognizing the voice command is improved.
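The effect of feeding interface content into the ASR process can be pictured as rescoring the recognizer's candidate hypotheses by their overlap with on-screen words; the scores, candidates, and helper below are invented for illustration and do not describe the application's actual ASR model.

```python
def rescore_with_interface(hypotheses, interface_words):
    """Pick the ASR hypothesis best supported by the on-screen content.

    `hypotheses` maps candidate text to its acoustic score; `interface_words`
    are words/keywords taken from the currently displayed interface.
    """
    def contextual_score(candidate, acoustic_score):
        boost = sum(1.0 for w in interface_words if w in candidate)
        return acoustic_score + boost          # simple additive bias toward on-screen terms

    return max(hypotheses, key=lambda c: contextual_score(c, hypotheses[c]))

# Homophone-like candidates for the same utterance, plus content visible on a video interface.
hypotheses = {"traditional chinese medicine": 0.34, "loyalty": 0.31, "variety show": 0.30}
interface_words = ["variety show", "video", "star", "episode"]
print(rescore_with_interface(hypotheses, interface_words))   # -> "variety show"
```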
- the above process can be understood as a process of determining the user's intention according to the voice command after acquiring the user's voice command, for example, determining which control on the current interface is to be clicked by the voice command currently input by the user.
- The matching degree between the voice command and each of the one or more controls is determined, and the control with the largest matching degree is determined as the target control on which the user intends to perform the click operation.
- In this way, the target control for the operation is determined.
- One or more keywords contained in the user's voice command may be extracted, and the matching degree between the one or more keywords and the description information of each control may be determined.
- The keywords may include characters, words, the pinyin of some or all of the Chinese characters of the voice command, and so on, which is not limited in this embodiment of the present application.
- the description information of each control may include outline information, text information, color information, position information, icon information, etc. of the control, which is not limited in this embodiment of the present application.
- For example, on the music playing interface, when the user inputs the voice command "I like", during the matching process between the voice command and the controls included in the interface, if the description words of the favorite button on the music playing interface are "like" and "favorite" and the outline of the favorite button is a "peach heart" (heart) shape, then the heart-shaped control can be matched to "I like". This method generalizes the user's voice command and matches the user's command with the controls on the interface more intelligently.
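This kind of generalization can be approximated by expanding the spoken command with an association table before matching it against control descriptions and outlines; the table entries below are made up for illustration.

```python
# Hypothetical generalization table: spoken phrases -> words a control might be described with.
GENERALIZATIONS = {
    "i like": ["like", "favorite", "peach heart"],
    "turn it up": ["volume", "increase"],
}

def generalize(command_text):
    """Expand the raw command with associated control-description words."""
    words = command_text.lower().split()
    expanded = set(words)
    for phrase, related in GENERALIZATIONS.items():
        if phrase in command_text.lower():
            expanded.update(related)
    return expanded

# "I like" now also matches a heart-shaped favorite button described as "like"/"favorite".
print(generalize("I like"))   # -> {'i', 'like', 'favorite', 'peach heart'}
```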
- The method further includes: when the voice command is detected, adding a digital corner label to some of the one or more controls; and when it is detected that the voice command includes a first number, determining the control marked with the first number as the target control.
- Adding a digital corner label to some of the one or more controls includes: adding the digital corner label to some of the one or more controls according to a preset order, where the preset order includes a left-to-right and/or top-to-bottom order.
- The controls to which a digital corner label can be added satisfy one or more of the following: all of the one or more controls are picture-type controls; the one or more controls are arranged in a grid; the one or more controls are arranged in a list; or the one or more controls have a display size greater than or equal to a preset value.
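A possible way to assign and resolve such digital corner labels, assuming each control exposes its kind, position, and size, is sketched below; the eligibility rule, field names, and threshold are assumptions consistent with this paragraph rather than a prescribed implementation.

```python
def eligible_for_label(control, min_size=48):
    """A control gets a corner label if it is a picture, sits in a grid/list, or is large enough."""
    return (control["kind"] == "picture"
            or control.get("layout") in ("grid", "list")
            or control.get("size", 0) >= min_size)

def assign_corner_labels(controls):
    """Number eligible controls in the preset order: left to right, then top to bottom."""
    eligible = [c for c in controls if eligible_for_label(c)]
    ordered = sorted(eligible, key=lambda c: (c["y"], c["x"]))   # rows first, then columns
    return {i + 1: c for i, c in enumerate(ordered)}             # labels start at 1

def resolve_spoken_number(labels, first_number):
    """When the voice command contains a number, return the control carrying that label."""
    return labels.get(first_number)

# Usage: three thumbnails on a video list; the user says "the second one".
controls = [
    {"kind": "picture", "x": 0,   "y": 0,   "name": "thumb A"},
    {"kind": "picture", "x": 200, "y": 0,   "name": "thumb B"},
    {"kind": "picture", "x": 0,   "y": 300, "name": "thumb C"},
]
labels = assign_corner_labels(controls)
print(resolve_spoken_number(labels, 2)["name"])   # -> "thumb B"
```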
- The interface corresponding to the interface content information is the interface of the application running in the foreground of the electronic device and/or the interface of the application running in the background of the electronic device.
- the method further includes: starting the human-computer interaction application on the electronic device.
- Starting the human-computer interaction application on the electronic device includes: obtaining a user's preset input and starting the human-computer interaction application on the electronic device, where the preset input includes at least one of an operation that triggers a button, a preset voice-input human-computer interaction instruction, or a preset fingerprint input.
- An electronic device is provided, comprising: one or more processors; one or more memories; and a module installed with a plurality of application programs. The memory stores one or more programs, and when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: during the running of the human-computer interaction application, obtaining the current interface content information; determining, according to the interface content information, one or more controls on the interface, where the one or more controls include one or more of buttons, icons, pictures, and text; obtaining the user's voice command; matching a target control from the one or more controls according to the voice command; and determining the user's intention according to the voice command and, in response to the user's intention, performing an operation on the target control.
- When the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: determining, according to the voice command, the matching degree between the voice command and each of the one or more controls; and determining the control with the largest matching degree as the target control.
- When the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: extracting one or more keywords included in the voice command; determining the matching degree between each of the one or more keywords and the description information of each of the one or more controls; and determining the control with the largest matching degree as the target control.
- The one or more controls include icon controls, and when the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: acquiring the outline of an icon control and determining, according to the outline, an outline keyword describing the icon control; determining the matching degree between the one or more keywords included in the voice command and the outline keyword of the icon control; and determining the icon control with the largest matching degree as the target control.
- When the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: when the voice command is detected, adding a digital corner label to some of the one or more controls; and when it is detected that the voice command includes a first number, determining the control marked with the first number as the target control.
- When the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: adding the digital corner label to some of the one or more controls according to a preset order, where the preset order includes a left-to-right and/or top-to-bottom order.
- The controls to which a digital corner label can be added satisfy one or more of the following: all of the one or more controls are picture-type controls; the one or more controls are arranged in a grid; the one or more controls are arranged in a list; or the one or more controls have a display size greater than or equal to a preset value.
- The interface corresponding to the interface content information is the interface of the application running in the foreground of the electronic device and/or the interface of the application running in the background of the electronic device.
- When the one or more programs are executed by the processor, the electronic device is caused to perform the following step: starting the human-computer interaction application.
- When the one or more programs are executed by the processor, the electronic device is caused to perform the following steps: acquiring a user's preset input and starting the human-computer interaction application, where the preset input includes at least one of an operation that triggers a button, a preset voice-input human-computer interaction instruction, or a preset fingerprint input.
- The present application provides a system including an electronic device and a display device that are connected, where the electronic device can perform any one of the possible human-computer interaction methods in the first aspect above, and the display device is configured to display the application interface of the electronic device.
- the present application provides an apparatus, the apparatus is included in an electronic device, and the apparatus has a function of implementing the behavior of the electronic device in the above-mentioned aspect and possible implementations of the above-mentioned aspect.
- the functions can be implemented by hardware, or by executing corresponding software by hardware.
- the hardware or software includes one or more modules or units corresponding to the above functions. For example, a display module or unit, a detection module or unit, a processing module or unit, and the like.
- The present application provides an electronic device, comprising: a touch display screen, wherein the touch display screen includes a touch-sensitive surface and a display; a positioning chip; one or more cameras; one or more processors; a plurality of memories; a plurality of application programs; and one or more computer programs.
- one or more computer programs are stored in the memory, the one or more computer programs comprising instructions. The instructions, when executed by one or more processors, cause an electronic device to perform any of the possible human-computer interaction methods described above.
- the present application provides an electronic device including one or more processors and one or more memories.
- The one or more memories are coupled to the one or more processors and are configured to store computer program code, the computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the human-computer interaction method in any possible implementation of any of the above aspects.
- the present application provides a computer storage medium, including computer instructions, when the computer instructions are executed on an electronic device, the electronic device can perform any of the possible human-computer interaction methods in any of the foregoing aspects.
- the present application provides a computer program product that, when the computer program product runs on an electronic device, enables the electronic device to perform any of the possible human-computer interaction methods in any of the foregoing aspects.
- FIG. 1 is a schematic diagram of an application scenario of an example of a method for human-computer interaction provided by an embodiment of the present application.
- FIG. 2 is a schematic structural diagram of an example of an electronic device provided by an embodiment of the present application.
- FIG. 3 is a software structural block diagram of an implementation process of an example of a method for human-computer interaction according to an embodiment of the present application.
- FIG. 4 is a schematic interface diagram of an example of a process of implementing voice interaction on an in-vehicle device provided by an embodiment of the present application.
- FIG. 5 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application.
- FIG. 6 is a schematic interface diagram of another example of a process of implementing voice interaction on an in-vehicle device provided by an embodiment of the present application.
- FIG. 7 is a schematic interface diagram of another example of a voice interaction process implemented on a vehicle-mounted device provided by an embodiment of the present application.
- FIG. 8 is a schematic interface diagram of another example of a process of implementing voice interaction on an in-vehicle device provided by an embodiment of the present application.
- FIG. 9 is a schematic flowchart of a voice interaction method provided by an embodiment of the present application.
- The terms "first" and "second" are only used for descriptive purposes and should not be construed as indicating or implying relative importance or implicitly indicating the number of indicated technical features.
- a feature defined as “first” or “second” may expressly or implicitly include one or more of that feature.
- the embodiments of the present application will provide a human-computer interaction method.
- the following describes in detail how to implement system-level voice interaction through the human-computer interaction method with reference to the accompanying drawings and different embodiments.
- FIG. 1 is a schematic diagram of an application scenario of an example of a method for human-computer interaction provided by an embodiment of the present application.
- the human-computer interaction method provided by the embodiments of the present application may be applied to a scenario including a separate electronic device.
- the smart screen 101 is used as the electronic device, and the human-computer interaction method is applied to a scenario where a user uses the smart screen 101 .
- the smart screen 101 can acquire the user's voice command through the microphone, recognize the voice command, perform corresponding operations according to the user's voice command, display a corresponding interface, and the like.
- The human-computer interaction method provided by the embodiments of the present application may also be applied to a scenario including two electronic devices, and the two electronic devices in the scenario may include different types of electronic devices such as a mobile phone, a tablet computer, a wearable device, and a vehicle-mounted device.
- The in-vehicle device 103 can be used as a display device that is connected to the mobile phone 102 to display the interface running on the mobile phone 102.
- the mobile phone 102 can acquire the user's voice command, recognize the voice command, and perform the corresponding operation in the background according to the user's voice command, and then display the screen after the corresponding operation is performed on the in-vehicle device 103 .
- Alternatively, the in-vehicle device 103 can obtain the user's voice command and transmit it to the mobile phone 102; the mobile phone recognizes the voice command, performs the corresponding operation in the background according to the user's voice command, and then projects the resulting interface onto the in-vehicle device 103 for display.
- the human-computer interaction method provided in the embodiments of the present application may also be applied to a scenario including at least one electronic device and a server.
- Exemplarily, as shown in (c) of FIG. 1, in a scenario including the mobile phone 102, the in-vehicle device 103, and the server 104, the mobile phone 102 or the in-vehicle device 103 can obtain the user's voice command and upload it to the server 104; the voice command is analyzed more quickly and accurately with the help of the voice analysis capability of the server 104, the analysis result is then transmitted back to the mobile phone 102, and the corresponding operation is performed on the mobile phone.
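A rough sketch of this server-assisted path is shown below; the endpoint URL, payload fields, and response format are all hypothetical, chosen only to illustrate uploading the voice command together with interface context and receiving the recognition result.

```python
import json
import urllib.request

def recognize_on_server(audio_bytes, interface_keywords,
                        url="http://server104.example/asr"):
    """Upload the captured voice command plus interface context; return the parsed result."""
    payload = json.dumps({
        "audio": audio_bytes.hex(),            # raw audio captured by the phone or in-vehicle device
        "context": interface_keywords,         # on-screen keywords used to bias recognition
    }).encode("utf-8")
    request = urllib.request.Request(url, data=payload,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)             # e.g. {"text": "variety show", "target": ...}
```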
- The method for human-computer interaction can be applied to mobile phones, smart screens, tablet computers, wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPCs), netbooks, personal digital assistants (PDAs), and other electronic devices; the embodiments of the present application do not impose any limitation on the specific type of the electronic device.
- the smart screen 101 , the mobile phone 102 , and the in-vehicle device 103 listed in FIG. 1 are collectively referred to as “electronic device 100 ”, and possible structures of the electronic device 100 are described below.
- FIG. 2 is a schematic structural diagram of an example of an electronic device 100 provided by an embodiment of the present application.
- the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2 , mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone jack 170D, sensor module 180, buttons 190, motor 191, indicator 192, camera 193, display screen 194, and Subscriber identification module (subscriber identification module, SIM) card interface 195 and so on.
- the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, and ambient light. Sensor 180L, bone conduction sensor 180M, etc.
- the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
- The electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or have a different arrangement of components.
- the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
- The processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated into one or more processors.
- the controller may be the nerve center and command center of the electronic device 100 .
- the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
- a memory may also be provided in the processor 110 for storing instructions and data.
- the memory in processor 110 is cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby increasing the efficiency of the system.
- the processor 110 may include one or more interfaces.
- The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
- the I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL).
- the processor 110 may contain multiple sets of I2C buses.
- the processor 110 can be respectively coupled to the touch sensor 180K, the charger, the flash, the camera 193 and the like through different I2C bus interfaces.
- the processor 110 may couple the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate with each other through the I2C bus interface, so as to realize the touch function of the electronic device 100 .
- the I2S interface can be used for audio communication.
- the processor 110 may contain multiple sets of I2S buses.
- the processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170 .
- the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
- the PCM interface can also be used for audio communications, sampling, quantizing and encoding analog signals.
- the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
- the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
- the UART interface is a universal serial data bus used for asynchronous communication.
- the bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
- a UART interface is typically used to connect the processor 110 with the wireless communication module 160 .
- the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
- the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
- the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193 .
- MIPI interfaces include camera serial interface (CSI), display serial interface (DSI), etc.
- the processor 110 communicates with the camera 193 through a CSI interface, so as to realize the photographing function of the electronic device 100 .
- the processor 110 communicates with the display screen 194 through the DSI interface to implement the display function of the electronic device 100 .
- the GPIO interface can be configured by software.
- the GPIO interface can be configured as a control signal or as a data signal.
- the GPIO interface may be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like.
- the GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
- the USB interface 130 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
- the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transmit data between the electronic device 100 and peripheral devices. It can also be used to connect headphones to play audio through the headphones.
- the interface can also be used to connect other electronic devices, such as AR devices.
- the interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100.
- the electronic device 100 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
- the charging management module 140 is used to receive charging input from the charger.
- the charger may be a wireless charger or a wired charger.
- the charging management module 140 may receive charging input from the wired charger through the USB interface 130 .
- the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100 . While the charging management module 140 charges the battery 142 , it can also supply power to the electronic device through the power management module 141 .
- the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
- the power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
- the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, battery health status (leakage, impedance).
- the power management module 141 may also be provided in the processor 110 .
- the power management module 141 and the charging management module 140 may also be provided in the same device.
- the wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
- Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
- Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
- the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
- the mobile communication module 150 may provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the electronic device 100 .
- the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
- the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
- the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 .
- at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 .
- at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
- the modem processor may include a modulator and a demodulator.
- the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
- the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
- the low frequency baseband signal is processed by the baseband processor and passed to the application processor.
- the application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or videos through the display screen 194 .
- the modem processor may be a stand-alone device.
- the modem processor may be independent of the processor 110, and may be provided in the same device as the mobile communication module 150 or other functional modules.
- The wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
- the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
- the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
- the wireless communication module 160 can also receive the signal to be sent from the processor 110 , perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2 .
- the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
- The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
- the GNSS may include global positioning system (global positioning system, GPS), global navigation satellite system (global navigation satellite system, GLONASS), Beidou navigation satellite system (beidou navigation satellite system, BDS), quasi-zenith satellite system (quasi -zenith satellite system, QZSS) and/or satellite based augmentation systems (SBAS).
- the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
- the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
- the GPU is used to perform mathematical and geometric calculations for graphics rendering.
- Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
- Display screen 194 is used to display images, videos, and the like.
- Display screen 194 includes a display panel.
- The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
- the electronic device 100 may include one or N display screens 194 , where N is a positive integer greater than one.
- the electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
- the ISP is used to process the data fed back by the camera 193 .
- When the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye.
- ISP can also perform algorithm optimization on image noise, brightness, and skin tone.
- ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
- the ISP may be provided in the camera 193 .
- Camera 193 is used to capture still images or video.
- the object is projected through the lens to generate an optical image onto the photosensitive element.
- the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
- the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
- the ISP outputs the digital image signal to the DSP for processing.
- DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
- the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
- The digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency-point energy, and so on.
- Video codecs are used to compress or decompress digital video.
- the electronic device 100 may support one or more video codecs.
- The electronic device 100 can play or record videos in various encoding formats, such as Moving Picture Experts Group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, and so on.
- the NPU is a neural-network (NN) computing processor.
- Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
- the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100 .
- The external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function, for example, to save files such as music and videos in the external memory card.
- Internal memory 121 may be used to store computer executable program code, which includes instructions.
- the processor 110 executes various functional applications and data processing of the electronic device 100 by executing the instructions stored in the internal memory 121 .
- the internal memory 121 may include a storage program area and a storage data area.
- the storage program area may store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
- the storage data area may store data (such as audio data, phone book, etc.) created during the use of the electronic device 100 and the like.
- the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
- The electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, the application processor, and the like.
- the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
- The speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
- the electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
- The receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
- the voice can be answered by placing the receiver 170B close to the human ear.
- The microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
- The user can speak with the mouth close to the microphone 170C to input a sound signal into the microphone 170C.
- the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
- the earphone jack 170D is used to connect wired earphones.
- the earphone interface 170D may be the USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface, a cellular telecommunications industry association of the USA (CTIA) standard interface.
- the pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
- the gyro sensor 180B can be used to determine the motion attitude of the electronic device 100.
- the air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude through the air pressure value measured by the air pressure sensor 180C to assist in positioning and navigation.
- the magnetic sensor 180D includes a Hall sensor. The electronic device 100 can detect the opening and closing of the flip holster using the magnetic sensor 180D.
- the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes).
- Distance sensor 180F is used to measure distance. The electronic device 100 can measure distance through infrared or laser.
- Proximity light sensor 180G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
- the ambient light sensor 180L is used to sense ambient light brightness.
- the fingerprint sensor 180H is used to collect fingerprints.
- the temperature sensor 180J is used to detect the temperature.
- the bone conduction sensor 180M can acquire vibration signals.
- Touch sensor 180K is also called a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, also called a "touchscreen".
- the touch sensor 180K is used to detect a touch operation on or near it.
- the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
- Visual output related to touch operations may be provided through display screen 194 .
- the keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys, or may be touch keys.
- the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
- Motor 191 can generate vibrating cues.
- the motor 191 can be used for vibrating alerts for incoming calls, and can also be used for touch vibration feedback.
- touch operations acting on different applications can correspond to different vibration feedback effects.
- the motor 191 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 194 .
- Touch operations in different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
- the touch vibration feedback effect can also support customization.
- the indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
- the SIM card interface 195 is used to connect a SIM card.
- electronic devices such as the smart screen 101, the mobile phone 102, and the in-vehicle device 103 may all have the structure shown in FIG. 2, or have a structure with more or fewer components than that shown in FIG. 2.
- the embodiments of the present application do not limit the types of electronic devices included in the application scenario.
- when the electronic device 100 shown in FIG. 2 is a mobile phone, it may run the HarmonyOS system or any other possible operating system, and the operating system may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture, etc.
- In the embodiments of the present application, the mobile phone has a layered architecture; taking such a layered system as an example, the software structure of the mobile phone 102 is exemplarily described.
- FIG. 3 is a software structural block diagram of an implementation process of an example of a method for human-computer interaction according to an embodiment of the present application.
- the in-vehicle device 103 can be used as a screen projection device (or “display device”) of the mobile phone 102 , and the application of the mobile phone 102 can be displayed on the display screen of the in-vehicle device 103 .
- In the layered architecture, the software is divided into several layers, and each layer has a clear role and division of labor; the layers communicate with each other through software interfaces.
- the system is divided into four layers: from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
- the application layer can include a series of application packages. As shown in Figure 3, the application package can include applications such as Visible to Speak 11, Smart Voice 13, Music, Navigation, and HiCar 15. The following mainly introduces the functional modules respectively corresponding to the visible to speak 11 and the intelligent voice 13 in the embodiments of the present application.
- “visible” may refer to the part that the user can see during the human-computer interaction between the user and the electronic device.
- the user-visible portion may include display content on the screen of the electronic device, such as the desktop, windows, menus, icons, buttons, and controls of the electronic device.
- the visible portion may also include multimedia content such as text, pictures, and videos displayed on the screen of the electronic device, which is not limited in this embodiment of the present application.
- the display content on the screen of the electronic device can be an interface displayed by an application running in the foreground of the electronic device, or an interface of an application running in the background of the electronic device that is displayed on a virtual display and projected onto other electronic devices.
- “speakable” means that the user can interact with the visible part through a voice command, thereby completing the interactive task.
- user-visible parts such as desktops, windows, menus, icons, buttons, and controls of an electronic device
- the user can control them through voice commands, and then perform input operations such as clicking, double-clicking, and sliding on the visible parts.
- the visible-to-speak 11 may include an interface information acquisition module 111, an intent processing module 112, an interface module 113, a predefined action execution module 114, and the like.
- the interface information acquisition module 111 may acquire interface content information of applications running in the foreground or background of the mobile phone.
- the intent processing module 112 may receive the user's voice instruction returned by the smart voice 13, and determine the user's intent according to the user's voice instruction.
- the interface module 113 is used to realize data and information exchange between various applications.
- the predefined action execution module 114 is configured to execute corresponding operations according to voice commands, user intentions, and the like.
- the smart voice 13 may correspond to a smart voice application installed on the side of the mobile phone 102 , that is, a service process of voice recognition provided by the smart voice application of the mobile phone 102 .
- the service process of the voice recognition provided by the smart voice 13 can also be provided by the server, and this scenario can correspond to (c) in FIG. 1: the mobile phone 102 sends the voice command to the server 104, the server 104 analyzes the voice command, and the server 104 returns the recognition result of the voice command to the mobile phone 102, which will not be repeated here.
- the intelligent speech 13 may include a natural language understanding (NLU) module 131, an automatic speech recognition (ASR) module 132, a text-to-speech (TTS) module 133, a dialog management (DM) module 134, and so on.
- the ASR module 132 can convert the original voice signal input by the user into text information; the NLU module 131 can convert the recognized text information into semantics that can be understood by electronic devices such as mobile phones and in-vehicle devices; the DM module 134 can determine, based on the dialogue state, the action that the system should take; and the TTS module 133 can convert natural language text into speech and output it to the user.
- the smart speech 13 may also use a natural language generation (NLG) module, etc., which is not limited in this embodiment of the present application.
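- The module division above can be pictured with a minimal Java sketch; the interface names, method signatures, and data carriers below are illustrative assumptions made only for this description, not the actual implementation of the smart voice 13.

```java
// Illustrative sketch of the speech pipeline: ASR -> NLU -> DM -> TTS (all names assumed).
interface AsrModule { String recognize(byte[] pcmAudio); }                  // speech signal -> text
interface NluModule { VoiceIntent understand(String text); }                // text -> device-understandable semantics
interface DmModule  { SystemAction decide(VoiceIntent intent, DialogState state); } // dialogue state -> next action
interface TtsModule { byte[] synthesize(String text); }                     // natural language text -> speech

// Hypothetical data carriers used only by this sketch.
class VoiceIntent { String name; java.util.Map<String, String> slots = new java.util.HashMap<>(); }
class DialogState { java.util.List<VoiceIntent> history = new java.util.ArrayList<>(); }
class SystemAction { String type; String payload; }
```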
- the application of the mobile phone can be projected to the vehicle device through the HiCar application 15. During the projection process, the application actually runs on the side of the mobile phone, and the operation can include the foreground operation or the background operation of the mobile phone.
- the in-vehicle device can have an independent display system. After the application of the mobile phone is projected to the in-vehicle device through the HiCar application 15, there can be an independent display desktop and application quick entry on the in-vehicle device, while providing the ability to obtain voice commands.
- the application framework layer includes a variety of service programs or some predefined functions, which can provide an application programming interface (API) and a programming framework for applications in the application layer.
- the application framework layer may include a content sensor 21, a multi-screen framework service module 23, a view system 25, and the like.
- the content provider 21 can be used to store and obtain data, and make these data accessible by application programs.
- the data acquired by the content provider 21 may include interface display data of the electronic device, video, image, audio, user browsing history, bookmarks and other data.
- the content sensor 21 may acquire the interface content displayed in the foreground or background of the mobile phone.
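- As a concrete illustration of how interface content could be collected, one plausible Android mechanism is to walk the accessibility node tree of the current window; the sketch below is an assumption for illustration only and does not claim to be how the content sensor 21 is actually implemented.

```java
import android.graphics.Rect;
import android.view.accessibility.AccessibilityNodeInfo;
import java.util.ArrayList;
import java.util.List;

/** Recursively collects visible, clickable nodes (text, buttons, images) from the node tree. */
final class ControlCollector {
    static List<AccessibilityNodeInfo> collectClickable(AccessibilityNodeInfo root) {
        List<AccessibilityNodeInfo> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(AccessibilityNodeInfo node, List<AccessibilityNodeInfo> out) {
        if (node == null) return;
        if (node.isVisibleToUser() && node.isClickable()) {
            out.add(node);                      // a candidate "control" for voice matching
        }
        for (int i = 0; i < node.getChildCount(); i++) {
            collect(node.getChild(i), out);     // depth-first traversal of the interface
        }
    }

    /** Example of the information such a module could extract per control. */
    static String describe(AccessibilityNodeInfo node) {
        Rect bounds = new Rect();
        node.getBoundsInScreen(bounds);
        CharSequence text = node.getText() != null ? node.getText() : node.getContentDescription();
        return node.getClassName() + " \"" + text + "\" at " + bounds;
    }
}
```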
- the multi-screen frame service module 23 may include a window manager, etc., for managing the window display of the electronic device.
- the window manager may acquire the size of the display screen of the mobile phone 102 or the size of the window to be displayed, and acquire the content of the window to be displayed, and the like.
- the multi-screen framework service module 23 can also manage the screen projection display process of the electronic device, for example, obtain the interface content of one or more applications running in the background of the electronic device and transmit the interface content to another electronic device, so that the interface content of the electronic device is displayed on the other electronic device, which is not repeated here.
- the view system 25 includes visual controls, such as controls for displaying text, controls for displaying pictures, and the like. View system 25 can be used to build applications.
- a display interface can consist of one or more views.
- the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
- the Android runtime includes core libraries and a virtual machine. Android runtime is responsible for scheduling and management of the Android system.
- the core library consists of two parts: one part contains the functions that the Java language needs to call, and the other part is the core library of Android.
- the application layer and the application framework layer run in virtual machines.
- the virtual machine executes the java files of the application layer and the application framework layer as binary files.
- the virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, safety and exception management, and garbage collection.
- a system library can include multiple functional modules. For example: surface manager (surface manager), media library (media library), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
- the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
- the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
- the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
- the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
- a 2D graphics engine is a drawing engine for 2D drawing.
- the image processing library can provide analysis of various image data and provide a variety of image processing algorithms, such as image cutting, image fusion, image blurring, image sharpening and other processing, which will not be repeated here.
- the kernel layer is the layer between hardware and software.
- the kernel layer at least includes display drivers, audio drivers, sensor drivers, etc.
- Various drivers can call hardware structures such as the microphone, speaker, or sensors of the mobile phone, for example, calling the microphone of the mobile phone to obtain the user's voice commands and calling the speaker of the mobile phone for voice output, which will not be repeated here.
- the in-vehicle device 103, as a display device, may have the same software structure as that of the mobile phone 102, or a different one.
- the vehicle-mounted device at least includes a display module 31 , a microphone/speaker 32 , and the like.
- the display module 31 may be used to display the interface content currently running on the in-vehicle device 103 , or display the application interface projected by the mobile phone 102 .
- the in-vehicle device 103 may have an independent display system.
- after the application of the mobile phone 102 is projected to the in-vehicle device 103 through the HiCar application 15, there may be an independent display desktop and application quick entry on the in-vehicle device 103.
- music, navigation, video and other applications of the mobile phone 102 can be rearranged and displayed on the in-vehicle device 103 according to the display system of the in-vehicle device 103 after the application of the mobile phone is projected to the in-vehicle device 103 through the HiCar application 15.
- This is not limited in this embodiment of the present application.
- the microphone/speaker 32 is the hardware structure of the in-vehicle device, and can realize the same functions as the microphone/speaker of the mobile phone.
- the input of the user's voice instruction may be through the microphone of the mobile phone 102 itself, or may be a remote virtual microphone.
- the remote virtual microphone can be understood as a voice-command acquisition capability provided by means of the microphone of the in-vehicle device 103: the voice commands acquired by the in-vehicle device 103 are transmitted to the mobile phone 102, and the mobile phone 102 recognizes the voice commands, etc., which will not be repeated here.
- the HiCar application 15 can rely on the multi-screen framework capability of the mobile phone to project the interfaces of multiple applications of the mobile phone to the interface of the in-vehicle device.
- the multiple applications themselves actually run on the side of the mobile phone, and their interfaces are displayed on the screen of the in-vehicle device. The screen content is extracted through the content sensor of the mobile phone system, so as to obtain the application interface content projected to the in-vehicle device. The smart voice can analyze user semantics quickly and accurately through the combination of terminal-side (relying on the powerful computing capability of the mobile phone itself) and cloud-side analysis capabilities, and the recognized result is sent to the visible-to-speak module to match the interface content and identify the user's purpose.
- the interface is operated to realize control operations such as control clicks, sliding up and down, left and right, and return.
- the in-vehicle device 103 and the mobile phone 102 are in a state of established connection.
- connection between the mobile phone 102 and the in-vehicle device 103 may include various connection modes such as wired connection or wireless connection.
- the wired connection between the mobile phone 102 and the in-vehicle device 103 may be through a USB data cable; the wireless connection between the mobile phone 102 and the in-vehicle device 103 may be established by means of a Wi-Fi connection, or, when both the mobile phone 102 and the in-vehicle device 103 support the near field communication (NFC) function, by a proximity connection through the "touch" function, or through Bluetooth, code scanning, etc.
- As communication bandwidth and rates gradually increase, data may also be transmitted between the mobile phone 102 and the in-vehicle device 103 without establishing a near field communication connection.
- the mobile phone 102 and the in-vehicle device 103 may be able to project the screen of the mobile phone to the in-vehicle device through 5G communication.
- the mobile phone may not provide functions such as discovery and establishing a connection with the in-vehicle device.
- the mobile phone 102 and the in-vehicle device 103 can be connected and communicated based on the relevant settings under the account by logging into the same account.
- both the mobile phone 102 and the in-vehicle device 103 can register a Huawei account.
- the application of the mobile phone 102 can be displayed on the display screen of the in-vehicle device 103, and the application can actually run on the side of the mobile phone 102, which is not repeated here.
- the mobile phone 102 and the vehicle-mounted device 103 may also have different division methods or include more functional modules. This is not limited.
- the mobile phone 102 and the in-vehicle device 103 having the software structure shown in FIG. 3 are taken as examples for detailed description in conjunction with the accompanying drawings and application scenarios.
- FIG. 4 is a schematic diagram of a graphical user interface (graphical user interface, GUI) for implementing a voice interaction process on an in-vehicle device provided by an embodiment of the present application.
- FIG. 4 shows that the screen display system of the in-vehicle device 103 displays the currently output interface, and the content of the interface can be derived from the application actually running on the side of the mobile phone 102, obtained by the HiCar application and provided to the in-vehicle device 103.
- the interface content on the display screen of the in-vehicle device 103 can be arranged and filled based on its own display system, and the same content can have a different display style, icon size, arrangement order, etc.
- the content is arranged and filled on the display screen of the in-vehicle device 103 according to the requirements of the display system of the in-vehicle device 103 .
- the screen display area of the in-vehicle device 103 may include a status display area at the top position, and a navigation menu area 401 and a content area 402 shown by dashed boxes.
- the status display area displays the current time and date, Bluetooth icon, WIFI icon, etc.
- the navigation menu area 401 may include icons such as homepage, navigation, phone and music, each icon corresponds to at least one application actually running on the mobile phone 102, the user Any icon can be clicked to enter the corresponding interface of the application; the content area 402 displays the content provided to the in-vehicle device 103 by different applications.
- the Huawei Music is installed on the mobile phone 102 , and the Huawei Music runs in the background, and the Huawei Music sends the content such as the playlist or song list displayed during the running process to the in-vehicle device 103 .
- the screen display system of the in-vehicle device 103 fills the content provided by the Huawei Music into the content area of the display screen, as shown in (a) in FIG. 4, such as daily recommendations, playlists, ranking lists, radio stations, search, etc.; the display process will not be repeated in subsequent embodiments.
- the interface of the in-vehicle device 103 may also display other more menus or contents of application programs, which are not limited in this embodiment of the present application.
- FIG. 4 shows the interface after the user clicks on the music application in the navigation menu area 401, and the icon of the music application in the navigation menu area 401 is highlighted in gray.
- the song name or song list provided by Huawei Music is displayed in the content area 402
- the play button 20 is displayed on the icon of song 1, and song 2, song 3, song 4, and song 5 are in the paused playback state, on each of which a pause button 30 is displayed.
- a voice ball icon 10 may also be included on the interface, as shown in (a) of FIG. 4. When the user clicks the voice ball icon 10, the in-vehicle device 103 can display the interface 403 as shown in (b) of FIG. 4.
- the wake-up window 403 - 1 shown by the dotted box may be displayed on the interface 403 , and the wake-up window 403 - 1 includes the voice recognition icon 40 .
- the wake-up window 403-1 may not be embodied in the form of a window, but only includes the voice recognition icon 40, or includes the voice recognition icon 40 and the voice commands recommended to the user, and is displayed in a floating manner on the display screen of the in-vehicle device 103.
- For convenience of description, in the embodiments of the present application the area including the voice recognition icon 40 is referred to as a "wake-up window", which should not limit the solution of the embodiments of the present application and will not be described in detail later.
- the voice recognition icon 40 may be displayed dynamically, which is used to indicate that the in-vehicle device 103 is in a state of monitoring and acquiring the user's voice instruction.
- the wake-up window 403-1 may also include some voice commands recommended to the user, such as "stop playing" and "continue playing". It should be understood that a recommended voice command may also accept a user's click operation, and the operation corresponding to the command is executed in response, which will not be repeated in this embodiment of the present application.
- After acquiring the user's voice command, the in-vehicle device 103 can send the voice command to the mobile phone 102, and the mobile phone 102 recognizes the user's voice command; in response to the voice command, a click operation on the play button 20 of song 1 is performed in the background, and the play button 20 on song 1 changes to a pause button 30. The mobile phone 102 can transfer the display interface after the click operation on the play button 20 of song 1 back to the in-vehicle device 103, and the in-vehicle device 103 can then display the interface 404 shown in (c) in FIG. 4, on which song 1 displays the pause button 30.
- the above implementation process can be understood as follows: a user voice instruction can perform a click operation on any control on the display screen of the in-vehicle device 103, and the interface after the click operation is performed is further displayed on the display screen of the in-vehicle device 103.
- the user can also turn on the voice monitoring function by pressing the car control voice button of the car; for example, the user presses the car control voice button on the steering wheel to turn on the function of the in-vehicle device 103 to monitor and obtain the user's voice command, and the wake-up window 403-1 shown by the dashed box is displayed.
- the operation of the user clicking the voice ball icon 10 can trigger the in-vehicle device 103 to enable the voice monitoring function; or, the voice interaction function between the in-vehicle device 103 and the user can be enabled, that is, the in-vehicle device 103 is always monitoring the voice The state of the instruction; or, the user can further turn on the in-vehicle device 103 through other shortcut operations to be in the state of monitoring the voice instruction all the time, which is not limited in this embodiment of the present application.
- After the voice monitoring function is enabled, the wake-up window can always be displayed on the display screen of the in-vehicle device 103; or, the wake-up window may disappear temporarily and, when the user's voice command is detected again, be suspended and displayed on the display screen of the in-vehicle device 103 again; when no voice command of the user is detected within a preset time (for example, 2 minutes), the in-vehicle device 103 can automatically exit the monitoring function, which is not limited in this embodiment of the present application.
- In the embodiments of the present application, buttons, keys, switches, menus, options, pictures, lists, texts, etc. that are visible on the interface and can be clicked by the user are collectively referred to as "controls", which will not be repeated in the examples below.
- the voice instruction recommended to the user displayed in the wake-up window 403-1 may be instruction content associated with a control on the currently displayed interface 403 that can be clicked by the user.
- the content sensor of the mobile phone 102 can obtain the current state of each control and provide the user with a recommendation instruction according to the current state.
- the interface 403 includes the play button 20, and the recommended instruction in the wake-up window 403-1 may include "stop playing"; that is, "stop playing" can be understood as the state achieved after the play button 20 is clicked by the user. Similarly, song 2 on the interface 403 also includes the pause button 30, so the recommended instruction in the wake-up window 403-1 may include "play song 2"; that is, "play song 2" can be understood as the state achieved after the user clicks the pause button 30 of song 2.
- In another example, when the pause button 30 is displayed on all of songs 1 to 5, the mobile phone 102 obtains that the current interface does not include the play button 20; after the user wakes up the voice ball, the "start playing" instruction may be displayed in the wake-up window 403-1, but the "stop playing" instruction will not be displayed, which will not be described in detail later.
- the voice command recommended to the user displayed in the wake-up window 403-1 may also be a fixed recommended command of a certain application.
- the voice instruction recommended to the user displayed in the wake-up window 403-1 can be fixed as "stop playing", "start playing", etc., which is not limited in this embodiment of the present application.
- controls on the interface that can be clicked by the user can be divided into the following categories:
- Text controls contain textual information that can be recognized. Exemplarily, “daily recommendation”, “song list”, “top chart”, “radio station”, “song X” and “song list 1" as shown in (a) of FIG. 4 .
- the text information included in a text control may be directly identified by a content sensor of the application framework layer of the mobile phone 102. It should be understood that the music application actually runs in the background of the mobile phone 102, and the mobile phone 102 can acquire, in the background, the text information of the text controls projected and displayed on the display screen of the in-vehicle device 103.
- the voice command recommended to the user displayed in the wake-up window 403-1 may be related to the text control obtained above, such as “play song 2”, etc., which will not be repeated here. .
- Common web controls can include text input boxes (TextBox), drop-down boxes (DropList), date/time controls (Date/TimePicker), and so on.
- For example, a search control can be classified as a web control.
- the web control on the interface can be recognized by the content sensor of the mobile phone 102.
- the voice command recommended to the user displayed in the wake-up window 403-1 can be related to the web controls obtained above, such as the recommended command "search for songs".
- Picture controls are displayed as pictures on the interface, and each picture corresponds to a different descriptor. Exemplarily, as shown in (a) in FIG. 4 , the artist picture or album picture above the song 1, and the picture of “nostalgic classics” displayed above the song list 1 for identifying the list, etc.
- the content sensor of the mobile phone 102 can generalize the meaning of the picture by obtaining the description word of each picture, and provide the user with a recommendation instruction.
- For example, when the mobile phone 102 obtains a picture whose description word is "Zhang XX's song 1", a recommended instruction such as "play Zhang XX's song 1" may be displayed in the wake-up window 403-1.
- the song list 1 may include a plurality of songs, and "song list 1" can be classified as a list control. After a click operation is performed on the list control, the next-level interface entered may present to the user the multiple songs included in song list 1, and the music in song list 1 does not start playing.
- a switch control can be understood as a control with a switch function on the interface.
- the play button 20 and the pause button 30 can be divided into switch controls.
- the mobile phone 102 can obtain the controls on the current display interface 403, and further according to the obtained control types, descriptors and other information, determine the recommended instructions displayed in the wake-up window 403-1 to the user .
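- The following Java sketch summarizes how the control types listed above could be modeled and how state-dependent recommended instructions (such as "stop playing" when a playing switch control is present) might be derived; the class names, fields, and rules are illustrative assumptions, not the actual recommendation logic.

```java
import java.util.ArrayList;
import java.util.List;

enum ControlType { TEXT, WEB, PICTURE, LIST, SWITCH }

/** Minimal model of a control recognized from the interface (illustrative only). */
class UiControl {
    ControlType type;
    String description;      // e.g. "play button 20", "Zhang XX's song 1"
    boolean playing;         // meaningful only for SWITCH-type play/pause controls
    UiControl(ControlType t, String d, boolean p) { type = t; description = d; playing = p; }
}

class RecommendationBuilder {
    /** Derives recommended voice commands from the controls on the current interface. */
    static List<String> recommend(List<UiControl> controls) {
        List<String> tips = new ArrayList<>();
        for (UiControl c : controls) {
            if (c.type == ControlType.SWITCH) {
                // A control in the playing state suggests "stop playing", otherwise "start playing".
                tips.add(c.playing ? "stop playing" : "start playing");
            } else if (c.type == ControlType.WEB) {
                tips.add("search for songs");
            } else if (c.type == ControlType.PICTURE && c.description != null) {
                tips.add("play " + c.description);   // built from the picture's description word
            }
        }
        return tips;
    }
}
```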
- the embodiments of the present application may include more controls than the five types of controls listed above, and the embodiments of the present application will not exemplify them one by one.
- some controls may be divided into multiple control types at the same time, which is not limited in this embodiment of the present application.
- Table 1 lists the controls on several common audio application pages. As shown in Table 1 below, for audio applications such as NetEase Cloud Music, Kugou Music, Huawei Music, Himalaya, Baby Bus Story, Xiaobanlong Children's Song, which are commonly used by users, different pages may include different controls, and each application The number and types of controls included in the first-level page and the second-level page are different.
- the first-level page can be understood as the main interface of NetEase Cloud Music entered after the user clicks the NetEase Cloud Music application icon, including "daily recommendation", “My favorite music”, “Local Music”, “Private FM” and other page content
- the second-level interface is the next-level page that the user clicks on any menu or control on the main interface of the NetEase Cloud Music to enter, such as the playlist page, play page, etc.
- the page content on each page can be acquired by the mobile phone, and the text information included in each control can be recognized, which will not be repeated here.
- FIG. 5 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application.
- FIG. 5 shows that the screen display system of the in-vehicle device 103 displays an interface 501 currently output.
- song 1, song 2, song 3, song 4, and song 5 are all in the state of being paused, and the pause button 30 is displayed.
- the user clicks the voice ball icon 10 to enable the voice monitoring function of the in-vehicle device 103, and the display screen of the in-vehicle device 103 displays the interface shown in (b) in FIG. 5.
- the wake-up window 502-1 may include the voice recognition icon 40 and recommended voice commands.
- the recommended voice commands may be "start playing" and "next page", and so on.
- In response to the user's voice command, an interface 503 as shown in (c) in FIG. 5 can be displayed; the interface 503 is the interface after the click operation on song list 1 is performed.
- the interface 503 may include the following controls: return to the previous level, "song list 1—classic nostalgia", "play all", and the song names of song 6 and many other songs included in song list 1.
- the wake-up window is always suspended and displayed on the display screen of the in-vehicle device 103 .
- the interface 503 shown in (c) of FIG. 5 includes a wake-up window 503-1.
- the instructions recommended to the user in the wake-up window 503-1 may be changed according to the controls included in the current interface 503, for example, voice instructions such as “play all” and “next page” are displayed.
- an interface 504 can be displayed, which is the interface after the click operation on the "play all” control is performed.
- the "play all" control is displayed in the playing state, and playback starts from the first song (song 6) arranged in song list 1; a sound icon 50 is displayed at the location of the first song to identify the source of the sound as song 6, that is, song 6 is the song currently being played, which is not limited in this embodiment of the present application.
- In the method for human-computer interaction provided by the embodiments of the present application, by obtaining the controls displayed on the interface that are visible and can be clicked by the user, the user can input a voice command to perform a click or other operation on any control on the interface. All applications and all visible content on the display can be controlled by the user with voice commands. In particular, in the driving scenario, the manual operation of the user is reduced, thereby avoiding user distraction and improving the safety of the user in the driving scenario.
- FIG. 6 is a schematic interface diagram of another example of a process of implementing voice interaction on an in-vehicle device provided by an embodiment of the present application.
- FIG. 6 shows that the screen display system of the in-vehicle device 103 displays an interface 601 currently output.
- song 1, song 2, song 3, song 4, and song 5 are all in the state of being paused, and the pause button 30 is displayed.
- the wake-up window 602-1 is shown in the figure.
- the wake-up window 602-1 may include the voice recognition icon 40, recommended voice commands such as "start playing" and "next page”.
- the mobile phone 102 can obtain the content of the interface, determine whether the interface includes icons (or pictures), and add digital corner marks to the icons in a certain order.
- the user's operation of enabling the voice monitoring function of the in-vehicle device 103 may trigger the addition of digital corner marks to the icons in the interface; or, before the user enables the voice monitoring function of the in-vehicle device 103, other preset operations may be used to trigger the addition of digital corner marks to the icons in the interface, which is not limited in this embodiment of the present application.
- the icons with added digital corner marks mentioned in the embodiments of the present application may also include application icons of different applications.
- For example, the home icon, navigation icon, phone icon, music icon, etc. in the navigation menu area of the interface 601 shown in (a) of FIG. 6.
- the icons to which digital corner marks are added, as mentioned in the embodiments of the present application, may also include pictures displayed on the interface 601, for example, the singer picture of song 1, the picture of song list 1, etc.; the pictures included on the interface are marked with digital corner marks.
- For example, when a song or a song list in the content area of the interface 601 is displayed in a foreign language, the user may not be able to accurately issue a voice command including the song name; by marking the pictures of different songs or the pictures of the song list with digital corner marks, the user can perform operations through voice commands containing the digital corner marks, which is convenient and quick and improves user experience.
- the icons to which digital corner marks are added, as mentioned in the embodiments of the present application, may also include controls such as buttons displayed on the interface 601, which is not limited in the embodiments of the present application.
- the display size of the numeric subscript can be adapted to the size of the application icon displayed on the interface of the in-vehicle device 103 .
- If an application icon on the interface is small, adding a digital corner mark may cause the corner mark to be too small for the user to read accurately. Therefore, when an application icon is small, for example, when the pixels it occupies on the display screen of the in-vehicle device are less than or equal to preset pixels, the application icon may not be marked, and only application icons larger than the preset pixels are marked, which is not limited in this embodiment of the present application.
- the mobile phone 102 acquires the interface 601 as shown in (a) of FIG. 6 , it is determined that the interface 601 includes icons of different songs and icons of different song lists.
- the mobile phone 102 can add a digital corner mark 60 as shown in (b) of FIG. 6 to each icon according to the arrangement order of the icons on the interface, from left to right and from top to bottom; for example, a digital corner mark 1 is added to song 1, a digital corner mark 2 is added to song 2, and so on, until digital corner marks 60 are added to all the icons on the music interface.
- In the above process, the content sensor of the mobile phone 102 obtains the content on the interface, the HiCar application 15 installed on the mobile phone 102 obtains the interface content from the content sensor, and the HiCar application 15 judges whether the current interface has icons according to the interface content.
- If there are icons, a digital corner mark 60 is added to each icon in a certain order, which is not limited in this embodiment of the present application.
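- A minimal sketch of the corner-mark assignment just described, assuming the screen bounds of each icon control are already known: icons smaller than a preset pixel threshold are skipped, the rest are sorted left-to-right and top-to-bottom, and numbered from 1. The threshold value and all names are assumptions for illustration.

```java
import android.graphics.Rect;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Assigns numeric corner marks to icon controls in reading order (illustrative sketch). */
final class CornerMarker {
    private static final int MIN_SIDE_PX = 48;   // assumed "preset pixels" threshold

    /** @param iconBounds screen bounds of each icon-type control on the current interface */
    static Map<Integer, Rect> assignMarks(List<Rect> iconBounds) {
        List<Rect> eligible = new ArrayList<>();
        for (Rect r : iconBounds) {
            if (r.width() >= MIN_SIDE_PX && r.height() >= MIN_SIDE_PX) {
                eligible.add(r);                 // too-small icons are left unmarked
            }
        }
        // Left-to-right, top-to-bottom: sort by top edge first, then by left edge.
        eligible.sort((a, b) -> a.top != b.top ? Integer.compare(a.top, b.top)
                                               : Integer.compare(a.left, b.left));
        Map<Integer, Rect> marks = new LinkedHashMap<>();
        int number = 1;
        for (Rect r : eligible) {
            marks.put(number++, r);              // corner mark number -> icon position
        }
        return marks;
    }
}
```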
- After the digital corner marks are added, the user can input a voice command including a digital corner mark, and use the voice command to perform a click operation on the picture marked with that digital corner mark. Exemplarily, as shown in (b) of FIG. 6, after the pictures included on the interface of the in-vehicle device 103 are marked with digital corner marks, the user can input a voice command containing the corresponding number, such as "1" or "play 1". In response to the voice command input by the user, the mobile phone 102 can perform a click operation on song 1 marked as 1 in the background and display the interface 603 shown in (c) in FIG. 6, on which the pause button 30 of song 1 changes to the play button 20, and the in-vehicle device 103 starts to play song 1.
- the user speaks a corresponding number, such as the number 1, through a voice command; the intelligent voice 13 of the application layer recognizes the user's voice command and converts it into the text "1".
- the content sensor of the application framework layer extracts the content of the current interface, analyzes the control content from the visible part, and obtains the text information of the controls; the recognized control information is then matched with the text "1" returned by the smart voice.
- the click operation is performed on the icon of song 1, and the click event on the icon of song 1 is transmitted to the business logic of the music application itself, so as to realize the jump of the corresponding business logic.
- the HiCar application 15 ends this round of voice recognition and exits the voice recognition function of the smart voice; the voice ball icon 10 returns to the static state shown in (c) in FIG. 6, and the wake-up window 602-1 and the recommended voice commands disappear.
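- To illustrate the digit-command flow just described, the sketch below looks up the recognized number in a corner-mark table (such as the one produced in the earlier sketch) and simulates a tap at the marked icon's position. Dispatching the tap through AccessibilityService#dispatchGesture is only one possible Android mechanism and is an assumption here, not necessarily how the background click is actually performed.

```java
import android.accessibilityservice.AccessibilityService;
import android.accessibilityservice.GestureDescription;
import android.graphics.Path;
import android.graphics.Rect;
import java.util.Map;

/** Maps a recognized digit (e.g. "1" or "play 1") to the marked icon and simulates a tap on it. */
final class DigitCommandExecutor {
    static boolean execute(AccessibilityService service, String recognizedText,
                           Map<Integer, Rect> cornerMarks) {
        Integer number = parseDigit(recognizedText);        // "play 1" -> 1
        if (number == null || !cornerMarks.containsKey(number)) return false;

        Rect target = cornerMarks.get(number);
        Path tap = new Path();
        tap.moveTo(target.centerX(), target.centerY());     // tap at the icon centre
        GestureDescription gesture = new GestureDescription.Builder()
                .addStroke(new GestureDescription.StrokeDescription(tap, 0, 50))
                .build();
        return service.dispatchGesture(gesture, null, null);
    }

    private static Integer parseDigit(String text) {
        java.util.regex.Matcher m = java.util.regex.Pattern.compile("\\d+").matcher(text);
        return m.find() ? Integer.valueOf(m.group()) : null;
    }
}
```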
- In the process of adding digital corner marks, digital corner marks may be added, according to certain principles, to some of the one or more controls on the current interface rather than to all of them. The part of the controls may include: controls identified as picture-type controls among the one or more controls of the current interface; or controls identified as having a grid-type arrangement order among the one or more controls of the current interface; or controls identified as having a list-type arrangement order among the one or more controls of the current interface; or controls among the one or more controls of the current interface whose display size is greater than or equal to a preset value.
- the numeric subscript may be added to some controls in the one or more controls according to a preset sequence.
- the preset order includes a left-to-right and/or a top-to-bottom order.
- the outline of the icon controls can be acquired, and the outline keywords describing the icon controls can be determined according to the outlines; the matching degree between one or more keywords included in the voice instruction and the outline keywords of the icon controls is determined, and the icon control with the largest matching degree is determined as the target control.
- For example, on the music playing interface, when the user inputs the voice command "I like", during the matching process between the voice command and the controls included in the interface, if the description words of the favorite button on the music playing interface are "like" and "favorite", and the outline of the favorite button is the shape of a "peach heart", then the peach-heart shape can be matched with "I like". This method can generalize the user's voice command and match the user's command with the controls on the interface more intelligently.
- In the matching process, strong matching is given priority; that is, the control information and the voice command text recognized by the smart voice need to be in one-to-one correspondence. If the strong match is unsuccessful, a weak match is performed; that is, it is judged whether the control information contains the voice command text recognized by the smart voice. As long as the control information contains part of the voice command text, it is judged that the matching is successful, and a click operation is performed on the control corresponding to the control information.
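- A minimal sketch of the strong-then-weak matching just described, with a simple keyword-overlap score that also covers the outline-keyword example above; the class, fields, and scoring are illustrative assumptions rather than the actual matching algorithm.

```java
import java.util.List;

/** Strong match first (exact text), then weak match (containment / keyword overlap). */
final class ControlMatcher {
    /** Minimal view of a recognized control: its display text plus description/outline keywords. */
    static final class ControlInfo {
        final String text;
        final List<String> keywords;
        ControlInfo(String text, List<String> keywords) { this.text = text; this.keywords = keywords; }
    }

    static ControlInfo match(String commandText, List<ControlInfo> controls) {
        // Strong match: the control text corresponds one-to-one with the recognized command text.
        for (ControlInfo c : controls) {
            if (commandText.equalsIgnoreCase(c.text)) return c;
        }
        // Weak match: the control information contains part of the command text (or vice versa),
        // plus a keyword-overlap score covering cases like "peach heart" outline vs. "I like".
        ControlInfo best = null;
        int bestScore = 0;
        for (ControlInfo c : controls) {
            int score = 0;
            if (c.text != null && (c.text.contains(commandText) || commandText.contains(c.text))) score += 2;
            for (String k : c.keywords) {
                if (commandText.contains(k) || k.contains(commandText)) score++;
            }
            if (score > bestScore) { bestScore = score; best = c; }
        }
        return best;                                        // null when no control matches
    }
}
```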
- In the method for human-computer interaction provided by the embodiments of the present application, a digital corner mark is added to clickable controls such as pictures and application icons displayed on the interface, and the user can issue a voice command including a number to perform a click operation on the control marked with that digital corner mark. When the user sees the digital corner mark on the interface, he or she sends a voice command including the number; the voice command is converted through voice recognition, so that the picture, application icon, or other control corresponding to the number is determined and the click operation is performed.
- the user does not need to memorize a variety of complex voice commands, and only realizes the voice interaction process through digital voice commands, which is simpler and more convenient, reduces the difficulty of voice interaction, and improves user experience.
- FIG. 7 is a schematic interface diagram of another example of a voice interaction process implemented on a vehicle-mounted device provided by an embodiment of the present application.
- the navigation menu area 401 of the screen display system displays navigation menus such as home page, navigation, phone and music, and switching between different navigation menus can also be controlled by the user's voice commands.
- the process of jumping from the music interface shown in (c) of FIG. 6 to the navigation interface from the screen display interface of the in-vehicle device 103 can also be implemented by voice commands.
- the screen display system of the in-vehicle device 103 displays the currently output interface 701 .
- song 1 is displayed in a playing state, and song 2, song 3, song 4, and song 5 are all in a paused state, on each of which a pause button 30 is displayed.
- the wake-up window 702-1 is shown in the figure.
- the wake-up window 702-1 may include the voice recognition icon 40, recommended voice commands such as "start search" and "next page”.
- the voice commands recommended in the wake-up window 702-1 may be different from the recommended voice commands displayed in the wake-up window 403-1 shown in (b) of FIG. 4, the wake-up window 502-1 shown in (b) of FIG. 5, the wake-up window 503-1 shown in (c) of FIG. 5, and so on. The recommended voice commands displayed in the wake-up window can change correspondingly with the display content on the current interface; voice commands related to the displayed content on the current interface may be displayed, or voice commands not related to the displayed content on the current interface may also be displayed, which is not limited in this embodiment of the present application.
- the voice instruction can be sent to the mobile phone 102.
- the mobile phone 102 recognizes the user's voice command and, in response to the voice command, enables the voice interaction function between the in-vehicle device 103 and the user, that is, the in-vehicle device 103 is always in the state of monitoring voice commands, and the user does not need to activate the in-vehicle device 103 multiple times to monitor and acquire the user's voice commands.
- the display interface of the display screen of the in-vehicle device 103 can jump from the music menu to the interface 703 of the navigation menu.
- the user can be provided with various types of search options including "food”, “gas station”, “shopping mall”, etc. shown in the right area.
- the interface content of the interface 703 of the navigation menu is not repeated here.
- the wake-up window may disappear briefly, and the user's voice instruction is monitored in the background.
- When the voice command issued by the user is detected again, the wake-up window can be suspended and displayed on the display screen again.
- the user starts to issue a voice command, and the wake-up window 704-1 appears.
- the recommended instruction displayed in the wake-up window 704-1 may be adapted to the current interface content, or the recommended instruction may be associated with historical data with the highest search frequency when the user uses the navigation application.
- the wake-up window 704-1 may include the voice recognition icon 40, and recommended voice commands such as “navigate to the company” and “navigate to the mall”, which are not limited in this embodiment of the present application.
- When the user inputs the voice command "search for food", after the in-vehicle device 103 obtains the user's command, it can send the voice command to the mobile phone 102; the mobile phone 102 recognizes the user's voice command and, in response to the voice command, simulates clicking the "food" option on the interface 704 shown in (d) of FIG. 7, and the search result interface 705 shown in (e) of FIG. 7 is displayed for the user.
- multiple searched restaurants are displayed on the interface 705, and the restaurants can be sorted according to the distance from the user's current location, and the per capita unit price and distance of the restaurant are displayed for the user, which is not limited in this embodiment of the present application. .
- the recommended instruction displayed in the wake-up window 705-1 displayed on the interface 705 can be re-adapted to the current interface content.
- For example, the wake-up window 705-1 can include the voice recognition icon 40 and recommended voice commands such as "start search" and "next page", which are not limited in this embodiment of the present application.
- a search result interface 706 as shown in (f) of FIG. 7 is displayed for the user. It should be understood that the interface 706 is an interface displayed after performing a swipe on the interface 705 as indicated by the black arrow.
- If the user selects the target restaurant "5. XX light food restaurant", he or she can continue to input the voice command "navigate to 5"; in response, the navigation route interface 707 shown in (g) in FIG. 7 can be displayed for the user, and the interface 707 includes the route and distance to the XX light food restaurant, etc., which is not limited in this embodiment of the present application.
- the user's voice command includes "5".
- the voice command can be sent to the mobile phone 102.
- The mobile phone 102 may perform interface matching according to the instruction, that is, intercept the keywords in the instruction and match them with the keywords or description information contained in the controls on the current interface.
- the keywords of the user instruction are “navigation” and "5"
- the mobile phone detects that the keywords of the option “5.XX light food restaurant” on the interface are “5", "light food restaurant”, etc.
- The matching degree between the user instruction and this option is the highest, so a click operation on the "5. XX light food restaurant" option on the interface 706 is executed, and the interface 707 shown in (g) in FIG. 7 is displayed.
- The above method obtains the text controls, picture controls, buttons, and icon controls that are visible on the interface and can be clicked by the user, then matches the target control on the interface according to the obtained user voice command, and performs an operation such as clicking on the matched target control on the interface.
- Table 2 shows several common controls on pages of navigation applications. As shown in Table 2 below, for navigation applications such as Baidu Map and AutoNavi Map that are commonly used by users, different pages may include different controls, and the number and types of controls included in the first-level and second-level pages of each application are all different.
- the first-level page can be understood as the main interface of Baidu map entered by the user after clicking the Baidu map application icon, including "zoom in”, “zoom out”, “positioning”, “road conditions”, “Search”, “More”, “Exit” and other controls
- the second-level interface is the next-level page that the user clicks on any menu or control on the main interface of Baidu Maps to enter, such as the route preference setting page, etc.
- the page content and controls on each page can be acquired by the mobile phone, and the text information included in each control can be recognized, which will not be repeated here.
- the general instruction controls may include controls on the interface, such as return, turn left/turn right, turn up/down, page up/page down, and the like.
- After the user's voice command is recognized as text, the text is sent to the visible-to-speak module.
- For a "return" instruction, the click event (key event) of the return key is sent to the application to which the current interface belongs, and the application, by monitoring the return key event, receives the corresponding return event and processes the return service.
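- On Android, one possible way to deliver such a return event is the global back action of an accessibility service, sketched below as an assumption; it differs from injecting the key event directly, but the foreground application still receives a standard back event and handles it through its own return logic.

```java
import android.accessibilityservice.AccessibilityService;

/** Delivers a "return" voice instruction as a system back action (one possible mechanism). */
final class ReturnInstructionHandler {
    static boolean handleReturn(AccessibilityService service) {
        // The foreground application receives the back event and processes its own return logic,
        // just as if the user had pressed the hardware/navigation back key.
        return service.performGlobalAction(AccessibilityService.GLOBAL_ACTION_BACK);
    }
}
```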
- For sliding instructions, the corresponding sliding list control is identified from the interface controls returned by the content sensor, and the sliding method of the control itself, such as the scrollBy sliding method of RecyclerView, is called to implement up-and-down sliding. Whether left-and-right sliding is possible depends on whether the control itself supports the left-and-right sliding feature. If the control supports left-and-right sliding, the distance moved in the horizontal direction is passed into the called scrollBy sliding method, and the positive or negative value is used to determine left or right sliding; if the control supports up-and-down sliding, the distance moved in the vertical direction is passed in, and the positive or negative value is used to determine whether to slide up or down.
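- As an illustration, RecyclerView#scrollBy(dx, dy) accepts horizontal and vertical pixel offsets, and the sign of each offset selects the direction, as described above. The instruction-to-sign mapping and the step size below are assumptions for this sketch.

```java
import androidx.recyclerview.widget.RecyclerView;

/** Translates sliding voice instructions into scrollBy calls; the sign of the offset picks the direction. */
final class SlideExecutor {
    private static final int STEP_PX = 600;   // assumed sliding distance per instruction

    static void slide(RecyclerView list, String instruction) {
        switch (instruction) {
            case "page down":   list.scrollBy(0,  STEP_PX); break;   // positive dy scrolls toward later items
            case "page up":     list.scrollBy(0, -STEP_PX); break;   // negative dy scrolls back
            case "slide left":  list.scrollBy( STEP_PX, 0); break;   // only if the list scrolls horizontally
            case "slide right": list.scrollBy(-STEP_PX, 0); break;
            default: /* not a sliding instruction */ break;
        }
    }
}
```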
- FIG. 8 is a schematic interface diagram of another example of a voice interaction process implemented on an in-vehicle device provided by an embodiment of the present application. Switching between different navigation menus can also be controlled by a user's voice command.
- FIG. 8 shows the process that the screen display interface of the in-vehicle device 103 jumps from the navigation route interface shown in (g) in FIG. 7 to the phone menu, and this process can also be implemented by the user's voice command.
- the screen display system of the in-vehicle device 103 displays the currently output navigation route interface 801 .
- the wake-up window on the interface 801 displays the voice recognition icon 40, recommended voice commands such as "exit navigation" and "search”.
- a phone application interface 802 as shown in (b) in FIG. 8 is displayed for the user.
- The interface 802 may include submenus such as call records, contacts, and dialing; the interface 802 currently displays content such as the user's call records, which will not be repeated here.
- the user can input voice commands to perform operations such as clicking on any control on the interface.
- All apps and all visible content on the display can be controlled by the user with voice commands.
- the manual operation of the user is reduced, thereby avoiding user distraction and improving the safety of the user in the driving scene.
- FIG. 9 is a schematic flowchart of a method for voice interaction provided by an embodiment of the present application. As shown in FIG. 9 , the method 900 may include the following steps:
- the user opens the first application.
- the first application may be an application actually running on the side of the mobile phone 102 , for example, an application running in the foreground or an application running in the background of the mobile phone 102 .
- this step 901 can be performed by the user on the side of the in-vehicle device 103 and transmitted back to the mobile phone 102 by the in-vehicle device 103 to start the first application in the background of the mobile phone 102; or the user can perform it on the side of the mobile phone 102, and the screen is directly projected and displayed on the display screen of the in-vehicle device 103, which is not limited in this embodiment of the present application.
- the first application performs interface refresh.
- performing interface refresh by the first application may trigger the mobile phone 102 to perform interface identification through an algorithm service.
- the mobile phone 102 performs interface hot word recognition to obtain information of the interface content. It should be understood that the time delay of the interface hot word recognition process in step 904 is less than 500 milliseconds.
- the interface content may include user-visible portions of the currently displayed interface.
- the user-visible part may include pictures, text, menus, options, icons, buttons, etc. displayed on the interface, which are collectively referred to as "controls" and the like in this embodiment of the present application.
- an operation may be performed on the target control.
- the operation may include input operations such as clicking, clicking, double-clicking, sliding, and right-clicking.
- the voice command is matched with the target control on the interface, that is, the user's intention is recognized, and the click operation on the target control is further performed.
- the user activates the voice recognition function.
- Starting the voice recognition function may mean that the in-vehicle device 103 starts monitoring the user's voice command; after the voice command is acquired, the command is transmitted back to the mobile phone 102, and the mobile phone 102 analyzes the voice command, etc., which is not limited in this embodiment of the present application.
- the user can activate the voice recognition function through a physical button of the in-vehicle device or through voice.
- the display interface of the in-vehicle device 103 may also include a voice ball icon, as shown in (a) in FIG. 4, and the user may tap the voice ball icon to activate the voice recognition function.
- the in-vehicle device 103 may display a wake-up window 403-1 as shown in (b) of FIG. 4 , which will not be repeated here.
- the user can also turn on the voice monitoring function by pressing the car control voice button of the car, for example, the user presses the car control voice button 50 on the steering wheel as shown in (b) in FIG. 1 to turn on the voice monitoring function.
- the in-vehicle device 103 has a function of monitoring and acquiring a user's voice command, which is not limited in this embodiment of the present application.
- the HiCar application of the mobile phone transmits the acquired information of the interface content to the smart voice service module.
- the smart voice service module may correspond to a smart voice application installed on the mobile phone 102, that is, the smart voice application of the mobile phone 102 executes the service process provided in FIG. 9 .
- the service corresponding to the smart voice service module may be provided by the server, and this scenario may correspond to (c) in FIG. 1.
- with the help of the voice analysis capability of the server 104, the mobile phone 102 sends the user's voice command to the server 104; after the server 104 analyzes the voice command, it returns the recognition result of the voice command to the mobile phone 102, which will not be repeated here.
- the user inputs a voice command.
- the mobile phone sends the voice command to the smart voice service module.
- the process of steps 909 and 910 may be that the user inputs a voice command on the side of the in-vehicle device 103; after the microphone of the in-vehicle device 103 obtains the user's voice command, the voice command is sent to the HiCar application of the mobile phone, which then passes it to the smart voice service module, and the smart voice service module analyzes the user's voice command.
- the smart voice service module transmits the acquired information of the user's voice command and interface content to the ASR module.
- the ASR module performs enhanced recognition of the user's voice command according to the information of the interface content.
- the acquired information of the currently displayed interface content can be transferred to the ASR module in synchronization with step 908; that is, the information of the interface content is entered into the ASR model.
- the user's voice command is recognized according to the updated ASR model.
- the user's voice command may include homophones or near-homophones. For example, when the user inputs "variety show" (pinyin "zongyi"), differences in the pronunciation of different users may lead the ASR module to several similar-sounding candidates when it analyzes the command.
- such homophones and words with similar pinyin may cause the mobile phone to fail to accurately obtain the user's operation intention from the voice command alone.
- however, if the current interface displays a large amount of audio information, star photos, video information, and the like,
- then when the ASR module performs its analysis, it selects, from possible recognition results such as "traditional Chinese medicine" (zhongyi), "loyalty" (zhongyi), and "variety show" (zongyi), the candidate "variety show" that is most relevant to the currently displayed audio information, star photos, and video information, and then determines that the voice command issued by the current user is "variety show".
- compared with an existing ASR module, the information of the currently displayed interface content is introduced into the voice command recognition process, so that the usage scenario of the user's current voice command can be accurately analyzed according to the information of the currently displayed interface content, the application scene targeted by the current user's voice command can be accurately located, and the accuracy of recognizing the voice command is improved.
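- As a rough illustration (not the actual ASR model) of how interface content can bias recognition, the sketch below rescores a hypothetical N-best candidate list by boosting candidates that overlap hot words taken from the current interface, so that "variety show" wins over its homophones when the interface is full of media content; the candidates, scores, and boost weight are assumptions for this example.

```kotlin
// Illustrative sketch only: biasing the choice among ASR candidates with hot words
// taken from the current interface. The candidates, scores, and boost value are
// assumptions; a real ASR module would produce its own N-best list and scores.
data class Candidate(val text: String, val baseScore: Double)

fun pickCandidate(
    candidates: List<Candidate>,
    interfaceHotWords: Set<String>,
    boost: Double = 1.0
): String =
    candidates.maxByOrNull { c ->
        // Reward a candidate for every interface hot word it contains.
        c.baseScore + interfaceHotWords.count { hw -> c.text.contains(hw, ignoreCase = true) } * boost
    }!!.text

fun main() {
    // Homophone-style ambiguity: several candidates with similar base scores.
    val nBest = listOf(
        Candidate("traditional Chinese medicine", 0.33),
        Candidate("loyalty", 0.34),
        Candidate("variety show", 0.32)
    )
    // Hot words harvested from a media-style interface full of shows and videos.
    val hotWords = setOf("variety show", "video", "playlist")
    println(pickCandidate(nBest, hotWords)) // -> variety show
}
```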
- the ASR module returns the analyzed voice command text to the HiCar application of the mobile phone.
- the HiCar application of the mobile phone sends a voice command text to the algorithm service module.
- the mobile phone uses a certain algorithm service to match the text of the voice command with the information of the current interface content to determine the matching result.
- the smart voice service module can perform steps 914-1 to 919-1 shown by the dotted box in FIG. 9:
- the NLU module of the smart voice service can also obtain the text of the voice command.
- the NLU module of the smart voice service performs intention recognition according to the voice command text, and determines the user's intention corresponding to the voice command text.
- the DM module may perform intent processing according to the returned user intent and determine the user intent corresponding to the user's voice command.
- the smart voice service module returns the user intent to the HiCar application of the mobile phone.
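- A minimal sketch of the intent-recognition and dispatch idea behind the NLU and DM steps above might look as follows; the Intent hierarchy and the matching rules are illustrative assumptions, not the smart voice service's actual implementation.

```kotlin
// Illustrative sketch only: intent recognition (NLU) followed by a simple
// dialog-management decision. The Intent hierarchy and matching rules are
// assumptions for this example.
sealed class Intent {
    data class ClickControl(val controlName: String) : Intent()
    data class GlobalCommand(val command: String) : Intent()
    object Unknown : Intent()
}

fun recognizeIntent(text: String, interfaceControls: Set<String>): Intent {
    // "What you see is what you can say": a control name mentioned in the command
    // is treated as a click intent on that control.
    val matched = interfaceControls.firstOrNull { text.contains(it, ignoreCase = true) }
    return when {
        matched != null -> Intent.ClickControl(matched)
        text.contains("exit navigation", ignoreCase = true) -> Intent.GlobalCommand("exit navigation")
        else -> Intent.Unknown
    }
}

fun main() {
    val controls = setOf("Call records", "Contacts", "Dialing")
    println(recognizeIntent("open contacts", controls)) // -> ClickControl(controlName=Contacts)
}
```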
- steps 914-1 to 919-1 shown in the dotted box may be optional steps; this process can be understood as accurately analyzing the user's intention with the help of a powerful speech recognition capability, such as that of a server, and then responding on the mobile phone side.
- the HiCar application of the mobile phone determines whether to execute the operation corresponding to the user's intent, which improves the accuracy of voice command recognition.
- the above process can be understood as a process of determining the user's intention according to the voice command after acquiring the user's voice command, for example, determining which control on the current interface is to be clicked by the voice command currently input by the user.
- the matching degree between the voice command and each of the one or more controls is determined, and the control with the largest matching degree is determined as the target control on which the user intends to perform the click operation.
- specifically, one or more keywords contained in the user's voice command may be extracted, and the matching degree may be determined according to the one or more keywords and the description information of each control.
- the keywords may include characters, words, or the pinyin of part or all of the Chinese characters of the voice command, which is not limited in this embodiment of the present application.
- the description information of each control may include outline information, text information, color information, position information, icon information, etc. of the control, which is not limited in this embodiment of the present application.
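- Assuming such description information is available, a simple (purely illustrative) way to compute the matching degree and select the target control could look like the Kotlin sketch below; the ControlDesc fields and scoring weights are assumptions for this example rather than the embodiment's actual algorithm.

```kotlin
// Illustrative sketch only: computing a matching degree between the keywords of the
// voice command and the description information of each control, then selecting the
// control with the largest matching degree as the target.
data class ControlDesc(
    val text: String,          // text information of the control
    val iconName: String = "", // icon information of the control
    val x: Int = 0,            // position information of the control
    val y: Int = 0
)

fun matchingDegree(keywords: List<String>, control: ControlDesc): Int {
    var score = 0
    for (kw in keywords) {
        if (control.text.contains(kw, ignoreCase = true)) score += 2     // text hits weigh most
        if (control.iconName.contains(kw, ignoreCase = true)) score += 1
        // A fuller implementation could also compare the pinyin of kw and control.text
        // to tolerate homophones, as described in the embodiment.
    }
    return score
}

fun pickTarget(keywords: List<String>, controls: List<ControlDesc>): ControlDesc? =
    controls.maxByOrNull { matchingDegree(keywords, it) }
        ?.takeIf { matchingDegree(keywords, it) > 0 }   // no overlap at all -> no target control

fun main() {
    val controls = listOf(
        ControlDesc("Play All"),
        ControlDesc("Variety Show Channel", iconName = "tv"),
        ControlDesc("Settings", iconName = "gear")
    )
    println(pickTarget(listOf("variety", "show"), controls)) // -> the "Variety Show Channel" control
}
```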
- the HiCar application of the mobile phone determines whether to execute the operation corresponding to the user's intention.
- the smart voice service module ends the current conversation according to the notification message that the user instruction is not executed.
- the method obtains the controls displayed on the interface that are visible and that can be clicked by the user, and then the user can input voice commands to perform operations such as clicking on any control on the interface. All apps and all visible content on the display can be controlled by the user with voice commands.
- the acquired interface content information of the current interface is used as a parameter of the ASR analysis; that is, according to the interface content information of the current interface, the application scenario in which the current user's voice command may occur is accurately analyzed.
- the text of the recognized voice command is then matched with the controls in that possible application scenario, so as to obtain the user's intention more accurately and improve the accuracy of speech recognition in the voice interaction scenario.
- the embodiment of the present application can analyze the user's voice command in combination with the current application scenario where the user's voice command may occur.
- the accuracy of speech recognition is improved, which reduces the user's manual operations, avoids user distraction, and improves the user's safety in driving scenarios.
- the electronic device includes corresponding hardware and/or software modules for executing each function.
- the present application can be implemented in hardware, or in a combination of hardware and computer software, in conjunction with the algorithm steps of each example described in the embodiments disclosed herein. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functionality for each particular application, but such implementations should not be considered beyond the scope of this application.
- the electronic device can be divided into functional modules according to the above method examples.
- each functional module may be obtained through division corresponding to each function, or two or more functions may be integrated into one processing module.
- the above-mentioned integrated modules can be implemented in the form of hardware. It should be noted that, the division of modules in this embodiment is schematic, and is only a logical function division, and there may be other division manners in actual implementation.
- the electronic device involved in the above embodiment may include: a display unit, a detection unit, and a processing unit.
- the display unit, the detection unit, and the processing unit cooperate with each other, and may be used to support the electronic device to perform the technical process described in the above embodiments.
- the electronic device provided in this embodiment is used to execute the above-mentioned method for human-computer interaction, and thus can achieve the same effect as the above-mentioned implementation method.
- the electronic device may include a processing module, a storage module, and a communication module.
- the processing module may be used to control and manage the actions of the electronic device, for example, may be used to support the electronic device to perform the steps performed by the display unit, the detection unit and the processing unit.
- the storage module may be used to support the electronic device in storing program code, data, and the like.
- the communication module can be used to support the communication between the electronic device and other devices.
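- As an illustrative sketch only, this module division could be expressed as the following Kotlin interfaces; the names and method signatures are assumptions for this example rather than the embodiment's actual modules.

```kotlin
// Illustrative sketch only: one possible expression of the processing / storage /
// communication module division.
interface ProcessingModule {
    fun handleVoiceCommand(text: String)
}

interface StorageModule {
    fun save(key: String, value: String)
    fun load(key: String): String?
}

interface CommunicationModule {
    fun send(peer: String, payload: String)
}

class ElectronicDevice(
    private val processing: ProcessingModule,
    private val storage: StorageModule,
    private val communication: CommunicationModule
) {
    fun onVoiceCommand(text: String) {
        processing.handleVoiceCommand(text)            // control and manage the action
        storage.save("lastCommand", text)              // keep program data
        communication.send("in-vehicle device", text)  // interact with another device
    }
}

fun main() {
    val device = ElectronicDevice(
        processing = object : ProcessingModule {
            override fun handleVoiceCommand(text: String) = println("process: $text")
        },
        storage = object : StorageModule {
            private val data = mutableMapOf<String, String>()
            override fun save(key: String, value: String) { data[key] = value }
            override fun load(key: String) = data[key]
        },
        communication = object : CommunicationModule {
            override fun send(peer: String, payload: String) = println("send to $peer: $payload")
        }
    )
    device.onVoiceCommand("open contacts")
}
```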
- the processing module may be a processor or a controller. It may implement or execute the various exemplary logical blocks, modules and circuits described in connection with this disclosure.
- the processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of digital signal processing (DSP) and a microprocessor, and the like.
- the storage module may be a memory.
- the communication module may specifically be a device that interacts with other electronic devices, such as a radio frequency circuit, a Bluetooth chip, and a Wi-Fi chip.
- the electronic device involved in this embodiment may be a device having the structure shown in FIG. 2 .
- This embodiment also provides a computer-readable storage medium, where computer instructions are stored in the computer-readable storage medium; when the computer instructions are executed on the electronic device, the electronic device executes the above-mentioned related method steps to realize the method for human-computer interaction in the above-mentioned embodiments.
- This embodiment also provides a computer program product which, when run on a computer, causes the computer to execute the above-mentioned relevant steps, so as to realize the method for human-computer interaction in the above-mentioned embodiments.
- the embodiments of the present application also provide an apparatus, which may specifically be a chip, a component, or a module; the apparatus may include a processor and a memory that are connected, where the memory is used for storing computer-executable instructions, and when the apparatus runs, the processor can execute the computer-executable instructions stored in the memory, so that the chip executes the method for human-computer interaction in the foregoing method embodiments.
- the electronic device, computer-readable storage medium, computer program product, or chip provided in this embodiment is used to execute the corresponding method provided above; therefore, for the beneficial effects that can be achieved, reference may be made to the beneficial effects of the corresponding method provided above, which will not be repeated here.
- the disclosed apparatus and method may be implemented in other manners.
- the apparatus embodiments described above are only illustrative.
- the division of modules or units is only a logical function division. In actual implementation, there may be other division methods.
- for example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
- the shown or discussed mutual coupling, direct coupling, or communication connection may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be in electrical, mechanical, or other forms.
- Units described as separate components may or may not be physically separated, and components shown as units may be one physical unit or multiple physical units, that is, may be located in one place, or may be distributed in multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
- the integrated unit if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a readable storage medium.
- the readable storage medium includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods in the various embodiments of the present application.
- the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- General Health & Medical Sciences (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The present invention relates to a human-computer interaction method (900), an electronic device (100) and a system. The method (900) can be applied to electronic devices (100) such as a smart screen (101), or to a system comprising a mobile phone (102) and an in-vehicle device (103). A text control, an image control, buttons (20, 30), an icon control, etc. that are displayed on interfaces (403, 404, 501 to 504, 601 to 603, 701 to 707, 801, 802) and that are visible and can be clicked by a user are acquired, so that operations such as clicking on any control on the interfaces (403, 404, 501 to 504, 601 to 603, 701 to 707, 801, 802) are performed according to a user voice instruction. In addition, during the process of matching a voice instruction with a control, combined with content information on the interfaces (403, 404, 501 to 504, 601 to 603, 701 to 707, 801, 802), an application scenario in which the current user voice instruction may occur is accurately analyzed; accordingly, a control in the application scenario in which the current user voice instruction may occur is matched according to the recognized voice instruction, so as to acquire the user's intention more accurately, thereby improving the accuracy of speech recognition in a voice interaction scenario.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010950650.8 | 2020-09-10 | ||
CN202010950650.8A CN114255745A (zh) | 2020-09-10 | 2020-09-10 | 一种人机交互的方法、电子设备及系统 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022052776A1 true WO2022052776A1 (fr) | 2022-03-17 |
Family
ID=80630251
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/113542 WO2022052776A1 (fr) | 2020-09-10 | 2021-08-19 | Procédé d'interaction homme-ordinateur, ainsi que dispositif électronique et système |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114255745A (fr) |
WO (1) | WO2022052776A1 (fr) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114627850A (zh) * | 2022-03-25 | 2022-06-14 | 北斗星通智联科技有限责任公司 | 一种车载语音交互方法及系统 |
CN114860131A (zh) * | 2022-05-26 | 2022-08-05 | 中国第一汽车股份有限公司 | 车载多媒体应用的控制方法、装置、设备、介质和产品 |
CN115129208A (zh) * | 2022-05-25 | 2022-09-30 | 成都谷罗英科技有限公司 | 交互方法、电子设备及存储介质 |
CN115440211A (zh) * | 2022-06-01 | 2022-12-06 | 北京罗克维尔斯科技有限公司 | 车载语音管理方法、装置、设备及介质 |
CN115457951A (zh) * | 2022-05-10 | 2022-12-09 | 北京罗克维尔斯科技有限公司 | 一种语音控制方法、装置、电子设备以及存储介质 |
CN115562772A (zh) * | 2022-03-31 | 2023-01-03 | 荣耀终端有限公司 | 一种场景识别和预处理方法及电子设备 |
CN116229973A (zh) * | 2023-03-16 | 2023-06-06 | 润芯微科技(江苏)有限公司 | 一种基于ocr的可见即可说功能的实现方法 |
CN116578264A (zh) * | 2023-05-16 | 2023-08-11 | 润芯微科技(江苏)有限公司 | 一种投屏内使用语音控制的方法、系统、设备及存储介质 |
CN116707851A (zh) * | 2022-11-21 | 2023-09-05 | 荣耀终端有限公司 | 数据上报的方法及终端设备 |
CN117692832A (zh) * | 2023-05-29 | 2024-03-12 | 荣耀终端有限公司 | 超声波通路与耳机通路的冲突解决方法及相关装置 |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115220631A (zh) * | 2022-07-19 | 2022-10-21 | 东软睿驰汽车技术(大连)有限公司 | 基于车内交互模式的应用控制方法、装置和电子设备 |
CN117827139A (zh) * | 2022-09-29 | 2024-04-05 | 华为技术有限公司 | 人机交互的方法、电子设备及系统 |
CN118280355A (zh) * | 2022-12-30 | 2024-07-02 | 华为技术有限公司 | 一种交互方法、电子设备及介质 |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103869931A (zh) * | 2012-12-10 | 2014-06-18 | 三星电子(中国)研发中心 | 语音控制用户界面的方法及装置 |
EP2851891A1 (fr) * | 2013-09-20 | 2015-03-25 | Kapsys | Terminal mobile utilisateur et procédé de commande d'un tel terminal |
CN105161106A (zh) * | 2015-08-20 | 2015-12-16 | 深圳Tcl数字技术有限公司 | 智能终端的语音控制方法、装置及电视机系统 |
CN107992587A (zh) * | 2017-12-08 | 2018-05-04 | 北京百度网讯科技有限公司 | 一种浏览器的语音交互方法、装置、终端和存储介质 |
CN108538291A (zh) * | 2018-04-11 | 2018-09-14 | 百度在线网络技术(北京)有限公司 | 语音控制方法、终端设备、云端服务器及系统 |
CN108877791A (zh) * | 2018-05-23 | 2018-11-23 | 百度在线网络技术(北京)有限公司 | 基于视图的语音交互方法、装置、服务器、终端和介质 |
CN108877796A (zh) * | 2018-06-14 | 2018-11-23 | 合肥品冠慧享家智能家居科技有限责任公司 | 语音控制智能设备终端操作的方法和装置 |
CN109979446A (zh) * | 2018-12-24 | 2019-07-05 | 北京奔流网络信息技术有限公司 | 语音控制方法、存储介质和装置 |
CN110457105A (zh) * | 2019-08-07 | 2019-11-15 | 腾讯科技(深圳)有限公司 | 界面操作方法、装置、设备及存储介质 |
CN111383631A (zh) * | 2018-12-11 | 2020-07-07 | 阿里巴巴集团控股有限公司 | 一种语音交互方法、装置及系统 |
- 2020-09-10 CN CN202010950650.8A patent/CN114255745A/zh active Pending
- 2021-08-19 WO PCT/CN2021/113542 patent/WO2022052776A1/fr active Application Filing
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103869931A (zh) * | 2012-12-10 | 2014-06-18 | 三星电子(中国)研发中心 | 语音控制用户界面的方法及装置 |
EP2851891A1 (fr) * | 2013-09-20 | 2015-03-25 | Kapsys | Terminal mobile utilisateur et procédé de commande d'un tel terminal |
CN105161106A (zh) * | 2015-08-20 | 2015-12-16 | 深圳Tcl数字技术有限公司 | 智能终端的语音控制方法、装置及电视机系统 |
CN107992587A (zh) * | 2017-12-08 | 2018-05-04 | 北京百度网讯科技有限公司 | 一种浏览器的语音交互方法、装置、终端和存储介质 |
CN108538291A (zh) * | 2018-04-11 | 2018-09-14 | 百度在线网络技术(北京)有限公司 | 语音控制方法、终端设备、云端服务器及系统 |
CN108877791A (zh) * | 2018-05-23 | 2018-11-23 | 百度在线网络技术(北京)有限公司 | 基于视图的语音交互方法、装置、服务器、终端和介质 |
CN108877796A (zh) * | 2018-06-14 | 2018-11-23 | 合肥品冠慧享家智能家居科技有限责任公司 | 语音控制智能设备终端操作的方法和装置 |
CN111383631A (zh) * | 2018-12-11 | 2020-07-07 | 阿里巴巴集团控股有限公司 | 一种语音交互方法、装置及系统 |
CN109979446A (zh) * | 2018-12-24 | 2019-07-05 | 北京奔流网络信息技术有限公司 | 语音控制方法、存储介质和装置 |
CN110457105A (zh) * | 2019-08-07 | 2019-11-15 | 腾讯科技(深圳)有限公司 | 界面操作方法、装置、设备及存储介质 |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114627850A (zh) * | 2022-03-25 | 2022-06-14 | 北斗星通智联科技有限责任公司 | 一种车载语音交互方法及系统 |
CN115562772B (zh) * | 2022-03-31 | 2023-10-27 | 荣耀终端有限公司 | 一种场景识别和预处理方法及电子设备 |
CN115562772A (zh) * | 2022-03-31 | 2023-01-03 | 荣耀终端有限公司 | 一种场景识别和预处理方法及电子设备 |
CN115457951A (zh) * | 2022-05-10 | 2022-12-09 | 北京罗克维尔斯科技有限公司 | 一种语音控制方法、装置、电子设备以及存储介质 |
CN115129208A (zh) * | 2022-05-25 | 2022-09-30 | 成都谷罗英科技有限公司 | 交互方法、电子设备及存储介质 |
CN114860131A (zh) * | 2022-05-26 | 2022-08-05 | 中国第一汽车股份有限公司 | 车载多媒体应用的控制方法、装置、设备、介质和产品 |
CN115440211A (zh) * | 2022-06-01 | 2022-12-06 | 北京罗克维尔斯科技有限公司 | 车载语音管理方法、装置、设备及介质 |
CN116707851B (zh) * | 2022-11-21 | 2024-04-23 | 荣耀终端有限公司 | 数据上报的方法及终端设备 |
CN116707851A (zh) * | 2022-11-21 | 2023-09-05 | 荣耀终端有限公司 | 数据上报的方法及终端设备 |
CN116229973A (zh) * | 2023-03-16 | 2023-06-06 | 润芯微科技(江苏)有限公司 | 一种基于ocr的可见即可说功能的实现方法 |
CN116229973B (zh) * | 2023-03-16 | 2023-10-17 | 润芯微科技(江苏)有限公司 | 一种基于ocr的可见即可说功能的实现方法 |
CN116578264A (zh) * | 2023-05-16 | 2023-08-11 | 润芯微科技(江苏)有限公司 | 一种投屏内使用语音控制的方法、系统、设备及存储介质 |
CN117692832A (zh) * | 2023-05-29 | 2024-03-12 | 荣耀终端有限公司 | 超声波通路与耳机通路的冲突解决方法及相关装置 |
Also Published As
Publication number | Publication date |
---|---|
CN114255745A (zh) | 2022-03-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022052776A1 (fr) | Procédé d'interaction homme-ordinateur, ainsi que dispositif électronique et système | |
CN110910872B (zh) | 语音交互方法及装置 | |
RU2766255C1 (ru) | Способ голосового управления и электронное устройство | |
CN110138959B (zh) | 显示人机交互指令的提示的方法及电子设备 | |
WO2020192456A1 (fr) | Procédé d'interaction vocale et dispositif électronique | |
WO2021027476A1 (fr) | Procédé de commande vocale d'un appareil, et appareil électronique | |
WO2020119455A1 (fr) | Procédé de répétition de mot ou de phrase pendant une lecture vidéo, et dispositif électronique | |
CN112154431B (zh) | 一种人机交互的方法及电子设备 | |
WO2022100221A1 (fr) | Procédé et appareil de traitement de récupération et support de stockage | |
CN111970401B (zh) | 一种通话内容处理方法、电子设备和存储介质 | |
CN111881315A (zh) | 图像信息输入方法、电子设备及计算机可读存储介质 | |
CN112383664B (zh) | 一种设备控制方法、第一终端设备、第二终端设备及计算机可读存储介质 | |
CN113806473A (zh) | 意图识别方法和电子设备 | |
CN113852714A (zh) | 一种用于电子设备的交互方法和电子设备 | |
WO2022143258A1 (fr) | Procédé de traitement d'interaction vocale et appareil associé | |
WO2022002213A1 (fr) | Procédé et appareil d'affichage de résultat de traduction, et dispositif électronique | |
WO2022033432A1 (fr) | Procédé de recommandation de contenu, dispositif électronique et serveur | |
WO2021238371A1 (fr) | Procédé et appareil de génération d'un personnage virtuel | |
CN114173184B (zh) | 投屏方法和电子设备 | |
CN112740148A (zh) | 一种向输入框中输入信息的方法及电子设备 | |
CN112416984B (zh) | 一种数据处理方法及其装置 | |
WO2022089276A1 (fr) | Procédé de traitement de collecte et appareil associé | |
CN113380240B (zh) | 语音交互方法和电子设备 | |
WO2022007757A1 (fr) | Procédé d'enregistrement d'empreinte vocale inter-appareils, dispositif électronique et support de stockage | |
WO2022095983A1 (fr) | Procédé de prévention de fausse reconnaissance de geste et dispositif électronique |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21865832 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21865832 Country of ref document: EP Kind code of ref document: A1 |