CN115145529B - Voice control device method and electronic device - Google Patents
- Publication number: CN115145529B
- Application number: CN202210690830.6A
- Authority: CN (China)
- Prior art keywords: user interface, voice, information, application, instruction
- Legal status: Active (status assumed by Google Patents; not a legal conclusion)
Classifications
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26—Speech to text systems
- G10L15/1822—Parsing for meaning understanding
- G10L2015/223—Execution procedure of a spoken command
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/228—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
Abstract
The application provides a method for controlling a device by voice and an electronic device, applied to the field of artificial intelligence. The method of controlling a device includes: acquiring a voice instruction of a user, the voice instruction indicating a target instruction; acquiring user interface information of a current user interface, the current user interface being the user interface currently displayed by a client device; and determining the target instruction corresponding to the voice instruction, the target instruction being obtained from the voice instruction and the user interface information. The method for controlling a device by voice and the electronic device help improve the efficiency of speech recognition.
Description
The present application is a divisional application. The application number of the original application is 202010273843.4, the original application was filed on April 9, 2020, and the entire content of the original application is incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence and the field of electronic devices, and more particularly, to a method of controlling a device by voice and an electronic device.
Background
Through a large-screen display device, a user can watch live television broadcasts, network video resources, and local video resources, and listen to network audio resources, local audio resources, and so on. Before watching a video or listening to music, the user can speak the name of the video or audio resource to be played based on the user interface displayed by the large-screen display device; the large-screen display device, or a set-top box connected to it, can capture and respond to the user's voice.
To ensure the accuracy and efficiency of speech recognition, a large-screen display device is typically configured with speech recognition files that are used to recognize voice instructions invoking the data resources configured on the device. To provide a relatively good user experience, the data resources displayed or played on the large-screen display device must be updated frequently; for example, the device may need to play a newly released television series. Accordingly, a large amount of work is required to keep the speech recognition files on the device up to date, which may reduce the efficiency of speech recognition.
Disclosure of Invention
The application provides a method for controlling a device by voice and an electronic device, aiming to improve the efficiency of speech recognition.
In a first aspect, a method for controlling a device by voice is provided, including: acquiring a voice instruction of a user, the voice instruction indicating a target instruction; acquiring user interface information of a current user interface, the current user interface being the user interface currently displayed by a client device; and determining the target instruction corresponding to the voice instruction, the target instruction being obtained from the voice instruction and the user interface information.
Optionally, the method for controlling a device by voice may be performed by a client device (also referred to as a terminal device) or by a server (also referred to as a network device).
Optionally, the method for controlling a device by voice may be performed by a voice assistant on the client device.
The user interface information may include various information indicating a current user interface.
In this application, compared with all the data resources that the client device can display or play, the amount of information on the current user interface is relatively small, so the time needed to acquire the user interface information can be relatively short. Moreover, the client device can acquire at least part of the user interface information while it displays the user interface, so the user interface information can be acquired relatively efficiently. The user interface information can also be updated at the same time as the displayed interface is updated, and the update is relatively simple. When the current user interface is updated, the impact of the update on speech recognition efficiency is typically small, because the client device already knows at least part of the user interface information of the updated interface when it displays that interface. In addition, the user interface information reflects what the user can observe on the current user interface, so recognizing the user's voice instruction with reference to the user interface information helps improve the accuracy of speech recognition.
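To make the three-step flow of the first aspect concrete, the following is a minimal sketch in Java; it is not the claimed implementation, and every class, interface, and method name in it is an assumption introduced for illustration.

```java
import java.util.Map;

/** Minimal sketch of the claimed three-step flow; every name here is hypothetical. */
public final class VoiceControlSession {

    /** Assumed source of the voice instruction, e.g. an ASR transcript of the utterance. */
    interface VoiceSource { String captureUtterance(); }

    /** Assumed source of the user interface information of the currently displayed interface. */
    interface UiInfoSource { Map<String, String> currentUiInfo(); }

    /** Assumed resolver that combines the utterance with the UI info to pick a target instruction. */
    interface InstructionResolver { String resolve(String utterance, Map<String, String> uiInfo); }

    private final VoiceSource voice;
    private final UiInfoSource uiInfo;
    private final InstructionResolver resolver;

    VoiceControlSession(VoiceSource voice, UiInfoSource uiInfo, InstructionResolver resolver) {
        this.voice = voice;
        this.uiInfo = uiInfo;
        this.resolver = resolver;
    }

    String handleUtterance() {
        String utterance = voice.captureUtterance();        // step 1: acquire the voice instruction
        Map<String, String> info = uiInfo.currentUiInfo();  // step 2: acquire the current user interface information
        return resolver.resolve(utterance, info);           // step 3: determine the target instruction from both
    }
}
```

Depending on the implementation, these three roles may sit entirely on the client device, entirely on the server, or be split between them, as the optional implementations below describe.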
With reference to the first aspect, in certain implementations of the first aspect, the user interface information includes at least one of: an icon name of the current user interface, hotword information, instruction information of a manipulation instruction, and target corner mark information.
Icons may be categorized as menu icons, resource collection icons, function icons, and the like.
In this application, user interface information may reflect content on a user interface from multiple perspectives to facilitate a user's manipulation of a client device in a variety of ways.
With reference to the first aspect, in certain implementation manners of the first aspect, the target corner mark information corresponds to a target icon or a target manipulation instruction.
Optionally, the user interface information further includes a correspondence between the target corner mark information and a target icon.
Optionally, the user interface information further includes a correspondence between the target corner mark information and a target set.
Optionally, the user interface information further includes a correspondence between the target corner mark information and a target manipulation instruction.
In this method, displaying corner marks on the current user interface can increase the number of recognizable voice instructions and improve the accuracy of speech recognition. For example, when the user cannot describe an icon's pattern in words, the user can express the voice instruction relatively quickly based on the information reflected by the corner mark.
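As an illustration of what such user interface information might hold, the following Java structure gathers the fields listed above; the field names and types are assumptions for this sketch, not the patent's data format.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical container for the user interface information; field names are illustrative only. */
public final class UserInterfaceInfo {
    public List<String> iconNames;                    // names of the icons on the current user interface
    public List<String> hotwords;                     // hotword information
    public List<String> manipulationInstructions;     // instruction information of the supported manipulation instructions
    public Map<Integer, String> cornerMarkToIcon;     // corner mark number -> target icon (or target collection)
    public Map<Integer, String> cornerMarkToCommand;  // corner mark number -> target manipulation instruction
    public String foregroundAppId;                    // optional identification of the foreground application
}
```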
With reference to the first aspect, in certain implementations of the first aspect, acquiring the voice instruction of the user includes receiving the voice instruction sent by the client device; acquiring the user interface information of the current user interface includes receiving the user interface information sent by the client device; and determining the target instruction corresponding to the voice instruction includes determining the target instruction according to the voice instruction and the user interface information.
The server may implement speech recognition through, for example, an automatic speech recognition (ASR) module and a natural language understanding (NLU) module. Optionally, the server or the client device may further include a dialogue state tracking (DST) module, a dialog management (DM) module, a natural language generation (NLG) module, a text-to-speech (TTS) module, and the like to implement voice interaction.
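As a rough sketch of how those modules could be chained, the snippet below runs ASR, then NLU with the user interface information as context, then dialog management; the interfaces are invented for illustration and do not describe any particular product.

```java
/** Hypothetical wiring of the recognition modules; all interfaces here are illustrative assumptions. */
public final class RecognitionPipeline {

    interface Asr { String transcribe(byte[] audio); }                    // speech -> text
    interface Nlu { String parseIntent(String text, String uiContext); }  // text + UI context -> intent
    interface Dm  { String decide(String intent); }                       // dialog management: intent -> target instruction

    private final Asr asr;
    private final Nlu nlu;
    private final Dm dm;

    RecognitionPipeline(Asr asr, Nlu nlu, Dm dm) {
        this.asr = asr;
        this.nlu = nlu;
        this.dm = dm;
    }

    /** Turns raw audio plus user interface information into a target instruction. */
    String recognize(byte[] audio, String uiContext) {
        String text = asr.transcribe(audio);
        String intent = nlu.parseIntent(text, uiContext);  // the UI info narrows the candidate interpretations
        return dm.decide(intent);
    }
}
```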
In this method, after the server acquires the user interface information, it can recognize the user's voice instruction with reference to the content currently displayed by the client. This helps the server exclude irrelevant speech recognition data and convert the user's voice instruction into the corresponding target instruction relatively quickly and accurately.
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: sending the target instruction to the client device.
In this application, the server recognizes the voice instruction and the data is transferred over a communication network, which reduces the demands on the processing capability of the client device. For example, the client device may not have speech recognition capability, or its processor speed and memory capacity may be relatively modest.
With reference to the first aspect, in some implementations of the first aspect, determining the target instruction corresponding to the voice instruction includes: the client device determining the target instruction according to the voice instruction and the user interface information.
In this application, the client device may be provided with speech recognition capability. The user interface information reduces the amount of reference data needed for speech recognition, which improves the speech recognition performance of the client device.
With reference to the first aspect, in some implementations of the first aspect, before determining the target instruction corresponding to the voice instruction, the method further includes sending the user interface information and the voice instruction to a server; and determining the target instruction corresponding to the voice instruction includes receiving the target instruction sent by the server, where the target instruction is determined by the server according to the user interface information and the user's voice instruction.
In this method, after the server acquires the user interface information, it can recognize the user's voice instruction with reference to the content currently displayed by the client, which helps the server exclude irrelevant speech recognition data and convert the voice instruction into the corresponding target instruction relatively quickly and accurately. In addition, because the server recognizes the voice instruction and the data is transferred over a communication network, the demands on the processing capability of the client device can be reduced. For example, the client device may not have speech recognition capability, or its processor speed and memory capacity may be relatively modest.
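A minimal sketch of the client-side exchange follows, assuming a JSON-over-HTTP transport that the patent does not prescribe; the endpoint and payload shape are invented for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch only: the transport, endpoint, and payload format are assumptions, not part of the claims. */
public final class RemoteRecognitionClient {

    private final HttpClient http = HttpClient.newHttpClient();
    private final URI endpoint;

    RemoteRecognitionClient(URI endpoint) {
        this.endpoint = endpoint;
    }

    /** Sends the voice instruction and the user interface information; returns the target instruction. */
    String requestTargetInstruction(String voiceTranscript, String uiInfoJson) throws Exception {
        // Naive payload assembly; a real client would use a JSON library and escape the transcript.
        String payload = "{\"voice\":\"" + voiceTranscript + "\",\"uiInfo\":" + uiInfoJson + "}";
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();  // the server's reply carries the determined target instruction
    }
}
```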
With reference to the first aspect, in certain implementations of the first aspect, before acquiring the user interface information of the current user interface, the method further includes sending first indication information to a foreground application, where the first indication information instructs the foreground application to feed back the user interface information; and acquiring the user interface information of the current user interface includes receiving the user interface information sent by the foreground application, where the user interface information is obtained by the foreground application retrieving information related to the current user interface.
The foreground application may be, for example, a video playing application, an audio playing application, a desktop application, a settings application, a live TV application, a radio application, or the like.
Retrieving may also be understood as searching, scanning, and the like.
The foreground application may determine the user interface information by searching the document used to display the current user interface. The document may be, for example, a hypertext markup language (HTML) file, an extensible markup language (XML) file, a script file, or the like.
The foreground application may also determine the user interface information by scanning elements of the current user interface and deriving the user interface information from those elements. The elements may include icons, collection information corresponding to the icons, manipulation instructions applicable to the current user interface, and the like.
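One way a foreground application could assemble such information is to walk its current view hierarchy and collect the visible labels, as in the Android-style sketch below; this traversal is a common pattern and is only an assumption here, since the patent does not mandate a particular scanning mechanism.

```java
import android.view.View;
import android.view.ViewGroup;
import android.widget.TextView;
import java.util.ArrayList;
import java.util.List;

/** Sketch of a foreground application collecting user interface information from its view tree. */
public final class UiInfoCollector {

    /** Walks the view tree under root and collects visible text as candidate icon names / hotwords. */
    public static List<String> collectLabels(View root) {
        List<String> labels = new ArrayList<>();
        collect(root, labels);
        return labels;
    }

    private static void collect(View view, List<String> out) {
        if (view instanceof TextView) {
            CharSequence text = ((TextView) view).getText();
            if (text != null && text.length() > 0) {
                out.add(text.toString());
            }
        } else if (view.getContentDescription() != null) {
            out.add(view.getContentDescription().toString());  // e.g. labels of image-only icons
        }
        if (view instanceof ViewGroup) {
            ViewGroup group = (ViewGroup) view;
            for (int i = 0; i < group.getChildCount(); i++) {
                collect(group.getChildAt(i), out);
            }
        }
    }
}
```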
In this application, the identifier of the foreground application associates the voice instruction, the user interface currently displayed by the foreground application, and the target instruction that controls the foreground application. A user can therefore control multiple foreground applications through voice instructions, which gives the solution relatively strong flexibility.
With reference to the first aspect, in certain implementation manners of the first aspect, the user interface information further includes an identification of the foreground application.
In this application, the voice assistant can learn from the user interface information that the current user interface is provided by the foreground application, and can then use the target instruction corresponding to the voice instruction to control the foreground application to perform, on the current user interface, the operation corresponding to the target instruction.
With reference to the first aspect, in certain implementation manners of the first aspect, the target instruction further includes an identification of the foreground application.
In this application, the voice assistant can determine from the target instruction that it is intended to instruct the foreground application to perform the target operation, so that the foreground application performs an operation that meets the user's expectation.
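As an illustration of that dispatch step, the sketch below hands the target instruction only to the application whose identification it carries; the broadcast action and extra key are hypothetical names, not a defined interface.

```java
import android.content.Context;
import android.content.Intent;

/**
 * Sketch of how a voice assistant could hand the target instruction to the foreground
 * application named in it. The action string and extras are assumptions for illustration;
 * the patent does not define a concrete delivery mechanism.
 */
public final class TargetInstructionDispatcher {

    public static final String ACTION_EXECUTE = "com.example.assistant.EXECUTE_TARGET_INSTRUCTION"; // hypothetical

    /** Sends the instruction only to the application whose identification is carried in it. */
    public static void dispatch(Context context, String foregroundAppId, String targetInstruction) {
        Intent intent = new Intent(ACTION_EXECUTE);
        intent.setPackage(foregroundAppId);                     // restrict delivery to the foreground application
        intent.putExtra("target_instruction", targetInstruction);
        context.sendBroadcast(intent);
    }
}
```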
With reference to the first aspect, in certain implementations of the first aspect, the user interface information includes target corner mark information, and the method further includes: displaying a corner mark on the current user interface before the voice instruction of the user is acquired, and removing the corner mark from the current user interface after the voice instruction of the user is acquired.
In this method, displaying the corner marks gives the user more optional ways to phrase a voice instruction, and removing them at an appropriate time keeps the user interface relatively uncluttered.
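A possible, purely illustrative way to show the corner marks while listening and clear them afterwards is sketched below; attaching plain TextView badges to a container is just one rendering choice and is not taken from the patent.

```java
import android.view.View;
import android.view.ViewGroup;
import android.widget.TextView;
import java.util.ArrayList;
import java.util.List;

/** Sketch of showing numbered corner marks while listening and removing them afterwards. */
public final class CornerMarkOverlay {

    private final List<View> marks = new ArrayList<>();

    /** Called when the assistant starts listening: number the selectable icons. */
    public void show(ViewGroup container, List<View> icons) {
        for (int i = 0; i < icons.size(); i++) {
            TextView badge = new TextView(container.getContext());
            badge.setText(String.valueOf(i + 1));   // the number the user can speak
            badge.setX(icons.get(i).getX());        // place the badge at the icon's corner
            badge.setY(icons.get(i).getY());
            container.addView(badge);
            marks.add(badge);
        }
    }

    /** Called after the voice instruction has been acquired: restore the plain interface. */
    public void remove(ViewGroup container) {
        for (View badge : marks) {
            container.removeView(badge);
        }
        marks.clear();
    }
}
```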
In a second aspect, there is provided an electronic device, including: an acquisition module configured to acquire a voice instruction of a user, the voice instruction indicating a target instruction, and further configured to acquire user interface information of a current user interface, the current user interface being the user interface currently displayed by the client device; and a processing module configured to determine the target instruction corresponding to the voice instruction, the target instruction being obtained from the voice instruction and the user interface information.
With reference to the second aspect, in certain implementations of the second aspect, the user interface information includes at least one of: an icon name of the current user interface, hotword information, instruction information of a manipulation instruction, and target corner mark information.
With reference to the second aspect, in some implementations of the second aspect, the target corner mark information corresponds to a target icon or a target manipulation instruction.
With reference to the second aspect, in some implementations of the second aspect, the electronic device is a server, and the acquisition module is specifically configured to receive the voice instruction sent by the client device and to receive the user interface information sent by the client device; the processing module is specifically configured to determine the target instruction according to the voice instruction and the user interface information.
With reference to the second aspect, in certain implementations of the second aspect, the server further includes a transceiver module configured to send the target instruction to the client device.
With reference to the second aspect, in some implementations of the second aspect, the electronic device is the client device, and the processing module is specifically configured to determine the target instruction according to the voice instruction and the user interface information.
With reference to the second aspect, in some implementations of the second aspect, the electronic device is the client device, and the client device further includes a transceiver module, configured to send the user interface information and the voice instruction to a server before the processing module determines a target instruction corresponding to the voice instruction; the processing module is specifically configured to receive a target instruction sent by the server, where the target instruction is determined by the server according to the user interface information and the voice instruction of the user.
With reference to the second aspect, in certain implementations of the second aspect, the electronic device further includes a sending module, configured to send first indication information to a foreground application before the acquisition module acquires the user interface information of the current user interface, where the first indication information instructs the foreground application to feed back the user interface information. The acquisition module is specifically configured to receive the user interface information sent by the foreground application, where the user interface information is obtained by the foreground application retrieving information related to the current user interface.
With reference to the second aspect, in certain implementations of the second aspect, the user interface information further includes an identification of the foreground application.
With reference to the second aspect, in certain implementations of the second aspect, the target instruction further includes an identification of the foreground application.
With reference to the second aspect, in certain implementation manners of the second aspect, the user interface information includes target corner mark information, and the processing module is further configured to display a corner mark on the current user interface before the obtaining module obtains a voice instruction of a user; the processing module is further configured to remove the corner mark on the current user interface after the obtaining module obtains the voice command of the user.
In a third aspect, an electronic device is provided, including a processor. The processor is configured to: acquire a voice instruction of a user, the voice instruction indicating a target instruction; acquire user interface information of a current user interface, the current user interface being the user interface currently displayed by the client device; and determine the target instruction corresponding to the voice instruction, the target instruction being obtained from the voice instruction and the user interface information.
With reference to the third aspect, in certain implementations of the third aspect, the user interface information includes at least one of: an icon name of the current user interface, hotword information, instruction information of a manipulation instruction, and target corner mark information.
With reference to the third aspect, in some implementations of the third aspect, the target corner mark information corresponds to a target icon or a target manipulation instruction.
With reference to the third aspect, in some implementations of the third aspect, the electronic device is a server, and the processor is specifically configured to receive the voice instruction sent by the client device, receive the user interface information sent by the client device, and determine the target instruction according to the voice instruction and the user interface information.
With reference to the third aspect, in certain implementations of the third aspect, the electronic device further includes a transceiver configured to send the target instruction to the client device.
With reference to the third aspect, in some implementations of the third aspect, the electronic device is the client device, and the processor is specifically configured to determine the target instruction according to the voice instruction and the user interface information.
With reference to the third aspect, in some implementations of the third aspect, the electronic device is the client device, and the client device further includes a transceiver, configured to send the user interface information and the voice command to a server before the processor determines a target command corresponding to the voice command; the processor is specifically configured to: and receiving a target instruction sent by the server, wherein the target instruction is determined by the server according to the user interface information and the voice instruction of the user.
With reference to the third aspect, in certain implementations of the third aspect, the electronic device further includes a transceiver configured to send first indication information to a foreground application before the processor acquires the user interface information of the current user interface, where the first indication information instructs the foreground application to feed back the user interface information. The processor is specifically configured to receive the user interface information sent by the foreground application, where the user interface information is obtained by the foreground application retrieving information related to the current user interface.
With reference to the third aspect, in certain implementations of the third aspect, the user interface information further includes an identification of the foreground application.
With reference to the third aspect, in certain implementations of the third aspect, the target instruction further includes an identification of the foreground application.
With reference to the third aspect, in certain implementations of the third aspect, the user interface information includes target corner mark information, and the processor is further configured to display a corner mark on the current user interface before the processor obtains a voice instruction of a user; the processor is further configured to remove the corner mark on the current user interface after the processor obtains the voice command of the user.
In a fourth aspect, the present technical solution provides an electronic device, including: one or more processors; a memory; a plurality of applications; and one or more computer programs. Wherein one or more computer programs are stored in the memory, the one or more computer programs comprising instructions. The instructions, when executed by the electronic device, cause the electronic device to perform the method in any of the implementations of the first aspect.
In a fifth aspect, the present disclosure provides an electronic device, including one or more processors and one or more memories. The one or more memories are coupled to the one or more processors, the one or more memories being operable to store computer program code, the computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the method of any of the implementations of the first aspect.
In a sixth aspect, there is provided a communication apparatus, the apparatus comprising: a processor, a memory and a transceiver, the memory being configured to store a computer program, the processor being configured to execute the computer program stored in the memory, to cause the apparatus to perform the method according to any one of the possible implementations of the first aspect.
In a seventh aspect, a communication apparatus is provided, including at least one processor and a communication interface. The communication interface is used for information exchange between the communication apparatus and other communication apparatuses. When program instructions are executed in the at least one processor, the communication apparatus is caused to carry out the method according to any one of the possible implementations of the first aspect.
In an eighth aspect, the present technical solution provides a non-transitory computer readable storage medium, including computer instructions, which when run on an electronic device, cause the electronic device to perform the method in any one of the implementations of the first aspect.
In a ninth aspect, the present technical solution provides a computer program product for causing an electronic device to perform the method of any one of the implementation forms of the first aspect when the computer program product is run on the electronic device.
In a tenth aspect, a chip is provided. The chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to perform the method in any implementation of the first aspect.
Optionally, as an implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, where the instructions, when executed, are configured to perform the method in any implementation manner of the first aspect.
Drawings
Fig. 1 is a schematic hardware structure of an electronic device according to an embodiment of the present application.
Fig. 2 is a schematic software structure of an electronic device according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a user interface according to an embodiment of the present application.
Fig. 4 is a schematic flow chart of a method for controlling a device by using voice according to an embodiment of the present application.
Fig. 5 is a schematic flowchart of a method for controlling a device by using voice according to an embodiment of the present application.
Fig. 6 is a schematic diagram of a user interface provided in an embodiment of the present application.
Fig. 7 is a schematic interaction diagram of a speech recognition module according to an embodiment of the present application.
Fig. 8 is a schematic flow chart of a method for controlling a device by using voice according to an embodiment of the present application.
Fig. 9 is a schematic flowchart of a method for controlling a device by using voice according to an embodiment of the present application.
Fig. 10 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the present application will be described below with reference to the accompanying drawings.
The terminology used in the following embodiments is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification and the appended claims, the singular forms "a," "an," "the," and "said" are intended to include expressions such as "one or more," unless the context clearly indicates otherwise. It should also be understood that in the embodiments below, "at least one" and "one or more" mean one, two, or more than two. The term "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, and B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Embodiments of electronic devices, user interfaces for such electronic devices, and methods for using such electronic devices provided by embodiments of the present application are described below. In some embodiments, the electronic device may be a portable electronic device, such as a mobile phone, a tablet computer, or a video player, that also includes other functions such as a personal digital assistant function and/or a music player function. Exemplary embodiments of portable electronic devices include, but are not limited to, portable electronic devices running various operating systems. The portable electronic device may also be another portable electronic device, such as a laptop computer. It should also be understood that, in other embodiments, the electronic device may not be a portable electronic device, but rather a desktop computer, a television, a notebook computer, a projection device, a set-top box, or the like.
By way of example, fig. 1 shows a schematic diagram of an electronic device 100. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, an antenna, a wireless communication module 160, a speaker 170, a microphone 171, an earphone interface 172, a high-definition multimedia interface (high definition multimedia interface, HDMI) 181, a composite video (AV) interface 182, keys 190, a camera 193, a display 194, and the like.
It is to be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate components or may be integrated in one or more processors. In some embodiments, the electronic device 101 may also include one or more processors 110. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution. In other embodiments, memory may also be provided in the processor 110 for storing instructions and data. Illustratively, the memory in the processor 110 may be a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. This avoids repeated accesses and reduces the latency of the processor 110, thereby improving the efficiency of the electronic device 101 in processing data or executing instructions.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include inter-integrated circuit (inter-integrated circuit, I2C) interfaces, inter-integrated circuit audio (inter-integrated circuit sound, I2S) interfaces, pulse code modulation (pulse code modulation, PCM) interfaces, universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interfaces, general-purpose input/output (GPIO) interfaces, and/or USB interfaces, among others. The USB interface 130 is an interface conforming to the USB standard, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 101, or may be used to transfer data between the electronic device 101 and a peripheral device. The USB interface 130 may also be used to connect headphones through which audio is played.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present application is only illustrative, and does not limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also use different interfacing manners, or a combination of multiple interfacing manners in the foregoing embodiments.
The wireless communication function of the electronic device 100 may be implemented by an antenna, a wireless communication module 160, a modem processor, a baseband processor, and the like.
Antennas may be used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc., as applied to the electronic device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may employ a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include one or more display screens 194.
The display 194 of the electronic device 100 may be a flexible screen, which is currently attracting attention for its unique characteristics and great potential. Compared with a traditional screen, a flexible screen is highly flexible and bendable, can provide the user with new interaction modes based on its bendability, and can meet more of the user's requirements for an electronic device. For an electronic device equipped with a foldable display screen, the foldable display screen can be switched at any time between a small screen in the folded configuration and a large screen in the unfolded configuration. Accordingly, users use the split-screen function more and more frequently on electronic devices configured with foldable display screens.
The electronic device 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, electronic device 100 may include 1 or more cameras 193.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, or the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, and so on.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the electronic device 100 may be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store one or more computer programs, including instructions. The processor 110 may cause the electronic device 101 to execute the method of off-screen display provided in some embodiments of the present application, as well as various applications, data processing, and the like, by executing the above-described instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area can store an operating system; the storage program area may also store one or more applications (such as gallery, contacts, etc.), etc. The storage data area may store data created during use of the electronic device 101 (e.g., photos, contacts, etc.), and so on. In addition, the internal memory 121 may include high-speed random access memory, and may also include nonvolatile memory, such as one or more disk storage units, flash memory units, universal flash memory (universal flash storage, UFS), and the like. In some embodiments, processor 110 may cause electronic device 101 to perform the methods of off-screen display provided in embodiments of the present application, as well as other applications and data processing, by executing instructions stored in internal memory 121, and/or instructions stored in a memory provided in processor 110.
The electronic device 100 may implement audio functions through a speaker 170, a microphone 171, an earphone interface 172, an application processor, and the like. Such as music playing, recording, etc.
The keys 190 include a power-on key, a volume key, etc. The keys 190 may be mechanical keys. Or may be a touch key. The electronic device 100 may receive key inputs, generating key signal inputs related to user settings and function controls of the electronic device 100.
The electronic device 100 may receive data through the high-definition multimedia interface (high definition multimedia interface, HDMI) 181 and implement display functions such as a split screen (also may be referred to as an extended screen) function, a video play function, etc. through the display screen 194, the speaker 170, the headphone interface 172.
The electronic device 100 may receive video asset data through an Audio Video (AV) interface 182 and implement display functions, such as a split screen function, a video play function, etc., through a display screen 194, a speaker 170, an earphone interface 172. The AV interface 182 may include a V (video interface) 183, an L (left) interface 184, and an R (right) interface 185. The V-interface 183 may be used to input a mixed video signal. L interface 184 may be used to input a left channel sound signal. The R interface 185 may be used to input a right channel sound signal.
Fig. 2 is a software structure block diagram of the electronic device 100 according to an embodiment of the present application. The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers: from top to bottom, an application layer, an application framework layer, the Android runtime and system libraries, and a kernel layer. The application layer may include a series of application packages.
As shown in fig. 2, the application package may include applications such as a voice assistant, a television broadcast, a television play, a movie broadcast, an audio broadcast, a gallery, a browser, a clock, a setting, etc.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 2, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager is used for managing window programs. The window manager can acquire the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, browsing history, bookmarks, and the like.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a television play interface may include a view showing text, a view showing images, and a view showing video.
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The notification manager allows an application to display notification information in the status bar and can be used to convey notification-type messages; the notification can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify that a download is complete, to give message reminders, and so on. The notification manager may also present notifications that appear in the system top status bar in the form of a chart or scroll-bar text, such as notifications from applications running in the background, or notifications that appear on the screen in the form of a dialog window. For example, text information is presented in the status bar, a prompt tone is played, and so on.
The Android runtime includes a core library and virtual machines, and is responsible for scheduling and management of the Android system.
The core library consists of two parts: one part is the function library that the Java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in virtual machines. A virtual machine executes the Java files of the application layer and the application framework layer as binary files, and performs functions such as object life-cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: surface manager (surface manager), media library (media library), three-dimensional graphics processing library (e.g., openGL ES), 2D graphics engine (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files and the like. The media library may support a variety of audio and video encoding formats, such as MPEG-4, H.264, MP3, AAC, AMR, JPG, and PNG.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
The voice assistant in the application package may be a system-level application. The voice assistant may also be referred to as a human-machine interaction robot or a chat robot (ChatBOT). The voice assistant application may also be referred to as an intelligent assistant application, or the like. Voice assistants are currently widely used in electronic devices such as mobile phones, tablet computers, smart speakers, and smart televisions, providing users with an intelligent voice interaction mode. The voice assistant is one of the cores of human-machine interaction.
Fig. 3 is a schematic diagram of a user interface of the electronic device 100. The electronic device 100 may be the electronic device 100 as shown in fig. 1. For example, the electronic device 100 may be a large screen display device such as a television, a projection device, or the like. A plurality of icons may be displayed in the user interface. For example, the user interface may include a plurality of menu icons 301, a plurality of resource collection icons 302, a plurality of function icons 303, and the like. It should be understood that the user interface illustrated in the embodiments of the present application does not constitute a particular limitation of the user interface 300. In other embodiments of the present application, the user interface 300 may include more or fewer icons than illustrated, or some icons may be combined, some icons may be split, or a different arrangement of icons may be used.
On the user interface as shown in fig. 3, the plurality of menu icons 301 may include: a "home" icon, a "TV play" icon, a "movie" icon, a "kids" icon, an "applications" icon, a "music" icon, a "radio" icon, an "education" icon, a "variety" icon, and the like. It should be appreciated that the electronic device 100 may also provide more menu icons 301 for the user; however, because the size of the user interface is limited, the user interface may display only a portion of the menu icons 301. The user may select a menu icon 301, for example, via an infrared remote control or by voice.
In one example, the user selects the "home" icon, and the electronic device 100 may display a variety of resource collection icons 302. The types of the plurality of resource collections may include television episode collections, movie collections, children's collections, application collections, music collections, radio collections, educational collections, variety collections, and the like. For example, the electronic device 100 may display icons of the 3 television episodes with the highest current heat and icons of the 3 movies with the highest heat.
In one example, the user selects the "drama" icon, and as shown in fig. 3 the electronic device 100 may display icons for multiple television episode collections. For example, the electronic device 100 may display icons of the 3 television episodes with the highest current heat, as well as icons of 3 television episodes that are still being updated and have relatively high heat. As shown in fig. 3, the 3 television episodes with the highest current heat may include "e.g., exemplary", "Sha Hai", and "about night". The 3 television episodes being updated and having relatively high heat may include "king behind the scenes", "card house", and "know whether it should be red, fat, green, thin". Optionally, the icon of each episode collection may include a schematic view 3021 of the episode (e.g., an image from the episode), a name 3022 of the episode, and an episode number 3023 (e.g., the latest episode number of the episode). If the television episode has finished updating, "full X episodes" may be displayed (e.g., "full 87 episodes" is displayed on the icon corresponding to "e.g., exemplary" in fig. 3); if the television episode has not finished updating, "updated to episode Y" may be displayed (e.g., "updated to episode 8" is displayed on the icon corresponding to "king behind the scenes" in fig. 3).
In one example, the user selects a "movie" icon, and the electronic device 100 may display icons for a plurality of movie collections. For example, the electronic device 100 may display an icon of the 3 movie collections with the highest current heat, and an icon of the 3 movie collections that have just been shown.
In one example, the user selects the "kids" icon, and the electronic device 100 may display icons for children's collections. For example, the electronic device 100 may display icons of the 3 children's programs with the highest current heat and icons of the 3 children's cartoons that have just been released.
In one example, the user selects an "application" icon, and the electronic device 100 may display icons for multiple applications. For example, the electronic device 100 may display icons of 3 applications used by the most recent user, as well as 3 application icons for the most common use.
In one example, the user selects a "music" icon, and the electronic device 100 may display icons for multiple music collections. For example, the electronic device 100 may display icons of the 3 music albums just released, as well as icons of the 3 music playlists that the user has recently collected.
In one example, a user selects a "station" icon, and electronic device 100 may display an icon for a collection of stations. For example, electronic device 100 may display icons of the 3 radio programs with the highest current heat and icons of the 3 radio programs that the user has recently collected.
In one example, the user selects the "education" icon and the electronic device 100 may display icons for a plurality of educational pools. For example, the electronic device 100 may display icons of the 3 educational pools having the highest current heat and icons of the 3 educational pools recently played by the user.
In one example, the user selects the "variety" icon, and the electronic device 100 may display icons of a plurality of variety collections. For example, the electronic device 100 may display icons of the 3 variety complexes with the highest current heat and icons of the 3 variety complexes recently played by the user.
The plurality of function icons 303 in the user interface may include a return icon, a user information icon, a setup icon, a wireless connection icon, a clock icon, and the like. The user selects the return icon and may return to the upper user interface. The user selects the user information icon and may view user account information logged onto the electronic device 100. The user selects the setup icon, may enter the setup interface, and may adjust parameters of the electronic device 100. The user selects the wireless connection icon, and may use the wireless connection function of the electronic device 100, for example, search for available wireless networks around the electronic device 100 and access the available wireless networks. The user can view the clock icon to learn the current time. The user selects the clock icon and may set clock parameters of the electronic device 100.
Fig. 4 shows a method of controlling a device by voice. In the method shown in fig. 4, the client device may be the electronic device 100 as shown in fig. 1.
401, the client device displays a current user interface.
For example, the current user interface displayed by the client device may be user interface 300 as shown in FIG. 3.
402, the client device obtains a voice instruction of a user, where the voice instruction is used to indicate a target operation.
The user may speak a voice instruction such as "play as exemplified by 30 sets". That is, after viewing the current user interface, the user selects the "e.g., exemplary" television episode on the current user interface and chooses to view the 30th video asset in that episode collection. The voice instruction may be used to instruct the client device to play the 30th episode of that television episode collection.
Alternatively, step 402 may be performed by a voice assistant on the client device.
Alternatively, the user may speak a wake-up word to wake up the client device to capture the user's voice instructions.
Alternatively, as shown at 304 in FIG. 3, the client device may display a prompt on the current user interface to prompt the user that the voice recognition functionality of the client device is being used during the process of capturing the user's voice instructions.
403, the client device determines the target operation according to a voice recognition file and the voice command, where the voice recognition file is used to determine the target operation corresponding to the voice command.
403 may be done by a voice assistant on the client device.
The voice recognition file may include a variety of information for determining a target operation, so that the client device may determine an operation corresponding to the voice instruction. For example, the speech recognition file may include data for determining that the speech instruction is a play video instruction. As another example, the voice recognition file may include data for determining the voice instructions as download application instructions.
The client device may implement the operation of speech recognition, for example, through a speech recognition (automatic speech recognition, ASR) module, a semantic understanding (natural language understanding, NLU) module.
404, the client device performs the target operation.
That is, the client device may perform the target operation in response to a voice instruction issued by the user.
In summary, the client device may recognize the user's voice according to the voice recognition file. To maintain a good user experience, content such as the data resource library and the user interface needs to be updated frequently, and the voice recognition file also needs to be updated so that the user can continue to use voice instructions conveniently. Updating the voice recognition file therefore requires considerable effort. In addition, the amount of data in such voice packets is typically large, which is detrimental to the efficiency of voice recognition.
Fig. 5 shows a method for controlling a device by voice according to an embodiment of the present application. In the method shown in fig. 5, the client device may be the electronic device 100 as shown in fig. 1.
501, the client device obtains a voice instruction of a user, where the voice instruction is used to indicate a target instruction or a target operation.
Alternatively, 501 may be accomplished by a voice assistant on the client device.
Alternatively, the user may speak a wake-up word to wake up the client device to capture the user's voice instructions.
The target instruction may be, for example, text content of a voice instruction.
The target operation may be, for example, a response operation indicated by the target instruction.
In one example, the user may speak a voice instruction, such as "play as a transmission 30 set". That is, after observing the user interface displayed by the client device, the user may select "e.g." episode "for a television episode on the user interface and select to view the video asset of the 30 th episode in the" e.g. "episode". The voice instructions may be used to instruct the client device to play video assets of set 30 in a television episode such as a section.
In one example, the user may speak a voice instruction, such as "display movie page". That is, after observing the user interface displayed by the client device, the user may select a movie collection on the user interface so that browsing of movie resources in the movie collection may continue. The voice instructions may be used to instruct the client device to display a user interface corresponding to the collection of movies.
In one example, the user may speak a voice instruction such as "turn on WiFi" (WiFi, i.e., wireless fidelity). That is, after observing the user interface displayed by the client device, the user may select the wireless connection icon on the user interface and set wireless connection parameters of the client device. The voice instruction may be used to instruct the client device to activate the wireless connection module.
In one example, the user may speak a voice instruction, such as "3 rd". That is, after observing the user interface displayed by the client device, the user may select an icon or a manipulation instruction corresponding to the corner mark 3. The voice instruction may be used to instruct the client device to perform the operation corresponding to the corner mark 3.
In one example, the user may speak a voice instruction, such as "next page". That is, after observing the user interface displayed by the client device, the user can control the client device to perform a page turning operation, so that the user can continue browsing the next page of the user interface. The voice instructions may be used to instruct the client device to display a next page user interface.
Alternatively, as shown at 304 in FIG. 3, the client device may display a prompt on the current user interface to prompt the user that the voice recognition functionality of the client device is being used during the process of capturing the user's voice instructions.
502, the client device obtains user interface information of a current user interface, where the current user interface is a user interface currently displayed by the client device.
The current user interface may be the user interface observed by the user in 501. The current user interface may be, for example, user interface 300 as shown in fig. 3. The user interface information may include various information indicating a current user interface.
Alternatively, 502 may be accomplished by a voice assistant on the client device.
Alternatively, the order of execution of 501 and 502 may be reversed. Such as execution 501 followed by execution 502. Alternatively, 502 is performed first and 501 is performed later.
Optionally, the user interface information includes at least one of the following information: and the icon name, hotword information, instruction information of a control instruction and target corner mark information of the current user interface.
In one example, the user interface information may include an icon name of the current user interface.
Taking the user interface 300 shown in fig. 3 as an example, the user interface 300 may include a "home" icon, a "drama" icon, a "movie" icon, a "kids" icon, an "application" icon, a "music" icon, a "radio" icon, an "education" icon, a "variety" icon, an "e.g., a pass" album icon, a "Sha Hai" album icon, a "night" album icon, a "king behind the scenes" album icon, a "card house" album icon, a "know whether it should be red, fat, green, thin" album icon, a return icon, a user information icon, a settings icon, a wireless connection icon, a clock icon, and the like. Accordingly, the user interface information corresponding to the user interface 300 may include: home, television show, movie, kids, application, music, radio, education, variety, the names of the above album collections, return, user information, settings, wireless connection, clock, and the like. It should be appreciated that a collection may refer to a collection of data resources; for example, a television episode collection may be the collection of assets containing all episodes of that television series.
In one example, the user interface information includes hotword information.
For example, content for "kids" is also referred to as content for "children" and the like. That is, "kids" may correspond to the hotword "child". Accordingly, the hotword information included in the user interface information corresponding to the user interface 300 may include: child.
As another example, "music" is referred to as a "song" or the like. That is, "music" may correspond to the hotword "song". Accordingly, the hotword information included in the user interface information corresponding to the user interface 300 may be: a song.
As another example, "station" is referred to as "broadcast" or the like. That is, "station" may correspond to the hotword "broadcast". Accordingly, the hotword information included in the user interface information corresponding to the user interface 300 may be: and (5) broadcasting.
As another example, the full title of a television episode collection is often referred to by a shortened form. That is, the "e.g., exemplary" episode title may correspond to a shortened form of that title used as a hotword. Accordingly, the hotword information included in the user interface information corresponding to the user interface 300 may include that shortened title.
For another example, "know whether it should be red, fat, green, thin" is often referred to simply as "know no", "know whether it should be" or not ", etc. That is, "know whether it should be red, fat, green, thin" may correspond to the hot word "know whether or not", the hot word "know whether or not". Accordingly, the hotword information included in the user interface information corresponding to the user interface 300 may be: whether or not, whether or not to know.
As another example, the "revenge alliance" is often referred to simply as "duplicate," "revenge," and the like. That is, the "revenge union" may correspond to the hotword "duplicate", the hotword "revenge". Thus, where the current user interface contains a "revenge coalition" collection icon, the hotword information included in the user interface information may be: duplicate and revenge.
As another example, "user information" is often referred to as "account," "login information," and the like. That is, "user information" may correspond to the hotword "account", the hotword "login information", and the like. Accordingly, the hotword information included in the user interface information corresponding to the user interface 300 may be: account, login information.
As another example, the functionality of a "wireless connection" application may include connecting wireless fidelity (wireless fidelity, wiFi), so "wireless connection" may correspond to the hot words "WiFi", "wireless", "hot spot", "network", and so on. Accordingly, the hotword information included in the user interface information corresponding to the user interface 300 may be: wiFi, wireless, hot spot, network.
As another example, the function of a "clock" application is to view time, so a "clock" may correspond to the hot words "time", "points". Accordingly, the hotword information included in the user interface information corresponding to the user interface 300 may be: time, points.
In one example, the user interface information includes indication information of a manipulation instruction.
Optionally, the manipulation instruction may include at least one of: refresh user interface instructions, move user interface instructions, page turn instructions, move selection box instructions, and the like.
For example, in the case where the manipulation instruction includes a refresh user interface instruction, the user interface information may include at least one of information such as "refresh", "refresh page", "refresh interface", "refresh user interface", and the like.
As another example, where the manipulation instruction includes a move user interface instruction, the user interface information may include at least one of "move left", "move right", "move up", "move down", "move", "slide", "move user interface", and the like information.
As another example, in the case where the manipulation instruction includes a page-turning instruction, the user interface information may include at least one of information of "previous page", "next page", "page-turning", "left-turning", "right-turning", and the like.
As another example, in the case where the manipulation instruction includes a movement selection box instruction, the user interface information may include at least one of "next", "previous", "movement selection box", and the like.
It can be seen that the user can indicate the same manipulation instruction through a plurality of different expressions. Thus, the user interface information may include the expressions that a user may use, and these expressions may indicate the corresponding manipulation instructions.
In one example, the user interface information further includes target corner mark information displayed on the current user interface.
The target corner mark information may correspond to a target icon, a target collection, or a target manipulation instruction, for example.
Optionally, the user interface information further includes a correspondence between the target corner mark information and a target icon.
Optionally, the user interface information further includes a correspondence between the target corner mark information and a target set.
As shown in fig. 6, the current user interface may display a plurality of corner marks 601, including corner mark 1, corner mark 2, corner mark 3, corner mark 4, corner mark 5, and corner mark 6. Corner mark 1 may correspond to the icon of the "e.g., exemplary" television episode collection; corner mark 2 may correspond to the icon of the "Sha Hai" television episode collection; corner mark 3 may correspond to the icon of the "about night" television episode collection; corner mark 4 may correspond to the icon of the "king behind the scenes" television episode collection; corner mark 5 may correspond to the icon of the "card house" television episode collection; and corner mark 6 may correspond to the icon of the "know whether it should be red, fat, green, thin" television episode collection. Thus, the user interface information may include information indicating "corner mark 1 - e.g., exemplary", information indicating "corner mark 2 - Sha Hai", information indicating "corner mark 3 - about night", information indicating "corner mark 4 - king behind the scenes", information indicating "corner mark 5 - card house", and information indicating "corner mark 6 - know whether it should be red, fat, green, thin".
Optionally, the user interface information further includes a correspondence between the target corner mark information and a target manipulation instruction.
For example, the current user interface displays corner mark 1 and corner mark 2. The target manipulation instruction corresponding to corner mark 1 is: play the video at resolution 720P; the target manipulation instruction corresponding to corner mark 2 is: play the video at resolution 1080P. Thus, the user interface information may include information indicating "corner mark 1 - 720P" and information indicating "corner mark 2 - 1080P".
As another example, the current user interface displays corner mark 1 and corner mark 2. The target manipulation instruction corresponding to corner mark 1 is: play the video at a speed of 1.0x; the target manipulation instruction corresponding to corner mark 2 is: play the video at a speed of 2.0x. Thus, the user interface information may include information indicating "corner mark 1 - 1.0x" and information indicating "corner mark 2 - 2.0x".
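By way of illustration only, the following sketch shows one possible in-memory representation of the user interface information described above (icon names, hotwords, manipulation-instruction expressions, corner-mark correspondences, and the foreground application identification). The class name and field names are assumptions introduced here for clarity and are not defined by the embodiments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical container for the user interface information fed back by the
// foreground application; names are illustrative, not defined by the embodiments.
public class UserInterfaceInfo {
    // Icon names visible on the current user interface, e.g. "movie", "music".
    public final List<String> iconNames = new ArrayList<>();
    // Hotwords mapped to the icon or collection they refer to, e.g. "child" -> "kids".
    public final Map<String, String> hotwords = new HashMap<>();
    // Expressions a user may speak for manipulation instructions, e.g. "next page".
    public final List<String> manipulationExpressions = new ArrayList<>();
    // Corner-mark number mapped to the icon, collection or instruction it labels.
    public final Map<Integer, String> cornerMarks = new HashMap<>();
    // Identification of the foreground application that supplied this information.
    public String foregroundAppId;
}
```

Such a structure could, for example, be serialized and passed from the foreground application to the voice assistant, and from the voice assistant to the server, as described in the following steps.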
Optionally, the obtaining the user interface information of the current user interface includes: sending first indication information to a foreground application, where the first indication information is used to instruct the foreground application to feed back the user interface information; and receiving the user interface information sent by the foreground application, where the user interface information is obtained by the foreground application by searching for information related to the current user interface.
The foreground application may be, for example, a video playing application, an audio playing application, a desktop application, a setup application, a live tv application, a radio application, etc.
For example, after detecting a voice command of a user, the voice assistant of the client device may send first indication information to a foreground application by calling a software interface of the foreground application of the client device, where the first indication information may indicate that the foreground application feeds back the user interface information. The foreground application may send the user interface information to the voice assistant through the software interface according to the first indication information.
The way the foreground application determines the user interface information may be to scan the elements of the current user interface. The elements may include icons, collection information corresponding to the icons, manipulation instructions corresponding to the current user interface, and the like.
The manner in which the foreground application determines the user interface information may also include retrieving, from a network device (e.g., a cloud server), data related to the elements on the current user interface, for example, acquiring the hotwords associated with an episode title displayed on the interface, such as its common abbreviation.
The foreground application may also determine the user interface information by searching the document used to display the current user interface. The document may include, for example, a hypertext markup language (hyper text markup language, HTML) file, an extensible markup language (extensible markup language, XML) file, a script file, and the like.
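As a rough sketch of the element-scanning approach described above, the fragment below shows how a foreground application might walk its interface elements to fill the hypothetical UserInterfaceInfo structure sketched earlier. The Element type and the scanning logic are assumptions for illustration, not the actual implementation of the embodiments.

```java
import java.util.List;

public class UiInfoCollector {

    // Hypothetical description of one element on the current user interface.
    public static class Element {
        public final String name;          // icon or collection name
        public final Integer cornerMark;   // corner mark shown on the element, or null
        public Element(String name, Integer cornerMark) {
            this.name = name;
            this.cornerMark = cornerMark;
        }
    }

    // Scan the elements of the current user interface and fill a UserInterfaceInfo.
    public static UserInterfaceInfo collect(List<Element> elements, String appId) {
        UserInterfaceInfo info = new UserInterfaceInfo();
        info.foregroundAppId = appId;
        for (Element e : elements) {
            info.iconNames.add(e.name);
            if (e.cornerMark != null) {
                info.cornerMarks.put(e.cornerMark, e.name);
            }
        }
        // Manipulation instructions supported by this interface (illustrative values).
        info.manipulationExpressions.add("next page");
        info.manipulationExpressions.add("previous page");
        return info;
    }
}
```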
Optionally, the user interface information further includes an identification of the foreground application.
That is, the voice assistant may learn from the user interface information that the current user interface is provided by the foreground application and that the user interface information is provided by the foreground application.
Assume that the current user interface is updated, for example, an icon of a television show is added to the current user interface. Because the voice assistant can acquire the user interface information through the foreground application, and the foreground application already knows the elements on the updated user interface when displaying the updated user interface, the updating of the current user interface does not affect the voice recognition efficiency of the voice assistant.
Table 1 below provides code for a voice assistant to obtain user interface information from a foreground application. Specifically, the user interface information may include hotword information, instruction information of a manipulation instruction, and a corner mark maximum value.
TABLE 1
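The code originally listed in Table 1 is not reproduced in this text. As a placeholder, the sketch below only illustrates, with assumed names, the kind of software interface through which a voice assistant could request the three items mentioned for Table 1 (hotword information, manipulation-instruction information, and the corner-mark maximum value) from a foreground application.

```java
import java.util.List;

// Hypothetical software interface that a foreground application could expose to the
// voice assistant; the method names mirror the three items mentioned for Table 1.
public interface UiInfoProvider {
    // Hotwords associated with the icons or collections on the current user interface.
    List<String> getHotwords();
    // Expressions indicating the manipulation instructions supported by the interface.
    List<String> getManipulationInstructions();
    // Largest corner-mark number currently displayed (e.g. 6 for corner marks 1..6).
    int getMaxCornerMark();
}
```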
Optionally, before the step of obtaining the voice command of the user, the method further includes: and displaying a corner mark on the current user interface.
For example, the user interface information includes target corner mark information, and before the voice assistant obtains the voice instruction of the user, the method further includes: the voice assistant sends second indication information to the foreground application, where the second indication information is used to instruct the foreground application to display corner marks on the current user interface. That is, when the user speaks the wake-up word to wake up the voice assistant, the voice assistant may send the second indication information to the foreground application, so that the foreground application displays corner marks on the currently displayed interface. Thereafter, the user may observe the user interface displaying the corner marks and speak a voice instruction containing corner mark information. The user interface information fed back by the foreground application to the voice assistant may include information on the corner marks.
Optionally, after the voice instruction of the user is acquired, the method further includes: and removing the corner mark on the current user interface.
After the voice assistant acquires the voice instruction, the foreground application may remove the corner marks on the current user interface. For example, after the foreground application feeds back the user interface information to the voice assistant, the foreground application may remove the corner marks on the user interface. A user interface that does not include corner marks may have a cleaner display effect.
503, the client device sends the user interface information and the voice command to a server.
Correspondingly, the server receives the user interface information and the voice instruction sent by the client device.
Alternatively, 503 may be accomplished by a voice assistant on the client device.
The client device may not have voice recognition capabilities, i.e., the client device may not be able to translate the user's voice instructions into device control instructions corresponding to the voice instructions. The client device may send the user's voice instructions to the server, which performs voice recognition operations. And, the server may perform voice recognition operations according to the user interface currently displayed by the client device. Accordingly, the client device may transmit user interface indication information for indicating the currently displayed user interface to the server.
504, the server determines a target instruction according to the user interface information and the voice instruction of the user, where the target instruction is used to instruct the client device to execute the target operation.
In one example, the voice instruction is "play as exemplified by 30 set". The user interface information includes the name of the "e.g., exemplary" television episode collection. The server can determine, from "play" in the voice instruction, that the type of the target operation is playing audio or video, and can match the episode name in the voice instruction with the corresponding episode name in the user interface information. The server can thereby determine the target instruction, and the target operation corresponding to the target instruction is: play the 30th video resource in that television episode collection.
In one example, the voice instruction is "display movie page". The user interface information includes: a movie. The server can determine the type of the target operation to display a specific user interface according to the display in the voice command, and correspond the movie in the voice command with the movie in the user interface information, so that the server can determine the target command, and the target operation corresponding to the target command is: and displaying a user interface corresponding to the movie collection.
In one example, the voice instruction is "turn on WiFi". The user interface information includes: and (5) WiFi. The server can determine the type of the target operation as starting a specific function according to the opening in the voice command, and correspond the WiFi in the voice command with the WiFi in the user interface information, so that the server can determine the target command, and the target operation corresponding to the target command is: and starting a wireless connection module of the client device.
In one example, the voice instruction is "3 rd". The user interface information includes: corner mark 3-will be night. The server can determine the type of the target operation as clicking according to the voice command, and corresponds '3' in the voice command with 'corner mark 3-night' in the user interface information, so that the server can determine the target command, and the target operation corresponding to the target command is: clicking on the icon of the tv episode "will night".
In one example, the voice instruction is "next page". The user interface information includes: the next page. The server can determine that the type of the target operation is a page turning operation according to the voice instruction, and correspond the next page in the voice instruction with the next page in the user interface information, so that the server can determine the target instruction, and the target operation corresponding to the target instruction is: the next page of user interface is displayed.
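A simplified sketch of the matching described in the examples above is given below: the recognized text of the voice instruction is compared with the items carried in the user interface information to form a target instruction. The matching rules and names are illustrative assumptions (reusing the UserInterfaceInfo sketch from earlier); in the embodiments this work is performed by the ASR and NLU modules described next.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TargetInstructionResolver {

    // Pattern for a bare corner-mark utterance such as "3" or "3rd".
    private static final Pattern CORNER_MARK = Pattern.compile("^(\\d+)(st|nd|rd|th)?$");

    // Hypothetical result carrying what the client device should do.
    public static class TargetInstruction {
        public final String operationType; // e.g. "open", "manipulate", "click"
        public final String target;        // matched icon, collection, expression or item
        public TargetInstruction(String operationType, String target) {
            this.operationType = operationType;
            this.target = target;
        }
    }

    // Match the recognized text of the voice instruction against the user interface
    // information supplied by the client device; returns null when nothing matches.
    public static TargetInstruction resolve(String text, UserInterfaceInfo info) {
        String recognized = text.trim().toLowerCase();
        // Icon or collection name spoken directly, e.g. "display movie page".
        for (String icon : info.iconNames) {
            if (recognized.contains(icon.toLowerCase())) {
                return new TargetInstruction("open", icon);
            }
        }
        // Manipulation-instruction expression, e.g. "next page".
        for (String expr : info.manipulationExpressions) {
            if (recognized.contains(expr.toLowerCase())) {
                return new TargetInstruction("manipulate", expr);
            }
        }
        // Bare corner-mark instruction such as "3rd": click the item labelled 3.
        Matcher m = CORNER_MARK.matcher(recognized);
        if (m.matches()) {
            String item = info.cornerMarks.get(Integer.valueOf(m.group(1)));
            if (item != null) {
                return new TargetInstruction("click", item);
            }
        }
        return null;
    }
}
```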
The server may implement the operation of speech recognition, for example, through a speech recognition (automatic speech recognition, ASR) module and a semantic understanding (natural language understanding, NLU) module. Optionally, the server or the client device may further include a dialogue state tracking (DST) module, a dialogue management (DM) module, a dialogue generation (natural language generation, NLG) module, a text-to-speech (text to speech, TTS) module, and the like to implement voice recognition. The functions of the various modules are described below with reference to fig. 7, where 701 in fig. 7 may represent the voice instruction.
(1) ASR module
The primary function of the ASR module is to recognize the user's speech as text. The ASR module can process the user's voice instruction according to the user interface information, turning a segment of speech into the corresponding words. For example, part of the voice instruction may be associated with an icon name contained in the user interface information. With the development of machine learning in recent years, the recognition accuracy of ASR modules has improved greatly, making voice interaction between people and machines practical; ASR is therefore the true starting point of voice interaction. Although the ASR module can determine what the user is saying, it cannot understand what the user means; understanding the semantics is handled by the NLU module.
(2) NLU module
The NLU module is mainly used for understanding the intention (intent) of a user and analyzing slots (slots). The NLU module can determine the intention and the slot position of the voice instruction according to the user interface information. For example, the text obtained by the ASR module may be associated with an icon name contained in the user interface information.
Illustratively, the currently displayed user interface is shown in fig. 3, and the user says: "play exemplary 30 sets".
Since the currently displayed user interface includes icons of the tv episode "e.g." the NLU module can parse out the content shown in table 2.
TABLE 2
Intent: play video
Slot "video name": the title of the television episode collection spoken by the user
Slot "number of sets": 30
Thus, the NLU module can convert the voice instruction "play exemplary 30 sets" into the corresponding target instruction.
Two concepts are mentioned in the above example, namely intent and slot, which are explained in detail below.
Intent
An intent can be understood as a classifier: it determines what type of sentence the user has expressed, so that the program corresponding to that type can perform dedicated parsing. In one implementation, the "program corresponding to that type" may be a robot (Bot). For example, the user says: "play a comedy movie for me". The NLU module determines that the user's intent classification is "movie", and therefore calls the movie robot (Bot) to recommend a movie to the user. When the user is not satisfied and says "change to another one", the movie robot continues to serve the user, until the user expresses another request whose intent is not "movie", at which point another robot is switched in to serve the user.
Slot
When the user's intent is determined, the NLU module needs to further understand the content of the dialog, and for simplicity, the most central parts may be selected for understanding, and others may be ignored, and those most important parts may be referred to as slots (slots).
In the example of "play exemplary 30 sets", 2 core slots are defined, namely "video name" and "number of sets". If all the content that a user might need to provide to play a video were fully considered, more slots such as the playing start point, playing speed and playing resolution could certainly be added; these design considerations determine which slots the designer of the voice interaction defines.
Several code examples for determining the target instruction are provided below.
Example 1
Example 2
Example 3
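The code of Examples 1 to 3 is not reproduced in this text. The fragment below is only a hedged sketch, under assumed names, of how a parsed "play" intent and its slots might be turned into a target instruction; it is not the code of the examples themselves.

```java
import java.util.Map;

public class PlayIntentHandler {

    // Build a target instruction string from a parsed "play" intent and its slots;
    // the slot names follow the example above ("video name", "number of sets").
    public static String buildTargetInstruction(String intent, Map<String, String> slots) {
        if (!"play".equals(intent)) {
            throw new IllegalArgumentException("unsupported intent: " + intent);
        }
        String videoName = slots.get("video name");
        String episode = slots.get("number of sets");
        if (videoName == null || episode == null) {
            // Missing slot information: the DM module would ask the user to supply it.
            return null;
        }
        return "play video \"" + videoName + "\", episode " + episode;
    }
}
```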
(3) DST module and DM module
The DST module is mainly used to check and merge slot information, and the DM module is mainly used to perform slot filling, clarification and disambiguation in order.
Illustratively, when the user says "play exemplary", the NLU module can determine that the user's intent is "play", and that the slot information related to this intent is "video name" and "number of sets". The sentence expressed by the user contains only the "video name" slot information, so the "number of sets" slot information is missing. The DST module can send the missing slot information to the DM module, and the DM module controls the NLG module to generate a dialogue asking the user for the missing slot information.
Illustratively, the user: I want to watch a video;
BOT: What is the name of the video you want to watch?
The user: "such as exemplary";
BOT: From which episode would you like to start playing?
…
After the user supplements all the slot information of the "play" intent, the DM module may fill each piece of slot information into its slot according to a preset sequence. For example, the sequence of filling the slots may be "video name" followed by "number of sets", where the corresponding slot information is the episode title and "episode 30".
After slot filling is completed, the DM module may control the command execution module to perform the "play" operation. Illustratively, the command execution module may open the video playing application and begin playing from episode 30 of the selected television episode collection.
It should be understood that, in different dialogue systems, the dialogues and the designs of the dialogue manager modules differ, and the DST module and the DM module may collectively be regarded as a whole that performs dialogue state control and management. For example, if the user expresses a "play" requirement but provides no further details, the dialogue system needs to ask the user for the slot information that must be known.
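A minimal sketch of the slot checking and clarification flow described for the DST and DM modules is shown below; the class and method names are assumptions, and the preset slot order follows the "play" example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DialogueManager {

    // Slots required by the "play" intent, in the preset filling order.
    private final Map<String, String> slots = new LinkedHashMap<>();

    public DialogueManager() {
        slots.put("video name", null);
        slots.put("number of sets", null);
    }

    // Record slot information extracted from the user's latest utterance.
    public void fillSlot(String name, String value) {
        if (slots.containsKey(name)) {
            slots.put(name, value);
        }
    }

    // Return a clarification question for the first missing slot, or null when complete.
    public String nextQuestion() {
        for (Map.Entry<String, String> slot : slots.entrySet()) {
            if (slot.getValue() == null) {
                return "Please tell me the " + slot.getKey() + ".";
            }
        }
        return null; // all slots filled; the command execution module can run "play"
    }
}
```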
(4) Dialogue generation NLG module
The main role of the NLG module is to generate a dialog.
For example, when the DM module determines that the "number of sets" slot information is missing, it may control the NLG module to generate the corresponding dialogue "From which episode would you like to start playing?".
For example, after the command execution module completes the "play" operation, it may inform the DM module that the operation has been completed; at this time, the DM module may control the NLG module to generate the corresponding dialogue "Now playing episode 30 for you …".
(5) TTS module
The main function of the TTS module is to broadcast a dialogue to the user.
TTS is a speech synthesis and broadcasting technology. It mainly deals with the "phonology" of good broadcasting, which requires judging and jointly considering information such as symbols, polyphonic characters and sentence patterns, and handling the pronunciation of words in the broadcast. On the other hand, attention is paid to "timbre" in order to suit the preferences of different groups. In general, both "phonology" and "timbre" need to be processed well.
To improve the quality of the TTS broadcast, a real person may be invited to record the standard template parts, so that the whole dialogue system sounds more natural.
505, the server sends the target instruction to the client device.
Correspondingly, the client device receives the target instruction sent by the server. Optionally, a voice assistant on the client device may receive the target instruction sent by the server.
That is, the server feeds back the recognition result of the voice instruction to the client device.
Optionally, the method further includes 506, where the client device determines and executes the target operation according to the target instruction.
That is, the client device may determine a target operation indicated by the target instruction according to the target instruction sent by the server, and execute the target operation in response to a voice instruction sent by the user.
Optionally, the target instruction includes an identification of the foreground application.
For example, the foreground application sends user interface information including the foreground application identification to the voice assistant, which in turn sends the user interface information to the server. And the server determines the target instruction according to the user interface information and the voice instruction, and the target instruction can carry the identification of the foreground application. Thus, the voice assistant may invoke the software interface of the foreground application according to the identifier of the foreground application sent by the server, and send the target instruction to the foreground application. The foreground application may perform the target operation according to the target instruction.
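The following sketch illustrates, with assumed names, how a voice assistant might use the application identification carried in the target instruction to select the corresponding foreground application's software interface and forward the instruction to it.

```java
import java.util.HashMap;
import java.util.Map;

public class InstructionDispatcher {

    // Hypothetical software interface that foreground applications register with the assistant.
    public interface ForegroundApp {
        void execute(String targetInstruction);
    }

    private final Map<String, ForegroundApp> registeredApps = new HashMap<>();

    public void register(String appId, ForegroundApp app) {
        registeredApps.put(appId, app);
    }

    // Forward the target instruction to the application named by the identification
    // carried in the instruction, as described above.
    public boolean dispatch(String appId, String targetInstruction) {
        ForegroundApp app = registeredApps.get(appId);
        if (app == null) {
            return false; // the identified application is not available
        }
        app.execute(targetInstruction);
        return true;
    }
}
```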
The method of the voice control apparatus provided in the present application is explained in detail below by way of an example shown in fig. 8.
801, the user utters a wake word to wake up the voice assistant.
The voice assistant may be, for example, a voice assistant of the client device.
802, the voice assistant establishes a binding relationship between the voice assistant and the foreground application.
The foreground application may be, for example, a foreground application of the client device.
For example, the voice assistant may invoke a software interface of the foreground application to establish a binding relationship between the voice assistant and the foreground application.
803, the foreground application displays the corner mark.
That is, the client device may display one or more corner marks on the currently displayed user interface. The currently displayed user interface may be a current user interface.
804, the user speaks a voice instruction.
Accordingly, the voice assistant obtains the voice instruction of the user.
A specific implementation of 804 may refer to 501 in the embodiment shown in fig. 5, and need not be described herein.
805, the foreground application sends the user interface information of the current user interface to the voice assistant.
Accordingly, the voice assistant receives the user interface information sent by the foreground application.
Reference may be made to 502 in the embodiment shown in fig. 5 for a specific implementation of 805, which need not be described in detail herein.
For example, as shown in fig. 8, if the foreground application sends the user interface information to the voice assistant within a period of time after the voice assistant is bound to the foreground application, and the period of time does not exceed a preset threshold (e.g., 100 ms), the voice assistant may send the user interface information to the access platform of the cloud server. If the voice assistant does not receive the user interface information within that period after being bound to the foreground application, the interface between the client and the cloud server does not carry the user interface parameters.
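A sketch of this timing behaviour, assuming the 100 ms threshold mentioned above and reusing the UserInterfaceInfo sketch from earlier: the voice assistant waits briefly for the foreground application's user interface information and, if it does not arrive in time, sends the request to the server without the interface parameters.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class UiInfoWaiter {

    // Wait up to the preset threshold (100 ms in the example) for the foreground
    // application's user interface information; return null if it does not arrive,
    // in which case the request to the server carries no interface parameters.
    public static UserInterfaceInfo awaitUiInfo(CompletableFuture<UserInterfaceInfo> pending) {
        try {
            return pending.get(100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null;
        } catch (Exception e) {
            return null;
        }
    }
}
```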
806, the voice assistant sends the voice instruction and the user interface information to the access platform of the server.
Correspondingly, the access platform of the server receives the voice command and the user interface information sent by the voice assistant.
806 may refer to 503 in the embodiment shown in fig. 5, and need not be described herein.
807, the voice assistant releases the binding between the voice assistant and the foreground application.
For example, the voice assistant may release the software interface of the foreground application that it previously invoked.
808, the foreground application removes the corner mark on the current user interface.
809, the access platform may send the voice instructions and the user interface information sent by the voice assistant to an ASR module of the server.
Accordingly, the ASR module may receive the voice command and the user interface information sent by the access platform.
810, the ASR module may convert the voice command to text and send the text to the access platform based on the user interface information.
Correspondingly, the access platform receives the text sent by the ASR module.
811, the access platform may send the user interface information sent by the voice assistant and the text sent by the ASR module to a DM module of a server.
Accordingly, the DM module receives the text and the user interface information.
812, the DM module analyzes the intention and the slot of the text according to the user interface information to obtain a target instruction corresponding to the voice instruction.
813, the DM module sends the target instruction to the access platform.
Correspondingly, the access platform receives the target instruction sent by the DM module.
The specific implementation of 809 to 813 can refer to 504 in the embodiment shown in fig. 5, and need not be described here.
814, the access platform sends the target instruction to the voice assistant.
Correspondingly, the voice assistant receives the target instruction sent by the access platform.
A specific implementation of 814 may refer to 505 in the embodiment shown in fig. 5, and need not be described herein.
815, the voice assistant invokes the software interface of the foreground application.
816, the voice assistant sends the target instruction to the foreground application.
Accordingly, the foreground application receives the target instruction sent by the voice assistant.
817, the foreground application executes the target operation indicated by the voice instruction according to the target instruction.
815 to 817 may refer to 506 in the embodiment shown in fig. 5, and need not be described herein.
818, the foreground application sends feedback results to the voice assistant.
The feedback result may, for example, indicate that the foreground application successfully received the target instruction.
819, the foreground application presents the execution result of the target operation to a user.
That is, the user may perceive that the client device responds to the user's voice instruction by performing the target operation.
Fig. 9 is a schematic flowchart of a method for controlling a device by using voice according to an embodiment of the present application.
901, acquiring a voice instruction of a user, wherein the voice instruction is used for indicating a target instruction.
And 902, acquiring user interface information of a current user interface, wherein the current user interface is a user interface currently displayed by the client device.
903, determining the target instruction corresponding to the voice instruction, where the target instruction is obtained from the voice instruction and the user interface information.
Optionally, the user interface information includes at least one of the following information: and the icon name, hotword information, instruction information of a control instruction and target corner mark information of the current user interface.
Optionally, the target corner mark information corresponds to a target icon or a target control instruction.
One possible implementation is that the method 900 shown in fig. 9 is performed by a client device.
Optionally, the determining the target instruction corresponding to the voice instruction includes: and the client device determines the target instruction according to the voice instruction and the user interface information.
The client device may perform speech recognition operations, for example, through a speech recognition (automatic speech recognition, ASR) module, a semantic understanding (natural language understanding, NLU) module, and in conjunction with user interface information.
In this case, the specific implementation of 901 to 902 may refer to 501 and 502 in the embodiment shown in fig. 5, and the specific implementation of 903 may refer to 504 in the embodiment shown in fig. 5, and need not be described herein.
One possible implementation is that the method 900 shown in fig. 9 is performed by a client device.
Optionally, before the determining the target instruction corresponding to the voice instruction, the method further includes: transmitting the user interface information and the voice command to a server; the determining the target instruction corresponding to the voice instruction comprises the following steps: and receiving a target instruction sent by the server, wherein the target instruction is determined by the server according to the user interface information and the voice instruction of the user.
In this case, the specific implementation of 901 to 902 may refer to 501 and 502 in the embodiment shown in fig. 5, and the specific implementation of 903 may refer to 503 to 505 in the embodiment shown in fig. 5, which are not described in detail herein.
One possible implementation is that the method 900 shown in fig. 9 is performed by a server.
Optionally, the obtaining the voice command of the user includes: receiving the voice instruction sent by the client device; the obtaining the user interface information of the current user interface comprises the following steps: receiving the user interface information sent by the client device; the determining the target instruction corresponding to the voice instruction comprises the following steps: and determining the target instruction according to the voice instruction and the user interface information.
Optionally, the method further comprises: and sending the target instruction to the client device.
In this case, the specific implementation of 901 to 902 may refer to 503 in the embodiment shown in fig. 5, and the specific implementation of 903 may refer to 504 in the embodiment shown in fig. 5, and need not be described here again.
It will be appreciated that the electronic device, in order to achieve the above-described functions, includes corresponding hardware and/or software modules that perform the respective functions. The steps of an algorithm for each example described in connection with the embodiments disclosed herein may be embodied in hardware or a combination of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application in conjunction with the embodiments, but such implementation is not to be considered as outside the scope of this application.
The present embodiment may divide the functional modules of the electronic device according to the above method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated modules described above may be implemented in hardware. It should be noted that, in this embodiment, the division of the modules is schematic, only one logic function is divided, and another division manner may be implemented in actual implementation.
In the case of dividing the respective functional modules with the respective functions, fig. 10 shows a schematic diagram of one possible composition of the electronic device 1000 involved in the above-described embodiment, and as shown in fig. 10, the electronic device 1000 may include: an acquisition module 1001 and a processing module 1002. The electronic device 1000 may be, for example, a client device or a server as described above.
The acquisition module 1001 may be configured to acquire a voice instruction of a user, where the voice instruction is used to indicate a target instruction.
By way of example, the voice assistant in fig. 2 may be used to implement this functionality of the acquisition module 1001.
The acquisition module 1001 may be further configured to acquire user interface information of a current user interface, where the current user interface is a user interface currently displayed by the client device.
By way of example, the voice assistant in fig. 2 may likewise be used to implement this functionality of the acquisition module 1001.
The processing module 1002 is configured to determine the target instruction corresponding to the voice instruction, where the target instruction is obtained from the voice instruction and the user interface information.
It should be noted that, all relevant contents of each step related to the above method embodiment may be cited to the functional description of the corresponding functional module, which is not described herein.
The electronic device provided in this embodiment is configured to execute the method of controlling a voice control device, so that the same effects as those of the implementation method can be achieved.
In the case where an integrated unit is employed, the electronic device may include a processing module, a storage module and a communication module. The processing module may be configured to control and manage actions of the electronic device, for example, to support the electronic device in performing the steps performed by the foregoing units. The storage module may be configured to store program code, data and the like of the electronic device. The communication module may be configured to support communication between the electronic device and other devices.
The processing module may be a processor or a controller. It may implement or execute the various exemplary logic blocks, modules and circuits described in connection with this disclosure. A processor may also be a combination that implements computing functions, for example a combination including one or more microprocessors, or a combination of a digital signal processor (digital signal processor, DSP) and a microprocessor. The storage module may be a memory. The communication module may specifically be a device that interacts with other electronic devices, such as a radio frequency circuit, a Bluetooth chip or a Wi-Fi chip.
In one embodiment, when the processing module is a processor and the storage module is a memory, the electronic device according to this embodiment may be a device having the structure shown in fig. 1.
The present embodiment also provides a computer program product which, when run on a computer, causes the computer to perform the above-mentioned related steps to implement the method of the speech control device in the above-mentioned embodiments.
In addition, embodiments of the present application also provide an apparatus, which may be specifically a chip, a component, or a module, and may include a processor and a memory connected to each other; the memory is configured to store computer-executable instructions, and when the apparatus is running, the processor may execute the computer-executable instructions stored in the memory, so that the chip performs the method of the voice control apparatus in the above method embodiments.
The embodiment of the application provides a terminal device, which has a function of realizing the behavior of the terminal device in any method embodiment. The functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to each of the above-described functions. In particular, the terminal device may be a user device.
The embodiment of the application also provides a communication system, which comprises the network equipment (such as a cloud server) and the terminal equipment.
The embodiment of the application also provides a communication system, which comprises the electronic equipment and the server according to any one of the embodiments.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when executed by a computer, implements the method flow related to the terminal device in any of the above method embodiments. Specifically, the computer may be the above-mentioned terminal device.
The present application further provides a computer program or a computer program product comprising a computer program, which when executed on a computer causes the computer to implement the method flow related to the terminal device in any of the above method embodiments. Specifically, the computer may be the above-mentioned terminal device.
The embodiment of the application also provides a device which is applied to the terminal equipment, and the device is coupled with the memory and used for reading and executing the instructions stored in the memory, so that the terminal equipment can execute the method flow related to the terminal equipment in any method embodiment. The memory may be integrated in the processor or may be separate from the processor. The means may be a chip (e.g. a system on a chip (SoC)) on the terminal device.
It should be appreciated that the processors referred to in the embodiments of the present application may be central processing units (central processing unit, CPU), but may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate arrays (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It should also be understood that the memory referred to in the embodiments of the present application may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
It should be noted that the memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
It should also be understood that the terms first, second, and the various numerical designations referred to herein are merely for convenience of description and are not intended to limit the scope of the present application.
In this application, "and/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship.
In the present application, "at least one" means one or more, and "a plurality of" means two or more. "At least one of" the listed items or similar expressions means any combination of these items, including any combination of a single item or a plurality of items. For example, "at least one of a, b, or c" or "at least one of a, b, and c" may each represent: a, b, c, a and b, a and c, b and c, or a, b and c, where a, b, and c may each be singular or plural.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the processes do not imply an order of execution; some or all of the steps may be performed in parallel or sequentially. The order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, a terminal device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
Relevant parts of the method embodiments may be referred to one another; the apparatus provided by each apparatus embodiment is configured to perform the method provided by the corresponding method embodiment, so each apparatus embodiment may be understood with reference to the relevant part of the corresponding method embodiment.
The device configuration diagrams presented in the device embodiments of the present application only show a simplified design of the corresponding device. In practical applications, the apparatus may include any number of transmitters, receivers, processors, memories, etc. to implement the functions or operations performed by the apparatus in the embodiments of the apparatus of the present application, and all apparatuses capable of implementing the present application are within the scope of protection of the present application.
The names of the messages/frames/indication information, modules or units, etc. provided in the embodiments of the present application are only examples, and other names may be used as long as the roles of the messages/frames/indication information, modules or units, etc. are the same.
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items. The character "/" herein generally indicates that the associated object is an "or" relationship.
It should be understood that although the terms first, second, third, etc. may be used in embodiments of the present application to describe various messages, requests, and terminals, these messages, requests, and terminals should not be limited to these terms. These terms are only used to distinguish a message, a request, and a terminal from one another. For example, a first terminal may also be referred to as a second terminal, and similarly, a second terminal may also be referred to as a first terminal, without departing from the scope of embodiments of the present application.
The word "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (a stated condition or event) is detected", or "in response to detecting (a stated condition or event)", depending on the context.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods in the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a readable storage medium of a device, and when the program is executed, all or some of the above steps are performed. The storage medium includes, for example: FLASH, EEPROM, and the like.
The foregoing detailed description sets forth specific embodiments for the purpose of describing the invention in detail. It is to be understood that the invention is not limited to the specific embodiments described above, and all modifications, adaptations, alternatives, improvements, and the like that fall within the spirit and principles of the invention are intended to be covered.
Claims (19)
1. A terminal, wherein a first application is installed on the terminal, and a voice assistant is installed or integrated on the terminal, the first application being an application other than the voice assistant, the terminal comprising:
a processor;
a memory coupled to the processor for storing computer-executable instructions;
when the computer-executable instructions stored in the memory are executed by the processor, the terminal performs:
displaying a first user interface of the first application, wherein the first user interface comprises M elements, the M elements comprise first elements, and M is a positive integer greater than 1;
obtaining a wake-up word, wherein the wake-up word is used for waking up the voice assistant;
after the wake-up word is acquired, waking up the voice assistant, wherein the voice assistant instructs the first application to display corner marks on the first user interface, and N corner marks are displayed on the first user interface, each of the N corner marks corresponds to an element, the N corner marks comprise a first corner mark corresponding to the first element, and N is a positive integer less than or equal to M;
the voice assistant obtains a first voice instruction of a user, wherein the first voice instruction comprises the first corner mark;
after a first voice instruction of a user is acquired, the terminal removes the N corner marks, displays the first user interface which does not comprise the N corner marks, and the first application sends first user interface information to the voice assistant, wherein the first user interface information comprises corner mark information, hot word information and indication information of a control instruction, the corner mark information is used for indicating the corner marks of the first user interface, and the hot word information is related to the M elements;
and executing the operation corresponding to the first element according to the first voice instruction and the first user interface information.
2. The terminal according to claim 1, characterized in that it performs:
After waking up the voice assistant, the voice assistant establishes a binding relationship with the first application;
before removing the N corner marks, the voice assistant and the first application are unbound.
3. The terminal according to claim 2, characterized in that it performs:
and if the voice assistant does not receive the first user interface information within a first preset time period after the binding relationship is established, an interface between the first application of the terminal and a first server does not carry parameters, and the first server is related to the first application.
4. The terminal according to claim 1 or 2, characterized in that,
according to the first voice instruction and the first user interface information, executing the operation corresponding to the first element, including:
when the voice assistant determines, according to the first voice instruction and the first user interface information, that a necessary slot is missing, the voice assistant acquires the necessary slot through a dialogue with the user, and the terminal then executes the operation corresponding to the first element.
5. The terminal according to claim 1, characterized in that it performs:
And in the process of acquiring the wake-up word, displaying prompt information on the first user interface, wherein the prompt information is used for prompting a user to use the voice recognition function of the terminal.
6. The terminal of claim 1 or 5, wherein the hotword information includes hotwords that are not displayed on the first user interface.
7. The terminal of claim 6, wherein the first user interface information is obtained by the first application retrieving information related to the first user interface.
8. The terminal of claim 7, wherein the first user interface information further comprises an identification of the first application.
9. The terminal of claim 8, wherein,
according to the first voice instruction and the first user interface information, executing the operation corresponding to the first element, including:
transmitting the first voice instruction and the first user interface information to a server;
receiving a first target instruction from the server, and executing an operation corresponding to the first element according to the first target instruction;
wherein the first target instruction includes an identification of the first application.
10. A method for controlling a terminal by voice, the method being applied to the terminal, the terminal having a first application installed thereon and the terminal having a voice assistant installed thereon or integrated therewith, the first application being an application other than the voice assistant, the method comprising:
displaying a first user interface of the first application, wherein the first user interface comprises M elements, the M elements comprise first elements, and M is a positive integer greater than 1;
obtaining a wake-up word, wherein the wake-up word is used for waking up the voice assistant;
after the wake-up word is acquired, waking up the voice assistant, wherein the voice assistant instructs the first application to display corner marks on the first user interface, and N corner marks are displayed on the first user interface, each of the N corner marks corresponds to an element, the N corner marks comprise a first corner mark corresponding to the first element, and N is a positive integer less than or equal to M;
the voice assistant obtains a first voice instruction of a user, wherein the first voice instruction comprises the first corner mark;
after a first voice instruction of a user is acquired, the terminal removes the N corner marks, displays the first user interface which does not comprise the N corner marks, and the first application sends first user interface information to the voice assistant, wherein the first user interface information comprises corner mark information, hot word information and indication information of a control instruction, the corner mark information is used for indicating the corner marks of the first user interface, and the hot word information is related to the first element;
And executing the operation corresponding to the first element according to the first voice instruction and the first user interface information.
11. The method according to claim 10, characterized in that the method comprises:
after waking up the voice assistant, the voice assistant establishes a binding relationship with the first application;
before removing the N corner marks, the voice assistant and the first application are unbound.
12. The method according to claim 11, characterized in that the method comprises:
and if the voice assistant does not receive the first user interface information within a first preset time period after the binding relationship is established, an interface between the first application of the terminal and a first server does not carry parameters, and the first server is related to the first application.
13. The method according to claim 10 or 11, wherein,
according to the first voice instruction and the first user interface information, executing the operation corresponding to the first element, including:
when the voice assistant determines, according to the first voice instruction and the first user interface information, that a necessary slot is missing, the voice assistant acquires the necessary slot through a dialogue with the user, and the terminal then executes the operation corresponding to the first element.
14. The method according to claim 10, characterized in that the method comprises:
and in the process of acquiring the wake-up word, displaying prompt information on the first user interface, wherein the prompt information is used for prompting a user to use the voice recognition function of the terminal.
15. The method of claim 10, wherein the hotword information comprises hotwords that are not displayed on the first user interface.
16. The method of claim 14, wherein the first user interface information is obtained by the first application retrieving information related to the first user interface.
17. The method of claim 16, wherein,
according to the first voice instruction and the first user interface information, executing the operation corresponding to the first element, including:
transmitting the first voice instruction and the first user interface information to a server;
receiving a first target instruction from the server, and executing an operation corresponding to the first element according to the first target instruction;
wherein the first target instruction includes an identification of the first application.
18. A computer readable storage medium storing program code which, when executed by a processor, implements the method of any of claims 10-17.
19. A chip comprising a processor and a data interface, the processor reading instructions stored on a memory via the data interface to perform the method of any of claims 10-17.
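For readers approaching the claims from an engineering angle, the following minimal Python sketch illustrates the flow described in independent claims 1 and 10: a wake-up word binds the voice assistant to the foreground application, numbered corner marks are overlaid on the interface, a voice instruction names a corner mark, the marks are removed, the application hands the assistant its user interface information (corner marks, hotwords, control-instruction indications, application identification), and the matching operation is executed. All class, function, and package names are hypothetical; this is an illustrative sketch only, not the claimed or any actual implementation.

```python
# Illustrative sketch only: hypothetical names, mirroring the flow of claims 1 and 10.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class UiElement:
    """One voice-controllable element on the first user interface."""
    name: str
    action: Callable[[], None]


@dataclass
class FirstUserInterfaceInfo:
    """Rough analogue of the 'first user interface information'."""
    corner_marks: Dict[int, str]      # corner-mark number -> element name
    hotwords: List[str]               # hotword information related to the elements
    control_instructions: List[str]   # indication information of control instructions
    app_id: Optional[str] = None      # identification of the first application


class FirstApplication:
    def __init__(self, elements: List[UiElement], app_id: str = "com.example.firstapp"):
        self.elements = elements
        self.app_id = app_id
        self.corner_marks_shown = False

    def show_corner_marks(self) -> None:
        """Overlay N numbered corner marks, one per element, on the interface."""
        self.corner_marks_shown = True

    def remove_corner_marks(self) -> None:
        self.corner_marks_shown = False

    def collect_ui_info(self) -> FirstUserInterfaceInfo:
        marks = {i + 1: e.name for i, e in enumerate(self.elements)}
        # hotwords may include words not displayed on the interface
        hotwords = [e.name for e in self.elements] + ["play", "pause"]
        return FirstUserInterfaceInfo(marks, hotwords, ["open", "scroll"], self.app_id)


class VoiceAssistant:
    def __init__(self) -> None:
        self.bound_app: Optional[FirstApplication] = None

    def wake(self, foreground_app: FirstApplication) -> None:
        """On the wake-up word: bind to the foreground app and have it show corner marks."""
        self.bound_app = foreground_app
        foreground_app.show_corner_marks()

    def handle_voice_instruction(self, spoken_corner_mark: int) -> None:
        """The first voice instruction carries a corner mark; resolve it and run the action."""
        app = self.bound_app
        if app is None:
            return                               # not awake, nothing to do
        app.remove_corner_marks()                # terminal removes the N corner marks
        ui_info = app.collect_ui_info()          # application sends UI info to the assistant
        element_name = ui_info.corner_marks.get(spoken_corner_mark)
        if element_name is None:
            print("Which item did you mean?")    # missing slot -> dialogue with the user
            return
        for element in app.elements:
            if element.name == element_name:
                element.action()                 # operation corresponding to the first element
        self.bound_app = None                    # release the binding after execution


if __name__ == "__main__":
    app = FirstApplication([
        UiElement("Episode 1", lambda: print("Playing episode 1")),
        UiElement("Episode 2", lambda: print("Playing episode 2")),
    ])
    assistant = VoiceAssistant()
    assistant.wake(app)                          # user says the wake-up word
    assistant.handle_voice_instruction(2)        # "number 2" -> prints "Playing episode 2"
```

In the server-assisted variant of claims 9 and 17, the assistant would forward the voice instruction and the user interface information to a server and execute the first target instruction it receives back; the local lookup above merely stands in for that round trip.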
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210690830.6A CN115145529B (en) | 2019-08-09 | 2020-04-09 | Voice control device method and electronic device |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910735931 | 2019-08-09 | ||
CN2019107359319 | 2019-08-09 | ||
CN202010273843.4A CN112346695A (en) | 2019-08-09 | 2020-04-09 | Method for controlling equipment through voice and electronic equipment |
CN202210690830.6A CN115145529B (en) | 2019-08-09 | 2020-04-09 | Voice control device method and electronic device |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010273843.4A Division CN112346695A (en) | 2019-08-09 | 2020-04-09 | Method for controlling equipment through voice and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115145529A CN115145529A (en) | 2022-10-04 |
CN115145529B true CN115145529B (en) | 2023-05-09 |
Family
ID=74357840
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210690830.6A Active CN115145529B (en) | 2019-08-09 | 2020-04-09 | Voice control device method and electronic device |
CN202010273843.4A Pending CN112346695A (en) | 2019-08-09 | 2020-04-09 | Method for controlling equipment through voice and electronic equipment |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010273843.4A Pending CN112346695A (en) | 2019-08-09 | 2020-04-09 | Method for controlling equipment through voice and electronic equipment |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230176812A1 (en) |
EP (1) | EP3989047A4 (en) |
CN (2) | CN115145529B (en) |
WO (1) | WO2021027476A1 (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111367491A (en) * | 2020-03-02 | 2020-07-03 | 成都极米科技股份有限公司 | Voice interaction method and device, electronic equipment and storage medium |
CN113076077B (en) * | 2021-03-29 | 2024-06-14 | 北京梧桐车联科技有限责任公司 | Method, device and equipment for installing vehicle-mounted program |
CN113076079A (en) * | 2021-04-20 | 2021-07-06 | 广州小鹏汽车科技有限公司 | Voice control method, server, voice control system and storage medium |
CN113282268B (en) * | 2021-06-03 | 2023-03-14 | 腾讯科技(深圳)有限公司 | Sound effect configuration method and device, storage medium and electronic equipment |
CN113507500A (en) * | 2021-06-04 | 2021-10-15 | 上海闻泰信息技术有限公司 | Terminal control method, terminal control device, computer equipment and computer-readable storage medium |
WO2023272629A1 (en) * | 2021-06-30 | 2023-01-05 | 华为技术有限公司 | Interface control method, device, and system |
CN113628622A (en) * | 2021-08-24 | 2021-11-09 | 北京达佳互联信息技术有限公司 | Voice interaction method and device, electronic equipment and storage medium |
CN114090148A (en) * | 2021-11-01 | 2022-02-25 | 深圳Tcl新技术有限公司 | Information synchronization method and device, electronic equipment and computer readable storage medium |
CN114049892A (en) * | 2021-11-12 | 2022-02-15 | 杭州逗酷软件科技有限公司 | Voice control method and device and electronic equipment |
CN116170646A (en) * | 2021-11-25 | 2023-05-26 | 中移(杭州)信息技术有限公司 | Control method and system of set top box and storage medium |
CN114529641A (en) * | 2022-02-21 | 2022-05-24 | 重庆长安汽车股份有限公司 | Intelligent network connection automobile assistant dialogue and image management system and method |
CN114968533B (en) * | 2022-06-09 | 2023-03-24 | 中国人民解放军32039部队 | Embedded satellite task scheduling management method and system and electronic equipment |
CN117334183A (en) * | 2022-06-24 | 2024-01-02 | 华为技术有限公司 | Voice interaction method, electronic equipment and voice assistant development platform |
CN118193760B (en) * | 2024-03-11 | 2024-09-03 | 北京鸿鹄云图科技股份有限公司 | Drawing voice annotation method and system based on natural language understanding |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109391833A (en) * | 2018-09-13 | 2019-02-26 | 苏宁智能终端有限公司 | A kind of sound control method and smart television of smart television |
CN109801625A (en) * | 2018-12-29 | 2019-05-24 | 百度在线网络技术(北京)有限公司 | Control method, device, user equipment and the storage medium of virtual speech assistant |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6882974B2 (en) * | 2002-02-15 | 2005-04-19 | Sap Aktiengesellschaft | Voice-control for a user interface |
US20050273487A1 (en) * | 2004-06-04 | 2005-12-08 | Comverse, Ltd. | Automatic multimodal enabling of existing web content |
CN101193159A (en) * | 2006-11-24 | 2008-06-04 | 辉翼科技股份有限公司 | Master/slave communication for synchronizing multi-style data channels |
US20120030712A1 (en) * | 2010-08-02 | 2012-02-02 | At&T Intellectual Property I, L.P. | Network-integrated remote control with voice activation |
US20120110456A1 (en) * | 2010-11-01 | 2012-05-03 | Microsoft Corporation | Integrated voice command modal user interface |
US8886546B2 (en) * | 2011-12-19 | 2014-11-11 | Verizon Patent And Licensing Inc. | Voice application access |
US9060152B2 (en) * | 2012-08-17 | 2015-06-16 | Flextronics Ap, Llc | Remote control having hotkeys with dynamically assigned functions |
US20140350941A1 (en) * | 2013-05-21 | 2014-11-27 | Microsoft Corporation | Method For Finding Elements In A Webpage Suitable For Use In A Voice User Interface (Disambiguation) |
CN104093077B (en) * | 2013-10-29 | 2016-05-04 | 腾讯科技(深圳)有限公司 | Method, Apparatus and system that multiple terminals is interconnected |
US10318236B1 (en) * | 2016-05-05 | 2019-06-11 | Amazon Technologies, Inc. | Refining media playback |
WO2018112856A1 (en) * | 2016-12-22 | 2018-06-28 | 深圳前海达闼云端智能科技有限公司 | Location positioning method and device based on voice control, user equipment, and computer program product |
US12026456B2 (en) * | 2017-08-07 | 2024-07-02 | Dolbey & Company, Inc. | Systems and methods for using optical character recognition with voice recognition commands |
CN107919129A (en) * | 2017-11-15 | 2018-04-17 | 百度在线网络技术(北京)有限公司 | Method and apparatus for controlling the page |
CN107832036B (en) * | 2017-11-22 | 2022-01-18 | 北京小米移动软件有限公司 | Voice control method, device and computer readable storage medium |
CN108880961A (en) * | 2018-07-19 | 2018-11-23 | 广东美的厨房电器制造有限公司 | Appliances equipment control method and device, computer equipment and storage medium |
CN108683937B (en) * | 2018-03-09 | 2020-01-21 | 百度在线网络技术(北京)有限公司 | Voice interaction feedback method and system for smart television and computer readable medium |
CN108958844B (en) * | 2018-07-13 | 2021-09-03 | 京东方科技集团股份有限公司 | Application program control method and terminal |
CN109448727A (en) * | 2018-09-20 | 2019-03-08 | 李庆湧 | Voice interactive method and device |
CN109584879B (en) * | 2018-11-23 | 2021-07-06 | 华为技术有限公司 | Voice control method and electronic equipment |
US20220317968A1 (en) * | 2021-04-02 | 2022-10-06 | Comcast Cable Communications, Llc | Voice command processing using user interface context |
2020
- 2020-04-09 CN CN202210690830.6A patent/CN115145529B/en active Active
- 2020-04-09 CN CN202010273843.4A patent/CN112346695A/en active Pending
- 2020-07-15 EP EP20853218.4A patent/EP3989047A4/en active Pending
- 2020-07-15 US US17/633,702 patent/US20230176812A1/en not_active Abandoned
- 2020-07-15 WO PCT/CN2020/102113 patent/WO2021027476A1/en unknown
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109391833A (en) * | 2018-09-13 | 2019-02-26 | 苏宁智能终端有限公司 | A kind of sound control method and smart television of smart television |
CN109801625A (en) * | 2018-12-29 | 2019-05-24 | 百度在线网络技术(北京)有限公司 | Control method, device, user equipment and the storage medium of virtual speech assistant |
Also Published As
Publication number | Publication date |
---|---|
EP3989047A1 (en) | 2022-04-27 |
WO2021027476A1 (en) | 2021-02-18 |
CN112346695A (en) | 2021-02-09 |
US20230176812A1 (en) | 2023-06-08 |
EP3989047A4 (en) | 2022-08-17 |
CN115145529A (en) | 2022-10-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115145529B (en) | Voice control device method and electronic device | |
CN110109636B (en) | Screen projection method, electronic device and system | |
CN111163274B (en) | Video recording method and display equipment | |
WO2022052776A1 (en) | Human-computer interaction method, and electronic device and system | |
WO2019047878A1 (en) | Method for controlling terminal by voice, terminal, server and storage medium | |
CN112527174B (en) | Information processing method and electronic equipment | |
US11425466B2 (en) | Data transmission method and device | |
EP4138381A1 (en) | Method and device for video playback | |
TW202025090A (en) | Display apparatus and method of controlling the same | |
CN112527222A (en) | Information processing method and electronic equipment | |
CN111526402A (en) | Method for searching video resources through voice of multi-screen display equipment and display equipment | |
CN113835649A (en) | Screen projection method and terminal | |
KR20140022320A (en) | Method for operating an image display apparatus and a server | |
CN111556350B (en) | Intelligent terminal and man-machine interaction method | |
CN112788422A (en) | Display device | |
CN115686401A (en) | Screen projection method, electronic equipment and system | |
WO2022012299A1 (en) | Display device and person recognition and presentation method | |
CN113468351A (en) | Intelligent device and image processing method | |
US11930236B2 (en) | Content playback device using voice assistant service and operation method thereof | |
CN112053688B (en) | Voice interaction method, interaction equipment and server | |
CN115334367A (en) | Video summary information generation method, device, server and storage medium | |
CN113497884B (en) | Dual-system camera switching control method and display equipment | |
CN116257159A (en) | Multimedia content sharing method, device, equipment, medium and program product | |
US20220014688A1 (en) | Image processing method and display device thereof | |
CN113365124A (en) | Display device and display method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |