US20230176812A1 - Method for controlling a device using a voice and electronic device - Google Patents
- Publication number: US20230176812A1 (application US 17/633,702)
- Authority: US (United States)
- Prior art keywords: user interface, instruction, information, target, voice
- Legal status: Abandoned (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26—Speech to text systems
- G10L15/1822—Parsing for meaning understanding
- G10L2015/223—Execution procedure of a spoken command
- G10L2015/226—Procedures used during a speech recognition process using non-speech characteristics
- G10L2015/228—Procedures used during a speech recognition process using non-speech characteristics of application context
Description
- This application relates to the field of artificial intelligence and the field of electronic devices, and more specifically, to a method for controlling a device using a voice and an electronic device.
- a user may watch a live TV broadcast, an on-line video resource, and a local video resource, listen to an on-line audio resource and a local audio resource, and the like by using a large-screen display apparatus.
- Before the user watches a video or listens to music, the user may say, based on a user interface displayed by the large-screen display apparatus, a video or audio resource that the user wants to play.
- the large-screen display apparatus or a set-top box connected to the large-screen display apparatus may capture and respond to the voice of the user.
- a file used for voice recognition is generally configured for the large-screen display apparatus.
- the voice recognition file may be used to identify a voice instruction for invoking a data resource configured on the large-screen display apparatus.
- data resources displayed or played on the large-screen display apparatus need to be frequently updated.
- the large-screen display apparatus may play a newly released TV series. Accordingly, a large amount of work is required to update the voice recognition file on the large-screen display apparatus. This may reduce voice recognition efficiency.
- This application provides a method for controlling a device using a voice and an electronic device, to improve voice recognition efficiency.
- a method for controlling a device using a voice is provided, including: obtaining a voice instruction of a user, where the voice instruction is used to indicate a target instruction; obtaining user interface information of a current user interface, where the current user interface is a user interface currently displayed by a client device; and determining the target instruction corresponding to the voice instruction, where the target instruction is obtained by using the voice instruction and the user interface information.
- the method for controlling a device using a voice may be implemented by the client device (which may also be referred to as a terminal device) or a server (which may also be referred to as a network device).
- the method for controlling a device using a voice may be completed by a voice assistant on the client device.
- the user interface information may include various types of information indicating the current user interface.
- the client device may also obtain at least a part of the user interface information. In this way, efficiency of obtaining the user interface information may be higher.
- the user interface information may further be updated, and an update manner is relatively simple.
- the user interface information may reflect information that can be observed by the user on the current user interface, and the voice instruction of the user is recognized with reference to the user interface information. This helps improve an accuracy rate of voice recognition.
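- The three steps summarized above can be pictured as a small interface. The following is a minimal, non-authoritative Kotlin sketch; every type and function name is hypothetical and only mirrors the wording of the method.

```kotlin
// Minimal sketch of the three claimed steps; every name here is hypothetical.
data class VoiceInstruction(val audio: ByteArray)        // the captured user speech
data class UserInterfaceInfo(val phrases: List<String>)  // what the current UI shows
data class TargetInstruction(val action: String, val argument: String?)

interface VoiceControlMethod {
    fun obtainVoiceInstruction(): VoiceInstruction    // step 1: obtain the voice instruction
    fun obtainUserInterfaceInfo(): UserInterfaceInfo  // step 2: obtain the current UI information
    // step 3: the target instruction is obtained by using both inputs
    fun determineTargetInstruction(
        voice: VoiceInstruction,
        uiInfo: UserInterfaceInfo,
    ): TargetInstruction
}
```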
- the user interface information includes at least one of the following information: an icon name, hot word information, indication information of a control instruction, or target corner mark information of the current user interface.
- Icons may be classified into menu icons, resource collection icons, function icons, and the like.
- the user interface information may reflect content on the user interface from a plurality of perspectives, to help the user control the client device in a plurality of manners.
- the target corner mark information corresponds to a target icon or a target control instruction.
- the user interface information further includes a correspondence between the target corner mark information and the target icon.
- the user interface information further includes a correspondence between the target corner mark information and a target collection.
- the user interface information further includes a correspondence between the target corner mark information and the target control instruction.
- a corner mark is displayed on the current user interface, to help increase a quantity of recognizable voice instructions and improve the accuracy rate of voice recognition. For example, when the user cannot describe a pattern in language, the user may relatively quickly express a voice instruction based on information reflected by the corner mark.
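- As an illustration of the information listed above, the sketch below models the user interface information as plain Kotlin data; the field and type names are assumptions, not terminology from this application.

```kotlin
// Hypothetical data model for the user interface information.
data class CornerMark(val label: Int)  // e.g. a numeric badge 1, 2, 3, ...

data class UiInfo(
    val iconNames: List<String>,            // e.g. "TV series", "movies"
    val hotWords: List<String>,             // e.g. "next page", "play"
    val controlInstructions: List<String>,  // e.g. "page up", "back"
    // the described correspondences: a corner mark may point at a target icon,
    // a target collection, or a target control instruction
    val markToIcon: Map<CornerMark, String> = emptyMap(),
    val markToCollection: Map<CornerMark, String> = emptyMap(),
    val markToControl: Map<CornerMark, String> = emptyMap(),
)
```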
- the obtaining a voice instruction of a user includes: receiving the voice instruction sent by the client device.
- the obtaining user interface information of a current user interface includes: receiving the user interface information sent by the client device.
- the determining the target instruction corresponding to the voice instruction includes: determining the target instruction based on the voice instruction and the user interface information.
- the server may implement a voice recognition operation by using an automatic speech recognition (ASR) module and a natural language understanding (NLU) module.
- the server or the client device may further include a dialog state tracking (DST) module, a dialog management (DM) module, a natural language generation (NLG) module, a text to speech (TTS) module, and the like, to implement the voice recognition operation.
- the server may recognize, with reference to content currently displayed on the client, the voice instruction made by the user. This helps the server eliminate useless voice recognition data, and relatively quickly and accurately convert the voice instruction of the user into the corresponding target instruction.
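- One way to picture this narrowing, assuming ASR has already produced text, is to let only phrases that actually appear on the current interface compete as recognition candidates. The helper below is an illustrative sketch; its name and matching rule are assumptions.

```kotlin
// Only on-screen phrases compete, which removes useless recognition data
// from the search space; the longest match is treated as the most specific.
fun resolveAgainstUi(asrText: String, uiPhrases: List<String>): String? =
    uiPhrases
        .filter { asrText.contains(it, ignoreCase = true) }
        .maxByOrNull { it.length }

fun main() {
    val onScreen = listOf("Ruyi's Royal Love in the Palace", "Tomb of the Sea", "next page")
    println(resolveAgainstUi("play episode 30 of ruyi's royal love in the palace", onScreen))
    // prints: Ruyi's Royal Love in the Palace
}
```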
- the method further includes: sending the target instruction to the client device.
- the server recognizes the voice instruction and transmits data through a communications network, so that a requirement on a processing capability of the client device can be lowered.
- the client device may not have a voice recognition capability, or a processor speed and a memory capacity of the client device may be relatively modest.
- the determining the target instruction corresponding to the voice instruction includes: the client device determining the target instruction based on the voice instruction and the user interface information.
- the client device may have the voice recognition capability. Because the user interface information reduces an amount of reference data for voice recognition, a voice recognition effect of the client device can be improved.
- before the determining the target instruction corresponding to the voice instruction, the method further includes: sending the user interface information and the voice instruction to the server.
- the determining the target instruction corresponding to the voice instruction includes: receiving the target instruction sent by the server, where the target instruction is determined by the server based on the user interface information and the voice instruction of the user.
- the server may recognize, with reference to content currently displayed on the client, the voice instruction made by the user. This helps the server eliminate useless voice recognition data, and relatively quickly and accurately convert the voice instruction of the user into the corresponding target instruction.
- the server recognizes the voice instruction and transmits data through a communications network, so that a requirement on a processing capability of the client device can be lowered.
- the client device may not have a voice recognition capability, or a processor speed and a memory capacity of the client device may be relatively modest.
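- A hypothetical request/response contract between the client device and the server could look as follows; transport and serialization are elided, and all names are assumptions rather than definitions from this application.

```kotlin
// What the client sends over the communications network.
data class RecognitionRequest(
    val voiceAudio: ByteArray,     // the captured voice instruction
    val uiPhrases: List<String>,   // user interface information of the current UI
    val foregroundAppId: String,   // identifier of the foreground application
)

// What the server answers with once it has resolved the instruction.
data class TargetInstruction(
    val foregroundAppId: String,         // which application should execute it
    val action: String,                  // e.g. "play"
    val arguments: Map<String, String>,  // e.g. {"collection": "...", "episode": "30"}
)

interface RecognitionService {
    fun recognize(request: RecognitionRequest): TargetInstruction
}
```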
- before the obtaining user interface information of a current user interface, the method further includes: sending first indication information to a foreground application, where the first indication information is used to indicate the foreground application to feed back the user interface information.
- the obtaining user interface information of a current user interface includes: receiving the user interface information sent by the foreground application, where the user interface information is obtained by the foreground application by retrieving information related to the current user interface.
- the foreground application may be, for example, a video playing application, an audio playing application, a desktop application, a setting application, a live TV application, or a radio application.
- the retrieving may also be interpreted as searching, scanning, or the like.
- a manner in which the foreground application determines the user interface information may be: searching for a document used to display the current user interface, to obtain the user interface information.
- the document may include, for example, a hypertext markup language (HTML) file, an extensible markup language (XML) file, and a script file.
- a manner in which the foreground application determines the user interface information may be: scanning an element of the current user interface, to obtain the user interface information based on the element.
- the element may include an icon, collection information corresponding to the icon, a control instruction corresponding to the current user interface, or the like.
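- As a sketch of this manner of scanning, the following Kotlin walks an XML document used to display the interface and collects user-visible text. The `label` attribute and the helper names are stand-in assumptions, not a real layout schema.

```kotlin
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Walk the document tree and collect the text the user can see.
fun extractUiPhrases(xml: String): List<String> {
    val doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(xml.byteInputStream())
    val phrases = mutableListOf<String>()
    fun walk(e: Element) {
        e.getAttribute("label").takeIf { it.isNotEmpty() }?.let { phrases += it }
        val children = e.childNodes
        for (i in 0 until children.length) {
            (children.item(i) as? Element)?.let { walk(it) }
        }
    }
    walk(doc.documentElement)
    return phrases
}

fun main() {
    val layout = """<screen><icon label="TV series"/><icon label="movies"/></screen>"""
    println(extractUiPhrases(layout))  // prints: [TV series, movies]
}
```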
- an identifier of the foreground application may associate a voice instruction, the user interface currently displayed by the foreground application, and a target instruction used to control the foreground application, so that the user can control a plurality of foreground applications by using voice instructions. This is relatively flexible.
- the user interface information further includes the identifier of the foreground application.
- the voice assistant may learn, based on the user interface information, that the current user interface is provided by the foreground application, to further control, by using the target instruction corresponding to the voice instruction, the foreground application to perform, based on the current user interface, an operation corresponding to the target instruction.
- the target instruction further includes the identifier of the foreground application.
- the voice assistant may learn, based on the target instruction, that the target instruction is used to instruct the foreground application to perform a target operation, so that the foreground application can perform an operation used to meet a user expectation.
- the user interface information includes the target corner mark information.
- the method further includes: displaying a corner mark on the current user interface.
- the method further includes: removing the corner mark on the current user interface.
- displaying the corner mark may provide more optional voice instruction manners for the user, and the displayed corner mark is removed at a proper time, so that the user interface has a relatively simple display effect.
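- The corner-mark behavior can be modeled as plain data: numeric badges are assigned to on-screen icons before listening, a spoken number resolves to its icon, and the badges are cleared once the instruction is handled. The class below is an illustrative sketch under those assumptions.

```kotlin
// Badges are assigned 1..N over the visible icons; a spoken number
// resolves to an icon, and remove() clears the marks afterwards.
class CornerMarks(icons: List<String>) {
    private var marks: Map<Int, String> =
        icons.withIndex().associate { (i, name) -> i + 1 to name }

    fun display(): Map<Int, String> = marks  // what would be drawn on screen

    fun resolve(spokenNumber: Int): String? = marks[spokenNumber]

    fun remove() { marks = emptyMap() }  // keep the interface simple again
}

fun main() {
    val marks = CornerMarks(listOf("Tomb of the Sea", "Ever Night", "House of Cards"))
    println(marks.resolve(2))  // prints: Ever Night, for "play number 2"
    marks.remove()
}
```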
- an electronic device including: an obtaining module, configured to obtain a voice instruction of a user, where the voice instruction is used to indicate a target instruction, the obtaining module is further configured to obtain user interface information of a current user interface, and the current user interface is a user interface currently displayed by a client device.
- the electronic device further includes a processing module, configured to determine the target instruction corresponding to the voice instruction, where the target instruction is obtained by using the voice instruction and the user interface information.
- the user interface information includes at least one of the following information: an icon name, hot word information, indication information of a control instruction, or target corner mark information of the current user interface.
- the target corner mark information corresponds to a target icon or a target control instruction.
- the electronic device is a server.
- the obtaining module is specifically configured to receive the voice instruction sent by the client device.
- the obtaining module is specifically configured to receive the user interface information sent by the client device.
- the processing module is specifically configured to determine the target instruction based on the voice instruction and the user interface information.
- the server further includes a transceiver module configured to send the target instruction to the client device.
- the electronic device is the client device.
- the processing module is specifically configured to determine the target instruction based on the voice instruction and the user interface information.
- the electronic device is the client device.
- the client device further includes a transceiver module, configured to: before the processing module determines the target instruction corresponding to the voice instruction, send the user interface information and the voice instruction to a server.
- the processing module is specifically configured to receive the target instruction sent by the server, where the target instruction is determined by the server based on the user interface information and the voice instruction of the user.
- the electronic device further includes a sending module, configured to: before the obtaining module obtains the user interface information of the current user interface, send first indication information to a foreground application, where the first indication information is used to indicate the foreground application to feed back the user interface information.
- the obtaining module is specifically configured to receive the user interface information sent by the foreground application, where the user interface information is obtained by the foreground application by retrieving information related to the current user interface.
- the user interface information further includes an identifier of the foreground application.
- the target instruction further includes the identifier of the foreground application.
- the user interface information includes the target corner mark information.
- the processing module is further configured to: before the obtaining module obtains the voice instruction of the user, display a corner mark on the current user interface.
- the processing module is further configured to: after the obtaining module obtains the voice instruction of the user, remove the corner mark on the current user interface.
- an electronic device including a processor configured to obtain a voice instruction of a user, where the voice instruction is used to indicate a target instruction.
- the processor is further configured to obtain user interface information of a current user interface, where the current user interface is a user interface currently displayed by a client device.
- the processor is further configured to determine the target instruction corresponding to the voice instruction, where the target instruction is obtained by using the voice instruction and the user interface information.
- the user interface information includes at least one of the following information: an icon name, hot word information, indication information of a control instruction, or target corner mark information of the current user interface.
- the target corner mark information corresponds to a target icon or a target control instruction.
- the electronic device is a server.
- the processor is specifically configured to receive the voice instruction sent by the client device.
- the processor is specifically configured to receive the user interface information sent by the client device.
- the processor is specifically configured to determine the target instruction based on the voice instruction and the user interface information.
- the electronic device further includes a transceiver configured to send the target instruction to the client device.
- the electronic device is the client device.
- the processor is specifically configured to determine the target instruction based on the voice instruction and the user interface information.
- the electronic device is the client device.
- the client device further includes a transceiver, configured to: before the processor determines the target instruction corresponding to the voice instruction, send the user interface information and the voice instruction to a server.
- the processor is specifically configured to receive the target instruction sent by the server, where the target instruction is determined by the server based on the user interface information and the voice instruction of the user.
- the electronic device further includes a transceiver, configured to: before the processor obtains the user interface information of the current user interface, send first indication information to a foreground application, where the first indication information is used to indicate the foreground application to feed back the user interface information.
- the processor is specifically configured to receive the user interface information sent by the foreground application, where the user interface information is obtained by the foreground application by retrieving information related to the current user interface.
- the user interface information further includes an identifier of the foreground application.
- the target instruction further includes the identifier of the foreground application.
- the user interface information includes the target corner mark information.
- the processor is further configured to: before the processor obtains the voice instruction of the user, display a corner mark on the current user interface.
- the processor is further configured to: after the processor obtains the voice instruction of the user, remove the corner mark on the current user interface.
- the technology provides an electronic device, including one or more processors, a memory, a plurality of applications, and one or more computer programs.
- the one or more computer programs are stored in the memory.
- the one or more computer programs include instructions.
- the electronic device is enabled to perform the method in any implementation of the first aspect.
- the technology provides an electronic device, including one or more processors and one or more memories.
- the one or more memories are coupled to the one or more processors.
- the one or more memories are configured to store computer program code, and the computer program code includes computer instructions.
- the electronic device is enabled to perform the method in any implementation of the first aspect.
- a communications apparatus includes a processor, a memory, and a transceiver.
- the memory is configured to store a computer program
- the processor is configured to execute the computer program stored in the memory, so that the apparatus is enabled to perform the method in any possible implementation of the first aspect.
- a communications apparatus includes at least one processor and a communications interface.
- the communications interface is used by the communications apparatus to exchange information with another communications apparatus.
- program instructions are executed in the at least one processor, the communications apparatus is enabled to implement the method in any possible implementation of the first aspect.
- the technology provides a non-volatile computer-readable storage medium, including computer instructions.
- when the computer instructions are run on an electronic device, the electronic device is enabled to perform the method in any implementation of the first aspect.
- the technology provides a computer program product.
- when the computer program product runs on an electronic device, the electronic device is enabled to perform the method in any implementation of the first aspect.
- a chip includes a processor and a data interface.
- the processor reads, through the data interface, instructions stored in a memory, to perform the method in any implementation of the first aspect.
- the chip may further include the memory.
- the memory stores the instructions.
- the processor is configured to execute the instructions stored in the memory. When the instructions are executed, the processor is configured to perform the method in any implementation of the first aspect.
- FIG. 1 is an example schematic diagram of a hardware structure of an electronic device according to an embodiment of this application.
- FIG. 2 is an example schematic diagram of a software structure of an electronic device according to an embodiment of this application.
- FIG. 3 is an example schematic diagram of a user interface according to an embodiment of this application.
- FIG. 4 is an example schematic flowchart of a method for controlling a device using a voice according to an embodiment of this application.
- FIG. 5 is an example schematic flowchart of a method for controlling a device using a voice according to an embodiment of this application.
- FIG. 6 is an example schematic diagram of a user interface according to an embodiment of this application.
- FIG. 7 is an example schematic interaction diagram of a voice recognition module according to an embodiment of this application.
- FIG. 8 A and FIG. 8 B are example schematic flowcharts of a method for controlling a device using a voice according to an embodiment of this application.
- FIG. 9 is an example schematic flowchart of a method for controlling a device using a voice according to an embodiment of this application.
- FIG. 10 is an example schematic block diagram of an electronic device according to an embodiment of this application.
- references to “an embodiment”, “some embodiments”, or the like described in this specification indicate that one or more embodiments of this application include a specific feature, structure, or characteristic described with reference to the embodiments. Therefore, in this specification, statements such as “in an embodiment”, “in some embodiments”, “in some other embodiments”, and “in other embodiments” that appear at different places do not necessarily refer to a same embodiment; instead, they mean “one or more but not all of the embodiments”, unless otherwise specifically emphasized.
- the terms “include”, “comprise”, “have”, and their variants all can mean “include but are not limited to”, unless otherwise specifically emphasized.
- the electronic device may be a portable electronic device, for example, a mobile phone, a tablet computer, or a video player that further includes other functions such as a personal digital assistant and/or a music player.
- a portable electronic device includes but is not limited to a portable electronic device using iOS®, Android®, Microsoft®, or another operating system.
- the portable electronic device may alternatively be another portable electronic device, such as a laptop.
- the electronic device may alternatively be a desktop computer, a television, a notebook computer, a projection device, a set-top box, or the like, but not a portable electronic device.
- FIG. 1 is a schematic diagram of a structure of an electronic device 100 .
- the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) port 130, an antenna, a wireless communications module 160, a speaker 170, a microphone 171, a headset jack 172, a high definition multimedia interface (HDMI) 181, an audio video (AV) interface 182, a button 190, a camera 193, a display 194, and the like.
- the structure shown in this embodiment of this application does not constitute a specific limitation on the electronic device 100 .
- the electronic device 100 may include more or fewer components than those shown in the figure, or some components may be combined, or some components may be split, or there may be a different component layout.
- the components shown in the figure may be implemented by hardware, software, or a combination of software and hardware.
- the processor 110 may include one or more processing units.
- the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU).
- Different processing units may be independent components, or may be integrated into one or more processors.
- the electronic device 100 may alternatively include one or more processors 110 .
- the controller may generate an operation control signal based on instruction operation code and a time sequence signal, to complete control of instruction reading and instruction execution.
- a memory may further be disposed in the processor 110 , to store instructions and data.
- the memory in the processor 110 may be a cache memory.
- the memory may store instructions or data just used or cyclically used by the processor 110 . If the processor 110 needs to use the instructions or the data again, the processor 110 may directly invoke the instructions or the data from the memory. In this way, repeated access is avoided, waiting time of the processor 110 is reduced, and efficiency of processing data or executing instructions by the electronic device 101 is improved.
- the processor 110 may include one or more interfaces.
- the interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a general-purpose input/output (GPIO) interface, a USB port, and/or the like.
- the USB port 130 is an interface that conforms to a USB standard specification, and may be specifically a mini USB port, a micro USB port, a USB Type-C port, or the like.
- the USB port 130 may be configured to connect to a charger to charge the electronic device 101 , or may be configured to transmit data between the electronic device 101 and a peripheral device.
- the USB port 130 may alternatively be configured to connect to a headset, and play audio by using the headset.
- an interface connection relationship between the modules illustrated in this embodiment of this application is merely an example for description, and does not constitute a limitation on the structure of the electronic device 100 .
- the electronic device 100 may alternatively use an interface connection manner different from that in the foregoing embodiment, or a combination of a plurality of interface connection manners.
- a wireless communication function of the electronic device 100 may be implemented through the antenna, the wireless communications module 160 , the modem processor, the baseband processor, and the like.
- the antenna may be configured to transmit and receive electromagnetic wave signals.
- Each antenna in the electronic device 100 may be configured to cover one or more communications frequency bands. Different antennas may further be multiplexed, to improve antenna utilization. In some other embodiments, the antenna may be used in combination with a tuning switch.
- the wireless communications module 160 may provide a wireless communication solution that is applied to the electronic device 100 and that includes a wireless local area network (WLAN) (for example, a wireless fidelity (Wi-Fi) network), Bluetooth (BT), a global navigation satellite system (GNSS), frequency modulation (FM), a near field communication (NFC) technology, an infrared (IR) technology, and the like.
- the wireless communications module 160 may be one or more components integrating at least one communications processor module.
- the wireless communications module 160 receives an electromagnetic wave by an antenna, performs frequency modulation and filtering processing on an electromagnetic wave signal, and sends a processed signal to the processor 110 .
- the wireless communications module 160 may further receive a to-be-sent signal from the processor 110, perform frequency modulation and amplification on the signal, and convert a processed signal into an electromagnetic wave through the antenna for radiation.
- the electronic device 100 implements a display function through the GPU, the display 194 , the application processor, and the like.
- the GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor.
- the GPU is configured to perform mathematical and geometric calculation and render images.
- the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
- the display 194 is configured to display an image, a video, and the like.
- the display 194 includes a display panel.
- a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light emitting diode (AMOLED), a flexible light-emitting diode (FLED), a mini-LED, a micro-LED, a micro-OLED, quantum dot light emitting diodes (QLED), or the like may be used for the display panel.
- the electronic device 100 may include one or more displays 194 .
- the display 194 of the electronic device 100 may be a flexible screen.
- the flexible screen attracts much attention due to its unique features and huge potential.
- the flexible screen is highly flexible and bendable, and can provide the user with new interaction modes based on its bendability, to meet more requirements of the user for an electronic device.
- the foldable display on the electronic device may be switched between a small screen in a folded form and a large screen in an expanded form at any time. Therefore, the user uses a split-screen function more frequently on the electronic device configured with the foldable display.
- the electronic device 100 may implement a photographing function through the ISP, the camera 193 , the video codec, the GPU, the display 194 , the application processor, and the like.
- the ISP is configured to process data fed back by the camera 193 .
- a shutter is pressed, and light is transmitted to a photosensitive element of the camera through a lens.
- the photosensitive element of the camera converts an optical signal into an electrical signal, and transmits the electrical signal to the ISP for processing.
- the ISP converts the electrical signal into a visible image.
- the ISP may further perform algorithm optimization on noise, brightness, and complexion of the image.
- the ISP may further optimize parameters such as exposure and a color temperature of a photographing scenario.
- the ISP may be disposed in the camera 193 .
- the camera 193 is configured to capture a static image or a video.
- An optical image of an object is generated through the lens, and is projected onto a photosensitive element.
- the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
- the photosensitive element converts an optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert the electrical signal into a digital image signal.
- the ISP outputs the digital image signal to the DSP for processing.
- the DSP converts the digital image signal into a standard image signal in an RGB format, a YUV format, or the like.
- the electronic device 100 may include one or more cameras 193 .
- the digital signal processor is configured to process a digital signal, and may process another digital signal in addition to the digital image signal. For example, when the electronic device 100 selects a frequency, the digital signal processor is configured to perform Fourier transform on frequency energy, or the like.
- the video codec is configured to compress or decompress a digital video.
- the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in a plurality of coding formats, for example, moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, and MPEG-4.
- the NPU is a neural-network (NN) computing processor.
- with reference to a structure of a biological neural network, for example, a transfer mode between neurons of a human brain, the NPU quickly processes input information, and can further continuously perform self-learning.
- Applications such as intelligent cognition of the electronic device 100 may be implemented through the NPU, for example, image recognition, facial recognition, voice recognition, and text understanding.
- the external memory interface 120 may be configured to connect to an external memory card such as a micro SD card, to extend a storage capability of the electronic device 100 .
- the external memory card communicates with the processor 110 through the external memory interface 120 , to implement a data storage function. For example, files such as music and a video are stored in the external memory card.
- the internal memory 121 may be configured to store one or more computer programs, and the one or more computer programs include instructions.
- the processor 110 may run the instructions stored in the internal memory 121, so that the electronic device 101 performs the method provided in some embodiments of this application, various applications, data processing, and the like.
- the internal memory 121 may include a program storage area and a data storage area.
- the program storage area may store an operating system.
- the program storage area may further store one or more applications (for example, Gallery and Contacts), and the like.
- the data storage area may store data (for example, a photo and a contact) created during use of the electronic device 101 , and the like.
- the internal memory 121 may include a high-speed random access memory, or may include a non-volatile memory, for example, one or more magnetic disk storage devices, flash memory devices, or universal flash storage (UFS).
- the processor 110 may run the instructions stored in the internal memory 121 and/or the instructions stored in the memory that is disposed in the processor 110, so that the electronic device 101 performs the method provided in the embodiments of this application, other applications, and data processing.
- the electronic device 100 may implement an audio function, for example, music playing and recording, through the speaker 170 , the microphone 171 , the headset jack 172 , the application processor, and the like.
- the button 190 includes a power button, a volume button, and the like.
- the button 190 may be a mechanical button or a touch button.
- the electronic device 100 may receive a key input, and generate a key signal input related to user settings and function control of the electronic device 100 .
- the electronic device 100 may receive data through the high definition multimedia interface (HDMI) 181 , and implement display functions such as a split-screen (which may also be referred to as an extended screen) function or a video playing function through the display 194 , the speaker 170 , and the headset jack 172 .
- the electronic device 100 may receive video resource data through the audio video (AV) interface 182 , and implement the display functions such as the split-screen function or the video playing function through the display 194 , the speaker 170 , and the headset jack 172 .
- the AV interface 182 may include a V (video) interface 183 , an L (left) interface 184 , and an R (right) interface 185 .
- the V interface 183 may be configured to input a mixed video signal.
- the L interface 184 may be configured to input a left-channel sound signal.
- the R interface 185 may be configured to input a right-channel sound signal.
- FIG. 2 is a block diagram of a software structure of the electronic device 100 according to an embodiment of this application.
- software is divided into several layers, and each layer has a clear role and task.
- the layers communicate with each other through a software interface.
- the Android system is divided into four layers: an application layer, an application framework layer, an Android runtime and system library, and a kernel layer from top to bottom.
- the application layer may include a series of application packages.
- the application packages may include applications such as a voice assistant, a TV playing application, a TV series playing application, a movie playing application, an audio playing application, Gallery, Browser, Clock, and Settings.
- the application framework layer provides an application programming interface (API) and a programming framework for an application at the application layer.
- the application framework layer includes some predefined functions.
- the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and the like.
- the window manager is configured to manage a window program.
- the window manager may obtain a size of the display, determine whether there is a status bar, perform screen locking, take a screenshot, and the like.
- the content provider is configured to store and obtain data, and enable the data to be accessed by an application.
- the data may include a video, an image, audio, a browsing history, a bookmark, and the like.
- the view system includes visual controls, such as a control for displaying a text and a control for displaying an image.
- the view system may be configured to construct an application.
- a display interface may include one or more views.
- a TV series playing interface may include a text display view, an image display view, and a video display view.
- the resource manager provides various resources for an application, such as a localized character string, an icon, a picture, a layout file, and a video file.
- the notification manager enables an application to display notification information in the status bar, and may be used to transmit a notification-type message.
- the displayed information may automatically disappear after a short pause without user interaction.
- the notification manager is configured to notify download completion, give a message notification, and the like.
- a notification may alternatively appear in a top status bar of the system in a form of a graph or a scroll bar text, for example, a notification of an application running in the background, or appear on the interface in a form of a dialog window. For example, text information is prompted in the status bar, or a prompt tone is produced.
- the Android runtime includes a kernel library and a virtual machine.
- the Android runtime is responsible for scheduling and management of the Android system.
- the kernel library includes two parts: a function that needs to be invoked in Java language and a kernel library of Android.
- the application layer and the application framework layer are run on the virtual machine.
- the virtual machine executes Java files of the application layer and the application framework layer as binary files.
- the virtual machine is configured to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
- the system library may include a plurality of function modules, such as a surface manager, a media library, a three-dimensional graphics processing library (for example, OpenGL ES), and a 2D graphics engine (for example, SGL).
- the surface manager is configured to manage a display subsystem and provide fusion of 2D and 3D layers for a plurality of applications.
- the media library supports playback and recording of a plurality of commonly used audio and video formats, static image files, and the like.
- the media library may support a plurality of audio and video coding formats such as MPEG-4, H.264, MP3, AAC, AMR, JPG, and PNG.
- the three-dimensional graphics processing library is configured to implement three-dimensional graphics drawing, image rendering, composition, layer processing, and the like.
- the 2D graphics engine is a drawing engine for 2D drawing.
- the kernel layer is a layer between hardware and software.
- the kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
- the voice assistant in the application packages may be a system-level application.
- the voice assistant may also be referred to as a human-machine interaction robot, a human-machine conversation robot, a chatbot, or the like.
- the voice assistant application may further be referred to as a smart assistant application or the like.
- the voice assistant is widely used in various electronic devices such as a mobile phone, a tablet computer, a smart speaker, and a smart TV, and provides the user with an intelligent voice interaction mode.
- the voice assistant is one of the cores of human-machine interaction.
- FIG. 3 is a schematic diagram of a user interface 300 of an electronic device 100 .
- the electronic device 100 may be the electronic device 100 shown in FIG. 1 .
- the electronic device 100 may be a large-screen display apparatus such as a television or a projection device.
- a plurality of icons may be displayed on the user interface 300 .
- the user interface 300 may include a plurality of menu icons 301 , a plurality of resource collection icons 302 , and a plurality of function icons 303 . It may be understood that the user interface 300 shown in this embodiment of this application does not constitute a specific limitation on a user interface 300 . In some other embodiments of this application, the user interface 300 may include more or fewer icons than those shown in the figure, or some icons may be combined, or some icons may be split, or there may be a different icon layout.
- the plurality of menu icons 301 may include: a “home page” icon, a “TV series” icon, a “movies” icon, a “children” icon, an “applications” icon, a “music” icon, a “radio” icon, an “education” icon, a “variety shows” icon, and the like. It should be understood that the electronic device 100 may further provide more menu icons 301 for the user. However, because a size of the user interface 300 is limited, the user interface 300 may display only some of the menu icons 301. For example, the user may select a menu icon 301 by using an infrared remote control or a voice.
- the user selects the “home page” icon, and the electronic device 100 may display the plurality of resource collection icons 302 .
- Types of the plurality of resource collections may include a TV series collection, a movie collection, a collection for children, an application collection, a music collection, a radio collection, an education collection, a variety show collection, and the like.
- the electronic device 100 may display icons of three currently most popular TV series collections and icons of three currently most popular movie collections.
- the user selects the “TV series” icon.
- the electronic device 100 may display icons of a plurality of TV series collections.
- the electronic device 100 may display icons of three currently most popular TV series collections and icons of three relatively popular TV series collections that are being updated.
- the three currently most popular TV series collections may include “Ruyi's Royal Love in the Palace”, “Tomb of the Sea”, and “Ever Night”.
- the three relatively popular TV series collections that are being updated may include “Behind the Scenes”, “House of Cards”, and “The Story of Minglan”.
- the icon of each TV series collection may include a schematic diagram 3021 of the TV series collection (for example, a still of the TV series), a name 3022 of the TV series collection (for example, a name of the TV series), and a quantity 3023 of episodes of the TV series collection (for example, a quantity of latest episodes of the TV series).
- “total X episodes” may be displayed (for example, “all 87 episodes” is displayed on the icon corresponding to “Ruyi's Royal Love in the Palace” in FIG. 3 ).
- “updated to episode Y” may be displayed (for example, “updated to episode 8” is displayed on the icon corresponding to “Behind the Scenes” in FIG. 3 ).
- the user selects the “movies” icon, and the electronic device 100 may display icons of a plurality of movie collections.
- the electronic device 100 may display icons of three currently most popular movie collections and icons of three movie collections that are just released.
- the user selects the “children” icon, and the electronic device 100 may display icons of a plurality of collections for children.
- the electronic device 100 may display icons of three currently most popular programs for children and icons of three cartoons that are currently most popular among children.
- the user selects the “applications” icon, and the electronic device 100 may display icons of a plurality of applications.
- the electronic device 100 may display icons of three applications most recently used by the user, and icons of three most frequently used applications.
- the user selects the “music” icon, and the electronic device 100 may display icons of a plurality of music collections.
- the electronic device 100 may display icons of three music albums that are just released and icons of three new favorite music playlists of the user.
- the user selects the “radio” icon, and the electronic device 100 may display icons of a plurality of radio collections.
- the electronic device 100 may display icons of three currently most popular radio programs and icons of three new favorite radio programs of the user.
- the user selects the “education” icon, and the electronic device 100 may display icons of a plurality of education collections.
- the electronic device 100 may display icons of three currently most popular education collections and icons of three education collections most recently played by the user.
- the user selects the “variety shows” icon, and the electronic device 100 may display icons of a plurality of variety show collections.
- the electronic device 100 may display icons of three currently most popular variety show collections and icons of three variety show collections most recently played by the user.
- the plurality of function icons 303 on the user interface 300 may include a back icon, a user information icon, a settings icon, a wireless connection icon, a clock icon, and the like.
- the user may return to an upper-level user interface 300 by selecting the back icon.
- the user may view information about a user account logged in to the electronic device 100 by selecting the user information icon.
- the user may enter a setting interface and adjust a parameter of the electronic device 100 by selecting the settings icon.
- the user may use a wireless connection function of the electronic device 100 by selecting the wireless connection icon, for example, search for an available wireless network around the electronic device 100 , and access the available wireless network.
- the user may view the clock icon to learn a current time.
- the user may set a clock parameter of the electronic device 100 by selecting the clock icon.
- FIG. 4 shows a method for controlling a device using a voice.
- a client device may be the electronic device 100 shown in FIG. 1 .
- the client device displays a current user interface.
- the current user interface displayed by the client device may be the user interface 300 shown in FIG. 3 .
- the client device obtains a voice instruction of a user, where the voice instruction is used to indicate a target operation.
- the user may say the voice instruction, for example, “play episode 30 of Ruyi's Royal Love in the Palace”.
- the voice instruction may be used to instruct the client device to play the 30th episode video resource in the TV series collection “Ruyi's Royal Love in the Palace”.
- step 402 may be completed by a voice assistant on the client device.
- the user may say a wake-up word, to wake up the client device to capture the voice instruction of the user.
- the client device may display prompt information on the current user interface, to prompt the user that a voice recognition function of the client device is being used.
- the client device determines the target operation based on a voice recognition file and the voice instruction, where the voice recognition file is used to determine the target operation corresponding to the voice instruction.
- Step 403 may be completed by the voice assistant on the client device.
- the voice recognition file may include a plurality of types of information used to determine the target operation. Therefore, the client device may determine the operation corresponding to the voice instruction.
- the voice recognition file may include data used to determine that the voice instruction is a video playing instruction.
- the voice recognition file may include data used to determine that the voice instruction is an application download instruction.
- the client device may implement a voice recognition operation by using an automatic speech recognition (ASR) module and a natural language understanding (NLU) module.
- the client device performs the target operation.
- the client device may perform the target operation in response to the voice instruction sent by the user.
- the client device may recognize a voice of the user based on the voice recognition file.
- Content such as a data resource library and a user interface needs to be updated frequently to maintain a good user experience.
- each time such content is updated, the voice recognition file also needs to be updated so that the user can continue to use voice instructions conveniently. Therefore, a large amount of work is required to update the voice recognition file.
- in addition, the data amount of a voice package is usually relatively large, which is not conducive to voice recognition efficiency.
- FIG. 5 shows a method for controlling a device using a voice according to an embodiment of this application.
- a client device may be the electronic device 100 shown in FIG. 1 .
- the client device obtains a voice instruction of a user, where the voice instruction is used to indicate a target instruction or a target operation.
- step 501 may be completed by a voice assistant on the client device.
- the user may say a wake-up word, to wake up the client device to capture the voice instruction of the user.
- the target instruction may be, for example, text content of the voice instruction.
- the target operation may be, for example, a response operation indicated by the target instruction.
- the user may say a voice instruction, for example, “play episode 30 of Ruyi's Royal Love in the Palace.”
- the user may select a TV series collection “Ruyi's Royal Love in the Palace” on the user interface, and select to watch the 30th episode video resource in the “Ruyi's Royal Love in the Palace” collection.
- the voice instruction may be used to instruct the client device to play the 30th episode video resource in the TV series collection “Ruyi's Royal Love in the Palace.”
- the user may say a voice instruction, for example, “display a movie page.”
- the user may select a movie collection on the user interface, to continue to browse a movie resource in the movie collection.
- the voice instruction may be used to instruct the client device to display a user interface corresponding to the movie collection.
- the user may say a voice instruction, for example, “enable Wi-Fi (namely, wireless fidelity).”
- the user may select a wireless connection icon on the user interface, and set a wireless connection parameter of the client device.
- the voice instruction may be used to instruct the client device to start a wireless connection module.
- the user may say a voice instruction, for example, “the third.”
- the user may select an icon or a control instruction corresponding to a corner mark 3.
- the voice instruction may be used to instruct the client device to perform an operation corresponding to the corner mark 3.
- the user may say a voice instruction, for example, “next page.”
- the user may control the client device to perform a page turning operation, so that the user may continue to browse a next page of the user interface.
- the voice instruction may be used to instruct the client device to display the next page of the user interface.
- the client device may display prompt information on the current user interface, to prompt the user that a voice recognition function of the client device is being used.
- the client device obtains user interface information of a current user interface, where the current user interface is a user interface currently displayed by the client device.
- the current user interface may be the user interface observed by the user in step 501 .
- the current user interface may be, for example, the user interface 300 shown in FIG. 3 .
- the user interface information may include various types of information indicating the current user interface.
- step 502 may be completed by a voice assistant on the client device.
- steps 501 and 502 may be reversed.
- step 501 is performed before step 502 .
- step 502 is performed before step 501 .
- the user interface information includes at least one of the following information: an icon name, hot word information, indication information of a control instruction, or target corner mark information of the current user interface.
- the user interface information may include an icon name of the current user interface.
- the user interface 300 shown in FIG. 3 is used as an example.
- the user interface 300 may include a “home page” icon, a “TV series” icon, a “movies” icon, a “children” icon, an “applications” icon, a “music” icon, a “radio” icon, an “education” icon, a “variety shows” icon, a “Ruyi's Royal Love in the Palace” collection icon, a “Tomb of the Sea” collection icon, an “Ever Night” collection icon, a “Behind the Scenes” collection icon, a “House of Cards” collection icon, a “The Story of Minglan” collection icon, a back icon, a user information icon, a settings icon, a wireless connection icon, a clock icon, and the like.
- user interface information corresponding to the user interface 300 may include: a home page, TV series, movies, children, applications, music, radio, education, variety shows, Ruyi's Royal Love in the Palace, Tomb of the Sea, Ever Night, Behind the Scenes, House of Cards, The Story of Minglan, back, user information, settings, a wireless connection, a clock, and the like.
- the collection may be a collection of data resources.
- the “Ruyi's Royal Love in the Palace” collection may be a resource collection including all video episodes of the TV series “Ruyi's Royal Love in the Palace.”
- the user interface information includes hot word information.
- the hot word information included in the user interface information corresponding to the user interface 300 may include “kids.”
- “music” is referred to as “song.”
- “music” may correspond to the hot word “song.” Therefore, the hot word information included in the user interface information corresponding to the user interface 300 may include “song.”
- “radio” is referred to as “broadcast.”
- “radio” may correspond to the hot word “broadcast.” Therefore, the hot word information included in the user interface information corresponding to the user interface 300 may include “broadcast.”
- “Ruyi's Royal Love in the Palace” is often referred to as “Ruyi” or the like for short.
- “Ruyi's Royal Love in the Palace” may correspond to the hot word “Ruyi.” Therefore, the hot word information included in the user interface information corresponding to the user interface 300 may include “Ruyi.”
- “The Story of Minglan” is often referred to as “Zhifou,” “Minglan,” or the like for short.
- “The Story of Minglan” may correspond to the hot word “Zhifou” and the hot word “Minglan.” Therefore, the hot word information included in the user interface information corresponding to the user interface 300 may include “Zhifou” and “Minglan.”
- “Marvel's The Avengers” is often referred to as “The Avengers,” “Avengers,” or the like for short.
- “Marvel's The Avengers” may correspond to the hot word “The Avengers” and the hot word “Avengers.” Therefore, when the current user interface includes a “Marvel's The Avengers” collection icon, the hot word information included in the user interface information may include “The Avengers” and “Avengers.”
- “user information” is often referred to as “account”, “login information”, or the like.
- “user information” may correspond to the hot word “account”, the hot word “login information”, or the like. Therefore, the hot word information included in the user interface information corresponding to the user interface 300 may include “account” and “login information”.
- functions of a “wireless connection” application may include connecting to wireless fidelity (Wi-Fi). Therefore, “wireless connection” may correspond to hot words “Wi-Fi,” “wireless,” “hotspot,” “network,” or the like. Therefore, the hot word information included in the user interface information corresponding to the user interface 300 may include “Wi-Fi,” “wireless,” “hotspot,” and “network.”
- a function of a “clock” application is to view time. Therefore, “clock” may correspond to hot words “time” and “what time.” Therefore, the hot word information included in the user interface information corresponding to the user interface 300 may include “time” and “what time.”
- the user interface information includes indication information of a control instruction.
- control instruction may include at least one of the following: a user interface refreshing instruction, a user interface moving instruction, a page turning instruction, a selection box moving instruction, and the like.
- the user interface information may include at least one of information such as “refresh”, “refresh the page,” “refresh the interface,” and “refresh the user interface.”
- the user interface information may include at least one of information such as “move left,” “move right,” “move up,” “move down,” “move,” “slide,” and “move the user interface.”
- the user interface information may include at least one of information such as “previous page,” “next page,” “turn the page,” “turn left,” and “turn right.”
- the user interface information may include at least one of information such as “next,” “previous,” and “move the selection box.”
- the user interface information may include an expression that may be used by the user, and the expression may indicate a corresponding control instruction.
- the user interface information further includes target corner mark information displayed on the current user interface.
- the target corner mark information may correspond to a target icon, a target collection, or a target control instruction.
- the user interface information further includes a correspondence between the target corner mark information and the target icon.
- the user interface information further includes a correspondence between the target corner mark information and the target collection.
- the current user interface may display a plurality of corner marks 601, including a corner mark 1, a corner mark 2, a corner mark 3, a corner mark 4, a corner mark 5, and a corner mark 6.
- the corner mark 1 may correspond to the icon of the TV series collection “Ruyi's Royal Love in the Palace.”
- the corner mark 2 may correspond to the icon of the TV series collection “Tomb of the Sea.”
- the corner mark 3 may correspond to the icon of the TV series collection “Ever Night.”
- the corner mark 4 may correspond to the icon of the TV series collection “Behind the Scenes.”
- the corner mark 5 may correspond to the icon of the TV series collection “House of Cards.”
- the corner mark 6 may correspond to the icon of the TV series collection “The Story of Minglan.” Therefore, the user interface information may include information indicating “the corner mark 1-Ruyi's Royal Love in the Palace,” information indicating “the corner mark 2-Tomb of the Sea,” information indicating “the corner mark 3-Ever Night,” information indicating “the corner mark 4-Behind the Scenes,” information indicating “the corner mark 5-House of Cards,” and information indicating “the corner mark 6-The Story of Minglan.”
- the user interface information further includes a correspondence between the target corner mark information and the target control instruction.
- a corner mark 1 and a corner mark 2 are displayed on the current user interface.
- a target control instruction corresponding to the corner mark 1 is to play a video at a resolution of 720P.
- a target control instruction corresponding to the corner mark 2 is to play a video at a resolution of 1080P. Therefore, the user interface information may include information indicating “the corner mark 1-720P” and information indicating “the corner mark 2-1080P.”
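- Taken together, the kinds of user interface information described above can be pictured as a single structured object. The following is a minimal sketch in Python; every field name is an assumption for illustration, not the patent's actual format:

```python
# A hypothetical representation of the user interface information for the
# user interface 300; all field names here are assumed for illustration.
user_interface_info = {
    "app_id": "com.example.tv.launcher",  # hypothetical foreground-app identifier
    "icon_names": [
        "home page", "TV series", "movies", "children", "applications",
        "music", "radio", "education", "variety shows",
        "Ruyi's Royal Love in the Palace", "Tomb of the Sea", "Ever Night",
        "Behind the Scenes", "House of Cards", "The Story of Minglan",
    ],
    "hot_words": {
        "Ruyi's Royal Love in the Palace": ["Ruyi"],
        "The Story of Minglan": ["Zhifou", "Minglan"],
        "wireless connection": ["Wi-Fi", "wireless", "hotspot", "network"],
        "clock": ["time", "what time"],
    },
    "control_instructions": ["refresh", "next page", "previous page",
                             "move left", "move right", "next", "previous"],
    "corner_marks": {
        1: "Ruyi's Royal Love in the Palace",
        2: "Tomb of the Sea",
        3: "Ever Night",
        4: "Behind the Scenes",
        5: "House of Cards",
        6: "The Story of Minglan",
    },
}
```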
- obtaining the user interface information of the current user interface includes: sending first indication information to a foreground application, where the first indication information is used to indicate the foreground application to feed back the user interface information; and receiving the user interface information sent by the foreground application, where the user interface information is obtained by the foreground application by retrieving information related to the current user interface.
- the foreground application may be, for example, a video playing application, an audio playing application, a desktop application, a setting application, a live TV application, or a radio application.
- the voice assistant of the client device may send the first indication information to the foreground application of the client device by invoking a software interface of the foreground application, and the first indication information may indicate the foreground application to feed back the user interface information.
- the foreground application may send the user interface information to the voice assistant based on the first indication information through the software interface.
- a manner in which the foreground application determines the user interface information may further include obtaining, from a network device (for example, a cloud server), data related to the elements on the current user interface, for example, hot words related to “Ruyi's Royal Love in the Palace,” including “Ruyi.”
- a manner in which the foreground application determines the user interface information may be: searching for a document used to display the current user interface, to obtain the user interface information.
- the document may include, for example, a hypertext markup language (HTML) file, an extensible markup language (XML) file, and a script file.
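- For instance, if the current user interface were rendered from an XML layout, the foreground application might extract icon names roughly as follows. The layout format below is made up for illustration; the patent does not prescribe one:

```python
import xml.etree.ElementTree as ET

# A made-up XML layout standing in for the document that renders the
# current user interface.
LAYOUT = """<screen>
  <icon name="movies"/>
  <icon name="Ruyi's Royal Love in the Palace" corner_mark="1"/>
  <icon name="Tomb of the Sea" corner_mark="2"/>
</screen>"""

def icon_names_from_layout(xml_text: str) -> list:
    # Walk the parsed layout and collect every icon's display name.
    root = ET.fromstring(xml_text)
    return [icon.get("name") for icon in root.iter("icon")]

print(icon_names_from_layout(LAYOUT))
# ['movies', "Ruyi's Royal Love in the Palace", 'Tomb of the Sea']
```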
- the user interface information further includes an identifier of the foreground application.
- the voice assistant may learn that the current user interface is provided by the foreground application and that the user interface information is provided by the foreground application.
- the following Table 1 provides code for the voice assistant to obtain the user interface information from the foreground application.
- the user interface information may include the hot word information, the indication information of the control instruction, and a maximum corner mark value.
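- Table 1 itself is not reproduced in this text. The following sketch, with hypothetical interface and field names, only illustrates the shape of such an exchange: the voice assistant sends the first indication information through the foreground application's software interface and receives the user interface information back:

```python
class ForegroundApp:
    """Hypothetical software interface exposed by the foreground application."""

    def get_ui_info(self) -> dict:
        # In practice the application would inspect its current view tree or
        # the document that renders the current user interface.
        return {
            "app_id": "com.example.tv.launcher",
            "hot_words": ["Ruyi", "Zhifou", "Minglan"],
            "control_instructions": ["refresh", "next page"],
            "max_corner_mark": 6,  # the maximum corner mark value noted above
        }


def request_ui_info(app: ForegroundApp) -> dict:
    # First indication information: ask the foreground application to feed
    # back the user interface information of the current user interface.
    return app.get_ui_info()
```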
- the method further includes: displaying a corner mark on the current user interface.
- the method further includes: removing the corner mark on the current user interface.
- the foreground application may remove the corner mark on the current user interface. For example, after the foreground application feeds back the user interface information to the voice assistant, the foreground application may remove the corner mark on the user interface.
- the user interface without the corner mark may have a relatively concise display effect.
- the server receives the user interface information and the voice instruction that are sent by the client device.
- step 503 may be completed by the voice assistant on the client device.
- the client device may not have a voice recognition capability.
- the client device may not have a capability of converting a voice instruction of the user into a device control instruction corresponding to the voice instruction.
- the client device may send the voice instruction of the user to the server, and the server performs a voice recognition operation.
- the server may perform the voice recognition operation based on the user interface currently displayed by the client device. Therefore, the client device may send, to the server, user interface indication information used to indicate the currently displayed user interface.
- the server determines the target instruction based on the user interface information and the voice instruction of the user, where the target instruction is used to instruct the client device to perform the target operation.
- the voice instruction is “play episode 30 of Ruyi's Royal Love in the Palace,” and the user interface information includes “Ruyi's Royal Love in the Palace.”
- the server may determine, based on “play” in the voice instruction, that a type of the target operation is playing audio and video, and match “Ruyi's Royal Love in the Palace” in the voice instruction with “Ruyi's Royal Love in the Palace” in the user interface information, so that the server may determine the target instruction, and the target operation corresponding to the target instruction is to play the 30th episode video resource in the TV series collection “Ruyi's Royal Love in the Palace.”
- the voice instruction is “display a movie page,” and the user interface information includes “movie.”
- the server may determine, based on “display” in the voice instruction, that a type of the target operation is displaying a specific user interface, and match “movie” in the voice instruction with “movie” in the user interface information, so that the server may determine the target instruction, and the target operation corresponding to the target instruction is to display a user interface corresponding to a movie collection.
- the voice instruction is “enable Wi-Fi,” and the user interface information includes “Wi-Fi.”
- the server may determine, based on “enable” in the voice instruction, that a type of the target operation is enabling a specific function, and match “Wi-Fi” in the voice instruction with “Wi-Fi” in the user interface information, so that the server may determine the target instruction, and the target operation corresponding to the target instruction is to start the wireless connection module of the client device.
- the voice instruction is “the third,” and the user interface information includes “the corner mark 3-Ever Night.”
- the server may determine, based on the voice instruction, that a type of the target operation is tapping, and match “3” in the voice instruction with “the corner mark 3-Ever Night” in the user interface information, so that the server may determine the target instruction, and the target operation corresponding to the target instruction is to tap the icon of the TV series collection “Ever Night.”
- the voice instruction is “next page,” and the user interface information includes “next page.”
- the server may determine, based on the voice instruction, that a type of the target operation is a page turning operation, and match “next page” in the voice instruction with “next page” in the user interface information, so that the server may determine the target instruction, and the target operation corresponding to the target instruction is to display a next page of the user interface.
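- The five matching examples above follow one pattern: infer the operation type from the verb, then match the remaining words against the user interface information. The sketch below is an assumed summary of that logic, not the server's actual algorithm:

```python
import re
from typing import Optional

def determine_target_instruction(text: str, ui_info: dict) -> Optional[dict]:
    # Corner-mark instructions such as "the third" are matched against the
    # corner-mark correspondences, e.g. "the corner mark 3-Ever Night".
    ordinals = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}
    for word, mark in ordinals.items():
        if word in text and mark in ui_info.get("corner_marks", {}):
            return {"type": "tap", "target": ui_info["corner_marks"][mark]}

    # Playing instructions: the title must match an icon name on the interface.
    m = re.match(r"play episode (\d+) of (.+)", text)
    if m and m.group(2) in ui_info.get("icon_names", []):
        return {"type": "play", "collection": m.group(2),
                "episode": int(m.group(1))}

    # Control instructions such as "next page" are matched directly.
    if text in ui_info.get("control_instructions", []):
        return {"type": "control", "instruction": text}
    return None
```

- With the hypothetical user_interface_info structure sketched earlier, determine_target_instruction("the third", user_interface_info) would return a tap on the “Ever Night” icon.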
- the server may implement the voice recognition operation by using an automatic speech recognition (ASR) module and a natural language understanding (NLU) module.
- the server or the client device may further include a dialog state tracking (DST) module, a dialog management (DM) module, a natural language generation (NLG) module, a text to speech (TTS) module, and the like, to implement the voice recognition operation.
- a main function of the ASR module is to recognize a voice of the user as text content.
- the ASR module may process a segment of voice based on the user interface information and the voice instruction of the user, to convert the segment of voice into a corresponding text.
- a part of the voice instruction may correspond to an icon name included in the user interface information. With the development of machine learning in recent years, the recognition accuracy of the ASR module has greatly improved, making voice interaction between human and machine possible. Therefore, ASR is the real starting point of voice interaction. However, although the ASR module can learn what the user says, it cannot understand what the user means; semantic understanding is handled by the NLU module.
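- One hedged way to picture “processing a segment of voice based on the user interface information” is to bias ASR toward what is on screen: candidate transcripts that contain interface terms receive a score bonus. The scoring scheme below is an assumption for illustration:

```python
def rescore(hypotheses: list, ui_terms: set, bonus: float = 0.5) -> str:
    # Each hypothesis is (text, acoustic score); terms visible on the
    # current user interface add a fixed bonus to the score.
    def score(item) -> float:
        text, base = item
        return base + bonus * sum(term in text for term in ui_terms)
    return max(hypotheses, key=score)[0]

# "ruyi" is on screen, so the interface-aware hypothesis wins despite a
# slightly lower acoustic score.
best = rescore([("play ruyi", 0.60), ("play louie", 0.62)], {"ruyi"})
print(best)  # play ruyi
```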
- a main function of the NLU module is to understand a user intent and parse a slot.
- the NLU module may determine an intent and a slot of the voice instruction based on the user interface information.
- the text obtained by the ASR module may correspond to the icon name included in the user interface information.
- the currently displayed user interface is shown in FIG. 3 .
- the user expresses: Play episode 30 of Ruyi's Royal Love in the Palace.
- the NLU module may obtain content shown in Table 2 through parsing.
- the NLU module may convert the voice instruction “play episode 30 of Ruyi's Royal Love in the Palace” into a corresponding target instruction.
- the intent may be understood as a classifier that determines the type of a sentence expressed by the user, so that a program corresponding to that type can perform dedicated parsing.
- the “program corresponding to the type” may be a robot (e.g., a Bot). For example, the user says: “Play me a comedy movie.” The NLU module determines that the intent classification is movies, and therefore summons a movie robot to recommend a movie to the user for playing. If the user is not satisfied after hearing the result, the user says: “Change to another one.” The movie robot continues to serve the user until the user expresses another question whose intent is no longer movies, at which point another robot takes over to serve the user.
- the NLU module needs to further understand the content of the dialog. For simplicity, a core part may be selected for understanding, and the other parts may be ignored. The most important parts may be referred to as slots.
- Two core slots are defined in the example of “play episode 30 of Ruyi's Royal Love in the Palace”: a “video name” and an “episode.” If the content that the user may need to input during video playing is comprehensively considered, more slots certainly come to mind, such as a playing start point, a playing speed, and a playing resolution. For a voice interaction designer, defining the slots is the starting point.
- the following provides several types of code for determining the target instruction.
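- That code is not reproduced in this text. As a stand-in, the sketch below shows one assumed way the intent and the two core slots could be extracted, reusing the hypothetical hot-word structure from the earlier sketches:

```python
import re

def parse_intent_and_slots(text: str, ui_info: dict) -> dict:
    intent, slots = None, {}
    if text.startswith("play"):
        intent = "play"
    m = re.search(r"episode (\d+)", text)
    if m:
        slots["episode"] = int(m.group(1))
    # Fill the "video name" slot from names (or hot-word aliases such as
    # "Ruyi") that actually appear on the current user interface.
    for name, aliases in ui_info.get("hot_words", {}).items():
        if name in text or any(alias in text for alias in aliases):
            slots["video name"] = name
            break
    return {"intent": intent, "slots": slots}

result = parse_intent_and_slots(
    "play episode 30 of Ruyi's Royal Love in the Palace",
    {"hot_words": {"Ruyi's Royal Love in the Palace": ["Ruyi"]}},
)
# {'intent': 'play',
#  'slots': {'episode': 30, 'video name': "Ruyi's Royal Love in the Palace"}}
```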
- a main function of the DST module is to check and combine slots.
- a main function of the DM module is to perform sequential slot filling, clarification, and disambiguation.
- the NLU module may determine that the intent of the user is “play,” and slot information related to the intent is the “video name” and the “episode.” However, the statement expressed by the user includes only the slot information “video name.”
- the DST module determines that the slot information “episode” is missing, and the DST module may send the missing slot information to the DM module.
- the DM module controls the NLG module to generate a dialog for querying the user for the missing slot information.
- for example, the user expresses: “I want to watch a video.”
- the DM module may first perform slot filling on the slot information in a preset sequence.
- a slot filling sequence may be the “video name” and the “episode,” and slot information corresponding to the slots is “Ruyi's Royal Love in the Palace” and “episode 30” respectively.
- the DM module may control a command execution module to perform the “play” operation.
- the command execution module may open a TV series application, and start playing from episode 30 of “Ruyi's Royal Love in the Palace.”
- the DST module and the DM module may be uniformly considered as a whole, and are configured to control and manage a dialog status. For example, if the user expresses a need for “play” but does not specify any information, the dialog system needs to ask the user for slot information that needs to be obtained.
- a main function of the NLG module is to generate a dialog.
- the DM module may control the NLG module to generate a corresponding dialog: “Which episode do you want to play from?”
- the command execution module may notify the DM module that the operation is completed.
- the DM module may control the NLG module to generate a corresponding dialog: “Now playing episode 30 of Ruyi's Royal Love in the Palace for you...”
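- Putting the DST, DM, and NLG roles together, the whole exchange can be sketched as follows; the required-slot table and prompt texts are assumptions for illustration:

```python
REQUIRED_SLOTS = {"play": ["video name", "episode"]}

PROMPTS = {  # assumed NLG templates
    "video name": "Which video do you want to play?",
    "episode": "Which episode do you want to play from?",
}

def dialog_step(intent: str, slots: dict) -> str:
    # DST: check which required slots are still missing, in the preset order.
    missing = [s for s in REQUIRED_SLOTS.get(intent, []) if s not in slots]
    if missing:
        # DM controls NLG to generate a dialog querying the first missing slot.
        return PROMPTS[missing[0]]
    # All slots filled: DM triggers the command execution module, and NLG
    # generates the confirmation dialog.
    return (f"Now playing episode {slots['episode']} of "
            f"{slots['video name']} for you...")

print(dialog_step("play", {}))  # asks for the video name first
print(dialog_step("play", {"video name": "Ruyi's Royal Love in the Palace",
                           "episode": 30}))
```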
- a main function of the TTS module is to broadcast a dialog to the user.
- TTS is a voice synthesis broadcast technology.
- a main goal of TTS is to handle the “phonological” problems of broadcasting well. This requires comprehensively considering information such as symbols, polyphones, and sentence patterns, to correctly handle the pronunciation of words in broadcasting.
- “timbre” also requires attention.
- in short, TTS aims to handle both “phonology” and “timbre” well.
- the server sends the target instruction to the client device.
- the client device receives the target instruction sent by the server.
- the voice assistant on the client device may receive the target instruction sent by the server.
- the server feeds back a recognition result of the voice instruction to the client device.
- step 506 is further included.
- the client device determines and performs the target operation based on the target instruction.
- the client device may determine, based on the target instruction sent by the server, the target operation indicated by the target instruction, and perform the target operation, to respond to the voice instruction sent by the user.
- the target instruction includes the identifier of the foreground application.
- the foreground application sends user interface information including the identifier of the foreground application to the voice assistant, and the voice assistant then sends the user interface information to the server.
- the server determines the target instruction based on the user interface information and the voice instruction, and the target instruction may carry the identifier of the foreground application. Therefore, the voice assistant may invoke the software interface of the foreground application based on the identifier of the foreground application sent by the server, and send the target instruction to the foreground application.
- the foreground application may perform the target operation based on the target instruction.
- the voice assistant may be, for example, a voice assistant of a client device.
- the voice assistant establishes a binding relationship between the voice assistant and a foreground application.
- the foreground application may be, for example, a foreground application of the client device.
- the voice assistant may invoke a software interface of the foreground application, to establish the binding relationship between the voice assistant and the foreground application.
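- A minimal sketch of the binding step, with hypothetical method names: the voice assistant holds a handle to the foreground application's software interface for the duration of one voice interaction, and releases it when unbinding:

```python
class VoiceAssistant:
    def __init__(self) -> None:
        # Handle to the bound foreground application's software interface.
        self.bound_app = None

    def bind(self, foreground_app) -> None:
        # Invoke the foreground application's software interface to
        # establish the binding relationship.
        self.bound_app = foreground_app

    def unbind(self) -> None:
        # Cancel invoking the software interface once the user interface
        # information has been forwarded.
        self.bound_app = None
```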
- the foreground application displays a corner mark.
- the client device may display one or more corner marks on a currently displayed user interface.
- the currently displayed user interface may be a current user interface.
- the voice assistant obtains the voice instruction of the user.
- For a specific implementation of step 804, refer to step 501 in the embodiment shown in FIG. 5.
- the foreground application sends user interface information of the current user interface to the voice assistant.
- the voice assistant receives the user interface information sent by the foreground application.
- For a specific implementation of step 805, refer to step 502 in the embodiment shown in FIG. 5.
- after being bound to the voice assistant, the foreground application may send the user interface information to the voice assistant. If the user interface information arrives within a preset threshold (for example, 100 ms), the voice assistant may send the user interface information to an access platform of a cloud server. If the voice assistant does not receive the user interface information within that period after the voice assistant is bound to the foreground application, the interface between the client device and the cloud server does not carry the user interface parameters.
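- The timing rule can be sketched as follows. The polling approach and names are assumptions; only the 100 ms threshold comes from the example above:

```python
import time
from typing import Callable, Optional

PRESET_THRESHOLD_S = 0.1  # the 100 ms example threshold

def collect_ui_info(bound_at: float,
                    receive_ui_info: Callable[[], Optional[dict]]) -> Optional[dict]:
    # Poll for the user interface information until the threshold elapses.
    while time.monotonic() - bound_at <= PRESET_THRESHOLD_S:
        ui_info = receive_ui_info()
        if ui_info is not None:
            return ui_info  # forwarded as parameters to the cloud access platform
        time.sleep(0.005)
    return None  # the interface to the cloud then carries no UI parameters
```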
- the voice assistant sends the voice instruction and the user interface information to the access platform of the server.
- the access platform of the server receives the voice instruction and the user interface information that are sent by the voice assistant.
- For a specific implementation of step 806, refer to step 503 in the embodiment shown in FIG. 5.
- the voice assistant unbinds the binding relationship between the voice assistant and the foreground application.
- the voice assistant may cancel invoking the software interface of the foreground application.
- the foreground application removes the corner mark on the current user interface.
- the access platform may send, to an ASR module of the server, the voice instruction and the user interface information that are sent by the voice assistant.
- the ASR module may receive the voice instruction and the user interface information that are sent by the access platform.
- the ASR module may convert the voice instruction into a text based on the user interface information, and send the text to the access platform.
- the access platform receives the text sent by the ASR module.
- the access platform may send, to a DM module of the server, the user interface information sent by the voice assistant and the text sent by the ASR module.
- the DM module receives the text and the user interface information.
- the DM module parses an intent and slots from the text based on the user interface information, to obtain the target instruction corresponding to the voice instruction.
- the DM module sends the target instruction to the access platform.
- the access platform receives the target instruction sent by the DM module.
- For specific implementations of steps 809 to 813, refer to step 504 in the embodiment shown in FIG. 5.
- the access platform sends the target instruction to the voice assistant.
- the voice assistant receives the target instruction sent by the access platform.
- For a specific implementation of step 814, refer to step 505 in the embodiment shown in FIG. 5.
- the voice assistant invokes the software interface of the foreground application.
- the voice assistant sends the target instruction to the foreground application.
- the foreground application receives the target instruction sent by the voice assistant.
- the foreground application executes, based on the target instruction, a target operation indicated by the voice instruction.
- For specific implementations of steps 815 to 817, refer to step 506 in the embodiment shown in FIG. 5.
- the foreground application sends a feedback result to the voice assistant.
- the feedback result may indicate that the foreground application successfully receives the target instruction.
- the foreground application displays the execution result of the target operation to the user.
- the user may perceive that the client device responds to the voice instruction of the user by performing the target operation.
- FIG. 9 is a schematic flowchart of a method for controlling a device using a voice according to an embodiment of this application.
- the user interface information includes at least one of the following information: an icon name, hot word information, indication information of a control instruction, or target corner mark information of the current user interface.
- the target corner mark information corresponds to a target icon or a target control instruction.
- the method 900 shown in FIG. 9 is executed by the client device.
- determining the target instruction corresponding to the voice instruction includes: The client device determines the target instruction based on the voice instruction and the user interface information.
- the client device may implement a voice recognition operation with reference to the user interface information by using an automatic speech recognition (ASR) module and a natural language understanding (NLU) module.
- For specific implementations of steps 901 and 902, refer to steps 501 and 502 in the embodiment shown in FIG. 5.
- For a specific implementation of step 903, refer to step 504 in the embodiment shown in FIG. 5.
- the method 900 shown in FIG. 9 is executed by the client device.
- the method further includes: sending the user interface information and the voice instruction to a server.
- the determining the target instruction corresponding to the voice instruction includes: receiving the target instruction sent by the server, where the target instruction is determined by the server based on the user interface information and the voice instruction of the user.
- For specific implementations of steps 901 and 902, refer to steps 501 and 502 in the embodiment shown in FIG. 5.
- For a specific implementation of step 903, refer to steps 503 to 505 in the embodiment shown in FIG. 5.
- the method 900 shown in FIG. 9 is executed by a server.
- obtaining the voice instruction of the user includes: receiving the voice instruction sent by the client device, obtaining the user interface information of the current user interface includes: receiving the user interface information sent by the client device, and determining the target instruction corresponding to the voice instruction includes: determining the target instruction based on the voice instruction and the user interface information.
- the method further includes: sending the target instruction to the client device.
- For specific implementations of steps 901 and 902, refer to step 503 in the embodiment shown in FIG. 5.
- For a specific implementation of step 903, refer to step 504 in the embodiment shown in FIG. 5.
- the electronic device includes corresponding hardware and/or software modules for performing the functions.
- this application may be implemented in a form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by hardware driven by computer software depends on the particular application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application with reference to the embodiments, but it should not be considered that such an implementation goes beyond the scope of this application.
- the electronic device may be divided into function modules based on the foregoing method examples.
- each function module corresponding to each function may be obtained through division, or two or more functions may be integrated into one processing module.
- the integrated module may be implemented in a form of hardware. It should be noted that the module division in the embodiments is an example, and is merely logical function division. In an actual implementation, another division manner may be used.
- FIG. 10 is a possible schematic composition diagram of an electronic device 1000 in the foregoing embodiments.
- the electronic device 1000 may include an obtaining module 1001 and a processing module 1002 .
- the electronic device 1000 may be, for example, the client device or the server described above.
- the obtaining module 1001 may be configured to obtain a voice instruction of a user, where the voice instruction is used to indicate a target instruction.
- the voice assistant in FIG. 2 may be configured to implement a function of the obtaining module 1001.
- the obtaining module 1001 may further be configured to obtain user interface information of a current user interface, where the current user interface is a user interface currently displayed by the client device.
- the voice assistant in FIG. 2 may be configured to implement a function of the obtaining module 1001.
- the processing module 1002 is configured to determine the target instruction corresponding to the voice instruction, where the target instruction is obtained by using the voice instruction and the user interface information.
- the electronic device provided in this embodiment is configured to perform the foregoing method for controlling a device using a voice, and therefore can achieve the same effect as the foregoing implementation methods.
- the electronic device may include a processing module, a storage module, and a communications module.
- the processing module may be configured to control and manage actions of the electronic device, for example, may be configured to support the electronic device to perform the steps performed by the foregoing units.
- the storage module may be configured to support the electronic device to store program code, data, and the like.
- the communications module may be configured to support communication between the electronic device and another device.
- the processing module may be a processor or a controller.
- the processing module may implement or execute various example logical blocks, modules and circuits described with reference to content disclosed in this application.
- the processor may be a combination of processors implementing a computing function, for example, a combination of one or more microprocessors, or a combination of a digital signal processor (DSP) and a microprocessor.
- the storage module may be a memory.
- the communications module may be specifically a device that interacts with another electronic device, such as a radio frequency circuit, a Bluetooth chip, or a Wi-Fi chip.
- the electronic device in this embodiment may be a device in the structure shown in FIG. 1 .
- An embodiment further provides a computer program product.
- when the computer program product runs on a computer, the computer is enabled to perform the foregoing related steps, to implement the method for controlling a device using a voice in the foregoing embodiments.
- an embodiment of this application further provides an apparatus.
- the apparatus may be specifically a chip, a component, or a module.
- the apparatus may include a processor and a memory that are connected to each other.
- the memory is configured to store computer-executable instructions.
- the processor may execute the computer-executable instructions stored in the memory, to enable the chip to perform the method for controlling a device using a voice in the foregoing method embodiments.
- An embodiment of this application provides a terminal device.
- the terminal device has a function of implementing the actions of the terminal device in any one of the foregoing method embodiments.
- the function may be implemented by hardware, or may be implemented by hardware executing corresponding software.
- the hardware or the software includes one or more modules corresponding to sub-functions in the function.
- the terminal device may be user equipment.
- An embodiment of this application further provides a communications system, and the system includes the network device (for example, a cloud server) and the terminal device that are described in any one of the foregoing embodiments.
- An embodiment of this application further provides a communications system, and the system includes the electronic device and the server that are described in any one of the foregoing embodiments.
- An embodiment of this application further provides a computer-readable storage medium, and the computer-readable storage medium stores a computer program.
- when the computer program is executed by a computer, a method procedure related to the terminal device in any one of the foregoing method embodiments is implemented.
- the computer may be the foregoing terminal device.
- An embodiment of this application further provides a computer program or a computer program product including a computer program.
- when the computer program is executed on a computer, the computer is enabled to implement a method procedure related to the terminal device in any one of the foregoing method embodiments.
- the computer may be the foregoing terminal device.
- An embodiment of this application further provides an apparatus for use in a terminal device.
- the apparatus is coupled to a memory, and is configured to read and execute instructions stored in the memory, so that the terminal device is enabled to perform a method procedure related to the terminal device in any one of the foregoing method embodiments.
- the memory may be integrated into a processor, or may be independent of a processor.
- the apparatus may be a chip (for example, a system on a chip (SoC)) on the terminal device.
- the processor in the embodiments of this application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or a transistor logic device, a discrete hardware component, or the like.
- the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
- the memory in the embodiments of this application may be a volatile memory, or a non-volatile memory, or may include both a volatile memory and a non-volatile memory.
- the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
- the volatile memory may be a random access memory (RAM), which is used as an external cache.
- many forms of RAM may be used, for example, a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchronous link dynamic random access memory (SLDRAM), and a direct rambus random access memory (DR RAM).
- the memory described in this specification is intended to include, but is not limited to, these memories and any other memory of a proper type.
- the term “and/or” describes an association relationship between associated objects and represents that three relationships may exist.
- a and/or B may represent the following three cases: Only A exists, both A and B exist, and only B exists, where A and B may be singular or plural.
- the character “/” usually represents an “or” relationship between the associated objects.
- “at least one” means one or more, and “a plurality of” means two or more. “At least one of the following items (pieces)” or a similar expression thereof indicates any combination of these items, including a single item (piece) or any combination of a plurality of items (pieces). For example, “at least one item (piece) of” a, b, or c, or “at least one item (piece) of a, b, and c” may indicate: a, b, c, a-b (that is, a and b), a-c, b-c, or a-b-c, where a, b, and c may be singular or plural.
- sequence numbers of the foregoing processes do not mean execution sequences in various embodiments of this application, and some or all of the steps may be performed in parallel or sequentially.
- the execution sequences of the processes should be determined by functions and internal logic of the processes, and should not be construed as any limitation on the implementation processes of the embodiments of this application.
- the disclosed system, apparatus, and method may be implemented in other manners.
- the described apparatus embodiments are merely examples.
- division into the units is merely logical function division and may be other division during actual implementation.
- a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed.
- the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented by using some interfaces.
- the indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
- the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected based on actual requirements to achieve the objectives of the solutions of the embodiments.
- when the functions are implemented in the form of a software functional unit and sold or used as an independent product, the functions may be stored in a computer-readable storage medium.
- based on such an understanding, the computer software product is stored in a storage medium, and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, a terminal device, or the like) to perform all or some of the steps of the methods in the embodiments of this application.
- the foregoing storage medium includes: any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
- each apparatus embodiment is configured to perform the method provided in the corresponding method embodiment. Therefore, each apparatus embodiment may be understood with reference to a related part in a related method embodiment.
- the apparatus structure diagrams provided in the apparatus embodiments of this application show only simplified designs of corresponding apparatuses.
- the apparatus may include any quantity of transmitters, receivers, processors, memories, or the like, to implement functions or operations performed by the apparatus in the apparatus embodiments of this application. All apparatuses that can implement this application fall within the protection scope of this application.
- although the terms “first”, “second”, “third”, and the like may be used in the embodiments of this application to describe various messages, requests, and terminals, the messages, requests, and terminals are not limited by these terms. These terms are used only to distinguish between the messages, the requests, and the terminals.
- a first terminal may alternatively be referred to as a second terminal, and similarly, a second terminal may alternatively be referred to as a first terminal.
- the word “if” used herein may be interpreted as “while,” “when,” “in response to determining,” or “in response to detecting.”
- similarly, the phrases “if determining” or “if detecting (a stated condition or event)” may be interpreted as “when determining,” “in response to determining,” “when detecting (the stated condition or event),” or “in response to detecting (the stated condition or event).”
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- User Interface Of Digital Computer (AREA)
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910735931 | 2019-08-09 | ||
CN201910735931.9 | 2019-08-09 | ||
CN202010273843.4 | 2020-04-09 | ||
CN202010273843.4A CN112346695A (zh) | 2019-08-09 | 2020-04-09 | 语音控制设备的方法及电子设备 |
PCT/CN2020/102113 WO2021027476A1 (fr) | 2019-08-09 | 2020-07-15 | Procédé de commande vocale d'un appareil, et appareil électronique |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230176812A1 true US20230176812A1 (en) | 2023-06-08 |
Family
ID=74357840
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/633,702 Abandoned US20230176812A1 (en) | 2019-08-09 | 2020-07-15 | Method for controlling a device using a voice and electronic device |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230176812A1 (fr) |
EP (1) | EP3989047A4 (fr) |
CN (2) | CN112346695A (fr) |
WO (1) | WO2021027476A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118193760A (zh) * | 2024-03-11 | 2024-06-14 | 北京鸿鹄云图科技股份有限公司 | 基于自然语言理解的图纸语音批注方法及系统 |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111367491A (zh) * | 2020-03-02 | 2020-07-03 | 成都极米科技股份有限公司 | 语音交互方法、装置、电子设备及存储介质 |
CN113076077B (zh) * | 2021-03-29 | 2024-06-14 | 北京梧桐车联科技有限责任公司 | 安装车载程序的方法、装置和设备 |
CN113076079A (zh) * | 2021-04-20 | 2021-07-06 | 广州小鹏汽车科技有限公司 | 语音控制方法、服务器、语音控制系统和存储介质 |
CN113282268B (zh) * | 2021-06-03 | 2023-03-14 | 腾讯科技(深圳)有限公司 | 音效配置方法和装置、存储介质及电子设备 |
CN113507500A (zh) * | 2021-06-04 | 2021-10-15 | 上海闻泰信息技术有限公司 | 终端控制方法、装置、计算机设备和计算机可读存储介质 |
CN116670624A (zh) * | 2021-06-30 | 2023-08-29 | 华为技术有限公司 | 界面的控制方法、装置和系统 |
CN113628622A (zh) * | 2021-08-24 | 2021-11-09 | 北京达佳互联信息技术有限公司 | 语音交互方法、装置、电子设备及存储介质 |
CN114090148A (zh) * | 2021-11-01 | 2022-02-25 | 深圳Tcl新技术有限公司 | 信息同步方法、装置、电子设备及计算机可读存储介质 |
CN114049892A (zh) * | 2021-11-12 | 2022-02-15 | 杭州逗酷软件科技有限公司 | 语音控制方法、装置以及电子设备 |
CN116170646A (zh) * | 2021-11-25 | 2023-05-26 | 中移(杭州)信息技术有限公司 | 一种机顶盒的控制方法和系统,及存储介质 |
CN114529641A (zh) * | 2022-02-21 | 2022-05-24 | 重庆长安汽车股份有限公司 | 智能网联汽车助手对话和形象管理系统和方法 |
CN114968533B (zh) * | 2022-06-09 | 2023-03-24 | 中国人民解放军32039部队 | 嵌入式卫星任务调度管理方法、系统和电子设备 |
CN117334183A (zh) * | 2022-06-24 | 2024-01-02 | 华为技术有限公司 | 语音交互的方法、电子设备和语音助手开发平台 |
CN115396709B (zh) * | 2022-08-22 | 2024-10-29 | 海信视像科技股份有限公司 | 显示设备、服务器及免唤醒语音控制方法 |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050273487A1 (en) * | 2004-06-04 | 2005-12-08 | Comverse, Ltd. | Automatic multimodal enabling of existing web content |
US20120030712A1 (en) * | 2010-08-02 | 2012-02-02 | At&T Intellectual Property I, L.P. | Network-integrated remote control with voice activation |
US20120110456A1 (en) * | 2010-11-01 | 2012-05-03 | Microsoft Corporation | Integrated voice command modal user interface |
US20130159002A1 (en) * | 2011-12-19 | 2013-06-20 | Verizon Patent And Licensing Inc. | Voice application access |
US20140350941A1 (en) * | 2013-05-21 | 2014-11-27 | Microsoft Corporation | Method For Finding Elements In A Webpage Suitable For Use In A Voice User Interface (Disambiguation) |
US9380334B2 (en) * | 2012-08-17 | 2016-06-28 | Flextronics Ap, Llc | Systems and methods for providing user interfaces in an intelligent television |
US20220317968A1 (en) * | 2021-04-02 | 2022-10-06 | Comcast Cable Communications, Llc | Voice command processing using user interface context |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6882974B2 (en) * | 2002-02-15 | 2005-04-19 | Sap Aktiengesellschaft | Voice-control for a user interface |
CN101193159A (zh) * | 2006-11-24 | 2008-06-04 | 辉翼科技股份有限公司 | 可同步化多形式数据通道的主从式通信系统 |
CN104093077B (zh) * | 2013-10-29 | 2016-05-04 | 腾讯科技(深圳)有限公司 | 多终端互联的方法、装置及系统 |
US10318236B1 (en) * | 2016-05-05 | 2019-06-11 | Amazon Technologies, Inc. | Refining media playback |
CN107077319A (zh) * | 2016-12-22 | 2017-08-18 | 深圳前海达闼云端智能科技有限公司 | 基于语音控制的位置定位方法、装置、用户设备及计算机程序产品 |
US12026456B2 (en) * | 2017-08-07 | 2024-07-02 | Dolbey & Company, Inc. | Systems and methods for using optical character recognition with voice recognition commands |
CN107919129A (zh) * | 2017-11-15 | 2018-04-17 | 百度在线网络技术(北京)有限公司 | 用于控制页面的方法和装置 |
CN107832036B (zh) * | 2017-11-22 | 2022-01-18 | 北京小米移动软件有限公司 | 语音控制方法、装置及计算机可读存储介质 |
CN108880961A (zh) * | 2018-07-19 | 2018-11-23 | 广东美的厨房电器制造有限公司 | 家电设备控制方法及装置、计算机设备和存储介质 |
CN108683937B (zh) * | 2018-03-09 | 2020-01-21 | 百度在线网络技术(北京)有限公司 | 智能电视的语音交互反馈方法、系统及计算机可读介质 |
CN108958844B (zh) * | 2018-07-13 | 2021-09-03 | 京东方科技集团股份有限公司 | 一种应用程序的控制方法及终端 |
CN109391833B (zh) * | 2018-09-13 | 2021-01-26 | 苏宁智能终端有限公司 | 一种智能电视的语音控制方法及智能电视 |
CN109448727A (zh) * | 2018-09-20 | 2019-03-08 | 李庆湧 | 语音交互方法以及装置 |
CN109584879B (zh) * | 2018-11-23 | 2021-07-06 | 华为技术有限公司 | 一种语音控制方法及电子设备 |
CN109801625A (zh) * | 2018-12-29 | 2019-05-24 | 百度在线网络技术(北京)有限公司 | 虚拟语音助手的控制方法、装置、用户设备及存储介质 |
-
2020
- 2020-04-09 CN CN202010273843.4A patent/CN112346695A/zh active Pending
- 2020-04-09 CN CN202210690830.6A patent/CN115145529B/zh active Active
- 2020-07-15 WO PCT/CN2020/102113 patent/WO2021027476A1/fr unknown
- 2020-07-15 EP EP20853218.4A patent/EP3989047A4/fr active Pending
- 2020-07-15 US US17/633,702 patent/US20230176812A1/en not_active Abandoned
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050273487A1 (en) * | 2004-06-04 | 2005-12-08 | Comverse, Ltd. | Automatic multimodal enabling of existing web content |
US20120030712A1 (en) * | 2010-08-02 | 2012-02-02 | At&T Intellectual Property I, L.P. | Network-integrated remote control with voice activation |
US20120110456A1 (en) * | 2010-11-01 | 2012-05-03 | Microsoft Corporation | Integrated voice command modal user interface |
US20130159002A1 (en) * | 2011-12-19 | 2013-06-20 | Verizon Patent And Licensing Inc. | Voice application access |
US9380334B2 (en) * | 2012-08-17 | 2016-06-28 | Flextronics Ap, Llc | Systems and methods for providing user interfaces in an intelligent television |
US20140350941A1 (en) * | 2013-05-21 | 2014-11-27 | Microsoft Corporation | Method For Finding Elements In A Webpage Suitable For Use In A Voice User Interface (Disambiguation) |
US20220317968A1 (en) * | 2021-04-02 | 2022-10-06 | Comcast Cable Communications, Llc | Voice command processing using user interface context |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118193760A (zh) * | 2024-03-11 | 2024-06-14 | 北京鸿鹄云图科技股份有限公司 | 基于自然语言理解的图纸语音批注方法及系统 |
Also Published As
Publication number | Publication date |
---|---|
EP3989047A4 (fr) | 2022-08-17 |
EP3989047A1 (fr) | 2022-04-27 |
CN112346695A (zh) | 2021-02-09 |
CN115145529A (zh) | 2022-10-04 |
CN115145529B (zh) | 2023-05-09 |
WO2021027476A1 (fr) | 2021-02-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20230176812A1 (en) | Method for controlling a device using a voice and electronic device | |
WO2022052776A1 (fr) | Procédé d'interaction homme-ordinateur, ainsi que dispositif électronique et système | |
WO2021057830A1 (fr) | Procédé de traitement d'informations et dispositif électronique | |
US11227598B2 (en) | Method for controlling terminal by voice, terminal, server and storage medium | |
CN112527174B (zh) | 一种信息处理方法及电子设备 | |
CN112527222A (zh) | 一种信息处理方法及电子设备 | |
EP4138381A1 (fr) | Procédé et dispositif de lecture vidéo | |
EP4044578A1 (fr) | Procédé de traitement audio et dispositif électronique | |
CN111526402A (zh) | 多屏显示设备的语音搜索视频资源的方法及显示设备 | |
US20210405767A1 (en) | Input Method Candidate Content Recommendation Method and Electronic Device | |
CN116048933A (zh) | 一种流畅度检测方法 | |
US20240086035A1 (en) | Display Method and Electronic Device | |
US12050633B2 (en) | Data processing method and apparatus | |
WO2021052488A1 (fr) | Procédé de traitement d'informations et dispositif électronique | |
WO2023016014A1 (fr) | Procédé d'édition vidéo et dispositif électronique | |
US11930236B2 (en) | Content playback device using voice assistant service and operation method thereof | |
CN117809649A (zh) | 显示设备和语义分析方法 | |
CN112053688A (zh) | 一种语音交互方法及交互设备、服务器 | |
WO2023197949A1 (fr) | Procédé de traduction du chinois et dispositif électronique | |
WO2023051116A1 (fr) | Procédé et système de mise en œuvre distribuée, et dispositif électronique et support de stockage | |
WO2022193735A1 (fr) | Dispositif d'affichage et procédé d'interaction vocale | |
EP4421607A1 (fr) | Procédé d'affichage et dispositif électronique | |
US12019947B2 (en) | Projection method and system | |
CN116055738B (zh) | 视频压缩方法及电子设备 | |
US20240056677A1 (en) | Co-photographing method and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
AS | Assignment | Owner name: HUAWEI TECHNOLOGIES CO., LTD., CHINA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WU, YOUGUO;ZHANG, GUICHENG;YANG, JUNYUAN;AND OTHERS;SIGNING DATES FROM 20220728 TO 20231229;REEL/FRAME:065993/0367 |
STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |