WO2023174155A1 - A multi-device voice control system and method - Google Patents

A multi-device voice control system and method

Info

Publication number
WO2023174155A1
Authority
WO
WIPO (PCT)
Prior art keywords
terminal device
voice
instruction
request information
information
Prior art date
Application number
PCT/CN2023/080568
Other languages
English (en)
French (fr)
Inventor
徐谦
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023174155A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Definitions

  • Embodiments of the present application relate to the field of audio technology, and in particular, to a multi-device voice control system and method.
  • Terminal devices have emerged in various forms, for example, mobile phones, tablets, TVs, vehicle-mounted devices, and various household appliances.
  • Many services may involve multiple devices and require them to work together for business processing; for example, during a video conference a user may project the screen of a portable computer onto a large screen, log in to the same account on multiple devices, or cast the video currently playing on a mobile phone to a TV.
  • Embodiments of the present application provide a multi-device voice control system and method, offering a technical solution for controlling multiple adjacent devices simultaneously through voice, which can reduce operational complexity in scenarios where multiple devices collaborate on business processing.
  • Embodiments of the present application provide a multi-device voice control system, in which the first terminal device receives the user's first voice instruction and, in response, uploads first voice request information to the first server device, the first voice request information including the first voice instruction; the second terminal device receives the user's second voice instruction and, in response, uploads second voice request information to the first server device, the second voice request information including the second voice instruction; and the first server device determines, based on the first voice instruction and the second voice instruction, whether the first voice request information and the second voice request information are related.
  • Multiple terminal devices each receive the user's voice instruction.
  • The server device can identify the related terminal devices by analyzing the voice instructions; that is, the user can control multiple devices through voice instructions.
  • After receiving the voice instructions, the server device can instruct each terminal device to perform the relevant operations, realizing multi-device collaboration for business processing.
  • The method provided by this application can therefore reduce the complexity of the user's operations and improve the user experience.
  • The first server device determines that the first voice request information and the second voice request information are related by at least one of the following methods, including but not limited to: (1) determining that the first voice instruction and the second voice instruction are the same; (2) determining that the similarity between the first voice instruction and the second voice instruction is greater than a first specified threshold; (3) determining that the first voice instruction corresponds to the second voice instruction.
  • By having multiple terminal devices each upload the voice instructions they received, the server device can identify whether those terminal devices need to collaborate on business processing, so that multiple devices can be controlled based on the user's voice instructions, reducing the complexity of user operations.
  • The first voice request information and the second voice request information each further include, but are not limited to, at least one of the following: timestamp information, voiceprint information, and terminal device status information.
  • When the terminal device uploads voice service request information to the server device, in addition to the voice instruction it can include other information that helps determine whether terminal devices are related, so that voice control of multiple devices can be realized more accurately based on the user's voice instructions.
  • The first server device determines that the first voice request information and the second voice request information are related by one or more of the following methods, including but not limited to: determining that the timestamp information uploaded by the first terminal device is the same as the timestamp information uploaded by the second terminal device, or that their similarity is greater than a second specified threshold; determining that the voiceprint information uploaded by the first terminal device is the same as the voiceprint information uploaded by the second terminal device, or that their similarity is greater than a third specified threshold.
  • To improve the accuracy of identifying whether terminal devices need to collaborate on business processing, the server device can also combine other information uploaded by the terminal devices to determine whether they are related, which improves the accuracy of voice control over multiple devices; in addition, checking voiceprint information improves the security of voice control over multiple devices.
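  • To make the matching rules above concrete, here is a minimal Python sketch that combines the three signals (instruction text, timestamp, voiceprint) into a single relatedness check. The field names, thresholds, the use of difflib for text similarity, and cosine similarity for voiceprint embeddings are illustrative assumptions, not details taken from the patent.

```python
import difflib
import math

def text_similarity(cmd_a: str, cmd_b: str) -> float:
    # Illustrative stand-in for the claimed "similarity" between instructions.
    return difflib.SequenceMatcher(None, cmd_a, cmd_b).ratio()

def voiceprint_similarity(vp_a, vp_b) -> float:
    # Cosine similarity between two voiceprint embedding vectors (assumed form).
    dot = sum(x * y for x, y in zip(vp_a, vp_b))
    norm_a = math.sqrt(sum(x * x for x in vp_a))
    norm_b = math.sqrt(sum(x * x for x in vp_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def requests_related(req_a: dict, req_b: dict,
                     cmd_threshold: float = 0.8,    # "first specified threshold"
                     max_time_skew_s: float = 2.0,  # stand-in for timestamp match
                     vp_threshold: float = 0.9      # "third specified threshold"
                     ) -> bool:
    """Decide whether two uploaded voice requests stem from one utterance.

    Each request is assumed to be a dict with 'command', 'timestamp'
    (seconds), and 'voiceprint' (embedding vector) keys.
    """
    same_command = (req_a["command"] == req_b["command"]
                    or text_similarity(req_a["command"], req_b["command"]) > cmd_threshold)
    close_in_time = abs(req_a["timestamp"] - req_b["timestamp"]) <= max_time_skew_s
    same_speaker = voiceprint_similarity(req_a["voiceprint"], req_b["voiceprint"]) >= vp_threshold
    return same_command and close_in_time and same_speaker
```

  Combining all three signals mirrors the point above: the timestamp narrows the match to a single utterance, while the voiceprint check adds security, preventing a second speaker from pairing devices.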
  • The first server device performs at least one of the following processes according to the first voice instruction and the second voice instruction, including: the first server device performing semantic analysis on the first voice instruction and the second voice instruction; the first server device determining the status of the first terminal device and the second terminal device according to the terminal device status information corresponding to the first terminal device and the terminal device status information corresponding to the second terminal device; and the first server device performing at least one of the following processes based on the result of the semantic analysis and the status of the first terminal device and the second terminal device.
  • When the server device recognizes that the terminal devices are related, it can further determine, based on the user intention expressed by the voice instructions, the operations each terminal device needs to perform and instruct the corresponding terminal device, so that voice control of multiple terminal devices is achieved from the user's voice instructions.
  • the server device can generate more accurate control instructions based on the status of each terminal device, thereby ensuring the accuracy of voice control of multiple devices.
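  • The following sketch illustrates how a server might turn one recognized intent plus per-device status into a pair of control instructions, each carrying the peer's device identification. The intent name "cast_screen", the status values, and the data shapes are hypothetical, chosen only to show the pattern.

```python
from dataclasses import dataclass

@dataclass
class ControlInstruction:
    target_device: str   # device the instruction is sent to
    action: str          # operation related to the voice instruction
    peer_device: str     # identification of the collaborating device

def generate_control_instructions(intent: str, dev_a: dict, dev_b: dict):
    """Turn one user intent into per-device control instructions.

    dev_a/dev_b are assumed dicts like {'id': 'phone-1', 'status': 'idle'}.
    Only one intent is sketched; a real server would map many intents
    and consult richer device state.
    """
    # The status check mirrors the idea that instructions are generated
    # based on the state of each terminal device.
    if dev_a["status"] != "idle" or dev_b["status"] != "idle":
        return []
    if intent == "cast_screen":
        return [
            ControlInstruction(dev_a["id"], "start_casting", peer_device=dev_b["id"]),
            ControlInstruction(dev_b["id"], "receive_cast", peer_device=dev_a["id"]),
        ]
    return []
```

  Cross-referencing the peer's identification in each instruction is what lets each terminal device know which device it must cooperate with.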
  • If the processing performed by the first server device is to generate and send the first control instruction to the first terminal device, the first control instruction includes the device identification of the second terminal device, which the first terminal device uses to perform the operations related to the first voice instruction; or, if the processing performed by the first server device is to generate and send the second control instruction to the second terminal device, the second control instruction includes the device identification of the first terminal device, which the second terminal device uses to perform the operations related to the second voice instruction.
  • In some possible scenarios, the server device determines that some terminal devices need to perform corresponding operations.
  • Those operations usually need to be coordinated with another part of the terminal devices.
  • The server device may therefore carry the device identification of the other terminal devices in the control instructions sent to the first part, so that those terminal devices know which devices they need to cooperate with.
  • The first terminal device or the second terminal device can then achieve collaborative processing with the other terminal device based on the device identification carried in the control instruction.
  • the method further includes: the first server device generating a first identification code, the first identification code being used to identify that the first terminal device and the second terminal device are related.
  • The first identification code helps ensure the processing efficiency of multi-device voice control for the first terminal device and the second terminal device. For example, if the first terminal device and the second terminal device need to interact with the first server device, the second server device, or other devices after receiving the control instructions, the first identification code allows quick identification that the first terminal device and the second terminal device are related.
  • The system further includes a second server device, wherein: the first terminal device sends a first request instruction to the second server device according to the first control instruction, the first control instruction and the first request instruction carrying the first identification code; the second terminal device sends a second request instruction to the second server device according to the second control instruction, the second control instruction and the second request instruction carrying the first identification code; and the second server device performs at least one of the following processes according to the first identification code: sending a first response instruction to the first terminal device, and sending a second response instruction to the second terminal device.
  • voice control of the first terminal device and the second terminal device can also be realized through other server devices.
  • Based on the first identification code generated for the first terminal device and the second terminal device, it can be quickly determined that the two devices are related, ensuring the processing efficiency of voice control over multiple devices.
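  • A minimal sketch of how the identification code could work in practice: the first server device issues a shared code for the related pair, and the second server device uses that code to match the two request instructions and answer both devices. All class and method names here are hypothetical, chosen only to illustrate the flow.

```python
import secrets

class FirstServer:
    """Generates a shared identification code for a pair of related devices."""
    def __init__(self):
        self.pairings = {}

    def pair_devices(self, dev_a: str, dev_b: str) -> str:
        code = secrets.token_hex(8)       # the "first identification code"
        self.pairings[code] = {dev_a, dev_b}
        return code

class SecondServer:
    """Uses the identification code to match the two request instructions."""
    def __init__(self):
        self.pending = {}   # code -> (device_id, request) waiting for its peer

    def handle_request(self, code: str, device_id: str, request: str):
        if code not in self.pending:
            self.pending[code] = (device_id, request)
            return None                   # first arrival: wait for the peer
        peer_id, peer_request = self.pending.pop(code)
        # Both related requests have arrived; answer each device.
        return {device_id: f"response-to-{request}",
                peer_id: f"response-to-{peer_request}"}
```

  Because the code is an opaque token, the second server never needs to re-analyze the voice instructions to know the two requests belong together.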
  • the first request instruction is used to request to log in to a designated platform
  • the second request instruction is used to request authorization to log in to the designated platform
  • If the processing performed by the second server device is to send the second response instruction to the second terminal device, the second response instruction is used to indicate that the first terminal device authorizes the second terminal device to log in to the designated platform.
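  • As one way the login/authorization pairing described above could be resolved, the sketch below matches a "login" request with an "authorize" request that arrived under the same identification code. The request encoding, response strings, and session store are invented for illustration.

```python
def resolve_login_pair(requests: dict, sessions: dict) -> dict:
    """Pair a login request with its authorization for one identification code.

    'requests' maps device id -> 'login' or 'authorize' (assumed encoding).
    Returns the response instruction for each device; the device that asked
    to log in is recorded in 'sessions' once its peer has authorized it.
    """
    if sorted(requests.values()) != ["authorize", "login"]:
        return {}  # both halves of the pair must be present
    responses = {}
    for device, kind in requests.items():
        if kind == "login":
            sessions[device] = "logged_in"      # login granted
            responses[device] = "login_granted"
        else:
            responses[device] = "authorization_acknowledged"
    return responses
```

  Returning an empty result until both requests are present reflects that neither device can complete the login scenario alone.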
  • The first voice instruction and the second voice instruction are used to indicate, but are not limited to, any one of the following scenarios: logging in to the designated platform on the first terminal device or the second terminal device, connecting the first terminal device to the second terminal device, or connecting the second terminal device to the first terminal device.
  • the first voice command and the second voice command are based on the same voice command of the user and are received by the first terminal device and the second terminal device respectively.
  • Multiple terminal devices receive the user's voice instruction at the same time, which allows voice control of multiple devices to be realized more accurately from the user's voice instruction. This not only reduces tedious user operations but also improves the accuracy of voice control over multiple devices.
  • Embodiments of the present application also provide a multi-device voice control method, including: a first terminal device receiving a first voice instruction from a user; the first terminal device, in response to the first voice instruction, uploading first voice request information to the first server device, the first voice request information including the first voice instruction; and the first terminal device receiving a first control instruction sent by the first server device, the first control instruction being used to perform operations related to the first voice instruction, the operations related to the first voice instruction being used for collaborative processing of services with the second terminal device; wherein the first control instruction is generated by the first server device when it determines that the first voice request information is related to second voice request information uploaded by the second terminal device.
  • The first voice request information and the second voice request information each further include, but are not limited to, at least one of the following: timestamp information, voiceprint information, and terminal device status information.
  • The first control instruction includes a device identification of the second terminal device, and the device identification of the second terminal device is used by the first terminal device to perform the operations related to the first voice instruction.
  • the first control instruction includes a first identification code; the first identification code is used to identify that the first terminal device and the second terminal device are related.
  • The method further includes: the first terminal device sending a first request instruction to the second server device according to the first control instruction, the first control instruction and the first request instruction carrying the first identification code, so that the second server device performs at least one of the following processes according to the first identification code: sending a first response instruction to the first terminal device, and sending a second response instruction to the second terminal device.
  • The first request instruction is used to request login to a designated platform; or, the first request instruction is used to request authorization of a login to a designated platform. The method further includes: if the first request instruction is used to request authorization of the login to the designated platform, the first terminal device receives the first response instruction, the first response instruction being used to indicate that the second terminal device authorizes the first terminal device to log in to the designated platform.
  • The first voice instruction and the second voice instruction included in the second voice request information are used to indicate, but are not limited to, any one of the following scenarios: logging in to a designated platform on the first terminal device or the second terminal device, connecting the first terminal device to the second terminal device, or connecting the second terminal device to the first terminal device.
  • The first voice instruction and the second voice instruction included in the second voice request information are based on the same voice instruction of the user and are received by the first terminal device and the second terminal device respectively.
  • Embodiments of the present application also provide a multi-device voice control method, including: a first server device receiving first voice request information uploaded by a first terminal device, the first voice request information including a first voice instruction; the first server device receiving second voice request information uploaded by a second terminal device, the second voice request information including a second voice instruction; and the first server device, if it determines according to the first voice instruction and the second voice instruction that the first voice request information and the second voice request information are related, performing at least one of the following processes: generating and sending a first control instruction to the first terminal device, the first control instruction being used to perform operations related to the first voice instruction, which are used for collaborative processing of services with the second terminal device; and generating and sending a second control instruction to the second terminal device, the second control instruction being used to perform operations related to the second voice instruction, which are used for collaborative processing of services with the first terminal device.
  • The first server device determines the correlation between the first voice request information and the second voice request information by at least one of the following methods, including but not limited to: determining that the first voice instruction and the second voice instruction are the same; determining that the similarity between the first voice instruction and the second voice instruction is greater than a first specified threshold; or determining that the first voice instruction corresponds to the second voice instruction.
  • The first voice request information and the second voice request information each further include, but are not limited to, at least one of the following: timestamp information, voiceprint information, and terminal device status information.
  • The first server device determines that the first voice request information and the second voice request information are related by one or more of the following methods, including but not limited to: determining that the timestamp information uploaded by the first terminal device is the same as the timestamp information uploaded by the second terminal device, or that their similarity is greater than a second specified threshold; determining that the voiceprint information uploaded by the first terminal device is the same as the voiceprint information uploaded by the second terminal device, or that their similarity is greater than a third specified threshold.
  • The first server device performs at least one of the following processes according to the first voice instruction and the second voice instruction, including: the first server device performing semantic analysis on the first voice instruction and the second voice instruction; the first server device determining the status of the first terminal device and the second terminal device according to the terminal device status information corresponding to the first terminal device and the terminal device status information corresponding to the second terminal device; and the first server device performing at least one of the following processes based on the result of the semantic analysis and the status of the first terminal device and the second terminal device.
  • The first control instruction includes the device identification of the second terminal device, and the device identification of the second terminal device is used by the first terminal device to perform the operations related to the first voice instruction; the second control instruction includes the device identification of the first terminal device, and the device identification of the first terminal device is used by the second terminal device to perform the operations related to the second voice instruction.
  • The method further includes: the first server device generating a first identification code, the first identification code being used to identify that the first terminal device and the second terminal device are related.
  • The first terminal device sends a first request instruction to the second server device according to the first control instruction, the first control instruction and the first request instruction carrying the first identification code; the second terminal device sends a second request instruction to the second server device according to the second control instruction, the second control instruction and the second request instruction carrying the first identification code; and the second server device performs at least one of the following processes according to the first identification code: sending a first response instruction to the first terminal device, and sending a second response instruction to the second terminal device.
  • the first request instruction is used to request to log in to a designated platform
  • the second request instruction is used to request authorization to log in to the designated platform
  • If the processing performed by the second server device is to send the second response instruction to the second terminal device, the second response instruction is used to indicate that the first terminal device authorizes the second terminal device to log in to the designated platform.
  • The first voice instruction and the second voice instruction are used to indicate, but are not limited to, any one of the following scenarios: logging in to the designated platform on the first terminal device or the second terminal device, connecting the first terminal device to the second terminal device, or connecting the second terminal device to the first terminal device.
  • the first voice command and the second voice command are based on the same voice command of the user and are received by the first terminal device and the second terminal device respectively.
  • Embodiments of the present application also provide a terminal device, including: one or more processors; and one or more memories, the one or more memories being used to store one or more computer programs and data information, wherein the one or more computer programs include instructions that, when executed by the one or more processors, cause the terminal device to perform the method described in any of the possible designs of the second aspect above.
  • Embodiments of the present application also provide a server device, including: one or more processors; and one or more memories, the one or more memories being used to store one or more computer programs and data information, wherein the one or more computer programs include instructions that, when executed by the one or more processors, cause the server device to perform the method described in any of the possible designs of the third aspect above.
  • embodiments of the present application further provide a multi-device voice control system, including at least two terminal devices as described in the fourth aspect and a server device as in the fifth aspect.
  • Embodiments of the present application further provide a computer-readable storage medium. The computer-readable medium stores a computer program (which may also be called code, or instructions) that, when run on a computer, causes the computer to execute the method in any of the possible designs of the second aspect or the third aspect above.
  • Embodiments of the present application further provide a computer program product. The computer program product includes a computer program (which may also be called code, or instructions) that, when run, causes the computer to execute the method in any of the possible designs of the second aspect or the third aspect above.
  • Embodiments of the present application further provide a graphical user interface on a terminal device. The terminal device has a display screen, one or more memories, and one or more processors, the one or more processors being configured to execute one or more computer programs stored in the one or more memories, and the graphical user interface includes the graphical user interface displayed when the terminal device executes any possible design of the second aspect of the embodiments of the present application.
  • Figure 1a is a schematic diagram of an application scenario of multi-device management provided by an embodiment of the present application;
  • Figure 1b is a schematic flowchart corresponding to the application scenario shown in Figure 1a provided by an embodiment of the present application;
  • Figure 2 is a schematic diagram of the hardware structure of a possible terminal device provided by an embodiment of the present application;
  • Figure 3 is a software structure block diagram of a terminal device provided by an embodiment of the present application;
  • Figure 4 is the first application scenario diagram of a multi-device voice control method provided by an embodiment of the present application;
  • Figure 5 is the second application scenario diagram of a multi-device voice control method provided by an embodiment of the present application;
  • Figure 6 is the first interaction flow diagram of a multi-device voice control method provided by an embodiment of the present application;
  • Figure 7 is a schematic flowchart of a multi-device voice control method provided by an embodiment of the present application;
  • Figure 8a is the second interaction flow diagram of a multi-device voice control method provided by an embodiment of the present application;
  • Figure 8b is the second interaction flow diagram of a multi-device voice control method provided by an embodiment of the present application;
  • Figure 9 is the third interaction flow diagram of a multi-device voice control method provided by an embodiment of the present application.
  • Terminal devices such as mobile phones, tablets, and TVs not only have communication functions but also powerful processing capabilities, storage capabilities, camera functions, and so on.
  • A terminal device executes corresponding applications through its operating system; users can use the terminal device to make calls, send short messages, browse the web, watch videos, and so on.
  • There are various ways to manage multiple devices, such as logging in to the same account on multiple devices, screencasting between devices, and so on.
  • Figure 1a is a schematic diagram of an application scenario for multi-device management provided by an embodiment of the present application.
  • The user has installed an application (APP) on the mobile phone shown in (a) of Figure 1a and has logged in to a user account. The user now needs to log in to the same APP with the same user account on another terminal device.
  • The user can manually bring up, on the tablet computer in (b), the QR code used to request authorized login, and then scan the code with the mobile phone shown in (a).
  • The backend server corresponding to the unlogged device requests authorized login of the APP from the APP development platform.
  • The APP development platform returns the QR code.
  • The backend server corresponding to the unlogged device controls the unlogged device to display the QR code.
  • The user authorizes the login through the logged-in device and notifies the APP development platform.
  • The APP development platform informs the backend server corresponding to the unlogged device that the login has been authorized.
  • The backend server corresponding to the unlogged device requests the user account data from the APP development platform.
  • The APP development platform returns the user account data to the backend server corresponding to the unlogged device.
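  • The steps above can be sketched end to end as follows. The class and method names stand in for the backend server, the APP development platform, and the two devices; they are illustrative assumptions, not part of the patent.

```python
class DevPlatform:
    """Stand-in for the APP development platform."""
    def __init__(self, account_data):
        self.account_data = account_data
        self.authorized = set()

    def request_qr(self):
        return "qr-code-123"            # backend requests authorized login; platform returns a QR code

    def authorize(self, qr):
        self.authorized.add(qr)         # user authorized via the logged-in device

    def fetch_account(self, qr):
        # hand over account data only after the login has been authorized
        return self.account_data if qr in self.authorized else None

def qr_login_flow(platform, display, login):
    """display/login are callbacks standing in for the unlogged device."""
    qr = platform.request_qr()
    display(qr)                         # the unlogged device shows the QR code
    platform.authorize(qr)              # the user scans it with the logged-in phone
    account = platform.fetch_account(qr)
    login(account)                      # the unlogged device is signed in
    return account
```

  The sketch makes the friction visible: every step routes through the platform and requires manual scanning, which is exactly the operational complexity the voice-based approach below aims to remove.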
  • Embodiments of the present application provide a multi-device voice control system and method, which can associate multiple neighboring devices through voice and realize collaborative management of those devices at the same time, so as to complete services that require the collaboration of multiple devices.
  • The main design idea is that multiple nearby devices obtain the user's voice instruction at the same time and send voice request information, including but not limited to the voice instruction, to the server device.
  • The server-side device can associate the multiple nearby devices that report the same voice instruction based on the received voice request information and generate corresponding control instructions for each device. The system and method provided by this application therefore feature easy operation and a more convenient interaction mode.
  • Adjacent devices are multiple terminal devices that can receive the same voice instruction from the user at the same time, for example, a mobile phone and a TV in the same room, or a computer and a mobile phone on the same desktop.
  • The terminal device in the embodiments of the present application may be a smart home device (for example, a smart TV, a smart screen, or a smart speaker), a mobile phone, a tablet, a wearable device (for example, a watch, a helmet, or a headset), an augmented reality (AR)/virtual reality (VR) device, a laptop, or another device with voice instruction input capability such as an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA).
• Terminal devices to which the embodiments of this application can be applied include, but are not limited to, portable terminal devices carrying various operating systems.
• the above-mentioned portable terminal device may also be another portable terminal device, such as a laptop computer (Laptop) with a touch-sensitive surface (such as a touch panel).
  • Figure 2 shows a schematic diagram of the hardware structure of a possible terminal device.
• the terminal device 200 includes components such as a radio frequency (RF) circuit 210, a power supply 220, a processor 230, a memory 240, an input unit 250, a display unit 260, an audio circuit 270, a communication interface 280, and a wireless fidelity (Wi-Fi) module 290.
• the terminal device 200 provided by the embodiment of the present application may include more or fewer components than shown in the figure, may combine two or more components, or may have a different arrangement of components.
  • the RF circuit 210 can be used to receive and send data during communication or phone calls. In particular, after receiving the downlink data from the base station, the RF circuit 210 sends it to the processor 230 for processing; in addition, it sends the uplink data to be sent to the base station.
  • the RF circuit 210 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, etc.
  • the RF circuit 210 can also communicate with other devices through a wireless communication network.
• the wireless communication can use any communication standard or protocol, including but not limited to global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), long term evolution (LTE), email, short messaging service (SMS), etc.
  • Wi-Fi technology is a short-distance wireless transmission technology.
  • the terminal device 200 can connect to an access point (AP) through the Wi-Fi module 290, thereby achieving access to the data network.
  • the Wi-Fi module 290 can be used to receive and send data during communication.
  • the terminal device 200 can achieve physical connection with other devices through the communication interface 280 .
  • the communication interface 280 is connected to the communication interface of the other device through a cable to realize data transmission between the terminal device 200 and other devices.
• For the terminal device 200 to implement communication services and interact with service-side devices (for example, including but not limited to voice service servers and account servers), the terminal device 200 needs to have a data transmission function; that is, the terminal device 200 needs to contain a communication module.
• Although FIG. 2 shows communication modules such as the RF circuit 210, the Wi-Fi module 290, and the communication interface 280, it can be understood that at least one of the above components, or another communication module (such as a Bluetooth module) used to implement data transmission, exists in the terminal device 200.
• For example, when the terminal device 200 is a mobile phone, the terminal device 200 may include the RF circuit 210, and may also include the Wi-Fi module 290 or a Bluetooth module (not shown in Figure 2); when the terminal device 200 is a computer, the terminal device 200 may include the communication interface 280, and may also include the Wi-Fi module 290 or a Bluetooth module (not shown in Figure 2); when the terminal device 200 is a tablet computer, the terminal device 200 may include the Wi-Fi module 290, and may also include a Bluetooth module (not shown in Figure 2).
  • the memory 240 may be used to store software programs and modules.
  • the processor 230 executes various functional applications and data processing of the terminal device 200 by running software programs and modules stored in the memory 240 .
  • the memory 240 may mainly include a program storage area and a data storage area.
• the program storage area can store the operating system (mainly including the software programs or modules corresponding to the kernel layer, system layer, application framework layer, application layer, etc.).
• the memory 240 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
• the input unit 250 may be used to receive numeric or character information input by the user and editing operations on multiple different types of data objects, and to generate key signal input related to user settings and function control of the terminal device 200.
  • the input unit 250 may include a touch panel 251 and other input devices 252.
• the touch panel 251, also called a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 251 using a finger, a stylus, or any other suitable object or accessory), and drive the corresponding connection device according to a preset program.
  • the other input devices 252 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), trackball, mouse, joystick, etc.
  • the display unit 260 may be used to display information input by a user or information provided to a user and various menus of the terminal device 200 .
  • the display unit 260 is the display system of the terminal device 200 and is used to present an interface and realize human-computer interaction.
  • the display unit 260 may include a display panel 261.
• the display panel 261 can be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), etc. In the embodiment of the present application, the display unit 260 may not be provided on the terminal device; for example, a smart speaker device does not need to be provided with a display screen. Alternatively, the display unit 260 may be provided on the terminal device and may display content corresponding to the voice command received by the terminal device 200 through the microphone 271; for example, if the voice command received by the microphone 271 is "Open and log in to instant messaging application A", the display interface corresponding to instant messaging application A can be displayed on the display panel 261.
• the processor 230 is the control center of the terminal device 200, which connects various components using various interfaces and lines, and executes the various functions of the terminal device 200 and processes data by running or executing software programs and/or modules stored in the memory 240 and calling data stored in the memory 240, thereby realizing various services based on the terminal device 200.
• the processor 230 is used to implement the method provided in the embodiments of the present application, thereby providing a technical solution that can simultaneously control multiple adjacent devices through voice and reducing operational complexity in business processing scenarios involving multiple devices.
  • the terminal device 200 also includes a power source 220 (such as a battery) for powering various components.
  • the power supply 220 can be logically connected to the processor 230 through a power management system, so that functions such as charging, discharging, and power consumption can be managed through the power management system.
  • the terminal device 200 also includes an audio circuit 270 , a microphone 271 and a speaker 272 , which can provide an audio interface between the user and the terminal device 200 .
  • the audio circuit 270 can be used to convert the audio data into a signal that can be recognized by the speaker 272, and transmit the signal to the speaker 272, and the speaker 272 converts the signal into a sound signal and outputs it.
• The microphone 271 is used to collect external sound signals (such as human speech or other sounds), convert the collected external sound signals into signals that can be recognized by the audio circuit 270, and send them to the audio circuit 270.
  • the audio circuit 270 can also be used to convert the signal sent by the microphone 271 into audio data, and then output the audio data to the RF circuit 210 to send to, for example, another terminal device, or output the audio data to the memory 240 for subsequent further processing.
• the collection of external sound signals by the microphone 271 may be triggered by the user clicking a voice input control (such as a smart assistant or voice assistant) on the display interface of the terminal device 200, or by the user waking up the device through a preset wake-up word; this application does not limit this.
  • the terminal device 200 may also include at least one sensor, camera, etc., which will not be described again here.
  • At least one sensor may include, but is not limited to, a pressure sensor, an air pressure sensor, an acceleration sensor, a distance sensor, a fingerprint sensor, a touch sensor, a temperature sensor, etc.
  • the operating system (OS) involved in the embodiment of this application is the most basic system software running on the terminal device 200.
• the operating system can be HarmonyOS, the Android system, or the iOS system.
  • the software system of the terminal device 200 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • This embodiment of the present application takes an operating system adopting a layered architecture as an example to illustrate the software structure of the terminal device 200 .
  • Figure 3 is a software structure block diagram of a terminal device provided by an embodiment of the present application.
  • the software structure of the terminal device can be a layered architecture.
  • the software can be divided into several layers, and each layer has clear roles and division of labor.
  • the layers communicate through software interfaces.
  • the operating system is divided into five layers, from top to bottom: application layer, application framework layer (framework, FWK), runtime and system libraries, kernel layer, and hardware layer.
  • the application layer can include a series of application packages. As shown in Figure 3, the application layer can include cameras, settings, skin modules, user interface (UI), third-party applications, etc. Among them, third-party applications can include WLAN, music, calls, Bluetooth, video, etc.
• the application layer can be used to implement the presentation of an editing interface, and the editing interface can be used for users to view or perform operations. For example, when the mobile phone includes a display panel 261, the user can open relevant interfaces of an instant messaging application from the main interface displayed by the display panel 261.
• the application can be developed using the Java language and is completed by calling the application programming interfaces (APIs) provided by the application framework layer. Through the application framework layer, developers can interact with the bottom layers of the system (such as the hardware layer and kernel layer) to develop their own applications.
  • the application framework layer is mainly a series of services and management systems of the operating system.
  • the application framework layer provides application programming interfaces and programming frameworks for applications in the application layer.
  • the application framework layer includes some predefined functions. As shown in Figure 3, the application framework layer can include a shortcut icon management module, a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, etc.
  • the shortcut icon management module is used to manage shortcut icons displayed on terminal devices, such as creating shortcut icons, removing shortcut icons, monitoring whether shortcut icons meet display conditions, etc.
  • a window manager is used to manage window programs.
  • the window manager can obtain the display size, determine whether there is a status bar, lock the screen, capture the screen, etc.
• Content providers are used to store and retrieve data and make this data accessible to applications. The data can include videos, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
  • the view system includes visual controls, such as controls that display text, controls that display pictures, etc.
  • a view system can be used to build applications.
  • the display interface can be composed of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide communication functions of the terminal device. For example, call status management (including connected, hung up, etc.).
  • the resource manager provides various resources to applications, such as localized strings, icons, pictures, layout files, video files, etc.
  • the notification manager allows applications to display notification information in the status bar, which can be used to convey notification-type messages and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify download completion, message reminders, etc.
• the notification manager can also manage notifications that appear in the status bar at the top of the system in the form of charts or scroll-bar text (such as notifications for applications running in the background), or notifications that appear on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is emitted, the terminal device vibrates, or the indicator light flashes.
  • the application framework layer is mainly responsible for calling the service interface that communicates with the hardware layer to pass the user's operation request to the hardware layer.
• the operation request may include operation requests controlled by the user through voice instructions, such as requests to open or log in to an APP.
  • the runtime includes core libraries and virtual machines.
  • the runtime is responsible for the scheduling and management of the operating system.
  • the core library contains two parts: one part is the functional functions that need to be called by the Java language, and the other part is the core library of the operating system.
  • the application layer and application framework layer run in virtual machines.
• the virtual machine executes the Java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform object life cycle management, stack management, thread management, security and exception management, and garbage collection and other functions.
• System libraries can include multiple functional modules, for example: a surface manager, media libraries, 3D graphics processing libraries (for example, OpenGL ES), and 2D graphics engines (for example, SGL).
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as static image files, etc.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, composition, and layer processing.
  • 2D Graphics Engine is a drawing engine for 2D drawing.
  • a three-dimensional graphics processing library can be used to draw three-dimensional motion trajectory images
  • a 2D graphics engine can be used to draw two-dimensional motion trajectory images.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
  • the hardware layer can include various types of sensors, such as acceleration sensors, gyroscope sensors, touch sensors, etc.
  • the terminal device 200 can run multiple applications at the same time.
  • one application can correspond to one process; for more complex ones, one application can correspond to multiple processes.
  • Each process has a process number (process ID).
• "At least one" refers to one or more, and "multiple" refers to two or more.
  • “And/or” describes the relationship between associated objects, indicating that there can be three relationships, for example, A and/or B, which can mean: A alone exists, A and B exist simultaneously, and B exists alone, where A and B can be singular or plural.
  • the character “/” generally indicates that the related objects are in an “or” relationship.
  • “At least one (item) of the following” or similar expressions thereof refers to any combination of these items, including any combination of single item (items) or plural items (items).
• "At least one of a, b, or c" can mean: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c can be singular or plural.
• The term "plurality" in the embodiments of this application refers to two or more.
• The terms "terminal device" and "terminal equipment" in the embodiments of this application can be used interchangeably; both refer to various devices that can be used to implement the embodiments of this application.
• The terms "application" and "application program" in the embodiments of this application can also be used interchangeably; both refer to programs or clients with certain service provision capabilities. That is to say, "application" and "client" can also be used interchangeably.
  • video clients and instant messaging clients can also be called video applications or instant messaging applications.
• the hardware structure of the terminal device can be as shown in Figure 2, and the software architecture can be as shown in Figure 3, where the software programs and/or modules corresponding to the software architecture in the terminal device can be stored in the memory 240, and the processor 230 can run the software programs and applications stored in the memory 240 to execute the process of the multi-device voice control method provided by embodiments of the present application.
  • the embodiments of this application are suitable for application scenarios that require the cooperation of multiple devices for business processing.
  • the application scenarios applicable to the embodiments of this application are described through the following examples. It can be understood that the implementation of this application is not limited to the following application scenarios.
  • FIG. 4 is an application scenario diagram of a multi-device voice control method provided by an embodiment of the present application.
• As shown in Figure 4, the mobile phone has logged in to a WeChat account (that is, the mobile phone is a logged-in device), while the tablet computer (which can also be referred to as "tablet") has not logged in to the WeChat account (that is, the tablet is an unlogged device).
  • the mobile phone and tablet can first obtain the user's voice instructions at the same time or almost at the same time, as shown in Figure 4 "Log in to the WeChat account on the tablet".
• the mobile phone and tablet can respectively upload voice request information containing the voice instruction to the first server device (such as a voice service server) for processing, where the voice request information can also include but is not limited to: timestamp information, voiceprint information, and terminal device status information.
• the voice service server can analyze the voice request information uploaded by each terminal device, for example, determining whether the voice request information uploaded by the mobile phone and the tablet is related based on the voice instructions they uploaded. For example, if the analysis shows that the voice instructions uploaded by the mobile phone and the tablet are the same, or their similarity is greater than a specified threshold, or they correspond to each other, it is determined that the voice request information uploaded by the mobile phone and the tablet is related; in addition, the timestamp information and voiceprint information can be combined for a further, more accurate judgment.
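The relatedness check described above can be sketched as follows. The concrete thresholds, the string-similarity measure, and the field names are assumptions for illustration; the source only requires that similarity exceed a specified threshold and that timestamp and voiceprint information may be combined.

```python
# Illustrative correlation check: two voice requests are treated as
# related when their transcribed commands are similar enough AND their
# timestamps and voiceprints agree. Thresholds are assumed values.

from difflib import SequenceMatcher

SIM_THRESHOLD = 0.8    # assumed similarity threshold
MAX_TIME_DELTA = 2.0   # assumed max timestamp difference, in seconds

def related(req_a, req_b):
    sim = SequenceMatcher(None, req_a["command"], req_b["command"]).ratio()
    same_time = abs(req_a["timestamp"] - req_b["timestamp"]) <= MAX_TIME_DELTA
    same_speaker = req_a["voiceprint"] == req_b["voiceprint"]
    return sim >= SIM_THRESHOLD and same_time and same_speaker

a = {"command": "log in to WeChat on the tablet", "timestamp": 10.0, "voiceprint": "vp1"}
b = {"command": "log in to WeChat on the tablet", "timestamp": 10.5, "voiceprint": "vp1"}
c = {"command": "turn off the lights", "timestamp": 10.2, "voiceprint": "vp1"}
print(related(a, b))  # True
print(related(a, c))  # False
```

Combining the three signals reduces false associations: two unrelated devices that happen to hear different commands at the same moment are not grouped together.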
• the voice service server can then control the mobile phone and the tablet based on semantic analysis of the voice instructions uploaded by the mobile phone and the tablet.
• the voice service server may respectively generate a first control instruction corresponding to the mobile phone and a second control instruction corresponding to the tablet; the first control instruction may instruct the mobile phone to initiate an authorization request to the account server, and the second control instruction may instruct the tablet to initiate a login request to the second server device (such as an account server).
• After the account server receives the login request sent by the tablet and the authorization request sent by the mobile phone, it can respond to the tablet's login request according to the mobile phone's authorization, so as to log in to WeChat on the tablet.
• Optionally, the voice service server can also generate a first control instruction corresponding to the mobile phone, where the first control instruction can include the device identification of the tablet and instruct the mobile phone to send authorization information containing the user account and key to the tablet based on the tablet's device identification, so that, after receiving the authorization information sent by the mobile phone, the tablet can log in directly based on the user account and key.
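The alternative flow above, in which the phone forwards authorization information directly to the tablet, can be sketched as follows. All class and field names (`Phone`, `Tablet`, `target_device_id`, the sample account and key) are illustrative assumptions, not part of the patented design; a real system would transmit the authorization information over an encrypted channel.

```python
# Hypothetical sketch: the phone executes a control instruction that
# carries the tablet's device identification and forwards the user
# account and key, so the tablet can log in directly.

class Tablet:
    def __init__(self, device_id):
        self.device_id = device_id
        self.logged_in_account = None

    def receive_authorization(self, auth_info):
        # log in directly with the forwarded account and key
        self.logged_in_account = auth_info["account"]
        return True

class Phone:
    def __init__(self, account, key):
        self.account, self.key = account, key

    def execute_control_instruction(self, instruction, devices):
        # the control instruction names the target device by identification
        target = devices[instruction["target_device_id"]]
        return target.receive_authorization(
            {"account": self.account, "key": self.key})

tablet = Tablet("tablet-01")
phone = Phone("user@example.com", "secret-key")
instruction = {"action": "authorize", "target_device_id": "tablet-01"}
phone.execute_control_instruction(instruction, {"tablet-01": tablet})
print(tablet.logged_in_account)
```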
  • scenarios such as screen projection can also be implemented, such as projecting the display screen of a mobile phone onto a TV.
  • realizing this scenario usually not only requires the mobile phone and the TV to be connected to the same local area network, but also requires the user to manually operate the screen projection control on the mobile phone to project the display interface of the mobile phone to the TV.
  • the mobile phone and the TV do not need to be connected to the same local area network or in a connected state.
• the user can control the mobile phone and the TV at the same time through voice instructions, so that different controls of the mobile phone and the TV can be realized through the same voice instruction, thereby displaying the display interface of the mobile phone on the TV.
  • FIG. 5 is an application scenario diagram of a multi-device voice control method provided by an embodiment of the present application.
  • the mobile phone and the TV can first obtain the user's voice command at the same time.
• For example, the user's voice command is "Project the mobile phone screen to the TV": the mobile phone, as shown in (a) of Figure 5, receives the user's voice command "project the mobile phone screen to the TV", and the TV, as shown in (b) of Figure 5, also obtains the same voice command.
• the mobile phone and the TV can respectively upload voice request information containing the voice command to the server device (such as a voice service server) for processing; the voice request information can also include but is not limited to: timestamp information and voiceprint information.
• the voice service server can analyze the voice request information uploaded by each terminal device, for example, determining whether the voice request information uploaded by the mobile phone and the TV is related based on the voice commands they uploaded. For example, if the analysis shows that the voice commands uploaded by the mobile phone and the TV are the same, or their similarity is greater than a specified threshold, or they correspond to each other, it is determined that the voice request information uploaded by the mobile phone and the TV is related; in addition, the timestamp information and voiceprint information can be combined for a further, more accurate judgment.
  • the voice service server can control the mobile phone and TV based on the semantic analysis of the voice instructions uploaded by the mobile phone and TV.
  • the voice service server can generate a control instruction corresponding to the mobile phone, such as sending a screen casting instruction to the mobile phone; where the screen casting instruction can include but is not limited to: the device identification of the TV (such as an access address).
• After the mobile phone receives the screen casting instruction sent by the voice service server, it can connect to the TV according to the TV's access address, without the mobile phone and the TV needing to be connected to the same local area network.
• Based on the user's voice instructions, this application can implement authentication related to the mobile phone and the TV, and can further enable the content corresponding to the mobile phone's display interface to be projected onto the TV for display.
• the voice service server can also generate a control instruction corresponding to the TV, such as sending an instruction to accept the screencast to the TV, where this instruction can include but is not limited to: the device identification of the mobile phone, and can instruct the TV, after connecting to the mobile phone, to request the mobile phone to send the data content of the displayed page to the TV.
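The pair of control instructions in this screen-projection scenario can be sketched as plain payloads. The field names (`access_address`, `start_cast`, `accept_cast`) are assumptions based on the text, which mentions only that the TV's device identification may be an access address and that the phone's identification is sent to the TV.

```python
# Sketch of the two control instructions generated by the voice service
# server in the screen-projection scenario. Field names are assumed.

def make_cast_instructions(phone_id, tv_id, tv_address):
    # instruction for the phone: connect to the TV via its access address
    to_phone = {"type": "start_cast", "target": tv_id,
                "access_address": tv_address}
    # instruction for the TV: accept the cast from this phone and, once
    # connected, ask it to send the displayed page's data content
    to_tv = {"type": "accept_cast", "source": phone_id}
    return to_phone, to_tv

to_phone, to_tv = make_cast_instructions("phone-01", "tv-01",
                                         "192.168.1.20:8009")
print(to_phone)
print(to_tv)
```

Because the phone learns the TV's access address from the server rather than from LAN discovery, the two devices need not share a local area network, matching the scenario above.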
• It should be noted that the voice service server is not limited to controlling the two terminal devices separately after determining that their voice request information is related. In actual implementation, control instructions corresponding to both terminal devices, or to either one of them, can be generated based on the semantic analysis results of the voice instructions, so as to realize the user intent corresponding to the user's voice instruction.
• the implementation of this application does not limit the number or sending method of the first control instruction sent by the server device to the first terminal device or the second control instruction sent to the second terminal device.
• the control instruction can be sent once or multiple times. If multiple control instructions are required to realize the user intent corresponding to the user's voice instruction, the voice service server can send multiple control instructions. For example, when the voice service server sends a control instruction for the first time, the terminal device may fail to receive it correctly, and the voice service server can wait a preset time and then send it again. For another example, if the user intent requires the voice service server to perform periodic control, the voice service server can send a control instruction to the terminal device each time the period arrives.
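The retry behavior described above can be sketched as follows. The retry count and the simulated transport are assumptions for illustration; the source specifies only that the server may wait a preset time and resend when the first delivery fails.

```python
# Illustrative sketch of resending a control instruction: retry after a
# failed delivery, up to an assumed maximum number of attempts.

def send_with_retry(send, instruction, max_attempts=3):
    """Try to deliver a control instruction up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        if send(instruction):
            return attempt          # number of attempts actually used
        # a real server would wait a preset time here before resending
    return None                     # delivery never succeeded

attempts = {"n": 0}

def flaky_send(instruction):
    # simulate a terminal device that misses the first delivery
    attempts["n"] += 1
    return attempts["n"] >= 2

result = send_with_retry(flaky_send, {"type": "cast"})
print(result)  # 2: the first send failed, the second succeeded
```

Periodic control, the second example in the text, would instead call `send` each time the period elapses rather than on failure.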
• In short, the design idea of the method provided by the embodiments of the present application is that multiple neighboring devices can receive the same (or almost the same, or corresponding) voice command, each neighboring device uploads it to the server device, and the server device can then generate different control instructions for the different neighboring devices.
  • the interactive process of the method provided by this application is described in detail below.
  • FIG. 6 is a schematic interactive flow chart of a multi-device voice control method provided by an embodiment of the present application.
  • the first terminal device and the second terminal device are used as examples.
  • the type and number of terminal devices are not limited; during specific implementation, more terminal devices may be included to participate in voice processing.
  • the interaction process of each terminal device can refer to the implementation process of the first terminal device or the second terminal device.
  • the interactive process includes:
  • Step 601a The first terminal device receives the input of the first voice command.
  • Step 601b The second terminal device receives the input of the second voice command.
• the first voice command and the second voice command can be obtained from the same user voice command, received by the first terminal device and the second terminal device respectively, so that the server device can recognize that the voice request information uploaded respectively by the first terminal device and the second terminal device is related.
• For example, the user voice command corresponding to the first voice command and the second voice command can be "Log in to the WeChat account on the tablet", in which case the first terminal device can be a mobile phone and the second terminal device can be a tablet computer.
• Alternatively, the user voice command corresponding to the first voice command and the second voice command may be "project the mobile phone screen to the TV", in which case the first terminal device may be a mobile phone and the second terminal device may be a TV.
• Before step 601a and step 601b, the first terminal device or the second terminal device may have been awakened through a wake-up word, or through a designated control or designated gesture on the terminal device, etc.; the embodiments of the present application do not limit the wake-up process of the terminal device.
  • step 601a and step 601b can be performed simultaneously; for example, the user can first wake up the first terminal device and the second terminal device, so that the first terminal device and the second terminal device can receive the input of the user's voice command at the same time.
  • the first voice command and the second voice command are a voice command input by the user on two different terminal devices, that is, the first voice command and the second voice command originate from the same user voice command.
  • step 601a can also be executed before step 601b, or step 601b before step 601a; it should be noted that an implementation can also limit the time difference between step 601a and step 601b to be less than a specified time threshold, and the voice instructions received by the first terminal device and the second terminal device respectively are the same (or almost the same); for example, the user can first wake up the first terminal device and output the first voice command "project the mobile phone screen to the TV", so that the first terminal device receives the user's voice command; the user then wakes up the second terminal device and likewise outputs "project the mobile phone screen to the TV".
  • in this way, the first server device may determine that the first terminal device and the second terminal device have an association relationship, that is, they received the same user voice command. It can be understood that executing step 601a and step 601b at the same time can yield more accurate voice control over multiple devices; if executed step by step, the inputs still come from the same user but arrive as voice instructions with the same content at different times. If the time difference is large, the first server device can set a lower similarity threshold when judging relevance.
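As a rough sketch of the time-difference check just described (the threshold value and function name are illustrative assumptions, not taken from the embodiments), the association test for step-by-step input might look like:

```python
# Hypothetical sketch: decide whether two voice inputs received at different
# times may still belong to the same user voice command. The 5-second gap is
# an assumed value; the embodiments only require "less than a specified
# time threshold".
def may_be_same_instruction(ts_a: float, ts_b: float,
                            max_gap_s: float = 5.0) -> bool:
    """Return True if the two receipt times fall within the allowed gap."""
    return abs(ts_a - ts_b) <= max_gap_s
```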
  • Step 602a The first terminal device uploads first voice request information to the first server device, where the first voice request information includes the first voice command.
  • Step 602b The second terminal device uploads second voice request information to the first server device, where the second voice request information includes the second voice command.
  • the first terminal device and the second terminal device may also carry other information in the voice request information (first voice request information or second voice request information) according to the actual application scenario.
  • the voice request information may also include but is not limited to: timestamp information, voiceprint information, and terminal device status information.
  • the timestamp information can be used to identify the time when the terminal device received the voice command, which can help the first server device determine, in combination with the timestamp information, whether the first terminal device and the second terminal device received the same voice command in the same application scenario.
  • the voiceprint information can be used to identify user identity information, which can facilitate the first server device to combine the voiceprint information to determine whether the voice commands received by the first terminal device and the second terminal device come from the same person.
  • the terminal device status information can be used, but is not limited to being used, to identify the account login status on the terminal device, for example, identifying that the WeChat account is logged in on the first terminal device and not logged in on the second terminal device. This can help the first server device determine the role of each terminal device in combination with the terminal device status information (for example, the mobile phone shown in Figure 4 can be considered the "source device" role and the tablet the "target device" role; for another example, the mobile phone shown in Figure 5 can be considered the "initiator" role and the TV the "receiver" role), and then generate different control instructions according to the role of each terminal device.
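The fields listed above can be pictured as a single upload payload. The structure below is a minimal sketch under assumed field names; the actual message format is not specified by the embodiments:

```python
from dataclasses import dataclass, field

# Hypothetical payload a terminal device might upload as "voice request
# information"; all field names are illustrative assumptions.
@dataclass
class VoiceRequest:
    device_id: str       # e.g. an access address identifying the device
    voice_command: str   # the received voice instruction (or its transcript)
    timestamp: float     # time at which the voice command was received
    voiceprint: bytes    # features identifying the speaker
    device_status: dict = field(default_factory=dict)  # e.g. account login state

req = VoiceRequest(
    device_id="phone-01",
    voice_command="Log in to the WeChat account on the tablet",
    timestamp=1679040000.0,
    voiceprint=b"\x01\x02\x03",
    device_status={"wechat_logged_in": True},
)
```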
  • the first server device receives the voice request information of the first terminal device (mobile phone), the second terminal device (tablet computer), and the TV respectively, where Table 1a shows the comparison of voice request information from mobile phones and tablets by the voice service server.
  • Table 1b shows the comparison of voice request information from mobile phones and TVs by the voice service server, as follows:
  • if the voice service server compares the voice request information from the mobile phone and the tablet computer and determines that the voice instructions, timestamp information, and voiceprint information are the same, or that their similarity is greater than the specified thresholds, it can determine that the voice request information of the mobile phone and the tablet computer is related. The greater the similarity, the greater the probability that the two pieces of voice request information are related; being identical means they are related. Furthermore, the voice service server may further analyze the semantics of the voice instruction and generate a first control instruction corresponding to the mobile phone and/or a second control instruction corresponding to the tablet computer based on the analysis results.
  • determining whether the voice command, timestamp information, and voiceprint information are the same or whether their similarity is greater than a specified threshold can be implemented by separately determining: whether the first voice command and the second voice command are the same or their similarity is greater than a first specified threshold; whether the timestamp information uploaded by the first terminal device and that uploaded by the second terminal device are the same or their similarity is greater than a second specified threshold; and whether the similarity of the voiceprint information uploaded by the first terminal device and that uploaded by the second terminal device is greater than a third specified threshold.
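The three per-field checks can be combined as below. This is a minimal sketch: the similarity scores are assumed to be precomputed elsewhere (for example by the time-domain/frequency-domain or voiceprint comparisons mentioned next), and the numeric thresholds are placeholders:

```python
# Hypothetical combination of the first, second, and third specified
# thresholds; the numeric defaults are placeholders, not from the source.
def requests_related(sim_command: float, sim_timestamp: float,
                     sim_voiceprint: float,
                     t_first: float = 0.8,        # first specified threshold
                     t_second: float = 0.9,       # second specified threshold
                     t_third: float = 0.7) -> bool:  # third specified threshold
    return (sim_command >= t_first
            and sim_timestamp >= t_second
            and sim_voiceprint >= t_third)
```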
  • determining the similarity of the voice command can be implemented, for example, by determining the time domain parameters, frequency domain parameters, etc.
  • the method can be implemented by extracting voiceprint features based on artificial intelligence technology and then comparing the similarity of the voiceprint features.
  • in some implementations, the user's voice commands may not be exactly the same, but may be corresponding. For example, the voice command initiated by the user to the mobile phone can be "cast the mobile phone screen to the TV", while the voice command initiated to the TV can be "display the mobile phone screen on the TV". Although the textual similarity is not high, both correspond to the same user intention, so the first voice command received by the mobile phone corresponds to the second voice command received by the TV. The voice service server can determine, based on "cast the mobile phone screen to the TV" initiated to the mobile phone, that the mobile phone has the "initiator" role, the TV has the "receiver" role, and the intention is screen casting; based on "display the mobile phone screen on the TV" initiated to the TV, it can likewise determine that the mobile phone is the "initiator", the TV is the "receiver", and the intention is screen casting, so the two voice commands are corresponding.
  • if the voice service server compares the voice request information from the mobile phone and the TV and finds that the similarity of the voice instructions is not greater than the specified threshold, or the similarity of the timestamp information is not greater than the specified threshold, or the similarity of the voiceprint information is not greater than the specified threshold, it can determine that the voice request information of the mobile phone and the TV is not related.
  • the first server device can usually receive multiple pieces of voice request information.
  • in this case, the first server device can judge the received voice request information one by one based on the information it contains. For example, the first server device may first judge the similarity of the timestamp information between terminal devices: if it is greater than the specified threshold, it can continue to judge the voiceprint information and other information; otherwise the voice request information can be determined to be irrelevant. If it is finally determined that the similarity of the voice commands between the terminal devices is also greater than the specified threshold, the voice request information uploaded by the terminal devices can be determined to be related. It should be noted that when this application is implemented, the order in which the first server device performs these judgments is not limited. In this way, by filtering or excluding some information in the voice request information, the processing efficiency of determining whether the voice request information uploaded by terminal devices is related can be improved.
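The staged filtering described above, cheap checks first with early exit, could be sketched as follows; the particular field order is one possible choice, since the embodiments do not fix the judgment order:

```python
# Hypothetical staged relevance check: compare timestamps first, then
# voiceprints, then voice commands, discarding a pair as soon as any
# similarity falls below its threshold.
def related_staged(similarities: dict, thresholds: dict) -> bool:
    for field_name in ("timestamp", "voiceprint", "command"):
        if similarities[field_name] < thresholds[field_name]:
            return False  # early exit: later, costlier fields are skipped
    return True

thresholds = {"timestamp": 0.9, "voiceprint": 0.7, "command": 0.8}
```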
  • Step 603 If it is determined, according to the first voice command and the second voice command, that the first voice request information and the second voice request information are related, the first server device performs at least one of the following steps 604a and 604b.
  • that is, after receiving the voice instructions from each terminal device, the first server device performs, on the one hand, a similarity comparison as shown in Table 1a and Table 1b to determine the related voice request information, that is, the multiple terminal devices that received the same voice command in the same application scenario; on the other hand, it can perform semantic analysis, slot filling, and other processing, which can be implemented by first recognizing the voice command of each terminal device as text and then performing user intention understanding or slot analysis on the obtained text, so as to determine the role and intention of each terminal device in this application scenario.
  • for example, the first server device can determine, based on the first voice request information from the mobile phone and the second voice request information from the tablet, that multiple terminal devices received the same voice command in the same application scenario; on this basis, the first server device can continue to analyze the semantics of the voice command "Log in to the WeChat account on the tablet" to determine that, in this application scenario, the mobile phone has the "source device" role, the tablet has the "target device" role, and the intention is "log in to the WeChat account".
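The intention and role analysis of the recognized text could, in toy form, be rule-based as below. A real implementation would use NLU models for intent understanding and slot filling; the keyword rules and role names here are purely illustrative assumptions covering only the WeChat-login example:

```python
# Toy sketch of semantic analysis: map recognized text to an intent and
# device roles. Only the tablet-login example from the text is handled.
def analyze(text: str) -> dict:
    t = text.lower()
    result = {"intent": None, "roles": {}}
    if "log in" in t and "tablet" in t:
        result["intent"] = "account_login"
        result["roles"] = {"phone": "source device", "tablet": "target device"}
    return result
```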
  • the first server device may be, for example, a voice service server, which is mainly used to process the voice request information uploaded by each terminal device.
  • the first server device may generate a first control instruction for the first terminal device and/or a second control instruction corresponding to the second terminal device according to the determined roles and intentions of each terminal device when relevant.
  • Step 604a The first server device generates and sends the first control instruction to the first terminal device, where the first control instruction is used to perform operations related to the first voice instruction; the operations related to the first voice instruction are used for collaborative processing of services with the second terminal device.
  • the operation related to the first voice instruction may be determined by the speech analysis result of the first voice instruction by the first server device.
  • the first server device can also obtain the terminal device status information on each terminal device (that is, the login status of the WeChat account) from the voice request information of each terminal device.
  • the related operation of the first voice command may be implemented by generating a first control instruction that instructs the mobile phone to initiate an authorization request to the account server, and a second control instruction that instructs the tablet to initiate a login request to the account server.
  • Step 604b The first server device generates and sends the second control instruction to the second terminal device, where the second control instruction is used to perform operations related to the second voice instruction; the operations related to the second voice instruction are used for collaborative processing of services with the first terminal device.
  • the related operations of the second voice command may be determined by the voice analysis result of the second voice command by the first server device.
  • in some implementations, the first server device can send a first control instruction to the mobile phone according to the semantic analysis result of the voice instruction, where the first control instruction can include a device identifier of the tablet, such as an access address (for example, a MAC address or IP address). The related operation may then be to instruct the mobile phone to access the tablet according to the tablet's access address.
  • the first server device can control each terminal device to collaboratively implement business processing, thereby reducing the complexity of the user's operation.
  • in step 603, it may be determined that step 604a, step 604b, or both step 604a and step 604b need to be performed.
  • in some implementations, a first identification code that uniquely identifies the first terminal device and the second terminal device, such as a voice fingerprint or a universally unique identifier (UUID), can also be generated.
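A minimal sketch of minting such a code with Python's standard `uuid` module, and handing the same code to both devices, might look like this (the payload shape is an assumption):

```python
import uuid

# The server generates one identification code per associated pair of
# devices and includes it in both control instructions, so that later
# requests to another server can be matched against each other.
def make_first_identification_code() -> str:
    return str(uuid.uuid4())

code = make_first_identification_code()
first_control_instruction = {"target": "first terminal", "first_id": code}
second_control_instruction = {"target": "second terminal", "first_id": code}
```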
  • Step 701a The first server device obtains the first voice command from the first terminal device.
  • Step 701b The first server device obtains the second voice command from the second terminal device.
  • in step 701a, the first voice command may be obtained from the first voice request information received in the above step 602a, and in step 701b, the second voice command may be obtained from the second voice request information received in the above step 602b.
  • Step 702 The first server device determines whether the first terminal device and the second terminal device are related. This can be implemented by determining whether the similarity of the information identifying the same voice input (such as the voice commands, timestamp information, voiceprint information, etc.) between the first terminal device and the second terminal device is greater than the respective specified thresholds; if it is determined that they are related, step 703 is performed.
  • Step 703 The first server device generates a first identification code, where the first identification code is used to identify that the first terminal device and the second terminal device are related.
  • the first server device may carry the first identification code through the first control instruction and the second control instruction. Furthermore, when the first terminal device and the second terminal device interact with another server respectively, they may carry the first identification code.
  • the first terminal device and the second terminal device interacting with another server respectively can be implemented as follows: the first terminal device sends a first request instruction to the second server device (such as an account server) according to the first control instruction, where the first request instruction is used to request authorization of a login to a designated platform (for example, it can be an authorization request); the second terminal device sends a second request instruction to the second server device according to the second control instruction, where the second request instruction is used to request to log in to the designated platform (for example, it can be a login request).
  • for example, the mobile phone can carry the first identification code when sending an authorization request to the account server according to the first control instruction; similarly, the tablet can carry the first identification code when sending a login request to the account server according to the second control instruction.
  • in this way, the account server can, based on the first identification codes received from the mobile phone and the tablet, perform at least one of the following: sending a first response instruction to the first terminal device, and sending a second response instruction to the second terminal device, where the second response instruction is used to indicate that the first terminal device has authorized the second terminal device to log in to the designated platform. For example, the account server sends the second response instruction to the tablet after determining that the mobile phone's authorization request matches the tablet's login request, thereby responding to the tablet's login request.
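On the account-server side, the matching of the two requests by identification code could be sketched as below (the in-memory state and function names are assumptions; a real server would also verify identities and expire entries):

```python
# Hypothetical account-server matching: a login request is granted only
# when an authorization request carrying the same first identification
# code has already been received.
pending_authorizations = {}  # first_id -> authorizing device

def on_authorization_request(first_id: str, device: str) -> None:
    pending_authorizations[first_id] = device

def on_login_request(first_id: str, device: str) -> bool:
    # True means a second response instruction authorizing the login is sent.
    return first_id in pending_authorizations
```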
  • in other implementations, the first server device may also instruct the first terminal device to send the user account information of the designated platform to the second terminal device, without interaction with the second server device. This may be implemented such that the processing performed by the first server device after step 603 is only to send the first control instruction to the first terminal device.
  • the first server device may carry the device identifier of the second terminal device in the first control instruction, so that the first terminal device can send the user account information of the designated platform to the second terminal device according to the device identifier, where the device identifier may be the access address (such as a MAC address or IP address) of the second terminal device.
  • similarly, after the first server device receives the voice request information from the mobile phone and the TV respectively, it may send only the first control instruction to the mobile phone without sending a second control instruction to the TV, where the first control instruction sent to the mobile phone in this application scenario may indicate the access address of the TV.
  • FIG. 8a is another schematic diagram of the interaction flow of a multi-device voice control method provided by an embodiment of the present application.
  • the interaction process between the first terminal device (mobile phone), the second terminal device (tablet) and the server device (voice service server and account server) includes:
  • Step 801a The user inputs the first voice command to the mobile phone through voice.
  • the first voice command may be "Log in to the WeChat account on the tablet” as shown in Figure 8a.
  • Step 801b The user inputs the second voice command to the tablet through voice.
  • Step 802a The mobile phone sends the first voice request information to the voice service server.
  • the first voice request information may include but is not limited to: the first voice command, timestamp information, voiceprint information, and terminal device status information (such as the login status of the WeChat account on the mobile phone; it can be understood that if another app's account needs to be logged in, it can be the login status of that app's account on the mobile phone).
  • Step 802b The tablet sends the second voice request information to the voice service server.
  • the second voice request information may include but is not limited to: the second voice command, timestamp information, voiceprint information, and terminal device status information (such as the login status of the WeChat account on the tablet).
  • the voice service server may determine the related voice request information based on each piece of voice request information, that is, determine that the voice request information of the mobile phone and the tablet is related. In this case, the voice service server can generate a control instruction corresponding to each terminal device, such as a first control instruction corresponding to the mobile phone and/or a second control instruction corresponding to the tablet. In addition, the voice service server can also generate a unique identifier for the multiple terminal devices in this application scenario.
  • Step 803a The voice service server sends the first control instruction to the mobile phone.
  • the first control instruction can be used to instruct the mobile phone to send an authorization request (first request instruction) to the (third-party) account server, that is, to instruct the mobile phone to request the account server that manages WeChat account-related data to authorize the tablet's login.
  • Step 803b The voice service server sends the second control instruction to the tablet.
  • the second control instruction can be used to instruct the tablet to send a login request (second request instruction) to the (third-party) account server.
  • Step 804a The mobile phone sends an authorization request to the (third-party) account server according to the first control instruction.
  • Step 804b The tablet sends a login request to the (third-party) account server according to the second control instruction.
  • Step 805 The (third-party) account server authorizes the tablet's login based on the authorization request and the login request (second response instruction).
  • when the mobile phone sends the authorization request, it can carry the first identification code indicated by the voice service server; when the tablet sends the login request, it can also carry the first identification code indicated by the voice service server. If the two first identification codes are the same, the account server can match the mobile phone's current authorization request with the tablet's current login request based on the first identification code.
  • FIG. 8b is a schematic diagram of another interaction flow of a multi-device voice control method provided by an embodiment of the present application. Still in the application scenario shown in Figure 4, it shows the interaction process between the first terminal device (mobile phone), the second terminal device (tablet), and the server devices (voice service server and account server); steps 801a to 802b are the same as those shown in Figure 8a and will not be repeated here.
  • the different interaction processes at least include:
  • Step 803 The voice service server sends a first control instruction to the mobile phone, where the first control instruction includes the tablet identification (device identification of the second terminal device).
  • Step 804 The mobile phone sends the user account information to the tablet according to the tablet identification.
  • the tablet can log in to the account based on the user account information sent by the mobile phone.
  • if the mobile phone determines, based on the tablet identification, that it is in a connected state with the tablet, the mobile phone can directly send the user account information over the corresponding connection channel, where the connected state can be achieved in, but is not limited to, one of the following ways: a Bluetooth connection or a Wi-Fi direct connection.
  • if the mobile phone and the tablet are not connected, the mobile phone can send the user account information to the tablet according to the tablet's access address indicated by the tablet identification.
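The channel choice described in the last two paragraphs, reuse an existing connection where available and otherwise fall back to the access address, could be sketched as follows (the channel names are illustrative assumptions):

```python
# Hypothetical selection of the delivery channel for the user account
# information: prefer an already-established connection (Bluetooth or
# Wi-Fi direct), otherwise address the tablet directly by the access
# address (e.g. MAC or IP) carried in the first control instruction.
def pick_channel(connected_channels: list, access_address: str) -> str:
    for preferred in ("bluetooth", "wifi_direct"):
        if preferred in connected_channels:
            return preferred
    return "address:" + access_address
```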
  • it can be seen that when this application is implemented, the multiple devices that need to cooperate in a business processing scenario can be identified based on the user's voice instructions, the multiple terminal devices included in the scenario can be associated with one another, and corresponding control instructions can be generated for each terminal device, thus enabling convenient operation based on the user's voice.
  • compared with the related art, in which the user needs to use the first terminal device to scan in order to authorize login on the second terminal device, this can reduce the complexity of the user's operation.
  • FIG. 9 is another schematic diagram of an interaction flow of a multi-device voice control method provided by an embodiment of the present application.
  • the interaction process between the first terminal device (mobile phone), the second terminal device (TV) and the server device (voice service server) includes:
  • Step 901a The user inputs the first voice command to the mobile phone through voice.
  • Step 901b The user inputs the second voice command to the TV through voice.
  • the first voice command and the second voice command may be "project the screen to the TV" as shown in FIG. 9 .
  • Step 902a The mobile phone sends the first voice request information to the voice service server.
  • the first voice request information may include but is not limited to: the first voice instruction, timestamp information, and voiceprint information. It can be understood that in this scenario, terminal device status information is not required, and the voice request information can be set according to the specific scenario during specific implementation.
  • Step 902b The television sends the second voice request information to the voice service server.
  • the second voice request information may include but is not limited to: the second voice instruction, timestamp information, and voiceprint information.
  • the voice service server can determine that the current voice request information of the mobile phone and the TV are related based on the current voice request information of the mobile phone and the current voice request information of the TV. Therefore, the voice service server can generate a first control instruction indicating the access address of the TV to the mobile phone by analyzing the first voice request information and the second voice request information.
  • Step 903 The voice service server sends the first control instruction to the mobile phone.
  • the first control instruction may include but is not limited to: a device identifier of the television (such as an access address).
  • Step 904 The mobile phone connects to the TV according to the first control instruction.
  • compared with the related art, in which the first terminal device and the second terminal device need to be connected to the same local area network and the user manually performs screen projection on the first terminal device, this can reduce the complexity of the user's operation; moreover, with this method the first terminal device and the second terminal device may or may not be in a connected state beforehand.
  • the first server device can determine whether the first terminal device and the second terminal device are related based on the first voice command uploaded by the first terminal device and the second voice command uploaded by the second terminal device, thereby enabling collaborative business processing across multiple devices.
  • based on the above embodiments, this application also provides a terminal device, which includes multiple functional modules; the multiple functional modules interact to implement the functions performed by the first terminal device or the second terminal device in the methods described in the embodiments of this application. For example, step 601a performed by the first terminal device in the embodiment shown in FIG. 6 is performed, or step 601b performed by the second terminal device in the embodiment shown in FIG. 6 is performed.
  • the multiple functional modules can be implemented based on software, hardware, or a combination of software and hardware, and the multiple functional modules can be arbitrarily combined or divided based on specific implementation.
  • this application also provides a terminal device.
  • the terminal device includes at least one processor and at least one memory, and computer program instructions are stored in the at least one memory. When the terminal device runs, the at least one processor executes the functions performed by the terminal device in each method described in the embodiments of this application. For example, step 601a performed by the first terminal device in the embodiment shown in FIG. 6 is performed, or step 601b performed by the second terminal device in the embodiment shown in FIG. 6 is performed.
  • this application also provides a server device.
  • the server device includes multiple functional modules; the multiple functional modules interact to implement the functions performed by the first server device in the methods described in the embodiments of this application.
  • the multiple functional modules can be implemented based on software, hardware, or a combination of software and hardware, and the multiple functional modules can be arbitrarily combined or divided based on specific implementation.
  • this application also provides a server device.
  • the server device includes at least one processor and at least one memory, and computer program instructions are stored in the at least one memory. When the server device runs, the at least one processor executes the functions performed by the server device in each method described in the embodiments of this application. For example, steps 602a to 604b performed by the first server device in the embodiment shown in FIG. 6 are performed.
  • this application also provides a multi-device voice control system, which includes at least two terminal devices and a server device; wherein the at least two terminal devices perform collaborative processing of services.
  • the at least two terminal devices may be the first terminal device and the second terminal device in the above embodiment
  • the server device may be the first server device and the second server device in the above embodiment.
  • based on the above embodiments, this application also provides a computer program product, which includes a computer program (which can also be called code, or instructions). When the computer program is run, it causes the computer to execute each method described in the embodiments of this application.
  • based on the above embodiments, the present application also provides a computer-readable storage medium in which a computer program is stored. When the computer program is executed by a computer, the computer is caused to execute each method described in the embodiments of the present application.
  • this application also provides a chip, which is used to read the computer program stored in the memory and implement the methods described in the embodiments of this application.
  • based on the above embodiments, this application provides a chip system. The chip system includes a processor and is used to support a computer device in implementing the methods described in the embodiments of this application.
  • the chip system further includes a memory, and the memory is used to store necessary programs and data of the computer device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Telephonic Communication Services (AREA)

Abstract

一种多设备的语音控制系统及方法,用以提供一种通过语音来同时控制多个邻近设备的技术方案,可以降低多设备协同进行业务处理场景下的操作繁琐度。第一终端设备接收并响应于用户的第一语音指令,向第一服务端设备上传第一语音请求信息,第一语音请求信息包含第一语音指令;第二终端设备接收并响应于用户的第二语音指令,向第一服务端设备上传第二语音请求信息,第二语音请求信息包含第二语音指令;第一服务端设备根据第一语音指令和第二语音指令,若确定第一语音请求信息和第二语音请求信息之间相关,执行以下处理中的至少一种:生成并向第一终端设备发送第一控制指令;生成并向第二终端设备发送第二控制指令。

Description

一种多设备的语音控制系统及方法
相关申请的交叉引用
本申请要求在2022年03月18日提交中华人民共和国知识产权局、申请号为202210272315.6、申请名称为“一种多设备的语音控制系统及方法”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及音频技术领域,尤其涉及一种多设备的语音控制系统及方法。
背景技术
随着半导体技术和软件技术的发展,终端设备出现了各种形态;例如,手机,平板电脑,电视,车载设备,以及各种家用电器等。目前,很多业务可能涉及到多个设备,需要多个设备协同进行业务处理;例如,视频会议时可以将便携机画面投影到大屏上、在多个设备上登录相同的账号、将手机上正在播放的视频投屏到电视上等。
相关技术中,在需要多个设备协同进行业务处理的场景中,通常需要用户手动操作。比如,用户在手机上已经登录账号,如果需要在电脑上登录该账号,一般需要用户通过手机手动授权登录。又比如,用户想将手机上正在播放的视频投屏到电视上,需要用户在手机上手动进行投屏操作。由此,相关技术在多设备协同进行业务处理下,存在操作繁琐等问题。
因此,如何降低多设备协同进行业务处理场景下的操作繁琐度,是具有研究意义的。
发明内容
本申请实施例提供一种多设备的语音控制系统及方法,用以提供一种通过语音来同时控制多个邻近设备的技术方案,可以降低多设备协同进行业务处理场景下的操作繁琐度。
第一方面,本申请实施例提供了一种多设备的语音控制系统,其中,第一终端设备接收并响应于用户的第一语音指令,向第一服务端设备上传第一语音请求信息,所述第一语音请求信息包含所述第一语音指令;以及,第二终端设备接收并响应于用户的第二语音指令,向所述第一服务端设备上传第二语音请求信息,所述第二语音请求信息包含所述第二语音指令;所述第一服务端设备根据所述第一语音指令和所述第二语音指令,若确定所述第一语音请求信息和所述第二语音请求信息之间相关,执行以下处理中的至少一种:生成并向所述第一终端设备发送第一控制指令,所述第一控制指令用于执行所述第一语音指令的相关操作,所述第一语音指令的相关操作用于与所述第二终端设备进行业务的协同处理;生成并向所述第二终端设备发送第二控制指令,所述第二控制指令用于执行所述第二语音指令的相关操作;所述第二语音指令的相关操作用于与所述第一终端设备进行业务的协同处理。
该方法中,多个终端设备均接收用户的语音指令,服务端设备可以通过对语音指令的分析,识别出相关的终端设备,也即识别出用户通过语音指令控制多个设备的场景,从而可以由服务端设备分别指示各终端设备执行接收到的语音指令的相关操作,以实现多设备协同进行业务处理。相比于相关技术中在多设备协同进行业务处理场景下,需要用户手动操作甚至多次手动操作才能实现,本申请提供的方法可以降低用户的操作繁琐度,从而可以提升用户体验。
在一种可能的设计中,所述第一服务端设备确定所述第一语音请求信息和所述第二语音请求信息之间相关,包括但不限于以下方式中的至少一种:(1)确定所述第一语音指令和所述第二语音指令相同;(2)确定所述第一语音指令和所述第二语音指令之间的相似度大于第一指定阈值;(3)确定所述第一语音指令和所述第二语音指令相对应。
该设计中,服务端设备通过对多个终端设备分别上传的语音指令,可以实现对终端设备之间是否需要协同进行业务处理的识别,从而可以基于用户的语音指令实现多设备的控制,进而可以降低用户的操作繁琐度。
在一种可能的设计中,所述第一语音请求信息和所述第二语音请求信息分别还包含但不限于以下信息中的至少一种:时间戳信息、声纹信息、终端设备状态信息。
该设计中,终端设备在向服务端设备上传语音业务请求信息时,除了语音指令信息之外,还可以包含其他一些可以有利于确定终端设备之间是否相关的判断的其他信息,从而可以更准确地实现基于用户的语音指令对多设备的语音控制。
在一种可能的设计中,所述第一服务端设备确定所述第一语音请求信息和所述第二语音请求信息之间相关,还包括但不限于以下方式中的一种或多种:确定所述第一终端设备上传的时间戳信息和所述第二终端设备上传的时间戳信息相同、或相似度大于第二指定阈值;确定所述第一终端设备上传的声纹信息和所述第二终端设备上传的声纹信息相同、或相似度大于第三指定阈值。
该设计中,为了提升对终端设备之间是否需要协同进行业务处理的识别的准确性,服务端设备还可以结合终端设备上传的一些其他信息,来确定终端设备之间是否相关,从而可以提升多设备的语音控制的准确性;并且,通过声纹信息的判断,还可以提升多设备的语音控制的安全性。
在一种可能的设计中,所述第一服务端设备根据所述第一语音指令和所述第二语音指令,执行以下处理中的至少一种;包括:所述第一服务端设备对所述第一语音指令和所述第二语音指令进行语义分析;以及,所述第一服务端设备根据所述第一终端设备对应的终端设备状态信息和所述第二终端设备对应的终端设备状态信息,确定所述第一终端设备和所述第二终端设备的状态;所述第一服务端设备根据所述语义分析的结果、所述第一终端设备和所述第二终端设备的状态,执行所述以下处理中的至少一种。
该设计中,服务端设备在识别到终端设备之间相关时,进一步可以根据用户的语音指令所对应的用户意图,确定各终端设备需要执行的相关操作并指示给对应的终端设备,从而可以基于用户的语音指令,实现对多个终端设备的语音控制。并且,服务端设备结合各终端设备的状态,还可以生成更准确的控制指令,从而可以保证多设备的语音控制的准确性。
在一种可能的设计中,若所述第一服务端设备执行的处理为所述生成并向所述第一终端设备发送第一控制指令,则所述第一控制指令中包含所述第二终端设备的设备标识,所述第二终端设备的设备标识用于所述第一终端设备根据所述第二终端设备的设备标识,执行所述第一语音指令的相关操作;或者,若所述第一服务端设备执行的处理为所述生成并向所述第二终端设备发送第二控制指令,则所述第二控制指令中包含所述第一终端设备的设备标识,所述第一终端设备的设备标识用于所述第二终端设备根据所述第一终端设备的设备标识,执行所述第二语音指令的相关操作。
该设计中,服务端设备基于语音指令所对应的用户意图,一些可能的场景中,确定部分终端设备需要进行相应操作,该相应操作通常需要与另一部分终端设备进行协同处理。此时,服务端设备可以在向部分终端设备发送的控制指令中携带另一部分终端设备的设备标识,以使所述部分终端设备确定需要与哪些终端设备实现协同处理。这样,通过该设计,无需第一终端设备和第二终端设备处于连接状态或者处于相同的局域网中,第一终端设备或第二终端设备可以根据控制指令中携带的另一终端设备的设备标识,实现与另一终端设备进行协同处理。
在一种可能的设计中,还包括:所述第一服务端设备生成第一标识码,所述第一标识码用于标识所述第一终端设备和所述第二终端设备相关。
该设计中,为了便于实现对相关的第一终端设备和第二终端设备的控制,通过第一标识码可以保证对所述第一终端设备和所述第二终端设备进行多设备的语音控制的处理效率。例如,若第一终端设备和第二终端设备在接收到控制指令后,还需要与第一服务端设备或第二服务端设备或其他设备进行交互,可以通过第一标识码来快速识别第一终端设备和第二终端设备相关。
在一种可能的设计中,所述系统还包括第二服务端设备,其中:所述第一终端设备根据所述第一控制指令,向所述第二服务端设备发送第一请求指令,所述第一控制指令和所述第一请求指令携带所述第一标识码;以及,所述第二终端设备根据所述第二控制指令,向所述第二服务端设备发送第二请求指令,所述第二控制指令和所述第二请求指令携带所述第一标识码;所述第二服务端设备根据所述第一标识码,执行以下处理中的至少一种:向所述第一终端设备发送第一应答指令,向所述第二终端设备发送第二应答指令。
该设计中,一些可能的场景下,实现对第一终端设备和第二终端设备的语音控制,还可以通过其他服务端设备来实现。此时,通过为第一终端设备和第二终端设备生成的第一标识码,可以快速地确定第一终端设备和第二终端设备相关,从而可以保证多设备的语音控制的处理效率。
在一种可能的设计中,所述第一请求指令用于请求登录指定平台,所述第二请求指令用于请求授权对指定平台的登录;所述第二服务端设备执行的处理为所述向所述第二终端设备发送第二应答指令,所述第二应答指令用于指示所述第一终端设备授权所述第二终端设备登录所述指定平台。
该设计中,给出了对第一终端设备和第二终端设备的语音控制为登录指定平台的场景。相比于相关技术中,通常需要用户在已登录设备上通过扫一扫方式,扫描未登录设备上生成的二维码的方式,本申请提供的方法,可以基于用户的语音指令实现对多设备的语音控制,从而可以降低用户的操作繁琐度。
在一种可能的设计中,所述第一语音指令和所述第二语音指令用于指示但不限于以下场景中的任一种:在所述第一终端设备或所述第二终端设备上登录指定平台、将所述第一终端设备接入所述第二终端设备或将所述第二终端设备接入所述第一终端设备。
该设计中,给出了基于用户的语音指令可以实现多设备的语音控制的场景。通过本申请提供的方法,可以基于用户的语音指令实现对多设备的语音控制,从而可以降低用户的操作繁琐度。
在一种可能的设计中,所述第一语音指令和所述第二语音指令为基于用户的同一语音指令,且分别由所述第一终端设备和所述第二终端设备接收的。
该设计中,通过多个终端设备同时接收用户的语音指令,可以更准确的实现基于用户的语音指令实现对多设备的语音控制,从而不仅可以降低用户的操作繁琐度,还可以提高多设备的语音控制的准确性。
第二方面,本申请实施例还提供了一种多设备的语音控制方法,包括:第一终端设备接收用户的第一语音指令;所述第一终端设备响应于所述第一语音指令,向第一服务端设备上传第一语音请求信息,所述第一语音请求信息包含所述第一语音指令;所述第一终端设备接收所述第一服务端设备发送的第一控制指令,所述第一控制指令用于执行所述第一语音指令的相关操作;所述第一语音指令的相关操作用于与所述第二终端设备进行业务的协同处理;其中,所述第一控制指令为所述第一服务端设备在确定所述第一语音请求信息与第二终端设备上传的第二语音请求信息相关时生成的。
在一种可能的设计中,所述第一语音请求信息和所述第二语音请求信息分别还包含但不限于以下信息中的至少一种:时间戳信息、声纹信息、终端设备状态信息。
在一种可能的设计中,所述第一控制指令中包含所述第二终端设备的设备标识,所述第二终端设备的设备标识用于所述第一终端设备根据所述第二终端设备的设备标识,执行所述第一语音指令的相关操作。
在一种可能的设计中,所述第一控制指令中包含第一标识码;所述第一标识码用于标识所述第一终端设备和所述第二终端设备相关。
在一种可能的设计中,所述方法还包括:所述第一终端设备根据所述第一控制指令,向第二服务端设备发送第一请求指令,所述第一控制指令和所述第一请求指令携带所述第一标识码,以使所述第二服务端设备根据所述第一标识码,执行以下处理中的至少一种:向所述第一终端设备发送第一应答指令,向所述第二终端设备发送第二应答指令。
在一种可能的设计中,所述第一请求指令用于请求登录指定平台;或者,所述第一请求指令用于请求授权对指定平台的登录;所述方法还包括:若所述第一请求指令用于请求授权对指定平台的登录,所述第一终端设备接收第一应答指令,所述第一应答指令用于指示所述第二终端设备授权所述第一终端设备登录所述指定平台。
在一种可能的设计中,所述第一语音指令和所述第二语音请求信息中包含的第二语音指令用于指示但不限于以下场景中的任一种:在所述第一终端设备或所述第二终端设备上登录指定平台、将所述第一终端设备接入所述第二终端设备或将所述第二终端设备接入所述第一终端设备。
在一种可能的设计中,所述第一语音指令和所述第二语音请求信息中包含的第二语音指令为基于用户的同一语音指令,且分别由所述第一终端设备和所述第二终端设备接收的。
第三方面,本申请实施例还提供了一种多设备的语音控制方法,包括:第一服务端设备接收第一终端设备上传的第一语音请求信息,所述第一语音请求信息包含所述第一语音指令;以及,所述第一服务端设备接收第二终端设备上传的第二语音请求信息,所述第二语音请求信息包含所述第二语音指令;所述第一服务端设备根据所述第一语音指令和所述第二语音指令,若确定所述第一语音请求信息和所述第二语音请求信息之间相关,执行以下处理中的至少一种:生成并向所述第一终端设备发送第一控制指令,所述第一控制指令用于执行所述第一语音指令的相关操作;所述第一语音指令的相关操作用于与所述第二终端设备进行业务的协同处理;生成并向所述第二终端设备发送第二控制指令,所述第二控制指令用于执行所述第二语音指令的相关操作;所述第二语音指令的相关操作用于与所述第一终端设备进行业务的协同处理。
在一种可能的设计中,所述第一服务端设备确定所述第一语音请求信息和所述第二语音请求信息之间相关,包括但不限于以下方式中的至少一种:确定所述第一语音指令和所述第二语音指令相同;确定所述第一语音指令和所述第二语音指令之间的相似度大于第一指定阈值;确定所述第一语音指令和所述第二语音指令相对应。
在一种可能的设计中,所述第一语音请求信息和所述第二语音请求信息分别还包含但不限于以下信息中的至少一种:时间戳信息、声纹信息、终端设备状态信息。
在一种可能的设计中,所述第一服务端设备确定所述第一语音请求信息和所述第二语音请求信息之间相关,还包括但不限于以下方式中的一种或多种:确定所述第一终端设备上传的时间戳信息和所述第二终端设备上传的时间戳信息相同、或相似度大于第二指定阈值;确定所述第一终端设备上传的声纹信息和所述第二终端设备上传的声纹信息相同、或相似度大于第三指定阈值。
在一种可能的设计中,所述第一服务端设备根据所述第一语音指令和所述第二语音指令,执行以下处理中的至少一种;包括:所述第一服务端设备对所述第一语音指令和所述第二语音指令进行语义分析;以及,所述第一服务端设备根据所述第一终端设备对应的终端设备状态信息和所述第二终端设备对应的终端设备状态信息,确定所述第一终端设备和所述第二终端设备的状态;所述第一服务端设备根据所述语义分析的结果、所述第一终端设备和所述第二终端设备的状态,执行所述以下处理中的至少一种。
在一种可能的设计中,若所述第一服务端设备执行的处理为所述生成并向所述第一终端设备发送第一控制指令,则所述第一控制指令中包含所述第二终端设备的设备标识,所述第二终端设备的设备标识用于所述第一终端设备根据所述第二终端设备的设备标识,执行所述第一语音指令的相关操作;或者,若所述第一服务端设备执行的处理为所述生成并向所述第二终端设备发送第二控制指令,则所述第二控制指令中包含所述第一终端设备的设备标识,所述第一终端设备的设备标识用于所述第二终端设备根据所述第一终端设备的设备标识,执行所述第二语音指令的相关操作。
在一种可能的设计中,所述方法还包括:所述第一服务端设备生成第一标识码,所述第一标识码用于标识所述第一终端设备和所述第二终端设备相关。
在一种可能的设计中,所述第一终端设备根据所述第一控制指令,向所述第二服务端设备发送第一请求指令,所述第一控制指令和所述第一请求指令携带所述第一标识码;以及,所述第二终端设备根据所述第二控制指令,向所述第二服务端设备发送第二请求指令,所述第二控制指令和所述第二请求指令携带所述第一标识码;所述第二服务端设备根据所述第一标识码,执行以下处理中的至少一种:向所述第一终端设备发送第一应答指令,向所述第二终端设备发送第二应答指令。
在一种可能的设计中,所述第一请求指令用于请求登录指定平台,所述第二请求指令用于请求授权对指定平台的登录;所述第二服务端设备执行的处理为所述向所述第二终端设备发送第二应答指令,所述第二应答指令用于指示所述第一终端设备授权所述第二终端设备登录所述指定平台。
在一种可能的设计中,所述第一语音指令和所述第二语音指令用于指示但不限于以下场景中的任一种:在所述第一终端设备或所述第二终端设备上登录指定平台、将所述第一终端设备接入所述第二终端设备或将所述第二终端设备接入所述第一终端设备。
在一种可能的设计中,所述第一语音指令和所述第二语音指令为基于用户的同一语音指令,且分别由所述第一终端设备和所述第二终端设备接收的。
第四方面,本申请实施例还提供了一种终端设备,包括:一个或多个处理器;一个或多个存储器;所述一个或多个存储器,用于存储一个或多个计算机程序以及数据信息;其中所述一个或多个计算机程序包括指令;当所述指令被所述一个或多个处理器执行时,使得所述终端设备执行如上述第二方面中任一项可能的设计中所述的方法。
第五方面,本申请实施例还提供了一种服务端设备,包括:一个或多个处理器;一个或多个存储器;所述一个或多个存储器,用于存储一个或多个计算机程序以及数据信息;其中所述一个或多个计算机程序包括指令;当所述指令被所述一个或多个处理器执行时,使得所述服务端设备执行如上述第三方面中任一项可能的设计中所述的方法。
第六方面,本申请实施例还提供了一种多设备的语音控制系统,包括至少两个如上述第四方面所述的终端设备、和如上述第五方面的服务端设备。
第七方面,本申请实施例提供了一种计算机可读存储介质,计算机可读存储介质存储有计算机程序(也可以称为代码,或指令),当其在计算机上运行时,使得计算机执行上述第二方面或第三方面中任一种可能的设计中的方法。
第八方面,本申请实施例提供了一种计算机程序产品,计算机程序产品包括:计算机程序(也可以称为代码,或指令),当计算机程序被运行时,使得计算机执行上述第二方面或第三方面中任一种可能的设计中的方法。
第九方面,本申请实施例还提供一种终端设备上的图形用户界面,该终端设备具有显示屏、一个或多个存储器、以及一个或多个处理器,所述一个或多个处理器用于执行存储在所述一个或多个存储器中的一个或多个计算机程序,所述图形用户界面包括所述终端设备执行本申请实施例第二方面任一可能的设计时显示的图形用户界面。
上述第二方面至第九方面中任一方面的有益效果请具体参阅上述第一方面中各种可能的设计的有益效果,在此不再赘述。
附图说明
图1a为本申请实施例提供的多设备管理的应用场景示意图;
图1b为本申请实施例提供的对应图1a示出的应用场景的流程示意图;
图2为本申请实施例提供的一种可能的终端设备的硬件结构示意图;
图3为本申请实施例提供的一种终端设备的软件结构框图;
图4为本申请实施例提供的一种多设备的语音控制方法的应用场景图之一;
图5为本申请实施例提供的一种多设备的语音控制方法的应用场景图之二;
图6为本申请实施例提供的一种多设备的语音控制方法的交互流程示意图之一;
图7为本申请实施例提供的一种多设备的语音控制方法的流程示意图;
图8a为本申请实施例提供的一种多设备的语音控制方法的交互流程示意图之二;
图8b为本申请实施例提供的一种多设备的语音控制方法的交互流程示意图之二;
图9为本申请实施例提供的一种多设备的语音控制方法的交互流程示意图之三。
具体实施方式
随着社会的快速发展,终端设备的形态越来越多,例如手机、平板电脑、电视等;并且,终端设备越来越普及。终端设备不但具有通信功能、还具有强大的处理能力、存储能力、照相功能等。终端设备通过操作系统执行相应的应用程序,用户可以使用终端设备打电话、发短消息、浏览网页、看视频等。并且,为了方便用户在不同的终端设备之间协同进行业务处理,目前存在多种方式可以实现对多设备的管理,例如多设备登录相同的账号,设备之间的投屏等。
结合背景技术中介绍的内容,相关技术中,在需要多个设备协同进行业务处理的场景中,通常需要用户手动操作。
示例性的,参阅图1a,为本申请实施例提供的多设备管理的应用场景示意图。在该场景中,假设用户在图1a中(a)示出的手机上,安装有某一个应用程序(application,APP),并且已经登录用户账号。如果需要在其他终端设备上的相同APP上,登录该相同的用户账号。目前,通常可以采用借助用户账号已经登录的设备,且通过扫一扫方式,来扫描未登录的设备上的二维码,以此来实现已登录设备对未登录设备的授权登录。如图1a所示,用户可以在(b)中的平板电脑上手动调整出用于进行请求授权登录的二维码;然后通过(a)示出的手机,采用扫一扫方式,来实现对未登录设备的授权登录。
基于图1a示出的场景,以下通过图1b示出的流程示意图介绍具体实现过程。
S101、未登录设备对应的后台服务器向APP开发平台请求APP的授权登录。
S102、APP开发平台返回二维码。
S103、未登录设备对应的后台服务器控制未登录设备显示二维码。
S104、用户通过已登录设备扫描二维码。可以理解,该流程操作为用户手动操作。
S105、用户通过已登录设备授权登录,并指示给APP开发平台。
S106、APP开发平台告知未登录设备对应的后台服务器已授权。
S107、未登录设备对应的后台服务器向APP开发平台请求用户账号数据。
S108、APP开发平台向未登录设备对应的后台服务器返回用户账号数据。
通过以上实现过程,可以得到该场景中需要用户手动操作设备,才能完成用户通过已登录设备实现对未登录设备的授权登录。
有鉴于此,本申请实施例提供一种多设备的语音控制系统及方法,可以通过语音来关联多个邻近设备,以及同时实现对多个邻近设备的协同管理,用以完成需要多个设备协同进行业务处理的业务任务。设计思想主要为多个邻近设备同时获取用户的语音指令,并将包含但不限于语音指令的语音请求信息发送给服务侧设备。服务侧设备可以根据接收到的来自多个设备的语音请求信息,关联上报相同语音指令的多个邻近设备,并为每个设备生成对应的控制指令。因此,通过本申请提供的系统或方法,具有操作简便、交互方式更方便等特点。其中,邻近设备表示多个可以满足同时接收到用户的同一语音指令的终端设备,例如,处于相同房间的手机和电视、处于同一桌面的电脑和手机等。
下面将结合附图,对本申请实施例进行详细描述。
可以理解的是,本申请实施例的终端设备可以是诸如智能家居设备(例如,智能电视,智慧屏,智能音箱等)、手机、平板电脑、可穿戴设备(例如,手表、头盔、耳机等)、增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)设备、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本、个人数字助理(personal digital assistant,PDA)等具有语音指令输入能力的设备。可以理解的是,本申请实施例对终端设备的具体类型不作任何限制。
本申请实施例可以应用到的终端设备,示例性实施例包括但不限于搭载 或者其它操作系统的便携式终端设备。上述便携式终端设备也可以是其它便携式终端设备,诸如具有触敏表面(例如触控面板)的膝上型计算机(Laptop)等。
图2示出了一种可能的终端设备的硬件结构示意图。其中,所述终端设备200包括:射频(radio frequency,RF)电路210、电源220、处理器230、存储器240、输入单元250、显示单元260、音频电路270、通信接口280、以及无线保真(wireless-fidelity,Wi-Fi)模块290等部件。本领域技术人员可以理解,图2中示出的终端设备200的硬件结构并不构成对终端设备200的限定,本申请实施例提供的终端设备200可以包括比图示更多或更少的部件,可以组合两个或更多的部件,或者可以具有不同的部件配置。图2中所示出的各种部件可以在包括一个或多个信号处理和/或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。
下面结合图2对所述终端设备200的各个构成部件进行具体的介绍:
所述RF电路210可用于通信或通话过程中,数据的接收和发送。特别地,所述RF电路210在接收到基站的下行数据后,发送给所述处理器230处理;另外,将待发送的上行数据发送给基站。通常,所述RF电路210包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器(low noise amplifier,LNA)、双工器等。
此外,RF电路210还可以通过无线通信网络和其他设备进行通信。所述无线通信可以使用任一通信标准或协议,包括但不限于全球移动通讯系统(global system of mobile communication,GSM)、通用分组无线服务(general packet radio service,GPRS)、码分多址(code division multiple access,CDMA)、宽带码分多址(wideband code division multiple access,WCDMA)、长期演进(long term evolution,LTE)、电子邮件、短消息服务(short messaging service,SMS)等。
Wi-Fi技术属于短距离无线传输技术,所述终端设备200通过Wi-Fi模块290可以连接访问接入点(access point,AP),从而实现数据网络的访问。所述Wi-Fi模块290可用于通信过程中,数据的接收和发送。
所述终端设备200可以通过所述通信接口280与其他设备实现物理连接。可选的,所述通信接口280与所述其他设备的通信接口通过电缆连接,实现所述终端设备200和其他设备之间的数据传输。
由于在本申请实施例中,所述终端设备200能够实现通信业务,与服务侧设备(例如可以包含但不限于:语音业务服务器、账号服务器等)实现交互,因此所述终端设备200需要具有数据传输功能,即所述终端设备200内部需要包含通信模块。虽然图2示出了所述RF电路210、所述Wi-Fi模块290、和所述通信接口280等通信模块,但是可以理解的是,所述终端设备200中存在上述部件中的至少一个或者其他用于实现通信的通信模块(如蓝牙模块),以进行数据传输。
例如,当所述终端设备200为手机时,所述终端设备200可以包含所述RF电路210, 还可以包含所述Wi-Fi模块290,或可以包含蓝牙模块(图2中未示出);当所述终端设备200为计算机时,所述终端设备200可以包含所述通信接口280,还可以包含所述Wi-Fi模块290,或可以包含蓝牙模块(图2中未示出);当所述终端设备200为平板电脑时,所述终端设备200可以包含所述Wi-Fi模块,或可以包含蓝牙模块(图2中未示出)。
所述存储器240可用于存储软件程序以及模块。所述处理器230通过运行存储在所述存储器240的软件程序以及模块,从而执行所述终端设备200的各种功能应用以及数据处理。可选的,所述存储器240可以主要包括存储程序区和存储数据区。其中,存储程序区可存储操作系统(主要包括内核层、系统层、应用程序框架层和应用程序层等各自对应的软件程序或模块)。
此外,所述存储器240可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。
所述输入单元250可用于接收用户输入的数字或字符信息等多种不同类型的数据对象的编辑操作,以及产生与所述终端设备200的用户设置以及功能控制有关的键信号输入。可选的,输入单元250可包括触控面板251以及其他输入设备252。
其中,所述触控面板251,也称为触摸屏,可收集用户在其上或附近的触摸操作(比如用户使用手指、触笔等任何适合的物体或附件在所述触控面板251上或在所述触控面板251附近的操作),并根据预先设定的程序驱动相应的连接装置。
可选的,所述其他输入设备252可以包括但不限于物理键盘、功能键(比如音量控制按键、开关按键等)、轨迹球、鼠标、操作杆等中的一种或多种。
所述显示单元260可用于显示由用户输入的信息或提供给用户的信息以及所述终端设备200的各种菜单。所述显示单元260即为所述终端设备200的显示系统,用于呈现界面,实现人机交互。所述显示单元260可以包括显示面板261。可选的,所述显示面板261可以采用液晶显示屏(liquid crystal display,LCD)、有机发光二极管(organic light-emitting diode,OLED)等形式来配置。本申请实施例中,在终端设备上可以不设置显示单元260,例如智能音箱设备无需设置显示屏;或者,在终端设备上设置显示单元260,并通过显示单元260显示终端设备200通过麦克风271接收到的语音指令所对应的显示内容,例如,若麦克风271接收到的语音指令为“打开并登录即时通信应用程序A”,则可在显示面板261上显示对应即时通信应用程序A的显示界面等。
所述处理器230是所述终端设备200的控制中心,利用各种接口和线路连接各个部件,通过运行或执行存储在所述存储器240内的软件程序和/或模块,以及调用存储在所述存储器240内的数据,执行所述终端设备200的各种功能和处理数据,从而实现基于所述终端设备200的多种业务。本申请实施例中,处理器230用来实现本申请实施例提供的方法,进而提供一种可以通过语音来同时控制多个邻近设备的技术方案,从而可以降低多设备进行业务处理场景下的操作繁琐度。
所述终端设备200还包括用于给各个部件供电的电源220(比如电池)。可选的,所述电源220可以通过电源管理系统与所述处理器230逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗等功能。
如图2所示,终端设备200还包括音频电路270、麦克风271和扬声器272,可提供用户与终端设备200之间的音频接口。音频电路270可用于将音频数据转换为扬声器272能够识别的信号,并将信号传输到扬声器272,由扬声器272转换为声音信号输出。麦克风271用于收集外部的声音信号(如人说话的声音、或者其它声音等),并将收集的外部的声音信号转换为音频电路270能够识别的信号,发送给音频电路270。音频电路270还可用于将麦克风271发送的信号转换为音频数据,再将音频数据输出至RF电路210以发送给比如另一终端设备,或者将音频数据输出至存储器240以便后续进一步处理。本申请实施例中,麦克风271收集外部的声音信号的触发场景可以为用户通过点击终端设备200的显示界面上的语音输入控件(如智慧助手、语音助手等)触发的,也可以为用户通过预设唤醒词来唤醒的,本申请对此不进行限定。
尽管未示出,所述终端设备200还可以包括至少一种传感器、摄像头等,在此不再赘述。至少一种传感器可以包含但不限于压力传感器、气压传感器、加速度传感器、距离传感器、指纹传感器、触摸传感器、温度传感器等。
本申请实施例涉及的操作系统(operating system,OS),是运行在终端设备200上的最基本的系统软件。以手机为例,操作系统可以是鸿蒙系统(HarmonyOS)或安卓(android)系统或IOS系统。终端设备200的软件系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。本申请实施例以采用分层架构的操作系统为例,示例性说明终端设备200的软件结构。
图3为本申请实施例提供的一种终端设备的软件结构框图。如图3所示,终端设备的软件结构可以是分层架构,例如可以将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将操作系统分为五层,从上至下分别为应用程序层,应用程序框架层(framework,FWK),运行时和系统库,内核层,以及硬件层。
应用程序层可以包括一系列应用程序包。如图3所示,应用程序层可以包括相机、设置、皮肤模块、用户界面(user interface,UI)、三方应用程序等。其中,三方应用程序可以包括WLAN、音乐、通话、蓝牙、视频等。
在本申请一些实施例中,应用程序层可以用于实现编辑界面的呈现,上述编辑界面可以用于用户查看或进行操作等。例如,若手机包含显示面板261,用户可以在显示面板261显示的主界面上显示即时通信应用程序的相关界面等。
一种可能的实现方式中,应用程序可以使用java语言开发,通过调用应用程序框架层所提供的应用程序编程接口(application programming interface,API)来完成,开发者可以通过应用程序框架层来与操作系统的底层(例如硬件层、内核层等)进行交互,开发自己的应用程序。该应用程序框架层主要是操作系统的一系列的服务和管理系统。
应用程序框架层为应用程序层的应用程序提供应用编程接口和编程框架。应用程序框架层包括一些预定义函数。如图3所示,应用程序框架层可以包括快捷图标管理模块,窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。
快捷图标管理模块用于对终端设备上显示的快捷图标进行管理,例如创建快捷图标、移除快捷图标、监控快捷图标是否满足显示条件等。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供终端设备的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,终端设备振动,指示灯闪烁等。
在本申请一些实施例中,该应用程序框架层主要负责调用与硬件层之间通信的服务接口,以将用户进行操作的操作请求传递到硬件层,所述操作请求可以包含用户通过语音指令控制打开或登录某一APP的操作请求等。
运行时包括核心库和虚拟机。运行时负责操作系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是操作系统的核心库。应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(media libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
2D图形引擎是2D绘图的绘图引擎。
在一些实施例中,三维图形处理库可以用于绘制三维的运动轨迹图像,2D图形引擎可以用于绘制二维的运动轨迹图像。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
硬件层可以包括各类传感器,例如加速度传感器、陀螺仪传感器、触摸传感器等。
通常终端设备200可以同时运行多个应用程序。较为简单的,一个应用程序可以对应一个进程,较为复杂的,一个应用程序可以对应多个进程。每个进程具备一个进程号(进程ID)。
结合上述图2中对终端设备的硬件结构的介绍,以及图3中对终端设备的软件框架的介绍,下面结合多个实施例和附图,示例性说明终端设备执行本申请实施例中提出的一种多设备的语音控制方法的软件以及硬件的工作原理。
应理解,本申请实施例中"至少一个"是指一个或者多个,"多个"是指两个或两个以上。"和/或",描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B的情况,其中A、B可以是单数或者复数。字符"/"一般表示前后关联对象是一种"或"的关系。"以下至少一(项)个"或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a、b或c中的至少一项(个),可以表示:a,b,c,a和b,a和c,b和c,或a、b和c,其中a、b、c可以是单个,也可以是多个。
本申请实施例涉及的多个,是指大于或等于两个。
另外,需要理解的是,在本申请的描述中,“第一”、“第二”等词汇,仅用于区分描述的目的,而不能理解为指示或暗示相对重要性,也不能理解为指示或暗示顺序。
此外,本申请实施例中,“终端设备”、“设备”等可以混用,即指可以用于实现本申请实施例的各种设备;本申请实施例中的“应用”和“应用程序”也可以混用,均指具有一定业务提供能力的程序或客户端等,也就是说应用和客户端也可混用,比如视频客户端、即时通信客户端也可以称之为视频应用或即时通信应用等。
应理解,终端设备的硬件结构可以如图2所示,软件架构可以如图3所示,其中,终端设备中的软件架构对应的软件程序和/或模块可以存储在存储器240中,处理器230可以运行存储器240中存储的软件程序和应用以执行本申请实施例提供的一种多设备的语音控制方法的流程。
为了便于理解本申请提供的一种多设备的语音控制方法,以下结合图4至图9中所示的内容,对采用本申请提供的方法的实现过程进行介绍。
本申请实施例适用于需要多设备协同进行业务处理的应用场景。首先通过以下多个示例对本申请实施例适用的应用场景进行说明。可以理解,本申请实施时并不限定于以下应用场景。
一种可能的应用场景中,如图1a所示,在需要借助用户账号已经登录的手机,来实现在平板电脑上同样登录该相同的用户账号的场景下,相关技术中通常需要用户借助手机且通过扫一扫方式,来实现在平板电脑上登录用户账号。本申请实施时,用户可以通过语音指令来同时控制手机和平板电脑,从而可以实现通过相同的语音指令来进行对手机和平板电脑不同的控制,进以实现在平板电脑上登录与手机上相同的用户账号。
参阅图4,为本申请实施例提供的一种多设备的语音控制方法的应用场景图。如图4中的(a)示出的手机上,已经登录有的微信账号(即手机为已登录设备);而在如图4中的(b)示出的平板电脑(也可简称“平板”)上,未登录微信账号(即平板为未登录设备)。在该场景下,手机和平板首先可以同时或几乎同时获取用户的语音指令,如图4中的“在平板上登录微信账号”。然后,手机和平板可以分别将包含该条语音指令的语音请求信息上传到第一服务端设备(例如语音业务服务器)进行处理;其中,语音请求信息还可以包括但不限于:时间戳信息、声纹信息、终端设备状态信息。
语音业务服务器接收到多个终端设备上传的语音请求信息之后,可以对各终端设备上传的语音请求信息进行分析,比如根据手机和平板上传的语音指令,确定手机和平板上传的语音请求信息是否相关。例如,若分析到手机和平板上传的语音指令相同、或者相似度大于指定阈值或者相对应时,则确定手机和平板上传的语音请求信息相关;此外,还可以结合时间戳信息和声纹信息等信息进行进一步的精确判断。
进一步的,语音业务服务器可以根据对手机和平板上传的语音指令的语义分析,进行手机和平板的控制。一种可选的示例中,语音业务服务器可以分别生成对应手机的第一控制指令和对应平板的第二控制指令;其中,第一控制指令可以为指示手机向账号服务器发起授权请求,第二控制指令可以为指示平板向第二服务端设备(如账号服务器)发起登录请求。这样,账号服务器接收到平板发送的登录请求和手机发送的授权请求之后,可以根据手机的授权请求对平板的登录请求进行应答,以实现在平板上登录微信。另一种可选的示例中,语音业务服务器也可以生成对应手机的第一控制指令,此时第一控制指令可以包含平板的设备标识,且指示手机基于平板的设备标识向平板发送包含用户账号和密钥的授权信息,这样平板在接收到手机发送的授权信息之后,可以直接根据所述用户账号和密钥进行登录。
另一种可能的应用场景中,除需要借助账号服务器进行账号登录的场景之外,还可以实现屏幕投影等场景,例如,将手机的显示画面投影到电视上等。相关技术中实现该场景通常不仅需要在手机和电视连接在相同的局域网下,还需要用户在手机上通过手动操作投屏控件,来实现将手机的显示界面投屏到电视。本申请实施时,手机和电视无需连接在相同的局域网下或处于连接状态,用户可以通过语音指令来实现同时控制手机和电视,从而可以实现通过相同的语音指令来进行对手机和电视不同的控制,进以实现在电视上显示手机的显示界面。
参阅图5,为本申请实施例提供的一种多设备的语音控制方法的应用场景图。在该场景下,手机和电视首先可以同时获取用户的语音指令,如图5中的用户的语音指令为“把手机屏幕投到电视上”,如图5中的(a)示出的手机获取到用户的语音指令“把手机屏幕投到电视上”,及如图5中的(b)示出的电视也获取到用户的语音指令“把手机屏幕投到电视上”。然后,手机和电视可以分别将包含该条语音指令的语音请求信息上传到服务端设备(例如语音业务服务器)进行处理;其中,语音请求信息还可以包括但不限于:时间戳信息、声纹信息。
语音业务服务器接收到多个终端设备上传的语音请求信息之后,可以对各终端设备上传的语音请求信息进行分析,比如根据手机和电视上传的语音指令,确定手机和电视上传的语音请求信息是否相关。例如,若分析到手机和电视上传的语音指令相同、或者相似度大于指定阈值或者相对应时,则确定手机和电视上传的语音请求信息相关;此外,还可以结合时间戳信息和声纹信息等信息进行进一步的精确判断。
进一步的,语音业务服务器可以根据对手机和电视上传的语音指令的语义分析,进行对手机和电视的控制。一种可选的示例中,语音业务服务器可以生成对应手机的控制指令,例如向手机发送投屏指令;其中,投屏指令中可以包含但不限于:电视的设备标识(例如接入地址)。这样,手机接收到语音业务服务器发送的投屏指令之后,可以根据电视的接入地址连接到电视,无需手机和电视必须接入相同的局域网的条件,本申请实施时可以基于用户的语音指令实现对手机和电视相关的鉴权,从而可以进一步实现将手机的显示界面对应的内容投影到电视上进行显示。另一种可选的示例中,语音业务服务器也可以生成对应电视的控制指令,例如向电视发送接受投屏指令,其中,接受投屏指令可以包括但不限于:手机的设备标识,以及指示电视在与手机连接之后,指示手机将显示页面的数据内容发送给电视。
可以理解,本申请实施时,不限定语音业务服务器在确定两个终端设备的语音请求信息相关之后,分别对两个终端设备的控制方式,实际实现时可根据对语音指令的语义分析结果,生成对应两个终端设备或者其中任一终端设备的控制指令,以实现用户语音指令所对应的用户意图。
此外,本申请实施时不限定服务端设备向第一终端设备发送的第一控制指令或第二终端设备发送的第二控制指令的次数和发送方式,控制指令可以为一次或多次,若需要多次控制指令才能实现用户语音指令所对应的用户意图,语音业务服务器可以发送多次控制指令。例如,语音业务服务器第一次发送控制指令时,可能存在终端设备无法正确接收到控制指令,则可以等待预设时间之后再次发送。又例如,用户意图需要语音业务服务器进行周期性控制,则语音业务服务器可以在每次周期时刻到达时,向终端设备发送控制指令。
基于结合图4和图5示出的内容对本申请实施例可能适用的应用场景进行说明,可以得到本申请实施例提供的方法的设计思想为,通过多个邻近设备可以接收相同(或几乎相同或相对应)的语音指令,并分别由每个邻近设备上传到服务端设备,进而可以由服务端设备针对不同的邻近设备生成不同的控制指令。以下对本申请提供的方法的交互过程进行具体说明。
参阅图6,为本申请实施例提供的一种多设备的语音控制方法的交互流程示意图。需要说明的是,该实施例中以第一终端设备和第二终端设备作为示例,本申请实施时,不限定终端设备的类型和数量;具体实现时,可以包含更多的终端设备参与到语音控制中,若包含更多的终端设备,各终端设备的交互流程可以参阅第一终端设备或者第二终端设备的实现过程。该交互流程包括:
步骤601a、第一终端设备接收第一语音指令的输入。
步骤601b、第二终端设备接收第二语音指令的输入。
其中,第一语音指令和第二语音指令可以基于用户的同一语音指令得到的,且分别由所述第一终端设备和所述第二终端设备接收,从而可以实现服务端设备识别到第一终端设备与第二终端设备分别上传的语音请求信息是相关的。结合图4示出的应用场景,第一语音指令和第二语音指令对应的用户语音指令可以为“在平板上登录微信账号”,第一终端设备可以为手机,第二终端设备可以为平板电脑。结合图5示出的应用场景,第一语音指令和第二语音指令对应的用户语音指令可以为“把手机屏幕投到电视上”,第一终端设备可以为手机,第二终端设备可以为电视。
示例性的,第一终端设备或第二终端设备在接收用户的语音指令的输入之前,可以已经通过唤醒词进行唤醒,或者通过终端设备上的指定控件或指定手势等进行唤醒,本申请实施例对终端设备的唤醒过程不作限定。
其中,本申请具体实现时,不限定步骤601a和步骤601b的执行顺序。可选的,步骤601a和步骤601b可以同时执行;例如,用户可以在首先唤醒第一终端设备和第二终端设备之后,使得第一终端设备和第二终端设备此时可以同时接收用户语音指令的输入,也可以理解为第一语音指令和第二语音指令为用户的一条语音指令在两个不同终端设备的输入,即第一语音指令和第二语音指令来源于相同的用户语音指令。另一可选的,步骤601a可以先于步骤601b执行,或步骤601b先于步骤601a执行;需要说明的是,实施时还可以限定步骤601a和步骤601b的发生时间差小于指定时间阈值,且第一终端设备和第二终端设备分别接收到的语音指令相同(或几乎相同);例如,用户可以首先唤醒第一终端设备,并输出内容为"把手机屏幕投到电视上"的第一语音指令,以使得第一终端设备接收到用户的语音指令,然后唤醒第二终端设备,并同样输出内容为"把手机屏幕投到电视上"的第二语音指令,以使得第二终端设备同样接收到用户的语音指令;这样,在第一终端设备和第二终端设备分别将包含语音指令的语音请求信息上传至第一服务端设备之后,第一服务端设备可以确定第一终端设备和第二终端设备之间具有接收相同的用户语音指令的关联关系。可以理解,步骤601a和步骤601b同时执行可以具有更准确的多设备的语音控制效果;若逐步执行,虽然来源于同一用户,但由于通过相同内容但不同时刻的用户语音指令来实现,此时差异较大,第一服务端设备在判断相似度时可以设置较低的阈值。
步骤602a、第一终端设备上传第一语音请求信息到第一服务端设备,所述第一语音请求信息包含所述第一语音指令。
步骤602b、第二终端设备上传第二语音请求信息到第一服务端设备,所述第二语音请求信息包含所述第二语音指令。
本申请实施时,第一终端设备和第二终端设备还可以根据实际应用场景,在语音请求信息(第一语音请求信息或者第二语音请求信息)中携带其他信息。示例性的,所述语音请求信息还可以包括但不限于:时间戳信息、声纹信息、终端设备状态信息。其中,
1)、时间戳信息可以用于标识终端设备接收第一语音指令的时间,进而可以便于第一服务端设备结合所述时间戳信息,确定第一终端设备和第二终端设备是否在同一应用场景中接收到相同的语音指令。
2)、声纹信息可以用于标识用户身份信息,进而可以便于第一服务端设备结合所述声纹信息,确定第一终端设备和第二终端设备接收到的语音指令是否来自于同一人。
3)、终端设备状态信息可以但不限用于标识在终端设备的账户登录状态,例如标识微信账号在第一终端设备上为已登录状态,以及该微信账号在第二终端设备上为未登录状态,进而可以便于第一服务端设备结合所述终端设备状态信息,确定各终端设备的角色(例如可以认为图4中示出的手机为“源设备”角色,平板为“目标设备”角色;又例如可以认为图5中示出的手机为“发起方”角色,电视为“接收方”角色等),然后可以根据各终端设备的角色对应生成不同的控制指令。
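结合上述说明,一条语音请求信息所包含的各字段可以用如下示意性的Python数据结构来表示;其中的类名、字段名与示例取值均为便于理解而假设的,并非本申请限定的实现方式:

```python
from dataclasses import dataclass

@dataclass
class VoiceRequest:
    """一条语音请求信息(字段名为示意性假设)"""
    device_id: str     # 终端设备标识
    command_text: str  # 语音指令识别出的文字内容
    timestamp: float   # 接收语音指令的时间戳(单位:秒)
    voiceprint: list   # 声纹特征向量,用于标识用户身份
    device_state: dict # 终端设备状态信息,如某账号的登录状态

# 对应图4场景中手机上传的一条语音请求信息(数值均为示意)
req = VoiceRequest(
    device_id="phone-01",
    command_text="在平板上登录微信账号",
    timestamp=1647590400.0,
    voiceprint=[0.12, 0.98, 0.33],
    device_state={"wechat_logged_in": True},
)
```

服务端设备据此既可判断各请求之间是否相关,也可根据 device_state 区分"源设备"与"目标设备"等角色。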
举例来说,参阅以下表1a和表1b,第一服务端设备(语音业务服务器)分别接收到的第一终端设备(手机)、第二终端设备(平板电脑)、电视的语音请求信息,其中表1a为语音业务服务器对来自手机和平板电脑的语音请求信息的对比,表1b为语音业务服务器对来自手机和电视的语音请求信息的对比,如下:
表1a
根据以上表1a示出的内容,语音业务服务器通过对来自手机和平板电脑的语音请求信息的信息对比,分别得到语音指令、时间戳信息和声纹信息相同、或者相似度均大于指定阈值,则可以确定手机和平板电脑的本次语音请求信息相关。其中,相似度越大,表示两条语音请求信息之间相关的概率越大;相同则表示两条语音请求信息之间相关。进一步的,语音业务服务器可以进一步对语音指令的语义进行分析,根据分析结果生成对应手机的第一控制指令和/或对应平板电脑的第二控制指令。
需要说明的是,判断语音指令、时间戳信息和声纹信息是否相同或者相似度是否大于指定阈值,可实施为分别判断第一语音指令和第二语音指令是否相同或相似度是否大于第一指定阈值、第一终端设备上传的时间戳信息和第二终端设备上传的时间戳信息是否相同或相似度是否大于第二指定阈值、第一终端设备上传的声纹信息和第二终端设备上传的声纹信息的相似度是否大于第三指定阈值。其中,判断语音指令的相似度例如可实施为判断语音指令对应音频的时域类参数、频域类参数等方式,距离越小则可表示第一终端设备和第二终端设备上传的语音指令相似度越大;判断时间戳信息的相似度例如可实施为判断时间差等方式,时间差越小则可表示第一终端设备和第二终端设备上传的语音指令相似度越大;判断声纹信息的相似度例如可实施为基于人工智能技术进行声纹特征提取之后,比较声纹特征的相似度等。
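上述逐项相似度判断的思路,可以用如下Python示意代码概括;其中的阈值取值、以余弦相似度衡量声纹、以及以"指令文本相同"简化语音指令判断等均为假设,仅用于说明判断过程:

```python
import math

TIME_DIFF_MAX = 2.0       # 假设的第二指定阈值:时间戳差的上限(秒)
VOICEPRINT_SIM_MIN = 0.9  # 假设的第三指定阈值:声纹相似度的下限

def cosine(a, b):
    """计算两个声纹特征向量的余弦相似度(示意)"""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def is_related(r1, r2):
    """判断两条语音请求信息是否相关:逐项比较时间戳、声纹与语音指令"""
    if abs(r1["timestamp"] - r2["timestamp"]) > TIME_DIFF_MAX:
        return False  # 时间差过大,确定不相关
    if cosine(r1["voiceprint"], r2["voiceprint"]) < VOICEPRINT_SIM_MIN:
        return False  # 声纹相似度不足,确定不相关
    return r1["command_text"] == r2["command_text"]  # 简化:仅判断指令相同

# 示意数据:r1、r2 对应同一用户几乎同时发出的同一指令,r3 为无关请求
r1 = {"command_text": "在平板上登录微信账号", "timestamp": 100.0, "voiceprint": [1.0, 0.0]}
r2 = {"command_text": "在平板上登录微信账号", "timestamp": 100.5, "voiceprint": [0.99, 0.05]}
r3 = {"command_text": "明天天气怎么样", "timestamp": 500.0, "voiceprint": [0.0, 1.0]}
```

实际实现中,语音指令的相似度还可以基于时域、频域参数计算,此处省略。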
此外,若用户采用对第一终端设备和第二终端设备逐一进行语音指令的控制,来实现同一用户意图,此时用户的语音指令可以不完全相同,而是相对应。结合图5的示例,用户向手机发起的语音指令可以为“把手机屏幕投到电视上”,而向电视发起的语音指令可以为“在电视上显示手机屏幕”,两个语音指令虽不相同,相似度也不高,但对应相同的用户,因此手机接收的第一语音指令和电视接收的第二语音指令相对应。换言之,语音业务服务器根据手机发起的“把手机屏幕投到电视上”可以确定手机为“发起方”角色,电视为“接收方”角色以及意图为投屏,并且根据电视发起的“在电视上显示手机屏幕”也可以确定手机为“发起方”角色,电视为“接收方”角色以及意图为投屏,因此手机和平板的语音指令是相对应的。
表1b
根据以上表1b示出的内容,语音业务服务器通过对来自手机和电视的语音请求信息的信息对比,若得到语音指令的相似度不大于指定阈值、或时间戳信息的相似度不大于指定阈值、或声纹信息的相似度不大于指定阈值,则可以确定手机和电视的本次语音请求信息不相关。
此外,第一服务端设备通常可以接收到多条语音请求信息,在判断哪几条语音请求信息属于同一应用场景下的实现过程中,第一服务端设备可以根据语音请求信息中包含的信息逐一判断。例如,第一服务端设备可以首先基于终端设备之间的时间戳信息的相似度判断是否相关,若大于指定阈值,则可以继续判断声纹信息等其他信息,否则可以确定不相关;以及,若最后确定终端设备之间的语音指令的相似度大于指定阈值,则可以确定终端设备上传的语音请求信息相关。需要说明的是,本申请实施时,不限定第一服务端设备进行判断的先后顺序。这样,通过语音请求信息中的一些信息的筛选或排除,可以提高判断终端设备上传的语音请求信息是否相关的处理效率。
步骤603、第一服务端设备根据所述第一语音指令和所述第二语音指令,若确定所述第一语音请求信息和所述第二语音请求信息之间相关,执行以下步骤604a和604b中的至少一种。
示例性的,第一服务端设备接收到各终端设备的语音指令之后,一方面进行如表1a和表1b所示的相似度对比,用于判断相关的语音请求信息,也即处于同一应用场景下多个终端设备接收相同的语音指令;另一方面可以进行语义分析、槽位等处理,可实施为首先将各终端设备的语音指令识别为文字内容,然后基于得到的文字内容进行用户的意图理解或槽位解析等,用以确定各终端设备在该应用场景下的角色和意图。例如,第一服务端设备一方面可以基于来自手机的第一语音请求信息和来自平板的第二语音请求信息,可以确定手机和平板处于同一应用场景下多个终端设备接收相同的语音指令;基于此,第一服务端设备可以继续根据如“在平板上登录微信账号”的语音指令的语义分析,可以确定在该应用场景下手机作为“源设备”角色、平板作为“目标设备”角色以及意图为“登录微信账号”。其中,此时第一服务端设备例如可以为语音业务服务器,主要用于对各终端设备上传的语音请求信息进行处理。
进一步的,第一服务端设备可以根据确定的各终端设备在相关时的角色和意图,对应分别生成对第一终端设备的第一控制指令和/或对应第二终端设备的第二控制指令。
步骤604a、第一服务端设备生成并向所述第一终端设备发送所述第一控制指令,所述第一控制指令用于执行所述第一语音指令的相关操作;所述第一语音指令的相关操作用于与所述第二终端设备进行业务的协同处理。示例性的,第一语音指令的相关操作可以为第一服务端设备对第一语音指令的语音分析结果确定的。例如,在需要登录微信账号的应用场景下,第一服务端设备还可以从各终端设备的语音请求信息中,还可获取到各终端设备上的终端设备状态信息(即微信账号的登录状态),如图4中示出的手机上登录有微信账号,平板上未登录有微信账号,则第一语音指令的相关操作可以为生成指示手机向账号服务器发起授权请求的第一控制指令和指示平板向账号服务器发起登录请求的第二控制指令。
步骤604b、第一服务端设备生成并向所述第二终端设备发送所述第二控制指令,所述第二控制指令用于执行所述第二语音指令的相关操作;所述第二语音指令的相关操作用于与所述第一终端设备进行业务的协同处理。同理,第二语音指令的相关操作可以为第一服务端设备对第二语音指令的语音分析结果确定的。又例如,在需要进行手机投屏到电视的应用场景下,第一服务端设备可以根据语音指令的语义分析结果,向手机发送第一控制指令;其中,第一控制指令中可以包含电视的设备标识,比如接入地址(例如,接入地址可以为MAC地址或者IP地址等)。此时,第一语音指令的相关操作可以为指示手机根据电视的接入地址接入电视。
这样,通过第一服务端设备基于各终端设备的语音请求信息,可以由第一服务端设备控制各终端设备协同实现业务处理,从而可以降低用户的操作繁琐度。
可以理解,根据步骤603可以确定需要执行步骤604a、或执行步骤604b、或者同时执行步骤604a和步骤604b。
此外,在所述第一终端设备或所述第二终端设备上登录指定平台的应用场景下,可选的,若第一服务端设备指示第一终端设备和第二终端设备需要与第二服务端设备进行交互,则还可以生成唯一标识所述第一终端设备和所述第二终端设备的第一标识码(例如语音指纹),例如,可以通过唯一标识符(universally unique identifier,UUID)算法来实现。参阅图7,为本申请实施例提供的一种多设备的语音控制方法的流程示意图。
步骤701a、第一服务端设备获取第一终端设备的第一语音指令。
步骤701b、第一服务端设备获取第二终端设备的第二语音指令。
其中,步骤701a可以是基于以上步骤602a中接收到的第一语音请求信息之后得到的,步骤701b可以是基于以上步骤602b中接收到的第二语音请求信息之后得到的。
步骤702、第一服务端设备判断第一终端设备和第二终端设备是否相关。可实施为,通过判断第一终端设备的第一语音指令和第二终端设备的第二语音指令属于相同的语音输入(如语音指令、时间戳信息和声纹信息等的相似度分别大于指定阈值),可以确定相关,则继续执行步骤703。
步骤703、第一服务端设备生成第一标识码,所述第一标识码用于标识所述第一终端设备和所述第二终端设备相关。
本申请实施时,第一服务端设备可以通过第一控制指令和第二控制指令携带所述第一标识码。以及,第一终端设备和第二终端设备分别与另一服务器交互时,可以携带所述第一标识码。
其中,第一终端设备和第二终端设备分别与另一服务器交互,可实施为第一终端设备根据所述第一控制指令,向第二服务端设备(例如账号服务器等)发送第一请求指令,其中,所述第一请求指令用于请求授权对指定平台的登录(例如可以为授权请求);第二终端设备根据所述第二控制指令,向第二服务端设备发送第二请求指令,其中,所述第二请求指令用于请求登录指定平台(例如可以为登录请求)。例如,如图4所示的场景,手机可以根据第一控制指令,向账号服务器发送授权请求时携带所述第一标识码;平板可以根据第二控制指令,向账号服务器发送登录请求时同样携带所述第一标识码。
这样,账号服务器可以基于接收到的手机和平板的第一标识码,执行以下处理中的至少一种:向所述第一终端设备发送第一应答指令,向所述第二终端设备发送第二应答指令;其中,所述第二应答指令用于指示所述第一终端设备授权所述第二终端设备登录所述指定平台;例如,向平板发送第二应答指令,以确定手机的授权请求用于响应平板的登录请求。
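上述第一标识码的生成与按标识码匹配请求的过程,可以用如下Python示意代码说明;其中以UUID生成标识码与前述描述一致,而账号服务器以字典记录待匹配请求的方式为假设的简化实现:

```python
import uuid

def issue_correlation_code():
    """语音业务服务器为一组相关的终端设备生成第一标识码(UUID,示意)"""
    return str(uuid.uuid4())

class AccountServer:
    """示意性的账号服务器:按第一标识码匹配授权请求与登录请求"""
    def __init__(self):
        self.pending = {}  # 第一标识码 -> 已收到的请求类型集合

    def receive(self, code, kind):
        """接收携带第一标识码的请求;当授权请求与登录请求均到达时返回True,
        表示可以向未登录设备发送第二应答指令(授权其登录)"""
        self.pending.setdefault(code, set()).add(kind)
        return self.pending[code] == {"authorize", "login"}

code = issue_correlation_code()
server = AccountServer()
first = server.receive(code, "authorize")  # 手机的授权请求先到达,尚未匹配
done = server.receive(code, "login")       # 平板的登录请求到达,匹配成功
```

由于两条请求携带相同的第一标识码,账号服务器无需其他上下文即可将二者关联。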
另一可选的实施例中,在所述第一终端设备或所述第二终端设备上登录指定平台的应用场景下,第一服务端设备还可通过指示第一终端设备将指定平台的用户账号信息发送给第二终端设备,无需第二服务端设备的交互。可实施为,所述第一服务端设备在步骤603之后执行的处理为向第一终端设备发送第一控制指令。可选的,第一服务端设备可以在所述第一控制指令中携带第二终端设备的设备标识,以使第一终端设备可以基于第二终端设备的设备标识将指定平台的用户账号信息发送给所述第二终端设备,其中,设备标识可以为第二终端设备的接入地址(例如MAC地址或者IP地址等)。
在将所述第一终端设备接入所述第二终端设备或将所述第二终端设备接入所述第一终端设备的应用场景下,第一服务端设备在分别接收到手机和电视的语音请求信息之后,由于确定该应用场景的意图为将手机的显示画面投屏到电视上,则服务端设备可以仅向手机发送第一控制指令,而无需向电视发送第二控制指令;其中,该应用场景下向手机发送的第一控制指令可以为指示电视的接入地址。
为便于理解本申请实施例提供的方法,以下分别结合图4和图5示出的应用场景对本申请实施例提供的方法进行详细介绍。
参阅图8a,为本申请实施例提供的一种多设备的语音控制方法的另一交互流程示意图。如图4示出的应用场景,第一终端设备(手机)、第二终端设备(平板)和服务端设备(语音业务服务器和账号服务器)之间的交互流程,包括:
步骤801a、用户通过语音输入第一语音指令给手机。其中,第一语音指令可以为如图8a中示出的“在平板上登录微信账号”。
步骤801b、用户通过语音输入第二语音指令给平板。
步骤802a、手机向语音业务服务器发送第一语音请求信息。其中,第一语音请求信息可以包含但不限于:所述第一语音指令、时间戳信息、声纹信息和终端设备状态信息(如手机上微信账号的登录状态,可以理解如果需要登录其他APP账号,则可以为手机上其他APP账号的登录状态)。
步骤802b、平板向语音业务服务器发送第二语音请求信息。其中,第二语音请求信息可以包含但不限于:所述第二语音指令、时间戳信息、声纹信息和终端设备状态信息(如平板上微信账号的登录状态)。
示例性的,语音业务服务器在接收到多个终端设备发送的多个语音请求信息(包含所述第一语音请求信息和所述第二语音请求信息)之后,可以根据各语音请求信息确定相关的终端设备,也即可以确定手机和平板此次语音请求信息相关。然后,针对相关的每组终端设备,语音业务服务器可以生成对应各终端设备的控制指令,例如对应手机的第一控制指令和/或对应平板的第二控制指令。此外,针对相关的每组终端设备,语音业务服务器还可以生成唯一标识处于该应用场景下的多个终端设备的第一标识码。
步骤803a、语音业务服务器向手机发送第一控制指令。如图4示出的应用场景,基于手机上的微信账号处于已登录状态,第一控制指令可以用于指示手机向(第三方)账号服务器发送授权请求(第一请求指令),也即指示手机通知管理微信账号相关数据的账号服务器可以授权平板的登录。
步骤803b、语音业务服务器向平板发送第二控制指令。如图4示出的应用场景,基于平板上的微信账号处于未登录状态,第二控制指令可以用于指示平板向(第三方)账号服务器发送登录请求(第二请求指令)。
步骤804a、手机根据所述第一控制指令,向(第三方)账号服务器发送授权请求。
步骤804b、平板根据所述第二控制指令,向(第三方)账号服务器发送登录请求。
步骤805、(第三方)账号服务器根据所述授权请求和所述登录请求,授权平板登录(第二应答指令)。其中,手机发送授权请求时可以携带语音业务服务器指示的第一标识码,平板发送登录请求时也可以携带语音业务服务器指示的第一标识码,且两者的第一标识码相同,则账号服务器可以根据第一标识码实现对手机本次授权请求和平板本次登录请求之间的匹配。
参阅图8b,为本申请实施例提供的一种多设备的语音控制方法的另一交互流程示意图。仍如图4示出的应用场景,第一终端设备(手机)、第二终端设备(平板)和服务端设备(语音业务服务器和账号服务器)之间的交互流程,其中,步骤801a至802b与图8a中示出的相同,在此不再赘述,不同的交互流程至少包括:
步骤803、语音业务服务器向手机发送第一控制指令,所述第一控制指令包含平板标识(第二终端设备的设备标识)。
步骤804、手机根据平板标识向平板发送用户账号信息。这样,平板可以根据手机发送的用户账号信息进行账号登录。示例性的,若手机根据所述平板标识确定与平板具有连接状态,则手机可以直接根据对应的连接通道发送所述用户账号信息;其中,所述连接状态可以通过但不限于以下方式中的一种来实现:蓝牙连接、Wi-Fi直连。另一示例性的,若手机与平板不具有连接状态,则手机可以根据所述平板标识指示的平板接入地址,向所述平板发送所述用户账号信息。
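手机根据连接状态选择发送通道的逻辑可以如下示意;其中的通道名称、函数签名与返回形式均为假设,仅用于说明"已有连接则走连接通道,否则按接入地址发送"的选择思路:

```python
def send_account_info(tablet_id, connected_channels, account_info):
    """示意:手机根据与平板的连接状态选择用户账号信息的发送通道"""
    # 若与平板已处于连接状态(如蓝牙连接、Wi-Fi直连),直接经连接通道发送
    for channel in ("bluetooth", "wifi_direct"):
        if channel in connected_channels:
            return (channel, account_info)
    # 否则根据平板标识所指示的接入地址发送
    return ("network:" + tablet_id, account_info)

via, _ = send_account_info("tablet-01", {"bluetooth"}, {"user": "alice"})
via2, _ = send_account_info("tablet-01", set(), {"user": "alice"})
```

两次调用分别对应"已连接"与"未连接"两种状态下的发送路径。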
在以上实现过程中,本申请实施时可以基于用户的语音指令,实现对多设备需要协同进行业务处理场景的识别、对该场景包含的多个终端设备的关联,以及生成与各终端设备相对应的控制指令,从而可以实现基于用户语音的便捷操作。相比于相关技术中,需要用户借助第一终端设备,并通过扫一扫方式,来实现对第二终端设备的授权登录的方式,可以降低用户的操作繁琐度。
参阅图9,为本申请实施例提供的一种多设备的语音控制方法的又一交互流程示意图。如图5示出的应用场景,第一终端设备(手机)、第二终端设备(电视)和服务端设备(语音业务服务器)之间的交互流程,包括:
步骤901a、用户通过语音输入第一语音指令给手机。
步骤901b、用户通过语音输入第二语音指令给电视。其中,第一语音指令和第二语音指令可以为如图9中示出的“把屏幕投到电视上”。
步骤902a、手机向语音业务服务器发送第一语音请求信息。其中,第一语音请求信息可以包含但不限于:所述第一语音指令、时间戳信息、声纹信息。可以理解,在该场景下无需终端设备状态信息,具体实现时语音请求信息可以根据具体场景进行设置。
步骤902b、电视向语音业务服务器发送第二语音请求信息。其中,第二语音请求信息可以包含但不限于:所述第二语音指令、时间戳信息、声纹信息。
示例性的,语音业务服务器根据手机本次语音请求信息和电视本次语音请求信息,可以确定手机和电视本次语音请求信息相关。因此,语音业务服务器通过对第一语音请求信息和第二语音请求信息的分析,可以生成向手机指示电视的接入地址的第一控制指令。
步骤903、语音业务服务器向手机发送第一控制指令。其中,第一控制指令可以包含但不限于:电视的设备标识(如接入地址)。
步骤904、手机根据所述第一控制指令,接入电视。
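上述投屏场景中,第一控制指令携带电视接入地址、手机据此接入电视的过程可以如下示意;其中的指令字段名与示例地址均为假设:

```python
def build_cast_instruction(tv_access_addr):
    """示意:语音业务服务器生成的第一控制指令,携带电视的设备标识(此处假设为IP地址)"""
    return {"action": "start_cast", "target_addr": tv_access_addr}

def phone_handle_instruction(instr, connect):
    """示意:手机收到第一控制指令后,按其中的接入地址连接电视;
    connect为注入的连接函数,便于说明与测试"""
    if instr["action"] == "start_cast":
        return connect(instr["target_addr"])
    return False

instr = build_cast_instruction("192.168.1.20")
# 用一个示意的连接函数模拟"按接入地址接入电视"
ok = phone_handle_instruction(instr, connect=lambda addr: addr == "192.168.1.20")
```

由于接入地址由服务端下发,手机与电视无需预先处于同一局域网或连接状态。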
通过本申请实施例提供的方法,可以基于用户语音实现投屏场景下的便捷操作。相比于相关技术中,需要第一终端设备和第二终端设备连接于相同的局域网,并且需要用户在第一终端设备上手动进行投屏操作,本申请可以降低用户的操作繁琐度;以及,本申请实施时,无需要求第一终端设备和第二终端设备必须连接于相同的局域网内,第一终端设备和第二终端设备可以处于连接状态或未处于连接状态,第一服务端设备可以基于第一终端设备上传的第一语音指令和第二终端设备上传的第二语音指令,进行第一终端设备和第二终端设备是否相关的鉴权,从而可以实现多设备的业务协同处理。
基于以上实施例,本申请还提供一种终端设备,所述终端设备包括多个功能模块;所述多个功能模块相互作用,实现本申请实施例所描述的各方法中第一终端设备或第二终端设备所执行的功能。如执行图6所示实施例中第一终端设备执行的步骤601a,或执行图6所示实施例中第二终端设备执行的步骤601b。所述多个功能模块可以基于软件、硬件或软件和硬件的结合实现,且所述多个功能模块可以基于具体实现进行任意组合或分割。
基于以上实施例,本申请还提供一种终端设备,该终端设备包括至少一个处理器和至少一个存储器,所述至少一个存储器中存储计算机程序指令,所述终端设备运行时,所述至少一个处理器执行本申请实施例所描述的各方法中终端设备所执行的功能。如执行图6所示实施例中第一终端设备执行的步骤601a,或执行图6所示实施例中第二终端设备执行的步骤601b。
基于以上实施例,本申请还提供一种服务端设备,所述服务端设备包括多个功能模块;所述多个功能模块相互作用,实现本申请实施例所描述的各方法中第一服务端设备或第二服务端设备所执行的功能。如执行图6所示实施例中第一服务端设备执行的步骤602a至步骤604b。所述多个功能模块可以基于软件、硬件或软件和硬件的结合实现,且所述多个功能模块可以基于具体实现进行任意组合或分割。
基于以上实施例,本申请还提供一种服务端设备,该服务端设备包括至少一个处理器和至少一个存储器,所述至少一个存储器中存储计算机程序指令,所述服务端设备运行时,所述至少一个处理器执行本申请实施例所描述的各方法中服务端设备所执行的功能。如执行图6所示实施例中第一服务端设备执行的步骤602a至步骤604b。
基于以上实施例,本申请还提供一种多设备的语音控制系统,该系统包括至少两个终端设备、和服务端设备;其中,所述至少两个终端设备进行业务的协同处理。例如,所述至少两个终端设备可以为上述实施例中的第一终端设备和第二终端设备,服务端设备可以为上述实施例中的第一服务端设备和第二服务端设备。
基于以上实施例,本申请还提供一种计算机程序产品,计算机程序产品包括:计算机程序(也可以称为代码,或指令),当计算机程序被运行时,使得计算机执行本申请实施例所描述的各方法。
基于以上实施例,本申请还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当所述计算机程序被计算机执行时,使得所述计算机执行本申请实施例所描述的各方法。
基于以上实施例,本申请还提供了一种芯片,所述芯片用于读取存储器中存储的计算机程序,实现本申请实施例所描述的各方法。
基于以上实施例,本申请提供了一种芯片系统,该芯片系统包括处理器,用于支持计算机装置实现本申请实施例所描述的各方法。在一种可能的设计中,所述芯片系统还包括存储器,所述存储器用于保存该计算机装置必要的程序和数据。该芯片系统,可以由芯片构成,也可以包含芯片和其他分立器件。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的保护范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。

Claims (28)

  1. 一种多设备的语音控制系统,其特征在于,包括:
    第一终端设备接收并响应于用户的第一语音指令,向第一服务端设备上传第一语音请求信息,所述第一语音请求信息包含所述第一语音指令;以及,
    第二终端设备接收并响应于用户的第二语音指令,向所述第一服务端设备上传第二语音请求信息,所述第二语音请求信息包含所述第二语音指令;
    所述第一服务端设备根据所述第一语音指令和所述第二语音指令,若确定所述第一语音请求信息和所述第二语音请求信息之间相关,执行以下处理中的至少一种:
    生成并向所述第一终端设备发送第一控制指令,所述第一控制指令用于执行所述第一语音指令的相关操作;所述第一语音指令的相关操作用于与所述第二终端设备进行业务的协同处理;
    生成并向所述第二终端设备发送第二控制指令,所述第二控制指令用于执行所述第二语音指令的相关操作;所述第二语音指令的相关操作用于与所述第一终端设备进行业务的协同处理。
  2. 根据权利要求1所述的系统,其特征在于,所述第一服务端设备确定所述第一语音请求信息和所述第二语音请求信息之间相关,包括以下方式中的至少一种:
    确定所述第一语音指令和所述第二语音指令相同;
    确定所述第一语音指令和所述第二语音指令之间的相似度大于第一指定阈值;
    确定所述第一语音指令和所述第二语音指令相对应。
  3. 根据权利要求1或2所述的系统,其特征在于,所述第一语音请求信息和所述第二语音请求信息分别还包含以下信息中的至少一种:时间戳信息、声纹信息、终端设备状态信息。
  4. 根据权利要求3所述的系统,其特征在于,所述第一服务端设备确定所述第一语音请求信息和所述第二语音请求信息之间相关,还包括以下方式中的一种或多种:
    确定所述第一终端设备上传的时间戳信息和所述第二终端设备上传的时间戳信息相同、或相似度大于第二指定阈值;
    确定所述第一终端设备上传的声纹信息和所述第二终端设备上传的声纹信息相同、或相似度大于第三指定阈值。
  5. 根据权利要求3所述的系统,其特征在于,所述第一服务端设备根据所述第一语音指令和所述第二语音指令,执行以下处理中的至少一种;包括:
    所述第一服务端设备对所述第一语音指令和所述第二语音指令进行语义分析;以及,所述第一服务端设备根据所述第一终端设备对应的终端设备状态信息和所述第二终端设备对应的终端设备状态信息,确定所述第一终端设备和所述第二终端设备的状态;
    所述第一服务端设备根据所述语义分析的结果、所述第一终端设备和所述第二终端设备的状态,执行所述以下处理中的至少一种。
  6. 根据权利要求1至5中任一所述的系统,其特征在于,若所述第一服务端设备执行的处理为所述生成并向所述第一终端设备发送第一控制指令,则所述第一控制指令中包含所述第二终端设备的设备标识,所述第二终端设备的设备标识用于所述第一终端设备根据所述第二终端设备的设备标识,执行所述第一语音指令的相关操作;或者,
    若所述第一服务端设备执行的处理为所述生成并向所述第二终端设备发送第二控制指令,则所述第二控制指令中包含所述第一终端设备的设备标识,所述第一终端设备的设备标识用于所述第二终端设备根据所述第一终端设备的设备标识,执行所述第二语音指令的相关操作。
  7. 根据权利要求1至6中任一所述的系统,其特征在于,还包括:
    所述第一服务端设备生成第一标识码,所述第一标识码用于标识所述第一终端设备和所述第二终端设备相关。
  8. 根据权利要求7所述的系统,其特征在于,所述系统还包括第二服务端设备,其中:
    所述第一终端设备根据所述第一控制指令,向所述第二服务端设备发送第一请求指令,所述第一控制指令和所述第一请求指令携带所述第一标识码;以及,
    所述第二终端设备根据所述第二控制指令,向所述第二服务端设备发送第二请求指令,所述第二控制指令和所述第二请求指令携带所述第一标识码;
    所述第二服务端设备根据所述第一标识码,执行以下处理中的至少一种:向所述第一终端设备发送第一应答指令,向所述第二终端设备发送第二应答指令。
  9. 根据权利要求8所述的系统,其特征在于,所述第一请求指令用于请求登录指定平台,所述第二请求指令用于请求授权对指定平台的登录;
    所述第二终端设备执行的处理为所述向所述第二终端设备发送第二应答指令,所述第二应答指令用于指示所述第一终端设备授权所述第二终端设备登录所述指定平台。
  10. The system according to any one of claims 1 to 9, wherein the first voice instruction and the second voice instruction are used to indicate any one of the following scenarios: logging in to a specified platform on the first terminal device or the second terminal device, and connecting the first terminal device to the second terminal device or connecting the second terminal device to the first terminal device.
  11. The system according to any one of claims 1 to 10, wherein the first voice instruction and the second voice instruction are based on one and the same voice instruction of the user, and are received by the first terminal device and the second terminal device respectively.
  12. A multi-device voice control method, comprising:
    receiving, by a first terminal device, a first voice instruction of a user;
    uploading, by the first terminal device in response to the first voice instruction, first voice request information to a first server device, the first voice request information containing the first voice instruction;
    receiving, by the first terminal device, a first control instruction sent by the first server device, the first control instruction being used to perform an operation related to the first voice instruction, the operation related to the first voice instruction being used for collaborative processing of a service with a second terminal device;
    wherein the first control instruction is generated by the first server device upon determining that the first voice request information is related to second voice request information uploaded by the second terminal device.
  13. The method according to claim 12, wherein the first voice request information and the second voice request information each further contain at least one of the following: timestamp information, voiceprint information, and terminal device state information.
  14. The method according to claim 12 or 13, wherein the first control instruction contains a device identifier of the second terminal device, the device identifier of the second terminal device being used by the first terminal device to perform the operation related to the first voice instruction according to the device identifier of the second terminal device.
  15. The method according to any one of claims 12 to 14, wherein the first control instruction contains a first identification code, the first identification code being used to identify that the first terminal device and the second terminal device are related.
  16. The method according to claim 15, wherein the method further comprises:
    sending, by the first terminal device according to the first control instruction, a first request instruction to a second server device, the first control instruction and the first request instruction carrying the first identification code, so that the second server device performs, according to the first identification code, at least one of the following processing: sending a first response instruction to the first terminal device, and sending a second response instruction to the second terminal device.
  17. The method according to claim 16, wherein the first request instruction is used to request login to a specified platform, or the first request instruction is used to request authorization of login to a specified platform; and the method further comprises:
    if the first request instruction is used to request authorization of login to the specified platform, receiving, by the first terminal device, a first response instruction, the first response instruction being used to indicate that the second terminal device authorizes the first terminal device to log in to the specified platform.
  18. The method according to any one of claims 12 to 17, wherein the first voice instruction and a second voice instruction contained in the second voice request information are used to indicate any one of the following scenarios: logging in to a specified platform on the first terminal device or the second terminal device, and connecting the first terminal device to the second terminal device or connecting the second terminal device to the first terminal device.
  19. The method according to any one of claims 12 to 17, wherein the first voice instruction and a second voice instruction contained in the second voice request information are based on one and the same voice instruction of the user, and are received by the first terminal device and the second terminal device respectively.
  20. A multi-device voice control method, comprising:
    receiving, by a first server device, first voice request information uploaded by a first terminal device, the first voice request information containing a first voice instruction; and
    receiving, by the first server device, second voice request information uploaded by a second terminal device, the second voice request information containing a second voice instruction;
    performing, by the first server device, upon determining from the first voice instruction and the second voice instruction that the first voice request information and the second voice request information are related, at least one of the following processing:
    generating and sending a first control instruction to the first terminal device, the first control instruction being used to perform an operation related to the first voice instruction, the operation related to the first voice instruction being used for collaborative processing of a service with the second terminal device;
    generating and sending a second control instruction to the second terminal device, the second control instruction being used to perform an operation related to the second voice instruction, the operation related to the second voice instruction being used for collaborative processing of a service with the first terminal device.
  21. The method according to claim 20, wherein the first server device determining that the first voice request information and the second voice request information are related comprises at least one of the following:
    determining that the first voice instruction and the second voice instruction are identical;
    determining that a similarity between the first voice instruction and the second voice instruction is greater than a first specified threshold;
    determining that the first voice instruction and the second voice instruction correspond to each other.
  22. The method according to claim 20 or 21, wherein the first voice request information and the second voice request information each further contain at least one of the following: timestamp information, voiceprint information, and terminal device state information.
  23. The method according to claim 22, wherein the first server device determining that the first voice request information and the second voice request information are related further comprises one or more of the following:
    determining that the timestamp information uploaded by the first terminal device and the timestamp information uploaded by the second terminal device are identical, or that their similarity is greater than a second specified threshold;
    determining that the voiceprint information uploaded by the first terminal device and the voiceprint information uploaded by the second terminal device are identical, or that their similarity is greater than a third specified threshold.
  24. The method according to claim 22, wherein the first server device performing, based on the first voice instruction and the second voice instruction, at least one of the foregoing processing comprises:
    the first server device performing semantic analysis on the first voice instruction and the second voice instruction; and the first server device determining states of the first terminal device and the second terminal device based on the terminal device state information corresponding to the first terminal device and the terminal device state information corresponding to the second terminal device;
    the first server device performing at least one of the foregoing processing based on a result of the semantic analysis and the states of the first terminal device and the second terminal device.
  25. A terminal device, comprising: one or more processors; and one or more memories;
    wherein the one or more memories are configured to store one or more computer programs and data information, the one or more computer programs comprising instructions;
    when the instructions are executed by the one or more processors, the terminal device is caused to perform the method according to any one of claims 12 to 19.
  26. A server device, comprising: one or more processors; and one or more memories;
    wherein the one or more memories are configured to store one or more computer programs and data information, the one or more computer programs comprising instructions;
    when the instructions are executed by the one or more processors, the server device is caused to perform the method according to any one of claims 20 to 24.
  27. A multi-device voice control system, comprising at least two terminal devices according to claim 25 and a server device according to claim 26.
  28. A computer-readable storage medium, wherein the computer-readable storage medium is configured to store a computer program, and when the computer program is run on a computer, the computer is caused to perform the method according to any one of claims 12 to 24.
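As an informal illustration (not part of the claims), the correlation decision described in claims 2–4 and 21–23 can be sketched in Python. All names, threshold values, and data representations below are assumptions for illustration only; the claims specify only that "specified thresholds" exist, not their values or how similarity is computed:

```python
import math
from dataclasses import dataclass, field
from difflib import SequenceMatcher

# Illustrative stand-ins for the first/second/third "specified thresholds"
# of claims 2 and 4; the actual values are not part of the claims.
TEXT_THRESHOLD = 0.8       # first specified threshold (instruction similarity)
TIMESTAMP_WINDOW_S = 2.0   # second specified threshold, as a time window in seconds
VOICEPRINT_THRESHOLD = 0.9 # third specified threshold (voiceprint similarity)

@dataclass
class VoiceRequest:
    """Voice request information uploaded by a terminal device (claim 3)."""
    instruction: str                 # recognized voice-instruction text
    timestamp: float                 # upload time, epoch seconds (assumed encoding)
    voiceprint: tuple = ()           # speaker embedding (assumed representation)
    device_state: dict = field(default_factory=dict)

def _text_similarity(a: str, b: str) -> float:
    # Simple string similarity; a real system would compare ASR results
    # or semantic representations instead.
    return SequenceMatcher(None, a, b).ratio()

def _cosine(a: tuple, b: tuple) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def are_related(req1: VoiceRequest, req2: VoiceRequest) -> bool:
    """Decide whether two voice requests are related (claims 2 and 4)."""
    # Claim 2: identical instructions, or similarity above the first threshold.
    if not (req1.instruction == req2.instruction
            or _text_similarity(req1.instruction, req2.instruction) > TEXT_THRESHOLD):
        return False
    # Claim 4 auxiliary checks: close timestamps and matching voiceprints.
    if abs(req1.timestamp - req2.timestamp) > TIMESTAMP_WINDOW_S:
        return False
    if req1.voiceprint and req2.voiceprint:
        if _cosine(req1.voiceprint, req2.voiceprint) <= VOICEPRINT_THRESHOLD:
            return False
    return True
```

In this sketch, two requests carrying the same instruction text, uploaded within the time window and with matching voiceprints, are judged related, after which the server would generate and send the first and second control instructions of claim 1 or claim 20.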
PCT/CN2023/080568 2022-03-18 2023-03-09 Multi-device voice control system and method WO2023174155A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210272315.6A CN116805488A (zh) 2022-03-18 2022-03-18 Multi-device voice control system and method
CN202210272315.6 2022-03-18

Publications (1)

Publication Number Publication Date
WO2023174155A1 true WO2023174155A1 (zh) 2023-09-21

Family

ID=88022215

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/080568 WO2023174155A1 (zh) 2022-03-18 2023-03-09 Multi-device voice control system and method

Country Status (2)

Country Link
CN (1) CN116805488A (zh)
WO (1) WO2023174155A1 (zh)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190156824A1 (en) * 2017-11-17 2019-05-23 Canon Kabushiki Kaisha Voice control system, control method, and non-transitory computer-readable storage medium storing program
US10366692B1 (en) * 2017-05-15 2019-07-30 Amazon Technologies, Inc. Accessory for a voice-controlled device
CN110322878A (zh) * 2019-07-01 2019-10-11 华为技术有限公司 一种语音控制方法、电子设备及系统
US20190378516A1 (en) * 2018-06-06 2019-12-12 International Business Machines Corporation Operating a voice response system in a multiuser environment
CN110968362A (zh) * 2019-11-18 2020-04-07 北京小米移动软件有限公司 应用运行方法、装置及存储介质
US20200160856A1 (en) * 2018-11-15 2020-05-21 International Business Machines Corporation Collaborative artificial intelligence (ai) voice response system control
CN111341310A (zh) * 2020-02-19 2020-06-26 北京声智科技有限公司 基于智能音箱控制手机的系统、方法、装置和存储介质
CN112102826A (zh) * 2020-08-31 2020-12-18 南京创维信息技术研究院有限公司 一种控制语音设备多端唤醒的系统和方法
CN113127609A (zh) * 2019-12-31 2021-07-16 华为技术有限公司 语音控制方法、装置、服务器、终端设备及存储介质
CN113450792A (zh) * 2021-06-22 2021-09-28 海信视像科技股份有限公司 终端设备的语音控制方法、终端设备及服务器


Also Published As

Publication number Publication date
CN116805488A (zh) 2023-09-26


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23769661

Country of ref document: EP

Kind code of ref document: A1