WO2022161077A1 - Voice control method and electronic device - Google Patents

Voice control method and electronic device

Info

Publication number
WO2022161077A1
WO2022161077A1 (application PCT/CN2021/142083 / CN2021142083W)
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
voice
recording data
recording
audio
Prior art date
Application number
PCT/CN2021/142083
Other languages
English (en)
Chinese (zh)
Inventor
王晓博 (Wang Xiaobo)
许嘉璐 (Xu Jialu)
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022161077A1

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 — Execution procedure of a spoken command
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04M — TELEPHONIC COMMUNICATION
    • H04M 1/00 — Substation equipment, e.g. for use by subscribers
    • H04M 1/72 — Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M 1/724 — User interfaces specially adapted for cordless or mobile telephones
    • H04M 1/72403 — User interfaces with means for local support of applications that increase the functionality
    • H04M 1/7243 — User interfaces with interactive means for internal management of messages
    • H04M 1/72433 — User interfaces for voice messaging, e.g. dictaphones
    • H04M 2250/00 — Details of telephonic subscriber devices
    • H04M 2250/74 — Details of telephonic subscriber devices with voice recognition means
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 — Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/41 — Structure of client; Structure of client peripherals
    • H04N 21/422 — Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N 21/42203 — Input-only peripherals: sound input device, e.g. microphone
    • H04N 21/42204 — User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor

Definitions

  • the present application relates to computer technology, and in particular, to a voice control method and electronic device.
  • Voice assistant: as a new type of terminal application (APP) based on voice-semantics algorithms, a voice assistant provides service functions such as interactive dialogue, information query, and device control by receiving and recognizing the voice signals spoken by users. With the continuous development of deep-learning theory and the maturing of intelligent voice hardware, voice assistant applications have become an essential software function of terminal devices such as smartphones, tablet computers, smart TVs, and smart speakers.
  • the user's living room has three devices: a speaker, a TV, and a mobile phone. All three devices have a voice assistant application installed, and the wake-up word for all of them is "small E, small E".
  • the voice assistant applications of the speaker, the TV, and the mobile phone select one of the three devices as the answering device by detecting the audio energy information of the wake-up word. Since the speaker is closest to the user, the three devices negotiate and select the speaker as the answering device based on the audio energy information of the wake-up word.
  • the speaker wakes up its own voice assistant application, and the other devices do not respond to the wake-up word, that is, they do not wake up their respective voice assistant applications. In this way, after the user continues to speak, only the speaker will recognize and respond to the user's voice signal. For example, after the user speaks the voice signal "play song 112222", the speaker recognizes and responds to it, for example by outputting the voice signal "Song 112222 will be played for you".
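The negotiation in this example can be sketched as follows (the function name, the energy values, and the energy-in-dB convention are all invented for illustration; the patent does not specify the comparison): each device reports the audio energy of the wake-up word it captured, and the device that captured the highest energy, presumably the one closest to the user, is selected as the answering device.

```python
# Hypothetical sketch of answering-device selection by wake-word energy.
def select_answering_device(energy_reports):
    """energy_reports: dict mapping device name -> wake-word energy (dB)."""
    if not energy_reports:
        raise ValueError("no devices reported wake-word energy")
    # Highest captured energy wins; sorting first makes ties deterministic.
    return max(sorted(energy_reports), key=lambda d: energy_reports[d])

# The living-room example: the speaker is closest, so its energy is highest.
reports = {"speaker": -12.0, "tv": -20.5, "phone": -18.3}
assert select_answering_device(reports) == "speaker"
```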
  • the answering device recognizes and responds to the user's voice signal.
  • however, this processing method suffers from misrecognition by the answering device; that is, the voice signal the user inputs after the wake-up word cannot be accurately recognized by the answering device.
  • the present application provides a voice control method and electronic device, so as to solve the problem of misrecognition of voice control in a multi-device scenario and improve the accuracy of voice control.
  • an embodiment of the present application provides a voice control method, which can be applied to a voice control system, and the voice control system can at least include a first electronic device and a second electronic device with a voice control function.
  • the control method may include: the first electronic device and the second electronic device respectively receive a first voice command input by a user, and the first electronic device responds to the first voice command.
  • the second electronic device records and saves the recording data, and the recording is used to record the second voice command input by the user.
  • the second electronic device sends the audio recording data of the second electronic device to the first electronic device.
  • the first electronic device responds to the second voice instruction according to the recorded data of the first electronic device and/or the recorded data of the second electronic device.
  • the recording data of the first electronic device includes recording data of the second voice instruction input by the user recorded by the first electronic device.
  • the recording by the second electronic device may start before the first electronic device responds to the first voice instruction, decoupling the selection of the answering device from the recording process of the electronic devices: regardless of whether the first electronic device is determined to be the answering device, both the first electronic device and the second electronic device can record and save the second voice command input by the user.
  • the recording data of the second electronic device is sent to the first electronic device, and the first electronic device responds to the second voice command.
  • the first electronic device acts as an answering device to answer the first voice command
  • the first electronic device and the second electronic device both record the second voice command and save the recorded data
  • the second electronic device sends its own recorded data
  • the first electronic device responds to the second voice instruction according to the recorded data of the first electronic device and/or the recorded data of the second electronic device.
  • the voice commands input by the user are recorded by the non-responding device
  • the answering device performs SE, ASR, and other processing based on the recorded data of the answering device and/or the recorded data of the non-answering device, effectively eliminating the communication delay between different devices during the selection of the answering device, thereby solving the frame-loss problem of voice control caused by delay in multi-device scenarios.
  • the answering device responds to the second voice command using the recording data collaboratively collected by multiple devices, which can mitigate the influence of the audio quality of the picked-up voice command on the accuracy of ASR recognition and improve the accuracy of voice control.
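The scheme summarized above can be illustrated with a minimal sketch (class and method names are invented; this is not the patent's implementation): every device starts buffering audio as soon as the wake-up word is heard, before the answering device is chosen, so no frames are lost to the negotiation delay, and the answering device then fetches the peer's buffer with a pickup call.

```python
# Illustrative model of collaborative recording across two devices.
class Device:
    def __init__(self, name):
        self.name = name
        self.buffer = []            # recorded audio frames

    def on_wake_word(self):
        self.buffer = []            # start recording immediately on wake-up

    def record(self, frame):
        self.buffer.append(frame)

    def pickup(self):
        """Called by the answering device to fetch this device's recording."""
        return list(self.buffer)

first, second = Device("first"), Device("second")
for dev in (first, second):
    dev.on_wake_word()
# Both devices capture the command while the answering device is negotiated.
for frame in ["play", "song", "112222"]:
    first.record(frame)
    second.record(frame)
# After negotiation, the answering device (first) fetches the peer's data.
assert first.buffer == second.pickup() == ["play", "song", "112222"]
```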
  • the method may further include: the first electronic device invokes a voice pickup instruction to the second electronic device, where the voice pickup instruction is used by the second electronic device to return the recording data of the second electronic device.
  • the recording by the second electronic device may include: recording by the second electronic device when or after the second electronic device receives the first voice instruction input by the user.
  • when the second electronic device receives the first voice command input by the user, or thereafter, the second electronic device records; that is, because the second electronic device starts recording before the answering device is determined, the second electronic device can capture the second voice command input by the user.
  • This can effectively eliminate the communication delay between devices in the process of selecting the answering device, thereby solving the problem of frame loss in voice control caused by delay in multi-device scenarios.
  • the method may further include: when or after the first electronic device receives the first voice instruction input by the user, recording the first electronic device, and the recording is used to record the second voice instruction input by the user.
  • the first voice command is used to wake up the voice control function of the first electronic device and/or the second electronic device.
  • the first voice instruction here may be the voice instruction of step 401 in the following embodiment shown in FIG. 3 .
  • the method may further include: the first electronic device and the second electronic device determine that the first electronic device is the answering device of the voice control system according to the audio quality information of the first voice command as received by the first electronic device and the second electronic device, respectively.
  • the method may further include: during the recording process of the first electronic device and the second electronic device, if the first electronic device does not detect the second voice instruction input by the user within a preset time period, the first electronic device deletes the saved recording data and continues to record.
  • the first electronic device invokes a multi-round dialogue pause command to the second electronic device, where the multi-round dialogue pause command is used to indicate that the multi-round dialogue is temporarily stopped.
  • the second electronic device deletes the saved recording data and continues recording.
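A small sketch of this timeout behavior (the window length and names are assumptions, not values from the patent): when no second command arrives within the preset window, every device discards its stale buffer but keeps recording.

```python
def handle_silence(buffers, elapsed_s, preset_s=8.0):
    """buffers: dict of device name -> list of recorded frames.

    Returns True when the preset window has elapsed and buffers were cleared;
    recording itself is assumed to continue in the background.
    """
    if elapsed_s < preset_s:
        return False                 # still waiting for the second command
    for frames in buffers.values():
        frames.clear()               # delete saved recording data
    return True

bufs = {"first": ["noise"], "second": ["noise"]}
assert handle_silence(bufs, elapsed_s=10.0) is True
assert bufs == {"first": [], "second": []}
```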
  • the first voice instruction here may be the voice instruction before step 701 in the embodiment shown in FIG. 6 below.
  • the second voice instruction here may be the voice instruction of step 703 in the following embodiment shown in FIG. 6 .
  • the method may further include: the first electronic device receiving audio quality information of the recording data of the second electronic device sent by the second electronic device.
  • This implementation can speed up the decision on the optimal audio pickup device, thereby improving the response speed of voice control.
  • the first electronic device responds to the second voice command according to the recorded data of the first electronic device and/or the recorded data of the second electronic device, which may include: the first electronic device determines the optimal audio pickup device in the voice control system according to the audio quality information of the recording data of the first electronic device and the audio quality information of the recording data of the second electronic device.
  • if the optimal audio pickup device is the first electronic device, the first electronic device responds to the second voice command according to the recording data of the first electronic device, or according to the recording data of the first electronic device and the recording data of the second electronic device.
  • if the optimal audio pickup device is the second electronic device, the first electronic device responds to the second voice command according to the recording data of the second electronic device, or according to the recording data of the second electronic device and the recording data of the first electronic device.
  • the audio quality information is used to indicate the audio quality of the recording data.
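One plausible way to realize this selection (the tuple layout and the use of an SNR-like quality score are assumptions, not details from the patent): each device attaches a quality score to its recording, and the answering device responds from the recording with the best score.

```python
def choose_recording(recordings):
    """recordings: list of (device_name, frames, quality_score).

    Returns the name and frames of the best-scoring recording.
    """
    best = max(recordings, key=lambda r: r[2])
    return best[0], best[1]

# The second device is assumed to have picked up the command more cleanly.
recs = [("first", [0.1, 0.2], 12.0), ("second", [0.1, 0.2], 21.5)]
device, frames = choose_recording(recs)
assert device == "second"
```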
  • the first electronic device responds to the second voice command according to the recorded data of the first electronic device and/or the recorded data of the second electronic device, which may include: the first electronic device responds to the second voice instruction according to the audio content information of the recording data of the first electronic device and/or the audio content information of the recording data of the second electronic device.
  • the audio content information is used to represent the audio content of the recording data.
  • for example, the second voice command is responded to according to the audio content information of the recording data of the first electronic device.
  • if the audio content information of the recording data of the first electronic device is less than the audio content information of the recording data of the second electronic device, the second voice command is responded to according to the audio content information of the recording data of the second electronic device.
  • alternatively, the first electronic device can splice the audio content information of the recording data of the first electronic device with the audio content information of the recording data of the second electronic device, and respond to the second voice command according to the spliced audio content information.
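The patent does not specify the splicing algorithm; one simple possibility, shown here as an illustration only, is to merge the two partial transcripts on their longest overlap, so a device that missed the head of the command and one that missed the tail together reconstruct the whole command.

```python
def splice(a, b):
    """Merge transcript b onto transcript a at their longest head/tail overlap."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:          # b's head matches a's tail
            return a + b[k:]
    return a + b                     # no overlap: simple concatenation

# One device heard the head, the other the tail of "play song 112222".
assert splice("play song 11", "song 112222") == "play song 112222"
```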
  • an embodiment of the present application provides a voice control method, which can be applied to a first electronic device of a voice control system; the voice control system can also include at least a second electronic device. The voice control method can include: the first electronic device receives the first voice command input by the user, and the first electronic device responds to the first voice command.
  • the first electronic device receives the recording data of the second electronic device sent by the second electronic device, where the recording data of the second electronic device includes recording data of the second voice instruction input by the user as recorded by the second electronic device.
  • the first electronic device responds to the second voice command according to the recorded data of the first electronic device and/or the recorded data of the second electronic device, where the recorded data of the first electronic device includes recording data of the second voice command input by the user as recorded by the first electronic device.
  • the method may further include: the first electronic device invokes a voice pickup instruction to the second electronic device, and the voice pickup instruction is used for the second electronic device to return the recording data of the second electronic device.
  • the method may further include: when or after the first electronic device receives the first voice instruction input by the user, recording the first electronic device for recording the second voice instruction input by the user.
  • the first voice command is used to wake up the voice control function of the first electronic device and/or the second electronic device.
  • the method may further include: the first electronic device determines, according to the audio quality information of the first voice command received by the first electronic device and the audio quality information of the first voice command received by the second electronic device, that the first electronic device is the answering device of the voice control system.
  • the method may further include: during the recording process of the first electronic device, if the first electronic device does not detect the second voice command input by the user within a preset time period, the first electronic device deletes the saved recording data and continues to record; the first electronic device invokes a multi-round dialogue pause instruction to the second electronic device, where the multi-round dialogue pause instruction is used to indicate that the multi-round dialogue is temporarily stopped; and the second electronic device deletes the saved recording data and continues recording.
  • the method may further include: the first electronic device receiving audio quality information of the recording data of the second electronic device sent by the second electronic device.
  • the first electronic device responds to the second voice command according to the recorded data of the first electronic device and/or the recorded data of the second electronic device, which may include: the first electronic device determines the optimal audio pickup device in the voice control system according to the audio quality information of the recording data of the first electronic device and the audio quality information of the recording data of the second electronic device.
  • if the optimal audio pickup device is the first electronic device, the first electronic device responds to the second voice command according to the recording data of the first electronic device.
  • if the optimal audio pickup device is the second electronic device, the first electronic device responds to the second voice command according to the recording data of the second electronic device, or according to the recording data of the second electronic device and the recording data of the first electronic device.
  • the audio quality information is used to indicate the audio quality of the recording data.
  • the first electronic device responds to the second voice command according to the recorded data of the first electronic device and/or the recorded data of the second electronic device, which may include: the first electronic device responds to the second voice instruction according to the audio content information of the recording data of the first electronic device and/or the audio content information of the recording data of the second electronic device.
  • the audio content information is used to represent the audio content of the recording data.
  • an embodiment of the present application provides a voice control method, which can be applied to a second electronic device of a voice control system; the voice control system can also include at least a first electronic device. The voice control method can include:
  • the second electronic device records and saves the recording data, and the recording is used to record the second voice command input by the user.
  • the second electronic device sends the recording data of the second electronic device to the first electronic device, where the recording data of the second electronic device includes recording data of the second voice command input by the user as recorded by the second electronic device, and the recording data is used by the first electronic device to respond to the second voice instruction after responding to the first voice instruction.
  • the method may further include: the second electronic device receives a voice pickup instruction called by the first electronic device, and the voice pickup instruction is used for the second electronic device to return the recording data of the second electronic device.
  • the recording by the second electronic device may include: recording by the second electronic device when or after the second electronic device receives the first voice instruction input by the user.
  • the method may further include: the second electronic device determines, according to the audio quality information of the first voice command received by the second electronic device and the audio quality information of the first voice command received by the first electronic device, that the first electronic device is the answering device of the voice control system.
  • the method may further include: during the recording process of the second electronic device, the second electronic device receives the multi-round dialogue pause command invoked by the first electronic device, where the multi-round dialogue pause command is used to indicate that the multi-round dialogue is temporarily stopped; the second electronic device deletes the saved recording data and continues recording.
  • the method may further include: the second electronic device sends audio quality information of the recording data of the second electronic device to the first electronic device.
  • an embodiment of the present application provides a voice control device, the device has the function of implementing the second aspect or any possible design of the second aspect.
  • the functions can be implemented by hardware, or can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions, for example, a transceiver unit or module, and a processing unit or module.
  • an embodiment of the present application provides a voice control device, the device has a function of implementing the third aspect or any possible design of the third aspect.
  • the functions can be implemented by hardware, or can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions, for example, a transceiver unit or module, and a processing unit or module.
  • an embodiment of the present application provides an electronic device, which may include: one or more processors; one or more memories; wherein the one or more memories are used to store one or more programs ; the one or more processors are configured to run the one or more programs to implement the method according to the second aspect or any possible design of the second aspect.
  • an embodiment of the present application provides an electronic device, which may include: one or more processors; one or more memories; wherein the one or more memories are used to store one or more programs ; the one or more processors are configured to run the one or more programs to implement the method according to the third aspect or any possible design of the third aspect.
  • an embodiment of the present application provides a computer-readable storage medium that includes a computer program which, when executed on a computer, causes the computer to execute the method described in the second aspect or any possible design of the second aspect.
  • an embodiment of the present application provides a computer-readable storage medium that includes a computer program which, when executed on a computer, causes the computer to execute the method described in the third aspect or any possible design of the third aspect.
  • an embodiment of the present application provides a chip that includes a processor and a memory, where the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory, to perform the method described in the second aspect or any possible design of the second aspect.
  • an embodiment of the present application provides a chip, characterized in that it includes a processor and a memory, the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory, to perform the method described in the third aspect or any possible design of the third aspect.
  • embodiments of the present application provide a computer program product, which, when the computer program product runs on a computer, causes the computer to execute the method described in the second aspect or any possible design of the second aspect.
  • the embodiments of the present application provide a computer program product, which, when the computer program product runs on a computer, causes the computer to execute the method described in the third aspect or any possible design of the third aspect.
  • an embodiment of the present application provides a voice control system, where the voice control system includes at least a first electronic device and a second electronic device having a voice control function.
  • the first electronic device is adapted to perform the method as described in the second aspect or any possible design of the second aspect.
  • the second electronic device is configured to perform the method as described in the third aspect or any possible design of the third aspect.
  • the voice control method and electronic device of the embodiments of the present application solve the frame-loss problem of voice control in multi-device scenarios by having multiple devices record directly, without first performing cross-device communication, and thereby improve the accuracy of voice control. After that, responding to the voice command input by the user with the recording data collaboratively collected by multiple devices can effectively mitigate the influence of the audio quality of the picked-up voice command on the accuracy of ASR recognition, further improving the accuracy of voice control.
  • FIG. 1 is a schematic diagram of a voice control system provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a voice control method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a scenario of multi-device voice control provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of another multi-device voice control scenario provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of another voice control method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of another multi-device voice control scenario provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a voice control device according to an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a voice control apparatus provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • "At least one (item)" refers to one or more, and "a plurality" refers to two or more.
  • "And/or" describes the relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
  • the character "/" generally indicates that the associated objects are in an "or" relationship.
  • "At least one of the following item(s)" or similar expressions refer to any combination of these items, including any combination of a single item or plural items.
  • for example, at least one of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple.
  • Voice assistant: an application program built on artificial intelligence that, with the help of speech-semantics recognition algorithms and through instant question-and-answer voice interaction with users, helps users complete operations such as information query, device control, and text input.
  • Voice assistants usually use staged cascade processing, providing service functions through basic workflow stages such as voice wake-up, voice front-end processing, automatic speech recognition (ASR), natural language understanding (NLU), dialogue management (DM), natural language generation (NLG), and text-to-speech (TTS).
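The staged cascade above can be sketched as a chain of stages (every stage body here is a stand-in stub, not a real SE/ASR/NLU implementation), just to show how each stage feeds the next:

```python
def speech_enhance(audio):
    """SE stub: pretend denoising by trimming silence at the edges."""
    return audio.strip()

def asr(audio):
    """ASR stub: the 'audio' is already text in this toy model."""
    return audio

def nlu(text):
    """NLU stub: first word as intent, the rest as a slot."""
    verb, _, rest = text.partition(" ")
    return {"intent": verb, "slots": rest}

def pipeline(audio):
    # Wake-up and the DM/NLG/TTS stages are omitted for brevity.
    return nlu(asr(speech_enhance(audio)))

assert pipeline("  play song 112222 ") == {"intent": "play", "slots": "song 112222"}
```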
  • the voice front-end processing may include but is not limited to voice enhancement (speech enhancement, SE).
  • ASR can take the speech signal after SE noise reduction as input and output a textual transcription of the user's speech signal.
  • ASR is the basis for voice assistant applications to accurately complete subsequent recognition processing tasks.
  • the audio quality of the user's voice signal input to the ASR directly determines the accuracy of the ASR recognition result.
  • the voice control method of the embodiment of the present application can ensure the accuracy and reliability of the user voice signal input to the ASR, thereby improving the accuracy of the ASR recognition result, and then accurately completing the subsequent recognition processing task.
  • Voice wake-up: when the screen is locked or the voice assistant is dormant, the electronic device receives and detects a specific user voice signal (i.e., a wake-up word), activates or starts the voice assistant, and puts the voice assistant into a state of waiting for voice signal input.
  • AEC: acoustic echo cancellation.
  • Answering device: in the multi-device voice control process, multiple electronic devices select an answering device through mutual communication and negotiation, and the answering device recognizes and responds to the user's voice signal.
  • Audio quality: due to the diversity and complexity of usage scenarios, user voice commands picked up and processed by electronic devices are inevitably disturbed by various external and internal noises.
  • the interference of noise will affect the audio quality of the user's voice command picked up by the electronic device.
  • The external noise may be, for example, air-conditioner fans or unrelated human voices around the device.
  • The internal noise may be the audio or video played by the electronic device itself.
  • the distance and orientation between the electronic device and the user, as well as the posture of the electronic device and the performance of the microphone module, etc., will also affect the audio quality of the user's voice commands picked up by the electronic device.
  • If the audio quality of the user's voice command picked up by the electronic device is poor, misrecognition will occur.
  • In addition, the communication delay caused by cross-device communication among the multiple electronic devices, and the delay caused by selecting the answering device, will cause frame loss, which in turn leads to misrecognition.
  • For example, the above delay may cause the user to say the voice signal "play song 112222" while the answering device recognizes only the voice signal "2222"; that is, the segment "play song 11" is not received and recognized, which makes the answering device unable to accurately recognize and respond to the user's voice command.
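The effect of such a delay can be illustrated with a toy frame model (the frame contents and delay here are hypothetical, chosen only to mirror the "play song 112222" example; this is not the patented mechanism itself):

```python
from collections import deque

# Hypothetical audio frames of the command "play song 112222".
frames = ["play", "song", "11", "2222"]
NEGOTIATION_DELAY = 3  # frames that elapse while devices negotiate

# A device that starts consuming audio only after negotiation finishes
# loses the leading frames of the command.
heard_late = frames[NEGOTIATION_DELAY:]

# A device that records continuously into a ring buffer from wake-up
# keeps every frame, regardless of the negotiation delay.
ring = deque(maxlen=16)
for frame in frames:
    ring.append(frame)
heard_buffered = list(ring)

print(" ".join(heard_late))      # only the tail "2222" survives the delay
print(" ".join(heard_buffered))  # the full command is preserved
```

This is why the method below has each device start recording immediately, without waiting for cross-device communication.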
  • the voice control method of the embodiment of the present application can improve the audio quality and/or reduce the time delay, so as to solve the problem of misrecognition of voice commands in the process of multi-device voice control.
  • The delay caused by realizing multi-device wake-up and data transmission through communication is eliminated, thereby eliminating the impact of that delay on ASR recognition accuracy, solving the frame-loss problem of voice control in multi-device scenarios, and improving the accuracy of voice control.
  • Optimal sound-receiving device: based on the recording data of the optimal sound-receiving device, the voice command input by the user is responded to.
  • In this way, the impact of the audio quality of voice commands picked up by electronic devices on the accuracy of ASR recognition can be mitigated, and the accuracy of voice control can be improved.
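Selecting the optimal sound-receiving device from several devices' recording data can be sketched with a simple audio-quality proxy. The device names, sample values, and the RMS-energy criterion below are illustrative assumptions; a real system could use SNR or other quality metrics:

```python
import math

# Hypothetical per-device recording data: device name -> audio samples.
recordings = {
    "phone":   [0.02, -0.01, 0.03, -0.02],   # far from the speaker
    "speaker": [0.40, -0.35, 0.50, -0.45],   # closest to the user
    "tv":      [0.10, -0.08, 0.12, -0.09],
}

def rms_energy(samples):
    # Root-mean-square energy as a simple audio-quality proxy.
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Select the device whose recording has the highest energy.
best = max(recordings, key=lambda d: rms_energy(recordings[d]))
print(best)  # → speaker
```

The recording data of `best` would then be fed to ASR, so that the cleanest available signal determines the recognition result.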
  • the voice control method in the embodiment of the present application can be applied to a multi-device scenario.
  • the multi-device scenario may include a scenario where a user uses multiple electronic devices concurrently, or a scenario where user voice interaction occurs within the effective working range of multiple electronic devices.
  • each of the plurality of electronic devices has a voice control function.
  • This voice control function may be provided by a voice assistant.
  • The method of this embodiment can ensure the accuracy and reliability of the voice command input to the ASR, thereby improving the accuracy of the ASR recognition result and allowing the subsequent recognition and processing tasks, and the response to the voice command, to be completed accurately. This makes the electronic device more intelligent, realizes efficient and accurate interaction between the electronic device and the user, and at the same time improves the user experience.
  • the voice command in the embodiment of the present application refers to the command input by the user to the electronic device in the form of sound.
  • the voice command is used to enable the electronic device to provide the user with service functions such as interactive dialogue, information query, and device control.
  • The voice instruction may be a segment of voice signal input by the user through the microphone of the electronic device.
  • a voice assistant may be installed in the electronic device to enable the electronic device to implement a voice control function.
  • Voice assistants are generally dormant by default; the user can wake up the voice assistant by voice before using the voice control function of the electronic device.
  • the voice signal to wake up the voice assistant may be called a wake-up word (or wake-up voice).
  • the wake word may be pre-registered in the electronic device.
  • the wake-up word may be "small E, small E".
  • the wake-up word may also be any other word or statement, which can be flexibly set according to requirements, and the embodiments of the present application will not illustrate them one by one.
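Matching a detected utterance against the pre-registered wake-up word can be sketched as a normalized comparison. This is a simplified stand-in for real acoustic wake-word detection, which operates on audio features rather than text:

```python
def matches_wake_word(transcript: str, registered: str = "small E, small E") -> bool:
    # Case- and punctuation-insensitive comparison against the
    # pre-registered wake-up word (hypothetical simplification of
    # acoustic wake-word detection).
    def normalize(s: str) -> str:
        return "".join(ch.lower() for ch in s if ch.isalnum())
    return normalize(transcript) == normalize(registered)

print(matches_wake_word("Small E, small E"))   # → True
print(matches_wake_word("play song 112222"))   # → False
```

Only when the match succeeds would the voice assistant leave its dormant state and begin waiting for voice signal input.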
  • the above-mentioned voice assistant may be an embedded application in the electronic device (ie, a system application of the electronic device), or may be a downloadable application.
  • Embedded applications are applications provided as part of the implementation of an electronic device such as a cell phone.
  • a downloadable application is an application that can provide its own internet protocol multimedia subsystem (IMS) connection.
  • the downloadable application may be pre-installed in the electronic device, or may be a third-party application downloaded by the user and installed in the electronic device.
  • FIG. 1 is a schematic diagram of a voice control system according to an embodiment of the present application.
  • The voice control system may include multiple electronic devices that meet one or more of the following conditions: connected to the same wireless access point (such as a WiFi access point), logged into the same account, set by the user in the same group, or located such that the user's voice interaction occurs within the effective working range of the multiple electronic devices.
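The grouping conditions above can be expressed as a simple predicate over device records. The field names below are hypothetical, chosen only to illustrate the "any one condition suffices" logic:

```python
def in_same_voice_group(dev_a: dict, dev_b: dict) -> bool:
    # Two devices belong to one voice-control group if any condition
    # holds: same wireless access point, same logged-in account, or
    # the same user-configured group. (Field names are hypothetical.)
    return (
        dev_a["access_point"] == dev_b["access_point"]
        or dev_a["account"] == dev_b["account"]
        or dev_a["group"] == dev_b["group"]
    )

phone = {"access_point": "home-wifi", "account": "user1", "group": "livingroom"}
tv    = {"access_point": "home-wifi", "account": "user2", "group": "bedroom"}
print(in_same_voice_group(phone, tv))  # → True: same WiFi access point
```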
  • the voice control system may include three electronic devices, for example, a first electronic device 201 , a second electronic device 202 and a third electronic device 203 .
  • the first electronic device 201 , the second electronic device 202 and the third electronic device 203 all have a voice control function, for example, a voice assistant is installed.
  • The first electronic device 201, the second electronic device 202, and the third electronic device 203 can wake up the voice assistant with the same wake-up word, for example, "small E, small E".
  • The electronic devices described in the embodiments of the present application may be mobile phones, tablet computers, desktop computers, laptop computers, handheld computers, ultra-mobile personal computers (UMPCs), netbooks, cellular phones, personal digital assistants (PDAs), augmented reality (AR)/virtual reality (VR) devices, media players, televisions, smart speakers, smart watches, smart headphones, and other devices.
  • the specific form of the electronic device is not particularly limited in the embodiments of the present application.
  • The first electronic device 201, the second electronic device 202, and the third electronic device 203 can be the same type of electronic device; for example, all three devices may be mobile phones.
  • The first electronic device 201, the second electronic device 202, and the third electronic device 203 can also be different types of electronic devices; for example, the first electronic device 201 is a mobile phone, the second electronic device 202 is a smart speaker, and the third electronic device 203 is a television (as shown in FIG. 1).
  • The first electronic device 201, the second electronic device 202, and the third electronic device 203 directly start recording without cross-device communication, so as to solve the frame-loss problem of voice control in a multi-device scenario and improve the accuracy of voice control.
  • The first electronic device 201, the second electronic device 202, and the third electronic device 203 can each record without being called by another device (e.g., a central device), thus realizing a decentralized recording method.
  • This decentralized recording method does not need to perform the process of selecting a device as the calling device, which can effectively eliminate the delay caused by communication between devices and improve the accuracy of subsequent voice control.
  • Then, one or more electronic devices are selected as the optimal sound-receiving device, and the voice command input by the user is responded to based on the recording data of the optimal sound-receiving device.
  • The embodiment of the present application can thus mitigate, by means of multi-device cooperative audio collection, the impact of the audio quality of picked-up voice commands on the accuracy of ASR recognition.
  • the voice control system may also include a server 204 .
  • the server 204 can provide intelligent voice services.
  • FIG. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • The electronic device may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone interface 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
  • the structure illustrated in this embodiment does not constitute a specific limitation on the electronic device.
  • the electronic device may include more or fewer components than shown, or some components may be combined, or some components may be split, or a different arrangement of components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • The processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • a controller can be the nerve center and command center of an electronic device.
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby increasing the efficiency of the system.
  • the processor 110 may include one or more interfaces.
  • The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130 .
  • the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device. While the charging management module 140 charges the battery 142 , it can also supply power to the electronic device through the power management module 141 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, battery health status (leakage, impedance).
  • the power management module 141 may also be provided in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in an electronic device can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 can provide a wireless communication solution including 2G/3G/4G/5G etc. applied on the electronic device.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
  • The wireless communication module 160 can provide wireless communication solutions applied on the electronic device, including wireless local area network (WLAN) (such as wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , perform frequency modulation on it, amplify the signal, and convert it into electromagnetic waves for radiation through the antenna 2 .
  • the wireless communication module 160 may interact with other electronic devices, for example, after detecting a voice signal matching the wake-up word, send energy information of the detected voice signal to other electronic devices.
  • the electronic device in this embodiment of the present application may communicate with other electronic devices through the mobile communication module 150 and/or the wireless communication module 160 .
  • For example, the first electronic device 201 sends a voice pickup instruction and the like to the second electronic device 202 through the mobile communication module 150 and/or the wireless communication module 160.
  • the antenna 1 of the electronic device is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device can communicate with the network and other devices through wireless communication technology.
  • The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
  • the electronic device realizes the display function through the GPU, the display screen 194, and the application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Miniled, a MicroLed, a Micro-oLed, a quantum dot light-emitting diode (QLED), or the like.
  • the electronic device may include 1 or N display screens 194 , where N is a positive integer greater than 1.
  • the electronic device can realize the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194 and the application processor.
  • the ISP is used to process the data fed back by the camera 193 .
  • When the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, converting it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin tone.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object is projected through the lens to generate an optical image onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • a digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device selects the frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy, etc.
  • Video codecs are used to compress or decompress digital video.
  • An electronic device may support one or more video codecs.
  • the electronic device can play or record videos in various encoding formats, such as: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Through the NPU, applications such as intelligent cognition of the electronic device can be realized, for example, image recognition, face recognition, speech recognition, and text understanding.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device.
  • The external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function, for example, saving files such as music and videos in the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device by executing the instructions stored in the internal memory 121 .
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area can store data (such as audio data, phone book, etc.) created during the use of the electronic device.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the electronic device can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone jack 170D, and the application processor. Such as music playback, recording, etc.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • The speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device can listen to music through the speaker 170A, or listen to a hands-free call.
  • The receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be received by placing the receiver 170B close to the human ear.
  • The microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
  • When making a sound, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C.
  • the electronic device may be provided with at least one microphone 170C.
  • the electronic device may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals.
  • the electronic device may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the electronic device in this embodiment of the present application may receive a voice instruction input by the user through the microphone 170C.
  • the earphone jack 170D is used to connect wired earphones.
  • The earphone interface 170D may be the USB interface 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
  • the pressure sensor 180A may be provided on the display screen 194 .
  • the capacitive pressure sensor may be comprised of at least two parallel plates of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes. The electronic device determines the intensity of the pressure based on the change in capacitance. When a touch operation acts on the display screen 194, the electronic device detects the intensity of the touch operation according to the pressure sensor 180A. The electronic device can also calculate the touched position according to the detection signal of the pressure sensor 180A.
  • touch operations acting on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than the first pressure threshold acts on the short message application icon, the instruction for viewing the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, the instruction to create a new short message is executed.
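The two-threshold behavior on the short-message icon can be sketched as a simple intensity dispatch (the threshold value is a hypothetical normalized figure, not one specified by the embodiment):

```python
FIRST_PRESSURE_THRESHOLD = 0.5  # hypothetical normalized touch intensity

def sms_icon_action(pressure: float) -> str:
    # Map touch intensity on the short-message icon to an instruction,
    # mirroring the two-threshold behavior described above.
    if pressure < FIRST_PRESSURE_THRESHOLD:
        return "view short message"
    return "create new short message"

print(sms_icon_action(0.2))  # → view short message
print(sms_icon_action(0.8))  # → create new short message
```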
  • the gyro sensor 180B can be used to determine the motion attitude of the electronic device. In some embodiments, the angular velocity of the electronic device about three axes (ie, the x, y, and z axes) may be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization. Exemplarily, when the shutter is pressed, the gyro sensor 180B detects the shaking angle of the electronic device, calculates the distance to be compensated by the lens module according to the angle, and allows the lens to counteract the shaking of the electronic device through reverse motion to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenarios.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device calculates the altitude from the air pressure value measured by the air pressure sensor 180C to assist in positioning and navigation.
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device can use the magnetic sensor 180D to detect the opening and closing of the flip holster.
  • the electronic device can detect the opening and closing of the flip according to the magnetic sensor 180D. Further, according to the detected opening and closing state of the leather case or the opening and closing state of the flip cover, characteristics such as automatic unlocking of the flip cover are set.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device in various directions (generally three axes).
  • the magnitude and direction of gravity can be detected when the electronic device is stationary. It can also be used to identify the posture of electronic devices, and can be used in applications such as horizontal and vertical screen switching, pedometers, etc.
  • Distance sensor 180F for measuring distance.
  • Electronic devices can measure distances by infrared or laser. In some embodiments, when shooting a scene, the electronic device can use the distance sensor 180F to measure the distance to achieve fast focusing.
  • Proximity light sensor 180G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the light emitting diodes may be infrared light emitting diodes.
  • Electronic devices emit infrared light outward through light-emitting diodes.
  • Electronic devices use photodiodes to detect reflected infrared light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object in the vicinity of the electronic device. When insufficient reflected light is detected, the electronic device can determine that there is no object in the vicinity of the electronic device.
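The reflected-light decision above reduces to a threshold test (the threshold value and its units here are hypothetical, for illustration only):

```python
REFLECTION_THRESHOLD = 100  # hypothetical sensor reading

def object_nearby(reflected_light: int) -> bool:
    # Sufficient reflected infrared light -> an object is near the device.
    return reflected_light >= REFLECTION_THRESHOLD

print(object_nearby(150))  # → True, e.g. phone held close to the ear
print(object_nearby(10))   # → False, nothing nearby
```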
  • the electronic device can use the proximity light sensor 180G to detect that the user holds the electronic device close to the ear to talk, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180L is used to sense ambient light brightness.
  • the electronic device can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device is in the pocket to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints. Electronic devices can use the collected fingerprint characteristics to unlock fingerprints, access application locks, take photos with fingerprints, and answer incoming calls with fingerprints.
  • the temperature sensor 180J is used to detect the temperature.
  • the electronic device utilizes the temperature detected by the temperature sensor 180J to implement a temperature handling strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold value, the electronic device may reduce the performance of the processor located near the temperature sensor 180J in order to reduce power consumption and implement thermal protection.
  • when the temperature is lower than another threshold, the electronic device heats the battery 142 to avoid an abnormal shutdown of the electronic device caused by the low temperature.
  • in some other embodiments, the electronic device boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by the low temperature.
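The temperature-handling strategy above can be sketched as a simple policy function; the threshold values and action names here are illustrative placeholders, not values from the embodiment.

```python
def thermal_policy(temp_c, high_threshold=45.0, low_threshold=0.0):
    """Map a reported temperature to one of the handling actions
    described above. Thresholds are assumed example values."""
    if temp_c > high_threshold:
        # Reduce performance of the processor near the sensor.
        return "throttle_processor"
    if temp_c < low_threshold:
        # Heat the battery 142 (and possibly boost its output voltage).
        return "protect_battery"
    return "normal"
```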
  • the touch sensor 180K is also called a "touch panel".
  • the touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations may be provided through display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device, which is different from the location where the display screen 194 is located.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can acquire the vibration signal of the bone mass that vibrates when a person speaks.
  • the bone conduction sensor 180M can also be placed against the human pulse to receive the blood pressure pulse signal.
  • the bone conduction sensor 180M can also be disposed in an earphone, forming a bone conduction earphone.
  • the audio module 170 can parse out a voice signal based on the vibration signal of the vocal-part bone mass obtained by the bone conduction sensor 180M, so as to realize the voice function.
  • the application processor can parse out heart rate information based on the blood pressure pulse signal obtained by the bone conduction sensor 180M, so as to realize heart rate detection.
  • the keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
  • the electronic device may receive key input and generate key signal input related to user settings and function control of the electronic device.
  • Motor 191 can generate vibrating cues.
  • the motor 191 can be used for vibrating alerts for incoming calls, and can also be used for touch vibration feedback.
  • touch operations acting on different applications can correspond to different vibration feedback effects.
  • the motor 191 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 194 .
  • different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 can be an indicator light, which can be used to indicate the charging state and changes in battery level, and can also be used to indicate a message, a missed call, a notification, and the like.
  • the SIM card interface 195 is used to connect a SIM card.
  • the SIM card can be inserted into the SIM card interface 195 or pulled out from the SIM card interface 195 to achieve contact and separation with the electronic device.
  • the electronic device can support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • the SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card and so on. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the plurality of cards may be the same or different.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 is also compatible with external memory cards.
  • the electronic device interacts with the network through the SIM card to realize functions such as call and data communication.
  • the electronic device employs an eSIM, ie: an embedded SIM card.
  • the eSIM card can be embedded in the electronic device and cannot be separated from the electronic device.
  • recording is started directly, without waiting for cross-device communication among the multiple devices, so as to solve the problem of frame loss in voice control in the multi-device scenario and improve the accuracy of voice control.
  • one or more electronic devices are selected from the multiple electronic devices as the optimal sound pickup device, and the voice command input by the user is responded to based on the recording data of the optimal sound pickup device. Through this selection, at least one electronic device that offers the clearest pickup (closest to the user), the least noise interference (farthest from the noise source), or the best SE processing effect (the best microphone noise reduction performance, or support for AEC) is chosen as the sound pickup entrance for the voice assistant to call, which can effectively mitigate the impact of the audio quality of picked-up voice commands on ASR recognition accuracy.
  • the device information may include, but is not limited to, static attribute information or dynamic attribute information of the electronic device.
  • the static attribute information may include, but is not limited to, device model, system version, microphone capability information, and the like.
  • the dynamic attribute information may include, but is not limited to, power information of the electronic device, headphone status information, microphone status information, speaker status information, audio quality information of the recording data, and the like.
  • the speaker status information may be used to indicate whether the speaker of the electronic device is occupied.
  • the audio quality information is used to indicate whether the audio quality of the recorded data is good or bad.
  • the specific form of the audio quality information may include one or more items such as sound intensity information, noise sound intensity information, and signal-to-noise ratio information.
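As a rough sketch, the quality items listed above could be computed from a frame of PCM samples as follows; the RMS-based metrics and the fixed noise-floor estimate are illustrative assumptions, not the embodiment's actual algorithm.

```python
import math

def audio_quality_info(samples, noise_rms=50.0):
    """Summarize one frame of recording data as the three quality items
    named above: sound intensity, noise intensity, and SNR.
    `noise_rms` is an assumed estimate of the ambient-noise level."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    snr_db = 20 * math.log10(rms / noise_rms) if rms > 0 else float("-inf")
    return {
        "sound_intensity": rms,
        "noise_intensity": noise_rms,
        "snr_db": snr_db,
    }
```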
  • FIG. 3 is a schematic flowchart of a voice control method according to an embodiment of the present application. This embodiment is illustrated by taking the three electronic devices shown in FIG. 1 , a speaker 201 , a TV 202 and a mobile phone 203 as examples. As shown in FIG. 3, the method of this embodiment may include:
  • Step 401 the speaker 201 , the television 202 and the mobile phone 203 respectively receive the first voice instruction input by the user.
  • the first voice instruction is used to wake up the voice assistant of the electronic device.
  • the first voice instruction may be the above-mentioned wake-up word "small E small E".
  • the first voice command is used to wake up the respective voice assistants of the speaker 201 , the television 202 and the mobile phone 203 .
  • the electronic device can monitor whether the user has a voice signal input in real time through the microphone.
  • when a user wants to use the voice control function of the electronic device, he or she can make a sound within the sound pickup range of the electronic device, so that the sound is input into the microphone.
  • the electronic device can monitor the corresponding voice signal, such as the first voice command, through the microphone.
  • when the user wants to use the voice control function, he or she can say the wake-up word "small E small E". If the user's sounding position is within the respective pickup ranges of the speaker 201, the TV 202 and the mobile phone 203, and no other software or hardware is using the microphone to collect the voice signal, the speaker 201, the TV 202 and the mobile phone 203 can each detect, through their respective microphones, the first voice instruction corresponding to the wake-up word "small E small E".
  • Step 402 in response to the first voice command, the speaker 201 , the TV 202 and the mobile phone 203 wake up their respective voice assistants and start recording.
  • when the electronic device detects the first voice command, the electronic device wakes up the voice assistant in response to the first voice command.
  • the first voice command can be checked, that is, it is determined whether the received first voice command is a wake-up word registered in the electronic device. If the verification is passed, it indicates that the received first voice command is a wake-up word, which wakes up the voice assistant. If the verification fails, it indicates that the received first voice command is not a wake-up word, and the electronic device may not wake up the voice assistant at this time, that is, keep the voice assistant in a dormant state.
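The verification step can be sketched as a simple membership check against the wake words registered on the device (the set contents here are just the example wake word from the text):

```python
REGISTERED_WAKE_WORDS = {"small E small E"}  # wake word(s) registered in the device

def verify_wake_word(first_voice_command_text):
    """Return True (wake the voice assistant) only if the received
    first voice command matches a registered wake word; otherwise the
    assistant stays dormant."""
    return first_voice_command_text in REGISTERED_WAKE_WORDS
```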
  • when the speaker 201, the TV 202 and the mobile phone 203 each detect the first voice command, they wake up their respective voice assistants and start recording. After starting recording, they can detect through their respective microphones whether the user inputs other voice commands and, upon detecting such commands, generate recording data and save it on their own devices.
  • after the speaker 201, the television 202 and the mobile phone 203 start recording, they respectively receive the second voice instruction input by the user. For example, take the second voice command spoken by the user, "play song 112222", as an example.
  • the speaker 201, the TV 202 and the mobile phone 203 respectively record the second voice command to generate their own recording data, and the content of the recording data is "play song 112222".
  • the recording may be segmented every 0.5 s to generate the recording data.
  • the 0.5 s interval may also be another value, for example 0.6 s or 1 s, which are not described one by one in the embodiments of the present application.
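The 0.5 s segmentation might look like the following sketch, where a captured sample stream is cut into fixed-length recording-data segments (function and parameter names are illustrative):

```python
def segment_recording(samples, sample_rate_hz, segment_seconds=0.5):
    """Split a stream of audio samples into recording-data segments of
    `segment_seconds` each (0.5 s by default, as in the text; any other
    positive value such as 0.6 s or 1 s also works)."""
    samples_per_segment = int(sample_rate_hz * segment_seconds)
    return [
        samples[i:i + samples_per_segment]
        for i in range(0, len(samples), samples_per_segment)
    ]
```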
  • the electronic device may further determine audio quality information corresponding to the recorded data according to the recorded data. In other words, the electronic device also evaluates the quality of its own recording data.
  • the audio quality information may include one or more items of sound intensity information, noise sound intensity information, and signal-to-noise ratio information.
  • the speaker 201 , the TV 202 and the mobile phone 203 can respectively perform quality evaluation on the respective recording data, and determine the audio quality information corresponding to the respective recording data.
  • Step 403 the speaker 201 , the TV 202 and the mobile phone 203 respectively execute the selection of the answering device, determine the answering device, and the answering device plays the answering voice corresponding to the first voice command.
  • the execution order of step 402 and step 403 is not limited by the step numbers; other execution sequences may also be used. For example, the answering device selection is performed while recording is started.
  • the answering device in this embodiment is used to play the answering voice corresponding to the voice command input by the user.
  • the answering device plays the answer voice corresponding to the first voice command, that is, the wake-up answer voice, such as "I'm here", while the other electronic devices that are not used as the answering device wake up their voice assistants but do not play the answering voice corresponding to the voice command input by the user.
  • the electronic device may select an answering device based on the audio quality information corresponding to the first voice command to determine an answering device.
  • the electronic device can evaluate the quality of the received first voice command, determine the audio quality information corresponding to the first voice command it received, and broadcast that audio quality information together with its own device information.
  • the electronic device also receives the audio quality information and device information broadcast by the other electronic devices for the first voice instruction each of them received.
  • the electronic device selects one electronic device as the answering device according to the audio quality information and device information of all the electronic devices. For example, choose the electronic device with the best audio quality as the answering device.
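Since every device sees the same set of broadcasts, each can run the same deterministic selection locally and arrive at the same answering device. A minimal sketch, assuming each broadcast carries a device id and the SNR it measured for the first voice command (field names are assumptions):

```python
def pick_answering_device(broadcasts):
    """Choose the device with the best audio quality; ties break on the
    device id so every device makes the identical choice."""
    best = max(broadcasts, key=lambda b: (b["snr_db"], b["device_id"]))
    return best["device_id"]

# Example broadcasts from the three devices of FIG. 1 (values made up).
broadcasts = [
    {"device_id": "speaker-201", "snr_db": 24.0},
    {"device_id": "tv-202", "snr_db": 18.5},
    {"device_id": "phone-203", "snr_db": 21.0},
]
```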
  • when the speaker 201 detects the first voice command, the speaker 201 can also evaluate the quality of the first voice command, determine the audio quality information corresponding to the first voice command it received, and broadcast that audio quality information together with the device information of the speaker 201. By similar processing, when the TV set 202 detects the first voice command, the TV set 202 can also perform quality evaluation on the first voice command, determine the audio quality information corresponding to the first voice command it received, and broadcast that audio quality information together with the device information of the TV set 202.
  • the mobile phone 203 can also evaluate the quality of the first voice command, determine the audio quality information corresponding to the first voice command received by the mobile phone 203, and broadcast that audio quality information together with the device information of the mobile phone 203.
  • the speaker 201 can receive the audio quality information and device information broadcast by the TV 202 and the mobile phone 203 for the first voice command, and, according to the audio quality information and device information of the speaker 201, the TV 202 and the mobile phone 203, select one electronic device from among the three as the answering device.
  • likewise, the TV 202 can receive the audio quality information and device information broadcast by the speaker 201 and the mobile phone 203 for the first voice command, and, according to the audio quality information and device information of the speaker 201, the TV 202 and the mobile phone 203, select one electronic device from among the three as the answering device.
  • likewise, the mobile phone 203 can receive the audio quality information and device information broadcast by the speaker 201 and the TV 202 for the first voice command, and, according to the audio quality information and device information of the speaker 201, the TV 202 and the mobile phone 203, select one electronic device from among the three as the answering device.
  • as an exemplary illustration, this embodiment takes the case where the speaker 201, the television 202 and the mobile phone 203 all determine the speaker 201 to be the answering device.
  • the speaker 201 acts as an answering device and plays a wake-up answering voice, such as "I am here".
  • the TV set 202 and the mobile phone 203 do not play the wake-up response voice, but the voice assistants of the TV set 202 and the mobile phone 203 are in the wake-up state as described in step 402 above, and can record.
  • the answering device may also be selected in combination with other information, such as the priority of each electronic device.
  • the specific implementation manner of performing the selection of the answering device may also adopt other manners, and this embodiment of the present application does not limit the foregoing manner.
  • the answering device in the last use process of the user or the answering device set by the user may be used as the answering device in this embodiment.
  • Step 404 the speaker 201 calls the voice pickup instruction to the TV set 202 and the mobile phone 203 respectively, and the voice pickup instruction is used to instruct to return the recording data.
  • the speaker 201 starts to perform the distributed sound collection task.
  • the answering device can respectively call the pickup instruction to other non-answering devices, and the pickup instruction is used to instruct the non-answering device to return the recording data to the answering device.
  • the voice assistant of the speaker 201 can call the interface between the voice assistant of the television 202 and the voice assistant of the speaker 201 to transmit the voice pickup instruction to the television 202 .
  • the voice assistant of the speaker 201 can call the interface between the voice assistant of the mobile phone 203 and the voice assistant of the speaker 201 to transmit a voice pickup instruction to the mobile phone 203.
  • the pickup instruction may carry the identification information of the answering device.
  • the identification information of the answering device may be a media access control (media access control, MAC) address of the answering device.
  • the voice pickup instruction may carry the identification information of the speaker 201 to instruct the television 202 to return the recording data to the speaker 201 .
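One plausible wire form of the pickup instruction is sketched below; the JSON encoding and field names are assumptions, but the payload carries exactly what the text describes: the answering device's MAC address that recording data should be returned to.

```python
import json

def make_pickup_instruction(answerer_mac):
    """Build a pickup instruction carrying the answering device's
    identification information (here, its MAC address)."""
    return json.dumps({"type": "pickup", "return_to_mac": answerer_mac})

def handle_pickup_instruction(wire):
    """A non-answering device parses the instruction and learns where
    to send its recording data."""
    msg = json.loads(wire)
    assert msg["type"] == "pickup"
    return msg["return_to_mac"]
```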
  • Step 405 the television 202 and the mobile phone 203 respectively send the recording data to the speaker 201 .
  • the answering device receives recorded data sent by other non-answering devices. After other non-answering devices send their own recording data, they can continue recording and send new recording data to the answering device.
  • the television 202 sends the audio recording data of the television 202 to the speaker 201 .
  • the mobile phone 203 sends the recording data of the mobile phone 203 to the speaker 201 .
  • the recorded data may include the above-mentioned second voice instruction.
  • the content of the recording data is "play song 112222".
  • the speaker 201 performs quality evaluation on the received recording data of the TV set 202 , and determines the audio quality information corresponding to the recording data of the TV set 202 .
  • the speaker 201 performs quality evaluation on the received recording data of the mobile phone 203 , and determines the audio quality information corresponding to the recording data of the mobile phone 203 .
  • the speaker 201 may also receive audio quality information corresponding to the recording data of the television set 202 sent by the television set 202 .
  • the speaker 201 can also receive audio quality information corresponding to the recording data of the mobile phone 203 sent by the mobile phone 203 .
  • Step 406 the speaker 201 determines the optimal radio device in the speaker 201 , the TV 202 and the mobile phone 203 according to the audio quality information, and plays the response voice corresponding to the second voice command according to the recording data of the optimal radio device.
  • the answering device selects an optimal sound pickup device from the multiple electronic devices (including itself and the other non-answering devices) according to the audio quality information corresponding to their recording data, and uses the recording data of the optimal sound pickup device to perform SE, ASR and other processing, so as to correctly identify the voice command input by the user and then accurately respond to it.
  • the accurate response to the voice command input by the user includes playing the response voice corresponding to the voice command input by the user.
  • the accurate response to the voice command input by the user may further include triggering the answering device or other non-responding device to execute an event corresponding to the voice command. The event could be playing a song, playing a video, making a call, etc.
  • the speaker 201 may also send the recording data of the optimal radio device to the server 204 shown in FIG. 1 , and the server 204 uses the recording data of the optimal device to perform SE, ASR and other processing, so as to correctly recognize the voice command input by the user, and then make an accurate response to the voice command input by the user.
  • the speaker 201 in this embodiment determines the speaker 201 as the optimal sound-receiving device among the speaker 201 , the TV 202 and the mobile phone 203 according to the audio quality information of the recording data of the speaker 201 , the TV 202 and the mobile phone 203 .
  • the speaker 201 can play the answering voice "Song 112222 will be played for you here".
  • the multimedia resource of the song 112222 can be provided by the server 204 or the mobile phone 203 .
  • the speaker 201 may also play the response voice corresponding to the second voice command according to its own recording data and the recording data of the optimal audio recording device.
  • the speaker 201 can splice its own recording data with the recording data of the optimal sound pickup device, and play the response voice corresponding to the second voice command based on the spliced recording data.
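A minimal sketch of this splicing step, treating each device's recording data as a flat list of samples concatenated in time order (real audio buffers would also need aligned timestamps and formats, which this sketch omits):

```python
def splice_recordings(own_recording, optimal_device_recording):
    """Concatenate the answering device's own recording data with the
    optimal pickup device's recording data before SE/ASR processing."""
    return list(own_recording) + list(optimal_device_recording)
```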
  • steps 404 to 406 may also be performed again to process new recording data in a similar manner, so as to correctly identify a new voice command input by the user and then accurately respond to it.
  • the voice control method of the embodiment of the present application may further process the new recording data through the following steps.
  • Step 407 the speaker 201 sends a stop recording instruction to the TV 202 and the mobile phone 203 respectively.
  • the answering device sends a stop recording instruction to other non-answering devices, and the stop recording instruction is used to instruct to stop recording and discard the recording data.
  • Step 408 the television 202 and the mobile phone 203 respectively stop recording, and discard the recording data.
  • Non-answering devices stop recording based on the stop recording command to reduce power consumption.
  • the speaker 201 sends a stop recording instruction to the TV 202 and the mobile phone 203 respectively.
  • the television set 202 and the mobile phone 203 respectively stop recording and discard the recording data.
  • the recorded data corresponding to the second voice instruction is discarded.
  • the speaker 201 receives a new voice command input by the user.
  • the speaker 201 records the third voice instruction to generate recording data, and the content of the recording data is "change a song”.
  • the speaker 201 uses the recorded data to perform processing such as SE, ASR, etc., so as to correctly recognize the voice command input by the user, and then accurately respond to the voice command input by the user.
  • the speaker 201 can play the response voice "OK, switch songs for you", and play the switched songs.
  • the answering device and the optimal radio device are both the speaker 201 as an example for illustration.
  • the answering device and the optimal radio device may be the same device or different devices.
  • for example, the answering device is the speaker 201 and the optimal sound pickup device is the television 202; the embodiments of the present application are not limited by the above examples.
  • the answering device and the optimal radio device are different devices, the answering device can call the recording data of the optimal radio device.
  • when the voice command received by the answering device is used to turn off the voice assistant, the answering device can stop calling the recording data of the other non-answering devices, stop its own distributed voice recording task, and discard the recorded data.
  • when multiple electronic devices respectively receive the first voice command input by the user, the multiple electronic devices wake up their respective voice assistants and start recording, where the first voice command is used to wake up the voice assistant of each electronic device.
  • the answering device can determine the optimal radio device according to the recording data of each electronic device, and play the response voice corresponding to the second voice command according to the recording data of the optimal radio device.
  • by having each electronic device start recording directly after waking up, no longer relying on a central device to issue the call, this embodiment realizes a decentralized collaborative recording method.
  • before the answering device is determined, recording has already been started, and the recording data is used for SE, ASR and other processing, which effectively eliminates the communication delay between devices, thereby solving the problem of frame loss in voice control caused by delay in multi-device scenarios.
  • the voice commands input by the user can be correctly recognized, and then the voice commands input by the user can be accurately responded to, and the accuracy of voice control can be improved.
  • the audio recording can be started in advance, and the electronic device can evaluate the quality of its own recording data, which can speed up the audio evaluation of the electronic device and shorten the time required for the subsequent decision on the optimal sound pickup device, thereby speeding up the processing flow of the voice control method and improving the response speed of voice control.
  • FIG. 3 uses the wake-up word to wake up the voice assistant and start recording as an example for illustration.
  • the embodiment of the present application is not limited by this.
  • the embodiment of the present application may also not have the above wake-up process.
  • instead, another manner triggers the recording of the electronic device, and the accuracy of voice control is improved based on multi-device collaborative sound pickup.
  • the other manner may be that the electronic device detects a human voice, or the electronic device detects the voice of a specific user, etc., which are not described one by one in the embodiments of the present application.
  • the specific implementation of the voice control method without the above wake-up process triggering the recording of the electronic device is similar to the embodiment shown in FIG. 3 .
  • for example, the answering device calls the voice pickup instruction, the non-answering devices return their recording data, the answering device determines the optimal sound pickup device, and the response voice corresponding to the second voice command is played according to the recording data of the optimal sound pickup device.
  • FIG. 6 is a schematic flowchart of another voice control method provided by an embodiment of the present application. This embodiment is illustrated by taking the three electronic devices shown in FIG. 1 , a speaker 201 , a television 202 and a mobile phone 203 , and the answering device being the speaker 201 as an example. This embodiment is not the first invocation after the electronic device wakes up, for example, the second invocation, the third invocation, and the fourth invocation of the multi-round dialogue of the voice assistant. As shown in FIG. 6 , the method of this embodiment may include:
  • Step 701 the speaker 201 respectively invokes a multi-round dialogue pause instruction to the TV set 202 and the mobile phone 203, where the multi-round dialogue pause instruction is used to indicate that the multi-round dialogue is temporarily stopped.
  • the answering device does not detect a new voice command input by the user within a preset time period, that is, there is a time interval between voice commands input by the user.
  • the answering device detects this time interval and triggers multiple rounds of dialogue pause operations.
  • the answering device may respectively call other non-answering devices a multi-round dialogue pause instruction, where the multi-round dialogue pause instruction is used to instruct the multi-round dialogue to temporarily stop.
  • the voice assistant of the speaker 201 may invoke the interface between the voice assistant of the television 202 and the voice assistant of the speaker 201 to transmit a multi-round dialogue temporary stop instruction to the television 202 .
  • the voice assistant of the speaker 201 can call the interface between the voice assistant of the mobile phone 203 and the voice assistant of the speaker 201 to transmit a multi-round dialogue pause instruction to the mobile phone 203.
  • the speaker 201 deletes the previously saved recording data and continues to keep the recording.
  • Step 702 the television 202 and the mobile phone 203 respectively delete the previously recorded recording data and continue to keep recording.
  • the television set 202 and the mobile phone 203 respectively delete the recording data before invoking the multi-round dialogue pause instruction, and continue to keep the recording.
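The non-answering device's behaviour at a multi-round dialogue pause can be sketched as a small buffer that drops previously saved recording data while keeping capture running (class and method names are illustrative):

```python
class RecordingBuffer:
    """Holds recording-data segments on a non-answering device."""

    def __init__(self):
        self.segments = []
        self.recording = True  # capture keeps running

    def capture(self, segment):
        if self.recording:
            self.segments.append(segment)

    def on_pause_instruction(self):
        # Delete the recording data saved before the pause instruction,
        # but continue to keep the recording itself.
        self.segments.clear()
```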
  • Step 703 the speaker 201 , the television 202 and the mobile phone 203 respectively receive the fourth voice command input by the user, and record the fourth voice command respectively to generate respective recording data.
  • the speaker 201 , the TV 202 and the mobile phone 203 may further perform quality evaluation on the respective received recording data, and determine the audio quality information corresponding to the respective received recording data.
  • the fourth voice command spoken by the user may be “play movie 333333” as an example.
  • the speaker 201 , the TV 202 and the mobile phone 203 respectively record the fourth voice command to generate respective recording data, and the content of the recording data is "play movie 333333".
  • Step 704 the speaker 201 calls the voice pickup instruction to the TV set 202 and the mobile phone 203 respectively, and the voice pickup instruction is used to instruct to return the recording data.
  • the answering device can respectively call the pickup instruction to other non-answering devices, and the pickup instruction is used to instruct the non-answering device to return the recording data to the answering device.
  • Step 705: the television 202 and the mobile phone 203 respectively send their recording data to the speaker 201.
  • the television 202 sends the audio recording data of the television 202 to the speaker 201 .
  • the mobile phone 203 sends the recording data of the mobile phone 203 to the speaker 201 .
  • the content of the audio recording data is "play movie 333333".
  • Step 706: the speaker 201 determines the optimal sound-pickup device among the speaker 201, the TV 202 and the mobile phone 203 according to the audio quality information, and responds to the fourth voice command according to the recording data of the optimal sound-pickup device.
  • in general, the answering device selects an optimal sound-pickup device from the multiple electronic devices (including itself and the other, non-answering devices) according to the audio quality information corresponding to their recording data, and performs SE (speech enhancement), ASR (automatic speech recognition) and the like on the recording data of the optimal sound-pickup device, so as to correctly recognize the voice command input by the user and then respond to it accurately.
  • the accurate response to the voice command input by the user includes playing the response voice corresponding to the voice command input by the user.
  • the accurate response to the voice command input by the user may further include triggering the answering device or other non-responding device to execute an event corresponding to the voice command.
  • the event can be playing a song, playing a video, making a call, etc.
  • in this embodiment, the speaker 201 determines, according to the audio quality information of the recording data of the speaker 201, the TV 202 and the mobile phone 203, that the optimal sound-pickup device among the three is the speaker 201.
  • the speaker 201 can play the response voice "The movie 333333 will be played on the TV", and the TV 202 starts to play the movie 333333.
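The per-device selection in Step 706 reduces to an argmax over quality scores. The device names, score convention and data values below are illustrative assumptions, not values from the patent:

```python
def pick_optimal_device(recordings):
    """Sketch of Step 706: the answering device chooses, among its own
    recording and those returned by the non-answering devices, the one
    whose audio quality information is best, and responds using that
    recording. `recordings` maps device name -> (quality_score, data);
    higher score = better quality (an assumed convention)."""
    best_device = max(recordings, key=lambda d: recordings[d][0])
    return best_device, recordings[best_device][1]

recordings = {
    "speaker_201": (18.5, b"...speaker audio..."),
    "tv_202": (9.1, b"...tv audio..."),
    "phone_203": (12.7, b"...phone audio..."),
}
device, data = pick_optimal_device(recordings)
# device == "speaker_201"; its recording then feeds SE/ASR and the response
```

Because the scores are recomputed per command, the argmax can land on a different device for the next utterance, which is exactly the dynamic switching described next.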
  • the optimal sound-pickup device can change over time.
  • the fifth voice command spoken by the user may be "turn the sound down", for example.
  • the speaker 201, the TV 202 and the mobile phone 203 respectively record the fifth voice command to generate their respective recording data, and the content of the recording data is "turn the sound down".
  • at this time, the TV 202 is determined as the optimal sound-pickup device among the speaker 201, the TV 202 and the mobile phone 203.
  • the speaker 201 can respond to the fifth voice command based on the recording data of the TV set 202 .
  • in other words, different devices can be selected for sound pickup according to the recording effect. For example, after the TV 202 starts to play a movie, strong self-noise (such as the sound produced during movie playback) is present in the user's home, and the audio played by the TV is also mixed into what the voice assistant of the speaker 201 records. If the recording data of the speaker 201 were used, ASR recognition errors would occur.
  • the voice control method of this embodiment can improve the accuracy of ASR recognition by dynamically calling on the TV, which can perform echo cancellation on its own playback, to do the sound pickup, thereby accurately responding to the voice commands input by the user and improving the accuracy of voice control.
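Echo cancellation is mentioned but not detailed. As a toy illustration only: a device that knows the audio it is itself playing (the reference signal) can subtract that self-noise from its microphone signal before ASR. Real AEC uses adaptive filters to track the room's echo path; the fixed unit gain here is a simplifying assumption:

```python
def cancel_echo(mic, reference, gain=1.0):
    """Toy echo cancellation: subtract the known playback (reference)
    signal from the microphone signal, leaving the user's voice.
    `gain` models the echo path attenuation (assumed constant)."""
    return [m - gain * r for m, r in zip(mic, reference)]

playback = [0.5, -0.2, 0.3]                      # what the TV is playing
voice = [0.1, 0.4, -0.1]                         # the user's command
mic = [p + v for p, v in zip(playback, voice)]   # what the TV's mic hears
recovered = cancel_echo(mic, playback)
assert all(abs(r - v) < 1e-9 for r, v in zip(recovered, voice))
```

This is why the TV, despite sitting next to the loudest noise source, can yield the cleanest recording: it is the only device that knows the interfering signal exactly.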
  • the above-mentioned embodiments shown in FIG. 3 and FIG. 6 are illustrated by taking, as an example, the answering device selecting the optimal sound-pickup device according to the audio quality information and responding to the second voice command according to the recording data of that device.
  • alternatively, the answering device may respond to the second voice instruction directly according to the received recording data, or according to the received recording data and its own recording data.
  • the specific implementation of responding to the second voice command may be that the answering device splices the audio content information of the received recording data with the audio content information of its own recording data, and responds to the second voice command based on the spliced audio content information.
  • for example, the user speaks the voice signal "play song 112222", but the answering device only recognizes the voice signal "2222", so the audio content information of the answering device's recording data represents the voice signal "2222"; the audio content information of the recording data received from another device represents the voice signal "play song 112"; the answering device can splice the two to obtain spliced audio content information, which represents the voice signal "play song 112222".
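The splicing step can be sketched as joining the head captured by the other device with the tail recognized locally. The overlap search below is an assumption for illustration, since the patent does not specify how duplicated content at the seam is detected:

```python
def splice_contents(own_text: str, received_text: str) -> str:
    """Sketch of the splicing example: the answering device only recognized
    the tail of the utterance, while another device captured the head.
    Remove the longest overlap at the seam, then concatenate."""
    # Find the longest prefix of own_text that is also a suffix of
    # received_text, so overlapping audio content is not duplicated.
    for k in range(min(len(own_text), len(received_text)), -1, -1):
        if received_text.endswith(own_text[:k]):
            return received_text + own_text[k:]
    return received_text + own_text

# The example from the description: "play song 112" + "2222"
assert splice_contents("2222", "play song 112") == "play song 112222"
```

With no overlap at all, the function degenerates to plain concatenation, so it still handles the case where the two recordings cover disjoint parts of the utterance.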
  • FIG. 8 is a schematic structural diagram of a voice control apparatus according to an embodiment of the present application.
  • the apparatus can be applied to an electronic device of a voice control system (such as the above-mentioned first electronic device 201), and the voice control system may further include at least a second electronic device (such as the second electronic device 202 or the third electronic device 203); the apparatus may include: a transceiver module 81 and a processing module 82.
  • the transceiver module 81 may specifically be the mobile communication module 150 and/or the wireless communication module 160 in the embodiment shown in FIG. 2 .
  • the processing module 82 may be the processor 110 of the embodiment shown in FIG. 2 .
  • the transceiver module 81 is used for receiving the first voice command input by the user, and the processing module 82 is used for responding to the first voice command.
  • the transceiver module 81 is further configured to receive the recording data of the second electronic device sent by the second electronic device, where the recording data of the second electronic device includes the recording data of the second electronic device recording the second voice instruction input by the user.
  • the processing module 82 is further configured to respond to the second voice command according to the recording data of the first electronic device and/or the recording data of the second electronic device, where the recording data of the first electronic device includes the recording data obtained when the first electronic device records the second voice command input by the user.
  • the transceiver module 81 is further configured to call a voice pickup instruction to the second electronic device, and the voice pickup instruction is used for the second electronic device to return the recording data of the second electronic device.
  • the processing module 82 is further configured to start recording when or after the first electronic device receives the first voice instruction input by the user, where the recording is used to record the second voice instruction input by the user.
  • the first voice command is used to wake up a voice control function of the first electronic device and/or the second electronic device.
  • the processing module 82 is further configured to determine, according to the audio quality information of the first voice command received by the first electronic device and the audio quality information of the first voice command received by the second electronic device, that the first electronic device is the answering device of the voice control system.
  • the processing module 82 is further configured to: after the first electronic device responds to the first voice command and during the recording process of the first electronic device, if the second voice command input by the user is not detected within a preset time period, delete the saved recording data and continue recording.
  • the transceiver module 81 is further configured to call a multi-round dialogue pause instruction to the second electronic device, and the multi-round dialogue pause instruction is used to instruct the multi-round dialogue to temporarily stop.
  • the transceiver module 81 is further configured to receive audio quality information of the recording data of the second electronic device sent by the second electronic device.
  • the processing module 82 is configured to determine the optimal sound-pickup device in the voice control system according to the audio quality information of the recording data of the first electronic device and the audio quality information of the recording data of the second electronic device.
  • if the optimal sound-pickup device is the first electronic device, the second voice command is responded to according to the recording data of the first electronic device.
  • if the optimal sound-pickup device is the second electronic device, the second voice command is responded to according to the recording data of the second electronic device, or according to the recording data of the second electronic device and the recording data of the first electronic device.
  • the audio quality information is used to indicate the audio quality of the recording data.
  • the processing module 82 is configured to respond to the second voice instruction according to the audio content information of the audio recording data of the first electronic device and/or the audio content information of the audio recording data of the second electronic device.
  • the audio content information is used to represent the audio content of the recording data.
  • the voice control apparatus in this embodiment of the present application can be used to execute the steps of the answering device (eg, speaker 201 ) in the above method embodiment, and its technical principle and technical effect can be found in the explanation of the above method embodiment, which will not be repeated here.
  • FIG. 9 is a schematic structural diagram of a voice control apparatus according to an embodiment of the present application.
  • the apparatus can be applied to an electronic device of a voice control system (such as the second electronic device 202 or the third electronic device 203), and the voice control system may further include at least a first electronic device (such as the first electronic device 201); the apparatus may include: a transceiver module 91 and a processing module 92.
  • the transceiver module 91 may specifically be the mobile communication module 150 and/or the wireless communication module 160 in the embodiment shown in FIG. 2 .
  • the processing module 92 may be the processor 110 of the embodiment shown in FIG. 2 .
  • the processing module 92 is used for recording and saving the recording data, and the recording is used for recording the second voice instruction input by the user.
  • the transceiver module 91 is used for sending the recording data of the second electronic device to the first electronic device, where the recording data of the second electronic device includes the recording data obtained when the second electronic device records the second voice command input by the user, and the recording data is used by the first electronic device to respond to the second voice command after the first electronic device has responded to the first voice command.
  • the transceiver module 91 is further configured to receive a voice pickup instruction called by the first electronic device, and the voice pickup instruction is used for the second electronic device to return the recording data of the second electronic device.
  • the processing module 92 is configured to start recording when or after the second electronic device receives the first voice instruction input by the user.
  • the processing module 92 is further configured to determine, according to the audio quality information of the first voice command received by the second electronic device and the audio quality information of the first voice command received by the first electronic device, that the first electronic device is the answering device of the voice control system.
  • the processing module 92 is further configured to: after the first electronic device responds to the first voice command and during the recording process of the second electronic device, receive, through the transceiver module 91, the multi-round dialogue pause instruction invoked on the second electronic device, where the multi-round dialogue pause instruction is used to instruct the multi-round dialogue to temporarily stop.
  • the processing module 92 is also used to delete the saved recording data and continue recording.
  • the transceiver module 91 is further configured to send the audio quality information of the recording data of the second electronic device to the first electronic device.
  • the voice control apparatus in this embodiment of the present application can be used to perform the steps of any non-response device (such as a TV 202 or a mobile phone 203 ) in the above method embodiments.
  • as shown in FIG. 10, the electronic device may include: a microphone 1001, one or more processors 1002 and one or more memories 1003; the above components may be connected through one or more communication buses 1005.
  • the above-mentioned memory 1003 stores one or more computer programs 1004, and the one or more processors 1002 are configured to execute the one or more computer programs 1004; the one or more computer programs 1004 include instructions, and the instructions can be used to execute each step performed by any electronic device in the above method embodiments.
  • the electronic device may be any of the above-mentioned electronic devices, for example, a smart phone, a smart watch, and the like.
  • the electronic device shown in FIG. 10 may also include other devices such as a display screen, which is not limited in this embodiment of the present application. When it includes other devices, it may specifically be the electronic device shown in FIG. 2 .
  • the electronic device in this embodiment of the present application can be used to execute the steps of the electronic device in any of the above method embodiments, and the technical principles and technical effects of the electronic device can be referred to the explanations of the above method embodiments, which will not be repeated here.
  • Embodiments of the present application further provide a computer storage medium. The computer storage medium may include computer instructions which, when executed on an electronic device, cause the electronic device to perform each step performed by the electronic device in the foregoing method embodiments.
  • Embodiments of the present application further provide a computer program product which, when run on a computer, enables the computer to perform each step performed by the electronic device in the foregoing method embodiments.
  • An embodiment of the present application further provides a voice control system
  • the voice control system may at least include: a first electronic device and a second electronic device, wherein the first electronic device may adopt the structure of the embodiment shown in FIG. 8 or FIG. 10.
  • the second electronic device may adopt the structure of the embodiment shown in FIG. 9 or FIG. 10; correspondingly, the system may implement the technical solutions of any of the above method embodiments, and the implementation principles and technical effects thereof are similar and will not be repeated here.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division, and there may be other division manners in actual implementation.
  • for example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
  • the shown or discussed mutual coupling, direct coupling or communication connection may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may be one physical unit or multiple physical units, that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • the processor mentioned in the above embodiments may be an integrated circuit chip, which has signal processing capability.
  • each step of the above method embodiment may be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
  • the processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the methods disclosed in the embodiments of the present application may be directly embodied as executed by a hardware encoding processor, or executed by a combination of hardware and software modules in the encoding processor.
  • the software module may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware.
  • the memory mentioned in the above embodiments may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other division manners in actual implementation.
  • for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling, direct coupling or communication connection may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Selective Calling Equipment (AREA)
  • Telephone Function (AREA)

Abstract

Voice control method and electronic device. The voice control method is applied to a voice control system, the voice control system comprising at least a first electronic device and a second electronic device, both of which have a voice control function. The voice control method comprises the following steps: the first electronic device and the second electronic device each receive a first voice instruction input by a user, and the first electronic device responds to the first voice instruction; the second electronic device performs recording and stores recording data, the recording being used to record a second voice instruction input by the user; the second electronic device sends its recording data to the first electronic device; the first electronic device responds to the second voice instruction according to the recording data of the first electronic device and/or the recording data of the second electronic device, the recording data of the first electronic device comprising the recording data from when the first electronic device recorded the second voice instruction input by the user. By means of the method, the problem of erroneous recognition of a voice command in a multi-device scenario can be solved, thereby improving the accuracy of voice control.
PCT/CN2021/142083 2021-01-29 2021-12-28 Voice control method and electronic device WO2022161077A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110130831.0 2021-01-29
CN202110130831.0A CN114822525A (zh) 2021-01-29 2021-01-29 语音控制方法和电子设备

Publications (1)

Publication Number Publication Date
WO2022161077A1 true WO2022161077A1 (fr) 2022-08-04

Family

ID=82526078

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/142083 WO2022161077A1 (fr) 2021-01-29 2021-12-28 Procédé de commande vocale et dispositif électronique

Country Status (2)

Country Link
CN (1) CN114822525A (fr)
WO (1) WO2022161077A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117873418A (zh) * 2022-10-11 2024-04-12 Huawei Technologies Co., Ltd. Recording control method, electronic device and medium
CN116682465A (zh) * 2022-10-31 2023-09-01 Honor Device Co., Ltd. Method for recording content and electronic device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160148615A1 (en) * 2014-11-26 2016-05-26 Samsung Electronics Co., Ltd. Method and electronic device for voice recognition
CN107622652A (zh) * 2016-07-15 2018-01-23 青岛海尔智能技术研发有限公司 家电系统的语音控制方法与家电控制系统
CN108228699A (zh) * 2016-12-22 2018-06-29 谷歌有限责任公司 协作性语音控制装置
US20180228006A1 (en) * 2017-02-07 2018-08-09 Lutron Electronics Co., Inc. Audio-Based Load Control System
CN111326151A (zh) * 2018-12-14 2020-06-23 上海诺基亚贝尔股份有限公司 用于语音交互的设备、方法及计算机可读介质
CN111369994A (zh) * 2020-03-16 2020-07-03 维沃移动通信有限公司 语音处理方法及电子设备
CN112002319A (zh) * 2020-08-05 2020-11-27 海尔优家智能科技(北京)有限公司 智能设备的语音识别方法及装置

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160148615A1 (en) * 2014-11-26 2016-05-26 Samsung Electronics Co., Ltd. Method and electronic device for voice recognition
CN107622652A (zh) * 2016-07-15 2018-01-23 青岛海尔智能技术研发有限公司 家电系统的语音控制方法与家电控制系统
CN108228699A (zh) * 2016-12-22 2018-06-29 谷歌有限责任公司 协作性语音控制装置
US20180228006A1 (en) * 2017-02-07 2018-08-09 Lutron Electronics Co., Inc. Audio-Based Load Control System
CN111326151A (zh) * 2018-12-14 2020-06-23 上海诺基亚贝尔股份有限公司 用于语音交互的设备、方法及计算机可读介质
CN111369994A (zh) * 2020-03-16 2020-07-03 维沃移动通信有限公司 语音处理方法及电子设备
CN112002319A (zh) * 2020-08-05 2020-11-27 海尔优家智能科技(北京)有限公司 智能设备的语音识别方法及装置

Also Published As

Publication number Publication date
CN114822525A (zh) 2022-07-29

Similar Documents

Publication Publication Date Title
WO2021000876A1 (fr) Voice control method, electronic device and system
US11843716B2 (en) Translation method and electronic device
WO2021052282A1 (fr) Data processing method, Bluetooth module, electronic device and readable storage medium
JP2022541207A (ja) Voice activation method and electronic device
CN111369988A (zh) Voice wake-up method and electronic device
EP3826280B1 (fr) Voice control instruction generation method and terminal
WO2020073288A1 (fr) Method for triggering an electronic device to execute a function, and related electronic device
CN112119641B (zh) Method and apparatus for implementing automatic translation through multiple TWS earphones connected in forwarding mode
WO2021052139A1 (fr) Gesture input method and electronic device
WO2021000817A1 (fr) Ambient sound processing method and device
WO2022161077A1 (fr) Voice control method and electronic device
CN115589051B (zh) Charging method and terminal device
CN113728295A (zh) Screen control method, apparatus, device and storage medium
CN113921002A (zh) Device control method and related apparatus
WO2022022319A1 (fr) Image processing method and system, electronic device and chip system
WO2020051852A1 (fr) Method for recording and displaying information during a communication process, and terminals
WO2020078267A1 (fr) Method and device for processing voice data during online translation
US20240178771A1 (en) Method and apparatus for adjusting vibration waveform of linear motor
CN114120987B (зh) Voice wake-up method, electronic device and chip system
WO2022007757A1 (fr) Cross-device voiceprint registration method, electronic device and storage medium
CN113380240B (зh) Voice interaction method and electronic device
CN115731923A (зh) Command word response method, control device and apparatus
WO2023216922A1 (fr) Target device selection identification method, terminal device, system and storage medium
WO2022143048A1 (fr) Dialogue task management method and apparatus, and electronic device
WO2024055881A1 (fr) Clock synchronization method, electronic device, system and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21922666

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21922666

Country of ref document: EP

Kind code of ref document: A1