WO2024103926A1 - Voice control method, apparatus, storage medium, and electronic device

Voice control method, apparatus, storage medium, and electronic device

Info

Publication number
WO2024103926A1
WO2024103926A1 · PCT/CN2023/117319 · CN2023117319W
Authority
WO
WIPO (PCT)
Prior art keywords
voice
current
state
preset
interaction
Prior art date
Application number
PCT/CN2023/117319
Other languages
English (en)
French (fr)
Inventor
印亚兵
张昊旻
严锋贵
高烨
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2024103926A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present application relates to the field of voice control technology, and in particular to a voice control method, device, storage medium and electronic device.
  • Screen capture is a basic function of many terminals.
  • the display content on the terminal screen can be captured and saved in the form of a picture to obtain a screenshot of the terminal interface.
  • the picture obtained through this screenshot function is a screenshot of the entire interface on the entire screen.
  • Embodiments of the present application provide a voice control method, device, storage medium, and electronic device.
  • an embodiment of the present application provides a voice control method, the method comprising:
  • if the second device status information sent by the candidate device is received, determining whether the current device performs voice interaction according to the first preset device state and the second preset device state corresponding to the second device status information.
  • an embodiment of the present application provides a voice control method, the method comprising:
  • if the current master device is in the first preset device state and receives the second device state information sent by the slave device, determining the target interactive device for voice interaction from the current master device and the slave device according to the first device state information corresponding to the first preset device state and the second device state information;
  • the target interactive device is controlled to perform voice interaction based on the interactive instruction.
  • an embodiment of the present application provides a voice control method, the method comprising:
  • if an interaction instruction sent by the master device is received, the current slave device is controlled to perform voice interaction, wherein the interaction instruction is generated by the master device according to the first device status information of the master device, the second device status information of the current slave device and the second device status information of other slave devices.
  • an embodiment of the present application provides a voice control device, the device comprising:
  • a voice wake-up module is used to determine whether the current device is in a first preset device state when detecting that the user's voice meets the voice wake-up condition;
  • a device state sending module configured to send first device state information corresponding to the first preset device state to a candidate device if the current device is in the first preset device state, the candidate device being in the same multi-device scenario as the current device;
  • the voice interaction determination module is used to determine whether the current device performs voice interaction according to the first preset device state and the second preset device state corresponding to the second device state information if the second device state information sent by the candidate device is received.
  • an embodiment of the present application provides a voice control device, the device comprising:
  • the main device voice wake-up module is used to determine whether the current main device is in the first preset device state when the user voice meets the voice wake-up condition;
  • a master device voice interaction determination module configured to determine a target interaction device for voice interaction from the current master device and the slave device according to the first device status information corresponding to the first preset device status and the second device status information if the current master device is in the first preset device status and receives the second device status information sent by the slave device;
  • the instruction control module is used to control the target interactive device to perform voice interaction based on the interactive instruction.
  • an embodiment of the present application provides a voice control device, the device comprising:
  • a slave device voice wake-up module used to determine whether the current slave device is in a second preset device state when detecting that the user voice meets the voice wake-up condition
  • a slave device state sending module configured to send the second device state information of the current device to the master device if the current slave device is in the second preset device state;
  • the slave device voice interaction module is used to control the current slave device to perform voice interaction if an interaction instruction sent by the master device is received, wherein the interaction instruction is generated by the master device according to the first device status information of the master device, the second device status information of the current slave device and the second device status information of other slave devices.
  • an embodiment of the present application provides a computer storage medium, wherein the computer storage medium stores a plurality of instructions, wherein the instructions are suitable for being loaded by a processor and executing the steps of the above-mentioned method.
  • an embodiment of the present application provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • FIG. 1 is a device interaction method in the related art provided by an embodiment of the present application.
  • FIG. 2 is an exemplary system architecture diagram of a voice control method provided in an embodiment of the present application.
  • FIG. 3 is a flow chart of a voice control method provided in an embodiment of the present application.
  • FIG. 4 is a device interaction method provided by an embodiment of the present application.
  • FIG. 5 is a flow chart of a voice control method provided in another embodiment of the present application.
  • FIG. 6 is a flow chart of a voice control method provided in another embodiment of the present application.
  • FIG. 7 is a structural block diagram of a voice control device provided by another embodiment of the present application.
  • FIG. 8 is a flow chart of a voice control method provided in another embodiment of the present application.
  • FIG. 9 is a structural block diagram of a voice control device provided by another embodiment of the present application.
  • FIG. 10 is a flow chart of a voice control method provided in another embodiment of the present application.
  • FIG. 11 is a structural block diagram of a voice control device provided by another embodiment of the present application.
  • FIG. 12 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.
  • the information (including but not limited to user device information, user personal information, etc.), data (including but not limited to data used for analysis, stored data, displayed data, etc.) and signals involved in the embodiments of this application are all authorized by the user or fully authorized by all parties, and the collection, use and processing of relevant data must comply with relevant laws, regulations and standards of relevant countries and regions.
  • the object features, interactive behavior features and user information involved in this application are all obtained with full authorization.
  • A voice assistant is an important application of artificial intelligence in electronic devices. Through a voice assistant, an electronic device can hold intelligent conversations and perform intelligent interactions with users; it can also recognize voice commands entered by the user and trigger the electronic device to automatically execute the event corresponding to the voice command. Usually, the voice assistant is in a dormant state, and the user needs to wake it up by voice before using it; only after the voice assistant is awakened can the voice commands entered by the user be received and recognized. The voice data used for awakening can be called a wake-up word.
  • the voice assistant can be awakened based on the wake-up word "Xiaobu Xiaobu", and then the electronic device can recognize the voice command using the voice assistant, and trigger the electronic device to query the weather in place A, and broadcast the weather in place A to the user through voice or text.
  • the voice control function can be realized by installing a voice assistant in the home appliance.
  • the user's environment such as the user's home
  • the voice assistants of the devices with the same wake-up word will all be awakened, and will recognize and respond to the voice commands subsequently spoken by the user.
  • FIG. 1 is a device interaction method in the related art provided in an embodiment of the present application.
  • the user's living room is used as a multi-device scenario, where there are four devices in the user's living room: a speaker 101, a TV 102, a mobile phone 103, and a wearable watch 104. These four devices are all equipped with voice assistants, and the wake-up word is "Xiaobu Xiaobu". Then, when the user speaks a voice control command containing the wake-up word "Xiaobu Xiaobu", the voice assistants of the speaker 101, the TV 102, the mobile phone 103, and the wearable watch 104 will all be awakened, and will recognize and respond to the voice command.
  • users may often only need one device to respond. For example, when a user is using a mobile phone and needs to interact with a voice assistant, since interacting with the mobile phone is more convenient, the user often hopes that the voice assistant in the mobile phone can be awakened and respond to the user's control commands for voice interaction. If multiple devices respond at the same time, the user experience is poor.
  • when it is first detected that the user's voice meets the voice wake-up condition, it is determined whether the current device is in the first preset device state; then, if the current device is in the first preset device state, the first device state information corresponding to the first preset device state is sent to the candidate device, the candidate device being in the same multi-device scenario as the current device; finally, if the second device state information sent by the candidate device is received, it is determined whether the current device performs voice interaction according to the first preset device state and the second preset device state corresponding to the second device state information.
  • the first preset device state of the current device and the second preset device state of the candidate device can be obtained. Since the device state can represent the user's use of the device, it can be determined according to the device state of each device which device the user wants to use for voice interaction, which effectively improves the accuracy of voice control.
  • FIG. 2 is an exemplary system architecture diagram of a voice control method provided in an embodiment of the present application.
  • the system architecture may include an electronic device 201, a network 202, and a server 203.
  • the network 202 is used to provide a medium for a communication link between the electronic device 201 and the server 203.
  • the network 202 may include various types of wired communication links or wireless communication links. For example, a wired communication link includes optical fiber, twisted pair, or coaxial cable, and a wireless communication link includes a Bluetooth communication link, a Wireless-Fidelity (Wi-Fi) communication link, a microwave communication link, etc.
  • the electronic device 201 can interact with the server 203 through the network 202 to receive messages from the server 203 or send messages to the server 203, or the electronic device 201 can interact with the server 203 through the network 202 to receive messages or data sent by other users to the server 203.
  • the electronic device 201 can be hardware or software. When the electronic device 201 is hardware, it can be various electronic devices, including but not limited to smart watches, smart phones, tablet computers, smart TVs, laptop portable computers, and desktop computers. When the electronic device 201 is software, it can be installed in the electronic devices listed above, which can be implemented as multiple software or software modules (for example: used to provide distributed services), or it can be implemented as a single software or software module, which is not specifically limited here.
  • the server 203 may be a business server that provides various services. It should be noted that the server 203 may be hardware or software. When the server 203 is hardware, it may be implemented as a distributed server cluster consisting of multiple servers, or it may be implemented as a single server. When the server 203 is software, it may be implemented as multiple software or software modules (for example, for providing distributed services), or it may be implemented as a single software or software module, which is not specifically limited herein.
  • the number of electronic devices 201 may be multiple, and multiple electronic devices 201 may be in the same multi-device scenario, and multiple electronic devices 201 in the same multi-device scenario may also be directly connected through the network 202, that is, multiple electronic devices 201 may also directly transmit data based on the network 202. Therefore, the system architecture may not include the server 203.
  • the server 203 may be an optional device in the embodiment of the present specification, that is, the method provided in the embodiment of the present specification may be applied to a system structure including only the electronic device 201, and the embodiment of the present application does not limit this.
  • if an electronic device 201 in the system architecture is used as the current device and the current device detects that the user voice meets the voice wake-up condition, it determines whether the current device is in a first preset device state. If the current device is in the first preset device state, the first device state information corresponding to the first preset device state is sent to the candidate device, the candidate device being in the same multi-device scenario as the current device; if the second device state information sent by the candidate device is received, it is determined whether the current device performs voice interaction based on the first preset device state and the second preset device state corresponding to the second device state information.
  • FIG. 2 is only illustrative, and any number of electronic devices, networks, and servers may be used according to implementation requirements.
  • FIG. 3 is a flow chart of a voice control method provided in an embodiment of the present application.
  • the execution subject of the embodiment of the present application can be an electronic device that executes voice control, or a processor in an electronic device that executes the voice control method, or a voice control service in an electronic device that executes the voice control method.
  • the specific execution process of the voice control method is introduced below by taking the execution subject being a processor in an electronic device as an example.
  • the voice control method may at least include:
  • the voice control method is mainly used in a multi-device scenario, where there are at least two electronic devices, and each electronic device in the same multi-device scenario belongs to the same device group, and the electronic devices in the same device group have the same device level (that is, the electronic devices in the same device group do not distinguish between subordinate relationships or primary and secondary relationships), and data can be directly transmitted between the electronic devices or data can be transmitted based on server data forwarding.
  • each electronic device can be connected to the same wireless access point (such as a WiFi access point), logged in to the same user account, etc., so that each electronic device is in the same multi-device scenario.
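The grouping condition above (same wireless access point or same logged-in account) can be sketched as follows. This is a hypothetical Python illustration; the `Device` fields and the `same_scenario` helper are assumptions, not identifiers from the application:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Device:
    name: str
    ap_bssid: Optional[str] = None   # identifier of the connected Wi-Fi AP
    account: Optional[str] = None    # logged-in user account

def same_scenario(a: Device, b: Device) -> bool:
    """Two devices share a multi-device scenario when they share an AP or an account."""
    same_ap = a.ap_bssid is not None and a.ap_bssid == b.ap_bssid
    same_account = a.account is not None and a.account == b.account
    return same_ap or same_account
```

A real implementation could combine several such signals (AP, account, local network discovery); the "or" here is just one plausible reading of the text.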
  • each electronic device in the same multi-device scenario is provided with a program similar to a voice assistant, which can monitor the user voices emitted by users around the electronic device in real time based on the voice data collected by the microphone, and determine whether the user needs to perform voice interaction.
  • the voice wake-up condition can be that the user's voice includes a preset wake-up word and/or that the voiceprint corresponding to the user's voice is a preset voiceprint. Therefore, after the voice assistant collects the user's voice through the microphone, it can perform wake-up word detection and/or voiceprint detection on the user's voice; when the user's voice includes the preset wake-up word and/or the voiceprint corresponding to the user's voice is the preset voiceprint, it can be considered that the user's voice meets the voice wake-up condition.
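The wake-up condition check described above can be sketched as follows. This is an illustrative Python sketch, not code from the application; the wake word, the voiceprint label, and the function name are assumptions:

```python
PRESET_WAKE_WORD = "xiaobu xiaobu"   # assumed wake-up word from the examples
PRESET_VOICEPRINT_ID = "owner"       # assumed enrolled voiceprint label

def meets_wake_condition(transcript: str, voiceprint_id=None,
                         require_both: bool = False) -> bool:
    """Return True when the user voice satisfies the wake-up condition.

    The condition is "wake word and/or voiceprint": by default either signal
    wakes the assistant; with require_both=True both must match.
    """
    has_wake_word = PRESET_WAKE_WORD in transcript.lower()
    voiceprint_ok = voiceprint_id == PRESET_VOICEPRINT_ID
    if require_both:
        return has_wake_word and voiceprint_ok
    return has_wake_word or voiceprint_ok
```

In practice `transcript` and `voiceprint_id` would come from on-device keyword spotting and speaker identification; string matching stands in for both here.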
  • the voice wake-up condition can also be that the electronic device is in a preset state. For example, if the electronic device is a smart watch, in order to reduce power consumption, the smart watch is in the off state most of the time. If the smart watch is in the on state, it can be determined that the smart watch meets the voice wake-up condition.
  • when a user uses an electronic device or performs related operations on it, the device state of the electronic device will change.
  • the device state may refer to static or dynamic states such as the device placement state, screen lighting state, standby state, video playing state, etc. The device state of the electronic device is therefore associated with the user's operation, and when the user needs to perform voice interaction in a multi-device scenario, the user often expects the electronic device currently being operated to respond.
  • each electronic device in the same multi-device scenario detects that the user voice meets the voice wake-up condition, it can first determine whether its own device is in a preset device state.
  • the preset device states corresponding to different electronic devices can be set in advance according to the device type of each electronic device. Then, if an electronic device is in the preset device state, it means that the electronic device may be being operated or used by the user, and the possibility that the user needs to interact with the electronic device is greater.
  • the preset device state corresponding to the current device is determined as the first preset device state.
  • the current device if the current device detects that the user's voice meets the voice wake-up condition, it can determine whether the current device is in the first preset device state.
  • the method of determining whether the current device is in the first preset device state may not be limited. For example, data collected by a preset sensor in the current device may be obtained to determine whether the current device is in the first preset device state.
  • the current device may be a device that is being used or operated by the user. However, since there are multiple electronic devices in the multi-device scene, there may be other electronic devices in the multi-device scene that are also in the preset device state. In order to facilitate the determination of the electronic device with which the user wants to perform voice interaction from multiple electronic devices in the preset device state, the electronic devices in the same multi-device scene can synchronize the status information corresponding to their own preset device state to other electronic devices after determining that they are in the preset device state.
  • the current device can obtain the first device status information representing the first preset device status, and send the first device status information to at least one candidate device in the same multi-device scenario as the current device, wherein the candidate device can be all electronic devices in the same multi-device scenario as the current device, or a user-specified electronic device in the same multi-device scenario as the current device.
  • S306 If the second device status information sent by the candidate device is received, determine whether the current device performs voice interaction according to the first preset device status and the second preset device status corresponding to the second device status information.
  • the current device can wait for a first preset time. If the second device state information sent by at least one candidate device is received within the first preset time, it means that there may be other electronic devices in the multi-device scenario that are also in the preset device state.
  • the second preset device status corresponding to each candidate device can be determined according to the second device status information, and then the first preset device status is compared with each second preset device status to determine whether the current device performs voice interaction.
  • the method of comparing the first preset device status with each second preset device status is not limited; the comparison can be performed based on rules set by the user or set when the electronic device leaves the factory. If it is determined that the current device performs voice interaction, the voice control command corresponding to the user's voice is parsed and responded to.
  • an electronic device for performing voice interaction can be determined from multiple electronic devices in the multi-device scenario, thereby improving the user experience when performing voice interaction.
  • FIG. 4 is a device interaction method provided in an embodiment of the present application.
  • the user's living room is used as a multi-device scenario, in which the user's living room has four devices: a speaker 101, a television 102, a mobile phone 103, and a wearable watch 104. These four devices are all equipped with voice assistants, and the wake-up word is "Xiaobu Xiaobu". Then, when the user speaks a user voice containing the wake-up word "Xiaobu Xiaobu", the voice assistants of the speaker 101, television 102, mobile phone 103, and wearable watch 104 may all detect that the user voice meets the voice wake-up condition.
  • the speaker 101, television 102, mobile phone 103, and wearable watch 104 will respectively determine whether they are in a preset device state. If the mobile phone 103 determines that it is in a first preset device state, it will send the first device state information corresponding to the first preset device state to the speaker 101, television 102, and wearable watch 104. If the mobile phone 103 receives the second device status information of the speaker 101, the television 102, and the wearable watch 104, it will compare its first preset device status with the second preset device statuses corresponding to the second device status information of the other devices, and then determine whether the mobile phone 103 performs voice interaction. The speaker 101, the television 102, and the wearable watch 104 will likewise determine whether they perform voice interaction. Finally, one electronic device is determined to interact from among the speaker 101, the television 102, the mobile phone 103, and the wearable watch 104.
  • the mobile phone 103 determines to perform voice interaction, so the mobile phone 103 can parse the voice control command corresponding to the user's voice and respond to the voice control command.
  • when it is first detected that the user's voice meets the voice wake-up condition, it is determined whether the current device is in the first preset device state; then, if the current device is in the first preset device state, the first device state information corresponding to the first preset device state is sent to the candidate device, the candidate device being in the same multi-device scenario as the current device; finally, if the second device state information sent by the candidate device is received, it is determined whether the current device performs voice interaction according to the first preset device state and the second preset device state corresponding to the second device state information.
  • the first preset device state of the current device and the second preset device state of the candidate device can be obtained. Since the device state can represent the user's use of the device, it can be determined according to the device state of each device which device the user wants to use for voice interaction, which effectively improves the accuracy of voice control.
  • FIG. 5 is a flow chart of a voice control method provided by another embodiment of the present application.
  • the voice control method may at least include:
  • a feasible implementation method is to first obtain the device type of the current device, which is used to distinguish the categories of different devices.
  • the device type can be divided into handheld devices, wearable devices, speaker devices, and television devices, etc.
  • the device type can be divided according to user needs, or it can be directly divided at the factory; then the preset device states corresponding to electronic devices of different device types are also different.
  • if the device type of the electronic device is a handheld device, its corresponding preset device state can be a handheld state;
  • if the device type of the electronic device is a wearable device, its corresponding preset device state can be a limb raised state;
  • if the device type of the electronic device is a speaker device, its corresponding preset device state can be a music playing state;
  • if the device type of the electronic device is a television device, its corresponding preset device state can be a video playing state.
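The type-to-state mapping above can be written as a simple lookup table. Both the type labels and the state names below are illustrative assumptions, not identifiers from the application:

```python
PRESET_STATE_BY_TYPE = {
    "handheld": "handheld_state",        # e.g. a smart phone
    "wearable": "limb_raised_state",     # e.g. a smart watch
    "speaker": "music_playing_state",
    "television": "video_playing_state",
}

def preset_state_for(device_type: str) -> str:
    """Look up the preset device state configured for a device type."""
    return PRESET_STATE_BY_TYPE[device_type]
```

As the text notes, such a table could be populated per user preference or fixed at the factory.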
  • the specified status parameters corresponding to the current device can be obtained according to the device type of the current device, wherein the specified status parameters can be obtained through specified sensors and other devices, and finally, it is determined whether the current device is in the first preset device state based on the specified status parameters.
  • the device type of the current device is a handheld device, for example, the current device is a smart phone
  • a handheld device is generally not blocked by objects such as pockets, is not completely horizontal, and is not very stable. Therefore, if the device type of the current device is a handheld device, at least one of the occlusion state parameters, placement angle state parameters, and jitter state parameters corresponding to the current device can be obtained, so as to determine whether the current device is in a handheld state according to at least one of these parameters.
  • the occlusion state parameters may include the illumination value collected by the illumination sensor and the proximity distance value collected by the proximity sensor.
  • if the illumination value is less than the preset illumination value and the proximity distance value is less than the preset proximity distance value, it can be determined that the current device is blocked, that is, the current device is not in a handheld state (the first preset device state); otherwise, it can be determined that the current device is not blocked, and whether the current device is in a flat state can then be determined based on the placement angle state parameters, wherein the placement angle state parameters may include the geomagnetic value collected by the geomagnetic sensor and the acceleration value collected by the acceleration sensor. If the angle calculated based on the geomagnetic value and the acceleration value is less than the preset flat angle, it can be determined that the current device is in a flat state, that is, the current device is not in a handheld state (the first preset device state); otherwise, it can be determined that the current device is not in a flat state, and whether the current device is in a shaking state can be determined based on the jitter state parameters, wherein the jitter state parameters can include the angular velocity value collected by an angular velocity sensor, and an average angular velocity value within a time sliding window can be calculated from the angular velocity values.
  • if the current real-time angular velocity value is greater than a preset maximum angular velocity value, or the current real-time angular velocity value is greater than a preset minimum angular velocity value and less than the preset maximum angular velocity value while the average angular velocity value is greater than a preset average angular velocity value, it can be determined that the current device is in a shaking state and therefore in a handheld state (the first preset device state); otherwise, the current device is not in a shaking state, and it can be determined that the current device is not in a handheld state (the first preset device state).
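The three-stage check above (occlusion, then flat placement, then shaking) can be sketched as one function. All parameter names and threshold defaults here are illustrative assumptions, not values from the application:

```python
def is_handheld(lux: float, proximity_cm: float, tilt_deg: float,
                gyro_now: float, gyro_window_avg: float,
                *, lux_min: float = 10.0, prox_min: float = 2.0,
                flat_max_deg: float = 10.0, gyro_max: float = 0.5,
                gyro_min: float = 0.1, gyro_avg_min: float = 0.2) -> bool:
    """Decide whether a handheld-type device is in the handheld state."""
    # Occlusion check: low light AND a nearby object suggest a pocket or bag.
    if lux < lux_min and proximity_cm < prox_min:
        return False
    # Flat-placement check: a near-horizontal tilt suggests a tabletop.
    if tilt_deg < flat_max_deg:
        return False
    # Shake check: a large instantaneous angular velocity, or a moderate one
    # combined with an elevated sliding-window average, suggests a hand.
    if gyro_now > gyro_max:
        return True
    if gyro_min < gyro_now < gyro_max and gyro_window_avg > gyro_avg_min:
        return True
    return False
```

Here `tilt_deg` stands in for the angle computed from the geomagnetic and acceleration values, and `gyro_window_avg` for the sliding-window average of angular velocity samples.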
  • for step S504, please refer to the description of step S304 above, which will not be repeated here.
  • S506 If the second device status information sent by the candidate device is received, compare the priority of the first preset device status with the priority of the second preset device status corresponding to the second device status information, and determine whether the current device performs voice interaction according to the priority comparison result.
  • the first preset device status and the second preset device status corresponding to the second device status information can be compared.
  • a feasible implementation method is to respectively determine the priority of the first preset device status and each second preset device status, and then determine whether the current device performs voice interaction based on the priority comparison result.
  • the priority order of preset device states corresponding to each electronic device in the same multi-device scenario can be determined according to the user's instructions or preset settings when the device leaves the factory, and then the preset device state priority order determines the first state priority corresponding to the first preset device state, and determines the second state priority corresponding to the second preset device state corresponding to the second device state information. If the first state priority is greater than the second state priority, it means that the current device has a higher priority voice interaction control right, and the current device is determined to perform voice interaction; if the first state priority is less than the second state priority, it means that other candidate devices have a higher priority voice interaction control right, and the current device is determined not to perform voice interaction.
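The priority comparison in S506 can be sketched with an ordered list standing in for the factory-set priority order; earlier entries outrank later ones. The order and state names below are hypothetical examples:

```python
PRIORITY_ORDER = ["handheld_state", "limb_raised_state",
                  "video_playing_state", "music_playing_state"]

def state_priority(state: str) -> int:
    """Higher return value means higher priority."""
    return len(PRIORITY_ORDER) - PRIORITY_ORDER.index(state)

def current_device_interacts(first_state: str, second_states) -> bool:
    """True when the first state priority exceeds every second state priority."""
    first = state_priority(first_state)
    return all(first > state_priority(s) for s in second_states)
```

With this order, a handheld phone outranks a speaker playing music, matching the intuition that the device in the user's hand should respond.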
  • S508 If the second device status information sent by the candidate device is not received, determine that the current device performs voice interaction.
  • If the current device does not receive the second device status information sent by any candidate device within the first preset time, it means that no candidate device in the multi-device scenario is also in a preset device status. In this case, it can be directly determined that the voice interaction priority of the current device is the highest, and the current device can be controlled to perform voice interaction.
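The decision logic of steps S506 and S508 can be sketched as follows; the state names and the priority table are assumed examples of the preset device state priority order, which in practice comes from user settings or factory presets:

```python
# Sketch of steps S506/S508. The priority table is an assumed example of
# the "preset device state priority order"; real orders come from user
# settings or factory presets.

STATE_PRIORITY = {"handheld": 3, "worn": 2, "playing": 1}  # assumed

def should_interact(first_state, second_states):
    """Decide whether the current device, in first_state, performs voice
    interaction given the candidate states received (possibly none)."""
    if not second_states:                      # S508: nothing received in time
        return True
    first_p = STATE_PRIORITY[first_state]
    # S506: interact only if our state outranks every candidate's state
    return all(first_p > STATE_PRIORITY[s] for s in second_states)
```

An empty `second_states` list models the first-preset-time timeout of step S508, in which the current device interacts directly.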
  • If the current device is not in the first preset device state but receives the second device state information sent by a candidate device, it means that a candidate device in the multi-device scenario is in a preset device state, so the voice interaction of that candidate device takes precedence. In this case, the current device can be controlled not to perform voice interaction.
  • If the current device is not in the first preset device state and has not received the second device state information sent by any candidate device, it means that the interaction priority of all electronic devices in the multi-device scenario is low.
  • each electronic device may not perform voice interaction and continue to listen to whether the user's voice meets the wake-up conditions.
  • other judgment conditions may be used to continue to screen electronic devices that perform voice interaction from all electronic devices in the multi-device scenario.
  • the accuracy of judging the voice interaction device can be improved.
  • FIG6 is a flow chart of a voice control method provided by another embodiment of the present application.
  • the voice control method may at least include:
  • S606 If the second device status information sent by the candidate device is received, determine whether the current device performs voice interaction according to the first preset device status and the second preset device status corresponding to the second device status information.
  • For steps S602 to S606, please refer to the description in the above embodiment, which will not be repeated here.
  • If the current device is not in the first preset device state and has not received the second device state information sent by any candidate device, it means that the interaction priority of all electronic devices in the multi-device scenario is low. If it is still necessary to select an electronic device for voice interaction at this time, the electronic device can be selected according to the general voice feature value corresponding to each electronic device.
  • The general voice feature value is used to represent the wake-up priority between the sound source and the device. Since many factors affect this priority, the general voice feature value can be computed from a variety of general voice feature parameters. It is easy to understand that when a user interacts with a device by voice, the user often approaches the device and speaks toward it, so the general voice feature parameters may include, but are not limited to: the distance parameter between the sound source and the device and the orientation parameter of the device relative to the sound source.
  • the distance parameter between the sound source and the device can be calculated by the audio energy of the wake-up word in the user's voice.
  • When calculating the audio energy of the wake-up word, the impact of environmental noise needs to be reduced as much as possible.
  • the Voice Activity Detection (VAD) method can be used to separate the wake-up word and environmental noise based on the user's voice containing the wake-up word. Further, the energy and duration of the wake-up word and the energy and duration of the environmental noise can be obtained. Then the wake-up word energy without the impact of noise can be calculated as follows:
  • Suppose the energy and duration of the wake-up word are es and ts, respectively, and the energy and duration of the ambient noise are en and tn, respectively. Then es/ts can be regarded as the power of the wake-up word, and en/tn can be regarded as the power of the environmental noise, so the difference es/ts − en/tn can be considered the power of the wake-up word after removing the influence of noise. This denoised power can then be used to represent the energy of the wake-up word after removing the influence of noise.
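A minimal sketch of this denoised wake-up word energy calculation, under the assumption that the power difference is scaled back by the wake-up word duration ts to express it as an energy:

```python
def denoised_wakeword_energy(es, ts, en, tn):
    """Wake-up word energy with the ambient-noise contribution removed.

    es, ts: energy and duration of the wake-up word segment (from VAD)
    en, tn: energy and duration of the ambient-noise segment (from VAD)
    The final rescaling by ts is an assumption about how the denoised
    power "represents" an energy.
    """
    power_s = es / ts                # power of the wake-up word
    power_n = en / tn                # power of the environmental noise
    return (power_s - power_n) * ts  # denoised power, rescaled to an energy
```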
  • The orientation parameter of the current device relative to the sound source can be calculated by training a sound orientation decision model on pre-recorded audio data and then inputting the user's voice into the model to obtain the sound orientation result, that is, the orientation parameter of the current device relative to the sound source. The pre-recorded audio data may include: 1. Spectral feature data. Spectral features are selected because, as the orientation parameter of the device relative to the sound source increases, more of the sound reaches the current device by reflection, so the high-frequency part of the user's voice received by the current device is attenuated more than the low-frequency part. 2. Reverberation feature data. Reverberation features are selected because, as the orientation parameter of the device relative to the sound source increases, the reverberation energy increases; the current device can calculate the voice direct-to-reverberant ratio and the autocorrelation characteristics of the user's voice, and greater reverberation produces more and larger peaks in the autocorrelation result. 3. Multi-microphone feature data. Multi-microphone features are selected because, if the current device has multiple microphones capturing the voice, the sound direction characteristics across the microphones can also be calculated to assist in deciding the orientation parameter of the current device relative to the sound source.
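The autocorrelation cue mentioned for the reverberation feature can be illustrated with a short sketch; the peak threshold of 0.3 is an arbitrary assumption, and this is only one candidate input feature for the decision model, not the model itself:

```python
import numpy as np

def autocorr_peak_count(x, threshold=0.3):
    """Count prominent peaks in the normalized autocorrelation of a signal.

    More and larger peaks indicate stronger reverberation, which the text
    treats as a cue that the device faces away from the sound source.
    The 0.3 threshold is an assumed value for illustration only.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags
    ac = ac / ac[0]                                    # normalize lag 0 to 1
    # a lag counts as a peak if it exceeds both neighbors and the threshold
    peaks = (ac[1:-1] > ac[:-2]) & (ac[1:-1] > ac[2:]) & (ac[1:-1] > threshold)
    return int(peaks.sum())
```

A strongly periodic (echo-like) signal yields at least one prominent autocorrelation peak, while diffuse noise yields few or none.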
  • A first general voice feature weight corresponding to each first general voice feature parameter can be preset, wherein the greater the effect of a first general voice feature parameter on the wake-up priority between the sound source and the device, the greater its corresponding first general voice feature weight.
  • For example, if the first general voice feature parameters include the distance parameter between the sound source and the current device and the orientation parameter of the current device relative to the sound source, the first general voice feature weight corresponding to the distance parameter can be set to 0.6, and the first general voice feature weight corresponding to the orientation parameter can be set to 0.4.
  • After obtaining the first general voice feature parameters corresponding to the current device, the first general voice feature weights corresponding to each parameter can also be obtained, and the first general voice feature value corresponding to the current device can then be calculated from them: each first general voice feature parameter is multiplied by its corresponding first general voice feature weight, and the products are summed to obtain the first general voice feature value corresponding to the current device.
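The weighted combination described above can be sketched as follows; the 0.6/0.4 weights follow the example in the text, while the parameter values are assumed to have already been normalized to comparable scales:

```python
# The 0.6/0.4 weights follow the example in the text; the parameter
# values below are assumed to be pre-normalized to comparable scales.

def general_voice_feature_value(params, weights):
    """Weighted sum of general voice feature parameters."""
    return sum(params[name] * weights[name] for name in params)

weights = {"distance": 0.6, "orientation": 0.4}
value = general_voice_feature_value({"distance": 0.8, "orientation": 0.5},
                                    weights)  # 0.8*0.6 + 0.5*0.4 = 0.68
```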
  • After the first general voice feature value is obtained, it needs to be synchronized to the other electronic devices in the multi-device scenario.
  • Correspondingly, the other electronic devices in the multi-device scenario will also use the same method as the current device to obtain the second general voice feature value corresponding to the user voice, and synchronize the second general voice feature value to the current device.
  • S610 If a second general voice feature value sent by a candidate device is received, determine whether the current device performs voice interaction according to the first general voice feature value and the second general voice feature value.
  • After the current device sends the first general voice feature value to all candidate devices in the multi-device scenario, it can wait for a second preset time. If the second general voice feature value sent by at least one candidate device is received within the second preset time, whether the current device performs voice interaction can be determined based on the first general voice feature value and the second general voice feature value, that is, by comparing the two values.
  • the first general voice feature value and the second general voice feature value can be compared. If the first general voice feature value is greater than the second general voice feature value, it means that the current device has a higher interaction priority with the user, and the current device is determined to perform voice interaction; if the first general voice feature value is less than the second general voice feature value, it means that the current device does not have a higher interaction priority with the user, that is, other candidate devices have a higher interaction priority with the user, and the current device is determined not to perform voice interaction; if the first general voice feature value is equal to the second general voice feature value, it means that neither the current device nor other candidate devices have a higher interaction priority.
  • In this case, it can be further determined whether the current device is a pre-set priority interaction device, where the priority interaction device is set by the user or configured when the electronic device leaves the factory, so that when the general voice feature values of multiple devices are the same, one of the electronic devices is still selected for interaction and the user's voice interaction does not fail. If the current device is determined to be the pre-set priority interaction device, it is determined that the current device performs voice interaction.
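Putting the comparison of step S610 and the tie-break together, a sketch might be (the `is_priority_device` flag is an assumed representation of the pre-set priority interaction device):

```python
# Sketch of the comparison in step S610 plus the tie-break. The
# `is_priority_device` flag is an assumed representation of the
# pre-set priority interaction device.

def decide_by_feature_value(first, second, is_priority_device):
    """True if the current device should perform voice interaction."""
    if first > second:
        return True
    if first < second:
        return False
    # equal values: fall back to the pre-set priority interaction device
    return is_priority_device
```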
  • After the current device sends the first general voice feature value to all candidate devices in the multi-device scenario, it can wait for a second preset time. If the second general voice feature value sent by a candidate device is not received within the second preset time, it means that no candidate device in the multi-device scenario other than the current device has obtained a general voice feature value. In this case, it can be directly determined that the current device performs voice interaction.
  • When it is determined that no electronic device in the multi-device scenario is in a preset state, the general voice feature values acquired by each electronic device for the user's voice are obtained, and the electronic device for voice interaction is then selected according to these feature values, which effectively improves the accuracy of determining the voice interaction device.
  • FIG. 7 is a structural block diagram of a voice control device provided by another embodiment of the present application.
  • the voice control device 700 includes:
  • the voice wake-up module 710 is used to determine whether the current device is in a first preset device state when detecting that the user voice meets the voice wake-up condition;
  • the device state sending module 720 is used to send the first device state information corresponding to the first preset device state to the candidate device if the current device is in the first preset device state, and the candidate device and the current device are in the same multi-device scenario;
  • the first voice interaction determination module 730 is used to determine whether the current device performs voice interaction based on the first preset device state and the second preset device state corresponding to the second device state information if the second device state information sent by the candidate device is received.
  • the first voice interaction determination module 730 is further used to compare the priority of the first preset device state with the priority of the second preset device state corresponding to the second device state information, and determine whether the current device performs voice interaction according to the priority comparison result.
  • the first voice interaction determination module 730 is also used to determine a first state priority corresponding to a first preset device state according to a preset device state priority order, and to determine a second state priority corresponding to a second preset device state corresponding to the second device state information; if the first state priority is greater than the second state priority, it is determined that the current device performs voice interaction; if the first state priority is less than the second state priority, it is determined that the current device does not perform voice interaction.
  • the voice wake-up module 710 is further used to obtain the device type of the current device, obtain the specified state parameters corresponding to the current device according to the device type; and determine whether the current device is in the first preset device state according to the specified state parameters.
  • The voice wake-up module 710 is also used to obtain the occlusion state parameter, placement angle state parameter and jitter state parameter corresponding to the current device if the device type is a handheld device, and to determine whether the current device is in a handheld state based on these parameters.
  • the voice control device 700 further includes: a second voice interaction determination module, configured to determine that the current device performs voice interaction if the second device status information sent by the candidate device is not received.
  • the voice control apparatus 700 further includes: a third voice interaction determination module, configured to determine that the current device does not perform voice interaction if the current device is not in the first preset device state and receives second device state information sent by the candidate device.
  • the voice control device 700 also includes: a fourth voice interaction determination module, which is used to obtain a first general voice feature value corresponding to the current device according to the user voice if the current device is not in the first preset device state and has not received the second device state information sent by the candidate device, and send the first general voice feature value to the candidate device; if the second general voice feature value sent by the candidate device is received, determine whether the current device performs voice interaction based on the first general voice feature value and the second general voice feature value.
  • the fourth voice interaction determination module is also used to obtain the first general voice feature parameters corresponding to the current device and the first general voice feature weights corresponding to each first general voice feature parameter according to the user voice; based on each first general voice feature parameter and each first general voice feature weight, calculate the first general voice feature value corresponding to the current device.
  • the first general voice feature parameter includes, but is not limited to: a distance parameter between the sound source and the current device and a position parameter of the current device relative to the sound source.
  • the fourth voice interaction determination module is also used to determine that the current device performs voice interaction if the first general voice feature value is greater than the second general voice feature value; if the first general voice feature value is less than the second general voice feature value, determine that the current device does not perform voice interaction; if the first general voice feature value is equal to the second general voice feature value, and the current device is determined to be a pre-set priority interaction device, then determine that the current device performs voice interaction.
  • the voice control device 700 further includes: a fifth voice interaction determination module, configured to determine that the current device performs voice interaction if the second general voice feature value sent by the candidate device is not received.
  • In summary, the voice control device includes: a voice wake-up module, used to determine whether the current device is in the first preset device state when detecting that the user's voice meets the voice wake-up condition; a device state sending module, used to send the first device state information corresponding to the first preset device state to the candidate device if the current device is in the first preset device state, where the candidate device and the current device are in the same multi-device scenario; and a voice interaction determination module, used to determine whether the current device performs voice interaction according to the first preset device state and the second preset device state corresponding to the second device state information if the second device state information sent by the candidate device is received.
  • the first preset device state of the current device and the second preset device state of the candidate device can be obtained. Since the device state can represent the user's use of the device, it can be determined according to the device state of each device which device the user wants to use for voice interaction, which effectively improves the accuracy of voice control.
  • FIG. 8 is a flow chart of a voice control method provided in another embodiment of the present application.
  • the voice control method includes:
  • For example, the multi-device environment includes a speaker, a television, a mobile phone, and a wearable watch. The mobile phone, which has better data processing performance, can serve as the master device, while the speaker, television, and wearable watch, which have weaker data processing performance, can serve as slave devices.
  • the voice control method is first described by applying it to the master device.
  • When the master device detects that the user voice meets the voice wake-up condition, it can first determine whether the current master device itself is in the first preset device state.
  • If the current master device is in the first preset device state and receives the second device state information sent by a slave device, the target interaction device for voice interaction is determined from the current master device and the slave devices according to the first device state information corresponding to the first preset device state and the second device state information.
  • the current device is in the first preset device state, it can be determined that the current device may be a device that is being used or operated by the user. However, since there are multiple electronic devices in a multi-device scenario, there may be other electronic devices in the multi-device scenario that are also in the preset device state.
  • the slave devices in the same multi-device scenario can synchronize the status information corresponding to their own preset device state to the master device after determining that they are in the preset device state.
  • the target interactive device for voice interaction can be determined from the current master device and the slave device according to the first device status information and the second device status information corresponding to the first preset device status.
  • the specific method of determining the target interactive device according to the device status information can refer to the description in the above embodiment, which will not be repeated here.
  • the master device can generate an interaction instruction. If the target interaction device is the current master device, the current master device is directly controlled to perform voice interaction based on the interaction instruction. If the target interaction device is a slave device, the interaction instruction is sent to the target interaction device. The interaction instruction is used to instruct the target interaction device to perform voice interaction, that is, after the target interaction device receives the interaction instruction, it controls the target interaction device itself to perform voice interaction.
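The master device's dispatch step can be sketched as follows; `send_instruction` is a hypothetical stand-in for whatever transport connects the devices, and the device identifiers are illustrative:

```python
# `send_instruction` is a hypothetical transport callback; device ids
# are illustrative.

def dispatch(target_id, master_id, send_instruction):
    """Run the interaction locally if the master is the target,
    otherwise forward the interaction instruction to the slave."""
    if target_id == master_id:
        return "interact_locally"
    send_instruction(target_id, "perform_voice_interaction")
    return "forwarded"
```

The return values only mark which branch was taken: the master either performs the voice interaction itself or forwards the instruction to the target slave device.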
  • If the current master device is in the first preset device state and has not received the second device state information sent by any slave device, the current master device is controlled to perform voice interaction.
  • If the current master device is not in the first preset device state, it means that the current master device does not have voice interaction priority in terms of device state.
  • the target interaction device for voice interaction can be determined from the slave device according to the second device status information, and the target interaction device can be controlled to perform voice interaction based on the interaction instruction.
  • If the current master device is not in the first preset device state and has not received the second device state information sent by any slave device, it means that the interaction priority of all electronic devices in the multi-device scenario is low.
  • the electronic device for voice interaction can be selected according to the universal voice feature values corresponding to each electronic device.
  • Specifically, the first general voice feature value corresponding to the current master device can be obtained according to the user voice; if the second general voice feature value sent by a slave device is received, the target interaction device for voice interaction is determined from the current master device and the slave devices according to the first general voice feature value and the second general voice feature value, and the target interaction device is controlled to perform voice interaction based on the interaction instruction.
  • In addition, if the user voice is not detected to meet the voice wake-up condition but the second device status information sent by a slave device is received, the target interaction device for voice interaction is determined from the slave devices according to the second device status information, and the target interaction device is controlled to perform voice interaction based on the interaction instruction.
  • Similarly, if the user voice is not detected to meet the voice wake-up condition but the second general voice feature value sent by a slave device is received, the target interaction device for voice interaction is determined from the slave devices according to the second general voice feature value, and the target interaction device is controlled to perform voice interaction based on the interaction instruction.
  • Since the work of determining the target interaction device, whether based on status information or on voice feature values, is concentrated in the master device, the speed of determining the target interaction device can be increased. In addition, when the performance of the slave devices is poor, their data processing load and power consumption can be reduced.
  • FIG9 is a structural block diagram of a voice control device provided by another embodiment of the present application.
  • the voice control device 900 includes:
  • the master device voice wake-up module 910 is used to determine whether the current master device is in the first preset device state when detecting that the user voice meets the voice wake-up condition;
  • the master device voice interaction determination module 920 is used to determine a target interaction device for voice interaction from the current master device and the slave device according to the first device status information and the second device status information corresponding to the first preset device status if the current master device is in the first preset device status and receives the second device status information sent by the slave device;
  • the instruction control module 930 is used to control the target interactive device to perform voice interaction based on the interaction instruction.
  • the master device voice interaction determination module 920 is also used to control the current master device to perform voice interaction based on the interaction instruction if the target interaction device is the current master device; if the target interaction device is a slave device, the interaction instruction is sent to the target interaction device, and the interaction instruction is used to instruct the target interaction device to perform voice interaction.
  • the master device voice interaction determination module 920 is further used to control the current master device to perform voice interaction if the current master device is in a first preset device state and has not received the second device state information sent by the slave device.
  • the master device voice interaction determination module 920 is also used to determine the target interaction device for voice interaction from the slave device according to the second device status information if the current master device is not in the first preset device state and receives the second device status information sent by the slave device; and control the target interaction device to perform voice interaction based on the interaction instruction.
  • the master device voice interaction determination module 920 is also used to obtain a first general voice feature value corresponding to the current master device according to the user voice if the current master device is not in the first preset device state and has not received the second device state information sent by the slave device; if the second general voice feature value sent by the slave device is received, determine the target interaction device for voice interaction from the current master device and the slave device according to the first general voice feature value and the second general voice feature value; and control the target interaction device for voice interaction based on the interaction instruction.
  • the master device voice interaction determination module 920 is also used to determine the target interaction device for voice interaction from the slave device according to the second device status information if the user voice is not detected to meet the voice wake-up condition and the second device status information sent by the slave device is received; and control the target interaction device to perform voice interaction based on the interaction instruction.
  • The master device voice interaction determination module 920 is also used to determine the target interaction device for voice interaction from the slave devices according to the second general voice feature value if the user voice is not detected to meet the voice wake-up condition and the second general voice feature value sent by a slave device is received, and to control the target interaction device to perform voice interaction based on the interaction instruction.
  • FIG. 10 is a flow chart of a voice control method provided in another embodiment of the present application.
  • the voice control method includes:
  • For example, the multi-device environment includes a speaker, a television, a mobile phone, and a wearable watch. The mobile phone, which has better data processing performance, can serve as the master device, while the speaker, television, and wearable watch, which have weaker data processing performance, can serve as slave devices.
  • the voice control method is first described as being applied to a slave device.
  • If the current slave device is not in the second preset device state and has not received an interaction instruction from the master device, the second general voice feature value corresponding to the current slave device is obtained according to the user voice and sent to the master device; if an interaction instruction sent by the master device is then received, the current slave device is controlled to perform voice interaction, wherein the interaction instruction is generated by the master device according to the general voice feature value corresponding to the master device, the second general voice feature value of the current slave device, and the second general voice feature values of the other slave devices.
  • Here too, since the work of determining the target interaction device, whether based on status information or on voice feature values, is concentrated in the master device, the speed of determining the target interaction device can be increased, and when the performance of the slave devices is poor, their data processing load and power consumption can be reduced.
  • FIG11 is a structural block diagram of a voice control device provided by another embodiment of the present application.
  • the voice control device 1100 includes:
  • the slave device voice wake-up module 1110 is used to determine whether the current slave device is in a second preset device state when detecting that the user voice meets the voice wake-up condition;
  • the slave device state sending module 1120 is used to send the second device state information of the current device to the master device if the current slave device is in the second preset device state;
  • the slave device voice interaction module 1130 is used to control the current slave device to perform voice interaction if an interaction instruction sent by the master device is received, wherein the interaction instruction is generated by the master device according to the first device status information of the master device, the second device status information of the current slave device and the second device status information of other slave devices.
  • the slave device voice interaction module 1130 is also used to obtain the second general voice feature value corresponding to the current slave device according to the user voice, and send the second general voice feature value to the master device if the current slave device is not in the second preset device state and has not received the interaction instruction sent by the master device.
  • the slave device voice interaction module 1130 is also used to control the current slave device to perform voice interaction if an interaction instruction sent by the master device is received, wherein the interaction instruction is generated by the master device according to the second voice feature value corresponding to the master device, the second voice feature value of the current slave device, and the second voice feature values of other slave devices.
  • An embodiment of the present application further provides a computer storage medium, which can store multiple instructions, and the instructions are suitable for being loaded by a processor and executing the steps of any method in the above embodiments.
  • the electronic device 1200 may include: at least one electronic device processor 1201, at least one network interface 1204, a user interface 1203, a memory 1205, and at least one communication bus 1202.
  • the communication bus 1202 is used to realize the connection and communication between these components.
  • the user interface 1203 may include a display screen (Display) and a camera (Camera); optionally, the user interface 1203 may also include a standard wired interface and a wireless interface.
  • the network interface 1204 may optionally include a standard wired interface or a wireless interface (such as a WI-FI interface).
  • the electronic device processor 1201 may include one or more processing cores.
  • the electronic device processor 1201 uses various interfaces and lines to connect various parts of the entire electronic device 1200, and executes various functions and processes data of the electronic device 1200 by running or executing instructions, programs, code sets or instruction sets stored in the memory 1205, and calling data stored in the memory 1205.
  • the electronic device processor 1201 can be implemented in at least one hardware form of digital signal processing (DSP), field programmable gate array (FPGA), and programmable logic array (PLA).
  • the electronic device processor 1201 can integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), and a modem.
  • the CPU mainly processes the operating system, user interface, and application programs; the GPU is responsible for rendering and drawing the content to be displayed on the display screen; and the modem is used to process wireless communication. It is understandable that the above-mentioned modem may not be integrated into the electronic device processor 1201, but may be implemented by a separate chip.
  • the memory 1205 may include a random access memory (RAM) or a read-only memory (ROM).
  • the memory 1205 includes a non-transitory computer-readable storage medium.
  • the memory 1205 can be used to store instructions, programs, codes, code sets or instruction sets.
  • the memory 1205 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, an image playback function, etc.), instructions for implementing the above-mentioned various method embodiments, etc.; the data storage area may store data involved in the above-mentioned various method embodiments, etc.
  • the memory 1205 may also be at least one storage device located away from the aforementioned electronic device processor 1201. As shown in FIG. 12, the memory 1205 as a computer storage medium may include an operating system, a network communication module, a user interface module and a voice control program.
  • the user interface 1203 is mainly used to provide an input interface for the user and obtain data input by the user; and the electronic device processor 1201 can be used to call the voice control program stored in the memory 1205 and specifically perform the following operations:
  • when it is detected that the user voice meets the voice wake-up condition, it is determined whether the current device is in a first preset device state; if the current device is in the first preset device state, first device state information corresponding to the first preset device state is sent to a candidate device, the candidate device and the current device being in the same multi-device scenario;
  • if second device state information sent by the candidate device is received, whether the current device performs voice interaction is determined according to the first preset device state and the second preset device state corresponding to the second device state information.
  • determining whether the current device performs voice interaction is performed based on the first preset device state and the second preset device state corresponding to the second device state information, including: comparing the priority of the first preset device state with the priority of the second preset device state corresponding to the second device state information, and determining whether the current device performs voice interaction based on the priority comparison result.
  • the priority of the first preset device state is compared with the priority of the second preset device state corresponding to the second device state information, and whether the current device performs voice interaction is determined according to the priority comparison result, including: determining the first state priority corresponding to the first preset device state according to a preset device state priority order, and determining the second state priority corresponding to the second preset device state corresponding to the second device state information; if the first state priority is greater than the second state priority, determining that the current device performs voice interaction; if the first state priority is less than the second state priority, determining that the current device does not perform voice interaction.
  • determining whether the current device is in a first preset device state includes: obtaining a device type of the current device, obtaining a specified state parameter corresponding to the current device according to the device type; and determining whether the current device is in the first preset device state according to the specified state parameter.
  • obtaining the specified state parameters corresponding to the current device according to the device type includes: if the device type is a handheld device, obtaining the occlusion state parameter, placement angle state parameter, and jitter state parameter corresponding to the current device; determining whether the current device is in the first preset device state according to the specified state parameters includes: determining whether the current device is in a handheld state according to the occlusion parameter, the placement angle parameter, and the jitter parameter.
  • the method further includes: if the second device status information sent by the candidate device is not received, determining that the current device performs voice interaction.
  • the method further includes: if the current device is not in the first preset device state and receives the second device state information sent by the candidate device, determining that the current device does not perform voice interaction.
  • the method also includes: if the current device is not in a first preset device state and has not received second device state information sent by a candidate device, obtaining a first general voice feature value corresponding to the current device according to the user voice, and sending the first general voice feature value to the candidate device; if a second general voice feature value sent by the candidate device is received, determining whether the current device performs voice interaction based on the first general voice feature value and the second general voice feature value.
  • obtaining a first general voice feature value corresponding to a current device based on user voice includes: obtaining a first general voice feature parameter corresponding to the current device and a first general voice feature weight corresponding to each first general voice feature parameter based on the user voice; and calculating the first general voice feature value corresponding to the current device based on each first general voice feature parameter and each first general voice feature weight.
  • the first general voice feature parameter includes, but is not limited to: a distance parameter between the sound source and the current device and a position parameter of the current device relative to the sound source.
  • whether the current device performs voice interaction is determined based on a first general voice feature value and a second general voice feature value, including: if the first general voice feature value is greater than the second general voice feature value, determining that the current device performs voice interaction; if the first general voice feature value is less than the second general voice feature value, determining that the current device does not perform voice interaction; if the first general voice feature value is equal to the second general voice feature value, and the current device is determined to be a pre-set priority interaction device, then determining that the current device performs voice interaction.
  • the method further includes: if the second general voice feature value sent by the candidate device is not received, determining that the current device performs voice interaction.
  • the user interface 1203 is mainly used to provide an input interface for the user and obtain data input by the user; and the electronic device processor 1201 can be used to call the voice control program stored in the memory 1205 and specifically perform the following operations:
  • when it is detected that the user voice meets the voice wake-up condition, the current master device determines whether it is in the first preset device state; if the current master device is in the first preset device state and second device state information sent by a slave device is received, the target interactive device for voice interaction is determined from the current master device and the slave device according to the first device state information corresponding to the first preset device state and the second device state information; and the target interactive device is controlled to perform voice interaction based on an interaction instruction.
  • if the target interactive device is the current master device, the current master device is controlled to perform voice interaction based on the interaction instruction; if the target interactive device is a slave device, the interaction instruction is sent to the target interactive device, and the interaction instruction is used to instruct the target interactive device to perform voice interaction.
  • the voice control method further includes: if the current master device is in a first preset device state and has not received the second device state information sent by the slave device, controlling the current master device to perform voice interaction.
  • the voice control method also includes: if the current master device is not in the first preset device state and receives the second device state information sent by the slave device, then determining the target interactive device for voice interaction from the slave device according to the second device state information; and controlling the target interactive device for voice interaction based on the interaction instruction.
  • the voice control method also includes: if the current master device is not in the first preset device state and has not received the second device state information sent by the slave device, then obtaining the first general voice feature value corresponding to the current master device according to the user voice; if the second general voice feature value sent by the slave device is received, then determining the target interactive device for voice interaction from the current master device and the slave device according to the first general voice feature value and the second general voice feature value; and controlling the target interactive device for voice interaction based on the interaction instruction.
  • the voice control method also includes: if the user voice is not detected to meet the voice wake-up condition, and the second device status information sent by the slave device is received, the target interactive device for voice interaction is determined from the slave device according to the second device status information; and the target interactive device is controlled to perform voice interaction based on the interaction instruction.
  • the voice control method also includes: if the user voice is not monitored to meet the voice wake-up condition and the second general voice feature value sent by the slave device is received, then the target interactive device for voice interaction is determined from the slave device according to the second general voice feature value; and the target interactive device is controlled to perform voice interaction based on the interaction instruction.
  • the user interface 1203 is mainly used to provide an input interface for the user and obtain data input by the user; and the electronic device processor 1201 can be used to call the voice control program stored in the memory 1205 and specifically perform the following operations:
  • when it is detected that the user voice meets the voice wake-up condition, the current slave device determines whether it is in the second preset device state; if the current slave device is in the second preset device state, the second device state information of the current device is sent to the master device; if an interaction instruction sent by the master device is received, the current slave device is controlled to perform voice interaction, wherein the interaction instruction is generated by the master device according to the first device state information of the master device, the second device state information of the current slave device, and the second device state information of other slave devices.
  • the voice control method also includes: if the current slave device is not in the second preset device state and has not received the interaction instruction sent by the master device, then obtaining the second general voice feature value corresponding to the current slave device according to the user voice, and sending the second general voice feature value to the master device; if the interaction instruction sent by the master device is received, then controlling the current slave device to perform voice interaction, wherein the interaction instruction is generated by the master device according to the second voice feature value corresponding to the master device, the second voice feature value of the current slave device, and the second voice feature values of other slave devices.
  • the disclosed devices and methods can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of modules is only a logical function division. There may be other division methods in actual implementation, such as multiple modules or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or modules, which can be electrical, mechanical or other forms.
  • modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules, that is, they may be located in one place or distributed on multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional module in each embodiment of the present application can be integrated into a processing module, or each module can exist physically separately, or two or more modules can be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the embodiments of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium, including several instructions to enable a computer device (which can be a personal computer, server, or network device, etc.) to perform all or part of the steps of the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)
  • Selective Calling Equipment (AREA)

Abstract

Embodiments of the present application disclose a voice control method, device, storage medium, and electronic device. First, when it is detected that a user voice meets a voice wake-up condition, if the current device is determined to be in a first preset device state, first device state information corresponding to the first preset device state is sent to a candidate device; if second device state information sent by the candidate device is received, whether the current device performs voice interaction is determined according to the first preset device state and the second preset device state corresponding to the second device state information.

Description

Voice control method, device, storage medium, and electronic device

This application claims priority to the Chinese patent application with application number 2022114437865, entitled "Voice control method, device, storage medium, and electronic device", filed on November 17, 2022, the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of voice control technology, and in particular to a voice control method, device, storage medium, and electronic device.

Background

With the development of science and technology, terminals appear more and more in people's lives, and the functions in terminals grow increasingly rich. Screen capture is a basic function of many terminals: through the screen capture function, the display content on the terminal screen can be captured and saved in the form of a picture to obtain a screenshot of the terminal interface. The picture obtained through this screenshot function is usually a screenshot of the entire interface on the whole screen.
Summary

Embodiments of the present application provide a voice control method, device, storage medium, and electronic device.

In a first aspect, an embodiment of the present application provides a voice control method, the method comprising:

when it is detected that a user voice meets a voice wake-up condition, determining whether the current device is in a first preset device state;

if the current device is in the first preset device state, sending first device state information corresponding to the first preset device state to a candidate device, the candidate device and the current device being in the same multi-device scenario;

if second device state information sent by the candidate device is received, determining whether the current device performs voice interaction according to the first preset device state and the second preset device state corresponding to the second device state information.

In a second aspect, an embodiment of the present application provides a voice control method, the method comprising:

when it is detected that a user voice meets a voice wake-up condition, determining whether the current master device is in a first preset device state;

if the current master device is in the first preset device state and second device state information sent by a slave device is received, determining a target interactive device for voice interaction from the current master device and the slave device according to the first device state information corresponding to the first preset device state and the second device state information;

controlling the target interactive device to perform voice interaction based on an interaction instruction.

In a third aspect, an embodiment of the present application provides a voice control method, the method comprising:

when it is detected that a user voice meets a voice wake-up condition, determining whether the current slave device is in a second preset device state;

if the current slave device is in the second preset device state, sending second device state information of the current device to a master device;

if an interaction instruction sent by the master device is received, controlling the current slave device to perform voice interaction, wherein the interaction instruction is generated by the master device according to first device state information of the master device, the second device state information of the current slave device, and second device state information of other slave devices.

In a fourth aspect, an embodiment of the present application provides a voice control device, the device comprising:

a voice wake-up module, configured to determine whether the current device is in a first preset device state when it is detected that a user voice meets a voice wake-up condition;

a device state sending module, configured to send first device state information corresponding to the first preset device state to a candidate device if the current device is in the first preset device state, the candidate device and the current device being in the same multi-device scenario;

a voice interaction determination module, configured to determine, if second device state information sent by the candidate device is received, whether the current device performs voice interaction according to the first preset device state and the second preset device state corresponding to the second device state information.

In a fifth aspect, an embodiment of the present application provides a voice control device, the device comprising:

a master device voice wake-up module, configured to determine whether the current master device is in a first preset device state when it is detected that a user voice meets a voice wake-up condition;

a master device voice interaction determination module, configured to determine, if the current master device is in the first preset device state and second device state information sent by a slave device is received, a target interactive device for voice interaction from the current master device and the slave device according to the first device state information corresponding to the first preset device state and the second device state information;

an instruction control module, configured to control the target interactive device to perform voice interaction based on an interaction instruction.

In a sixth aspect, an embodiment of the present application provides a voice control device, the device comprising:

a slave device voice wake-up module, configured to determine whether the current slave device is in a second preset device state when it is detected that a user voice meets a voice wake-up condition;

a slave device state sending module, configured to send second device state information of the current device to a master device if the current slave device is in the second preset device state;

a slave device voice interaction module, configured to control the current slave device to perform voice interaction if an interaction instruction sent by the master device is received, wherein the interaction instruction is generated by the master device according to first device state information of the master device, the second device state information of the current slave device, and second device state information of other slave devices.

In a seventh aspect, an embodiment of the present application provides a computer storage medium storing multiple instructions, the instructions being suitable for being loaded by a processor to execute the steps of the above method.

In an eighth aspect, an embodiment of the present application provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor.
Brief Description of the Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative work.

FIG. 1 shows a device interaction method in the related art provided by an embodiment of the present application;

FIG. 2 is an exemplary system architecture diagram of a voice control method provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of a voice control method provided by an embodiment of the present application;

FIG. 4 shows a device interaction method provided by an embodiment of the present application;

FIG. 5 is a schematic flowchart of a voice control method provided by another embodiment of the present application;

FIG. 6 is a schematic flowchart of a voice control method provided by another embodiment of the present application;

FIG. 7 is a structural block diagram of a voice control device provided by another embodiment of the present application;

FIG. 8 is a schematic flowchart of a voice control method provided by another embodiment of the present application;

FIG. 9 is a structural block diagram of a voice control device provided by another embodiment of the present application;

FIG. 10 is a schematic flowchart of a voice control method provided by another embodiment of the present application;

FIG. 11 is a structural block diagram of a voice control device provided by another embodiment of the present application;

FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description

To make the features and advantages of the present application more obvious and understandable, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative work fall within the protection scope of the present application.

When the following description refers to the accompanying drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of devices and methods consistent with some aspects of the present application as detailed in the appended claims.

It should also be noted that the information (including but not limited to user device information and user personal information), data (including but not limited to data for analysis, stored data, and displayed data), and signals involved in the embodiments of the present application are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the object features, interaction behavior features, and user information involved in this application are all obtained with full authorization.
A voice assistant is an important application of artificial intelligence on electronic devices. Through a voice assistant, an electronic device can carry out intelligent conversations and instant question-and-answer interactions with the user, and can also recognize voice commands input by the user and trigger the device to automatically execute the event corresponding to the voice command. Normally the voice assistant is dormant; before using it, the user can wake it up by voice, and only after the voice assistant is woken up can it receive and recognize the user's voice commands. The voice data used for waking up can be called a wake word. For example, taking the wake word "Xiaobu Xiaobu": if the user wants the voice assistant to query the weather at location A, the user can say the voice command "Xiaobu Xiaobu, the weather at location A". After receiving this command, the voice assistant can be woken up by the wake word "Xiaobu Xiaobu", whereupon the electronic device can use the voice assistant to recognize the voice command, trigger the device to query the weather at location A, and report the weather to the user by voice or text.

In the related art, voice control is increasingly widely applied as technology develops. For example, many home devices currently support a voice control function, which can be implemented by installing a voice assistant in the home device. As a result, the environment in which the user is located (such as the user's home) may contain multiple devices supporting voice control, i.e., a multi-device scenario. In such a scenario, if several of these devices share the same wake word, then after the user speaks the wake word, the voice assistants of all devices with that wake word will be woken up, and all of them will recognize and respond to the voice commands subsequently spoken by the user.

Please refer to FIG. 1, which shows a device interaction method in the related art provided by an embodiment of the present application.

As shown in FIG. 1, the user's living room serves as the multi-device scenario, containing four devices: a speaker 101, a television 102, a mobile phone 103, and a wearable watch 104. All four devices have a voice assistant installed, with the wake word "Xiaobu Xiaobu". When the user speaks a voice control command containing the wake word "Xiaobu Xiaobu", the voice assistants of the speaker 101, television 102, mobile phone 103, and wearable watch 104 will all be woken up, recognize the voice command, and respond to it.

In a multi-device scenario, the user often needs only one particular device to respond. For example, when the user is using a mobile phone and needs to interact with the voice assistant, interacting with the phone is more convenient, so the user usually prefers that the voice assistant in the phone be woken up and respond to the user's control command for voice interaction; if multiple devices respond simultaneously, the experience brought to the user is poor.

In view of the above technical problem, in the embodiments of the present application, when it is detected that the user voice meets the voice wake-up condition, it is first determined whether the current device is in a first preset device state; then, if the current device is in the first preset device state, first device state information corresponding to the first preset device state is sent to a candidate device in the same multi-device scenario as the current device; finally, if second device state information sent by the candidate device is received, whether the current device performs voice interaction is determined according to the first preset device state and the second preset device state corresponding to the second device state information. When the user voice meets the wake-up condition, the first preset device state of the current device and the second preset device state of the candidate device can be obtained; since the device state can represent the user's use of the device, which device the user actually wants to use for voice interaction can be determined from the device states of the respective devices, effectively improving the accuracy of voice control.
Please refer to FIG. 2, which is an exemplary system architecture diagram of a voice control method provided by an embodiment of the present application.

As shown in FIG. 2, the system architecture may include an electronic device 201, a network 202, and a server 203. The network 202 is the medium providing a communication link between the electronic device 201 and the server 203, and may include various types of wired or wireless communication links: for example, wired communication links include optical fiber, twisted pair, or coaxial cable, and wireless communication links include Bluetooth, wireless fidelity (Wi-Fi), or microwave communication links.

The electronic device 201 can interact with the server 203 through the network 202 to receive messages from or send messages to the server 203, or to receive messages or data sent by other users to the server 203. The electronic device 201 may be hardware or software. When the electronic device 201 is hardware, it may be any of various electronic devices, including but not limited to smart watches, smartphones, tablets, smart TVs, laptop computers, and desktop computers. When the electronic device 201 is software, it may be installed in the electronic devices listed above and implemented as multiple software programs or software modules (for example, to provide distributed services) or as a single software program or module, which is not specifically limited here.

The server 203 may be a business server providing various services. It should be noted that the server 203 may be hardware or software. When the server 203 is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server; when the server 203 is software, it may be implemented as multiple software programs or modules (for example, to provide distributed services) or as a single software program or module, which is not specifically limited here.

In this embodiment of the present application, there may be multiple electronic devices 201, and the multiple electronic devices 201 may be in the same multi-device scenario; devices in the same multi-device scenario may also connect directly through the network 202, i.e., transmit data to one another directly over the network 202. The system architecture may therefore omit the server 203; in other words, the server 203 is an optional device in the embodiments of this specification, and the method provided by the embodiments of this specification can be applied to a system structure including only the electronic devices 201, which is not limited in this embodiment of the present application.

In this embodiment, if one electronic device 201 in the system architecture is taken as the current device, then when the current device detects that the user voice meets the voice wake-up condition, it determines whether it is in a first preset device state; if the current device is in the first preset device state, it sends the first device state information corresponding to the first preset device state to a candidate device in the same multi-device scenario as the current device; if second device state information sent by the candidate device is received, whether the current device performs voice interaction is determined according to the first preset device state and the second preset device state corresponding to the second device state information.

It should be understood that the numbers of electronic devices, networks, and servers in FIG. 2 are merely illustrative; according to implementation needs, there may be any number of electronic devices, networks, and servers.
Please refer to FIG. 3, which is a schematic flowchart of a voice control method provided by an embodiment of the present application. The execution subject of this embodiment may be the electronic device performing voice control, a processor in the electronic device executing the voice control method, or a voice control service in that electronic device. For convenience of description, the specific execution process of the voice control method is introduced below taking the processor in the electronic device as the execution subject.

As shown in FIG. 3, the voice control method may at least include:

S302: when it is detected that the user voice meets the voice wake-up condition, determine whether the current device is in a first preset device state.

Understandably, in this embodiment of the present application, the voice control method is mainly applied in a multi-device scenario containing at least two electronic devices. The electronic devices in the same multi-device scenario belong to the same device group, have the same device level (i.e., no master-slave or primary-secondary relationship is distinguished within the group), and can transmit data to one another directly or via data forwarding by a server. Further, the devices can be placed in the same multi-device scenario by connecting to the same wireless access point (such as a WiFi access point), logging into the same user account, and the like.

Further, each electronic device in the same multi-device scenario is provided with a voice-assistant-like program that can monitor, in real time based on voice data collected by a microphone, the user voice uttered by users around the device, and judge whether the user needs voice interaction.

One way to judge whether the user needs voice interaction is to preset a voice wake-up condition in each electronic device: if the user voice is detected to meet the voice wake-up condition, it can be confirmed that the user needs voice interaction. In this embodiment, the voice wake-up condition may be that the user voice contains a preset wake word and/or that the voiceprint of the user voice is a preset voiceprint. After the voice assistant collects the user voice through the microphone, it can perform wake-word detection and/or voiceprint detection on the user voice; when the user voice contains the preset wake word and/or its voiceprint is the preset voiceprint, the user voice is considered to meet the voice wake-up condition. The voice wake-up condition may also be that the electronic device is in a preset state: for example, if the electronic device is a smart watch, its screen is off most of the time to reduce power consumption, so if the smart watch's screen is on, it can be determined that the smart watch meets the voice wake-up condition.

Since the device state of an electronic device changes when the user uses or operates it (for example, the device state may be a static or dynamic state such as the placement state, screen-on state, standby state, or video-playing state), the device state is associated with the user's operations; therefore, when the user needs voice interaction in a multi-device scenario, the user usually prefers that the electronic device currently being operated respond.

Based on the above idea, in this embodiment, each electronic device in the same multi-device scenario can first determine, upon detecting that the user voice meets the voice wake-up condition, whether the device itself is in a preset device state.

Specifically, for the electronic devices in the same multi-device scenario, devices of different types correspond to different preset states, so a preset device state can be set in advance for each electronic device according to its device type. If an electronic device is in its preset device state, it may currently be operated or used by the user, and the probability that the user needs to interact with that device is accordingly higher. To distinguish the preset device states of different devices, in this embodiment the preset device state corresponding to the current device is determined as the first preset device state.

For the current device, when it detects that the user voice meets the voice wake-up condition, it can determine whether it is in the first preset device state. The manner of making this determination is not limited: for example, the determination can be made by obtaining data collected by preset sensors in the current device.
S304: if the current device is in the first preset device state, send the first device state information corresponding to the first preset device state to a candidate device, the candidate device being in the same multi-device scenario as the current device.

If the current device is determined to be in the first preset device state, it can be determined that the current device may be the device the user is currently using or operating. However, since the multi-device scenario contains multiple electronic devices, other devices may also be in their preset device states. To facilitate determining, among multiple devices in preset device states, the electronic device with which the user wants to interact by voice, each device in the same multi-device scenario, after determining that it is in its preset device state, can synchronize the state information corresponding to that preset device state to the other electronic devices.

Further, since a preset device state is not entity data and cannot be transmitted directly, the current device can obtain the first device state information representing the first preset device state and send it to at least one candidate device in the same multi-device scenario as the current device, where the candidate devices may be all electronic devices in the same multi-device scenario as the current device, or user-specified electronic devices therein.

S306: if second device state information sent by the candidate device is received, determine whether the current device performs voice interaction according to the first preset device state and the second preset device state corresponding to the second device state information.

Since the multi-device scenario contains multiple electronic devices, other devices may also be in their preset device states; that is, when the current device sends the first device state information corresponding to its first device state to the candidate devices, a candidate device may also send the device state information of its own preset device state. To distinguish it from the first preset device state of the current device, in this embodiment the preset device state of a candidate device is recorded as the second preset device state. After the first device state information corresponding to the first preset device state is sent to the candidate devices, the current device can wait for a first preset time; if second device state information sent by at least one candidate device is received within the first preset time, this indicates that another electronic device in the multi-device scenario is also in a preset device state.

Further, after receiving the second device state information sent by at least one candidate device, the second preset device state corresponding to each candidate device can be determined from the second device state information, the first preset device state is then compared with each second preset device state, and whether the current device performs voice interaction is determined. The manner of comparing the first preset device state with each second preset device state is not limited; the comparison may follow rules set by the user or at the factory to determine the comparison result. If the current device is determined to perform voice interaction, it can parse the voice control command corresponding to the user voice and respond to that command.

Since every electronic device in the multi-device scenario executes the above voice control method, one electronic device can be determined from the multiple devices in the multi-device scenario to perform voice interaction, improving the user's experience during voice interaction.
Please refer to FIG. 4, which shows a device interaction method provided by an embodiment of the present application.

As shown in FIG. 4, the user's living room serves as the multi-device scenario, containing four devices: a speaker 101, a television 102, a mobile phone 103, and a wearable watch 104, all with a voice assistant installed and the wake word "Xiaobu Xiaobu". When the user speaks a user voice containing the wake word "Xiaobu Xiaobu", the voice assistants of the speaker 101, television 102, mobile phone 103, and wearable watch 104 may all detect that the user voice meets the voice wake-up condition, and the four devices each determine whether they are in their preset device states. If the mobile phone 103 determines that it is in the first preset device state, it sends the first device state information corresponding to the first preset device state to the speaker 101, the television 102, and the wearable watch 104. If the mobile phone 103 receives the second device state information of the speaker 101, the television 102, and the wearable watch 104, it compares its first preset device state with the second preset device states corresponding to the other devices' second device state information to determine whether it performs voice interaction; the speaker 101, the television 102, and the wearable watch 104 likewise each determine whether they perform voice interaction, and finally one electronic device among the four is determined to interact.

In FIG. 4, the mobile phone 103 is determined to perform voice interaction, so the mobile phone 103 can parse the voice control command corresponding to the user voice and respond to that command.

Since interacting with the phone is more convenient, the user usually prefers that the voice assistant in the phone be woken up and respond to the user's control command for voice interaction; if multiple devices respond simultaneously, the experience brought to the user is poor.

In this embodiment of the present application, when it is detected that the user voice meets the voice wake-up condition, it is first determined whether the current device is in a first preset device state; then, if the current device is in the first preset device state, the first device state information corresponding to the first preset device state is sent to a candidate device in the same multi-device scenario as the current device; finally, if second device state information sent by the candidate device is received, whether the current device performs voice interaction is determined according to the first preset device state and the second preset device state corresponding to the second device state information. When the user voice meets the wake-up condition, the first preset device state of the current device and the second preset device state of the candidate device can be obtained; since the device state can represent the user's use of the device, which device the user actually wants to use for voice interaction can be determined from the device states of the respective devices, effectively improving the accuracy of voice control.
Please refer to FIG. 5, which is a schematic flowchart of a voice control method provided by another embodiment of the present application. As shown in FIG. 5, the voice control method may at least include:

S502: when it is detected that the user voice meets the voice wake-up condition, determine whether the current device is in a first preset device state.

When the user uses or operates electronic devices, different types of devices are used or operated in different ways, so the preset device states of different types of devices differ. In determining whether the current device is in the first preset device state, one feasible implementation is to first obtain the device type of the current device. The device type distinguishes categories of devices; for example, device types may include handheld devices, wearable devices, speaker devices, and television devices, and may be divided according to user needs or set directly at the factory. Devices of different types correspond to different preset device states: when the device type is a handheld device, the corresponding preset device state may be a handheld state; when it is a wearable device, a limb-raised state; when it is a speaker device, a music-playing state; when it is a television device, a video-playing state; and so on.

Further, certain state parameters of a device differ across device states, so the specified state parameters corresponding to the current device can be obtained according to its device type, where the specified state parameters can be collected by specified components such as sensors; finally, whether the current device is in the first preset device state is determined according to the specified state parameters.

For example, if the device type of the current device is a handheld device, such as a smartphone, then when the user is using or operating it, the handheld device is generally not occluded by a pocket or other object, is not completely horizontal, and is not perfectly still. Therefore, if the device type of the current device is a handheld device, at least one of the occlusion state parameter, the placement angle state parameter, and the jitter state parameter corresponding to the current device can be obtained, so as to determine whether the current device is in a handheld state according to at least one of the occlusion parameter, the placement angle parameter, and the jitter parameter.

Specifically, when judging whether the current device is in a handheld state according to the occlusion, placement angle, and jitter parameters: first, determine whether the current device is occluded based on the occlusion state parameter, which may include the illuminance value collected by a light sensor and the distance value collected by a proximity sensor. If the illuminance value is less than a preset illuminance value and the proximity distance value is less than a preset proximity distance value, the current device is determined to be occluded, i.e., not in the handheld state (the first preset device state). Otherwise, the current device is determined not to be occluded, and whether it is lying flat is determined based on the placement angle state parameter, which may include the geomagnetic value collected by a geomagnetic sensor and the acceleration value collected by an acceleration sensor. If the angle calculated from the geomagnetic and acceleration values is less than a preset flat angle, the current device is determined to be lying flat, i.e., not in the handheld state (the first preset device state). Otherwise, the device is determined not to be lying flat, and whether it is jittering is determined based on the jitter state parameter, which may include the angular velocity value collected by an angular velocity sensor, from which the average angular velocity within a sliding time window can be calculated. If the current real-time angular velocity is greater than a preset maximum angular velocity, or the current real-time angular velocity is greater than a preset minimum angular velocity, less than the preset maximum angular velocity, and the average angular velocity is greater than a preset average angular velocity, the current device is determined to be jittering and therefore in the handheld state (the first preset device state); otherwise, it is determined not to be jittering and not in the handheld state (the first preset device state).
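The occlusion → flatness → jitter decision chain above can be sketched as follows. This is a minimal illustration, not the application's implementation: all threshold values and parameter names are assumptions, since the application does not fix concrete numbers.

```python
def is_handheld(light, proximity, tilt_deg, gyro, gyro_window,
                light_min=10.0, prox_min=3.0, flat_max=5.0,
                w_max=4.0, w_min=0.5, w_avg=1.0):
    """Sketch of the occlusion -> flatness -> jitter decision chain.

    light: illuminance from the light sensor; proximity: distance from the
    proximity sensor; tilt_deg: angle derived from geomagnetic + acceleration
    values; gyro: current angular velocity; gyro_window: recent angular
    velocities in a sliding time window. All thresholds are illustrative.
    """
    # Occluded (e.g. in a pocket): low light AND small proximity distance.
    if light < light_min and proximity < prox_min:
        return False
    # Lying flat: tilt angle below the preset flat-angle threshold.
    if tilt_deg < flat_max:
        return False
    # Jitter test: a high instantaneous angular rate, or a moderate rate
    # combined with a high sliding-window average, indicates handheld use.
    avg = sum(gyro_window) / len(gyro_window)
    if gyro > w_max:
        return True
    if w_min < gyro < w_max and avg > w_avg:
        return True
    return False
```

A phone held in the hand (good light, far proximity, tilted, shaking) passes all three stages; a phone in a pocket fails at the occlusion stage.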
S504: if the current device is in the first preset device state, send the first device state information corresponding to the first preset device state to a candidate device, the candidate device being in the same multi-device scenario as the current device.

For step S504, refer to the description of step S304 above, which is not repeated here.

S506: if second device state information sent by the candidate device is received, compare the priority of the first preset device state with the priority of the second preset device state corresponding to the second device state information, and determine whether the current device performs voice interaction according to the priority comparison result.

In this embodiment of the present application, after the second device state information sent by the candidate device is received, the first preset device state can be compared with the second preset device state corresponding to the second device state information. In the comparison process, one feasible implementation is to determine the priorities of the first preset device state and of each second preset device state respectively, and then determine whether the current device performs voice interaction according to the priority comparison result.

Specifically, a priority order of the preset device states corresponding to the electronic devices in the same multi-device scenario can be set in advance according to the user's instruction or at the factory. The first state priority corresponding to the first preset device state, and the second state priority corresponding to the second preset device state corresponding to the second device state information, are then determined from this preset device state priority order. If the first state priority is greater than the second state priority, the current device holds the higher-priority voice interaction control right, and the current device is determined to perform voice interaction; if the first state priority is less than the second state priority, another candidate device holds the higher-priority voice interaction control right, and the current device is determined not to perform voice interaction.
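The priority comparison described above can be sketched as follows; the priority table is a hypothetical example (the application leaves the concrete order to user or factory configuration):

```python
# Illustrative priority table: a larger value means a higher-priority
# voice interaction control right. The actual order is configurable.
STATE_PRIORITY = {
    "handheld": 3,       # handheld device being held
    "wrist_raised": 2,   # wearable device with the arm lifted
    "playing_video": 1,  # television device playing video
}

def should_interact(own_state, peer_states):
    """Return True if the current device's state outranks every peer's.

    An empty peer list means no candidate reported a preset state, which
    matches step S508: the current device interacts by default.
    """
    own = STATE_PRIORITY[own_state]
    return all(own > STATE_PRIORITY[s] for s in peer_states)
```

Each device runs the same comparison locally over the synchronized state information, so exactly one device (the one with the top-ranked state) decides to respond.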
S508: if the second device state information sent by the candidate device is not received, determine that the current device performs voice interaction.

If the current device does not receive second device state information from any candidate device within the first preset time, this indicates that no candidate device in the multi-device scenario is in a preset device state; the current device can then be directly determined to have the highest voice interaction priority and controlled to perform voice interaction.

S510: if the current device is not in the first preset device state and second device state information sent by a candidate device is received, determine that the current device does not perform voice interaction.

If the current device is not in the first preset device state but receives second device state information sent by a candidate device, this indicates that a candidate device in the multi-device scenario is in a preset device state and thus has the higher voice interaction priority; the current device can then be controlled not to perform voice interaction.

If the current device is not in the first preset device state and has not received second device state information from any candidate device, all electronic devices in the multi-device scenario have low interaction priority. The devices may then perform no voice interaction and continue monitoring whether the user's voice meets the wake-up condition, or other judgment criteria may be used to continue screening a device for voice interaction from all devices in the multi-device scenario.

In this embodiment of the present application, by comparing the device types of the electronic devices in the multi-device scenario, determining whether each device meets its preset device state, and then determining the priority of each device's preset device state to decide whether the current device performs voice interaction, the accuracy of identifying the voice interaction device can be improved.
Please refer to FIG. 6, which is a schematic flowchart of a voice control method provided by another embodiment of the present application. As shown in FIG. 6, the voice control method may at least include:

S602: when it is detected that the user voice meets the voice wake-up condition, determine whether the current device is in a first preset device state.

S604: if the current device is in the first preset device state, send the first device state information corresponding to the first preset device state to a candidate device, the candidate device being in the same multi-device scenario as the current device.

S606: if second device state information sent by the candidate device is received, determine whether the current device performs voice interaction according to the first preset device state and the second preset device state corresponding to the second device state information.

For steps S602 to S606, refer to the descriptions in the above embodiments, which are not repeated here.

S608: if the current device is not in the first preset device state and has not received second device state information sent by a candidate device, obtain the first general voice feature value corresponding to the current device according to the user voice, and send the first general voice feature value to the candidate device.

If the current device is not in the first preset device state and has not received second device state information from any candidate device, all electronic devices in the multi-device scenario have low interaction priority. If one electronic device must still be selected for voice interaction, it can be selected according to the general voice feature value corresponding to each electronic device.

The general voice feature value represents the wake-up priority between the sound source and a device. Since many factors bear on this priority, general voice feature parameters are used to characterize it; that is, the general voice feature value can be expressed by multiple general voice feature parameters. Understandably, when interacting with a device by voice, a user usually approaches the device and faces it while speaking, so the general voice feature parameters may include, but are not limited to: the distance parameter between the sound source and the device, and the orientation parameter of the device relative to the sound source.

Specifically, the distance parameter between the sound source and the device can be calculated from the wake-word audio energy in the user voice: the greater the energy, the closer the distance, the smaller the distance parameter, and the higher the wake-up priority. The wake-word audio energy should minimize the influence of ambient noise: voice activity detection (VAD) can be used to segment the wake word and the ambient noise from the user voice containing the wake word, thereby obtaining the energy and duration of the wake word and the energy and duration of the ambient noise. The noise-compensated wake-word energy can then be calculated as follows:

P = e_s / t_s − e_n / t_n

where the energy and duration of the wake word are denoted e_s and t_s, and the energy and duration of the ambient noise are denoted e_n and t_n. Then e_s / t_s can be regarded as the power of the wake word and e_n / t_n as the power of the ambient noise, so their difference can be regarded as the power of the wake word with the noise influence removed, which in turn represents the noise-compensated wake-word energy.
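The noise compensation above reduces to one line of arithmetic; a minimal sketch, assuming the VAD segmentation has already produced the four quantities:

```python
def wake_word_power(e_s, t_s, e_n, t_n):
    """Noise-compensated wake-word power: P = e_s/t_s - e_n/t_n.

    e_s/t_s is the mean power of the VAD-segmented wake word and e_n/t_n
    the mean power of the surrounding ambient noise. A larger result
    implies a closer sound source and thus a higher wake-up priority.
    """
    return e_s / t_s - e_n / t_n
```

For instance, a wake word with energy 10.0 over 2.0 s against noise of energy 2.0 over 4.0 s yields 5.0 − 0.5 = 4.5.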
Further, the orientation parameter of the current device relative to the sound source can be computed by training a sound-orientation decision model on pre-recorded audio data and then feeding the user voice into the model to obtain the sound orientation result, i.e., the orientation parameter of the current device relative to the sound source. The pre-recorded audio data may include: 1. Spectral feature data, chosen because as the orientation angle of the sound source relative to the device increases, more of the sound reaches the current device via reflection, so the high-frequency part of the received user voice attenuates more than the low-frequency part. 2. Reverberation feature data, chosen because as the orientation angle increases, the reverberation energy grows; the current device can compute the direct-to-reverberant ratio and the autocorrelation features of the user voice, and the larger the reverberation, the more numerous and larger the peaks of the autocorrelation result. 3. Multi-microphone feature data, chosen because if the current device has multiple microphones participating in voice control, the sound-direction features across the microphones can also be computed to assist the decision on the orientation parameter of the current device relative to the sound source.

Further, since there may be multiple first general voice feature parameters, and different first general voice feature parameters affect the wake-up priority between the sound source and the device differently, a first general voice feature weight is set in advance for each first general voice feature parameter: the greater a parameter's influence on the wake-up priority, the larger its corresponding weight. For example, if the first general voice feature parameters include the distance parameter between the sound source and the current device and the orientation parameter of the current device relative to the sound source, the weight of the distance parameter may be set to 0.6 and the weight of the orientation parameter to 0.4.

Then, after the first general voice feature parameters corresponding to the current device are obtained, the first general voice feature weight corresponding to each parameter can also be obtained, and the first general voice feature value corresponding to the current device is calculated from the parameters and weights: each first general voice feature parameter is multiplied by its corresponding first general voice feature weight, and the products are summed to give the first general voice feature value corresponding to the current device.
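The weighted sum can be sketched as follows; the 0.6/0.4 split mirrors the distance/orientation example in the text, and treating both parameters as normalized scores is an illustrative assumption:

```python
def general_voice_feature_value(params, weights):
    """Weighted sum of per-device general voice feature parameters.

    params and weights are aligned lists; each parameter is multiplied by
    its weight and the products are summed. Weights are expected to sum
    to 1 so that feature values from different devices are comparable.
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should sum to 1"
    return sum(p * w for p, w in zip(params, weights))
```

With a distance score of 0.8 weighted 0.6 and an orientation score of 0.5 weighted 0.4, the device's feature value is 0.68.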
Further, after the first general voice feature value corresponding to the current device is obtained from the user voice, it needs to be synchronized to the other electronic devices in the multi-device scenario. Likewise, the other electronic devices in the multi-device scenario obtain second general voice feature values corresponding to the user voice using the same method as the current device, and synchronize the second general voice feature values to the current device.

S610: if a second general voice feature value sent by the candidate device is received, determine whether the current device performs voice interaction according to the first general voice feature value and the second general voice feature value.

In this embodiment of the present application, after sending the first general voice feature value to all candidate devices in the multi-device scenario, the current device can wait for a second preset time. If a second general voice feature value sent by at least one candidate device is received within the second preset time, the first general voice feature value and the second general voice feature value are compared, and whether the current device performs voice interaction is determined according to the comparison result.

Specifically, the first general voice feature value and the second general voice feature value are compared. If the first general voice feature value is greater than the second, the current device has the higher interaction priority with the user, and the current device is determined to perform voice interaction. If the first general voice feature value is less than the second, the current device does not have the higher interaction priority, i.e., another candidate device does, and the current device is determined not to perform voice interaction. If the first general voice feature value equals the second, neither the current device nor the other candidate devices has a higher interaction priority; in this case, it can be judged whether the current device is the preset priority interaction device, which is user-configured or set at the factory, so that when multiple devices share the same general voice feature value, one of them is selected for interaction and a failed user voice interaction is avoided. If the current device is determined to be the preset priority interaction device, it is determined to perform voice interaction.
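The comparison rule, including the tie-break via the preset priority interaction device, can be sketched as follows (a minimal illustration of the decision logic, not the application's implementation):

```python
def decide_by_feature_value(own, peers, is_priority_device):
    """Decide whether the current device interacts, per steps S610/S612.

    own is the device's own general voice feature value, peers the values
    received from candidate devices, and is_priority_device the
    pre-configured tie-break flag. An empty peer list means no candidate
    reported a value, so the current device interacts (step S612).
    """
    if not peers:
        return True
    best_peer = max(peers)
    if own > best_peer:
        return True
    if own < best_peer:
        return False
    # Equal feature values: fall back to the preset priority device.
    return is_priority_device
```

Because every device applies the same rule to the same synchronized values, at most one device (the one with the largest value, or the priority device on a tie) ends up responding.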
S612、若未接收到候选设备发送的第二通用语音特征值,则确定当前设备进行语音交互。
在本申请实施例中,当前设备将第一通用语音特征值发送至多设备场景中的所有候选设备之后,可以等待第二预设时间,若在第二预设时间内未接收到候选设备发送的第二通用语音特征值,代表多设备场景中除当前设备外没有其他候选设备获取到通用语音特征值,此时可以直接确定当前设备进行语音交互。
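结合S610与S612,当前设备基于通用语音特征值的判定逻辑可以概括为如下示意代码(函数与参数命名均为说明用的假设):

```python
def should_interact(first_value, second_values, is_priority_device):
    """根据第一通用语音特征值与收到的各第二通用语音特征值,判定当前设备是否进行语音交互。

    second_values 为空表示第二预设时间内未收到任何候选设备的第二通用语音特征值。
    """
    if not second_values:
        return True                # 无其他候选设备,直接由当前设备交互
    best = max(second_values)
    if first_value > best:
        return True                # 当前设备具有更高的交互优先权
    if first_value < best:
        return False               # 其他候选设备具有更高的交互优先权
    return is_priority_device      # 特征值相等时,由预先设置的优先交互设备兜底
```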
在本申请实施例中,在确定多设备场景中的各电子设备均不处于预设状态时,可以分别获取各电子设备针对用户语音获取的通用语音特征值,进而根据通用语音特征值选择进行语音交互的电子设备,有效提升了确定进行语音交互的设备的准确性。
请参阅图7,图7为本申请另一实施例提供的一种语音控制装置的结构框图。如图7所示,语音控制装置700包括:
语音唤醒模块710,用于监测到用户语音满足语音唤醒条件时,判断当前设备是否处于第一预设设备状态;
设备状态发送模块720,用于若当前设备处于第一预设设备状态,则将第一预设设备状态对应的第一设备状态信息发送至候选设备,候选设备与当前设备处于同一多设备场景中;
第一语音交互确定模块730,用于若接收到候选设备发送的第二设备状态信息,则根据第一预设设备状态以及第二设备状态信息对应的第二预设设备状态,确定当前设备是否进行语音交互。
可选地,第一语音交互确定模块730,还用于比较第一预设设备状态的优先级与第二设备状态信息对应的第二预设设备状态的优先级,根据优先级比较结果确定当前设备是否进行语音交互。
可选地,第一语音交互确定模块730,还用于根据预先设置的设备状态优先级顺序确定第一预设设备状态对应的第一状态优先级,以及确定第二设备状态信息对应的第二预设设备状态对应的第二状态优先级;若第一状态优先级大于第二状态优先级,则确定当前设备进行语音交互;若第一状态优先级小于第二状态优先级,则确定当前设备不进行语音交互。
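上述基于预先设置的设备状态优先级顺序进行比较的逻辑,可以用如下示意代码说明(状态名称与顺序仅为举例假设):

```python
# 预先设置的设备状态优先级顺序,索引越小优先级越高(状态名为示意)
STATE_PRIORITY = ["手持状态", "使用中状态", "亮屏状态", "静置状态"]

def compare_state_priority(first_state, second_state):
    """比较第一预设设备状态与第二预设设备状态的优先级。

    返回 True 表示当前设备进行语音交互,False 表示不进行语音交互,
    None 表示优先级相同,需进一步裁决(例如交由优先交互设备规则)。
    """
    p1 = STATE_PRIORITY.index(first_state)
    p2 = STATE_PRIORITY.index(second_state)
    if p1 < p2:
        return True
    if p1 > p2:
        return False
    return None
```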
可选地,语音唤醒模块710,还用于获取当前设备的设备类型,根据设备类型获取当前设备对应的指定状态参数;根据指定状态参数判断当前设备是否处于第一预设设备状态。
可选地,语音唤醒模块710,还用于若设备类型为手持设备,则获取当前设备对应的遮挡状态参数、放置角度状态参数以及抖动状态参数;根据遮挡参数、放置角度参数以及抖动参数判断当前设备是否处于手持状态。
可选地,语音控制装置700还包括:第二语音交互确定模块,用于若未接收到候选设备发送的第二设备状态信息,则确定当前设备进行语音交互。
可选地,语音控制装置700还包括:第三语音交互确定模块,用于若当前设备未处于第一预设设备状态且接收到候选设备发送的第二设备状态信息,则确定当前设备不进行语音交互。
可选地,语音控制装置700还包括:第四语音交互确定模块,用于若当前设备未处于第一预设设备状态且未接收到候选设备发送的第二设备状态信息,则根据用户语音获取当前设备对应的第一通用语音特征值,以及将第一通用语音特征值发送至候选设备;若接收到候选设备发送的第二通用语音特征值,则根据第一通用语音特征值以及第二通用语音特征值,确定当前设备是否进行语音交互。
可选地,第四语音交互确定模块,还用于根据用户语音获取当前设备对应第一通用语音特征参数以及各第一通用语音特征参数对应的第一通用语音特征权值;基于各第一通用语音特征参数以及各第一通用语音特征权值,计算当前设备对应的第一通用语音特征值。
可选地,第一通用语音特征参数包括但不限于:发声源与当前设备之间的距离参数以及当前设备相对于发声源的方位参数。
可选地,第四语音交互确定模块,还用于若第一通用语音特征值大于第二通用语音特征值,则确定当前设备进行语音交互;若第一通用语音特征值小于第二通用语音特征值,则确定当前设备不进行语音交互;若第一通用语音特征值等于第二通用语音特征值,且确定当前设备为预先设置的优先交互设备,则确定当前设备进行语音交互。
可选地,语音控制装置700还包括:第五语音交互确定模块,用于若未接收到候选设备发送的第二通用语音特征值,则确定当前设备进行语音交互。
在本申请实施例中,语音控制装置包括:语音唤醒模块,用于监测到用户语音满足语音唤醒条件时,判断当前设备是否处于第一预设设备状态;设备状态发送模块,用于若当前设备处于第一预设设备状态,则将第一预设设备状态对应的第一设备状态信息发送至候选设备,候选设备与当前设备处于同一多设备场景中;语音交互确定模块,用于若接收到候选设备发送的第二设备状态信息,则根据第一预设设备状态以及第二设备状态信息对应的第二预设设备状态,确定当前设备是否进行语音交互。监测到用户语音满足语音唤醒条件时,可以获取当前设备的第一预设设备状态与候选设备的第二预设设备状态,由于设备状态可以代表用户对设备的使用情况,因此根据各设备的设备状态可以确定用户具体想要使用哪个设备进行语音交互,有效提升了语音控制的准确性。
请参阅图8,图8为本申请另一实施例提供的一种语音控制方法的流程示意图。
如图8所示,语音控制方法包括:
S802、监测到用户语音满足语音唤醒条件时,判断当前主设备是否处于第一预设设备状态。
在本申请实施例中,多设备场景中存在至少两个电子设备,处于同一多设备场景中的各电子设备属于同一设备组中,同一设备组中的电子设备具有从属关系或者主次关系,也即多设备场景中至少存在一个主设备以及至少一个从属设备,例如,多设备环境中包括音箱、电视机、手机以及穿戴手表,那么可以将数据处理性能较好的手机作为主设备,而将数据处理性能较差的音箱、电视机以及穿戴手表作为从属设备。为了方便描述,先以语音控制方法应用于主设备进行描述。
当主设备监测到用户语音满足语音唤醒条件时,可以先判断当前主设备自身是否处于第一预设设备状态。
S804、若当前主设备处于第一预设设备状态,且接收到从属设备发送的第二设备状态信息,则根据第一预设设备状态对应的第一设备状态信息以及第二设备状态信息,从当前主设备以及从属设备中确定进行语音交互的目标交互设备。
若当前主设备处于第一预设设备状态,则可以确定当前主设备可能是用户正在使用或者操作的设备,但是由于多设备场景中存在多个电子设备,因此多设备场景中还可能存在其他也处于预设设备状态的电子设备,那么为了便于从多个处于预设设备状态的电子设备中确定出用户想要进行语音交互的电子设备,处于同一多设备场景中的从属设备在确定自身处于预设设备状态之后,都可以将自身所处的预设设备状态对应的状态信息同步至主设备。
因此在主设备接收到从属设备发送的第二设备状态信息之后,可以根据第一预设设备状态对应的第一设备状态信息以及第二设备状态信息,从当前主设备以及从属设备中确定进行语音交互的目标交互设备。具体根据设备状态信息确定目标交互设备的方法可以参阅上述实施例中的描述,此处不再赘述。
S806、基于交互指令控制目标交互设备进行语音交互。
在确定目标交互设备之后,主设备可以生成交互指令,若目标交互设备为当前主设备,则直接基于交互指令控制当前主设备进行语音交互;若目标交互设备为从属设备,则将交互指令发送至目标交互设备,交互指令用于指示目标交互设备进行语音交互,也即目标交互设备接收到交互指令之后控制目标交互设备本身进行语音交互。
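主设备生成交互指令后的分发流程可以概括为如下示意(send_instruction、interact_locally 为说明用的假设回调接口):

```python
def dispatch_interaction(target_device_id, current_master_id,
                         send_instruction, interact_locally):
    """主设备确定目标交互设备后分发交互指令。

    目标为当前主设备时直接在本机进行语音交互;
    目标为从属设备时将交互指令发送至该设备,指示其进行语音交互。
    """
    if target_device_id == current_master_id:
        interact_locally()
    else:
        send_instruction(target_device_id, {"type": "voice_interaction"})
```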
进一步地,若当前主设备处于第一预设设备状态,且未接收到从属设备发送的第二设备状态信息,则控制当前主设备进行语音交互。
可选地,若当前主设备未处于第一预设设备状态,那么代表当前主设备不具备设备状态方面的语音交互优先级,此时若接收到从属设备发送的第二设备状态信息,则可以根据第二设备状态信息从从属设备中确定进行语音交互的目标交互设备,并基于交互指令控制目标交互设备进行语音交互。
可选地,若当前主设备未处于第一预设设备状态且未接收到从属设备发送的第二设备状态信息,代表多设备场景中的所有电子设备的交互优先级都较低,如果此时仍然需要选出一个电子设备进行语音交互,可以通过各电子设备对应的通用语音特征值选择出进行语音交互的电子设备,此时可以根据用户语音获取当前主设备对应的第一通用语音特征值;若接收到从属设备发送的第二通用语音特征值,则根据第一通用语音特征值以及第二通用语音特征值,从当前主设备以及从属设备中确定进行语音交互的目标交互设备,以及基于交互指令控制目标交互设备进行语音交互。
可选地,若当前主设备未监测到用户语音满足语音唤醒条件,且接收到从属设备发送的第二设备状态信息时,则根据第二设备状态信息从从属设备中确定进行语音交互的目标交互设备,以及基于交互指令控制目标交互设备进行语音交互。
可选地,若当前主设备未监测到用户语音满足语音唤醒条件且接收到从属设备发送的第二通用语音特征值,则根据第二通用语音特征值从从属设备中确定进行语音交互的目标交互设备,以及基于交互指令控制目标交互设备进行语音交互。
在本申请实施例中,将根据状态信息确定目标交互设备或者根据语音特征值确定目标交互设备的工作均设置在主设备中,一方面,由于主设备的性能较好,因此可以提高确定目标交互设备的速度,另一方面,由于从属设备的性能较差,那么可以减少从属设备的数据处理量,降低从属设备的功耗。
请参阅图9,图9为本申请另一实施例提供的一种语音控制装置的结构框图。如图9所示,语音控制装置900包括:
主设备语音唤醒模块910,用于监测到用户语音满足语音唤醒条件时,判断当前主设备是否处于第一预设设备状态;
主设备语音交互确定模块920,用于若当前主设备处于第一预设设备状态,且接收到从属设备发送的第二设备状态信息,则根据第一预设设备状态对应的第一设备状态信息以及第二设备状态信息,从当前主设备以及从属设备中确定进行语音交互的目标交互设备;
指令控制模块930,用于基于交互指令控制目标交互设备进行语音交互。
可选地,主设备语音交互确定模块920,还用于若目标交互设备为当前主设备,则基于交互指令控制当前主设备进行语音交互;若目标交互设备为从属设备,则将交互指令发送至目标交互设备,交互指令用于指示目标交互设备进行语音交互。
可选地,主设备语音交互确定模块920,还用于若当前主设备处于第一预设设备状态,且未接收到从属设备发送的第二设备状态信息,则控制当前主设备进行语音交互。
可选地,主设备语音交互确定模块920,还用于若当前主设备未处于第一预设设备状态,且接收到从属设备发送的第二设备状态信息,则根据第二设备状态信息从从属设备中确定进行语音交互的目标交互设备;基于交互指令控制目标交互设备进行语音交互。
可选地,主设备语音交互确定模块920,还用于若当前主设备未处于第一预设设备状态且未接收到从属设备发送的第二设备状态信息,则根据用户语音获取当前主设备对应的第一通用语音特征值;若接收到从属设备发送的第二通用语音特征值,则根据第一通用语音特征值以及第二通用语音特征值,从当前主设备以及从属设备中确定进行语音交互的目标交互设备;基于交互指令控制目标交互设备进行语音交互。
可选地,主设备语音交互确定模块920,还用于若未监测到用户语音满足语音唤醒条件,且接收到从属设备发送的第二设备状态信息时,则根据第二设备状态信息从从属设备中确定进行语音交互的目标交互设备;基于交互指令控制目标交互设备进行语音交互。
可选地,主设备语音交互确定模块920,还用于若未监测到用户语音满足语音唤醒条件且接收到从属设备发送的第二通用语音特征值,则根据第二通用语音特征值从从属设备中确定进行语音交互的目标交互设备;基于交互指令控制目标交互设备进行语音交互。
请参阅图10,图10为本申请另一实施例提供的一种语音控制方法的流程示意图。
如图10所示,语音控制方法包括:
S1002、监测到用户语音满足语音唤醒条件时,判断当前从属设备是否处于第二预设设备状态。
在本申请实施例中,多设备场景中存在至少两个电子设备,处于同一多设备场景中的各电子设备属于同一设备组中,同一设备组中的电子设备具有从属关系或者主次关系,也即多设备场景中至少存在一个主设备以及至少一个从属设备,例如,多设备环境中包括音箱、电视机、手机以及穿戴手表,那么可以将数据处理性能较好的手机作为主设备,而将数据处理性能较差的音箱、电视机以及穿戴手表作为从属设备。为了方便描述,先以语音控制方法应用于从属设备进行描述。
S1004、若当前从属设备处于第二预设设备状态,则将当前从属设备的第二设备状态信息发送至主设备。
S1006、若接收到主设备发送的交互指令,则控制当前从属设备进行语音交互,其中,交互指令为主设备根据主设备的第一设备状态信息、当前从属设备的第二设备状态信息以及其他从属设备的第二设备状态信息生成。
可选地,若当前从属设备不处于第二预设设备状态,且未接收到主设备发送的交互指令,则根据用户语音获取当前从属设备对应的第二通用语音特征值,以及将第二通用语音特征值发送至主设备;若接收到主设备发送的交互指令,则控制当前从属设备进行语音交互,其中交互指令为主设备根据主设备对应的第一通用语音特征值、当前从属设备的第二通用语音特征值以及其他从属设备的第二通用语音特征值生成。
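从属设备侧S1002~S1006及上述可选步骤的整体流程,可以概括为如下示意代码(各回调均为说明用的假设接口):

```python
def slave_on_wakeup(in_preset_state, got_instruction,
                    compute_feature, send_state, send_feature, interact):
    """从属设备监测到用户语音满足语音唤醒条件后的处理流程示意。

    处于第二预设设备状态时上报第二设备状态信息;
    否则在未收到交互指令时计算并上报第二通用语音特征值;
    收到主设备下发的交互指令时进行语音交互。
    """
    if in_preset_state:
        send_state()
    elif not got_instruction:
        send_feature(compute_feature())
    if got_instruction:
        interact()
```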
在本申请实施例中,将根据状态信息确定目标交互设备或者根据语音特征值确定目标交互设备的工作均设置在主设备中,一方面,由于主设备的性能较好,因此可以提高确定目标交互设备的速度,另一方面,由于从属设备的性能较差,那么可以减少从属设备的数据处理量,降低从属设备的功耗。
请参阅图11,图11为本申请另一实施例提供的一种语音控制装置的结构框图。如图11所示,语音控制装置1100包括:
从属设备语音唤醒模块1110,用于监测到用户语音满足语音唤醒条件时,判断当前从属设备是否处于第二预设设备状态;
从属设备状态发送模块1120,用于若当前从属设备处于第二预设设备状态,则将当前从属设备的第二设备状态信息发送至主设备;
从属设备语音交互模块1130,用于若接收到主设备发送的交互指令,则控制当前从属设备进行语音交互,其中,交互指令为主设备根据主设备的第一设备状态信息、当前从属设备的第二设备状态信息以及其他从属设备的第二设备状态信息生成。
可选地,从属设备语音交互模块1130,还用于若当前从属设备不处于第二预设设备状态,且未接收到主设备发送的交互指令,则根据用户语音获取当前从属设备对应的第二通用语音特征值,以及将第二通用语音特征值发送至主设备。
可选地,从属设备语音交互模块1130,还用于若接收到主设备发送的交互指令,则控制当前从属设备进行语音交互,其中交互指令为主设备根据主设备对应的第一通用语音特征值、当前从属设备的第二通用语音特征值以及其他从属设备的第二通用语音特征值生成。
本申请实施例还提供了一种计算机存储介质,计算机存储介质可以存储有多条指令,指令适于由处理器加载并执行如上述实施例中的任一项的方法的步骤。
请参见图12,图12为本申请实施例提供的一种电子设备的结构示意图。如图12所示,电子设备1200可以包括:至少一个电子设备处理器1201,至少一个网络接口1204,用户接口1203,存储器1205,至少一个通信总线1202。
其中,通信总线1202用于实现这些组件之间的连接通信。
其中,用户接口1203可以包括显示屏(Display)、摄像头(Camera),可选用户接口1203还可以包括标准的有线接口、无线接口。
其中,网络接口1204可选的可以包括标准的有线接口、无线接口(如WI-FI接口)。
其中,电子设备处理器1201可以包括一个或者多个处理核心。电子设备处理器1201利用各种接口和线路连接整个电子设备1200内的各个部分,通过运行或执行存储在存储器1205内的指令、程序、代码集或指令集,以及调用存储在存储器1205内的数据,执行电子设备1200的各种功能和处理数据。可选的,电子设备处理器1201可以采用数字信号处理(Digital Signal Processing,DSP)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)、可编程逻辑阵列(Programmable Logic Array,PLA)中的至少一种硬件形式来实现。电子设备处理器1201可集成中央处理器(Central Processing Unit,CPU)、图像处理器(Graphics Processing Unit,GPU)和调制解调器等中的一种或几种的组合。其中,CPU主要处理操作系统、用户界面和应用程序等;GPU用于负责显示屏所需要显示的内容的渲染和绘制;调制解调器用于处理无线通信。可以理解的是,上述调制解调器也可以不集成到电子设备处理器1201中,单独通过一块芯片进行实现。
其中,存储器1205可以包括随机存储器(Random Access Memory,RAM),也可以包括只读存储器(Read-Only Memory,ROM)。可选的,该存储器1205包括非瞬时性计算机可读介质(non-transitory computer-readable storage medium)。存储器1205可用于存储指令、程序、代码、代码集或指令集。存储器1205可包括存储程序区和存储数据区,其中,存储程序区可存储用于实现操作系统的指令、用于至少一个功能的指令(比如触控功能、声音播放功能、图像播放功能等)、用于实现上述各个方法实施例的指令等;存储数据区可存储上面各个方法实施例中涉及到的数据等。存储器1205可选的还可以是至少一个位于远离前述电子设备处理器1201的存储装置。如图12所示,作为一种计算机存储介质的存储器1205中可以包括操作系统、网络通信模块、用户接口模块以及语音控制程序。
在图12所示的电子设备1200中,用户接口1203主要用于为用户提供输入的接口,获取用户输入的数据;而电子设备处理器1201可以用于调用存储器1205中存储的语音控制程序,并具体执行以下操作:
监测到用户语音满足语音唤醒条件时,判断当前设备是否处于第一预设设备状态;
若当前设备处于第一预设设备状态,则将第一预设设备状态对应的第一设备状态信息发送至候选设备,候选设备与当前设备处于同一多设备场景中;
若接收到候选设备发送的第二设备状态信息,则根据第一预设设备状态以及第二设备状态信息对应的第二预设设备状态,确定当前设备是否进行语音交互。
在一个实施例中,根据第一预设设备状态以及第二设备状态信息对应的第二预设设备状态,确定当前设备是否进行语音交互,包括:比较第一预设设备状态的优先级与第二设备状态信息对应的第二预设设备状态的优先级,根据优先级比较结果确定当前设备是否进行语音交互。
在一个实施例中,比较第一预设设备状态的优先级与第二设备状态信息对应的第二预设设备状态的优先级,根据优先级比较结果确定当前设备是否进行语音交互,包括:根据预先设置的设备状态优先级顺序确定第一预设设备状态对应的第一状态优先级,以及确定第二设备状态信息对应的第二预设设备状态对应的第二状态优先级;若第一状态优先级大于第二状态优先级,则确定当前设备进行语音交互;若第一状态优先级小于第二状态优先级,则确定当前设备不进行语音交互。
在一个实施例中,判断当前设备是否处于第一预设设备状态,包括:获取当前设备的设备类型,根据设备类型获取当前设备对应的指定状态参数;根据指定状态参数判断当前设备是否处于第一预设设备状态。
在一个实施例中,根据设备类型获取当前设备对应的指定状态参数,包括:若设备类型为手持设备,则获取当前设备对应的遮挡状态参数、放置角度状态参数以及抖动状态参数;根据指定状态参数判断当前设备是否处于第一预设设备状态,包括:根据遮挡参数、放置角度参数以及抖动参数判断当前设备是否处于手持状态。
在一个实施例中,方法还包括:若未接收到候选设备发送的第二设备状态信息,则确定当前设备进行语音交互。
在一个实施例中,方法还包括:若当前设备未处于第一预设设备状态且接收到候选设备发送的第二设备状态信息,则确定当前设备不进行语音交互。
在一个实施例中,方法还包括:若当前设备未处于第一预设设备状态且未接收到候选设备发送的第二设备状态信息,则根据用户语音获取当前设备对应的第一通用语音特征值,以及将第一通用语音特征值发送至候选设备;若接收到候选设备发送的第二通用语音特征值,则根据第一通用语音特征值以及第二通用语音特征值,确定当前设备是否进行语音交互。
在一个实施例中,根据用户语音获取当前设备对应的第一通用语音特征值,包括:根据用户语音获取当前设备对应第一通用语音特征参数以及各第一通用语音特征参数对应的第一通用语音特征权值;基于各第一通用语音特征参数以及各第一通用语音特征权值,计算当前设备对应的第一通用语音特征值。
在一个实施例中,第一通用语音特征参数包括但不限于:发声源与当前设备之间的距离参数以及当前设备相对于发声源的方位参数。
在一个实施例中,根据第一通用语音特征值以及第二通用语音特征值,确定当前设备是否进行语音交互,包括:若第一通用语音特征值大于第二通用语音特征值,则确定当前设备进行语音交互;若第一通用语音特征值小于第二通用语音特征值,则确定当前设备不进行语音交互;若第一通用语音特征值等于第二通用语音特征值,且确定当前设备为预先设置的优先交互设备,则确定当前设备进行语音交互。
在一个实施例中,方法还包括:若未接收到候选设备发送的第二通用语音特征值,则确定当前设备进行语音交互。
在图12所示的电子设备1200中,用户接口1203主要用于为用户提供输入的接口,获取用户输入的数据;而电子设备处理器1201可以用于调用存储器1205中存储的语音控制程序,并具体执行以下操作:
监测到用户语音满足语音唤醒条件时,判断当前主设备是否处于第一预设设备状态;若当前主设备处于第一预设设备状态,且接收到从属设备发送的第二设备状态信息,则根据第一预设设备状态对应的第一设备状态信息以及第二设备状态信息,从当前主设备以及从属设备中确定进行语音交互的目标交互设备;基于交互指令控制目标交互设备进行语音交互。
在一个实施例中,若目标交互设备为当前主设备,则基于交互指令控制当前主设备进行语音交互;若目标交互设备为从属设备,则将交互指令发送至目标交互设备,交互指令用于指示目标交互设备进行语音交互。
在一个实施例中,语音控制方法还包括:若当前主设备处于第一预设设备状态,且未接收到从属设备发送的第二设备状态信息,则控制当前主设备进行语音交互。
在一个实施例中,语音控制方法还包括:若当前主设备未处于第一预设设备状态,且接收到从属设备发送的第二设备状态信息,则根据第二设备状态信息从从属设备中确定进行语音交互的目标交互设备;基于交互指令控制目标交互设备进行语音交互。
在一个实施例中,语音控制方法还包括:若当前主设备未处于第一预设设备状态且未接收到从属设备发送的第二设备状态信息,则根据用户语音获取当前主设备对应的第一通用语音特征值;若接收到从属设备发送的第二通用语音特征值,则根据第一通用语音特征值以及第二通用语音特征值,从当前主设备以及从属设备中确定进行语音交互的目标交互设备;基于交互指令控制目标交互设备进行语音交互。
在一个实施例中,语音控制方法还包括:若未监测到用户语音满足语音唤醒条件,且接收到从属设备发送的第二设备状态信息时,则根据第二设备状态信息从从属设备中确定进行语音交互的目标交互设备;基于交互指令控制目标交互设备进行语音交互。
在一个实施例中,语音控制方法还包括:若未监测到用户语音满足语音唤醒条件且接收到从属设备发送的第二通用语音特征值,则根据第二通用语音特征值从从属设备中确定进行语音交互的目标交互设备;基于交互指令控制目标交互设备进行语音交互。
在图12所示的电子设备1200中,用户接口1203主要用于为用户提供输入的接口,获取用户输入的数据;而电子设备处理器1201可以用于调用存储器1205中存储的语音控制程序,并具体执行以下操作:
监测到用户语音满足语音唤醒条件时,判断当前从属设备是否处于第二预设设备状态;若当前从属设备处于第二预设设备状态,则将当前设备的第二设备状态信息发送至主设备;若接收到主设备发送的交互指令,则控制当前从属设备进行语音交互,其中,交互指令为主设备根据主设备的第一设备状态信息、当前从属设备的第二设备状态信息以及其他从属设备的第二设备状态信息生成。
在一个实施例中,语音控制方法还包括:若当前从属设备不处于第二预设设备状态,且未接收到主设备发送的交互指令,则根据用户语音获取当前从属设备对应的第二通用语音特征值,以及将第二通用语音特征值发送至主设备;若接收到主设备发送的交互指令,则控制当前从属设备进行语音交互,其中交互指令为主设备根据主设备对应的第一通用语音特征值、当前从属设备的第二通用语音特征值以及其他从属设备的第二通用语音特征值生成。
在本申请实施例所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个模块或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或模块的间接耦合或通信连接,可以是电性,机械或其它的形式。
作为分离部件说明的模块可以是或者也可以不是物理上分开的,作为模块显示的部件可以是或者也可以不是物理模块,即可以位于一个地方,或者也可以分布到多个网络模块上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。
另外,在本申请实施例各个实施例中的各功能模块可以集成在一个处理模块中,也可以是各个模块单独物理存在,也可以两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请实施例各个实施例方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
需要说明的是,对于前述的各方法实施例,为了简便描述,将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请实施例并不受所描述的动作顺序的限制,因为依据本申请实施例,某些步骤可以采用其它顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定都是本申请实施例所必须的。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其它实施例的相关描述。以上为对本申请实施例所提供的语音控制方法、装置、存储介质以及电子设备的描述,对于本领域的技术人员,依据本申请实施例的思想,在具体实施方式及应用范围上均会有改变之处,综上,本说明书内容不应理解为对本申请实施例的限制。

Claims (26)

  1. 一种语音控制方法,所述方法包括:
    监测到用户语音满足语音唤醒条件时,判断当前设备是否处于第一预设设备状态;
    若所述当前设备处于所述第一预设设备状态,则将所述第一预设设备状态对应的第一设备状态信息发送至候选设备,所述候选设备与所述当前设备处于同一多设备场景中;
    若接收到所述候选设备发送的第二设备状态信息,则根据所述第一预设设备状态以及所述第二设备状态信息对应的第二预设设备状态,确定所述当前设备是否进行语音交互。
  2. 根据权利要求1所述的方法,所述根据所述第一预设设备状态以及所述第二设备状态信息对应的第二预设设备状态,确定所述当前设备是否进行语音交互,包括:
    比较所述第一预设设备状态的优先级与所述第二设备状态信息对应的第二预设设备状态的优先级,根据优先级比较结果确定所述当前设备是否进行语音交互。
  3. 根据权利要求2所述的方法,所述比较所述第一预设设备状态的优先级与所述第二设备状态信息对应的第二预设设备状态的优先级,根据优先级比较结果确定所述当前设备是否进行语音交互,包括:
    根据预先设置的设备状态优先级顺序确定所述第一预设设备状态对应的第一状态优先级,以及确定所述第二设备状态信息对应的第二预设设备状态对应的第二状态优先级;
    若所述第一状态优先级大于所述第二状态优先级,则确定所述当前设备进行语音交互;
    若所述第一状态优先级小于所述第二状态优先级,则确定所述当前设备不进行语音交互。
  4. 根据权利要求1所述的方法,所述判断所述当前设备是否处于第一预设设备状态,包括:
    获取所述当前设备的设备类型,根据所述设备类型获取所述当前设备对应的指定状态参数;
    根据所述指定状态参数判断所述当前设备是否处于第一预设设备状态。
  5. 根据权利要求4所述的方法,所述根据所述设备类型获取所述当前设备对应的指定状态参数,包括:
    若所述设备类型为手持设备,则获取所述当前设备对应的遮挡状态参数、放置角度状态参数以及抖动状态参数中的至少一种;
    根据所述指定状态参数判断所述当前设备是否处于第一预设设备状态,包括:
    根据所述遮挡参数、所述放置角度参数以及所述抖动参数中的至少一种判断所述当前设备是否处于手持状态。
  6. 根据权利要求1所述的方法,所述方法还包括:
    若未接收到所述候选设备发送的第二设备状态信息,则确定所述当前设备进行语音交互。
  7. 根据权利要求1所述的方法,所述方法还包括:
    若所述当前设备未处于所述第一预设设备状态且接收到所述候选设备发送的第二设备状态信息,则确定所述当前设备不进行语音交互。
  8. 根据权利要求1所述的方法,所述方法还包括:
    若所述当前设备未处于所述第一预设设备状态且未接收到所述候选设备发送的第二设备状态信息,则根据所述用户语音获取所述当前设备对应的第一通用语音特征值,以及将所述第一通用语音特征值发送至所述候选设备;
    若接收到所述候选设备发送的第二通用语音特征值,则根据所述第一通用语音特征值以及所述第二通用语音特征值,确定所述当前设备是否进行语音交互。
  9. 根据权利要求8所述的方法,所述根据所述用户语音获取所述当前设备对应的第一通用语音特征值,包括:
    根据所述用户语音获取所述当前设备对应第一通用语音特征参数以及各第一通用语音特征参数对应的第一通用语音特征权值;
    基于各第一通用语音特征参数以及各第一通用语音特征权值,计算所述当前设备对应的第一通用语音特征值。
  10. 根据权利要求9所述的方法,所述第一通用语音特征参数包括但不限于:发声源与所述当前设备之间的距离参数以及所述当前设备相对于所述发声源的方位参数。
  11. 根据权利要求8所述的方法,所述根据所述第一通用语音特征值以及所述第二通用语音特征值,确定所述当前设备是否进行语音交互,包括:
    若所述第一通用语音特征值大于所述第二通用语音特征值,则确定所述当前设备进行语音交互;
    若所述第一通用语音特征值小于所述第二通用语音特征值,则确定所述当前设备不进行语音交互;
    若所述第一通用语音特征值等于所述第二通用语音特征值,且确定所述当前设备为预先设置的优先交互设备,则确定所述当前设备进行语音交互。
  12. 根据权利要求8所述的方法,所述方法还包括:
    若未接收到所述候选设备发送的第二通用语音特征值,则确定所述当前设备进行语音交互。
  13. 一种语音控制方法,所述方法包括:
    监测到用户语音满足语音唤醒条件时,判断当前主设备是否处于第一预设设备状态;
    若所述当前主设备处于所述第一预设设备状态,且接收到从属设备发送的第二设备状态信息,则根据第一预设设备状态对应的第一设备状态信息以及所述第二设备状态信息,从所述当前主设备以及所述从属设备中确定进行语音交互的目标交互设备;
    基于交互指令控制所述目标交互设备进行语音交互。
  14. 根据权利要求13所述的方法,所述基于交互指令控制所述目标交互设备进行语音交互,包括:
    若所述目标交互设备为所述当前主设备,则基于交互指令控制所述当前主设备进行语音交互;
    若所述目标交互设备为所述从属设备,则将交互指令发送至所述目标交互设备,所述交互指令用于指示所述目标交互设备进行语音交互。
  15. 根据权利要求13所述的方法,所述方法还包括:
    若所述当前主设备处于所述第一预设设备状态,且未接收到从属设备发送的第二设备状态信息,则控制所述当前主设备进行语音交互。
  16. 根据权利要求13所述的方法,所述方法还包括:
    若所述当前主设备未处于所述第一预设设备状态,且接收到所述从属设备发送的第二设备状态信息,则根据所述第二设备状态信息从所述从属设备中确定进行语音交互的目标交互设备;
    基于交互指令控制所述目标交互设备进行语音交互。
  17. 根据权利要求13所述的方法,所述方法还包括:
    若所述当前主设备未处于所述第一预设设备状态且未接收到所述从属设备发送的第二设备状态信息,则根据所述用户语音获取所述当前主设备对应的第一通用语音特征值;
    若接收到所述从属设备发送的第二通用语音特征值,则根据所述第一通用语音特征值以及所述第二通用语音特征值,从所述当前主设备以及所述从属设备中确定进行语音交互的目标交互设备;
    基于交互指令控制所述目标交互设备进行语音交互。
  18. 根据权利要求13所述的方法,所述方法还包括:
    若未监测到用户语音满足语音唤醒条件,且接收到所述从属设备发送的第二设备状态信息时,则根据所述第二设备状态信息从所述从属设备中确定进行语音交互的目标交互设备;
    基于所述交互指令控制所述目标交互设备进行语音交互。
  19. 根据权利要求18所述的方法,所述方法还包括:
    若未监测到用户语音满足语音唤醒条件且接收到所述从属设备发送的第二通用语音特征值,则根据所述第二通用语音特征值从所述从属设备中确定进行语音交互的目标交互设备;
    基于交互指令控制所述目标交互设备进行语音交互。
  20. 一种语音控制方法,所述方法包括:
    监测到用户语音满足语音唤醒条件时,判断当前从属设备是否处于第二预设设备状态;
    若所述当前从属设备处于所述第二预设设备状态,则将所述当前从属设备的第二设备状态信息发送至主设备;
    若接收到所述主设备发送的交互指令,则控制所述当前从属设备进行语音交互,其中,所述交互指令为所述主设备根据所述主设备的第一设备状态信息、所述当前从属设备的第二设备状态信息以及其他从属设备的第二设备状态信息生成。
  21. 根据权利要求20所述的方法,所述方法还包括:
    若所述当前从属设备不处于所述第二预设设备状态,且未接收到所述主设备发送的交互指令,则根据所述用户语音获取所述当前从属设备对应的第二通用语音特征值,以及将所述第二通用语音特征值发送至所述主设备;
    若接收到所述主设备发送的交互指令,则控制所述当前从属设备进行语音交互,其中所述交互指令为所述主设备根据所述主设备对应的第一通用语音特征值、所述当前从属设备的第二通用语音特征值以及其他从属设备的第二通用语音特征值生成。
  22. 一种语音控制装置,所述装置包括:
    语音唤醒模块,用于监测到用户语音满足语音唤醒条件时,判断当前设备是否处于第一预设设备状态;
    设备状态发送模块,用于若所述当前设备处于所述第一预设设备状态,则将所述第一预设设备状态对应的第一设备状态信息发送至候选设备,所述候选设备与所述当前设备处于同一多设备场景中;
    语音交互确定模块,用于若接收到所述候选设备发送的第二设备状态信息,则根据所述第一预设设备状态以及所述第二设备状态信息对应的第二预设设备状态,确定所述当前设备是否进行语音交互。
  23. 一种语音控制装置,所述装置包括:
    主设备语音唤醒模块,用于监测到用户语音满足语音唤醒条件时,判断当前主设备是否处于第一预设设备状态;主设备语音交互确定模块,用于若所述当前主设备处于所述第一预设设备状态,且接收到从属设备发送的第二设备状态信息,则根据第一预设设备状态对应的第一设备状态信息以及所述第二设备状态信息,从所述当前主设备以及所述从属设备中确定进行语音交互的目标交互设备;指令控制模块,用于基于交互指令控制所述目标交互设备进行语音交互。
  24. 一种语音控制装置,所述装置包括:
    从属设备语音唤醒模块,用于监测到用户语音满足语音唤醒条件时,判断当前从属设备是否处于第二预设设备状态;从属设备状态发送模块,用于若所述当前从属设备处于所述第二预设设备状态,则将所述当前从属设备的第二设备状态信息发送至主设备;从属设备语音交互模块,用于若接收到所述主设备发送的交互指令,则控制所述当前从属设备进行语音交互,其中,所述交互指令为所述主设备根据所述主设备的第一设备状态信息、所述当前从属设备的第二设备状态信息以及其他从属设备的第二设备状态信息生成。
  25. 一种计算机存储介质,所述计算机存储介质存储有多条指令,所述指令适于由处理器加载并执行如权利要求1~12、13~19以及20~21任意一项的所述方法的步骤。
  26. 一种电子设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现如权利要求1~12、13~19以及20~21任一项所述方法的步骤。
PCT/CN2023/117319 2022-11-17 2023-09-06 语音控制方法、装置、存储介质以及电子设备 WO2024103926A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211443786.5A CN115810356A (zh) 2022-11-17 2022-11-17 语音控制方法、装置、存储介质以及电子设备
CN202211443786.5 2022-11-17

Publications (1)

Publication Number Publication Date
WO2024103926A1 true WO2024103926A1 (zh) 2024-05-23

Family

ID=85483428

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/117319 WO2024103926A1 (zh) 2022-11-17 2023-09-06 语音控制方法、装置、存储介质以及电子设备

Country Status (2)

Country Link
CN (1) CN115810356A (zh)
WO (1) WO2024103926A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115810356A (zh) * 2022-11-17 2023-03-17 Oppo广东移动通信有限公司 语音控制方法、装置、存储介质以及电子设备
CN117133282A (zh) * 2023-03-27 2023-11-28 荣耀终端有限公司 一种语音交互方法及电子设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109391528A (zh) * 2018-08-31 2019-02-26 百度在线网络技术(北京)有限公司 语音智能设备的唤醒方法、装置、设备及存储介质
US10643609B1 (en) * 2017-03-29 2020-05-05 Amazon Technologies, Inc. Selecting speech inputs
CN111276139A (zh) * 2020-01-07 2020-06-12 百度在线网络技术(北京)有限公司 语音唤醒方法及装置
CN113241068A (zh) * 2021-03-26 2021-08-10 青岛海尔科技有限公司 语音信号的响应方法和装置、存储介质及电子装置
CN114627871A (zh) * 2022-03-22 2022-06-14 北京小米移动软件有限公司 唤醒设备的方法、装置、设备及存储介质
CN115810356A (zh) * 2022-11-17 2023-03-17 Oppo广东移动通信有限公司 语音控制方法、装置、存储介质以及电子设备


