WO2024149352A1 - Voice interaction method, apparatus and related device - Google Patents
Voice interaction method, apparatus and related device
- Publication number
- WO2024149352A1 (PCT/CN2024/071940)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- voice
- augmented reality
- instruction
- type
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Definitions
- the present application relates to the field of voice interaction, and in particular to a voice interaction method, apparatus and related equipment.
- augmented reality devices are used to display or collect sensor data, and in most cases they need to be connected to a terminal device compatible with the augmented reality device before they can be used.
- the present application provides a voice interaction method, apparatus and related equipment to at least solve the above technical problems existing in the prior art.
- a voice interaction method comprising:
- when the target device is of the first type, determining that the target response mode to the voice instruction is a first response mode;
- when the target device is of the second type or the target device is not detected, determining that the target response mode to the voice instruction is a second response mode;
- responding to the voice instruction using the target response mode.
- adopting the target response mode to respond to the voice instruction includes:
- when the target response mode is the first response mode, the voice instruction is sent to the target device for response;
- when the target response mode is the second response mode, determining, based on the type of the voice instruction, which of the augmented reality device and the target device responds.
- determining, based on the type of the voice instruction, which of the augmented reality device and the target device responds includes:
- when the voice instruction is a first type instruction, the augmented reality device responds to the voice instruction;
- when the voice instruction is a second type instruction, the voice instruction is delivered to the target device for response.
- when the voice instruction is a second type instruction, sending the voice instruction to the target device for response includes:
- the voice instruction is converted into a target type instruction, and the target type instruction is delivered to the target device for response.
- the audio acquisition unit is used to collect target voice data, and the target voice data includes voice instructions;
- the method further comprises:
- obtaining the voice instruction comprises determining whether a voice instruction is present in the target voice data; and/or the target voice data is delivered to the target device;
- alternatively, obtaining the voice instruction comprises performing noise reduction on the target voice data and determining whether a voice instruction is present in the noise-reduced voice data; and/or the noise-reduced voice data is delivered to the target device.
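The claims above leave the noise-reduction and detection steps abstract. As an illustrative sketch only (the patent names no specific algorithm; the moving-average filter and RMS-energy threshold below are assumptions standing in for the noise-reduction step and the keyword-detection service):

```python
def noise_reduce(samples, window=5):
    """Crude noise reduction: a moving-average low-pass filter.
    Illustrative placeholder; the patent does not name an algorithm."""
    half = window // 2
    out = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out

def contains_voice_instruction(samples, threshold=0.1):
    """Decide whether the (noise-reduced) audio likely contains a voice
    instruction, using a simple RMS-energy threshold as an assumed
    stand-in for the keyword-detection service."""
    if not samples:
        return False
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    return rms > threshold

# A frame of near-silence vs. a frame carrying signal energy:
silence = [0.001, -0.002, 0.001, 0.0, -0.001] * 20
speech = [0.4, 0.35, 0.5, 0.45, 0.3] * 20
```

A real implementation would run a trained keyword detector on the noise-reduced stream; the threshold check above only illustrates where that decision sits in the pipeline.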
- a voice interaction device comprising:
- an acquisition unit configured to acquire a user's voice instruction based on an audio acquisition unit of an augmented reality device
- a first determining unit configured to determine a type of a target device connected to the augmented reality device
- a second determining unit configured to determine, when the target device is of the first type, that the target response mode to the voice instruction is a first response mode
- a third determining unit configured to determine, when the target device is of the second type or the target device is not detected, that the target response mode to the voice instruction is a second response mode
- a response unit configured to respond to the voice instruction using the target response mode.
- an augmented reality device wherein the augmented reality device at least includes the voice interaction device described in the present application.
- an electronic device including:
- at least one processor, and a memory communicatively connected to the at least one processor; the memory stores instructions that can be executed by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method described in the present application.
- the audio acquisition unit of the augmented reality device obtains the user's voice command, determines the type of the target device connected to the augmented reality device, and when the target device is of the first type, determines the target response mode to the voice command to be the first response mode; when the target device is of the second type or the target device is not detected, determines the target response mode to the voice command to be the second response mode, and responds to the voice command using the target response mode.
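The flow summarized above reduces to a small mode-selection routine. A minimal sketch, assuming string labels for the device types and the two modes (the patent itself only names "first type", "second type", and the not-detected case):

```python
FIRST_RESPONSE_MODE = "first_response_mode"    # target device performs full voice control
SECOND_RESPONSE_MODE = "second_response_mode"  # AR device classifies and possibly forwards

def select_response_mode(target_device_type):
    """Choose the target response mode from the connected device's type.

    target_device_type is "first", "second", or None when no target
    device is detected at all.
    """
    if target_device_type == "first":
        return FIRST_RESPONSE_MODE
    # Second-type device, or no device detected: the AR device's own
    # keyword-detection service must stay involved.
    return SECOND_RESPONSE_MODE
```

In the first mode the target device's operating system takes over full voice control; in the second mode the augmented reality device's keyword-detection service keeps running.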
- FIG. 1 shows a schematic diagram of the implementation flow of the voice interaction method according to an embodiment of the present application.
- FIG. 2 shows a first schematic diagram of the implementation process of different target response modes in an embodiment of the present application.
- FIG. 3 shows a second schematic diagram of the implementation process of different target response modes in an embodiment of the present application.
- FIG. 4 shows a schematic diagram of data flow on an augmented reality device according to an embodiment of the present application.
- FIG. 5 shows a schematic diagram of the composition structure of the voice interaction device in an embodiment of the present application.
- FIG. 6 shows a schematic diagram of the structure of an electronic device according to an embodiment of the present application.
- the augmented reality device can only communicate with the terminal device adapted thereto, which fails to reflect the flexibility and versatility of the augmented reality device.
- AR devices are now the mainstream wearable devices.
- Intelligent voice interaction, as the mainstream interaction method for augmented reality devices, frees the user's hands and allows input to, and control of, the augmented reality device to be completed easily and quickly.
- augmented reality devices are usually not used as complex computing units, but only used for display (projection) and sensor data acquisition functions (such as images, audio, inertial measurement units, etc.). Based on this, augmented reality devices can only be connected to compatible terminal devices through wired or wireless means, and the collected sensor data will be handed over to the terminal device for algorithm calculation.
- If augmented reality devices can achieve normal voice interaction when they are not connected to terminal devices, or are connected to other terminal devices, their functionality is bound to expand. This lays the foundation for the widespread application of augmented reality devices in daily life.
- the technical solution of the embodiment of the present application involves a voice interaction solution.
- the augmented reality device can be connected to different types of target devices for voice interaction, and can also perform voice interaction when not connected to any target device, which reflects the versatility and flexibility of the augmented reality device. Based on the acquired voice command and the type of target device connected to the augmented reality device, different target response modes can be used to respond to the voice command. This provides technical support for the augmented reality device to perform voice interaction normally in scenarios where it is not connected to a target device or is connected to different target devices, thereby expanding the use scenarios of the augmented reality device.
- the present application provides a voice interaction method, as shown in FIG1 , the method comprising:
- S101 Acquire a user's voice command based on an audio acquisition unit of an augmented reality device.
- the augmented reality device is an electronic device that can perform AR interaction.
- the augmented reality device can be a smart wearable device, including but not limited to smart glasses and smart watches.
- split-type AR glasses are taken as an example of the augmented reality device for explanation.
- the augmented reality device includes an audio collection unit, such as a microphone.
- the user's voice command is acquired by collecting the voice command sent by the user to the augmented reality device through the microphone.
- the microphone includes a microphone array sensor (MicArray) for collecting voice commands issued by the user to the augmented reality device.
- the user can issue a voice command to the augmented reality device when the augmented reality device is not in use.
- a voice command such as "please turn on the screen" or "please turn off the device" can be issued to it.
- the user can also issue voice commands to the augmented reality device when the augmented reality device is used, such as when the augmented reality device is used to project movies or play audio such as songs. That is, when the augmented reality device is used to output multimedia data, the user's voice commands are obtained based on the audio acquisition unit of the augmented reality device.
- augmented reality devices usually output some multimedia data, such as images, audio, etc., during AR interaction.
- the video in the target device can be projected to the augmented reality device for output, such as projecting a movie in the target device to the augmented reality device for output.
- the multimedia data may refer to the video in the target device that can be projected by the AR device.
- the audio in the target device through the augmented reality device, such as answering a call or voice call through the augmented reality device.
- the multimedia data may refer to the audio in the target device that can be output by the AR device.
- the augmented reality device in this application is used as a device to replace the target device for audio and video output.
- This alternative solution mainly takes into account that, in some application scenarios, using the augmented reality device as the audio and video output device is far more convenient, and produces a better output effect, than using the target device itself.
- the projectable video in the target device is projected by the augmented reality device, so that the wearer of the augmented reality device can feel the immersive effect.
- the call can be answered by wearing split AR glasses.
- Split AR glasses can be built into ordinary glasses worn by the wearer. Answering calls through split AR glasses can avoid the situation where you cannot take out your mobile phone from your pocket to answer calls in a crowded environment.
- the user can issue voice commands to the augmented reality device when necessary.
- the augmented reality device, specifically a microphone, collects voice commands.
- the multimedia data output by the split AR glasses is the audio and video information played by the user.
- the microphone can collect a voice command issued by the user during audio and video playback, so that, for example, the volume of the current audio and video playback is increased by responding to the voice command.
- S102 Determine the type of the target device connected to the augmented reality device.
- the target device can be any device that performs voice interaction with the augmented reality device.
- the target device connected to the augmented reality device in this application can be different types of terminals.
- the target device is a self-developed terminal, that is, a terminal adapted to the augmented reality device. It can be understood that after the augmented reality device is produced by the manufacturer, there will usually be a self-developed terminal produced by the manufacturer that is adapted to the augmented reality device.
- the self-developed terminal can be understood as a normal-sized terminal that has computing power but no display screen, or only a small display screen.
- the intelligent voice interaction function of the augmented reality device is realized by connecting the augmented reality device to the adapted self-developed terminal.
- the target device may also be other types of terminals, such as a third-party mobile phone or a third-party computer.
- the intelligent voice interaction function of the augmented reality device is realized by connecting the augmented reality device with other types of terminals.
- the augmented reality device is a split-type AR glasses
- the split-type AR glasses can be connected to a terminal adapted thereto, and can also be connected to a third-party terminal.
- the split-type AR glasses can perform intelligent voice interaction with the device connected to the split-type AR glasses.
- voice interaction, as the mainstream interaction method of augmented reality devices, is usually limited to self-developed terminals. That is, if users want to use augmented reality devices, they have to purchase compatible self-developed terminals in order to perform voice interaction normally. In this case, on the one hand, users who already have mobile phones will not purchase an additional compatible terminal, for cost and portability reasons. On the other hand, when the augmented reality device is connected to a personal computer (PC), the PC serves as the computing unit and the self-developed terminal is no longer connected. In both scenarios, the voice interaction function of the augmented reality device becomes unavailable, greatly limiting the use scenarios of voice interaction with AR glasses.
- the augmented reality device can be connected to different types of terminals, and the voice interaction function of the augmented reality device can be realized by determining the type of target device connected to the augmented reality device.
- the augmented reality device can also operate without being connected to any terminal, realizing some basic voice interaction functions, such as adjusting volume and brightness, through its own voice command control application.
- augmented reality devices can be connected to different types of terminals, or not connected to any terminal, so that users can choose to purchase only augmented reality devices without having to purchase self-developed terminals that are compatible with them. They can also realize the voice interaction function of augmented reality devices by directly connecting to their mobile phones or computers.
- the augmented reality device can access or connect to different types of terminals.
- the response mode to voice commands will be different depending on the type of target device connected to the augmented reality device.
- the types of target devices in this application include the first type and the second type.
- the first type is a terminal that contains voice keyword detection technology services, such as the aforementioned self-developed terminal.
- the second type is a terminal that does not contain voice keyword detection technology services, such as the aforementioned third-party mobile phone terminal, computer terminal, etc.
- when the target device is of the first type, the corresponding target response mode to the voice command can be mode A (the first response mode).
- when the target device is of the second type, the corresponding target response mode to the voice command can be mode B (the second response mode).
- the augmented reality device identifies whether it is connected to a target device. If no device is connected, the target response mode for the voice command is determined to be the second response mode. If a device is connected, the identifier of the connected device is obtained, and based on the identifier of the connected device, it is determined whether the connected device is of the first type or the second type. If the identifier of the connected device is identifier A, and identifier A is an identifier representing a device of the first type, the connected device is determined to be of the first type. If the identifier of the connected device is identifier B, and identifier B is an identifier representing a device of the second type, the connected device is determined to be of the second type.
- two response modes are pre-set based on whether the terminal connected to the augmented reality device is a terminal adapted to the augmented reality device or a third-party terminal not adapted to the augmented reality device.
- One of them is a mode used when the terminal connected to the augmented reality device is a terminal adapted to the augmented reality device.
- the other is a mode used when the terminal connected to the augmented reality device is a third-party terminal.
- Two different types of terminals and the modes to be used under each type of terminal are pre-set as a corresponding relationship.
- the mode corresponding to the type of terminal is searched in the corresponding relationship as the target response mode for responding to the voice command.
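A sketch of the detection-and-lookup procedure just described, with hypothetical device identifiers and table contents (the actual identifier values are not given in the text):

```python
# Hypothetical identifier registry: which identifiers denote first-type
# (adapted, with keyword-detection service) vs second-type devices.
DEVICE_TYPE_BY_ID = {
    "identifier_A": "first",   # e.g. the adapted self-developed terminal
    "identifier_B": "second",  # e.g. a third-party phone or PC
}

# Pre-set correspondence between device type and response mode; the
# not-detected case (None) also maps to the second response mode.
MODE_BY_TYPE = {
    "first": "first_response_mode",
    "second": "second_response_mode",
    None: "second_response_mode",
}

def target_response_mode(connected_device_id):
    """Map a connected device's identifier (or None when no device is
    detected) to the target response mode via the two tables above."""
    device_type = DEVICE_TYPE_BY_ID.get(connected_device_id) if connected_device_id else None
    return MODE_BY_TYPE[device_type]
```

An unrecognized identifier falls through to the second response mode, consistent with treating any terminal without the keyword-detection service as second type.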
- the augmented reality device of this application needs to use voice keyword detection technology services to achieve normal voice interaction of the augmented reality device when connected to the second type of target device.
- the augmented reality device of this application includes voice keyword detection technology services, when the augmented reality device is not connected to the target device, that is, when the target device is not detected, the augmented reality device can also use the corresponding target response mode to perform basic voice interaction, such as adjusting brightness, adjusting volume, etc.
- a low-power voice keyword detection technology service is deployed in an augmented reality device. Without significantly increasing the power consumption or heat generation of the augmented reality device, the low-power voice keyword detection technology service is used to complete the recognition of voice commands.
- the augmented reality device is able to maintain voice interaction when connected to different types of terminals.
- Different target response modes are used to respond to voice commands based on the different types of target devices connected, that is, different target response modes are used to analyze what kind of command the voice command is and process the command.
- the voice command is analyzed to be a voice command of "increase the volume", and the output volume of the augmented reality device is increased.
- the voice command is analyzed to be a "reduce screen brightness" command, and the screen brightness of the augmented reality device is reduced, thereby performing the operation corresponding to the voice command.
- the augmented reality device can not only communicate with the adapted terminal and the third-party terminal, but also perform voice interaction without connecting to any terminal, which reflects the flexibility and versatility of the augmented reality device.
- the target response mode can be determined based on the type of target device connected to the augmented reality device, and the target response mode can be used to respond to voice commands. Based on the type of target device connected to the augmented reality device, different target response modes can be used to respond to voice commands.
- the augmented reality device maintains voice interaction capabilities when connected to different types of terminals. Technical support is provided for enabling the augmented reality device to perform normal voice interaction in scenarios where different target devices are connected.
- adopting the target response mode to respond to the voice instruction includes:
- when the target response mode is the first response mode, the voice instruction is sent to the target device for response;
- when the target response mode is the second response mode, determining, based on the type of the voice instruction, which of the augmented reality device and the target device responds.
- the type of the voice command indicates whether the voice command is a type of command to be responded to by the augmented reality device or a type of command to be responded to by the target device.
- there are mainly two types of response entities involved: target devices and augmented reality devices.
- Augmented reality devices mainly include voice keyword detection technical services and voice command control applications.
- when the target response mode is the first response mode, that is, when the target device connected to the augmented reality device is of the first type, such as a self-developed terminal, the voice command is handed over to the operating system of the target device for full voice control.
- the voice command is converted into an Event Id to notify the voice command control application in the augmented reality device, and the voice command control application determines whether the command is a control command of the augmented reality device itself. If so, the voice command control application responds to the voice command, for example with volume adjustment or brightness adjustment.
- otherwise, the Event Id is converted into a Universal Serial Bus keyboard (USB KeyBoard) protocol Id, and the USB KeyBoard Id is handed over to the target device, which responds to the voice command.
- the voice keyword detection technology service of the augmented reality device is in operation, and the voice keyword detection technology service determines whether the target device connected to the augmented reality device is of the first type or the second type. By judging the type of the target device connected, different target response modes are adopted.
- when the target response mode is the first response mode, that is, when the target device connected to the augmented reality device is of the first type, such as a self-developed terminal,
- the voice keyword detection technology service in the augmented reality device enters a dormant state, and the voice command is handed over to the operating system of the target device for full voice control.
- the target device connected to the augmented reality device is a self-developed terminal
- the self-developed terminal contains a voice keyword detection technology service
- a predefined voice command set on the self-developed terminal such as a volume increase command to adjust the sound of multimedia data, a brightness increase command to adjust the display brightness of the screen, and a mode switching command to adjust the display mode of multimedia data, such as adjusting from normal mode to 3D mode, or from 3D mode to normal mode.
- the augmented reality device determines that the connected target device is a self-developed terminal, and can hand over the voice commands collected by the augmented reality device to the operating system of the target device for full voice control.
- the voice keyword detection technology service running on the self-developed terminal hands over the voice command to the self-developed terminal.
- the self-developed terminal, specifically the application processing unit, responds to the voice command and executes the control actions corresponding to the voice command, such as sound adjustment, brightness adjustment, and mode switching.
- the voice command obtained by the augmented reality device is a sound adjustment command for adjusting the sound of the multimedia data output by the augmented reality device. If the target device connected to the augmented reality device is a self-developed terminal compatible with the augmented reality device, the sound adjustment command will be processed by the self-developed terminal, so that the sound of the multimedia data output by the augmented reality device is adjusted through the self-developed terminal.
- the voice command obtained by the augmented reality device is an instruction for adjusting the display brightness of the multimedia data output by the augmented reality device. If the target device connected to the augmented reality device is a self-developed terminal adapted to the augmented reality device, the display brightness adjustment instruction will be handed over to the self-developed terminal for processing, so as to realize the adjustment of the display brightness of the augmented reality device screen through the self-developed terminal.
- the voice command obtained by the augmented reality device is an instruction for adjusting the display mode of the multimedia data output by the augmented reality device. If the target device connected to the augmented reality device is a self-developed terminal adapted to the augmented reality device, the display mode adjustment instruction is handed over to the self-developed terminal for processing, so that the self-developed terminal adjusts the display mode of the multimedia data output by the augmented reality device.
- when the target response mode is the second response mode, that is, when the target device connected to the augmented reality device is of the second type, such as a third-party mobile phone or computer, or no target device is detected, the voice keyword detection technology service of the augmented reality device remains running.
- determining, based on the type of the voice instruction, which of the augmented reality device and the target device responds includes:
- when the voice instruction is a first type instruction, the augmented reality device responds to the voice instruction;
- when the voice instruction is a second type instruction, the voice instruction is delivered to the target device for response.
- when the target response mode is the second response mode, the voice commands also fall into two types.
- the first type of command is a command that the augmented reality device can respond to, such as the aforementioned sound adjustment command, display brightness adjustment command, and display mode adjustment command.
- the second type of command is a command that the augmented reality device cannot respond to but the target device can respond to, such as the "return to the previous step" command, the "confirm” command, and the "main menu” command.
- the target response mode is the second response mode
- the voice command is a command that the augmented reality device can respond to, such as a sound adjustment command, a display brightness adjustment command, and a display mode adjustment command
- the augmented reality device responds to the voice command. If the voice command is a "return to the previous step" command, a "confirm" command, or a "main menu" command, the voice command is handed over to the target device for response.
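The split between the two command types in the second response mode can be sketched as a dispatch table. The command strings follow the examples in the text; the function and entity names are illustrative assumptions:

```python
# Commands the AR device can answer itself (first type instructions).
LOCAL_COMMANDS = {"adjust sound", "adjust display brightness", "adjust display mode"}
# Commands only the target device can answer (second type instructions).
FORWARDED_COMMANDS = {"return to the previous step", "confirm", "main menu"}

def dispatch(command):
    """Return which entity should respond to the recognized command."""
    if command in LOCAL_COMMANDS:
        return "augmented_reality_device"
    if command in FORWARDED_COMMANDS:
        return "target_device"
    return "unrecognized"
```

Forwarded commands would then be converted to a target type instruction (a USB KeyBoard protocol Id, in the embodiment below) before delivery.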
- delivering the voice instruction to the target device for response includes:
- the voice instruction is converted into a target type instruction, and the target type instruction is delivered to the target device for response.
- the voice command is a command that the augmented reality device cannot respond to but the target device can respond to, it is necessary to convert the voice command into a type that the target device can recognize or respond to, thereby achieving a response to the voice command.
- the target device connected to the augmented reality device is a third-party mobile phone or computer
- the voice keyword detection technology service in the augmented reality device recognizes the voice command
- it converts the voice command into an Event Id and notifies the voice command control application in the augmented reality device.
- the voice command control application matches the command Event Id with the number corresponding to the predefined command function, and makes corresponding feedback according to the predefined command function represented by the corresponding number.
- the predefined command function is the command function represented by different numbers in the predefined command set, such as number 1 represents the volume increase function, number 2 represents the brightness increase function, number 3 represents the mode switching function, etc.
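The number-to-function matching could be sketched as a lookup table; the numbering (1 = volume up, 2 = brightness up, 3 = mode switch) follows the examples just given, while the function names are assumptions:

```python
# Predefined command set: Event Id number -> command function, per the
# text's examples.
PREDEFINED_FUNCTIONS = {
    1: "increase_volume",
    2: "increase_brightness",
    3: "switch_display_mode",
}

def handle_event_id(event_id):
    """Look up the predefined function for an Event Id. Returns None when
    the Id is not a self-control instruction; such Ids must instead be
    converted to a USB KeyBoard protocol Id and forwarded."""
    return PREDEFINED_FUNCTIONS.get(event_id)
```

A matched function would then be carried out through the system API, as the next paragraph describes.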
- when the instruction corresponding to the instruction Event Id is a first type instruction, that is, when the Event Id matches a number assigned to an instruction function predefined for the augmented reality device, the instruction is a self-control instruction that the augmented reality device can respond to, such as sound adjustment, brightness adjustment, display switching, mode switching, and other instructions related to the hardware of the augmented reality device
- in this case, the voice command control application directly completes the corresponding control operation through the system application programming interface (API).
- the instruction Event Id is converted into a USB KeyBoard protocol Id and sent to the connected target device.
- the target device responds to the USB KeyBoard protocol Id and performs operations on the multimedia data output by the augmented reality device.
- the instruction "return to the previous step" is defined as the "F1" key on the keyboard of the target device.
- when the user presses "F1", the multimedia data output by the augmented reality device returns to the previous step.
- "main menu" is defined as the "F2" key on the keyboard of the target device.
- when the user presses "F2", the augmented reality device pops up the main menu interface.
- "confirm" can also be defined as the "Enter" key on the keyboard of the target device.
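The dispatch just described can be sketched as follows. This is a minimal illustration only: the Event Id values, function names, and the assignment of commands to USB HID keyboard usage codes (F1 = 0x3A, F2 = 0x3B, Enter = 0x28 in the HID Usage Tables) are assumptions for the example, not values defined by the application.

```python
# Sketch of the second-response-mode dispatch: a recognized voice command
# arrives as an Event Id; device-local commands are completed through the
# system API, while target-device commands are converted to USB KeyBoard
# protocol (HID usage) codes. All Ids below are illustrative assumptions.

# Commands the augmented reality device responds to itself (first type).
LOCAL_COMMANDS = {
    1: "volume_up",       # e.g. number 1 -> volume increase function
    2: "brightness_up",   # e.g. number 2 -> brightness increase function
    3: "mode_switch",     # e.g. number 3 -> mode switching function
}

# Commands handed over to the target device (second type), expressed as
# USB HID keyboard usage codes: F1 = 0x3A, F2 = 0x3B, Enter = 0x28.
TARGET_KEY_CODES = {
    4: 0x3A,  # "return to the previous step" -> F1
    5: 0x3B,  # "main menu"                   -> F2
    6: 0x28,  # "confirm"                     -> Enter
}

def dispatch(event_id, system_api, usb_keyboard):
    """Route a command Event Id to the appropriate response entity."""
    if event_id in LOCAL_COMMANDS:
        system_api(LOCAL_COMMANDS[event_id])      # AR device responds itself
        return "local"
    if event_id in TARGET_KEY_CODES:
        usb_keyboard(TARGET_KEY_CODES[event_id])  # target device responds
        return "target"
    return "unknown"
```

In this sketch `system_api` and `usb_keyboard` stand in for the system API call and the USB KeyBoard protocol transmission; an unrecognized Event Id is simply ignored.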
- the aforementioned execution process of the augmented reality device in the present application can be realized by a high-performance dedicated chip set in the augmented reality device.
- the setting of the high-performance dedicated chip can improve the computing efficiency and realize the rapid response to the voice command.
- the low-power voice keyword detection technology service that is adopted has the significant advantages of requiring little computing power and consuming few resources. Without significantly increasing power consumption, it endows the augmented reality device with AR capabilities and greatly improves the device's ability to control itself.
- voice instructions can be customized and extended, which makes up for the shortcomings of key-based control.
- with the help of the universal USB KeyBoard protocol, the augmented reality device converts a recognized voice instruction into a target type that the target device can recognize or respond to, so that the target device treats its response to the voice instruction as a response to a user key operation (such as the aforementioned "F1", "F2", and "Enter" keys on the target device's keyboard).
- this scheme is equivalent to using the augmented reality device as a control peripheral of the target device, giving the target device voice interaction capability at low access cost. In practical applications, this increases the product competitiveness of the augmented reality device.
- the audio acquisition unit is used to acquire target voice data, and the target voice data includes voice instructions;
- the method further comprises:
- the augmented reality device also includes a keyword detection unit and a control application unit.
- the keyword detection unit includes a voice keyword detection technical service, which is used to determine the type of target device connected and recognize voice commands.
- the control application unit includes a voice command control application, which is used to determine the command type and send different types of commands to different response entities for response.
- the target voice data includes voice commands and/or voice data generated when the augmented reality device outputs multimedia data.
- the voice command in the present application refers to the command input into the augmented reality device in the form of voice, which is a kind of command data.
- the target voice data collected by the microphone array sensor may be the command data.
- in addition, the voice data collected by the microphone array sensor may be voice data generated while multimedia data is being output. For example, when a call is answered through the augmented reality device, the collected voice data may be the call content; when a movie is projected through the augmented reality device, the collected voice data may be the audio content of the movie.
- the augmented reality device collects target voice data through a microphone array sensor, and identifies whether the target voice data contains voice commands and/or voice data generated when the augmented reality device outputs multimedia data. If it is found through identification that the target voice data contains both voice commands and voice data generated when the augmented reality device outputs multimedia data, the two types of voice data can be separated to perform different processing on the two types of voice data.
- the augmented reality device determines different target device types through the keyword detection unit, thereby adopting different response modes. Specifically, when the target device is of the first type, the target device responds to the voice command. When the target device is of the second type or the target device is not detected, the keyword detection unit recognizes the voice command, and the control application unit determines different command types, thereby selecting different response subjects to respond to the voice command.
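The device-type-to-response-mode selection just described reduces to a small decision, sketched below; the type and mode names are illustrative assumptions, not identifiers from the application.

```python
# Sketch of target-response-mode selection by connected device type.
# "first" = terminal with a built-in voice keyword detection service;
# "second" = third-party phone/PC without one. Names are assumptions.

def select_response_mode(target_device_type):
    """Return the response mode for the detected device type.

    A value of None means no target device was detected.
    """
    if target_device_type == "first":
        # Hand all voice control over to the target device.
        return "first_response_mode"
    # Second type, or no device detected: the AR device's own keyword
    # detection service recognizes commands and routes them by type.
    return "second_response_mode"
```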
- the voice data can be delivered to the target device, which will record, perform semantic recognition and other processing on the voice data.
- the method further comprises determining whether there is a voice instruction in the target voice data to obtain the voice instruction; and/or delivering the target voice data to the target device, including:
- the augmented reality device also includes a noise reduction unit for performing noise reduction processing on the target voice data to obtain the target voice data after noise reduction.
- the augmented reality device collects the target voice data through the microphone array sensor, and performs noise reduction processing on the target voice data through the noise reduction unit to obtain the target voice data after noise reduction.
- the target voice data after noise reduction is transmitted to the augmented reality device end and/or the target device end.
- the augmented reality device end determines different target device types through the keyword detection unit, thereby adopting different response modes.
- the target device type is the first type
- the target device responds to the voice instruction.
- the target device type is the second type or the target device is not detected
- the voice instruction is identified by the keyword detection unit, and the control application unit determines different instruction types, thereby selecting different response subjects to respond to the voice instruction.
- the noise-reduced voice data can be delivered to the target device, which will record, perform semantic recognition and other processing on the noise-reduced voice data.
- the quality of the voice data transmitted to the augmented reality device and/or the target device can be improved, thereby achieving accurate response to the target voice data.
- obtaining the voice instruction by determining whether there is a voice instruction in the target voice data after noise reduction; and/or delivering the target voice data after noise reduction to the target device includes:
- the augmented reality device also includes a copy and diversion unit, which is used to copy the noise-reduced target voice data into two copies and divert them to obtain the first target voice data and the second target voice data.
- the augmented reality device sends the microphone audio data it reads (Audio In) to a dedicated noise reduction unit for directional noise reduction, obtaining enhanced audio data (Clean Audio) that is free of noise or contains little noise.
- the copy and diversion unit then produces the first target voice data and the second target voice data from this noise-free or low-noise audio data.
- the first target voice data and the second target voice data are both identical to the noise-reduced, noise-free or low-noise Clean Audio.
- the second target voice data is transmitted directly to the target device (Audio Out) by wired hardware or wirelessly, while the first target voice data is processed on the augmented reality device side.
- the augmented reality device obtains the voice instruction, and judges different target device types through the keyword detection unit, thereby adopting different response modes. Specifically, when the target device is of the first type, the target device responds to the voice instruction.
- the target device is of the second type or the target device is not detected, the voice instruction is identified by the keyword detection unit, and the control application unit judges different instruction types, thereby selecting different response subjects to respond to the voice instruction.
- the first target voice data and the second target voice data are obtained.
- the first target voice data is processed by the augmented reality device end for voice command recognition and responses by different response entities.
- the second target voice data is transmitted to the target device end so that the collected target voice data can be used by other applications of the target device. For example, when the user makes a call, the collected target voice data is used as the call content by the call application of the target device. Copying and diverting the target voice data after noise reduction can ensure that the applications on the augmented reality device end and the target device end can obtain the audio data they need without interfering with each other.
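The Audio In, noise reduction, and copy/diversion path described above can be sketched per frame as follows. The denoising step is stubbed out as a callable, and all names are assumptions for illustration.

```python
def process_audio_frame(audio_in, denoise, keyword_detector, target_out):
    """Noise-reduce one microphone frame, then duplicate it so that the
    augmented reality device and the target device each receive their own
    independent copy of the Clean Audio."""
    clean_audio = denoise(audio_in)      # Audio In -> Clean Audio
    first_copy = list(clean_audio)       # kept for local keyword detection
    second_copy = list(clean_audio)      # forwarded to the target device
    keyword_detector(first_copy)         # may yield a voice instruction
    target_out(second_copy)              # e.g. used as call content
    return first_copy, second_copy
```

Because the two copies are independent, the keyword detection on the device side and the applications on the target side can consume the audio without interfering with each other, which is the point of the copy and diversion unit.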
- an embodiment of the present application provides a voice interaction device. As shown in FIG. 5, the device comprises:
- An acquisition unit 501 is used to acquire a user's voice command based on an audio acquisition unit of an augmented reality device;
- a first determining unit 502 configured to determine a type of a target device connected to the augmented reality device
- a second determining unit 503 is used to determine that the target response mode to the voice instruction is a first response mode when the target device is of the first type;
- a third determining unit 504 is configured to determine that the target response mode to the voice command is a second response mode when the target device is of the second type or the target device is not detected;
- the response unit 505 is used to respond to the voice instruction using the target response mode.
- the response unit 505 is used to hand the voice instruction over to the target device for response when the target response mode is the first response mode; and to determine, based on the type of the voice instruction, which of the augmented reality device and the target device responds when the target response mode is the second response mode.
- the response unit 505 is used to cause the augmented reality device to respond to the voice instruction when the voice instruction is a first type of instruction; and to hand over the voice instruction to the target device for response when the voice instruction is a second type of instruction.
- the response unit 505 is used to convert the voice instruction into a target type instruction when the voice instruction is a second type instruction, and to hand the target type instruction over to the target device for response.
- the audio acquisition unit is used to acquire target voice data, and the target voice data includes voice instructions;
- the device also includes:
- the voice data unit is used to obtain the voice instruction by determining whether there is a voice instruction in the target voice data; and/or, to deliver the target voice data to the target device.
- the device also includes: a noise reduction unit; the noise reduction unit is used to perform noise reduction processing on the target voice data to obtain the target voice data after noise reduction; accordingly, the voice data unit is also used to obtain the voice instruction by determining whether there is a voice instruction in the target voice data after noise reduction; and/or, delivering the target voice data after noise reduction to the target device.
- the device also includes: a copy and diversion unit; the copy and diversion unit is used to obtain first target voice data and second target voice data based on the target voice data after noise reduction; accordingly, the voice data unit is also used to obtain the voice instruction by determining whether there is a voice instruction in the first target voice data; and/or, handing over the second target voice data to the target device.
- the voice interaction device of the embodiment of the present application solves the problem on a principle similar to that of the aforementioned voice interaction method. Therefore, for the implementation process, implementation principle, and beneficial effects of the device, reference may be made to the corresponding descriptions of the method, and repeated details are not described again.
- An embodiment of the present application provides an augmented reality device, which at least includes the voice interaction device described in the present application.
- the present application also provides an electronic device.
- Fig. 6 shows a schematic block diagram of an example electronic device 600 that can be used to implement an embodiment of the present application.
- the electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
- the electronic device can also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
- the components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementation of the present application described herein and/or required.
- the electronic device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603.
- in the RAM 603, various programs and data required for the operation of the electronic device 600 may also be stored.
- the computing unit 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604.
- An input/output (I/O) interface 605 is also connected to the bus 604.
- a number of components in the electronic device 600 are connected to the I/O interface 605, including: an input unit 606, such as a keyboard or a mouse; an output unit 607, such as various types of displays or speakers; a storage unit 608, such as a magnetic disk or an optical disc; and a communication unit 609, such as a network card, a modem, or a wireless communication transceiver.
- the communication unit 609 allows the electronic device 600 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
- the computing unit 601 may be a variety of general and/or special processing components with processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, digital signal processors (DSPs), and any appropriate processors, controllers, microcontrollers, etc.
- the computing unit 601 performs the various methods and processes described above, such as the voice interaction method.
- the voice interaction method may be implemented as a computer software program, which is tangibly contained in a machine-readable medium, such as a storage unit 608.
- part or all of the computer program may be loaded and/or installed on the electronic device 600 via ROM 602 and/or communication unit 609.
- when the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the voice interaction method described above may be performed.
- the computing unit 601 may be configured to perform the voice interaction method in any other appropriate manner (e.g., by means of firmware).
- Various implementations of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
- Various implementations can include: being implemented in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which can be a special purpose or general purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
- the program code for implementing the method of the present application can be written in any combination of one or more programming languages. These program codes can be provided to a processor or controller of a general-purpose computer, a special-purpose computer or other programmable data processing device, so that when the program code is executed by the processor or controller, the functions/operations specified in the flow chart and/or block diagram are implemented.
- the program code may be executed entirely on the machine, partly on the machine, partly on the machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
- first and second are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Therefore, a feature defined as “first” or “second” may explicitly or implicitly include at least one of the features. In the description of this application, the meaning of “plurality” is two or more, unless otherwise clearly and specifically defined.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The present application provides a voice interaction method, apparatus, and related device. The method includes: acquiring a user's voice instruction based on an audio acquisition unit of an augmented reality device; determining the type of a target device connected to the augmented reality device; when the target device is of a first type, determining that the target response mode to the voice instruction is a first response mode; when the target device is of a second type or no target device is detected, determining that the target response mode to the voice instruction is a second response mode; and responding to the voice instruction using the target response mode. This provides technical support for enabling the augmented reality device to carry out voice interaction normally whether it is not connected to a target device or is connected to different target devices.
Description
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202310096785.6, filed with the China National Intellectual Property Administration on January 12, 2023, the entire contents of which are incorporated herein by reference.
This application relates to the field of voice interaction, and in particular to a voice interaction method, apparatus, and related device.
Typically, an augmented reality device is used for display or for collecting sensor data, and in most cases it can be used only when connected to a terminal device adapted to it.
Summary of the Invention
This application provides a voice interaction method, apparatus, and related device to solve at least the above technical problems in the prior art.
According to a first aspect of this application, a voice interaction method is provided, the method including:
acquiring a user's voice instruction based on an audio acquisition unit of an augmented reality device;
determining the type of a target device connected to the augmented reality device;
when the target device is of a first type, determining that the target response mode to the voice instruction is a first response mode;
when the target device is of a second type or no target device is detected, determining that the target response mode to the voice instruction is a second response mode; and
responding to the voice instruction using the target response mode.
In the above solution, responding to the voice instruction using the target response mode includes:
when the target response mode is the first response mode, handing the voice instruction over to the target device for response; and
when the target response mode is the second response mode, determining, based on the type of the voice instruction, which of the augmented reality device and the target device responds.
In the above solution, determining, based on the type of the voice instruction, which of the augmented reality device and the target device responds includes:
when the voice instruction is a first-type instruction, responding to the voice instruction by the augmented reality device; and
when the voice instruction is a second-type instruction, handing the voice instruction over to the target device for response.
In the above solution, handing the voice instruction over to the target device for response when the voice instruction is a second-type instruction includes:
when the voice instruction is a second-type instruction, converting the voice instruction into a target type instruction and handing the target type instruction over to the target device for response.
In the above solution, the audio acquisition unit is used to collect target voice data, and the target voice data includes a voice instruction;
the method further includes:
obtaining the voice instruction by determining whether a voice instruction exists in the target voice data;
and/or handing the target voice data over to the target device.
In the above solution, obtaining the voice instruction by determining whether a voice instruction exists in the target voice data, and/or handing the target voice data over to the target device, includes:
performing noise reduction processing on the target voice data to obtain noise-reduced target voice data;
obtaining the voice instruction by determining whether a voice instruction exists in the noise-reduced target voice data;
and/or handing the noise-reduced target voice data over to the target device.
In the above solution, obtaining the voice instruction by determining whether a voice instruction exists in the noise-reduced target voice data, and/or handing the noise-reduced target voice data over to the target device, includes:
obtaining first target voice data and second target voice data based on the noise-reduced target voice data;
obtaining the voice instruction by determining whether a voice instruction exists in the first target voice data;
and/or handing the second target voice data over to the target device.
According to a second aspect of this application, a voice interaction apparatus is provided, the apparatus including:
an acquisition unit, configured to acquire a user's voice instruction based on an audio acquisition unit of an augmented reality device;
a first determining unit, configured to determine the type of a target device connected to the augmented reality device;
a second determining unit, configured to determine, when the target device is of a first type, that the target response mode to the voice instruction is a first response mode;
a third determining unit, configured to determine, when the target device is of a second type or no target device is detected, that the target response mode to the voice instruction is a second response mode; and
a response unit, configured to respond to the voice instruction using the target response mode.
According to a third aspect of this application, an augmented reality device is provided, the augmented reality device including at least the voice interaction apparatus described in this application.
According to a fourth aspect of this application, an electronic device is provided, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method described in this application.
In this application, a user's voice instruction is acquired based on the audio acquisition unit of an augmented reality device, and the type of the target device connected to the augmented reality device is determined. When the target device is of the first type, the target response mode to the voice instruction is determined to be the first response mode; when the target device is of the second type or no target device is detected, the target response mode to the voice instruction is determined to be the second response mode; and the voice instruction is responded to using the target response mode. This provides technical support for enabling the augmented reality device to carry out voice interaction normally whether it is not connected to a target device or is connected to different target devices.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of this application, nor is it intended to limit the scope of this application. Other features of this application will become easy to understand from the following description.
The above and other objects, features, and advantages of exemplary embodiments of this application will become easy to understand by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of this application are shown by way of example rather than limitation, wherein:
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1 shows a schematic flowchart of the implementation of a voice interaction method according to an embodiment of this application.
FIG. 2 shows a first schematic flowchart of the implementation of different target response modes according to an embodiment of this application.
FIG. 3 shows a second schematic flowchart of the implementation of different target response modes according to an embodiment of this application.
FIG. 4 shows a schematic diagram of the data flow on the augmented reality device side according to an embodiment of this application.
FIG. 5 shows a schematic diagram of the composition of a voice interaction apparatus according to an embodiment of this application.
FIG. 6 shows a schematic diagram of the composition of an electronic device according to an embodiment of this application.
To make the objects, features, and advantages of this application more obvious and understandable, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort fall within the scope of protection of this application.
In the related art, an augmented reality device can only communicate with a terminal device adapted to it, which fails to reflect the flexibility and versatility of the augmented reality device.
It can be understood that augmented reality (AR) devices are now mainstream wearable devices, and intelligent voice interaction, as the mainstream interaction method for augmented reality devices, frees the hands and allows input to or control of the device to be completed easily and quickly. Considering factors such as appearance, wearing comfort, power consumption, and heat generation, an augmented reality device is usually not used as a complex computing unit but only for display (projection) and sensor data collection (such as images, audio, and inertial measurement unit data). For this reason, the augmented reality device can only connect, by wire or wirelessly, to a terminal device adapted to it and hand the collected sensor data to that terminal device for algorithmic computation. If the augmented reality device could also carry out voice interaction normally when not connected to a terminal device, or when connected to other terminal devices, the functionality of the augmented reality device would certainly be expanded. This would lay a foundation for the wide application of augmented reality devices in daily life.
The technical solution of the embodiments of this application relates to voice interaction: the augmented reality device can connect to different types of target devices for voice interaction, and can also carry out voice interaction without connecting to a target device, reflecting the versatility and flexibility of the augmented reality device. Based on the acquired voice instruction and the type of the target device connected to the augmented reality device, different target response modes can be used to respond to the voice instruction. This provides technical support for normal voice interaction whether the augmented reality device is not connected to a target device or is connected to different target devices, and expands the usage scenarios of the augmented reality device.
The voice interaction method of the embodiments of this application is described in detail below.
An embodiment of this application provides a voice interaction method. As shown in FIG. 1, the method includes:
S101: acquiring a user's voice instruction based on an audio acquisition unit of an augmented reality device.
In this step, the augmented reality device is an electronic device capable of AR interaction. The augmented reality device may be a smart wearable device, including but not limited to smart glasses and smart watches. In this application, a split-type pair of AR glasses is taken as an example of the augmented reality device.
The augmented reality device includes an audio acquisition unit, such as a microphone. In this step, the user's voice instruction is acquired by using the microphone to collect the voice instruction the user issues to the augmented reality device.
It can be understood that the microphone contains a microphone array sensor (MicArray) for collecting the voice instructions the user issues to the augmented reality device.
In practical applications, the user may issue a voice instruction to the augmented reality device when it is not in use. For example, when the augmented reality device's screen is off, the user may issue a voice instruction such as "turn on the screen" or "power off".
The user may also issue a voice instruction while the augmented reality device is in use, for example while it is projecting a movie or playing audio such as songs. That is, when the augmented reality device is outputting multimedia data, the user's voice instruction is acquired based on the audio acquisition unit of the augmented reality device.
It can be understood that an augmented reality device usually outputs some multimedia data, such as images and audio, during AR interaction. When connected to a target device, the augmented reality device can project video from the target device for output, for example projecting a movie from the target device. In this case, the multimedia data may refer to video on the target device that can be projected by the AR device. Audio on the target device may also be output through the augmented reality device, for example answering a voice call through the augmented reality device. In this case, the multimedia data may refer to audio on the target device that can be output by the AR device.
That is, when connected to a target device, the augmented reality device in this application serves as a device that outputs audio and video in place of the target device. This substitution mainly takes into account that, in some application scenarios, using the target device for audio/video output is far less convenient and far less effective than using the augmented reality device. For example, in a projection scenario, projecting the target device's projectable video through the augmented reality device gives the wearer an immersive experience. As another example, in crowded environments such as subway rush hours, where the crowding makes it inconvenient to answer an incoming call on the target device, the call can be answered through the worn split-type AR glasses. The split-type AR glasses can be built into the ordinary glasses the wearer is already wearing, and answering a call through them avoids situations where the phone cannot be taken out of a pocket in a crowded environment.
The user can issue a voice instruction to the augmented reality device whenever needed. The augmented reality device, specifically the microphone, collects the voice instruction.
For example, when the augmented reality device is a pair of split-type AR glasses, suppose the current scenario is that the user is playing audio and video with the glasses, so the multimedia data output by the glasses is the audio/video being played. When the user says "turn up the volume", collecting this voice instruction through the microphone yields the voice instruction directed at the audio/video playback, and responding to it raises the playback volume.
S102: determining the type of the target device connected to the augmented reality device.
In this step, the target device may be any device that carries out voice interaction with the augmented reality device, such as a smartphone, a computer, or a personal digital assistant. The target device connected to the augmented reality device in this application may be a terminal of a different type. For example, the target device may be a self-developed terminal, that is, a terminal adapted to the augmented reality device. It can be understood that after an augmented reality device is produced, the manufacturer usually also produces a self-developed terminal adapted to it. Such a self-developed terminal can be understood as a normal-sized terminal with computing capability but with no display screen or only a small one. The intelligent voice interaction function of the augmented reality device is realized by connecting it to the adapted self-developed terminal.
The target device may also be another type of terminal, such as a third-party mobile phone or a third-party computer. In this application, the intelligent voice interaction function of the augmented reality device can also be realized by connecting it to such other terminals. For example, when the augmented reality device is a pair of split-type AR glasses, the glasses can connect either to the terminal adapted to them or to a third-party terminal; in either case, the glasses can carry out intelligent voice interaction with the connected device.
In practical applications, voice interaction, as the mainstream interaction method of augmented reality devices, is usually restricted to the self-developed terminal. That is, to use the augmented reality device, the user has to buy the adapted self-developed terminal before voice interaction works normally. In this situation, on the one hand, a user who already owns a mobile phone will not buy an additional adapted terminal for cost and portability reasons. On the other hand, when the augmented reality device is connected to a personal computer (PC), the PC serves as the computing unit and no terminal is connected either. In both scenarios the voice interaction function of the augmented reality device becomes unavailable, which greatly limits the usage scenarios of voice interaction with AR glasses.
In this application, the augmented reality device can connect to different types of terminals, and its voice interaction function is realized by determining the type of the target device connected to it. The augmented reality device may also connect to no terminal at all and realize some basic voice interaction functions, such as adjusting volume and brightness, through its own voice instruction control application.
In practical applications, since the augmented reality device can connect to different types of terminals or to none, the user can choose to buy only the augmented reality device and, instead of also buying the adapted self-developed terminal, connect it directly to their own phone or computer and still get the voice interaction function.
S103: when the target device is of the first type, determining that the target response mode to the voice instruction is the first response mode.
In this application, the augmented reality device can access or connect to different types of terminals, and the response mode to the voice instruction differs with the type of the connected target device.
The target device types in this application include a first type and a second type. The first type is a terminal that internally contains a voice keyword detection technology service, such as the aforementioned self-developed terminal. The second type is a terminal that does not internally contain such a service, such as the aforementioned third-party phones and computers.
For example, if the target device connected to the augmented reality device is the aforementioned self-developed terminal, the corresponding target response mode to the voice instruction may be mode A (the first response mode). If the connected target device is another type of terminal (a third-party phone, a computer, etc.), the corresponding target response mode may be mode B (the second response mode).
In implementation, the augmented reality device identifies whether a target device is connected to it. If no device is connected, the target response mode to the voice instruction is determined to be the second response mode. If a device is connected, the identifier of the connected device is obtained, and based on that identifier it is determined whether the connected device is of the first type or the second type. If the identifier is identifier A, and identifier A represents a first-type device, the connected device is determined to be of the first type. If the identifier is identifier B, and identifier B represents a second-type device, the connected device is determined to be of the second type.
In this application, two response modes are preset according to whether the terminal connected to the augmented reality device is a terminal adapted to it or a third-party terminal not adapted to it. One mode is used when the connected terminal is adapted to the augmented reality device; the other is used when the connected terminal is a third-party terminal. A correspondence between the two terminal types and the mode to be used for each type is established in advance. In implementation, based on the type of the terminal connected to the augmented reality device, the mode corresponding to that type is looked up in the correspondence and used as the target response mode for responding to the voice instruction.
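The identifier-based lookup can be sketched as follows. The identifiers "A" and "B" and the contents of the two lookup tables are assumptions used purely for illustration.

```python
# Sketch: map a connected device's identifier to its type, then map the
# type to a response mode via a preset correspondence. All table values
# are illustrative assumptions.
DEVICE_TYPE_BY_ID = {"A": "first", "B": "second"}   # identifier -> type
MODE_BY_TYPE = {
    "first": "first_response_mode",
    "second": "second_response_mode",
}

def response_mode_for(connected_device_id):
    """Determine the target response mode for a connected device.

    No connected device (None) or an unrecognized identifier falls back
    to the second response mode, matching the no-device case above.
    """
    device_type = DEVICE_TYPE_BY_ID.get(connected_device_id)
    if device_type is None:
        return "second_response_mode"
    return MODE_BY_TYPE[device_type]
```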
S104: when the target device is of the second type or no target device is detected, determining that the target response mode to the voice instruction is the second response mode.
Since a second-type target device is a terminal that does not contain a voice keyword detection technology service, to ensure that the augmented reality device can still carry out voice interaction when connected to a second-type target device, the augmented reality device in this application must run a voice keyword detection technology service itself. Moreover, because the augmented reality device includes this service, even when it is not connected to a target device, that is, when no target device is detected, the augmented reality device can still use the corresponding target response mode for basic voice interaction, such as adjusting brightness and volume.
In this application, a low-power voice keyword detection technology service is deployed in the augmented reality device, so that voice instruction recognition is completed with the help of this service without significantly increasing the power consumption or heat generation of the device.
By applying different target response modes to different types of terminals, the augmented reality device maintains its voice interaction capability when connected to different types of terminals.
S105: responding to the voice instruction using the target response mode.
Depending on the type of the connected target device, different target response modes are used to respond to the voice instruction; that is, different target response modes are used to analyze what kind of instruction the voice instruction is and to process that instruction. For example, if the analysis shows the voice instruction is a "raise the volume" instruction, the output volume of the augmented reality device is turned up; if it is a "lower the screen brightness" instruction, the screen brightness of the augmented reality device is lowered. The operation corresponding to the voice instruction is thereby carried out.
In the solution shown in S101 to S105, the augmented reality device can communicate not only with an adapted terminal and with third-party terminals, but can also carry out voice interaction without connecting to any terminal, reflecting the flexibility and versatility of the augmented reality device.
Moreover, in this application, the target response mode can be determined based on the type of the target device connected to the augmented reality device, and the voice instruction is responded to using that mode. Based on the type of the connected target device, different target response modes can be used to respond to the voice instruction, so the augmented reality device maintains voice interaction capability when connected to different types of terminals. This provides technical support for normal voice interaction in scenarios where the augmented reality device is connected to different target devices.
In an optional solution, responding to the voice instruction using the target response mode includes:
when the target response mode is the first response mode, handing the voice instruction over to the target device for response; and
when the target response mode is the second response mode, determining, based on the type of the voice instruction, which of the augmented reality device and the target device responds.
Here, the type of the voice instruction indicates whether the instruction is of a type responded to by the augmented reality device or of a type responded to by the target device.
As shown in FIG. 2, two main kinds of response entities are involved in this application: the target device and the augmented reality device. The augmented reality device mainly contains a voice keyword detection technology service and a voice instruction control application. When the target response mode is the first response mode, that is, when the first response mode is determined because the connected target device is of the first type (such as a self-developed terminal), the voice instruction is handed over to the operating system of the target device for full voice control. When the target response mode is the second response mode, that is, when the second response mode is determined because the connected target device is of the second type (such as a third-party phone or computer) or no target device is detected, the voice instruction is converted into an Event Id, of which the voice instruction control application in the augmented reality device is notified. The voice instruction control application then judges whether the instruction is a self-control instruction of the augmented reality device. If it is, the voice instruction control application responds to it, for example by adjusting volume or brightness. If the voice instruction is one to be controlled by the target device, the Event Id is converted into a universal keyboard (USB KeyBoard, Universal Serial Bus KeyBoard) protocol Id, and the USB KeyBoard Id is handed over to the target device, which responds to the voice instruction.
Specifically, as shown in FIG. 3, the voice keyword detection technology service of the augmented reality device is running, and this service judges whether the target device connected to the augmented reality device is of the first type or the second type. By judging the type of the connected target device, different target response modes are adopted. When the target response mode is the first response mode, that is, when the first response mode is determined because the connected target device is of the first type (such as a self-developed terminal), the voice keyword detection technology service in the augmented reality device enters a sleep state, and the voice instruction is handed over to the operating system of the target device for full voice control. For example, suppose the connected target device is a self-developed terminal. Since the self-developed terminal contains a voice keyword detection technology service, it usually has a predefined voice instruction set, for example: a volume-up instruction for adjusting the sound of multimedia data; a brightness-up instruction for adjusting the display brightness of the screen; and a mode-switch instruction for adjusting the display mode of multimedia data, such as switching from normal mode to 3D mode or from 3D mode to normal mode.
After the user issues a corresponding voice instruction, the augmented reality device judges that the connected target device is a self-developed terminal, and can hand the voice instruction it collected over to the operating system of the target device for full voice control. The voice keyword detection technology service running on the self-developed terminal passes the voice instruction to the self-developed terminal, and the terminal, specifically its application processing unit, responds to the voice instruction and performs the corresponding control action, such as sound adjustment, brightness adjustment, or mode switching.
When the voice instruction acquired by the augmented reality device is a (sound adjustment) instruction for adjusting the sound of the multimedia data output by the augmented reality device, and the connected target device is the adapted self-developed terminal, the sound adjustment instruction is handed over to the self-developed terminal for processing, so that the terminal adjusts the sound of the multimedia data output by the augmented reality device.
When the voice instruction acquired by the augmented reality device is an instruction for adjusting the display brightness of the multimedia data output by the augmented reality device, and the connected target device is the adapted self-developed terminal, the display brightness adjustment instruction is handed over to the self-developed terminal for processing, so that the terminal adjusts the display brightness of the augmented reality device's screen.
When the voice instruction acquired by the augmented reality device is an instruction for adjusting the display mode of the multimedia data output by the augmented reality device, and the connected target device is the adapted self-developed terminal, the display mode adjustment instruction is handed over to the self-developed terminal for processing, so that the terminal adjusts the display mode of the multimedia data output by the augmented reality device.
When the target response mode is the second response mode, that is, when the second response mode is determined because the connected target device is of the second type (such as a third-party phone or computer) or no target device is detected, the voice keyword detection technology service of the augmented reality device keeps running.
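The run/sleep behavior of the on-device keyword detection service described above can be sketched as a small control function; the callable arguments and type names are assumptions for illustration.

```python
def configure_keyword_service(target_device_type, sleep, run):
    """Put the AR device's keyword-detection service to sleep when a
    first-type terminal takes over full voice control; keep it running
    for a second-type terminal or when no target device is detected."""
    if target_device_type == "first":
        sleep()                 # target OS performs full voice control
        return "sleeping"
    run()                       # local service keeps recognizing keywords
    return "running"
```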
In an optional solution, determining, based on the type of the voice instruction, which of the augmented reality device and the target device responds when the target response mode is the second response mode includes:
when the voice instruction is a first-type instruction, responding to the voice instruction by the augmented reality device; and
when the voice instruction is a second-type instruction, handing the voice instruction over to the target device for response.
In this application, when the target response mode is the second response mode, voice instructions also fall into two types. A first-type instruction is an instruction the augmented reality device can respond to, such as the aforementioned sound adjustment, display brightness adjustment, and display mode adjustment instructions. A second-type instruction is an instruction the augmented reality device cannot respond to but the target device can, such as the "return to the previous step", "confirm", and "main menu" instructions.
In this application, when the target response mode is the second response mode, if the voice instruction is one the augmented reality device can respond to, such as a sound adjustment, display brightness adjustment, or display mode adjustment instruction, the augmented reality device responds to it. If the voice instruction is, for example, a "return to the previous step", "confirm", or "main menu" instruction, it is handed over to the target device for response.
On this basis, in the second response mode, having different response entities respond to different types of voice instructions avoids the excessive burden that would result if a single response entity had to respond to all voice instructions. In this application, different types of voice instructions are responded to by different response entities, which is easy to implement in engineering and highly feasible.
In an optional solution, handing the voice instruction over to the target device for response when the voice instruction is a second-type instruction includes:
when the voice instruction is a second-type instruction, converting the voice instruction into a target type instruction and handing the target type instruction over to the target device for response.
In this application, if the voice instruction is one the augmented reality device cannot respond to but the target device can, the voice instruction must be converted into a type the target device can recognize or respond to, thereby achieving a response to the voice instruction.
For example, suppose the target device connected to the augmented reality device is a third-party phone or computer. After the voice keyword detection technology service in the augmented reality device recognizes a voice instruction, it converts the instruction into an Event Id and notifies the voice instruction control application in the augmented reality device. Upon receiving the instruction Event Id, the voice instruction control application matches it against the numbers assigned to the predefined instruction functions and responds according to the predefined function represented by the matched number. Here, the predefined instruction functions are the functions represented by different numbers in a predefined instruction set; for example, number 1 represents the volume increase function, number 2 the brightness increase function, and number 3 the mode switching function.
When the instruction corresponding to the instruction Event Id is a first-type instruction, that is, when the Event Id corresponds to a number assigned to an instruction function predefined for the augmented reality device, meaning the instruction is a self-control instruction the augmented reality device can respond to, such as sound adjustment, brightness adjustment, display switching, mode switching, and other instructions related to the device's hardware, the voice instruction control application directly completes the corresponding control operation through the system application programming interface (API). When the instruction corresponding to the instruction Event Id is a second-type instruction, that is, when the Event Id corresponds to a number assigned to an instruction function predefined for the target device, meaning the instruction requires a response from the target device, the Event Id is converted into a USB KeyBoard protocol Id and sent to the connected target device. The target device responds to the USB KeyBoard protocol Id and performs the corresponding operation on the multimedia data output by the augmented reality device. For example, the instruction "return to the previous step" may be defined as the "F1" key on the target device's keyboard: when the user presses "F1", the multimedia data output by the augmented reality device returns to the previous step. Or "main menu" may be defined as the "F2" key: when the user presses "F2", the augmented reality device pops up the main menu interface. Or "confirm" may be defined as the "Enter" key: pressing "Enter" performs a confirmation operation.
The foregoing execution process of the augmented reality device in this application can be realized by a high-performance dedicated chip provided in the augmented reality device; such a chip improves computing efficiency and enables fast response to voice instructions. In addition, the low-power voice keyword detection technology service adopted in this application has the notable advantages of low required computing power and low resource consumption. Without significantly increasing power consumption, it endows the augmented reality device with AR capabilities and greatly improves the device's ability to control itself. Furthermore, voice instructions are customizable and extensible, making up for the shortcomings of key-based control. With the help of the universal USB KeyBoard protocol, the augmented reality device converts a recognized voice instruction into a target type the target device can recognize or respond to, so that the target device treats its response to the voice instruction as a response to a user key operation (such as the aforementioned "F1", "F2", and "Enter" keys on the target device's keyboard). This scheme is equivalent to using the augmented reality device as a control peripheral of the target device and gives the target device voice interaction capability at low access cost, increasing the product competitiveness of the augmented reality device in practical applications.
在一个可选的方案中,所述音频采集单元用于对目标语音数据进行采集,所述目标语音数据包括语音指令;
所述方法还包括:
通过确定所述目标语音数据中是否存在语音指令而获取所述语音指令;
和/或,将所述目标语音数据交由所述目标设备。
本方案中,增强现实设备还包括关键词检测单元及控制应用单元。关键词检测单元包含有语音关键词检测技术服务,用于判断接入的目标设备类型,以及识别语音指令。控制应用单元包含有语音指令控制应用,用于判断指令类型,将不同类型的指令交由不同的响应主体进行响应。目标语音数据包括语音指令和/或在增强现实设备进行多媒体数据输出时产生的语音数据。
可以理解,本申请中的语音指令指的是通过语音形式输入至增强现实设备的指令,是一种指令数据。麦克风阵列传感器采集到的目标语音数据可能是指令数据。除此之外,麦克风阵列传感器采集到的语音数据可能是进行多媒体数据输出时产生的语音数据,如在通过增强现实设备接听来电时,麦克风阵列传感器采集到的语音数据可能是通话内容。或者在通过增强现实设备进行电影的投影时,麦克风阵列传感器采集到的语音数据可能是电影中的音频内容。
增强现实设备通过麦克风阵列传感器采集到目标语音数据,识别目标语音数据是否包含语音指令、和/或是否包含增强现实设备进行多媒体数据输出时产生的语音数据。如果经识别发现目标语音数据既包含语音指令,又包含增强现实设备进行多媒体数据输出时产生的语音数据,则可将这两种语音数据进行分离,以对两种语音数据执行不同的处理。
When the target voice data contains a voice instruction, the augmented reality device determines the type of the target device through the keyword detection unit and adopts the corresponding response mode. Specifically, when the target device is of the first type, the target device responds to the voice instruction. When the target device is of the second type, or no target device is detected, the keyword detection unit recognizes the voice instruction, and the control application unit determines the instruction type and selects the responding entity accordingly.
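The mode selection just described reduces to a small decision function. The sketch below is illustrative only; the string labels for device types, instruction types, and responders are placeholders, not terms defined by the patent.

```python
# Hypothetical sketch of response-mode selection for a recognized instruction.
def choose_responder(target_device_type, instruction_type):
    """Pick which side responds to a recognized voice instruction."""
    if target_device_type == "first_type":
        # First response mode: the target device handles the instruction itself.
        return "target_device"
    # Second response mode (second-type device, or no device detected):
    # the instruction type decides the responding entity.
    if instruction_type == "first_type_instruction":
        return "ar_device"      # device-control instruction (volume, brightness, ...)
    return "target_device"      # second-type instruction, forwarded for response

print(choose_responder("first_type", None))                        # target_device
print(choose_responder("second_type", "first_type_instruction"))   # ar_device
print(choose_responder(None, "second_type_instruction"))           # target_device
```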
When the target voice data contains voice data produced while the augmented reality device outputs multimedia data, that voice data can be handed over to the target device, which performs processing such as recording and semantic recognition on it.
In the foregoing solution, transmitting the collected target voice data to the augmented reality device side and/or the target device side is easy to implement in engineering, and ensures normal use of the target voice data by the augmented reality device side and/or the target device.
In an optional solution, the obtaining of the voice instruction by determining whether the voice instruction exists in the target voice data and/or the handing over of the target voice data to the target device includes:
performing noise reduction processing on the target voice data to obtain noise-reduced target voice data;
obtaining the voice instruction by determining whether the voice instruction exists in the noise-reduced target voice data;
and/or, handing over the noise-reduced target voice data to the target device.
In this solution, the augmented reality device further includes a noise reduction unit, configured to perform noise reduction processing on the target voice data to obtain noise-reduced target voice data. The augmented reality device collects the target voice data through the microphone array sensor, passes it through the noise reduction unit to obtain noise-reduced target voice data, and transmits the noise-reduced target voice data to the augmented reality device side and/or the target device side. When the target voice data contains a voice instruction, the augmented reality device side determines the type of the target device through the keyword detection unit and adopts the corresponding response mode. When the target device is of the first type, the target device responds to the voice instruction. When the target device is of the second type or no target device is detected, the keyword detection unit recognizes the voice instruction, and the control application unit determines the instruction type and selects the responding entity accordingly.
When the target voice data contains voice data produced while the augmented reality device outputs multimedia data, the noise-reduced voice data can be handed over to the target device, which performs processing such as recording and semantic recognition on it.
By performing noise reduction on the collected target voice data, the resulting noise-reduced target voice data contains no noise, or less noise, which improves the quality of the voice data transmitted to the augmented reality device side and/or the target device side and thus enables an accurate response to the target voice data.
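The patent leaves the directional noise reduction algorithm unspecified. As a loose, assumed stand-in for the noise reduction unit, the sketch below uses a crude frame-energy gate: frames whose RMS falls below a threshold are treated as background noise and zeroed, while louder, speech-level frames pass through unchanged. The frame size, threshold, and test signal are all invented for illustration.

```python
import numpy as np

def noise_gate(samples, frame=160, threshold=0.02):
    """Crude energy-gate 'noise reduction': zero out frames whose RMS
    falls below a threshold, keeping louder (speech-like) frames."""
    out = samples.copy().astype(float)
    for start in range(0, len(out), frame):
        chunk = out[start:start + frame]
        if np.sqrt(np.mean(chunk ** 2)) < threshold:
            out[start:start + frame] = 0.0
    return out

rng = np.random.default_rng(0)
noise = 0.005 * rng.standard_normal(320)                 # quiet background frames
speech = 0.5 * np.sin(2 * np.pi * np.arange(320) / 16)   # loud tone frames
signal = np.concatenate([noise, speech])
clean = noise_gate(signal)
print(np.allclose(clean[:320], 0.0), np.any(clean[320:] != 0))  # True True
```

A production system would use a proper microphone-array beamformer or spectral method instead; the gate only illustrates the input/output contract of the noise reduction unit (raw Audio In to Clean Audio).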
In an optional solution, the obtaining of the voice instruction by determining whether the voice instruction exists in the noise-reduced target voice data and/or the handing over of the noise-reduced target voice data to the target device includes:
obtaining first target voice data and second target voice data based on the noise-reduced target voice data;
obtaining the voice instruction by determining whether the voice instruction exists in the first target voice data;
and/or, handing over the second target voice data to the target device.
In this solution, the augmented reality device further includes a duplication and splitting unit, configured to duplicate the noise-reduced target voice data into two copies and split them into the first target voice data and the second target voice data. As shown in Fig. 4, the augmented reality device sends the microphone audio data it reads (Audio In) to a dedicated noise reduction unit for directional noise reduction, obtaining enhanced audio data (Clean Audio) that contains no noise or less noise; the duplication and splitting unit then produces the first target voice data and the second target voice data from it. It can be understood that the first target voice data and the second target voice data are both identical to the noise-reduced Clean Audio. The second target voice data is transmitted directly to the target device (Audio Out) over a wired or wireless hardware link, while the first target voice data is processed on the augmented reality device side. Further, when a voice instruction exists in the target voice data, the augmented reality device side obtains the voice instruction, determines the type of the target device through the keyword detection unit, and adopts the corresponding response mode. Specifically, when the target device is of the first type, the target device responds to the voice instruction. When the target device is of the second type or no target device is detected, the keyword detection unit recognizes the voice instruction, and the control application unit determines the instruction type and selects the responding entity accordingly.
By duplicating and splitting the noise-reduced target voice data into the first target voice data and the second target voice data, the first target voice data is processed on the augmented reality device side for voice instruction recognition and response by the appropriate responding entity, while the second target voice data is transmitted to the target device side so that the collected target voice data can be used by other applications on the target device. As an example, when the user is on a call, the collected target voice data serves as call content for the target device's call application. Duplicating and splitting the noise-reduced target voice data ensures that applications on the augmented reality device side and the target device side each obtain the audio data they need without interfering with one another.
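The denoise, duplicate, and split flow of Fig. 4 can be sketched as a short pipeline. The callables below (`denoise`, `detect_keyword`, `audio_out`) are illustrative stand-ins for the patent's noise reduction unit, keyword detection unit, and hardware audio link; none of these names come from the patent.

```python
# Hypothetical sketch: Audio In -> Clean Audio -> two identical copies.
def split_pipeline(audio_in, denoise, detect_keyword, audio_out):
    clean_audio = denoise(audio_in)
    first_copy = list(clean_audio)    # kept on the AR device side
    second_copy = list(clean_audio)   # forwarded to the target device
    instruction = detect_keyword(first_copy)  # on-device keyword detection
    audio_out(second_copy)                    # target device gets Clean Audio
    return instruction, second_copy

sent = []
instruction, forwarded = split_pipeline(
    [0.1, 0.2, 0.3],
    denoise=lambda frames: frames,              # identity stand-in for denoising
    detect_keyword=lambda frames: "volume_up",  # pretend a keyword was found
    audio_out=sent.append,
)
print(instruction, forwarded == sent[0])  # volume_up True
```

Because both branches receive the same Clean Audio, keyword detection on the device and, for example, a call application on the target device can consume the stream independently, which is the non-interference property the paragraph above claims.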
An embodiment of the present application provides a voice interaction apparatus. As shown in Fig. 5, the apparatus includes:
an obtaining unit 501, configured to obtain a user's voice instruction based on an audio collection unit of an augmented reality device;
a first determining unit 502, configured to determine the type of a target device connected to the augmented reality device;
a second determining unit 503, configured to determine, when the target device is of a first type, that a target response mode for the voice instruction is a first response mode;
a third determining unit 504, configured to determine, when the target device is of a second type or no target device is detected, that the target response mode for the voice instruction is a second response mode;
a response unit 505, configured to respond to the voice instruction using the target response mode.
In an optional solution, the response unit 505 is configured to hand over the voice instruction to the target device for response when the target response mode is the first response mode; and, when the target response mode is the second response mode, to determine, based on the type of the voice instruction, that one of the augmented reality device and the target device responds.
In an optional solution, the response unit 505 is configured to have the augmented reality device respond to the voice instruction when the voice instruction is a first-type instruction, and to hand over the voice instruction to the target device for response when the voice instruction is a second-type instruction.
In an optional solution, the response unit 505 is configured to convert the voice instruction into a target-type instruction when the voice instruction is a second-type instruction, and to hand over the target-type instruction to the target device for response.
In an optional solution, the audio collection unit is configured to collect target voice data, the target voice data including a voice instruction;
the apparatus further includes:
a voice data unit, configured to obtain the voice instruction by determining whether the voice instruction exists in the target voice data; and/or, to hand over the target voice data to the target device.
In an optional solution, the apparatus further includes a noise reduction unit, configured to perform noise reduction processing on the target voice data to obtain noise-reduced target voice data; correspondingly, the voice data unit is further configured to obtain the voice instruction by determining whether the voice instruction exists in the noise-reduced target voice data; and/or, to hand over the noise-reduced target voice data to the target device.
In an optional solution, the apparatus further includes a duplication and splitting unit, configured to obtain first target voice data and second target voice data based on the noise-reduced target voice data; correspondingly, the voice data unit is further configured to obtain the voice instruction by determining whether the voice instruction exists in the first target voice data; and/or, to hand over the second target voice data to the target device.
It should be noted that, since the voice interaction apparatus of the embodiments of the present application solves problems on a principle similar to that of the foregoing voice interaction method, the implementation process, implementation principle, and beneficial effects of the apparatus can all be found in the descriptions of the foregoing method, and repeated details are not described again.
An embodiment of the present application provides an augmented reality device, which at least includes the voice interaction apparatus described in the present application.
According to embodiments of the present application, the present application further provides an electronic device.
Fig. 6 shows a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementations of the present application described and/or claimed herein.
As shown in Fig. 6, the electronic device 600 includes a computing unit 601, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the electronic device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A plurality of components in the electronic device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard or a mouse; an output unit 607 such as various types of displays and speakers; a storage unit 608 such as a magnetic disk or an optical disc; and a communication unit 609 such as a network card, a modem, or a wireless communication transceiver. The communication unit 609 allows the electronic device 600 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
The computing unit 601 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, or the like. The computing unit 601 performs the methods and processing described above, such as the voice interaction method. For example, in some embodiments, the voice interaction method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the voice interaction method described above can be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the voice interaction method in any other appropriate manner (for example, by means of firmware).
The various implementations of the systems and techniques described herein above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor capable of receiving data and instructions from a storage system, at least one input apparatus, and at least one output apparatus, and of transmitting data and instructions to the storage system, the at least one input apparatus, and the at least one output apparatus.
Program code for implementing the methods of the present application may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly specifying the number of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means two or more, unless otherwise expressly and specifically defined.
The foregoing are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
- A voice interaction method, wherein the method comprises: obtaining a user's voice instruction based on an audio collection unit of an augmented reality device; determining the type of a target device connected to the augmented reality device; when the target device is of a first type, determining that a target response mode for the voice instruction is a first response mode; when the target device is of a second type or no target device is detected, determining that the target response mode for the voice instruction is a second response mode; and responding to the voice instruction using the target response mode.
- The method according to claim 1, wherein the responding to the voice instruction using the target response mode comprises: when the target response mode is the first response mode, handing over the voice instruction to the target device for response; when the target response mode is the second response mode, determining, based on the type of the voice instruction, that one of the augmented reality device and the target device responds.
- The method according to claim 2, wherein the determining, based on the type of the voice instruction, that one of the augmented reality device and the target device responds comprises: when the voice instruction is a first-type instruction, responding to the voice instruction by the augmented reality device; when the voice instruction is a second-type instruction, handing over the voice instruction to the target device for response.
- The method according to claim 3, wherein the handing over of the voice instruction to the target device for response when the voice instruction is a second-type instruction comprises: when the voice instruction is a second-type instruction, converting the voice instruction into a target-type instruction, and handing over the target-type instruction to the target device for response.
- The method according to claim 1, wherein the audio collection unit is configured to collect target voice data, the target voice data including a voice instruction; the method further comprises: obtaining the voice instruction by determining whether the voice instruction exists in the target voice data; and/or, handing over the target voice data to the target device.
- The method according to claim 5, wherein the obtaining of the voice instruction by determining whether the voice instruction exists in the target voice data and/or the handing over of the target voice data to the target device comprises: performing noise reduction processing on the target voice data to obtain noise-reduced target voice data; obtaining the voice instruction by determining whether the voice instruction exists in the noise-reduced target voice data; and/or, handing over the noise-reduced target voice data to the target device.
- The method according to claim 6, wherein the obtaining of the voice instruction by determining whether the voice instruction exists in the noise-reduced target voice data and/or the handing over of the noise-reduced target voice data to the target device comprises: obtaining first target voice data and second target voice data based on the noise-reduced target voice data; obtaining the voice instruction by determining whether the voice instruction exists in the first target voice data; and/or, handing over the second target voice data to the target device.
- A voice interaction apparatus, wherein the apparatus comprises: an obtaining unit, configured to obtain a user's voice instruction based on an audio collection unit of an augmented reality device; a first determining unit, configured to determine the type of a target device connected to the augmented reality device; a second determining unit, configured to determine, when the target device is of a first type, that a target response mode for the voice instruction is a first response mode; a third determining unit, configured to determine, when the target device is of a second type or no target device is detected, that the target response mode for the voice instruction is a second response mode; and a response unit, configured to respond to the voice instruction using the target response mode.
- An augmented reality device, wherein it at least includes the voice interaction apparatus according to claim 8.
- An electronic device, wherein it comprises: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-7.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310096785.6 | 2023-01-12 | ||
CN202310096785.6A CN116030810A (zh) | 2023-01-12 | 2023-01-12 | 语音交互方法、装置及相关设备 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024149352A1 true WO2024149352A1 (zh) | 2024-07-18 |
Family
ID=86075860
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2024/071940 WO2024149352A1 (zh) | 2023-01-12 | 2024-01-12 | 语音交互方法、装置及相关设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116030810A (zh) |
WO (1) | WO2024149352A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116030810A (zh) * | 2023-01-12 | 2023-04-28 | 杭州灵伴科技有限公司 | 语音交互方法、装置及相关设备 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150293738A1 (en) * | 2014-04-15 | 2015-10-15 | Samsung Display Co., Ltd. | Wearable device |
US20180261224A1 (en) * | 2017-03-08 | 2018-09-13 | Jetvox Acoustic Corp. | Wireless voice-controlled system and wearable voice transmitting-receiving device thereof |
CN109658932A (zh) * | 2018-12-24 | 2019-04-19 | 深圳创维-Rgb电子有限公司 | 一种设备控制方法、装置、设备及介质 |
CN111966321A (zh) * | 2020-08-24 | 2020-11-20 | Oppo广东移动通信有限公司 | 音量调节方法、ar设备及存储介质 |
US20210366472A1 (en) * | 2019-04-17 | 2021-11-25 | Lg Electronics Inc. | Artificial intelligence apparatus for speech interaction and method for the same |
CN116030810A (zh) * | 2023-01-12 | 2023-04-28 | 杭州灵伴科技有限公司 | 语音交互方法、装置及相关设备 |
- 2023-01-12: CN application CN202310096785.6A filed (publication CN116030810A, status: active, pending)
- 2024-01-12: WO application PCT/CN2024/071940 filed (publication WO2024149352A1, status: unknown)
Also Published As
Publication number | Publication date |
---|---|
CN116030810A (zh) | 2023-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11985464B2 (en) | Wireless audio output devices | |
US20230251822A1 (en) | Changing companion communication device behavior based on status of wearable device | |
US10621980B2 (en) | Execution of voice commands in a multi-device system | |
WO2024149352A1 (zh) | 语音交互方法、装置及相关设备 | |
JP6742465B2 (ja) | ブルートゥーススピーカーにおける連続ウェイクアップ遅延低減の方法、装置及びブルートゥーススピーカー | |
CN112394895B (zh) | 画面跨设备显示方法与装置、电子设备 | |
CN114443256A (zh) | 资源调度方法及电子设备 | |
CN109274405B (zh) | 数据传输方法、装置、电子设备及计算机可读介质 | |
CN109101517B (zh) | 信息处理方法、信息处理设备以及介质 | |
KR102669342B1 (ko) | 디바이스 점유 방법 및 전자 디바이스 | |
WO2022100309A1 (zh) | 桌面元数据的显示方法、访问方法及相关装置 | |
WO2024113716A1 (zh) | 一种支持iops突发的方法、装置、电子设备及介质 | |
CN116028205B (zh) | 资源调度方法和电子设备 | |
WO2018157499A1 (zh) | 一种语音输入的方法和相关设备 | |
WO2024103926A1 (zh) | 语音控制方法、装置、存储介质以及电子设备 | |
WO2020135131A1 (zh) | 网络热点的切换方法、智能终端及计算机可读存储介质 | |
CN112995402A (zh) | 控制方法及装置、计算机可读介质和电子设备 | |
CN114745451A (zh) | 数据传输方法及装置、电子设备和计算机可读介质 | |
WO2020019844A1 (zh) | 语音数据处理方法及相关产品 | |
CN113778255A (zh) | 触摸识别方法和装置 | |
CN115113751A (zh) | 调整触摸手势的识别参数的数值范围的方法和装置 | |
CN109918015B (zh) | 终端响应方法、终端及计算机可读存储介质 | |
WO2018201993A1 (zh) | 图像绘制方法、终端及存储介质 | |
WO2019061287A1 (zh) | 一种电子设备和降低功耗的方法及装置 | |
CN109658930B (zh) | 语音信号处理方法、电子装置及计算机可读存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24741348; Country of ref document: EP; Kind code of ref document: A1 |