WO2018121747A1 - Voice control method and device - Google Patents

Voice control method and device

Info

Publication number
WO2018121747A1
WO2018121747A1 PCT/CN2017/119923 CN2017119923W
Authority
WO
WIPO (PCT)
Prior art keywords
voice
audio unit
unit
audio
information input
Prior art date
Application number
PCT/CN2017/119923
Other languages
English (en)
French (fr)
Inventor
王嘉晋
熊友军
Original Assignee
深圳市优必选科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市优必选科技有限公司
Publication of WO2018121747A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/28Constructional details of speech recognition systems
    • G10L15/30Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/28Constructional details of speech recognition systems
    • G10L15/34Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/06Decision making techniques; Pattern matching strategies
    • G10L17/10Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72433User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones

Definitions

  • the present invention relates to the field of speech recognition, and in particular to a speech control method and apparatus.
  • at present, electronic devices with voice control functions generally have only one microphone or pickup channel in hardware serving as the audio input unit.
  • during a voice call, or while audio is being recorded, that microphone is occupied, so the speech recognition engine cannot use it to recognize voice commands.
  • the prior art usually writes the speech engine and the video call or voice recording into one application, so that speech is first recognized by the speech engine and, if it is not a command, passed through to the video call or voice recording logic. This approach has two disadvantages:
  • the present invention aims to provide a voice control method and device that solve the problems of the prior art: because the speech engine and the video call or voice recording are written into one application, all ordinary speech must pass through speech recognition before being recorded, which introduces a large delay that easily puts audio and video out of sync, and a customized video call or voice recording program is required.
  • a voice control method is applied to a system provided with a first audio unit and a second audio unit, the voice control method comprising the following steps:
  • before the acquiring of the first voice information input by the first audio unit, the method further includes the following steps:
  • the first audio unit is woken up if the first audio unit is allowed to wake up.
  • stopping acquiring the second voice information input by the second audio unit, if needed, specifically means: if acquisition of the second voice information input by the second audio unit needs to be stopped, hanging up the audio call or video call.
  • before the acquiring of the first voice information input by the first audio unit, the method further includes the following steps:
  • the first audio unit is assigned as an input source of a speech recognition engine.
  • the present invention also discloses a voice control apparatus, including:
  • a first acquiring unit configured to acquire first voice information input by the first audio unit
  • a second acquiring unit configured to acquire second voice information input by the second audio unit
  • An identification unit configured to identify a voice instruction in the first voice information
  • a first determining unit configured to determine, according to the voice instruction, whether to stop acquiring the second voice information input by the second audio unit
  • the voice control device further includes:
  • a receiving unit configured to receive a wake-up instruction for waking up the first audio unit
  • the second determining unit is configured to determine whether to allow the first audio unit to wake up, and if the first audio unit is allowed to wake up, wake up the first audio unit.
  • the stopping unit comprises:
  • the hangup unit is configured to hang up the audio call or the video call if it is required to stop acquiring the second voice information input by the second audio unit.
  • the voice control device further includes:
  • an allocating unit configured to allocate the first audio unit as an input source of a voice recognition engine.
  • the first audio unit and the second audio unit each comprise a microphone, a microphone matrix, a microphone interface, a microphone matrix interface or a wireless audio input device.
  • A voice control device, including:
  • a processor and a memory for storing instructions executable by the processor;
  • the processor is configured to:
  • the present invention has an advantageous effect: in a system provided with a first audio unit and a second audio unit, the first audio unit serves as the audio input source of the speech recognition engine and the second audio unit serves as the input source for other applications such as calls and recording, so that voice commands can be recognized in parallel during a call or recording. This solves the industry-wide problem that voice commands (including hanging up an audio call) cannot be processed in parallel during audio and video calls. The method requires no customized audio/video call or recording program, and avoids recording delays that put audio and video out of sync.
  • FIG. 1 is a schematic flowchart of a voice control method according to Embodiment 1 of the present invention.
  • FIG. 2 is a schematic flowchart of a voice control method according to Embodiment 2 of the present invention.
  • FIG. 3 is a schematic structural diagram of a voice control apparatus according to Embodiment 3 of the present invention.
  • FIG. 4 is a schematic structural diagram of a voice control apparatus according to Embodiment 4 of the present invention.
  • Embodiment 1:
  • the voice control method as shown in FIG. 1 is applied to a system in which a first audio unit and a second audio unit are provided.
  • because the voice call and recording functions occupy the audio input unit, the speech recognition engine cannot use an audio input unit such as a microphone to perform voice command recognition.
  • to address this, an additional audio input unit is introduced in hardware, and the sound source of the speech recognition engine is designated as this additional audio input unit, so that voice commands can be recognized in parallel during a call or recording.
  • specifically, an additional microphone source is introduced in hardware, which can be connected via an I2S (Inter-IC Sound) bus. This bus is dedicated to data transmission between audio devices and is widely used in multimedia systems. It transmits the clock and data signals on separate wires; by separating the data from the clock signal, it avoids distortion induced by timing skew, saving users the cost of professional equipment that resists audio jitter.
  • the audio input unit may include a microphone, a microphone matrix, a microphone interface, a microphone matrix interface, or a wireless audio input device.
  • the voice control method includes the following steps:
  • the first audio unit has been set in advance as the audio input source of the speech recognition engine, and the first voice information is the object on which the speech recognition engine performs speech recognition.
  • the speech recognition engine stores voice commands in advance, together with reaction mechanisms corresponding to each command, such as launching an application, processing data, or performing an action.
  • the speech recognition engine in a processor or controller, or an independent speech recognition chip, processes the first voice information and identifies whether it contains information corresponding to a pre-stored voice command; if so, step S130 is performed; if not, acquisition of the first voice information input by the first audio unit continues.
  • some voice commands pre-stored in the speech recognition engine have a higher priority, or the second voice information input by the second audio unit would interfere with the reaction mechanism of the voice command; in these cases, acquisition of the second voice information input by the second audio unit needs to be stopped.
  • this also covers the case where the reaction mechanism of the voice command in the first voice information is itself to stop acquiring the second voice information input by the second audio unit.
  • specifically, the input of the second audio unit is stopped by sending a close or suspend command to the application that is using the second audio unit, such as an audio/video call or a recording.
  • the audio call can also be a recording process
  • the video call can also be a video recording process
  • in the voice control method provided in this embodiment, in a system provided with a first audio unit and a second audio unit, the first audio unit serves as the audio input source of the speech recognition engine and the second audio unit serves as the input source for other applications such as calls and recording, so that voice commands can be recognized in parallel during a call or recording. This solves the industry-wide problem that voice commands (including hanging up an audio call) cannot be processed in parallel during audio and video calls. The method requires no customized audio/video call or recording program, and avoids recording delays that put audio and video out of sync.
  • Embodiment 2:
  • the voice control method shown in FIG. 2 is applied to a system provided with a first audio unit and a second audio unit, the voice control method comprising the following steps:
  • the first audio unit is allocated as an input source of a voice recognition engine.
  • the terms "first" and "second" in the present invention are used only to distinguish different components and do not imply an order.
  • the first audio unit may be assigned as the input source of the speech recognition engine, but another audio unit may of course be assigned instead, for example the second audio unit.
  • the allocation may be implemented by means of an application programming interface (API) or the like.
  • because the input source of the speech recognition engine can be assigned, the positions of the first audio unit and the second audio unit can be conveniently arranged or adjusted.
  • the voice control method further includes the following steps:
  • a dedicated instruction can be set for starting the speech recognition engine. Before the speech recognition engine is started, even if a voice command pre-stored by the speech recognition engine is recognized, the event corresponding to that voice command is not executed.
  • S203: determine whether the first audio unit is allowed to be woken up. If the device is in an emergency-call state or an instruction with higher priority than the wake-up action exists, the first audio unit is not allowed to output audio information to the speech recognition engine even if a wake-up instruction for waking it up is received.
  • S204: if the first audio unit is allowed to be woken up, it is woken up. Activation of the first audio unit takes effect, and acquisition of the first voice information input by the first audio unit is allowed, i.e., step S210 is performed.
  • by allowing the speech recognition engine to be turned off and on, device computing resources are used efficiently while the effect of the present invention is preserved: in a system provided with a first audio unit and a second audio unit, the first audio unit serves as the audio input source of the speech recognition engine and the second audio unit serves as the input source for other applications such as calls and recording, so that voice commands can be recognized in parallel during a call or recording.
  • Steps S210, S220, S230, and S240 respectively correspond to S110, S120, S130, and S140 in the first embodiment, and are not described again.
  • Embodiment 3:
  • the voice control device shown in FIG. 3 includes:
  • the first acquiring unit is configured to acquire first voice information input by the first audio unit.
  • the second acquiring unit is configured to acquire second voice information input by the second audio unit.
  • the first audio unit and the second audio unit each comprise a microphone, a microphone matrix, a microphone interface, a microphone matrix interface, or a wireless audio input device.
  • An allocating unit configured to allocate the first audio unit as an input source of a voice recognition engine.
  • the receiving unit is configured to receive a wake-up instruction for waking up the first audio unit.
  • the second determining unit is configured to determine whether to allow the first audio unit to wake up, and if the first audio unit is allowed to wake up, wake up the first audio unit.
  • An identifying unit configured to identify a voice instruction in the first voice information
  • the first determining unit is configured to determine, according to the voice instruction, whether to stop acquiring the second voice information input by the second audio unit;
  • specifically, the stopping unit includes a hang-up unit (not shown), configured to hang up the audio call or video call if acquisition of the second voice information input by the second audio unit needs to be stopped.
  • Audio calls also include recording and other processes.
  • the present invention can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the part of the technical solution of the present invention that is essential, or that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform the methods described in the embodiments of the present invention, or in parts of the embodiments.
  • modules or units described as separate components may or may not be physically separate, and components shown as modules or units may or may not be physical modules; they may be located in one place or distributed across multiple network modules. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
  • the invention is applicable to a wide variety of general purpose or special purpose computing system environments or configurations.
  • Embodiment 4:
  • the apparatus shown in FIG. 4 includes: a processor 200 and a memory 300 for storing instructions executable by the processor 200;
  • the processor 200 is configured to:
  • in the device provided by the embodiment of the present invention, in a system provided with a first audio unit and a second audio unit, the first audio unit serves as the audio input source of the speech recognition engine and the second audio unit serves as the input source for other applications such as calls and recording, so that voice commands can be recognized in parallel during a call or recording. This solves the industry-wide problem that voice commands (including hanging up an audio call) cannot be processed in parallel during audio and video calls. The method requires no customized audio/video call or recording program, and avoids recording delays that put audio and video out of sync.

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Business, Economics & Management (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Disclosed is a voice control method applied to a system provided with a first audio unit and a second audio unit. The voice control method comprises the following steps: acquiring first voice information input by the first audio unit; recognizing a voice command in the first voice information; determining, according to the voice command, whether acquisition of second voice information input by the second audio unit needs to be stopped; and if so, stopping acquiring the second voice information input by the second audio unit. In a system provided with a first audio unit and a second audio unit, the first audio unit serves as the audio input source of a speech recognition engine and the second audio unit serves as the input source for other applications such as calls and recording, so that voice commands can be recognized in parallel during a call or recording. This solves the common industry problem that voice commands cannot be processed in parallel by voice during an audio or video call.

Description

Voice control method and device — Technical Field
The present invention relates to the field of speech recognition, and in particular to a voice control method and device.
Background Art
At present, electronic devices with voice control functions generally have only one microphone or pickup channel in hardware serving as the audio input unit. During a voice call or while sound is being recorded, this microphone is occupied, so the speech recognition engine cannot use it to recognize voice commands. The prior art usually writes the speech engine and the video call or voice recording into one application, so that speech is first recognized by the speech engine; if it is not a command, the speech is passed through to the video call or voice recording logic. However, this approach has two drawbacks:
1. All ordinary speech must pass through speech recognition before being recorded, which introduces a large delay and easily puts audio and video out of sync.
2. A customized video call or voice recording program is required, because sound must be imported via the API provided by the speech engine; ordinary third-party video call or voice recording programs that call the standard Android AudioRecord cannot be used on the robot.
Summary of the Invention
To overcome the deficiencies of the prior art, the object of the present invention is to provide a voice control method and device that can solve the problems of the prior art, in which the speech engine and the video call or voice recording are written into one application, all ordinary speech passes through speech recognition before being recorded, producing a large delay that easily puts audio and video out of sync, and a customized video call or voice recording program is required.
The object of the present invention is achieved by the following technical solutions:
A voice control method, applied to a system provided with a first audio unit and a second audio unit, the voice control method comprising the following steps:
acquiring first voice information input by the first audio unit;
recognizing a voice command in the first voice information;
determining, according to the voice command, whether acquisition of second voice information input by the second audio unit needs to be stopped;
if so, stopping acquiring the second voice information input by the second audio unit.
Preferably, before the acquiring of the first voice information input by the first audio unit, the method further comprises the following steps:
receiving a wake-up instruction for waking up the first audio unit;
determining whether the first audio unit is allowed to be woken up;
if the first audio unit is allowed to be woken up, waking up the first audio unit.
Preferably, the stopping of acquiring the second voice information input by the second audio unit, if needed, is specifically: if acquisition of the second voice information input by the second audio unit needs to be stopped, hanging up the audio call or video call.
Preferably, before the acquiring of the first voice information input by the first audio unit, the method further comprises the following step:
assigning the first audio unit as an input source of a speech recognition engine.
In another aspect, the present invention also discloses a voice control device, comprising:
a first acquiring unit, configured to acquire first voice information input by the first audio unit;
a second acquiring unit, configured to acquire second voice information input by the second audio unit;
a recognition unit, configured to recognize a voice command in the first voice information;
a first determining unit, configured to determine, according to the voice command, whether acquisition of the second voice information input by the second audio unit needs to be stopped;
a stopping unit, configured to stop acquiring the second voice information input by the second audio unit if needed.
Preferably, the voice control device further comprises:
a receiving unit, configured to receive a wake-up instruction for waking up the first audio unit;
a second determining unit, configured to determine whether the first audio unit is allowed to be woken up, and to wake up the first audio unit if so.
Preferably, the stopping unit comprises:
a hang-up unit, configured to hang up the audio call or video call if acquisition of the second voice information input by the second audio unit needs to be stopped.
Preferably, the voice control device further comprises:
an assigning unit, configured to assign the first audio unit as an input source of a speech recognition engine.
Preferably, the first audio unit and the second audio unit each comprise a microphone, a microphone matrix, a microphone interface, a microphone matrix interface, or a wireless audio input device.
A voice control device, comprising:
a processor and a memory for storing instructions executable by the processor;
the processor being configured to:
acquire first voice information input by the first audio unit;
recognize a voice command in the first voice information;
determine, according to the voice command, whether acquisition of second voice information input by the second audio unit needs to be stopped;
if so, stop acquiring the second voice information input by the second audio unit.
Compared with the prior art, the present invention has the advantageous effect that, in a system provided with a first audio unit and a second audio unit, the first audio unit serves as the audio input source of the speech recognition engine and the second audio unit serves as the input source for other applications such as calls and recording, so that voice commands can be recognized in parallel during a call or recording. This solves the industry-wide problem that voice commands (including hanging up an audio call) cannot be processed in parallel during audio and video calls. The method requires no customized audio/video call or recording program, and avoids recording delays that put audio and video out of sync.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a voice control method according to Embodiment 1 of the present invention.
FIG. 2 is a schematic flowchart of a voice control method according to Embodiment 2 of the present invention.
FIG. 3 is a schematic structural diagram of a voice control device according to Embodiment 3 of the present invention.
FIG. 4 is a schematic structural diagram of a voice control device according to Embodiment 4 of the present invention.
Detailed Description of the Embodiments
The above description is only an overview of the technical solution of the present invention. To make the technical means of the present invention clearer and implementable according to the contents of the specification, and to make the above and other objects, features, and advantages of the present invention more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Embodiment 1:
The voice control method shown in FIG. 1 is applied to a system provided with a first audio unit and a second audio unit. To address the drawback that the voice call and recording functions occupy the audio input unit, preventing the speech recognition engine from using an audio input unit such as a microphone for voice command recognition, an additional audio input unit is introduced in hardware, and the sound source of the speech recognition engine is designated as this additional audio input unit, so that voice commands can be recognized in parallel during a call or recording.
Specifically, an additional microphone source is introduced in hardware, which can be connected via an I2S (Inter-IC Sound) bus. This bus is dedicated to data transmission between audio devices and is widely used in multimedia systems. It transmits the clock and data signals on separate wires; by separating the data from the clock signal, it avoids distortion induced by timing skew, saving users the cost of professional equipment that resists audio jitter.
The audio input unit may comprise a microphone, a microphone matrix, a microphone interface, a microphone matrix interface, or a wireless audio input device.
The voice control method comprises the following steps:
S110: acquiring first voice information input by the first audio unit.
The first audio unit has been set in advance as the audio input source of the speech recognition engine, and the first voice information is the object on which the speech recognition engine performs speech recognition.
S120: recognizing a voice command in the first voice information.
The speech recognition engine stores voice commands in advance, together with reaction mechanisms corresponding to each command, such as launching an application, processing data, or performing an action. The speech recognition engine in a processor or controller, or an independent speech recognition chip, processes the first voice information and identifies whether it contains information corresponding to a pre-stored voice command; if so, step S130 is performed; if not, acquisition of the first voice information input by the first audio unit continues.
S130: determining, according to the voice command, whether acquisition of the second voice information input by the second audio unit needs to be stopped.
Some voice commands pre-stored in the speech recognition engine have a higher priority, or the second voice information input by the second audio unit would interfere with the reaction mechanism of the voice command; in these cases, acquisition of the second voice information input by the second audio unit needs to be stopped. This also covers the case where the reaction mechanism of the voice command in the first voice information is itself to stop acquiring the second voice information input by the second audio unit.
S140: if so, stopping acquiring the second voice information input by the second audio unit.
Specifically, the input of the second audio unit is stopped by sending a close or suspend command to the application that is using the second audio unit, such as an audio/video call or a recording.
Stopping acquiring the second voice information input by the second audio unit, if needed, is specifically: if acquisition of the second voice information input by the second audio unit needs to be stopped, hanging up the audio call or video call; the audio call may also be a recording process, and the video call may also be a video recording process.
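Steps S110 to S140 can be pictured as a small control loop over frames from the first audio unit. The sketch below is illustrative only: `recognize`, `control_loop`, `CallApp`, and the command strings are hypothetical names invented for this example, not APIs from the patent.

```python
# Illustrative sketch of steps S110-S140 (all names are hypothetical).
# The first audio unit feeds the speech recognition engine; the second
# audio unit feeds an application such as a call or a recording.

# Commands whose reaction mechanism requires stopping the second unit,
# e.g. hanging up the ongoing audio/video call.
STOP_COMMANDS = {"hang up", "stop recording"}

def recognize(voice_info):
    """Stand-in for the speech recognition engine: return the matched
    pre-stored command, or None if the audio is not a command."""
    text = voice_info.strip().lower()
    return text if text in STOP_COMMANDS or text == "volume up" else None

def control_loop(first_unit_frames, second_unit_app):
    """Acquire first voice information (S110), recognize commands (S120),
    and decide whether to stop the second unit (S130/S140)."""
    for frame in first_unit_frames:          # S110: acquire first voice info
        command = recognize(frame)           # S120: recognize a command
        if command is None:
            continue                         # not a command: keep acquiring
        if command in STOP_COMMANDS:         # S130: does it require stopping?
            second_unit_app.close()          # S140: send close/suspend command
            return command
    return None

class CallApp:
    """Stand-in for the application using the second audio unit."""
    def __init__(self):
        self.active = True
    def close(self):
        self.active = False

call = CallApp()
result = control_loop(["hello", "volume up", "hang up"], call)
```

Ordinary speech ("hello") never reaches the call path here, which is the point of the two-unit design: the engine listens on its own unit while the call keeps its own unobstructed input.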
In the voice control method provided in this embodiment, in a system provided with a first audio unit and a second audio unit, the first audio unit serves as the audio input source of the speech recognition engine and the second audio unit serves as the input source for other applications such as calls and recording, so that voice commands can be recognized in parallel during a call or recording. This solves the industry-wide problem that voice commands (including hanging up an audio call) cannot be processed in parallel during audio and video calls. The method requires no customized audio/video call or recording program, and avoids recording delays that put audio and video out of sync.
Embodiment 2:
The voice control method shown in FIG. 2 is applied to a system provided with a first audio unit and a second audio unit, and comprises the following steps:
S201: assigning the first audio unit as an input source of the speech recognition engine. The terms "first" and "second" in the present invention are used only to distinguish different components and do not imply an order. The first audio unit may be assigned as the input source of the speech recognition engine, but another audio unit may of course be assigned instead, for example the second audio unit.
Specifically, the assignment may be implemented by means such as an application programming interface (API).
Because the input source of the speech recognition engine can be assigned, the positions of the first audio unit and the second audio unit can be conveniently arranged or adjusted.
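The assignment in step S201 can be pictured as a routing table updated by an API-style call. This is a hedged sketch: `assign_input_source`, `routes`, and the unit names are invented for illustration; the patent only states that the assignment may be done through an API or similar means.

```python
# Hypothetical sketch of S201 (names are illustrative, not from the patent):
# route the first audio unit to the speech recognition engine and the
# second audio unit to a call/recording application.

routes = {}

def assign_input_source(consumer, audio_unit):
    """API-style assignment: point a consumer at an audio unit.
    Either unit could be assigned to either consumer; "first" and
    "second" only distinguish the units."""
    routes[consumer] = audio_unit

assign_input_source("speech_engine", "first_audio_unit")   # S201
assign_input_source("call_app", "second_audio_unit")

# Both consumers now read from independent units, so command
# recognition and the call can run in parallel.
```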
As a further improvement of the present invention, the voice control method further comprises the following steps:
S202: receiving a wake-up instruction for waking up the first audio unit.
Specifically, a dedicated instruction can be set for starting the speech recognition engine. Before the speech recognition engine is started, even if a voice command pre-stored by the speech recognition engine is recognized, the event corresponding to that voice command is not executed.
S203: determining whether the first audio unit is allowed to be woken up. If the device is in an emergency-call state or an instruction with higher priority than the wake-up action exists, the first audio unit is not allowed to output audio information to the speech recognition engine even if a wake-up instruction for waking it up is received.
S204: if the first audio unit is allowed to be woken up, waking it up. Activation of the first audio unit takes effect, and acquisition of the first voice information input by the first audio unit is allowed, i.e., step S210 is performed.
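The wake-up gating of steps S202 to S204 amounts to a guard condition checked before the first audio unit may feed the engine. The following sketch assumes hypothetical state flags (`emergency_call`, `higher_priority_pending`, `first_unit_awake`); none of these names come from the patent.

```python
# Illustrative sketch of steps S202-S204 (all names hypothetical):
# a wake-up instruction only activates the first audio unit when no
# higher-priority state blocks it.

def allow_wake_up(emergency_call, higher_priority_pending):
    """S203: wake-up is refused during an emergency call or when an
    instruction with higher priority than the wake-up action exists."""
    return not (emergency_call or higher_priority_pending)

def handle_wake_instruction(state):
    """S202-S204: receive the wake-up instruction and, if allowed,
    activate the first audio unit (enabling step S210)."""
    if allow_wake_up(state.get("emergency_call", False),
                     state.get("higher_priority_pending", False)):
        state["first_unit_awake"] = True     # S204: wake the unit
    return state

# During an emergency call the unit must not feed the engine:
blocked = handle_wake_instruction({"emergency_call": True})
# In the normal case the unit is woken and S210 may proceed:
normal = handle_wake_instruction({})
```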
By allowing the speech recognition engine to be turned off and on, device computing resources are used efficiently while the effect of the present invention is preserved: in a system provided with a first audio unit and a second audio unit, the first audio unit serves as the audio input source of the speech recognition engine and the second audio unit serves as the input source for other applications such as calls and recording, so that voice commands can be recognized in parallel during a call or recording.
S210: acquiring the first voice information input by the first audio unit.
S220: recognizing a voice command in the first voice information.
S230: determining, according to the voice command, whether acquisition of the second voice information input by the second audio unit needs to be stopped.
S240: if so, stopping acquiring the second voice information input by the second audio unit.
Steps S210, S220, S230, and S240 correspond to S110, S120, S130, and S140 in Embodiment 1, respectively, and are not described again.
Embodiment 3:
The voice control device shown in FIG. 3 comprises:
111: a first acquiring unit, configured to acquire the first voice information input by the first audio unit.
112: a second acquiring unit, configured to acquire the second voice information input by the second audio unit.
Typically, the first audio unit and the second audio unit each comprise a microphone, a microphone matrix, a microphone interface, a microphone matrix interface, or a wireless audio input device.
101: an assigning unit, configured to assign the first audio unit as an input source of the speech recognition engine.
102: a receiving unit, configured to receive a wake-up instruction for waking up the first audio unit.
103: a second determining unit, configured to determine whether the first audio unit is allowed to be woken up, and to wake it up if so.
120: a recognition unit, configured to recognize a voice command in the first voice information.
130: a first determining unit, configured to determine, according to the voice command, whether acquisition of the second voice information input by the second audio unit needs to be stopped.
140: a stopping unit, configured to stop acquiring the second voice information input by the second audio unit if needed.
Specifically, the stopping unit comprises a hang-up unit (not shown), configured to hang up the audio call or video call if acquisition of the second voice information input by the second audio unit needs to be stopped. Audio calls also include recording and similar processes.
The device in this embodiment and the method in the foregoing embodiments are two aspects of the same inventive concept. Since the method implementation has been described in detail above, a person skilled in the art can clearly understand the structure and implementation of the system in this embodiment from that description, and it is not repeated here for brevity.
For convenience of description, the above device is described in terms of functional modules. Of course, when implementing the present invention, the functions of the modules may be implemented in one or more pieces of software and/or hardware.
From the description of the above embodiments, a person skilled in the art can clearly understand that the present invention can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the part of the technical solution of the present invention that is essential, or that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform the methods described in the embodiments of the present invention, or in parts of the embodiments.
The described device embodiment is merely illustrative. Modules or units described as separate components may or may not be physically separate, and components shown as modules or units may or may not be physical modules; they may be located in one place or distributed across multiple network modules. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
The present invention can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices, as in Embodiment 4.
Embodiment 4:
The device shown in FIG. 4 comprises: a processor 200 and a memory 300 for storing instructions executable by the processor 200;
the processor 200 being configured to:
acquire the first voice information input by the first audio unit;
recognize a voice command in the first voice information;
determine, according to the voice command, whether acquisition of the second voice information input by the second audio unit needs to be stopped;
if so, stop acquiring the second voice information input by the second audio unit.
The device in this embodiment and the method in the foregoing embodiments are two aspects of the same inventive concept. Since the method implementation has been described in detail above, a person skilled in the art can clearly understand the structure and implementation of the system in this embodiment from that description, and it is not repeated here for brevity.
In the device provided by the embodiment of the present invention, in a system provided with a first audio unit and a second audio unit, the first audio unit serves as the audio input source of the speech recognition engine and the second audio unit serves as the input source for other applications such as calls and recording, so that voice commands can be recognized in parallel during a call or recording. This solves the industry-wide problem that voice commands (including hanging up an audio call) cannot be processed in parallel during audio and video calls. The method requires no customized audio/video call or recording program, and avoids recording delays that put audio and video out of sync.
For a person skilled in the art, various other corresponding changes and modifications can be made according to the technical solutions and concepts described above, and all such changes and modifications shall fall within the protection scope of the claims of the present invention.

Claims (10)

  1. A voice control method, applied to a system provided with a first audio unit and a second audio unit, the voice control method comprising the following steps:
    acquiring first voice information input by the first audio unit;
    recognizing a voice command in the first voice information;
    determining, according to the voice command, whether acquisition of second voice information input by the second audio unit needs to be stopped;
    if so, stopping acquiring the second voice information input by the second audio unit.
  2. The voice control method according to claim 1, wherein before the acquiring of the first voice information input by the first audio unit, the method further comprises the following steps:
    receiving a wake-up instruction for waking up the first audio unit;
    determining whether the first audio unit is allowed to be woken up;
    if the first audio unit is allowed to be woken up, waking up the first audio unit.
  3. The voice control method according to claim 1, wherein the stopping of acquiring the second voice information input by the second audio unit, if needed, is specifically: if acquisition of the second voice information input by the second audio unit needs to be stopped, hanging up an audio call or a video call.
  4. The voice control method according to any one of claims 1 to 3, wherein before the acquiring of the first voice information input by the first audio unit, the method further comprises the following step:
    assigning the first audio unit as an input source of a speech recognition engine.
  5. A voice control device, comprising:
    a first acquiring unit, configured to acquire first voice information input by the first audio unit;
    a second acquiring unit, configured to acquire second voice information input by the second audio unit;
    a recognition unit, configured to recognize a voice command in the first voice information;
    a first determining unit, configured to determine, according to the voice command, whether acquisition of the second voice information input by the second audio unit needs to be stopped;
    a stopping unit, configured to stop acquiring the second voice information input by the second audio unit if needed.
  6. The voice control device according to claim 5, further comprising:
    a receiving unit, configured to receive a wake-up instruction for waking up the first audio unit;
    a second determining unit, configured to determine whether the first audio unit is allowed to be woken up, and to wake up the first audio unit if so.
  7. The voice control device according to claim 5, wherein the stopping unit comprises:
    a hang-up unit, configured to hang up an audio call or a video call if acquisition of the second voice information input by the second audio unit needs to be stopped.
  8. The voice control device according to any one of claims 5 to 7, further comprising:
    an assigning unit, configured to assign the first audio unit as an input source of a speech recognition engine.
  9. The voice control device according to any one of claims 5 to 7, wherein the first audio unit and the second audio unit each comprise a microphone, a microphone matrix, a microphone interface, a microphone matrix interface, or a wireless audio input device.
  10. A voice control device, comprising:
    a processor and a memory for storing instructions executable by the processor;
    the processor being configured to:
    acquire first voice information input by the first audio unit;
    recognize a voice command in the first voice information;
    determine, according to the voice command, whether acquisition of second voice information input by the second audio unit needs to be stopped;
    if so, stop acquiring the second voice information input by the second audio unit.
PCT/CN2017/119923 2016-12-31 2017-12-29 Voice control method and device WO2018121747A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611264344.9 2016-12-31
CN201611264344.9A CN106686243A (zh) 2016-12-31 2016-12-31 Voice control method and device

Publications (1)

Publication Number Publication Date
WO2018121747A1 true WO2018121747A1 (zh) 2018-07-05

Family

ID=58850476

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/119923 WO2018121747A1 (zh) 2016-12-31 2017-12-29 语音控制方法和装置

Country Status (2)

Country Link
CN (1) CN106686243A (zh)
WO (1) WO2018121747A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111263100A (zh) * 2020-01-19 2020-06-09 中移(杭州)信息技术有限公司 Video call method, apparatus, device, and storage medium
CN112837689A (zh) * 2019-11-25 2021-05-25 阿里巴巴集团控股有限公司 Conference system, data communication system, and voice information processing method
CN114071318A (zh) * 2021-11-12 2022-02-18 阿波罗智联(北京)科技有限公司 Speech processing method, terminal device, and vehicle

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106686243A (zh) 2016-12-31 2017-05-17 深圳市优必选科技有限公司 Voice control method and device
CN108520744B (zh) * 2018-03-15 2020-11-10 斑马网络技术有限公司 Voice control method and apparatus, electronic device, and storage medium
CN113473199B (zh) * 2018-09-03 2023-06-09 海信视像科技股份有限公司 Microphone-based device control method and apparatus
CN109243452A (zh) * 2018-10-26 2019-01-18 北京雷石天地电子技术有限公司 Method and system for sound control
CN111385911B (zh) * 2018-12-27 2022-06-28 深圳市优必选科技有限公司 Inspection robot and voice call method thereof
CN109995945A (zh) * 2019-03-29 2019-07-09 联想(北京)有限公司 Processing method and electronic device
CN112533081B (zh) * 2019-09-19 2023-04-18 成都鼎桥通信技术有限公司 Recording processing method, device, and storage medium
CN113053411B (zh) * 2020-03-30 2024-01-16 深圳市优克联新技术有限公司 Voice data processing device, method, system, and storage medium
CN112565659B (zh) * 2020-12-07 2023-08-18 康佳集团股份有限公司 Method for executing voice commands while an audio/video application is running

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007104343A (ja) * 2005-10-04 2007-04-19 Kenwood Corp Hands-free device, control method, and program
CN202121655U (zh) * 2011-04-29 2012-01-18 武汉光动能科技有限公司 Voice-controlled in-vehicle multimedia audio/video device
CN104572009A (zh) * 2015-01-28 2015-04-29 合肥联宝信息技术有限公司 Audio control method and device adaptive to the external environment
CN105976815A (zh) * 2016-04-22 2016-09-28 乐视控股(北京)有限公司 In-vehicle speech recognition method and device
CN106686243A (zh) * 2016-12-31 2017-05-17 深圳市优必选科技有限公司 Voice control method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105913844A (zh) * 2016-04-22 2016-08-31 乐视控股(北京)有限公司 In-vehicle voice acquisition method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007104343A (ja) * 2005-10-04 2007-04-19 Kenwood Corp Hands-free device, control method, and program
CN202121655U (zh) * 2011-04-29 2012-01-18 武汉光动能科技有限公司 Voice-controlled in-vehicle multimedia audio/video device
CN104572009A (zh) * 2015-01-28 2015-04-29 合肥联宝信息技术有限公司 Audio control method and device adaptive to the external environment
CN105976815A (zh) * 2016-04-22 2016-09-28 乐视控股(北京)有限公司 In-vehicle speech recognition method and device
CN106686243A (zh) * 2016-12-31 2017-05-17 深圳市优必选科技有限公司 Voice control method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112837689A (zh) * 2019-11-25 2021-05-25 阿里巴巴集团控股有限公司 Conference system, data communication system, and voice information processing method
CN111263100A (zh) * 2020-01-19 2020-06-09 中移(杭州)信息技术有限公司 Video call method, apparatus, device, and storage medium
CN114071318A (zh) * 2021-11-12 2022-02-18 阿波罗智联(北京)科技有限公司 Speech processing method, terminal device, and vehicle
CN114071318B (zh) * 2021-11-12 2023-11-14 阿波罗智联(北京)科技有限公司 Speech processing method, terminal device, and vehicle

Also Published As

Publication number Publication date
CN106686243A (zh) 2017-05-17

Similar Documents

Publication Publication Date Title
WO2018121747A1 (zh) Voice control method and device
US10204624B1 (en) False positive wake word
JP2019015952A (ja) Wake-up method, device and system, cloud server, and readable medium
JP2019128939A (ja) Gesture-based voice wake-up method, apparatus, device, and computer-readable medium
JP2019128938A (ja) Lip-reading-based voice wake-up method, apparatus, device, and computer-readable medium
CN106250093A (zh) Retrieval mechanism for previously captured audio
WO2020244257A1 (zh) Voice wake-up method and system, electronic device, and computer-readable storage medium
WO2018049933A1 (zh) Data migration method and related products
CN110362288B (zh) Screen-sharing control method, apparatus, device, and storage medium
GB2565420A (en) Interactive sessions
US11074912B2 (en) Identifying a valid wake input
US20190050195A1 (en) Output provision based on gaze detection
US11948565B2 (en) Combining device or assistant-specific hotwords in a single utterance
CN109697987A (zh) External far-field voice interaction device and implementation method
WO2024103926A1 (zh) Voice control method and apparatus, storage medium, and electronic device
CN111063356A (zh) Electronic device response method and system, speaker, and computer-readable storage medium
US5483618A (en) Method and system for distinguishing between plural audio responses in a multimedia multitasking environment
US11423893B2 (en) Response to secondary inputs at a digital personal assistant
US20180350360A1 (en) Provide non-obtrusive output
US20180061361A1 (en) Managing display setting based on motion sensor activity for universal platform applications
CN111263100A (zh) Video call method, apparatus, device, and storage medium
CN110120963B (zh) Data processing method, apparatus, device, and machine-readable medium
CN115269048A (zh) Concurrency control method for applications, electronic device, and readable storage medium
US20190019505A1 (en) Sustaining conversational session
US20190065608A1 (en) Query input received at more than one device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17886711

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17886711

Country of ref document: EP

Kind code of ref document: A1