WO2020228332A1 - Control method and control device for voice assistant system, and Bluetooth headset - Google Patents

Control method and control device for voice assistant system, and Bluetooth headset

Info

Publication number
WO2020228332A1
Authority
WO
WIPO (PCT)
Prior art keywords
hot word
signal
voice
assistant system
voice assistant
Application number
PCT/CN2019/127460
Other languages
English (en)
French (fr)
Inventor
牛裔
曾森
宋婷
吴玉锦
Original Assignee
出门问问信息科技有限公司
Application filed by 出门问问信息科技有限公司
Publication of WO2020228332A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R1/1041 Mechanical or electronic switches, or control elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Definitions

  • the present disclosure relates to intelligent voice technology, in particular to a control method and a control device for a voice assistant system, which use hot words to control the startup of the voice assistant system based on a hot word library in the earphone, and to a Bluetooth headset including the control device of the voice assistant system.
  • voice recognition technology has also entered a stage of rapid development, and its applications have become more mature and extensive.
  • voice assistant products based on intelligent voice recognition technology have been widely used on smart terminals, including smart phones, tablet computers and other types of products, and provide a very good user experience.
  • the user wears a Bluetooth headset device that includes a microphone capable of receiving audio.
  • when a voice command containing a hot word is issued, the hot word is picked up by the microphone.
  • the hot word is then transmitted to the smart phone through the Bluetooth system of the Bluetooth headset device, and the corresponding function of the smart terminal is awakened through the voice assistant on the smart phone.
  • the voice containing hot words appearing near the user will be picked up by the Bluetooth headset device worn by the user and transmitted to the smart phone, and a certain function will be awakened by the voice assistant accordingly.
  • voice containing hot words appearing near the user may be: 1. voice uttered not by the user but by others (which can be called "malicious voice input"); or 2. voice that was uttered by the user but is, for example, played back through a playback device, in other words, not uttered by the user in real time (which can be called "voice cracking").
  • neither other people's voice nor the user's non-real-time voice is driven by the user's real intentions, and both cause false wakeups, which greatly compromises the otherwise excellent user experience of the voice assistant.
  • a control method of a voice assistant system which is used to control the startup of the voice assistant system based on voice instructions and a hot word library in a headset.
  • the control method may include: picking up a voice instruction through air vibration; picking up the voice instruction through the human body vibration of the issuer of the voice instruction; converting the voice instruction picked up through air vibration into a first signal, and converting the voice instruction picked up through the human body vibration of the issuer into a second signal; comparing the first signal with the hot word database to obtain a first comparison result, and comparing the second signal with the hot word database to obtain a second comparison result; and determining, based on the first comparison result and/or the second comparison result, whether to start the voice assistant system.
  • a control method of a voice assistant system wherein the first signal and the second signal are both digital signals obtained through analog-to-digital conversion, and the hot word database contains at least one hot word code.
  • a method for controlling a voice assistant system wherein: if the first signal is inconsistent with all the hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the first signal is consistent with a hot word code of the hot word database and the second signal is inconsistent with that hot word code, it is determined not to start the voice assistant system; or if the first signal is consistent with a hot word code of the hot word database and the second signal is inconsistent with all the hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the first signal is consistent with a hot word code of the hot word database and the second signal is consistent with that same hot word code, it is determined to start the voice assistant system.
  • a method for controlling a voice assistant system wherein: if the second signal is inconsistent with all the hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the second signal is consistent with a hot word code in the hot word database, it is determined to start the voice assistant system.
  • a control method of a voice assistant system which further includes: picking up environmental noise; performing noise reduction, based on the environmental noise, on the voice instruction picked up by air vibration; and converting the noise-reduced voice instruction into the first signal.
  • a control method of a voice assistant system which further includes: detecting whether the earphone is worn by the sender of the voice command; and if the earphone is not worn by the sender of the voice command, determining not to start the voice assistant system.
  • a control device for a voice assistant system which is used to control the startup of the voice assistant system based on voice instructions and a hot word library in a headset, and which includes: a first pickup unit that picks up the voice instruction through air vibration; a second pickup unit that picks up the voice instruction through the human body vibration of the issuer of the voice instruction; a conversion unit that converts the voice instruction picked up through air vibration into a first signal, and converts the voice instruction picked up through the human body vibration of the issuer into a second signal; a comparison unit that compares the first signal with the hot word database to obtain a first comparison result, and compares the second signal with the hot word database to obtain a second comparison result; and a calculation unit that determines, based on the first comparison result and/or the second comparison result, whether to start the voice assistant system.
  • a control device for a voice assistant system wherein the first signal and the second signal are both digital signals obtained through analog-to-digital conversion, and the hot word database contains at least one hot word code; if the first signal in the comparison unit is inconsistent with all the hot word codes of the hot word database, the calculation unit determines not to start the voice assistant system; or if the first signal in the comparison unit is consistent with a hot word code in the hot word database and the second signal is inconsistent with that hot word code, the calculation unit determines not to start the voice assistant system; or if the first signal in the comparison unit is consistent with a hot word code in the hot word database and the second signal is inconsistent with all the hot word codes of the hot word database, the calculation unit determines not to start the voice assistant system; or if the first signal in the comparison unit is consistent with a hot word code of the hot word database and the second signal is consistent with that same hot word code, the calculation unit determines to start the voice assistant system.
  • a control device for a voice assistant system which further includes: a third pickup unit that picks up environmental noise, and a processing unit that performs noise reduction, based on the environmental noise, on the voice command picked up by air vibration, wherein the conversion unit converts the noise-reduced voice command into the first signal; and/or a wearing detection unit that detects whether the earphone is worn by the sender of the voice command, wherein, if the earphone is not worn by the sender of the voice command, the calculation unit determines not to start the voice assistant system.
  • a Bluetooth headset which includes the control device of the above-mentioned voice assistant system.
  • the voice assistant system control method, control device and Bluetooth headset including the control device provided by the present disclosure can effectively prevent the voice assistant system from being triggered by mistake.
  • the control method, control device, and Bluetooth headset of the voice assistant system provided by the present disclosure can not only prevent the voice assistant from being triggered by mistake, but also reduce the environmental noise in the voice command.
  • Fig. 1 is a flowchart of a control method of a voice assistant system according to an embodiment of the present disclosure
  • Fig. 2 is a flowchart of a control method of a voice assistant system according to an embodiment of the present disclosure
  • Fig. 3 is a block diagram of a control device of a voice assistant system according to an embodiment of the present disclosure
  • Fig. 4 is a block diagram of a control device of a voice assistant system according to an embodiment of the present disclosure
  • Fig. 5 is a schematic diagram of a Bluetooth headset including a control device of a voice assistant system according to an embodiment of the present disclosure.
  • a control method of a voice assistant system is provided, and the control method can be used to control the start of the voice assistant system based on voice instructions and a hot word database in a headset.
  • a voice assistant system based on artificial intelligence technology is installed in a smart phone, and a hot word library containing at least one specific hot word is installed in a headset (including a Bluetooth headset) worn by the user.
  • the voice command is picked up by the microphone of the earphone worn by the user, which serves as the pickup device. Then, through the control method and based on the picked-up voice command and the hot word database, the voice assistant system is activated or awakened, and interacts with the user to meet the user's needs, only when the voice command is issued by the user in real time; otherwise, the voice assistant system is not activated, which effectively prevents false wakeups.
  • the method will be described in conjunction with
  • the control method of the voice assistant system may include: picking up the voice instruction through air vibration; picking up the voice instruction through the human body vibration of the sender of the voice instruction; converting the voice instruction picked up through air vibration into a first signal, and converting the voice instruction picked up through the human body vibration of the sender into a second signal; comparing the first signal with the hot word database to obtain a first comparison result, and comparing the second signal with the hot word database to obtain a second comparison result; and determining, based on the first comparison result and/or the second comparison result, whether to start the voice assistant system.
  • Fig. 1 shows a flowchart of a control method 100 of a voice assistant system according to this embodiment. As shown in Fig. 1, in step S110, the voice command is picked up by air vibration; in S120, the voice command is picked up by the human body vibration of the sender of the voice command; in S130, the voice command picked up by air vibration is converted into a first signal, and the voice command picked up by the human body vibration of the sender is converted into a second signal; in S140, the first signal is compared with the hot word database to obtain a first comparison result, and the second signal is compared with the hot word database to obtain a second comparison result; in S150, whether to start the voice assistant system is determined based on the first comparison result and/or the second comparison result. As shown in Fig. 1, in the control method 100, step S110 and step S120 are performed simultaneously, so that the same voice command is picked up through different propagation paths.
  • the voice commands issued by the user can be propagated in the air as a medium, specifically in the form of waves through the vibration of the air.
  • a microphone can be used to pick up voice commands through air vibration and convert the sound signals into analog electrical signals.
  • the microphone for picking up voice commands transmitted through air vibration can be placed at a position on the headset closer to the user's mouth and directed toward the user's mouth, to increase the strength of the picked-up voice and reduce possible interference along the transmission path.
  • the microphones used here can include dynamic (moving-coil), aluminum ribbon, capacitive, piezoelectric, electromagnetic, carbon particle, and semiconductor microphones, and even MEMS microphone arrays.
  • because the transmission medium used to pick up voice commands through air vibration is air, the voice commands picked up in this way may have been issued by other people or by media other than the user, transmitted through the air, and picked up by mistake; this is the cause of false wakeups.
  • the same voice command is also picked up by the body vibration of the speaker.
  • the voice commands issued by the user are also transmitted through a solid medium such as the bones of the issuer. Therefore, the same voice command can also be picked up by a sound pickup device such as a bone conduction microphone, and the pickup device converts the picked-up voice command into a corresponding analog electrical signal.
  • the voice instruction is picked up by the human body vibration of the voice instruction issuer.
  • the voice transmission medium used in this way is the human body vibration of the issuer. It can be determined that the voice instruction picked up in this manner is issued by the voice issuer in real time.
  • the two electrical signals are converted separately to obtain the signals required for subsequent operations: the voice instruction picked up by air vibration is converted into a first signal, and the voice instruction picked up by the human body vibration of the issuer of the voice instruction is converted into a second signal.
  • an analog-to-digital converter (ADC) can be used for sampling at a sampling frequency of 22.05 kHz and for quantization with a 16-bit word length, thereby converting the two analog electrical signals into encoded digital signals respectively.
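The sampling and quantization step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the names `quantize`, `sample_and_quantize` and `tone`, the normalized input range of [-1.0, 1.0], and the signed integer code format are all hypothetical. Only the 22.05 kHz sampling frequency and the 16-bit word length come from the text.

```python
import math

SAMPLE_RATE_HZ = 22050   # 22.05 kHz, from the description
BIT_DEPTH = 16           # 16-bit word length, from the description


def quantize(samples, bit_depth=BIT_DEPTH):
    """Map normalized samples in [-1.0, 1.0] to signed integer codes (assumed format)."""
    max_code = 2 ** (bit_depth - 1) - 1
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))          # clip out-of-range input
        codes.append(int(round(x * max_code)))
    return codes


def sample_and_quantize(analog, duration_s, rate_hz=SAMPLE_RATE_HZ):
    """Sample a continuous-time function analog(t) at rate_hz and quantize the samples."""
    n = int(duration_s * rate_hz)
    return quantize([analog(i / rate_hz) for i in range(n)])


def tone(t):
    # Hypothetical stand-in for the analog microphone signal: a 440 Hz tone.
    return 0.5 * math.sin(2 * math.pi * 440 * t)


codes = sample_and_quantize(tone, 1.0)
```

In this sketch, one second of input yields 22050 integer codes, each of which fits in a 16-bit word. The same routine would be applied separately to the air-conducted and the body-conducted signal.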
  • a hot word database containing at least one designated hot word is constructed in advance, and the picked-up voice instructions can be compared with the hot words in the hot word database one by one to obtain the corresponding comparison result.
  • the comparison results include the following two types: (1) the voice command is consistent with a hot word in the hot word database; (2) the voice command is inconsistent with all the hot words in the hot word database.
  • the first signal, corresponding to the voice instruction propagated through air vibration, is compared with the hot word database to obtain the first comparison result, and the second signal, corresponding to the same voice instruction picked up by the human body vibration of the issuer, is compared with the hot word database to obtain the second comparison result.
  • the process of comparing the first signal with the hot word database and comparing the second signal with the hot word database is essentially a process of comparing the digital codes serving as the first signal and the second signal with the hot word digital codes in the hot word database. For example, for the first signal, when 16-bit quantization is used, suppose the digital code serving as the first signal is 1001110010111011. If the hot word code 1001110010111011 exists in the hot word database, the comparison result is that the first signal is consistent with a hot word in the hot word database; if the hot word code 1001110010111011 does not exist in the hot word database, the comparison result is that the first signal is inconsistent with all the hot words in the hot word database.
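The comparison against the hot word database can be illustrated with a small lookup, sketched here in Python. The code `1001110010111011` is the example from the description; the second database entry, the name `compare_with_database`, and the choice of representing codes as bit strings are assumptions made for illustration. Exact equality mirrors the example in the text; the description does not specify any tolerant matching step.

```python
# Hypothetical hot word database: a set of hot word codes stored as bit strings.
HOT_WORD_CODES = {
    "1001110010111011",  # example code from the description
    "0110001101000100",  # made-up second entry, for illustration only
}


def compare_with_database(signal_code, database=HOT_WORD_CODES):
    """Return the matching hot word code, or None if the signal is
    inconsistent with all hot word codes in the database."""
    return signal_code if signal_code in database else None


first_result = compare_with_database("1001110010111011")
miss_result = compare_with_database("1111000011110000")
```

Returning the matched code rather than a plain boolean keeps enough information to check later whether both signals matched the same hot word.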
  • in the control method of the voice assistant system, after the first signal and the second signal are compared with the hot word database to obtain the first comparison result and the second comparison result, whether to start the voice assistant system can be determined based on the combination of the first comparison result and the second comparison result, based only on the second comparison result, or based only on the first comparison result; that is, it is determined either to start the voice assistant system or not to start it.
  • whether to start the voice assistant system can be determined based on the combination of the first comparison result and the second comparison result. Specifically: if the first signal is inconsistent with all the hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the first signal is consistent with a hot word code of the hot word database but the second signal is inconsistent with that hot word code, it is determined not to start the voice assistant system; or if the first signal is consistent with a hot word code of the hot word database but the second signal is inconsistent with all the hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the first signal is consistent with a hot word code of the hot word database and the second signal is consistent with that same hot word code, it is determined to start the voice assistant system. This effectively avoids false wakeups; in terms of avoiding false wakeups, it is the optimal implementation.
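The four cases of the combined decision reduce to a single rule: start only when both signals match the same hot word code. A minimal Python sketch follows; the function name `decide_combined` and the convention of passing the matched code (or `None` for "inconsistent with all hot word codes") are assumptions, not part of the disclosure.

```python
def decide_combined(first_match, second_match):
    """Decide whether to start the voice assistant system.

    first_match / second_match: the hot word code matched by the first
    (air-conducted) and second (body-conducted) signal, or None when the
    signal is inconsistent with all hot word codes.
    """
    if first_match is None:
        return False          # case 1: first signal matches nothing
    if second_match is None:
        return False          # case 3: second signal matches nothing
    # case 2 (different codes) -> False; case 4 (same code) -> True
    return first_match == second_match


HOT = "1001110010111011"      # example code from the description
OTHER = "0110001101000100"    # hypothetical second hot word code
```

With these definitions, `decide_combined(HOT, HOT)` is the only combination that starts the system; a hot word heard through the air without a matching body-conducted signal is rejected.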
  • in the control method of the voice assistant system, it is also possible to determine whether to start the voice assistant system based only on the second comparison result. Specifically, if the second signal is inconsistent with all the hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the second signal is consistent with a hot word code in the hot word database, it is determined to start the voice assistant system. In this way, false wakeups can be avoided to a certain extent.
  • in this case, a mode selection can be used to stop picking up voice instructions through air vibration, together with the subsequent conversion and comparison operations, so that voice instructions are picked up only through human body vibration and only the corresponding conversion and comparison operations are performed. This effectively reduces energy consumption and can serve as a sub-optimal implementation that balances preventing false wakeups against reducing energy consumption.
  • in the control method of the voice assistant system, it is also possible to determine whether to start the voice assistant system based only on the first comparison result. Specifically, if the first signal is inconsistent with all the hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the first signal is consistent with a hot word code in the hot word database, it is determined to start the voice assistant system.
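The three decision modes and the pickups each mode needs can be sketched as a small dispatcher. This is only an illustration: the `Mode` enum, the labels `"air"` and `"bone"`, and the function names are hypothetical; the description states only that a mode selection can stop the air-vibration path to save energy.

```python
from enum import Enum


class Mode(Enum):
    COMBINED = "combined"        # use both comparison results
    SECOND_ONLY = "second_only"  # body-conducted signal only (lower energy use)
    FIRST_ONLY = "first_only"    # air-conducted signal only


def active_pickups(mode):
    """Which pickup paths must stay powered in a given mode (assumed labels)."""
    if mode is Mode.COMBINED:
        return {"air", "bone"}
    if mode is Mode.SECOND_ONLY:
        return {"bone"}
    return {"air"}


def decide(mode, first_match=None, second_match=None):
    """Return True to start the voice assistant system.

    first_match / second_match: matched hot word code, or None.
    """
    if mode is Mode.COMBINED:
        return first_match is not None and first_match == second_match
    if mode is Mode.SECOND_ONLY:
        return second_match is not None
    return first_match is not None
```

In `SECOND_ONLY` mode the air-conducted path is absent from `active_pickups`, which corresponds to skipping its pickup, conversion and comparison operations.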
  • control method of the voice assistant system may further include: picking up environmental noise; based on the environmental noise, denoising the voice command picked up by air vibration; and converting the noise-reduced voice command into a first signal.
  • Fig. 2 shows a flowchart of a control method 200 of a voice assistant system according to this embodiment. As shown in Fig. 2, in S210, the voice command is picked up by air vibration; in S220, the voice command is picked up by the human body vibration of the sender of the voice command; in S230, the environmental noise is picked up; in S240, based on the environmental noise, noise reduction is performed on the voice command picked up by air vibration; in S250, the noise-reduced voice command is converted into the first signal, and the voice command picked up by the human body vibration of the sender is converted into a second signal; in S260, the first signal is compared with the hot word database to obtain the first comparison result, and the second signal is compared with the hot word database to obtain the second comparison result; in S270, whether to start the voice assistant system is determined based on the first comparison result and/or the second comparison result.
  • the pickup of environmental noise in step S230 is performed simultaneously with the pickup of voice instructions by air vibration in step S210 and the pickup of voice instructions by human body vibration of the sender of the voice instruction in step S220.
  • in step S230, the environmental noise in the external environment is picked up, and then in step S240, noise reduction is performed, based on the environmental noise, on the voice command picked up by air vibration. Specifically, after the environmental noise is analyzed and its features are extracted, a noise signal with a phase difference of 180 degrees from the environmental noise is generated on that basis and superimposed on the voice signal picked up by air vibration, so that the environmental noise in the voice signal is cancelled by destructive interference.
  • any existing active noise reduction technology can be used to eliminate the environmental noise in the voice picked up by air vibration.
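The 180-degree superposition described for step S240 can be shown with an idealized Python sketch. It assumes that the noise reference picked up by the environmental microphone is identical, sample for sample, to the noise contained in the voice microphone signal; practical active noise reduction has to estimate that relationship, and the description does not specify how. The function names and the synthetic test signals are hypothetical.

```python
import math


def anti_phase(noise):
    """Generate a signal 180 degrees out of phase with the noise (sign inversion)."""
    return [-n for n in noise]


def cancel(mic_signal, noise_reference):
    """Superimpose the anti-phase noise on the signal picked up by air vibration."""
    inverted = anti_phase(noise_reference)
    return [m + a for m, a in zip(mic_signal, inverted)]


# Synthetic example: a "voice" component plus environmental noise.
n = 64
voice = [0.3 * math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
noise = [0.2 * math.sin(2 * math.pi * 11 * i / n + 0.7) for i in range(n)]
mic = [v + x for v, x in zip(voice, noise)]

cleaned = cancel(mic, noise)
residual = max(abs(c - v) for c, v in zip(cleaned, voice))
```

Under this idealized assumption the residual difference between `cleaned` and `voice` is at the level of floating-point rounding error.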
  • the control method of the voice assistant system may further include: detecting whether the earphone is worn by the sender of the voice command; and if the earphone is not worn by the sender of the voice command, determining not to start the voice assistant system.
  • the above-mentioned operations of the control method can be performed before all other operations, which can prevent the occurrence of false wakeup when the user is not wearing a headset.
  • any method in the prior art can be used to detect whether the earphone is worn by the person issuing the voice command.
  • for example, a photoelectric sensor can be installed on the head of the earphone to detect whether the earphone is inserted into the ear canal of the user, and thereby determine whether the earphone is worn.
  • alternatively, a motion sensor provided in the earphone can be used to detect whether the action of putting on the earphone occurs, and thereby determine whether the earphone is worn.
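Because the wearing check can run before all other operations, it can be sketched as a gate in front of the rest of the pipeline. In the Python sketch below, `is_worn` stands in for whichever sensor is used and `pick_up_and_decide` stands in for the pickup, conversion, comparison and decision steps; both names, and the callable-based structure, are assumptions for illustration.

```python
def control(is_worn, pick_up_and_decide):
    """Return True to start the voice assistant system.

    is_worn: callable reporting the wearing state (e.g. backed by a
             photoelectric or motion sensor).
    pick_up_and_decide: callable running the remaining steps of the method.
    """
    if not is_worn():
        return False              # not worn: never start, skip all other steps
    return pick_up_and_decide()


# Demonstration with stand-in callables.
calls = []


def fake_pipeline():
    calls.append("pipeline")      # record that the later steps actually ran
    return True


not_worn_result = control(lambda: False, fake_pipeline)
worn_result = control(lambda: True, fake_pipeline)
```

In the demonstration, the stand-in pipeline runs only once, for the call in which the earphone is reported as worn.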
  • a control device for a voice assistant system is also provided, and the control device is used for controlling the start of the voice assistant system in the earphone based on voice instructions and a hot word database.
  • the control device of this voice assistant system can be used in hardware products including Bluetooth headsets to implement the control method of the voice assistant system in the previous content of the present disclosure.
  • the control device of the voice assistant system may include: a first pickup unit that picks up a voice instruction through air vibration; a second pickup unit that picks up the voice instruction through the human body vibration of the issuer of the voice instruction; a conversion unit that converts the voice command picked up by air vibration into a first signal, and converts the voice command picked up by the human body vibration of the sender into a second signal; a comparison unit that compares the first signal with the hot word database to obtain a first comparison result, and compares the second signal with the hot word database to obtain a second comparison result; and a calculation unit that determines, based on the first comparison result and/or the second comparison result, whether to start the voice assistant system.
  • the control device 500 includes: a first pickup unit 510 that picks up voice instructions through air vibration; a second pickup unit 520 that picks up the voice instruction through the human body vibration of the issuer of the voice instruction; a conversion unit 530 that converts the voice instruction picked up by air vibration into a first signal, and converts the voice instruction picked up by the human body vibration of the issuer into a second signal; a comparison unit 540 that compares the first signal with the hot word database to obtain a first comparison result, and compares the second signal with the hot word database to obtain a second comparison result; and a calculation unit 550 that determines whether to start the voice assistant system based on the first comparison result and/or the second comparison result.
  • the first pickup unit may be any suitable microphone, including a dynamic (moving-coil), aluminum ribbon, capacitive, piezoelectric, electromagnetic, carbon particle, or semiconductor microphone, or even a MEMS microphone array; the second pickup unit can be a bone conduction microphone.
  • the conversion unit, the comparison unit, and the calculation unit may be separate digital processors, or they may be integrated in the same digital processor.
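The unit structure of the control device can be mirrored in software as a set of pluggable callables, sketched below in Python. This is a hypothetical model: the class name `ControlDevice`, the field names, the bit-list signal format and the stand-in pickups are all assumptions, and the decision shown is the combined mode only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ControlDevice:
    first_pickup: Callable[[], List[int]]      # air-vibration pickup unit
    second_pickup: Callable[[], List[int]]     # body-vibration pickup unit
    convert: Callable[[List[int]], str]        # conversion unit
    compare: Callable[[str], Optional[str]]    # comparison unit

    def should_start(self) -> bool:
        """Calculation unit: combine both comparison results."""
        first_result = self.compare(self.convert(self.first_pickup()))
        second_result = self.compare(self.convert(self.second_pickup()))
        return first_result is not None and first_result == second_result


# Stand-in units for demonstration (all hypothetical).
DATABASE = {"1011"}


def to_code(bits):
    return "".join(str(b) for b in bits)


def lookup(code):
    return code if code in DATABASE else None


wearer_speaking = ControlDevice(lambda: [1, 0, 1, 1], lambda: [1, 0, 1, 1], to_code, lookup)
playback_nearby = ControlDevice(lambda: [1, 0, 1, 1], lambda: [0, 0, 0, 0], to_code, lookup)
```

Here `wearer_speaking.should_start()` returns `True`, while `playback_nearby.should_start()` returns `False` because the body-conducted path carries no hot word. Whether the units run on separate processors or a single one does not change this interface.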
  • the first signal and the second signal are both digital signals obtained through analog-to-digital conversion, and the hot word database contains at least one hot word code.
  • in the comparison unit, the first comparison result and the second comparison result are obtained by comparing the first signal and the second signal with the hot word database.
  • the calculation unit can determine whether to start the voice assistant system based on the combination of the first comparison result and the second comparison result, based only on the second comparison result, or based only on the first comparison result; that is, it determines either to start the voice assistant system or not to start it.
  • if the first signal in the comparison unit is inconsistent with all the hot word codes of the hot word database, the calculation unit determines not to start the voice assistant system; or if the first signal in the comparison unit is consistent with a hot word code in the hot word database and the second signal is inconsistent with that hot word code, the calculation unit determines not to start the voice assistant system; or if the first signal in the comparison unit is consistent with a hot word code of the hot word database but the second signal is inconsistent with all the hot word codes of the hot word database, the calculation unit determines not to start the voice assistant system; or if the first signal in the comparison unit is consistent with a hot word code of the hot word database and the second signal is consistent with that same hot word code, the calculation unit determines to start the voice assistant system.
  • if the second signal in the comparison unit is inconsistent with all the hot word codes of the hot word database, the calculation unit determines not to start the voice assistant system; or if the second signal in the comparison unit is consistent with a hot word code in the hot word database, the calculation unit determines to start the voice assistant system.
  • if the first signal in the comparison unit is inconsistent with all the hot word codes of the hot word database, the calculation unit determines not to start the voice assistant system; or if the first signal in the comparison unit is consistent with a hot word code in the hot word database, the calculation unit determines to start the voice assistant system.
  • the control device of the voice assistant system may further include: a third pickup unit that picks up environmental noise; and a processing unit that, based on the environmental noise, performs noise reduction on the voice command picked up by air vibration, wherein the conversion unit converts the noise-reduced voice command into the first signal.
  • Fig. 4 shows a control device 600 of the voice assistant system according to this embodiment. As shown in Fig. 4, the control device 600 includes: a first pickup unit 610, which picks up voice instructions through air vibration; a second pickup unit, which picks up the voice instruction through the human body vibration of the sender of the voice instruction; a third pickup unit 630, which picks up environmental noise; a processing unit 640, which performs noise reduction, based on the environmental noise, on the voice instruction picked up by air vibration; a conversion unit 650, which converts the noise-reduced voice command picked up by air vibration into a first signal, and converts the voice command picked up by the human body vibration of the sender into a second signal; a comparison unit 660, which compares the first signal with the hot word database to obtain a first comparison result, and compares the second signal with the hot word database to obtain a second comparison result; and a calculation unit 670, which determines whether to start the voice assistant system based on the first comparison result and/or the second comparison result.
  • the first pickup unit may be any suitable microphone, including dynamic (moving-coil), aluminum ribbon, capacitive, piezoelectric, electromagnetic, carbon particle, and semiconductor microphones, and may even be a MEMS microphone array; the second pickup unit can be a bone conduction microphone; the third pickup unit can likewise be any suitable microphone, including dynamic (moving-coil), aluminum ribbon, capacitive, piezoelectric, electromagnetic, carbon particle, and semiconductor microphones, or even a MEMS microphone array.
  • the processing unit, the conversion unit, the comparison unit, and the calculation unit may be separate digital processors, or they may be integrated in the same digital processor.
  • the control device of the voice assistant system may further include: a wearing detection unit that detects whether the earphone is worn by the sender of the voice instruction; wherein, if the earphone is not worn by the sender of the voice instruction, the calculation unit determines not to start the voice assistant system.
  • the wearing detection unit may be any device in the prior art for detecting whether the earphone is worn by the person issuing the voice command. For example, it may be a photoelectric sensor arranged on the head of the earphone, which detects whether the earphone is inserted into the ear canal of the user to determine whether the earphone is worn; or it may be a motion sensor provided in the earphone, which detects whether the action of putting on the earphone occurs to determine whether the earphone is worn.
  • a Bluetooth headset which includes the aforementioned voice assistant control device.
  • FIG. 5 shows a Bluetooth headset 900 according to an embodiment of the present disclosure.
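  • As shown in FIG. 5, the Bluetooth headset 900 contains the control device of the above-mentioned voice assistant system and can effectively prevent false wake-ups. A main microphone 910 for picking up voice commands through air vibration is provided at the distal end of the stem of the Bluetooth headset 900; a bone conduction microphone 920 for picking up voice commands through skull bone vibration is provided at the proximal end on the inner side of the stem; and an environmental microphone 930 for picking up environmental noise is provided at the proximal end on the outer side of the stem. A photoelectric sensor or a motion sensor (not shown) may also be provided on the head of the Bluetooth headset 900 to detect whether the headset is in the in-ear state.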
  • In this specification, a description referring to the terms “one embodiment/mode”, “some embodiments/modes”, “examples”, “specific examples”, or “some examples” means that the specific features, structures, materials, or characteristics described in connection with that embodiment/mode or example are included in at least one embodiment/mode or example of the present disclosure.
  • the schematic representations of the aforementioned terms do not necessarily refer to the same embodiment/mode or example.
  • the described specific features, structures, materials, or characteristics may be combined in any one or more embodiments or examples in an appropriate manner.
  • Provided they do not contradict each other, those skilled in the art can combine the different embodiments/modes or examples described in this specification, as well as the features of those different embodiments/modes or examples.
  • first and second are only used for descriptive purposes, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, the features defined with “first” and “second” may explicitly or implicitly include at least one of the features. In the description of the present disclosure, "a plurality of” means at least two, such as two, three, etc., unless otherwise specifically defined.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Telephone Function (AREA)
  • Machine Translation (AREA)

Abstract

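A control method (100, 200) for a voice assistant system is provided, which is used in an earphone to control the start of the voice assistant system based on a voice command and a hot word database. The control method (100, 200) may include: picking up the voice command through air vibration (S110, S210); picking up the voice command through human body vibration of the sender of the voice command (S120, S220); converting the voice command picked up through air vibration into a first signal, and converting the voice command picked up through human body vibration of the sender into a second signal (S130, S250); comparing the first signal with the hot word database to obtain a first comparison result, and comparing the second signal with the hot word database to obtain a second comparison result (S140, S260); and determining, based on the first comparison result and/or the second comparison result, whether to start the voice assistant system (S150, S270). The provided control method (100, 200), control device (500, 600), and Bluetooth headset (900) containing the control device (500, 600) can effectively prevent the voice assistant system from being triggered by mistake.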

Description

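Control method and control device for voice assistant system, and Bluetooth headset
This application claims priority to Chinese patent application No. 201910391232.7, filed on May 11, 2019 and entitled "Control method and control device for voice assistant system, and Bluetooth headset", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to intelligent voice technology, and in particular to a control method and a control device for a voice assistant system, which are used in an earphone to control the start of the voice assistant system using hot words and based on a hot word database. It also relates in particular to a Bluetooth headset containing the control device of the voice assistant system.
Background
With the development of artificial intelligence technology, speech recognition technology has also entered a stage of rapid development, and its applications have become increasingly mature and widespread. Voice assistants, which are products based on intelligent speech recognition technology, are widely used on smart terminals of all kinds, including smartphones and tablet computers, and provide an excellent user experience.
For example, in one smartphone application scenario, the user wears a Bluetooth headset device that contains a microphone capable of picking up sound. When the user issues a voice command containing a hot word, the hot word is picked up by the microphone and transmitted to the smartphone through the Bluetooth system of the headset device, and the voice assistant on the smartphone wakes up the corresponding function of the smart terminal.
In practice, so-called "false wake-ups" occur frequently: speech containing a hot word that occurs near the user is picked up by the Bluetooth headset device worn by the user and transmitted to the smartphone, and the voice assistant wakes up some function accordingly. Specifically, speech containing a hot word that occurs near the user falls into two types: 1. speech uttered not by the user but by another person (which may be called "malicious voice input"); 2. speech uttered by the user but played, for example, through a playback device, in other words, speech not uttered by the user in real time (which may be called "voice cracking"). Neither speech uttered by another person nor the user's own non-real-time speech is driven by the user's true intention, so both cause false wake-ups and greatly degrade the otherwise excellent user experience of the voice assistant.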
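Summary
According to one aspect of the present disclosure, a control method for a voice assistant system is provided, which is used in an earphone to control the start of the voice assistant system based on a voice command and a hot word database. The control method may include: picking up the voice command through air vibration; picking up the voice command through human body vibration of the sender of the voice command; converting the voice command picked up through air vibration into a first signal, and converting the voice command picked up through human body vibration of the sender into a second signal; comparing the first signal with the hot word database to obtain a first comparison result, and comparing the second signal with the hot word database to obtain a second comparison result; and determining whether to start the voice assistant system based on the first comparison result and/or the second comparison result.
According to some embodiments of the present disclosure, a control method for a voice assistant system is provided, wherein the first signal and the second signal are both digital signals obtained through analog-to-digital conversion, and the hot word database contains at least one hot word code.
According to some embodiments of the present disclosure, a control method for a voice assistant system is provided, wherein: if the first signal is inconsistent with all hot word codes in the hot word database, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code in the hot word database but the second signal is inconsistent with that hot word code, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code in the hot word database but the second signal is inconsistent with all hot word codes in the hot word database, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code in the hot word database and the second signal is consistent with that same hot word code, it is determined to start the voice assistant system.
According to some embodiments of the present disclosure, a control method for a voice assistant system is provided, wherein: if the second signal is inconsistent with all hot word codes in the hot word database, it is determined not to start the voice assistant system; or if the second signal is consistent with one hot word code in the hot word database, it is determined to start the voice assistant system.
According to some embodiments of the present disclosure, a control method for a voice assistant system is provided, which further includes: picking up environmental noise; performing, based on the environmental noise, noise reduction on the voice command picked up through air vibration; and converting the noise-reduced voice command into the first signal.
According to some embodiments of the present disclosure, a control method for a voice assistant system is provided, which further includes: detecting whether the earphone is worn by the sender of the voice command; and if the earphone is not worn by the sender of the voice command, determining not to start the voice assistant system.
According to another aspect of the present disclosure, a control device for a voice assistant system is provided, which is used in an earphone to control the start of the voice assistant system based on a voice command and a hot word database, and which includes: a first pickup unit that picks up the voice command through air vibration; a second pickup unit that picks up the voice command through human body vibration of the sender of the voice command; a conversion unit that converts the voice command picked up through air vibration into a first signal and converts the voice command picked up through human body vibration of the sender into a second signal; a comparison unit that compares the first signal with the hot word database to obtain a first comparison result and compares the second signal with the hot word database to obtain a second comparison result; and a calculation unit that determines whether to start the voice assistant system based on the first comparison result and/or the second comparison result.
According to some embodiments of the present disclosure, a control device for a voice assistant system is provided, wherein the first signal and the second signal are both digital signals obtained through analog-to-digital conversion, the hot word database contains at least one hot word code, and: if, in the comparison unit, the first signal is inconsistent with all hot word codes in the hot word database, the calculation unit determines not to start the voice assistant system; or if, in the comparison unit, the first signal is consistent with one hot word code in the hot word database but the second signal is inconsistent with that hot word code, the calculation unit determines not to start the voice assistant system; or if, in the comparison unit, the first signal is consistent with one hot word code in the hot word database but the second signal is inconsistent with all hot word codes in the hot word database, the calculation unit determines not to start the voice assistant system; or if, in the comparison unit, the first signal is consistent with one hot word code in the hot word database and the second signal is consistent with that same hot word code, the calculation unit determines to start the voice assistant system; or if, in the comparison unit, the second signal is inconsistent with all hot word codes in the hot word database, the calculation unit determines not to start the voice assistant system; or if, in the comparison unit, the second signal is consistent with one hot word code in the hot word database, the calculation unit determines to start the voice assistant system.
According to some embodiments of the present disclosure, a control device for a voice assistant system is provided, which further includes: a third pickup unit that picks up environmental noise, and a processing unit that performs noise reduction on the voice command picked up through air vibration based on the environmental noise, wherein the conversion unit converts the noise-reduced voice command into the first signal; and/or a wearing detection unit that detects whether the earphone is worn by the sender of the voice command, wherein, if the earphone is not worn by the sender of the voice command, the calculation unit determines not to start the voice assistant system.
According to yet another aspect of the present disclosure, a Bluetooth headset is provided, which contains the above control device of the voice assistant system.
The control method and control device for a voice assistant system provided by the present disclosure, and the Bluetooth headset containing the control device, can effectively prevent the voice assistant system from being triggered by mistake. In addition, they can not only prevent the voice assistant from being triggered by mistake but also reduce the environmental noise in the voice command.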
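Brief Description of the Drawings
The accompanying drawings illustrate exemplary embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure. The drawings are included to provide a further understanding of the present disclosure, and they are incorporated in and constitute a part of this specification.
Fig. 1 is a flowchart of a control method for a voice assistant system according to an embodiment of the present disclosure;
Fig. 2 is a flowchart of a control method for a voice assistant system according to an embodiment of the present disclosure;
Fig. 3 is a block diagram of a control device for a voice assistant system according to an embodiment of the present disclosure;
Fig. 4 is a block diagram of a control device for a voice assistant system according to an embodiment of the present disclosure;
Fig. 5 is a schematic diagram of a Bluetooth headset containing a control device for a voice assistant system according to an embodiment of the present disclosure.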
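Detailed Description
The present disclosure is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant content and do not limit the present disclosure. It should also be noted that, for ease of description, the drawings show only the parts related to the present disclosure.
It should be noted that, provided there is no conflict, the embodiments of the present disclosure and the features of those embodiments may be combined with each other. The present disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments.
According to one aspect of the present disclosure, a control method for a voice assistant system is provided, which may be used in an earphone to control the start of the voice assistant system based on a voice command and a hot word database. In one application scenario of the control method, a voice assistant system based on artificial intelligence technology is installed in a smartphone, and a hot word database containing at least one specific hot word is installed in the earphone (including a Bluetooth headset) worn by the user. When the user speaks a voice command containing a specific hot word, the voice command is picked up by the microphone that serves as the sound pickup device in the earphone worn by the user. Then, based on the picked-up voice command and the hot word database, the control method ensures that the voice assistant system is started or woken up, and interaction with the voice assistant begins to meet the user's needs, only if the voice command was issued by the user in real time; otherwise the voice assistant system is not started. This effectively prevents false wake-ups. The method is described below in conjunction with embodiments of the present disclosure.
According to one embodiment of the present disclosure, the control method for a voice assistant system may include: picking up a voice command through air vibration; picking up the voice command through human body vibration of the sender of the voice command; converting the voice command picked up through air vibration into a first signal, and converting the voice command picked up through human body vibration of the sender into a second signal; comparing the first signal with the hot word database to obtain a first comparison result, and comparing the second signal with the hot word database to obtain a second comparison result; and determining whether to start the voice assistant system based on the first comparison result and/or the second comparison result. Fig. 1 shows a flowchart of the control method 100 for a voice assistant system according to this embodiment. As shown in Fig. 1: in S110, the voice command is picked up through air vibration; in S120, the voice command is picked up through human body vibration of the sender of the voice command; in S130, the voice command picked up through air vibration is converted into a first signal, and the voice command picked up through human body vibration of the sender is converted into a second signal; in S140, the first signal is compared with the hot word database to obtain a first comparison result, and the second signal is compared with the hot word database to obtain a second comparison result; in S150, whether to start the voice assistant system is determined based on the first comparison result and/or the second comparison result. As shown in Fig. 1, in the control method 100, step S110 and step S120 are performed simultaneously, so that the same voice command is picked up through two different propagation paths.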
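A voice command issued by the user can propagate through air as a medium, specifically through the vibration of air and in the form of waves. A microphone can be used to pick up the voice command through air vibration and convert the sound signal into an analog electrical signal. The microphone used to pick up the voice command propagated through air vibration may be placed on the earphone at a position close to the user's mouth and oriented toward the mouth, in order to increase the strength of the picked-up speech and reduce possible interference along the propagation path. The microphone used here may be a dynamic, ribbon, condenser, piezoelectric, electromagnetic, carbon, or semiconductor microphone, and may even be a MEMS microphone array.
Because the medium by which speech propagates in this pickup method is air, a voice command picked up in this way may have been uttered by a person or source other than the user, propagated through the air, and picked up in error. This is the cause of false wake-ups.
In addition, in the control method of this embodiment, besides picking up the speech uttered by the sender of the voice command through air vibration, the same voice command is also picked up through the sender's body vibration. Specifically, a voice command issued by the user propagates not only through air but also through a solid medium, namely the sender's skull. A sound pickup device such as a bone conduction microphone can therefore pick up the same voice command as it propagates through bone vibration, and this device converts the picked-up voice command into a corresponding analog electrical signal.
When the voice command is picked up through human body vibration of the sender, the medium of propagation is the sender's own body vibration, so a voice command picked up in this way can be determined to have been issued by the sender in real time.
After the two analog electrical signals corresponding to the voice command propagated through the two media have been obtained, each of them is converted to obtain the signals required for subsequent operations: the voice command picked up through air vibration is converted into the first signal, and the voice command picked up through human body vibration of the sender is converted into the second signal. Specifically, according to one embodiment of the present disclosure, the two analog electrical signals may, for example, be sampled by an analog-to-digital converter (ADC) at a sampling frequency of 22.05 kHz and quantized with a 16-bit word length, so that each analog electrical signal is converted into an encoded digital signal. Depending on requirements, a lower sampling frequency of 11.25 kHz or a higher one of 44.1 kHz, and a quantization word length of 8 or 12 bits, may of course be used instead.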
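The sampling and quantization step described above can be illustrated with a minimal sketch. It assumes the analog signal is available as a function of time with values in [-1.0, 1.0]; the names `sample_and_quantize` and `tone` are hypothetical and do not come from the disclosure, and a real earphone would use a hardware ADC rather than software.

```python
import math

SAMPLE_RATE_HZ = 22050  # 22.05 kHz, the example sampling frequency in the text
BITS = 16               # 16-bit quantization word length, as in the text


def sample_and_quantize(analog, duration_s, sample_rate_hz=SAMPLE_RATE_HZ, bits=BITS):
    """Sample analog(t) and quantize each sample to a signed integer code.

    `analog` is a hypothetical stand-in for the microphone output and is
    assumed to return values in the range [-1.0, 1.0].
    """
    max_code = 2 ** (bits - 1) - 1  # 32767 for 16 bits
    n_samples = int(duration_s * sample_rate_hz)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz
        x = max(-1.0, min(1.0, analog(t)))  # clip to the converter's input range
        codes.append(round(x * max_code))
    return codes


def tone(t):
    """Illustrative 440 Hz test tone standing in for a picked-up voice command."""
    return 0.5 * math.sin(2 * math.pi * 440.0 * t)


# The same routine would be applied to both the air path and the bone path.
air_codes = sample_and_quantize(tone, 0.01)
```

Changing `sample_rate_hz` or `bits` corresponds to the alternative sampling frequencies and word lengths mentioned in the text.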
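To wake up the voice assistant system by means of hot words, a hot word database containing at least one designated hot word is constructed in advance, and the picked-up voice command can be compared with the hot words in the database one by one to obtain a corresponding comparison result. There are two possible comparison results: (1) the voice command is consistent with one hot word in the hot word database; (2) the voice command is inconsistent with all hot words in the hot word database. Specifically, the first signal, which corresponds to the voice command propagated through air vibration, is compared with the hot word database to obtain the first comparison result, and the second signal, which corresponds to the same voice command picked up through human body vibration of the sender, is compared with the hot word database to obtain the second comparison result.
In another embodiment according to the present disclosure, besides the first signal and the second signal both being digital signals obtained through analog-to-digital conversion, the designated hot words are stored in the hot word database as corresponding codes, and the encoding used is exactly the same as the encoding used to convert the voice command into the corresponding signal. Comparing the first signal and the second signal with the hot word database is therefore essentially a matter of comparing the digital codes that constitute the first and second signals with the digital hot word codes in the database. For example, suppose 16-bit quantization is used and the digital code of the first signal is 1001110010111011. If an identical hot word code 1001110010111011 exists in the hot word database, the comparison result is that the first signal is consistent with one hot word in the database; if the hot word code 1001110010111011 does not exist in the hot word database, the comparison result is that the first signal is inconsistent with all hot words in the database.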
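The one-by-one comparison against the hot word codes can be sketched as follows. This is only an illustration under the assumption that each signal and each hot word is represented as a bit string; `compare_with_database` is a hypothetical name, the first database entry is the example code from the text, and the second entry is made up for the example.

```python
# Hypothetical hot word database: each hot word is stored as a code string.
# The first entry is the 16-bit example from the text; the second is invented.
HOT_WORD_CODES = ["1001110010111011", "0110001101000100"]


def compare_with_database(signal_code, database=HOT_WORD_CODES):
    """Compare a signal code with every hot word code, one by one.

    Returns the matching hot word code (comparison result 1 in the text),
    or None when the signal matches no hot word code (comparison result 2).
    """
    for hot_word_code in database:
        if signal_code == hot_word_code:
            return hot_word_code
    return None


first_result = compare_with_database("1001110010111011")  # exact match
miss_result = compare_with_database("1001110010111010")   # differs in the last bit
```

Returning the matched code rather than a plain yes/no keeps enough information to check later whether both signals matched the same hot word.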
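In the control method for a voice assistant system according to this embodiment, once the first and second comparison results have been obtained by comparing the first signal and the second signal with the hot word database, whether to start the voice assistant system can be determined based on the combination of the first and second comparison results, or based on the second comparison result alone, or based on the first comparison result alone. That is, it is determined either to start the voice assistant system or not to start it.
According to one embodiment of the present disclosure, in the control method for a voice assistant system, whether to start the voice assistant system may be determined based on the combination of the first and second comparison results. Specifically: if the first signal is inconsistent with all hot word codes in the hot word database, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code in the hot word database but the second signal is inconsistent with that hot word code, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code in the hot word database but the second signal is inconsistent with all hot word codes in the database, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code in the hot word database and the second signal is consistent with that same hot word code, it is determined to start the voice assistant system. False wake-ups can thereby be effectively avoided. As far as avoiding false wake-ups is concerned, this is the optimal embodiment.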
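The four cases of the combined decision reduce to a single rule: start only when both signals match the same hot word code. A minimal sketch, assuming each comparison result is represented as the matched hot word code or `None` (a representation chosen for this example, not specified in the disclosure):

```python
def should_start(first_match, second_match):
    """Combined decision on the first (air) and second (bone) comparison results.

    Each argument is the hot word code the signal matched, or None if the
    signal matched no hot word code in the database.
    """
    if first_match is None:
        return False  # first signal matches no hot word code
    if second_match is None:
        return False  # second signal matches no hot word code
    # Start only if both signals match the same hot word code.
    return first_match == second_match
```

For instance, `should_start("A", "A")` yields `True`, while a hot word heard only over the air path, `should_start("A", None)`, yields `False`.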
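According to another embodiment of the present disclosure, in the control method for a voice assistant system, whether to start the voice assistant system may also be determined based on the second comparison result alone. Specifically: if the second signal is inconsistent with all hot word codes in the hot word database, it is determined not to start the voice assistant system; or if the second signal is consistent with one hot word code in the hot word database, it is determined to start the voice assistant system. This approach avoids false wake-ups to a certain extent. In a concrete implementation, a mode selection can stop the pickup of the voice command through air vibration, together with the subsequent conversion and comparison operations, so that the voice command is picked up only through human body vibration and only the corresponding conversion and comparison are performed. This effectively reduces energy consumption, making it a second-best embodiment that balances preventing false wake-ups against reducing energy consumption.
According to yet another embodiment of the present disclosure, in the control method for a voice assistant system, whether to start the voice assistant system may also be determined based on the first comparison result alone. Specifically: if the first signal is inconsistent with all hot word codes in the hot word database, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code in the hot word database, it is determined to start the voice assistant system.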
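The three decision modes, and the energy saving obtained by skipping the air path, can be sketched with lazily evaluated comparison steps. The `Mode` enum and the `decide` function are hypothetical names; `match_air` and `match_bone` are assumed to be zero-argument callables that run pickup, conversion, and comparison for their path and return the matched hot word code or `None`.

```python
from enum import Enum


class Mode(Enum):
    COMBINED = "combined"    # first and second comparison results together
    BONE_ONLY = "bone_only"  # second comparison result only (saves energy)
    AIR_ONLY = "air_only"    # first comparison result only


def decide(mode, match_air, match_bone):
    """Decide whether to start the assistant under the selected mode.

    A path's callable is invoked only when the mode needs it, so in
    BONE_ONLY mode the air path (pickup, conversion, comparison) never runs.
    """
    if mode is Mode.BONE_ONLY:
        return match_bone() is not None
    if mode is Mode.AIR_ONLY:
        return match_air() is not None
    first = match_air()
    if first is None:
        return False
    return match_bone() == first
```

Passing callables instead of precomputed results is a design choice for this sketch: it makes the "stop the air path" behaviour of the second-best embodiment explicit.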
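According to some embodiments of the present disclosure, the control method for a voice assistant system may further include: picking up environmental noise; performing, based on the environmental noise, noise reduction on the voice command picked up through air vibration; and converting the noise-reduced voice command into the first signal. Fig. 2 shows a flowchart of the control method 200 for a voice assistant system according to this embodiment. As shown in Fig. 2: in S210, the voice command is picked up through air vibration; in S220, the voice command is picked up through human body vibration of the sender of the voice command; in S230, environmental noise is picked up; in S240, noise reduction is performed on the voice command picked up through air vibration based on the environmental noise; in S250, the noise-reduced voice command is converted into the first signal, and the voice command picked up through human body vibration of the sender is converted into the second signal; in S260, the first signal is compared with the hot word database to obtain a first comparison result, and the second signal is compared with the hot word database to obtain a second comparison result; in S270, whether to start the voice assistant system is determined based on the first comparison result and/or the second comparison result.
In this method, picking up environmental noise in step S230 is performed simultaneously with picking up the voice command through air vibration in step S210 and through human body vibration of the sender in step S220. In step S230, environmental noise in the external environment is picked up; then, in step S240, noise reduction is performed on the voice command picked up through air vibration based on the environmental noise. Specifically, after the environmental noise has been analyzed and its features extracted, a noise signal 180 degrees out of phase with the environmental noise is generated on that basis and superimposed on the voice signal picked up through air vibration, so that the environmental noise in the voice signal is cancelled. Any existing active noise reduction technique may be used here to remove the environmental noise from the speech picked up through air vibration.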
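The cancellation principle can be shown with an idealized sketch. It assumes, unrealistically, that the ambient microphone captures exactly the noise component present in the air-path signal and that both are time-aligned sample lists; practical active noise reduction typically relies on adaptive filtering instead. The function names are hypothetical.

```python
import math


def anti_phase(noise):
    """Return the noise shifted by 180 degrees, i.e. inverted sample by sample."""
    return [-n for n in noise]


def denoise(air_signal, ambient_noise):
    """Superimpose the anti-phase noise on the air-path signal to cancel the noise."""
    inverse = anti_phase(ambient_noise)
    return [s + i for s, i in zip(air_signal, inverse)]


# Illustrative data: a 440 Hz "speech" tone plus 60 Hz "environmental noise".
rate = 22050
speech = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(256)]
noise = [0.2 * math.sin(2 * math.pi * 60 * n / rate) for n in range(256)]
picked_up = [s + n for s, n in zip(speech, noise)]

cleaned = denoise(picked_up, noise)  # approximately equal to `speech`
```

Under these idealized assumptions the cleaned signal equals the speech up to floating-point rounding.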
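According to some embodiments of the present disclosure, the control method for a voice assistant system may further include: detecting whether the earphone is worn by the sender of the voice command; and if the earphone is not worn by the sender of the voice command, determining not to start the voice assistant system. These operations may be performed before all other operations of the control method, which prevents false wake-ups when the user is not wearing the earphone. Any prior-art means may be used here to detect whether the earphone is worn by the sender of the voice command. For example, a photoelectric sensor on the head of the earphone can detect whether the earphone has been inserted into the user's ear canal, to determine whether the earphone is worn; or a motion sensor in the earphone can detect whether the action of putting on the earphone has occurred, to determine whether the earphone is worn.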
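Running the wearing check before all other operations amounts to a simple gate. In the sketch below, `is_worn` stands in for the photoelectric or motion sensor reading and `run_hot_word_check` stands in for the rest of the control method; both are hypothetical callables introduced for illustration.

```python
def control_with_wear_check(is_worn, run_hot_word_check):
    """Gate the control method on the wearing state.

    If the earphone is not worn, the assistant is not started and the
    pickup, conversion, and comparison steps are never executed.
    """
    if not is_worn():
        return False
    return bool(run_hot_word_check())
```

Because the gate returns early, an unworn earphone lying near a speaker cannot trigger the assistant even if a hot word is heard over the air path.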
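According to another aspect of the present disclosure, a control device for a voice assistant system is also provided, which is used in an earphone to control the start of the voice assistant system based on a voice command and a hot word database. Within the framework of the present disclosure, this control device can be used in hardware products, including Bluetooth headsets, to implement the control method for a voice assistant system described earlier in this disclosure.
According to one embodiment of the present disclosure, the control device for a voice assistant system may include: a first pickup unit that picks up the voice command through air vibration; a second pickup unit that picks up the voice command through human body vibration of the sender of the voice command; a conversion unit that converts the voice command picked up through air vibration into a first signal and converts the voice command picked up through human body vibration of the sender into a second signal; a comparison unit that compares the first signal with the hot word database to obtain a first comparison result and compares the second signal with the hot word database to obtain a second comparison result; and a calculation unit that determines whether to start the voice assistant system based on the first comparison result and/or the second comparison result. Fig. 3 shows a control device 500 for a voice assistant system according to this embodiment. As shown in Fig. 3, the control device 500 includes: a first pickup unit 510 that picks up the voice command through air vibration; a second pickup unit 520 that picks up the voice command through human body vibration of the sender of the voice command; a conversion unit 530 that converts the voice command picked up through air vibration into a first signal and converts the voice command picked up through human body vibration of the sender into a second signal; a comparison unit 540 that compares the first signal with the hot word database to obtain a first comparison result and compares the second signal with the hot word database to obtain a second comparison result; and a calculation unit 550 that determines whether to start the voice assistant system based on the first comparison result and/or the second comparison result.
In one example, the first pickup unit may be any suitable microphone, including dynamic, ribbon, condenser, piezoelectric, electromagnetic, carbon, and semiconductor microphones, or even a MEMS microphone array; the second pickup unit may be a bone conduction microphone. The conversion unit, the comparison unit, and the calculation unit may be separate digital processors, or they may be integrated in the same digital processor.
As described above, the first signal and the second signal are both digital signals obtained through analog-to-digital conversion, and the hot word database contains at least one hot word code.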
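In the control device for a voice assistant system according to this embodiment, once the comparison unit has obtained the first and second comparison results by comparing the first signal and the second signal with the hot word database, the calculation unit can determine whether to start the voice assistant system based on the combination of the first and second comparison results, or based on the second comparison result alone, or based on the first comparison result alone. That is, it determines either to start the voice assistant system or not to start it.
According to one embodiment of the present disclosure, in the control device for a voice assistant system: if, in the comparison unit, the first signal is inconsistent with all hot word codes in the hot word database, the calculation unit determines not to start the voice assistant system; or if, in the comparison unit, the first signal is consistent with one hot word code in the hot word database but the second signal is inconsistent with that hot word code, the calculation unit determines not to start the voice assistant system; or if, in the comparison unit, the first signal is consistent with one hot word code in the hot word database but the second signal is inconsistent with all hot word codes in the database, the calculation unit determines not to start the voice assistant system; or if, in the comparison unit, the first signal is consistent with one hot word code in the hot word database and the second signal is consistent with that same hot word code, the calculation unit determines to start the voice assistant system.
According to another embodiment of the present disclosure, in the control device for a voice assistant system: if, in the comparison unit, the second signal is inconsistent with all hot word codes in the hot word database, the calculation unit determines not to start the voice assistant system; or if, in the comparison unit, the second signal is consistent with one hot word code in the hot word database, the calculation unit determines to start the voice assistant system.
According to yet another embodiment of the present disclosure, in the control device for a voice assistant system: if, in the comparison unit, the first signal is inconsistent with all hot word codes in the hot word database, the calculation unit determines not to start the voice assistant system; or if, in the comparison unit, the first signal is consistent with one hot word code in the hot word database, the calculation unit determines to start the voice assistant system.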
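According to some embodiments of the present disclosure, the control device for a voice assistant system may further include: a third pickup unit that picks up environmental noise; and a processing unit that performs noise reduction on the voice command picked up through air vibration based on the environmental noise, wherein the conversion unit converts the noise-reduced voice command into the first signal. Fig. 4 shows a control device 600 for a voice assistant system according to this embodiment. As shown in Fig. 4, the control device 600 includes: a first pickup unit 610 that picks up the voice command through air vibration; a second pickup unit 620 that picks up the voice command through human body vibration of the sender of the voice command; a third pickup unit 630 that picks up environmental noise; a processing unit 640 that performs noise reduction on the voice command picked up through air vibration based on the environmental noise; a conversion unit 650 that converts the voice command picked up through air vibration into a first signal and converts the voice command picked up through human body vibration of the sender into a second signal; a comparison unit 660 that compares the first signal with the hot word database to obtain a first comparison result and compares the second signal with the hot word database to obtain a second comparison result; and a calculation unit 670 that determines whether to start the voice assistant system based on the first comparison result and/or the second comparison result.
As described above, in an example, the first pickup unit may be any suitable microphone, including dynamic, ribbon, condenser, piezoelectric, electromagnetic, carbon, and semiconductor microphones, or even a MEMS microphone array; the second pickup unit may be a bone conduction microphone; the third pickup unit may likewise be any suitable microphone, including dynamic, ribbon, condenser, piezoelectric, electromagnetic, carbon, and semiconductor microphones, or even a MEMS microphone array. The processing unit, the conversion unit, the comparison unit, and the calculation unit may be separate digital processors, or they may be integrated in the same digital processor.
According to some embodiments of the present disclosure, the control device for a voice assistant system may further include: a wearing detection unit that detects whether the earphone is worn by the sender of the voice command; wherein, if the earphone is not worn by the sender of the voice command, the calculation unit determines not to start the voice assistant system. As described above, in an example, the wearing detection unit may be any prior-art device for detecting whether the earphone is worn by the sender of the voice command. For example, it may be a photoelectric sensor arranged on the head of the earphone, which determines whether the earphone is worn by detecting whether it has been inserted into the user's ear canal; or it may be a motion sensor provided in the earphone, which determines whether the earphone is worn by detecting whether the action of putting on the earphone has occurred.
According to yet another aspect of the present disclosure, a Bluetooth headset is provided, which contains the above control device of the voice assistant system. Fig. 5 shows a Bluetooth headset 900 according to one embodiment of the present disclosure. As shown in Fig. 5, the Bluetooth headset 900 contains the above control device of the voice assistant system and can effectively prevent false wake-ups. A main microphone 910 for picking up voice commands through air vibration is provided at the distal end of the stem of the Bluetooth headset 900; a bone conduction microphone 920 for picking up voice commands through skull bone vibration is provided at the proximal end on the inner side of the stem; and an environmental microphone 930 for picking up environmental noise is provided at the proximal end on the outer side of the stem. A photoelectric sensor or a motion sensor (not shown) may also be provided on the head of the Bluetooth headset 900 to detect whether the headset is in the in-ear state.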
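In this specification, a description referring to the terms "one embodiment/mode", "some embodiments/modes", "example", "specific example", or "some examples" means that the specific features, structures, materials, or characteristics described in connection with that embodiment/mode or example are included in at least one embodiment/mode or example of the present disclosure. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment/mode or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments/modes or examples. In addition, provided they do not contradict each other, those skilled in the art can combine the different embodiments/modes or examples described in this specification, as well as the features of those different embodiments/modes or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present disclosure, "a plurality of" means at least two, for example two or three, unless otherwise specifically defined.
Those skilled in the art should understand that the above embodiments are provided only to illustrate the present disclosure clearly and do not limit its scope. Those skilled in the art may make other changes or modifications on the basis of the above disclosure, and such changes or modifications remain within the scope of the present disclosure.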

Claims (10)

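  1. A control method for a voice assistant system, used in an earphone to control the start of the voice assistant system based on a voice command and a hot word database, characterized in that the control method comprises:
    picking up the voice command through air vibration;
    picking up the voice command through human body vibration of the sender of the voice command;
    converting the voice command picked up through air vibration into a first signal, and converting the voice command picked up through human body vibration of the sender of the voice command into a second signal;
    comparing the first signal with the hot word database to obtain a first comparison result, and comparing the second signal with the hot word database to obtain a second comparison result; and
    determining whether to start the voice assistant system based on the first comparison result and/or the second comparison result.
  2. The control method according to claim 1, characterized in that the first signal and the second signal are both digital signals obtained through analog-to-digital conversion, and the hot word database contains at least one hot word code.
  3. The control method according to claim 2, characterized in that
    if the first signal is inconsistent with all hot word codes in the hot word database, it is determined not to start the voice assistant system; or
    if the first signal is consistent with one hot word code in the hot word database but the second signal is inconsistent with the one hot word code, it is determined not to start the voice assistant system; or
    if the first signal is consistent with one hot word code in the hot word database but the second signal is inconsistent with all hot word codes in the hot word database, it is determined not to start the voice assistant system; or
    if the first signal is consistent with one hot word code in the hot word database and the second signal is consistent with the one hot word code, it is determined to start the voice assistant system.
  4. The control method according to claim 2, characterized in that
    if the second signal is inconsistent with all hot word codes in the hot word database, it is determined not to start the voice assistant system; or
    if the second signal is consistent with one hot word code in the hot word database, it is determined to start the voice assistant system.
  5. The control method according to any one of claims 1-4, characterized in that the control method further comprises:
    picking up environmental noise;
    performing, based on the environmental noise, noise reduction on the voice command picked up through air vibration; and
    converting the noise-reduced voice command into the first signal.
  6. The control method according to any one of claims 1-5, characterized in that the control method further comprises:
    detecting whether the earphone is worn by the sender of the voice command; and
    if the earphone is not worn by the sender of the voice command, determining not to start the voice assistant system.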
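  7. A control device for a voice assistant system, used in an earphone to control the start of the voice assistant system based on a voice command and a hot word database, characterized in that the control device comprises:
    a first pickup unit that picks up the voice command through air vibration;
    a second pickup unit that picks up the voice command through human body vibration of the sender of the voice command;
    a conversion unit that converts the voice command picked up through air vibration into a first signal, and converts the voice command picked up through human body vibration of the sender of the voice command into a second signal;
    a comparison unit that compares the first signal with the hot word database to obtain a first comparison result, and compares the second signal with the hot word database to obtain a second comparison result; and
    a calculation unit that determines whether to start the voice assistant system based on the first comparison result and/or the second comparison result.
  8. The control device according to claim 7, characterized in that the first signal and the second signal are both digital signals obtained through analog-to-digital conversion, the hot word database contains at least one hot word code, and
    if, in the comparison unit, the first signal is inconsistent with all hot word codes in the hot word database, the calculation unit determines not to start the voice assistant system; or
    if, in the comparison unit, the first signal is consistent with one hot word code in the hot word database but the second signal is inconsistent with the one hot word code, the calculation unit determines not to start the voice assistant system; or
    if, in the comparison unit, the first signal is consistent with one hot word code in the hot word database but the second signal is inconsistent with all hot word codes in the hot word database, the calculation unit determines not to start the voice assistant system; or
    if, in the comparison unit, the first signal is consistent with one hot word code in the hot word database and the second signal is consistent with the one hot word code, the calculation unit determines to start the voice assistant system; or
    if, in the comparison unit, the second signal is inconsistent with all hot word codes in the hot word database, the calculation unit determines not to start the voice assistant system; or
    if, in the comparison unit, the second signal is consistent with one hot word code in the hot word database, the calculation unit determines to start the voice assistant system.
  9. The control device according to claim 7 or 8, characterized in that the control device further comprises:
    a third pickup unit that picks up environmental noise; and
    a processing unit that performs, based on the environmental noise, noise reduction on the voice command picked up through air vibration;
    wherein the conversion unit converts the noise-reduced voice command into the first signal; and/or
    a wearing detection unit that detects whether the earphone is worn by the sender of the voice command; wherein, if the earphone is not worn by the sender of the voice command, the calculation unit determines not to start the voice assistant system.
  10. A Bluetooth headset, characterized in that the Bluetooth headset contains the control device according to any one of claims 7-9.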
PCT/CN2019/127460 2019-05-11 2019-12-23 Control method and control device for voice assistant system, and Bluetooth headset WO2020228332A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910391232.7 2019-05-11
CN201910391232.7A CN110265007B (zh) 2019-05-11 2019-05-11 Control method and control device for voice assistant system, and Bluetooth headset

Publications (1)

Publication Number Publication Date
WO2020228332A1 true WO2020228332A1 (zh) 2020-11-19

Family

ID=67914603

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/127460 WO2020228332A1 (zh) 2019-05-11 2019-12-23 Control method and control device for voice assistant system, and Bluetooth headset

Country Status (2)

Country Link
CN (1) CN110265007B (zh)
WO (1) WO2020228332A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
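CN110265007B (zh) * 2019-05-11 2020-07-24 出门问问信息科技有限公司 Control method and control device for voice assistant system, and Bluetooth headset
CN111862975A (zh) * 2020-07-15 2020-10-30 百度在线网络技术(北京)有限公司 Intelligent terminal control method, apparatus, device, storage medium, and system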

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090271190A1 (en) * 2008-04-25 2009-10-29 Nokia Corporation Method and Apparatus for Voice Activity Determination
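CN103871419A (zh) * 2012-12-11 2014-06-18 联想(北京)有限公司 Information processing method and electronic device
CN106686494A (zh) * 2016-12-27 2017-05-17 广东小天才科技有限公司 Voice input control method for wearable device, and wearable device
CN106847275A (zh) * 2016-12-27 2017-06-13 广东小天才科技有限公司 Method for controlling wearable device, and wearable device
CN106992015A (zh) * 2015-12-22 2017-07-28 恩智浦有限公司 Voice activation system
CN108847221A (zh) * 2018-06-19 2018-11-20 Oppo广东移动通信有限公司 Speech recognition method and apparatus, storage medium, and electronic device
CN110265007A (zh) * 2019-05-11 2019-09-20 出门问问信息科技有限公司 Control method and control device for voice assistant system, and Bluetooth headset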

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9978397B2 (en) * 2015-12-22 2018-05-22 Intel Corporation Wearer voice activity detection
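CN106714023B (zh) * 2016-12-27 2019-03-15 广东小天才科技有限公司 Voice wake-up method and system based on bone conduction earphone, and bone conduction earphone
CN107678793A (zh) * 2017-09-14 2018-02-09 珠海市魅族科技有限公司 Voice assistant start-up method and apparatus, terminal, and computer-readable storage medium
CN109729463A (zh) * 2017-10-27 2019-05-07 北京金锐德路科技有限公司 Composite acoustic-microphone and bone-microphone sound pickup device for neck-worn voice interaction earphone
CN109346075A (zh) * 2018-10-15 2019-02-15 华为技术有限公司 Method and system for recognizing user voice through human body vibration to control electronic device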

Also Published As

Publication number Publication date
CN110265007B (zh) 2020-07-24
CN110265007A (zh) 2019-09-20

Similar Documents

Publication Publication Date Title
US11748462B2 (en) Biometric authentication
US11693939B2 (en) Ear proximity detection
US9324322B1 (en) Automatic volume attenuation for speech enabled devices
US20200184057A1 (en) Headset for Acoustic Authentication of a User
WO2019233228A1 (zh) 电子设备及设备控制方法
US20190147890A1 (en) Audio peripheral device
GB2608710A (en) Speaker identification
WO2020228332A1 (zh) 语音助手系统的控制方法、控制装置及蓝牙耳机
US11900730B2 (en) Biometric identification
US11918345B2 (en) Cough detection
US11894000B2 (en) Authenticating received speech
JP2002358089A (ja) 音声処理装置及び音声処理方法
US10916248B2 (en) Wake-up word detection
US11290802B1 (en) Voice detection using hearable devices
US11488606B2 (en) Audio system with digital microphone
CN214226506U (zh) 声音处理电路、电声器件和声音处理系统
CN106653060B (zh) 吹气声识别系统及采用该系统的吹气识别方法
US20220366932A1 (en) Methods and apparatus for detecting singing
US11393449B1 (en) Methods and apparatus for obtaining biometric data
CN110166863B (zh) 一种入耳式语音装置
KR102562180B1 (ko) 웨어러블 음향 변환 장치
TWI697891B (zh) 入耳式語音裝置
JP2004317942A (ja) 音声処理装置、音声認識装置及び音声処理方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19928450

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19928450

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.03.2022)
