WO2020228332A1 - Control method and control device for a voice assistant system, and Bluetooth headset - Google Patents
- Publication number
- WO2020228332A1 (PCT/CN2019/127460)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- hot word
- signal
- voice
- assistant system
- voice assistant
Classifications
- G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L21/0208: Noise filtering
- H04R1/1041: Mechanical or electronic switches, or control elements
- H04R3/00: Circuits for transducers, loudspeakers or microphones
- G10L2015/223: Execution procedure of a spoken command
- G10L2015/225: Feedback of the input speech
Definitions
- the present disclosure relates to intelligent voice technology, in particular to a control method and a control device of a voice assistant system, which use hot words to control the startup of the voice assistant system based on a hot word database in an earphone, and to a Bluetooth headset that includes the control device of the voice assistant system.
- voice recognition technology has also entered a stage of rapid development, and its applications have become more mature and extensive.
- voice assistant products based on intelligent voice recognition technology are widely used on smart terminals, including smart phones, tablet computers and other types of products, and provide a very good user experience.
- the user wears a Bluetooth headset device that includes a microphone capable of receiving audio.
- when the user issues a voice command containing a hot word, the hot word is picked up by the microphone.
- the hot word is transmitted to the smart phone through the Bluetooth system of the Bluetooth headset device, and the corresponding function of the smart terminal is awakened through the voice assistant on the smart phone.
- the voice containing hot words appearing near the user will be picked up by the Bluetooth headset device worn by the user and transmitted to the smart phone, and a certain function will be awakened by the voice assistant accordingly.
- voice containing hot words that occurs near the user can be: (1) voice that is made not by the user but by others (this can be called "malicious voice input"); or (2) voice that was made by the user but is, for example, played back through a playback device, in other words is not uttered by the user in real time (this can be called "voice cracking").
- neither other people's voice nor the user's non-real-time voice is driven by the user's real intentions; both cause false wakeups, which greatly compromises the otherwise excellent user experience of the voice assistant.
- a control method of a voice assistant system which is used to control the startup of the voice assistant system based on voice instructions and a hot word library in a headset.
- the control method may include: picking up a voice instruction through air vibration; picking up the voice instruction through the human body vibration of the issuer of the voice instruction; converting the voice instruction picked up through air vibration into a first signal, and converting the voice instruction picked up through the human body vibration of the issuer into a second signal; comparing the first signal with the hot word database to obtain a first comparison result, and comparing the second signal with the hot word database to obtain a second comparison result; and determining, based on the first comparison result and/or the second comparison result, whether to start the voice assistant system.
- a control method of a voice assistant system wherein the first signal and the second signal are both digital signals obtained through analog-to-digital conversion, and the hot word database contains at least one hot word code.
- a method for controlling a voice assistant system, wherein if the first signal is inconsistent with all the hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code of the hot word database and the second signal is inconsistent with that hot word code, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code of the hot word database and the second signal is inconsistent with all the hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code of the hot word database and the second signal is consistent with that hot word code, it is determined to start the voice assistant system.
- a method for controlling a voice assistant system, wherein if the second signal is inconsistent with all the hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the second signal is consistent with one hot word code in the hot word database, it is determined to start the voice assistant system.
- a control method of a voice assistant system, which further includes: picking up environmental noise; performing, based on the environmental noise, noise reduction on the voice instruction picked up by air vibration; and converting the noise-reduced voice instruction into the first signal.
- a control method of a voice assistant system, which further includes: detecting whether the earphone is worn by the sender of the voice command; and if the earphone is not worn by the sender of the voice command, determining not to start the voice assistant system.
- a control device for a voice assistant system, which is used to control the startup of the voice assistant system based on voice instructions and a hot word database in a headset, and which includes: a first pickup unit that picks up a voice instruction through air vibration; a second pickup unit that picks up the voice instruction through the human body vibration of the issuer of the voice instruction; a conversion unit that converts the voice instruction picked up through air vibration into a first signal, and converts the voice instruction picked up through the human body vibration of the issuer into a second signal; a comparison unit that compares the first signal with the hot word database to obtain a first comparison result, and compares the second signal with the hot word database to obtain a second comparison result; and a calculation unit that determines, based on the first comparison result and/or the second comparison result, whether to start the voice assistant system.
- a control device for a voice assistant system, wherein the first signal and the second signal are both digital signals obtained through analog-to-digital conversion, and the hot word database contains at least one hot word code; if the first signal in the comparison unit is inconsistent with all the hot word codes of the hot word database, the calculation unit determines not to start the voice assistant system; or if the first signal in the comparison unit is consistent with one hot word code in the hot word database and the second signal is inconsistent with that hot word code, the calculation unit determines not to start the voice assistant system; or if the first signal in the comparison unit is consistent with one hot word code in the hot word database and the second signal is inconsistent with all the hot word codes of the hot word database, the calculation unit determines not to start the voice assistant system; or if the first signal in the comparison unit is consistent with one hot word code of the hot word database and the second signal is consistent with that hot word code, the calculation unit determines to start the voice assistant system.
- a control device for a voice assistant system, which further includes: a third pickup unit that picks up environmental noise, and a processing unit that performs, based on the environmental noise, noise reduction on the voice command picked up by air vibration, wherein the conversion unit converts the noise-reduced voice command into the first signal; and/or a wearing detection unit that detects whether the earphone is worn by the sender of the voice command, wherein, if the earphone is not worn by the sender of the voice command, the calculation unit determines not to start the voice assistant system.
- a Bluetooth headset which includes the control device of the above-mentioned voice assistant system.
- the voice assistant system control method, control device and Bluetooth headset including the control device provided by the present disclosure can effectively prevent the voice assistant system from being triggered by mistake.
- the control method, control device, and Bluetooth headset of the voice assistant system provided by the present disclosure can not only prevent the voice assistant from being triggered by mistake, but also reduce the environmental noise in the voice command.
- Fig. 1 is a flowchart of a control method of a voice assistant system according to an embodiment of the present disclosure
- Fig. 2 is a flowchart of a control method of a voice assistant system according to an embodiment of the present disclosure
- Fig. 3 is a block diagram of a control device of a voice assistant system according to an embodiment of the present disclosure
- Fig. 4 is a block diagram of a control device of a voice assistant system according to an embodiment of the present disclosure
- Fig. 5 is a schematic diagram of a Bluetooth headset including a control device of a voice assistant system according to an embodiment of the present disclosure.
- a control method of a voice assistant system is provided, and the control method can be used to control the start of the voice assistant system based on voice instructions and a hot word database in a headset.
- a voice assistant system based on artificial intelligence technology is installed in a smart phone, and a hot word database containing at least one specific hot word is installed in a headset (including a Bluetooth headset) worn by the user.
- the voice command is picked up by the microphone of the earphone worn by the user, which serves as the pickup device. Through the control method, based on the picked-up voice command and the hot word database, the voice assistant system is activated or awakened, and interacts with the user to meet the user's needs, only when the voice command is issued by the user in real time; otherwise, the voice assistant system is not activated, which effectively prevents false wakeups.
- the method will be described in conjunction with
- the control method of the voice assistant system may include: picking up the voice instruction through air vibration; picking up the voice instruction through the human body vibration of the sender of the voice instruction; converting the voice instruction picked up through air vibration into a first signal, and converting the voice instruction picked up through the human body vibration of the sender into a second signal; comparing the first signal with the hot word database to obtain a first comparison result, and comparing the second signal with the hot word database to obtain a second comparison result; and determining, based on the first comparison result and/or the second comparison result, whether to start the voice assistant system.
- Fig. 1 shows a flowchart of a control method 100 of a voice assistant system according to this embodiment. As shown in Fig. 1, in step S110, the voice command is picked up by air vibration; in S120, the voice command is picked up by the human body vibration of the sender of the voice command; in S130, the voice command picked up by air vibration is converted into a first signal, and the voice command picked up by the human body vibration of the sender is converted into a second signal; in S140, the first signal is compared with the hot word database to obtain a first comparison result, and the second signal is compared with the hot word database to obtain a second comparison result; in S150, whether to start the voice assistant system is determined based on the first comparison result and/or the second comparison result. In the control method 100, step S110 and step S120 are performed simultaneously, so that the same voice command is picked up through different propagation paths.
- the voice commands issued by the user can be propagated in the air as a medium, specifically in the form of waves through the vibration of the air.
- a microphone can be used to pick up voice commands through air vibration and convert the sound signals into analog electrical signals.
- the microphone for picking up voice commands transmitted through air vibration can be placed on the part of the headset that is closer to the user's mouth and directed toward the user's mouth, to increase the strength of the picked-up voice and reduce possible interference along the transmission path.
- the microphones used here can include dynamic (electrodynamic), aluminum ribbon, capacitive, piezoelectric, electromagnetic, carbon particle and semiconductor microphones, and even MEMS microphone arrays.
- because the transmission medium when voice commands are picked up through air vibration is air, voice commands picked up in this way may have been issued by other people, or by media other than the user, and be picked up by mistake; this is the cause of false wakeups.
- the same voice command is also picked up by the body vibration of the speaker.
- the voice commands issued by the user are also transmitted through a solid medium such as the bones of the issuer. Therefore, the same voice command can also be picked up by a sound pickup device such as a bone conduction microphone, which converts the picked-up voice command into a corresponding analog electrical signal.
- in this case the voice instruction is picked up through the human body vibration of the issuer of the voice instruction. Because the transmission medium is the issuer's own body, it can be determined that a voice instruction picked up in this manner is issued by the issuer in real time.
- the two analog electrical signals are converted separately to obtain the signals required for subsequent operations: the voice instruction picked up by air vibration is converted into a first signal, and the voice instruction picked up by the human body vibration of the issuer of the voice instruction is converted into a second signal.
- an analog-to-digital converter (ADC) can be used for sampling at a sampling frequency of 22.05 kHz and quantization with a 16-bit word length, thereby converting the two analog electrical signals into encoded digital signals respectively.
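The sampling and quantization step can be illustrated with a minimal sketch. It assumes the analog signal is normalised to the range [-1.0, 1.0] and is modelled as a Python function of time; the names `quantize`, `sample_and_quantize` and `tone`, and the 440 Hz test tone, are hypothetical and not part of the disclosure. Only the 22.05 kHz sampling frequency and the 16-bit word length come from the text.

```python
import math

SAMPLE_RATE_HZ = 22_050  # sampling frequency given in the text
WORD_LENGTH_BITS = 16    # quantization word length given in the text


def quantize(sample: float, bits: int = WORD_LENGTH_BITS) -> int:
    """Map an analog value in [-1.0, 1.0] to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1          # 32767 for a 16-bit word
    clipped = max(-1.0, min(1.0, sample))   # clip out-of-range input
    return round(clipped * max_code)


def sample_and_quantize(analog, duration_s: float,
                        rate_hz: int = SAMPLE_RATE_HZ) -> list:
    """Sample analog(t) at rate_hz for duration_s seconds and quantize."""
    count = int(duration_s * rate_hz)
    return [quantize(analog(i / rate_hz)) for i in range(count)]


def tone(t: float) -> float:
    """Hypothetical stand-in for a microphone output: a 440 Hz tone."""
    return 0.5 * math.sin(2 * math.pi * 440.0 * t)


# 10 ms of signal at 22.05 kHz yields 220 integer codes.
codes = sample_and_quantize(tone, duration_s=0.01)
```

In a real headset this conversion would be done by ADC hardware for both the air-conduction and the bone-conduction channel; the sketch only shows what "sampling" and "16-bit quantization" mean numerically.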
- a hot word database containing at least one designated hot word is constructed in advance, and the picked-up voice instructions can be compared with the hot words in the hot word database one by one to obtain the corresponding comparison result.
- the comparison results include the following two types: (1) the voice command is consistent with a hot word in the hot word database; (2) the voice command is inconsistent with all the hot words in the hot word database.
- the first signal, which corresponds to the voice instruction propagated through air vibration, is compared with the hot word database to obtain the first comparison result, and the second signal, which corresponds to the same voice instruction picked up through the human body vibration of the issuer, is compared with the hot word database to obtain the second comparison result.
- comparing the first signal and the second signal with the hot word database is essentially a process of comparing the digital codes that constitute the first signal and the second signal with the digital codes of the hot words in the hot word database. For example, when 16-bit quantization is used, suppose the digital code of the first signal is 1001110010111011. If the hot word code 1001110010111011 exists in the hot word database, the comparison result is that the first signal is consistent with a hot word in the hot word database; if the hot word code 1001110010111011 does not exist in the hot word database, the comparison result is that the first signal is inconsistent with all the hot words in the hot word database.
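The code comparison described above can be sketched as a simple membership test. This is a literal reading of the text, which treats a signal as a single digital code and a match as exact equality; a practical hot word detector would more likely score acoustic features against a model. The function name `compare_with_hot_words` and the second database entry are hypothetical.

```python
from typing import Optional, Set

# Hot word database as a set of code strings. The first entry is the
# example code from the text; the second is a made-up placeholder.
HOT_WORD_CODES: Set[str] = {"1001110010111011", "0110001101000100"}


def compare_with_hot_words(signal_code: str,
                           hot_word_codes: Set[str] = HOT_WORD_CODES
                           ) -> Optional[str]:
    """Return the matching hot word code, or None if no code matches."""
    return signal_code if signal_code in hot_word_codes else None


first_result = compare_with_hot_words("1001110010111011")   # consistent
other_result = compare_with_hot_words("1111000011110000")   # inconsistent
```

Returning the matched code rather than a bare boolean keeps enough information to check later whether both signals matched the same hot word.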
- after the first signal and the second signal are compared with the hot word database to obtain the first comparison result and the second comparison result, whether or not to start the voice assistant system can be determined based on the combination of the first comparison result and the second comparison result, based only on the second comparison result, or based only on the first comparison result.
- whether to start the voice assistant system can be determined based on the combination of the first comparison result and the second comparison result. Specifically, if the first signal is inconsistent with all the hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code of the hot word database, but the second signal is inconsistent with that hot word code, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code of the hot word database, but the second signal is inconsistent with all the hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code in the hot word database, and the second signal is consistent with that hot word code, it is determined to start the voice assistant system, which can effectively avoid false wakeups. In terms of avoiding false wakeups, this is an optimal implementation.
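The four cases of the combined decision reduce to one rule: start only when both signals match the same hot word code. A minimal sketch follows, assuming each comparison result is represented as the matched code string or `None` when nothing matched; this representation and the name `should_start` are illustrative assumptions, not the disclosure's own interface.

```python
from typing import Optional


def should_start(first_match: Optional[str],
                 second_match: Optional[str]) -> bool:
    """Combined decision on the air (first) and body (second) results."""
    if first_match is None:
        # Case 1: first signal matches no hot word code.
        return False
    if second_match is None:
        # Case 3: first signal matches, second matches no code at all.
        return False
    # Case 2 (different codes) gives False, case 4 (same code) gives True.
    return first_match == second_match


hot = "1001110010111011"
decisions = [
    should_start(None, hot),                 # case 1
    should_start(hot, "0110001101000100"),   # case 2
    should_start(hot, None),                 # case 3
    should_start(hot, hot),                  # case 4
]
```

Only the last entry of `decisions` is `True`, which reflects why a replayed or third-party voice (present in the air path but absent from the body-vibration path) does not wake the assistant.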
- in the control method of the voice assistant system, it is also possible to determine whether to start the voice assistant system based only on the second comparison result. Specifically, if the second signal is inconsistent with all the hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the second signal is consistent with one hot word code in the hot word database, it is determined to start the voice assistant system. In this way, false wakeups can be avoided to a certain extent.
- in this case, a mode selection can be used to stop picking up voice instructions through air vibration, together with the subsequent conversion and comparison operations, so that voice instructions are picked up only through human body vibration and only the corresponding conversion and comparison operations are performed. This effectively reduces energy consumption and can serve as a sub-optimal implementation that balances preventing false wakeups against reducing energy consumption.
- in the control method of the voice assistant system, it is also possible to determine whether to start the voice assistant system based only on the first comparison result. Specifically, if the first signal is inconsistent with all the hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code in the hot word database, it is determined to start the voice assistant system.
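The three decision bases (combined, second result only, first result only) can be sketched as a mode switch. The enum `Mode`, its member names and the function `decide` are hypothetical; comparison results are assumed to be the matched hot word code or `None`.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    COMBINED = "combined"        # both comparison results
    SECOND_ONLY = "second_only"  # body-vibration result only
    FIRST_ONLY = "first_only"    # air-vibration result only


def decide(mode: Mode,
           first_match: Optional[str],
           second_match: Optional[str]) -> bool:
    """Return True if the voice assistant system should be started."""
    if mode is Mode.COMBINED:
        return first_match is not None and first_match == second_match
    if mode is Mode.SECOND_ONLY:
        # The air-conduction path can be switched off in this mode.
        return second_match is not None
    return first_match is not None


hot = "1001110010111011"
# A hot word heard only through the air, e.g. spoken by a bystander:
bystander = {mode: decide(mode, hot, None) for mode in Mode}
```

For the bystander case only `Mode.FIRST_ONLY` starts the assistant, which illustrates why the first-result-only mode offers no protection against third-party voice.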
- the control method of the voice assistant system may further include: picking up environmental noise; performing, based on the environmental noise, noise reduction on the voice command picked up by air vibration; and converting the noise-reduced voice command into the first signal.
- Fig. 2 shows a flowchart of a control method 200 of a voice assistant system according to this embodiment. As shown in Fig. 2, in step S210, the voice command is picked up by air vibration; in S220, the voice command is picked up by the human body vibration of the sender of the voice command; in S230, the environmental noise is picked up; in S240, based on the environmental noise, noise reduction is performed on the voice command picked up by air vibration; in S250, the noise-reduced voice command is converted into the first signal, and the voice command picked up by the human body vibration of the sender is converted into the second signal; in S260, the first signal is compared with the hot word database to obtain the first comparison result, and the second signal is compared with the hot word database to obtain the second comparison result; in S270, whether to start the voice assistant system is determined based on the first comparison result and/or the second comparison result.
- the pickup of environmental noise in step S230 is performed simultaneously with the pickup of voice instructions by air vibration in step S210 and the pickup of voice instructions by human body vibration of the sender of the voice instruction in step S220.
- in step S230, the environmental noise in the external environment is picked up, and then in step S240, noise reduction is performed, based on the environmental noise, on the voice command picked up by air vibration. Specifically, the environmental noise is analyzed and its features are extracted; on this basis, a noise signal with a phase difference of 180 degrees from the environmental noise is generated and superimposed on the voice signal picked up by air vibration, so that the environmental noise in the voice signal is removed by cancellation.
- any existing active noise reduction technology can be used to eliminate the environmental noise in the voice picked up by air vibration.
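The cancellation principle can be shown with an idealised sketch. It assumes the noise captured by the reference pickup is sample-for-sample identical to the noise contained in the air-conduction signal, which real devices can only approximate (practical active noise reduction typically relies on adaptive filtering). The function names and sample values are made up for illustration.

```python
def anti_noise(noise):
    """A 180-degree phase shift is a sign inversion of each sample."""
    return [-x for x in noise]


def cancel(air_signal, noise):
    """Superimpose the inverted noise on the air-conduction signal."""
    return [s + n for s, n in zip(air_signal, anti_noise(noise))]


voice = [0.10, 0.20, -0.10, 0.05]    # hypothetical clean voice samples
noise = [0.03, -0.02, 0.01, 0.04]    # hypothetical environmental noise
picked_up = [v + n for v, n in zip(voice, noise)]

cleaned = cancel(picked_up, noise)   # approximately equal to voice
```

Under the stated assumption the inverted noise cancels exactly (up to floating-point rounding), leaving only the voice samples for conversion into the first signal.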
- the control method of the voice assistant system may further include: detecting whether the earphone is worn by the sender of the voice command; and if the earphone is not worn by the sender of the voice command, determining not to start the voice assistant system.
- the above-mentioned operations of the control method can be performed before all other operations, which can prevent the occurrence of false wakeup when the user is not wearing a headset.
- any method in the prior art can be used to detect whether the earphone is worn by the person issuing the voice command.
- for example, a photoelectric sensor can be installed on the head of the earphone to detect whether the earphone is inserted into the ear canal of the user and thereby determine whether the earphone is worn; alternatively, a motion sensor provided in the earphone can be used to detect whether the action of putting on the earphone occurs and thereby determine whether the earphone is worn.
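Running the wearing check before all other operations can be sketched as a guard in front of the hot word pipeline. The sensor and the pipeline are modelled as plain callables; `is_worn`, `run_hot_word_check` and `control_with_wear_check` are hypothetical names rather than an interface defined by the disclosure.

```python
from typing import Callable


def control_with_wear_check(is_worn: Callable[[], bool],
                            run_hot_word_check: Callable[[], bool]) -> bool:
    """Decide whether to start the assistant, checking wear state first."""
    if not is_worn():
        # Earphone not worn: do not start, and skip pickup, conversion
        # and comparison entirely.
        return False
    return run_hot_word_check()


calls = []


def fake_hot_word_check() -> bool:
    """Stand-in for the pickup/convert/compare pipeline; always matches."""
    calls.append("checked")
    return True


not_worn_decision = control_with_wear_check(lambda: False, fake_hot_word_check)
calls_when_not_worn = list(calls)
worn_decision = control_with_wear_check(lambda: True, fake_hot_word_check)
```

Placing the guard first means that an unworn earphone lying on a table never reaches the comparison stage, regardless of what its microphones pick up.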
- a control device for a voice assistant system is also provided, and the control device is used for controlling the start of the voice assistant system in the earphone based on voice instructions and a hot word database.
- the control device of this voice assistant system can be used in hardware products including Bluetooth headsets to implement the control method of the voice assistant system in the previous content of the present disclosure.
- the control device of the voice assistant system may include: a first pickup unit that picks up a voice instruction through air vibration; a second pickup unit that picks up the voice instruction through the human body vibration of the issuer of the voice instruction; a conversion unit that converts the voice command picked up by air vibration into a first signal, and converts the voice command picked up by the human body vibration of the sender of the voice command into a second signal; a comparison unit that compares the first signal with the hot word database to obtain a first comparison result, and compares the second signal with the hot word database to obtain a second comparison result; and a calculation unit that determines, based on the first comparison result and/or the second comparison result, whether to start the voice assistant system.
- the control device 500 includes: a first pickup unit 510 that picks up voice instructions through air vibration; a second pickup unit 520 that picks up the voice instruction through the human body vibration of the issuer of the voice instruction; a conversion unit 530 that converts the voice instruction picked up by air vibration into a first signal, and converts the voice instruction picked up by the human body vibration of the issuer into a second signal; a comparison unit 540 that compares the first signal with the hot word database to obtain a first comparison result, and compares the second signal with the hot word database to obtain a second comparison result; and a calculation unit 550 that determines whether to start the voice assistant system based on the first comparison result and/or the second comparison result.
- the first pickup unit may be any suitable microphone, including a dynamic (electrodynamic), aluminum ribbon, capacitive, piezoelectric, electromagnetic, carbon particle or semiconductor microphone, or even a MEMS microphone array; the second pickup unit can be a bone conduction microphone.
- the conversion unit, the comparison unit, and the calculation unit may be separate digital processors, or they may be integrated together in the same digital processor.
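The division of the device into units can be sketched as one object whose fields and methods stand in for units 510 to 550. This is an illustrative software model only: the pickup units are replaced by callables returning an integer level, and `ControlDevice`, `to_code` and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


def to_code(level: int) -> str:
    """Toy conversion unit: encode an integer level as a 16-bit code."""
    return format(level, "016b")


@dataclass
class ControlDevice:
    first_pickup: Callable[[], int]    # stand-in for pickup unit 510 (air)
    second_pickup: Callable[[], int]   # stand-in for pickup unit 520 (body)
    convert: Callable[[int], str]      # stand-in for conversion unit 530
    hot_word_codes: Set[str]           # hot word database

    def compare(self, code: str) -> Optional[str]:
        """Comparison unit 540: matched hot word code, or None."""
        return code if code in self.hot_word_codes else None

    def run(self) -> bool:
        """Calculation unit 550: combined decision on both results."""
        first = self.compare(self.convert(self.first_pickup()))
        second = self.compare(self.convert(self.second_pickup()))
        return first is not None and first == second


database = {"1001110010111011"}  # example code from the text (40123)
user_speaks = ControlDevice(lambda: 40123, lambda: 40123, to_code, database)
bystander_speaks = ControlDevice(lambda: 40123, lambda: 0, to_code, database)

user_decision = user_speaks.run()
bystander_decision = bystander_speaks.run()
```

Whether these roles run on separate processors or on one, as the text allows, does not change the data flow: two pickups, one conversion per channel, one comparison per channel, one decision.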
- the first signal and the second signal are both digital signals obtained through analog-to-digital conversion, and the hot word database contains at least one hot word code.
- in the comparison unit, the first comparison result and the second comparison result are obtained by comparing the first signal and the second signal with the hot word database.
- the calculation unit can determine whether or not to start the voice assistant system based on the combination of the first comparison result and the second comparison result, based only on the second comparison result, or based only on the first comparison result.
- when the combination of the two comparison results is used: if the first signal in the comparison unit is inconsistent with all the hot word codes of the hot word database, the calculation unit determines not to start the voice assistant system; or if the first signal in the comparison unit is consistent with one hot word code in the hot word database, and the second signal is inconsistent with that hot word code, the calculation unit determines not to start the voice assistant system; or if the first signal in the comparison unit is consistent with one hot word code of the hot word database, but the second signal is inconsistent with all the hot word codes of the hot word database, the calculation unit determines not to start the voice assistant system; or if the first signal in the comparison unit is consistent with one hot word code of the hot word database, and the second signal is consistent with that hot word code, the calculation unit determines to start the voice assistant system.
- when only the second comparison result is used: if the second signal in the comparison unit is inconsistent with all the hot word codes of the hot word database, the calculation unit determines not to start the voice assistant system; or if the second signal in the comparison unit is consistent with one hot word code in the hot word database, the calculation unit determines to start the voice assistant system.
- when only the first comparison result is used: if the first signal in the comparison unit is inconsistent with all the hot word codes of the hot word database, the calculation unit determines not to start the voice assistant system; or if the first signal in the comparison unit is consistent with one hot word code in the hot word database, the calculation unit determines to start the voice assistant system.
- the control device of the voice assistant system may further include: a third pickup unit that picks up environmental noise; and a processing unit that, based on the environmental noise, performs noise reduction on the voice command picked up by air vibration, wherein the conversion unit converts the noise-reduced voice command into the first signal.
- Fig. 4 shows a control device 600 of the voice assistant system according to this embodiment. As shown in Fig. 4, the control device 600 includes: a first pickup unit 610 that picks up voice instructions through air vibration; a second pickup unit 620 that picks up the voice instruction through the human body vibration of the sender of the voice instruction; a third pickup unit 630 that picks up environmental noise; a processing unit 640 that, based on the environmental noise, performs noise reduction on the voice instruction picked up by air vibration; a conversion unit 650 that converts the voice command picked up by air vibration into a first signal, and converts the voice command picked up by the human body vibration of the sender of the voice command into a second signal; a comparison unit 660 that compares the first signal with the hot word database to obtain a first comparison result, and compares the second signal with the hot word database to obtain a second comparison result; and a calculation unit 670 that determines whether to start the voice assistant system based on the first comparison result and/or the second comparison result.
- The first pickup unit may be any suitable microphone, including an electric, aluminum ribbon, capacitive, piezoelectric, electromagnetic, carbon particle, or semiconductor microphone, and may even be a MEMS microphone array; the second pickup unit may be a bone conduction microphone; the third pickup unit may likewise be any suitable microphone of the types listed for the first pickup unit, or even a MEMS microphone array.
- The processing unit, the conversion unit, the comparison unit, and the calculation unit may be separate digital processors, or they may be integrated in a single digital processor.
- The control device of the voice assistant system may further include a wearing detection unit that detects whether the earphone is worn by the sender of the voice command; if the earphone is not worn by the sender of the voice command, the calculation unit determines not to start the voice assistant system.
- The wearing detection unit may be any prior-art device for detecting whether the earphone is worn by the person issuing the voice command. For example, it may be a photoelectric sensor arranged on the head of the earphone, which detects whether the earphone is inserted into the user's ear canal to determine whether the earphone is worn; or it may be a motion sensor provided in the earphone, which detects whether the earphone is being worn.
- The present disclosure also provides a Bluetooth headset that includes the aforementioned control device of the voice assistant system.
- Fig. 5 shows a Bluetooth headset 900 according to an embodiment of the present disclosure.
- The Bluetooth headset 900 contains the control device of the above-mentioned voice assistant system, which can effectively prevent false wake-up.
- In addition to the environmental microphone 930, a photoelectric sensor or a motion sensor (not shown) may be provided on the head of the Bluetooth headset 900 to detect whether the headset is in the in-ear state.
- In the description of this specification, reference to the terms “one embodiment/mode”, “some embodiments/modes”, “examples”, “specific examples”, or “some examples” means that the specific features, structures, materials, or characteristics described in connection with that embodiment/mode or example are included in at least one embodiment/mode or example of the present disclosure.
- In this specification, schematic uses of the aforementioned terms do not necessarily refer to the same embodiment/mode or example.
- Moreover, the described specific features, structures, materials, or characteristics may be combined in an appropriate manner in any one or more embodiments/modes or examples.
- In addition, those skilled in the art can combine the different embodiments/modes or examples described in this specification, and the features of those embodiments/modes or examples, provided they do not contradict each other.
- The terms “first” and “second” are used only for descriptive purposes and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, a feature defined with “first” or “second” may explicitly or implicitly include at least one such feature. In the description of the present disclosure, “a plurality of” means at least two, such as two or three, unless otherwise specifically defined.
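The start rules described in the embodiments above (both signals, second signal only, first signal only, and the wearing check) can be illustrated with a minimal Python sketch. All function names here are hypothetical, and modeling "consistent with a hot word code" as exact byte equality is a simplifying assumption: the disclosure does not specify how consistency is judged.

```python
from typing import Optional, Sequence


def find_hot_word(signal: bytes, hot_words: Sequence[bytes]) -> Optional[bytes]:
    """Return the hot word code the signal is consistent with, or None.

    Assumption: "consistent" is modeled as exact byte equality.
    """
    for code in hot_words:
        if signal == code:
            return code
    return None


def start_on_both(first: bytes, second: bytes, hot_words: Sequence[bytes]) -> bool:
    """Dual-signal rule: start only if both signals match the same hot word code."""
    matched = find_hot_word(first, hot_words)
    if matched is None:
        return False  # first signal matches no hot word code
    # A second signal that matches another code, or no code, is rejected.
    return second == matched


def start_on_second_only(second: bytes, hot_words: Sequence[bytes]) -> bool:
    """Body-vibration rule: start if the second signal matches any hot word code."""
    return find_hot_word(second, hot_words) is not None


def start_on_first_only(first: bytes, hot_words: Sequence[bytes]) -> bool:
    """Air-vibration rule: start if the first signal matches any hot word code."""
    return find_hot_word(first, hot_words) is not None


def start_if_worn(is_worn: bool, decision: bool) -> bool:
    """Wearing check: never start when the earphone is not worn."""
    return is_worn and decision


# Hypothetical hot word codes, for illustration only.
HOT_WORDS = [b"hello-assistant", b"ok-assistant"]
print(start_on_both(b"hello-assistant", b"hello-assistant", HOT_WORDS))  # True
print(start_on_both(b"hello-assistant", b"ok-assistant", HOT_WORDS))     # False
```

In this sketch the dual-signal rule rejects a hot word spoken by a bystander: the air-conducted signal may match, but the body-conducted signal does not, so the assistant is not started.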
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Telephone Function (AREA)
- Machine Translation (AREA)
Claims (10)
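- 1. A control method for a voice assistant system, used in an earphone to control the starting of the voice assistant system based on a voice command and a hot word database, characterized in that the control method comprises: picking up the voice command through air vibration; picking up the voice command through the human body vibration of the sender of the voice command; converting the voice command picked up through air vibration into a first signal, and converting the voice command picked up through the human body vibration of the sender of the voice command into a second signal; comparing the first signal with the hot word database to obtain a first comparison result, and comparing the second signal with the hot word database to obtain a second comparison result; and determining whether to start the voice assistant system based on the first comparison result and/or the second comparison result.
- 2. The control method according to claim 1, characterized in that the first signal and the second signal are both digital signals obtained through analog-to-digital conversion, and the hot word database contains at least one hot word code.
- 3. The control method according to claim 2, characterized in that: if the first signal is inconsistent with all hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code of the hot word database but the second signal is inconsistent with that hot word code, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code of the hot word database but the second signal is inconsistent with all hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the first signal is consistent with one hot word code of the hot word database and the second signal is consistent with that hot word code, it is determined to start the voice assistant system.
- 4. The control method according to claim 2, characterized in that: if the second signal is inconsistent with all hot word codes of the hot word database, it is determined not to start the voice assistant system; or if the second signal is consistent with one hot word code of the hot word database, it is determined to start the voice assistant system.
- 5. The control method according to any one of claims 1-4, characterized in that the control method further comprises: picking up environmental noise; performing noise reduction, based on the environmental noise, on the voice command picked up through air vibration; and converting the noise-reduced voice command into the first signal.
- 6. The control method according to any one of claims 1-5, characterized in that the control method further comprises: detecting whether the earphone is worn by the sender of the voice command; and if the earphone is not worn by the sender of the voice command, determining not to start the voice assistant system.
- 7. A control device for a voice assistant system, used in an earphone to control the starting of the voice assistant system based on a voice command and a hot word database, characterized in that the control device comprises: a first pickup unit, which picks up the voice command through air vibration; a second pickup unit, which picks up the voice command through the human body vibration of the sender of the voice command; a conversion unit, which converts the voice command picked up through air vibration into a first signal and converts the voice command picked up through the human body vibration of the sender of the voice command into a second signal; a comparison unit, which compares the first signal with the hot word database to obtain a first comparison result and compares the second signal with the hot word database to obtain a second comparison result; and a calculation unit, which determines whether to start the voice assistant system based on the first comparison result and/or the second comparison result.
- 8. The control device according to claim 7, characterized in that the first signal and the second signal are both digital signals obtained through analog-to-digital conversion, and the hot word database contains at least one hot word code; if, in the comparison unit, the first signal is inconsistent with all hot word codes of the hot word database, the calculation unit determines not to start the voice assistant system; or if, in the comparison unit, the first signal is consistent with one hot word code of the hot word database but the second signal is inconsistent with that hot word code, the calculation unit determines not to start the voice assistant system; or if, in the comparison unit, the first signal is consistent with one hot word code of the hot word database but the second signal is inconsistent with all hot word codes of the hot word database, the calculation unit determines not to start the voice assistant system; or if, in the comparison unit, the first signal is consistent with one hot word code of the hot word database and the second signal is consistent with that hot word code, the calculation unit determines to start the voice assistant system; or if, in the comparison unit, the second signal is inconsistent with all hot word codes of the hot word database, the calculation unit determines not to start the voice assistant system; or if, in the comparison unit, the second signal is consistent with one hot word code of the hot word database, the calculation unit determines to start the voice assistant system.
- 9. The control device according to claim 7 or 8, characterized in that the control device further comprises: a third pickup unit, which picks up environmental noise, and a processing unit, which performs noise reduction, based on the environmental noise, on the voice command picked up through air vibration, wherein the conversion unit converts the noise-reduced voice command into the first signal; and/or a wearing detection unit, which detects whether the earphone is worn by the sender of the voice command, wherein, if the earphone is not worn by the sender of the voice command, the calculation unit determines not to start the voice assistant system.
- 10. A Bluetooth headset, characterized in that the Bluetooth headset includes the control device according to any one of claims 7-9.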
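To show how the units recited in the device claims fit together, here is a minimal Python sketch of the processing chain: wearing detection, noise reduction, conversion, comparison, and calculation. Everything in it is an illustrative assumption rather than the claimed implementation: `ControlDevice`, `toy_convert`, and `toy_denoise` are hypothetical names, the 1-bit sign quantizer stands in for analog-to-digital conversion, sample-wise subtraction stands in for an unspecified noise-reduction method, and the calculation step uses only the dual-signal rule.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Samples = Sequence[float]


def toy_convert(samples: Samples) -> bytes:
    """Hypothetical conversion unit: 1-bit quantization by sign."""
    return bytes(1 if s > 0 else 0 for s in samples)


def toy_denoise(voice: Samples, noise: Samples) -> Samples:
    """Hypothetical processing unit: sample-wise subtraction of the noise."""
    return [v - n for v, n in zip(voice, noise)]


@dataclass
class ControlDevice:
    hot_words: Sequence[bytes]
    convert: Callable[[Samples], bytes] = toy_convert
    denoise: Optional[Callable[[Samples, Samples], Samples]] = None
    is_worn: Optional[Callable[[], bool]] = None

    def should_start(self, air: Samples, body: Samples,
                     ambient: Optional[Samples] = None) -> bool:
        # Wearing detection unit: refuse to start when the earphone is not worn.
        if self.is_worn is not None and not self.is_worn():
            return False
        # Processing unit: denoise only the air-conducted pickup.
        if self.denoise is not None and ambient is not None:
            air = self.denoise(air, ambient)
        # Conversion unit: produce the first and second signals.
        first = self.convert(air)
        second = self.convert(body)
        # Comparison unit: check each signal against the hot word database.
        first_result = first in self.hot_words
        second_result = second in self.hot_words
        # Calculation unit: start only if both match the same hot word code.
        return first_result and second_result and first == second


# Usage with a made-up hot word code and made-up samples.
device = ControlDevice(hot_words=[bytes([1, 0, 1])], denoise=toy_denoise)
print(device.should_start([0.9, 0.3, 0.7], [0.5, -0.1, 0.3],
                          ambient=[0.0, 0.5, 0.0]))  # True
```

Passing the units in as callables mirrors the statement in the description that the processing, conversion, comparison, and calculation units may be separate processors or integrated in one; the optional fields correspond to the optional third pickup/processing units and wearing detection unit.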
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910391232.7 | 2019-05-11 | | |
CN201910391232.7A CN110265007B (zh) | 2019-05-11 | 2019-05-11 | Control method and control device for voice assistant system, and Bluetooth headset |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020228332A1 (zh) | 2020-11-19 |
Family
ID=67914603
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/127460 WO2020228332A1 (zh) | 2019-05-11 | 2019-12-23 | Control method and control device for voice assistant system, and Bluetooth headset |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110265007B (zh) |
WO (1) | WO2020228332A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
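CN110265007B (zh) * | 2019-05-11 | 2020-07-24 | 出门问问信息科技有限公司 | Control method and control device for voice assistant system, and Bluetooth headset |
CN111862975A (zh) * | 2020-07-15 | 2020-10-30 | 百度在线网络技术(北京)有限公司 | Intelligent terminal control method, apparatus, device, storage medium and system |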
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090271190A1 (en) * | 2008-04-25 | 2009-10-29 | Nokia Corporation | Method and Apparatus for Voice Activity Determination |
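CN103871419A (zh) * | 2012-12-11 | 2014-06-18 | 联想(北京)有限公司 | Information processing method and electronic device |
CN106686494A (zh) * | 2016-12-27 | 2017-05-17 | 广东小天才科技有限公司 | Voice input control method for wearable device, and wearable device |
CN106847275A (zh) * | 2016-12-27 | 2017-06-13 | 广东小天才科技有限公司 | Method for controlling wearable device, and wearable device |
CN106992015A (zh) * | 2015-12-22 | 2017-07-28 | 恩智浦有限公司 | Voice activation system |
CN108847221A (zh) * | 2018-06-19 | 2018-11-20 | Oppo广东移动通信有限公司 | Speech recognition method, apparatus, storage medium and electronic device |
CN110265007A (zh) * | 2019-05-11 | 2019-09-20 | 出门问问信息科技有限公司 | Control method and control device for voice assistant system, and Bluetooth headset |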
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9978397B2 (en) * | 2015-12-22 | 2018-05-22 | Intel Corporation | Wearer voice activity detection |
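CN106714023B (zh) * | 2016-12-27 | 2019-03-15 | 广东小天才科技有限公司 | Voice wake-up method and system based on bone conduction earphone, and bone conduction earphone |
CN107678793A (zh) * | 2017-09-14 | 2018-02-09 | 珠海市魅族科技有限公司 | Voice assistant starting method and device, terminal and computer-readable storage medium |
CN109729463A (zh) * | 2017-10-27 | 2019-05-07 | 北京金锐德路科技有限公司 | Combined acoustic-microphone and bone-microphone sound pickup device for neck-worn voice interaction earphone |
CN109346075A (zh) * | 2018-10-15 | 2019-02-15 | 华为技术有限公司 | Method and system for recognizing user voice through human body vibration to control electronic device |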
2019
- 2019-05-11: CN application CN201910391232.7A, patent CN110265007B (zh), status: Active
- 2019-12-23: WO application PCT/CN2019/127460, patent WO2020228332A1 (zh), status: Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN110265007B (zh) | 2020-07-24 |
CN110265007A (zh) | 2019-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11748462B2 | | Biometric authentication |
US11693939B2 | | Ear proximity detection |
US9324322B1 | | Automatic volume attenuation for speech enabled devices |
US20200184057A1 | | Headset for Acoustic Authentication of a User |
WO2019233228A1 (zh) | | Electronic device and device control method |
US20190147890A1 | | Audio peripheral device |
GB2608710A | | Speaker identification |
WO2020228332A1 (zh) | | Control method and control device for voice assistant system, and Bluetooth headset |
US11900730B2 | | Biometric identification |
US11918345B2 | | Cough detection |
US11894000B2 | | Authenticating received speech |
JP2002358089A (ja) | | Speech processing device and speech processing method |
US10916248B2 | | Wake-up word detection |
US11290802B1 | | Voice detection using hearable devices |
US11488606B2 | | Audio system with digital microphone |
CN214226506U (zh) | | Sound processing circuit, electroacoustic device and sound processing system |
CN106653060B (zh) | | Blowing sound recognition system and blowing recognition method using the system |
US20220366932A1 | | Methods and apparatus for detecting singing |
US11393449B1 | | Methods and apparatus for obtaining biometric data |
CN110166863B (zh) | | In-ear voice device |
KR102562180B1 (ko) | | Wearable sound conversion device |
TWI697891B (zh) | | In-ear voice device |
JP2004317942A (ja) | | Speech processing device, speech recognition device and speech processing method |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19928450; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 19928450; Country of ref document: EP; Kind code of ref document: A1 |
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.03.2022) |
122 | Ep: pct application non-entry in european phase | Ref document number: 19928450; Country of ref document: EP; Kind code of ref document: A1 |