CN113763945B - Voice awakening method, device, equipment and storage medium - Google Patents

Publication number
CN113763945B
Authority
CN
China
Prior art keywords
sound signal
voice
wake
low
signal
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011595654.5A
Other languages
Chinese (zh)
Other versions
CN113763945A (en)
Inventor
于书涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202011595654.5A
Publication of CN113763945A
Application granted
Publication of CN113763945B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Electric Clocks (AREA)
  • Toys (AREA)

Abstract

The embodiments of the invention disclose a voice wake-up method, apparatus, device and storage medium. The method comprises: acquiring a low-frequency sound signal collected accurately by a built-in low-frequency recording module and device sound signals of all frequency bands collected accurately by a built-in stoping circuit; removing the low-frequency sound signal and the device sound signal from the original sound signal to obtain a voice wake-up signal; and extracting the wake-up word from the voice wake-up information corresponding to the voice wake-up signal and executing a wake-up operation based on that wake-up word.

Description

Voice awakening method, device, equipment and storage medium
Technical Field
The embodiments of the invention relate to voice interaction technology, and in particular to a voice wake-up method, apparatus, device and storage medium.
Background
With the development of speech recognition technology, voice interaction is becoming a popular control method. Before voice interaction can take place, the smart device must be activated from a sleep state to an operating state, that is, the device must be woken up by voice. The quality of voice wake-up therefore directly affects the voice interaction experience.
Currently, to ensure its wake-up rate, a voice wake-up device collects external sound signals with a microphone (Mic) array (such as a 2-Mic, 4-Mic or 6-Mic array), performs noise reduction and echo cancellation on the collected sound, extracts the wake-up word, and decides on the basis of the extracted wake-up word whether to trigger the smart device to perform the wake-up action.
In the course of realizing the invention, at least the following problem was found in the prior art:
The microphone array of a voice wake-up device actually collects sound in the 20 Hz-8 kHz band, but the frequency response (that is, the amplitude-frequency or phase-frequency characteristic) of external sound signals (such as ambient noise, noise from surrounding equipment and noise from the voice wake-up device itself) is heavily distorted in the low frequency band (20 Hz-100 Hz). This low-band distortion degrades noise reduction, echo cancellation and similar processing, which lowers the voice wake-up rate and harms the user experience.
Disclosure of Invention
The embodiments of the invention provide a voice wake-up method, apparatus, device and storage medium that improve the wake-up rate of a voice wake-up device and thereby improve the user experience.
In a first aspect, an embodiment of the present invention provides a voice wake-up method, including:
Acquiring a low-frequency sound signal acquired through a built-in low-frequency recording module, a stoping sound signal acquired through a built-in stoping circuit and an original sound signal acquired through a built-in microphone module;
determining a voice wake-up signal according to the low-frequency sound signal, the stoping sound signal and the original sound signal;
and extracting wake-up words in voice wake-up information corresponding to the voice wake-up signals, and executing wake-up operation based on the wake-up words.
In a second aspect, an embodiment of the present invention further provides a voice wake apparatus, including:
The device comprises a main control module, a low-frequency recording module, a microphone module, a stoping circuit and an output module;
The low-frequency recording module is used for collecting low-frequency sound signals and sending the low-frequency sound signals to the main control module;
The microphone module is used for collecting original sound signals and sending the original sound signals to the main control module;
The stoping circuit is used for collecting sound signals sent by the voice wake-up device, converting them into stoping sound signals and sending the stoping sound signals to the main control module;
The main control module is used for determining a voice wake-up signal according to the low-frequency sound signal, the stoping sound signal and the original sound signal, extracting wake-up words in voice wake-up information corresponding to the voice wake-up signal, and executing wake-up operation based on the wake-up words;
the output module is used for playing the voice feedback information corresponding to the wake-up operation.
In a third aspect, embodiments of the present invention also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, implement the voice wake-up method of the first aspect.
In the technical solution provided by this embodiment, the low-frequency recording module accurately collects the low-frequency sound signal and the stoping circuit accurately collects the device sound signals of all frequency bands. Both signals are removed from the original sound signal to obtain an accurate voice wake-up signal. The wake-up word is then extracted from the voice wake-up information corresponding to the voice wake-up signal, and the wake-up operation is performed based on it. This improves the accuracy with which the main control module recognizes the wake-up word and, in turn, the wake-up rate of the voice wake-up device.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a frequency response performance index of a microphone module of a voice wake-up device of the prior art;
FIG. 2 is a frequency response curve of a microphone module according to the prior art;
FIG. 3 is a diagram showing a sound distortion curve generated by a speaker of a voice wake-up device collected by a microphone module according to the prior art;
FIG. 4 is a flowchart of a voice wake-up method according to a first embodiment of the present invention;
FIG. 5 is a frequency response curve of a low-frequency recording module according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a voice wake-up device according to a second embodiment of the present invention;
FIG. 7 is a schematic block diagram of a voice wake-up device according to a third embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all structures related to the present invention are shown in the drawings.
When a voice wake-up device performs a wake-up operation based on collected sound signals, the microphone module collects the sound signals, the device processes them and extracts the wake-up word, and the voice wake-up operation is executed according to the extracted wake-up word. The collected sound signals include voice wake-up signals of all frequency bands (for example, 20 Hz-8 kHz), noise signals of all frequency bands and device sound signals of all frequency bands. In practical testing, the frequency response of the collected low-band sound signal is poor because of the limits of the device's own hardware. FIG. 1 shows the frequency response performance index of the microphone module of a voice wake-up device, and FIG. 2 shows the frequency response curve actually measured for the microphone module. Comparing FIG. 1 and FIG. 2 shows that the microphone module has a poor frequency response for low-band (for example, 20 Hz-100 Hz) sound signals, which include low-band device sound signals, low-band noise signals and the like. The microphone module therefore cannot record the sound signal accurately and faithfully, which impairs the noise reduction and echo cancellation that the main control module of the voice wake-up device performs on the sound signal, and in turn lowers the wake-up rate and harms the user experience. Low-band noise includes, for example, the running noise of a central air conditioner, a water pump or a fan. FIG. 3 shows the distortion curve of the sound emitted by the speaker of the voice wake-up device as collected by the microphone module. It can be seen from FIG. 3 that the device sound signal collected by the microphone module is also heavily distorted in the low band (for example, 20 Hz-150 Hz), with a distortion rate of 12% or more.
Example 1
FIG. 4 is a flowchart of a voice wake-up method according to a first embodiment of the present invention. The embodiment is applicable to waking up a smart speaker based on a collected low-frequency sound signal, stoping sound signal and original sound signal, and may also be used in other voice interaction scenarios, for example application scenarios in which smart home appliances or mobile terminals are controlled by sound. The method can be performed by a voice wake-up apparatus, which can be implemented in software and/or hardware and integrated into a device with a voice playback function, such as a smart speaker, a smart home appliance or a mobile terminal. This embodiment is explained taking a smart speaker as an example. The method specifically comprises the following steps:
S110, acquiring a low-frequency sound signal collected by a built-in low-frequency recording module, a stoping sound signal collected by a built-in stoping circuit and an original sound signal collected by a built-in microphone module.
The low-frequency recording module is a microphone built into the smart speaker for collecting low-frequency sound signals. FIG. 5 shows the frequency response curve of the low-frequency recording module. It can be seen from FIG. 5 that the module has a good frequency response in the low frequency band (for example, 20 Hz-200 Hz), so that low-frequency sound signals can be recorded accurately and faithfully. The low-frequency recording module may be a single microphone (Mic) or a microphone array, for example a 2-Mic, 4-Mic or 6-Mic array. Optionally, the low-frequency recording module may be an electret microphone or a miniature microphone. The low-frequency sound signal may include the low-band (for example, 20 Hz-200 Hz) noise signal of the environment in which the smart speaker is located and the low-band device sound signal emitted by the smart speaker itself.
The stoping circuit is a circuit that collects back the sound signal emitted by the smart speaker itself (that is, the stoping sound signal, a loopback of the device's own output). The circuit collects the device sound signal before that signal enters the loudspeaker, and the distortion rate of the device sound signals of each frequency band that it collects is below 0.1%. The device sound signal can thus be collected completely, and the smart speaker can distinguish the sound it emits itself from the external sound it receives. The stoping sound signal comprises the device sound signals of all frequency bands (for example, 20 Hz-8 kHz) of the smart speaker. Optionally, the stoping circuit may include a voice control module, an audio processing module and an output module. The voice control module receives the voice data sent by the smart speaker and sends it to the audio processing module; the audio processing module receives the voice data and performs analog-to-digital conversion, gain processing and sound effect processing on it; the output module receives the voice data processed by the audio processing module and outputs it.
The voice control module may be a main control module (CPU), which receives and recognizes voice signals. The audio processing module may include an analog-to-digital converter, a power amplifier, an equalizer (EQ) and other components: the analog-to-digital converter performs analog-to-digital conversion on the voice data, the power amplifier applies gain to the voice signal, and the EQ equalizes the time-domain and frequency-domain signals of the voice signal to adjust its sound effect. The output module may include audio output devices such as speakers, headsets, headphones, small speakers or loudspeakers. With the stoping circuit, the device sound signal is collected before it enters the output module, so that the main control module can subsequently tell the collected sound signals apart.
The microphone module is a microphone built into the smart speaker for collecting sound signals of all frequency bands (that is, the original sound signal). The original sound signal includes voice wake-up signals of all frequency bands (for example, 20 Hz-8 kHz), low-band noise signals and device sound signals of all frequency bands. The microphone module may be a single microphone (Mic) or a linear or circular Mic array composed of microphones of the same specification.
It can be understood that, after the low-frequency recording module, the stoping circuit and the microphone module have collected the low-frequency sound signal, the stoping sound signal and the original sound signal respectively, these signals are sent to the main control module, which processes them and performs the wake-up operation on the smart speaker according to the processing result.
S120, determining a voice wake-up signal according to the low-frequency sound signal, the stoping sound signal and the original sound signal.
The voice wake-up signal is the sound signal that a user deliberately addresses to the smart speaker. As described above, the original sound signal, collected by the microphone module, includes voice wake-up signals of all frequency bands, low-band noise signals and device sound signals of all frequency bands; the stoping sound signal includes device sound signals of all frequency bands; and the low-frequency sound signal includes low-band noise signals and low-band device sound signals. Since the device sound signal is almost completely captured by the stoping circuit, the low-frequency sound signal can be treated as containing almost only the low-band noise signal, and the original sound signal as containing the voice wake-up signals of all frequency bands plus the low-band noise signal. On this basis, noise reduction is performed on the original sound signal according to the low-frequency sound signal, and echo cancellation is then performed on the noise-reduced original sound signal based on the stoping sound signal. The low-frequency sound signal and the stoping sound signal are thereby removed from the original sound signal, leaving the voice wake-up signal of all frequency bands.
Optionally, performing noise reduction on the original sound signal according to the low-frequency sound signal includes: determining noise reduction parameters based on amplitude-frequency characteristics and/or phase-frequency characteristics of the low-frequency sound signal; and performing noise reduction on the original sound signal based on the noise reduction parameters.
Optionally, performing echo cancellation on the noise-reduced original sound signal based on the stoping sound signal includes: determining echo cancellation parameters based on amplitude-frequency characteristics and/or phase-frequency characteristics of the stoping sound signal; and performing echo cancellation on the noise-reduced original sound signal based on the echo cancellation parameters.
The noise reduction parameters may include phase interval parameters and amplitude interval parameters of the low-frequency sound signal. Specifically, the frequency response curve of the low-frequency sound signal is determined, the noise reduction parameters are extracted from that curve, and noise reduction is performed on the original sound signal using a Gaussian filter or the like. In this way the low-frequency sound signal can be removed from the original sound signal.
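As an illustration of this step, the following minimal Python sketch estimates the amplitude and phase of each low-band frequency component of the low-frequency sound signal and subtracts the corresponding sinusoid from the original sound signal. It is not the Gaussian-filter approach mentioned in the text but a simpler per-frequency subtraction chosen for clarity. The function names, the 20 Hz-200 Hz default band and the assumption that both signals are time-aligned, gain-matched and of equal length are illustrative and do not come from the source.

```python
import cmath
import math


def dft_bin(signal, k):
    """Return the k-th DFT coefficient of a real-valued sequence."""
    n = len(signal)
    return sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(signal))


def remove_low_band(original, low_ref, sample_rate, band=(20.0, 200.0)):
    """Subtract the low-band components of low_ref from original.

    Assumes (hypothetically) that both signals are time-aligned,
    gain-matched and of equal length.
    """
    n = len(original)
    if len(low_ref) != n:
        raise ValueError("signals must have the same length")
    cleaned = list(original)
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        if not band[0] <= freq <= band[1]:
            continue
        coeff = dft_bin(low_ref, k)
        amplitude = 2.0 * abs(coeff) / n  # amplitude-frequency characteristic
        phase = cmath.phase(coeff)        # phase-frequency characteristic
        for i in range(n):
            cleaned[i] -= amplitude * math.cos(2.0 * math.pi * k * i / n + phase)
    return cleaned


# Toy example: a 300 Hz "voice" tone plus 50 Hz hum, sampled at 1 kHz.
RATE, N = 1000, 500
voice = [0.5 * math.sin(2 * math.pi * 300 * i / RATE) for i in range(N)]
hum = [0.8 * math.cos(2 * math.pi * 50 * i / RATE + 0.3) for i in range(N)]
original = [v + h for v, h in zip(voice, hum)]
cleaned = remove_low_band(original, hum, RATE)
```

In the toy example the hum falls exactly on a DFT bin, so the subtraction recovers the 300 Hz tone almost exactly. Real recordings would need windowing and framing, which this sketch leaves out.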
The echo cancellation parameters may include phase interval parameters and amplitude interval parameters of the stoping sound signal. Specifically, the frequency response curve of the stoping sound signal is determined, the echo cancellation parameters are extracted from that curve, and echo cancellation is performed on the original sound signal from which the low-frequency sound signal has been removed, using a least mean square (LMS) adaptive filter, a recursive least squares (RLS) filter, a lattice filter, an infinite impulse response (IIR) filter or the like. That is, the stoping sound signal is removed from the already noise-reduced original sound signal to obtain the voice wake-up signal.
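Of the filters listed, the LMS adaptive filter is the simplest to sketch. The Python example below uses the normalized variant (NLMS), which is an assumption made here for stable convergence and is not named in the source. `mic` stands for the noise-reduced original sound signal and `reference` for the stoping sound signal. The function name, tap count, step size and the toy echo path are all hypothetical.

```python
import random


def nlms_echo_cancel(mic, reference, taps=8, step=0.5, eps=1e-8):
    """Remove the echo of `reference` from `mic` with a normalized LMS filter.

    Returns the residual signal (mic minus estimated echo) and the
    learned filter weights.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # latest reference samples, newest first
    residual = []
    for m, r in zip(mic, reference):
        history = [r] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = m - echo_estimate
        power = sum(h * h for h in history) + eps
        weights = [w + step * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


# Toy example: the microphone hears only the device's own playback,
# coloured by a short (hypothetical) echo path.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(2000)]
echo_path = [0.6, -0.3, 0.1]
mic = [
    sum(g * reference[i - j] for j, g in enumerate(echo_path) if i - j >= 0)
    for i in range(len(reference))
]
residual, weights = nlms_echo_cancel(mic, reference)
```

Once the filter has adapted, the first three weights approximate the echo path and the residual approaches zero. When a user speaks at the same time, the residual would instead carry the user's voice, which is the signal passed on for wake-word extraction.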
In this way, the smart speaker accurately collects the low-frequency sound signal through the low-frequency recording module and the device sound signals of all frequency bands through the stoping circuit, and removes both from the original sound signal to obtain an accurate voice wake-up signal. This helps the main control module judge the wake-up word correctly and ultimately improves the wake-up rate.
S130, extracting wake-up words in voice wake-up information corresponding to the voice wake-up signals, and executing wake-up operation based on the wake-up words.
The voice wake-up information is the content with which the user wakes up the smart speaker, and the wake-up word may be a keyword in that content. The main control module of the smart speaker extracts and recognizes the keywords of the voice wake-up information and determines whether a keyword is the wake-up word, so that it can perform the wake-up operation according to the recognition result.
Optionally, extracting the wake-up word from the voice wake-up information corresponding to the voice wake-up signal includes: extracting at least one keyword from the voice wake-up information; and calculating the similarity between each keyword and a preset wake-up keyword, and taking any keyword whose similarity exceeds a similarity threshold as the wake-up word.
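The threshold test can be sketched in a few lines of Python. The source does not say how similarity is computed, so this sketch assumes the keywords are already text and uses the character-level ratio from the standard library's `difflib.SequenceMatcher`. The function name, the 0.8 threshold and the choice to return the single best match are likewise assumptions.

```python
from difflib import SequenceMatcher


def pick_wake_word(keywords, wake_keyword, threshold=0.8):
    """Return the keyword most similar to wake_keyword, or None.

    A keyword qualifies only if its similarity exceeds the threshold.
    """
    best_keyword, best_score = None, threshold
    for keyword in keywords:
        score = SequenceMatcher(None, keyword.lower(), wake_keyword.lower()).ratio()
        if score > best_score:
            best_keyword, best_score = keyword, score
    return best_keyword
```

For instance, `pick_wake_word(["hello", "xiaoyi"], "xiaoyi")` returns `"xiaoyi"`, while a list with no sufficiently similar keyword yields `None` and no wake-up is triggered.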
In this embodiment, the main control module may perform speech recognition on the voice wake-up information using a dynamic time warping (DTW) algorithm, a support vector machine (SVM) algorithm, a vector quantization (VQ) algorithm, a hidden Markov model (HMM), a Gaussian mixture model (GMM), a deep neural network (DNN) or the like to determine at least one keyword. It then compares each extracted keyword with the preset wake-up keyword and determines their similarity. If the similarity is greater than the similarity threshold, the keyword is taken as the wake-up word, and wake-up feedback information is generated based on the wake-up word to complete the wake-up operation.
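To make one of the listed algorithms concrete, here is a minimal Python sketch of the classic dynamic time warping distance. It compares two sequences of scalar values, whereas a practical recognizer would compare sequences of acoustic feature vectors; that simplification, the absolute-difference cost and the function name are assumptions of this sketch and are not taken from the source.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[rows][cols]
```

Because DTW lets either sequence stretch in time, the same word spoken slowly or quickly can still align with a stored template at low cost.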
Optionally, the extracted keyword or keywords may instead be input to a pre-trained wake-word recognition model, which may be trained on at least one language keyword and the wake-up word. The wake-word recognition model may be a fully convolutional network, a recurrent convolutional network, a residual network, a logistic regression model, or the like.
For example, if the voice wake-up information is "xiaoyi", the keyword extracted by the main control module is "xiaoyi". The extracted keyword matches the wake-up keyword, so the wake-up word is determined to be "xiaoyi", and the wake-up feedback information generated based on the wake-up word may be a reply such as "en", "I'm here" or "go ahead".
It should be noted that the voice wake-up signal may also be a voice control signal for controlling the voice wake-up device, for example to open or close a refrigerator or to make a mobile phone take a picture, so that the user can control the home appliance or the mobile phone by voice.
In the technical solution provided by this embodiment, the low-frequency recording module accurately collects the low-frequency sound signal and the stoping circuit accurately collects the device sound signals of all frequency bands. Both signals are removed from the original sound signal to obtain an accurate voice wake-up signal. The wake-up word is then extracted from the voice wake-up information corresponding to the voice wake-up signal, and the wake-up operation is performed based on it. This improves the accuracy with which the main control module recognizes the wake-up word and, in turn, the wake-up rate of the voice wake-up device.
Example 2
Fig. 6 is a schematic structural diagram of a voice wake-up device according to a second embodiment of the present invention. Referring to fig. 6, the apparatus includes: the signal acquisition module 210, the voice wake-up signal determination module 220, and the wake-up operation execution module 230.
The signal acquisition module 210 is configured to acquire a low-frequency sound signal acquired by the built-in low-frequency recording module, a stoping sound signal acquired by the built-in stoping circuit, and an original sound signal acquired by the built-in microphone module;
A voice wake-up signal determining module 220, configured to determine a voice wake-up signal according to the low-frequency sound signal, the stoping sound signal, and the original sound signal;
The wake-up operation execution module 230 is configured to extract a wake-up word in the voice wake-up information corresponding to the voice wake-up signal, and execute a wake-up operation based on the wake-up word.
In the technical solution provided by this embodiment, the low-frequency recording module accurately collects the low-frequency sound signal and the stoping circuit accurately collects the device sound signals of all frequency bands. Both signals are removed from the original sound signal to obtain an accurate voice wake-up signal. The wake-up word is then extracted from the voice wake-up information corresponding to the voice wake-up signal, and the wake-up operation is performed based on it. This improves the accuracy with which the main control module recognizes the wake-up word and, in turn, the wake-up rate of the voice wake-up device.
On the basis of the above technical solutions, the voice wake-up signal determining module 220 is further configured to perform noise reduction processing on the original sound signal according to the low-frequency sound signal, and perform echo cancellation on the noise-reduced original sound signal based on the stoping sound signal, so as to obtain the voice wake-up signal.
On the basis of the above technical solutions, the voice wake-up signal determining module 220 is further configured to determine a noise reduction parameter based on an amplitude-frequency characteristic and/or a phase-frequency characteristic of the low-frequency sound signal;
And carrying out noise reduction processing on the original sound signal based on the noise reduction parameters.
On the basis of the above technical solutions, the voice wake-up signal determining module 220 is further configured to determine an echo cancellation parameter based on an amplitude-frequency characteristic and/or a phase-frequency characteristic of the stoping sound signal;
and carrying out echo cancellation on the original sound signal after the noise reduction processing based on the echo cancellation parameters.
On the basis of the above technical solutions, the wake-up operation execution module 230 is further configured to extract at least one keyword of the voice wake-up information;
and calculating the similarity between each keyword and a preset wake-up keyword, and if the similarity exceeds a similarity threshold, taking that keyword as the wake-up word.
On the basis of the above technical solutions, the low-frequency sound signal includes a low-frequency noise signal and a low-frequency device sound signal, the stoping sound signal includes a device sound signal of each frequency band, and the original sound signal includes a voice wake-up signal of each frequency band, a low-frequency noise signal and a device sound signal of each frequency band.
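The composition described above can be written as an idealized additive model. The symbols are introduced here for illustration and do not appear in the patent: s(t) is the voice wake-up signal, n_L(t) the low-frequency noise, d(t) the device sound over all frequency bands, and d_L(t), d_H(t) its low-frequency and remaining parts. Gains, delays, and acoustic paths between the pickups are ignored.

```latex
\begin{aligned}
x_{\mathrm{low}}(t)   &= n_L(t) + d_L(t) \\
x_{\mathrm{stope}}(t) &= d(t) = d_L(t) + d_H(t) \\
x_{\mathrm{orig}}(t)  &= s(t) + n_L(t) + d(t) \\
s(t) &= x_{\mathrm{orig}}(t) - n_L(t) - d(t)
\end{aligned}
```

Under this model, recovering s(t) requires both references: the low-frequency signal provides an estimate of n_L(t), and the stoping signal provides d(t).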
Example III
Fig. 7 is a schematic block diagram of a voice wake-up device according to a third embodiment of the present invention, where the voice wake-up device 1 includes a main control module 11, a low frequency recording module 12, a microphone module 13, a stoping circuit 14, and an output module 15.
The low-frequency recording module 12 is configured to collect a low-frequency sound signal and send the low-frequency sound signal to the main control module;
the microphone module 13 is configured to collect an original sound signal, and send the original sound signal to the main control module 11;
The stoping circuit 14 is configured to collect a sound signal sent by the voice wake-up device 1, convert the sound signal sent by the voice wake-up device into a stoping sound signal, and send the stoping sound signal to the main control module 11;
The main control module 11 is configured to determine a voice wake-up signal according to the low-frequency sound signal, the stoping sound signal and the original sound signal, extract wake-up words in voice wake-up information corresponding to the voice wake-up signal, and execute wake-up operation based on the wake-up words;
the output module 15 is configured to play the voice feedback information corresponding to the wake-up operation.
Optionally, the output module 15 may include an audio output device such as a speaker, a headset, an earphone, a small speaker, or a loudspeaker.
Fig. 7 shows a schematic diagram of the installation positions of the low-frequency recording module 12 and the microphone module 13, where the low-frequency recording module 12 and the microphone module 13 may be arranged on the same horizontal plane of the voice wake-up device. The voice wake-up device may be a smart speaker, a smart home appliance, or a mobile terminal. Specifically, the low-frequency recording module 12 and the microphone module 13 are soldered on the same level of the circuit board of the voice wake-up device, and one low-frequency recording module 12 may be arranged between every two microphone modules 13. Every two microphone modules 13, together with the low-frequency recording module 12 between them, form one group of sound collecting modules, and the low-frequency sound signal and the original sound signal are collected by at least one such group.
Further, the voice wake-up device 1 further comprises a recording module 16 and a playback module 17. The recording module 16 is configured to receive and amplify the low-frequency sound signal, the stoping sound signal, and the original sound signal, and send the amplified low-frequency sound signal, stoping sound signal, and original sound signal to the main control module 11;
the playback module 17 is configured to amplify a voice feedback signal corresponding to the voice wake-up operation, and send the amplified voice feedback signal to the output module 15, so that the output module 15 plays voice feedback information corresponding to the voice feedback signal.
According to this technical solution, the low-frequency recording module is added on the same horizontal plane as the microphone module, so it is easy to integrate and install, which facilitates popularization and application. The low-frequency recording module accurately collects the low-frequency sound signal, and the stoping circuit accurately collects the device sound signals of all frequency bands. Both signals are then removed from the original sound signal to obtain an accurate voice wake-up signal. The wake-up word in the voice wake-up information corresponding to the voice wake-up signal is extracted, and a wake-up operation is performed based on the wake-up word. This improves the accuracy with which the main control module recognizes the wake-up word, and thereby improves the wake-up rate of the voice wake-up device.
Example IV
The fourth embodiment of the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a voice wake-up method as provided by the embodiments of the present invention, the method comprising:
acquiring a low-frequency sound signal collected by a built-in low-frequency recording module, a stoping sound signal collected by a built-in stoping circuit, and an original sound signal collected by a built-in microphone module;
determining a voice wake-up signal according to the low-frequency sound signal, the stoping sound signal and the original sound signal;
and extracting wake-up words in voice wake-up information corresponding to the voice wake-up signals, and executing wake-up operation based on the wake-up words.
Of course, the computer program stored on the computer readable storage medium provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations of the voice wake-up method provided by any embodiment of the present invention.
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, or device.
The computer readable signal medium may include a propagated signal in which computer readable program code is carried, such as a low-frequency sound signal, a stoping sound signal, an original sound signal, or a wake-up word. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
It should be noted that, in the embodiment of the voice wake-up device, each included module is only divided according to the functional logic, but not limited to the above-mentioned division, so long as the corresponding function can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of numerous obvious changes, rearrangements and substitutions without departing from the scope of the present invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (11)

1. A voice wake-up method, comprising:
Acquiring a low-frequency sound signal acquired through a built-in low-frequency recording module, a stoping sound signal acquired through a built-in stoping circuit and an original sound signal acquired through a built-in microphone module, wherein the stoping circuit is used for acquiring equipment sound signals before the equipment sound signals enter audio output equipment;
determining a voice wake-up signal according to the low-frequency sound signal, the stoping sound signal and the original sound signal;
Extracting wake-up words in voice wake-up information corresponding to the voice wake-up signals, and executing wake-up operation based on the wake-up words;
the low-frequency sound signal comprises a noise signal of a low frequency band and a device sound signal of the low frequency band, the stoping sound signal comprises the device sound signal of each frequency band, and the original sound signal comprises a voice wake-up signal of each frequency band, the noise signal of the low frequency band and the device sound signal of each frequency band.
2. The method of claim 1, wherein said determining a voice wake-up signal according to said low-frequency sound signal, said stoping sound signal, and said original sound signal comprises:
carrying out noise reduction processing on the original sound signal according to the low-frequency sound signal, and carrying out echo cancellation on the original sound signal after the noise reduction processing based on the stoping sound signal to obtain the voice wake-up signal.
3. The method of claim 2, wherein said noise reduction processing of said original sound signal according to said low-frequency sound signal comprises:
determining noise reduction parameters based on amplitude-frequency characteristics and/or phase-frequency characteristics of the low-frequency sound signals;
And carrying out noise reduction processing on the original sound signal based on the noise reduction parameters.
4. The method of claim 2, wherein echo cancelling the noise-reduced original sound signal based on the stoping sound signal comprises:
determining echo cancellation parameters based on amplitude-frequency characteristics and/or phase-frequency characteristics of the stoping sound signal;
and carrying out echo cancellation on the original sound signal after the noise reduction processing based on the echo cancellation parameters.
5. The method of claim 1, wherein the extracting wake words in the voice wake information corresponding to the voice wake signal comprises:
extracting at least one keyword of the voice wakeup information;
And calculating the similarity between each keyword and a preset awakening keyword, and if the similarity exceeds a similarity threshold, taking the keyword with the similarity larger than the similarity threshold as the awakening word.
6. A voice wakeup apparatus, comprising:
The signal acquisition module is used for acquiring a low-frequency sound signal acquired through the built-in low-frequency recording module, a stoping sound signal acquired through the built-in stoping circuit and an original sound signal acquired through the built-in microphone module, wherein the stoping circuit is used for acquiring the equipment sound signal before entering the audio output equipment;
the voice wake-up signal determining module is used for determining a voice wake-up signal according to the low-frequency sound signal, the stoping sound signal and the original sound signal;
the wake-up operation execution module is used for extracting wake-up words in voice wake-up information corresponding to the voice wake-up signal and executing wake-up operation based on the wake-up words;
the low-frequency sound signal comprises a noise signal of a low frequency band and a device sound signal of the low frequency band, the stoping sound signal comprises the device sound signal of each frequency band, and the original sound signal comprises a voice wake-up signal of each frequency band, the noise signal of the low frequency band and the device sound signal of each frequency band.
7. A voice wakeup device, comprising:
The device comprises a main control module, a low-frequency recording module, a microphone module, a stoping circuit and an output module;
The low-frequency recording module is used for collecting low-frequency sound signals and sending the low-frequency sound signals to the main control module;
The microphone module is used for collecting original sound signals and sending the original sound signals to the main control module;
the stoping circuit is used for collecting sound signals sent by the voice wake-up device, converting the sound signals sent by the voice wake-up device into stoping sound signals, and sending the stoping sound signals to the main control module; the stoping circuit is used for collecting device sound signals before the device sound signals enter the audio output device;
The main control module is used for determining a voice wake-up signal according to the low-frequency sound signal, the stoping sound signal and the original sound signal, extracting wake-up words in voice wake-up information corresponding to the voice wake-up signal, and executing wake-up operation based on the wake-up words;
The output module is used for playing the voice feedback information corresponding to the wake-up operation;
the low-frequency sound signal comprises a noise signal of a low frequency band and a device sound signal of the low frequency band, the stoping sound signal comprises the device sound signal of each frequency band, and the original sound signal comprises a voice wake-up signal of each frequency band, the noise signal of the low frequency band and the device sound signal of each frequency band.
8. The voice wakeup device of claim 7, further comprising:
a recording module and a playback module;
The recording module is used for receiving and amplifying the low-frequency sound signal, the stoping sound signal and the original sound signal, and sending the amplified low-frequency sound signal, the stoping sound signal and the original sound signal to the main control module;
The playback module is used for amplifying the voice feedback signal corresponding to the voice wake-up operation, and sending the amplified voice feedback signal to the output module so that the output module plays the voice feedback information corresponding to the voice feedback signal.
9. The voice wakeup device of claim 7, wherein
the low-frequency recording module and the microphone module are arranged on the same horizontal plane of the voice wake-up device.
10. The voice wakeup device of claim 7, wherein
the voice wake-up device comprises a smart speaker, a smart home appliance, or a mobile terminal.
11. A storage medium containing computer executable instructions which when executed by a computer processor implement the voice wake method of any of claims 1-5.
CN202011595654.5A 2020-12-29 2020-12-29 Voice awakening method, device, equipment and storage medium Active CN113763945B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011595654.5A CN113763945B (en) 2020-12-29 2020-12-29 Voice awakening method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113763945A CN113763945A (en) 2021-12-07
CN113763945B true CN113763945B (en) 2024-05-17

Family

ID=78786213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011595654.5A Active CN113763945B (en) 2020-12-29 2020-12-29 Voice awakening method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113763945B (en)

Also Published As

Publication number Publication date
CN113763945A (en) 2021-12-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant