CN110211599A - Application wake-up method and apparatus, storage medium, and electronic device - Google Patents
Application wake-up method and apparatus, storage medium, and electronic device
- Publication number: CN110211599A (application number CN201910478400.6A)
- Authority
- CN
- China
- Prior art keywords
- audio data
- preset
- filter coefficient
- adaptive filter
- electronic device
- Prior art date
- Legal status: Granted
Classifications
- G10L15/22: Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26: Speech recognition; speech-to-text systems
- G10L21/0208: Speech enhancement, e.g. noise reduction or echo cancellation; noise filtering
- G10L2021/02082: Noise filtering where the noise is echo or reverberation of the speech
- G10L2021/02165: Noise estimation using two microphones, one receiving mainly the noise signal and the other mainly the speech signal
Abstract
The embodiment of the present application discloses an application wake-up method and apparatus, a storage medium, and an electronic device. The electronic device includes two microphones, through which two channels of audio data can be collected, together with the background audio data played during audio collection. Echo cancellation processing is then performed on the two channels of audio data according to the background audio data to eliminate self-noise; beamforming processing is then performed on the echo-cancelled audio data to eliminate external noise, obtaining enhanced audio data. Two-stage verification is performed on the text features and voiceprint features of the enhanced audio data, and when both stages of verification pass, the voice interaction application is woken up, enabling voice interaction between the electronic device and the user. The application can thereby exclude the interference of self-noise and external noise, while the two-stage verification ensures verification accuracy, achieving the purpose of improving the wake-up rate of the voice interaction application.
Description
Technical Field
The present application relates to the field of speech processing technologies, and in particular, to an application wake-up method, apparatus, storage medium, and electronic device.
Background
Currently, with the development of speech recognition technology, an electronic device (such as a mobile phone or a tablet computer) may perform voice interaction with a user through a running voice interaction application. For example, the user may say "I want to listen to a song", and the voice interaction application recognizes the user's speech, identifies the intention to listen to a song, and plays one. It can be understood that the premise of voice interaction between the user and the electronic device is waking up the voice interaction application; however, in an actual use environment various noises often exist, so the wake-up rate of the voice interaction application is low.
Disclosure of Invention
Embodiments of the present application provide an application wake-up method and apparatus, a storage medium, and an electronic device, which can improve the wake-up rate of a voice interaction application.
In a first aspect, an embodiment of the present application provides an application wake-up method, which is applied to an electronic device, where the electronic device includes two microphones, and the application wake-up method includes:
acquiring two paths of audio data through the two microphones, and acquiring background audio data played in an audio acquisition period;
performing echo cancellation processing on the two paths of audio data according to the background audio data to obtain two paths of audio data after echo cancellation;
performing beam forming processing on the two paths of audio data after the echo cancellation to obtain enhanced audio data;
performing primary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data, and performing secondary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data after the primary verification is passed;
and if the secondary verification passes, awakening the voice interactive application.
In a second aspect, an embodiment of the present application provides an application waking device, which is applied to an electronic device, where the electronic device includes two microphones, and the application waking device includes:
the audio acquisition module is used for acquiring two paths of audio data through the two microphones and acquiring background audio data played in an audio acquisition period;
the echo cancellation module is used for carrying out echo cancellation processing on the two paths of audio data according to the background audio data to obtain two paths of audio data after echo cancellation;
the beam forming module is used for carrying out beam forming processing on the two paths of audio data after the echo cancellation to obtain enhanced audio data;
the audio verification module is used for performing primary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data and performing secondary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data after the primary verification is passed;
and the application awakening module is used for awakening the voice interaction application when the secondary verification is passed.
In a third aspect, the present application provides a storage medium, on which a computer program is stored, and when the computer program is run on an electronic device including two microphones, the electronic device is caused to execute the application wakeup method provided in the present application.
In a fourth aspect, an embodiment of the present application further provides an electronic device, where the electronic device includes a processor, a memory, and two microphones, where the memory stores a computer program, and the processor is configured to execute the application wake-up method provided in the embodiment of the present application by invoking the computer program.
In the embodiment of the application, the electronic equipment comprises two microphones, and the two microphones can acquire two paths of audio data and acquire background audio data played in an audio acquisition period; then, echo cancellation processing is carried out on the two paths of audio data according to the background audio data so as to eliminate self-noise; then, performing beam forming processing on the two paths of audio data after echo cancellation to eliminate external noise and obtain enhanced audio data; secondly, performing primary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data, and performing secondary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data after the primary verification is passed; and finally, if the secondary verification passes, awakening the voice interaction application so as to realize the voice interaction between the electronic equipment and the user. Therefore, the method and the device can eliminate the interference of self noise and external noise, ensure the verification accuracy by utilizing two-stage verification and achieve the purpose of improving the awakening rate of the voice interaction application.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart illustrating an application wake-up method according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of the arrangement positions of two microphones in the embodiment of the present application.
Fig. 3 is a schematic flow chart of training a voiceprint feature extraction model in the embodiment of the present application.
Fig. 4 is a schematic diagram of a spectrogram extracted in the example of the present application.
Fig. 5 is another flowchart of an application wake-up method according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of an application wake-up apparatus according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Fig. 8 is another schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the application and should not be taken as limiting the application with respect to other embodiments that are not detailed herein.
The embodiment of the present application first provides an application wake-up method, where the execution subject of the application wake-up method may be the electronic device provided in the embodiment of the present application. The electronic device includes two microphones and may be a device with processing capability and configured with a processor, such as a smartphone, a tablet computer, a palmtop computer, a notebook computer, or a desktop computer.
Referring to fig. 1, fig. 1 is a flowchart illustrating an application wake-up method according to an embodiment of the present disclosure. The application wake-up method is applied to the electronic device provided by the present application, where the electronic device includes two microphones, as shown in fig. 1, a flow of the application wake-up method provided by the embodiment of the present application may be as follows:
in 101, two paths of audio data are acquired by two microphones, and background audio data played during audio acquisition is acquired.
For example, two microphones included in the electronic device are arranged back to back and separated by a preset distance, where the arrangement of the two microphones back to back means that sound pickup holes of the two microphones face opposite directions. For example, referring to fig. 2, the electronic device includes two microphones, which are a microphone 1 disposed on a lower side of the electronic device and a microphone 2 disposed on an upper side of the electronic device, respectively, wherein a sound-collecting hole of the microphone 1 faces downward, a sound-collecting hole of the microphone 2 faces upward, and a connection line between the microphone 2 and the microphone 1 is parallel to left/right sides of the electronic device. Furthermore, the two microphones included in the electronic device may be non-directional microphones (or, omni-directional microphones).
In the embodiment of the application, the electronic device can collect sound through the two back-to-back microphones while playing audio or video, thereby collecting two channels of audio data of the same duration. In addition, the electronic device may also obtain the audio data played during audio acquisition, which may be independent audio data, such as a played audio file or song, or audio data attached to video data. It should be noted that, to distinguish the audio data obtained by sound acquisition from the audio data played during audio acquisition, the audio data played during audio acquisition is referred to as background audio data in the present application.
At 102, echo cancellation processing is performed on the two paths of audio data according to the background audio data, so as to obtain two paths of audio data after echo cancellation.
It should be noted that, during playing audio and video, the electronic device performs sound collection through two microphones, and will collect and obtain the sound of the playing background audio data, that is, echo (or self-noise). In the application, in order to eliminate echoes in the two collected audio data, echo cancellation processing is further performed on the two audio data by using an echo cancellation algorithm according to background audio data so as to eliminate echoes in the two audio data, and the two audio data after echo cancellation are obtained. It should be noted that, in the embodiment of the present application, there is no particular limitation on what echo cancellation algorithm is used, and a person skilled in the art may select the echo cancellation algorithm according to actual needs.
For example, the electronic device may perform anti-phase processing on the background audio data to obtain anti-phase background audio data, and then superimpose the anti-phase background audio data with the two paths of audio data respectively to eliminate echoes in the two paths of audio data, so as to obtain two paths of audio data after echo cancellation.
Put plainly, the echo cancellation processing performed above removes the self-noise carried in the audio data.
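As a rough illustration of this inversion-and-superposition idea, the following sketch assumes the background audio and the microphone capture are already time-aligned and share a sample rate (which real devices cannot guarantee, hence the adaptive-filter approach described later):

```python
import numpy as np

def naive_echo_cancel(mic: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Superimpose the anti-phase (inverted) background audio on the mic signal."""
    n = min(len(mic), len(background))
    inverted = -background[:n]   # anti-phase processing
    return mic[:n] + inverted    # superposition removes the echo component
```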
In 103, the two paths of audio data after echo cancellation are processed by beamforming to obtain enhanced audio data.
After completing echo cancellation processing on the two paths of audio data and obtaining the two paths of audio data after echo cancellation, the electronic device further performs beam forming processing on the two paths of audio data after echo cancellation to obtain a path of audio data with a higher signal-to-noise ratio, and the audio data is recorded as enhanced audio data.
In colloquial terms, the beamforming process performed above eliminates external noise carried in the audio data. Therefore, the electronic device obtains the enhanced audio data with self-noise and external noise removed through echo cancellation processing and beam forming processing of the two paths of acquired audio data.
At 104, a primary check is performed on the text features and the voiceprint features of the enhanced audio data, and a secondary check is performed on the text features and the voiceprint features of the enhanced audio data after the primary check is passed.
As described above, the enhanced audio data eliminates self-noise and external noise compared to the collected original two-way audio data, which has a higher signal-to-noise ratio. At this time, the electronic device further performs two-stage verification on the text feature and the voiceprint feature of the enhanced audio data, wherein the electronic device performs one-stage verification on the text feature and the voiceprint feature of the enhanced audio data based on the first wake-up algorithm, and if the one-stage verification passes, the electronic device performs two-stage verification on the text feature and the voiceprint feature of the enhanced audio data based on the second wake-up algorithm.
It should be noted that, in the embodiment of the present application, both the primary verification and the secondary verification of the text feature and the voiceprint feature of the enhanced audio data check whether the enhanced audio data includes a preset wake-up word spoken by a preset user (for example, the owner of the electronic device, or another user whom the owner authorizes to use the electronic device). If the enhanced audio data includes the preset wake-up word spoken by the preset user, the text feature and the voiceprint feature of the enhanced audio data pass the verification; otherwise the verification fails. For example, if the enhanced audio data includes the preset wake-up word and the preset wake-up word was spoken by the preset user, the text feature and the voiceprint feature of the enhanced audio data pass the verification. Conversely, when the enhanced audio data includes the preset wake-up word spoken by a user other than the preset user, or does not include the preset wake-up word at all, the verification fails.
In addition, it should be further noted that, in the embodiment of the present application, the first wake-up algorithm and the second wake-up algorithm adopted by the electronic device are different. For example, the first wake-up algorithm is a voice wake-up algorithm based on a Gaussian mixture model, and the second wake-up algorithm is a voice wake-up algorithm based on a neural network.
In 105, if the secondary check passes, the voice interaction application is awakened.
Here, the voice interaction application is what is commonly called a voice assistant, such as OPPO's voice assistant "Xiao Ou".
Based on the above description, it can be understood by those skilled in the art that when the secondary verification of the enhanced audio data passes, it indicates that a preset user currently speaks a preset wake-up word, and at this time, the voice interaction application is woken up, so as to implement voice interaction between the electronic device and the user.
As can be seen from the above, in the embodiment of the application, the electronic device may acquire two paths of audio data through two microphones, and acquire background audio data played during audio acquisition; then, echo cancellation processing is carried out on the two paths of audio data according to the background audio data so as to eliminate self-noise; then, performing beam forming processing on the two paths of audio data after echo cancellation to eliminate external noise and obtain enhanced audio data; secondly, performing primary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data, and performing secondary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data after the primary verification is passed; and finally, if the secondary verification passes, awakening the voice interaction application so as to realize the voice interaction between the electronic equipment and the user. Therefore, the method and the device can eliminate the interference of self noise and external noise, ensure the verification accuracy by utilizing two-stage verification and achieve the purpose of improving the awakening rate of the voice interaction application.
In one embodiment, "echo cancellation processing two audio data according to background audio data" includes:
(1) obtaining an initial adaptive filter coefficient, and iteratively updating the initial adaptive filter coefficient according to background audio data and audio data to obtain a target adaptive filter coefficient;
(2) and performing echo cancellation processing on the audio data according to the target self-adaptive filter coefficient.
In the embodiment of the present application, when the electronic device performs echo cancellation processing on two paths of audio data according to background audio data, the following description will take echo cancellation processing on one path of audio data as an example.
The electronic equipment firstly acquires an initial adaptive filter coefficient, and then iteratively updates the initial adaptive filter coefficient according to background audio data and one path of audio data to obtain a target adaptive filter coefficient. Then, the electronic device estimates echo audio data carried in the audio data according to the target adaptive filter coefficient obtained by iterative update, so as to eliminate the echo audio data carried in the audio data, and complete echo cancellation processing on the audio data, as shown in the following formula:
X' = X − W^T · X;
where X' denotes the audio data after echo cancellation, X denotes the audio data before echo cancellation, W denotes the target adaptive filter coefficient, and ^T denotes transposition.
In one embodiment, "iteratively updating the initial adaptive filter coefficients according to the background audio data and the audio data to obtain the target adaptive filter coefficients" includes:
(1) obtaining the self-adaptive filter coefficient at the current moment according to the initial self-adaptive filter coefficient;
(2) estimating echo audio data carried in the audio data and corresponding to the current moment according to the coefficient of the adaptive filter at the current moment;
(3) acquiring error audio data at the current moment according to the background audio data and the echo audio data obtained by estimation;
(4) and identifying the active part of the adaptive filter coefficient at the current moment, updating the active part of the adaptive filter coefficient at the current moment according to the error audio data at the current moment, and adjusting the order of the adaptive filter coefficient at the current moment to obtain the adaptive filter coefficient at the next moment.
How to iteratively update the initial adaptive filter coefficients is described in one update process below.
The "current time" is not a specific moment; it refers to whichever iteration of the update of the adaptive filter coefficient is currently being performed.
Taking the first update of the initial adaptive filter coefficient as an example, the electronic device takes the initial adaptive filter coefficient as the adaptive filter coefficient at the current time k. For example, the adaptive filter coefficient obtained at the current time k is W(k) = [w0, w1, w2, ..., wL-1]^T, with length L.
Then, according to the adaptive filter coefficient at the current time k, the electronic device estimates the echo audio data carried in the audio data at the current time, as shown in the following formula:
y'(k) = W(k)^T · x(k);
where y'(k) represents the estimated echo audio data corresponding to the current time k, and x(k) represents the portion of the audio data corresponding to the current time k.
Then, the electronic device obtains the error audio data at the current time k according to the portion of the background audio data corresponding to the current time k and the estimated echo audio data, as shown in the following formula:
e(k) = r(k) − y'(k);
where e(k) represents the error audio data at the current time k, and r(k) represents the portion of the background audio data corresponding to the current time k.
It should be noted that a larger filter order increases computational complexity, while a smaller filter order may leave the echo insufficiently converged, i.e., not fully cancelled. In this application, most of the adaptive filter coefficients are 0 and only a small part actually contributes during iterative updating, so only the active part of the adaptive filter needs to be iteratively updated, and the order of the adaptive filter can be adjusted in real time.
Correspondingly, in this embodiment of the present application, after acquiring the error audio data at the current time, the electronic device further identifies an active part of the adaptive filter coefficient at the current time k, so as to update the active part of the adaptive filter coefficient at the current time according to the error audio data at the current time, as shown in the following formula:
W(k+1)=W(k)+ux(k)e(k);
where u represents a preset convergence step size, which can be set by a person skilled in the art according to actual needs; this is not specifically limited in the embodiment of the present application. It should be emphasized that when the adaptive filter coefficient W(k) at the current time k is updated, only its active part is updated. For example, if W(k) = [w0, w1, w2, ..., wL-1]^T and [w0, w1, w2, ..., wL-3] is determined to be the active part, the electronic device updates only [w0, w1, w2, ..., wL-3] according to the formula above.
In addition, the electronic device adjusts the order of the adaptive filter coefficient at the current time according to the identified active portion, thereby obtaining an adaptive filter coefficient W (k +1) at the next time.
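Putting the estimation, error, and update steps together, a sketch of the iteration under stated assumptions (the filter length L and step size u are illustrative; the active-part restriction and order adjustment described next are omitted here for brevity):

```python
import numpy as np

def iterate_filter(x: np.ndarray, r: np.ndarray, L: int = 128,
                   u: float = 0.01) -> np.ndarray:
    """LMS-style iteration: estimate the echo, form the error against the
    background audio r, then apply W(k+1) = W(k) + u * x(k) * e(k)."""
    w = np.zeros(L)                       # initial adaptive filter coefficient
    for k in range(L - 1, min(len(x), len(r))):
        xk = x[k - L + 1:k + 1][::-1]     # x(k): latest L audio samples
        y_hat = np.dot(w, xk)             # estimated echo at time k
        e = r[k] - y_hat                  # error audio data e(k)
        w = w + u * xk * e                # coefficient update
    return w
```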
In one embodiment, "identifying an active portion of adaptive filter coefficients for a current time instant" includes:
(1) dividing the adaptive filter coefficient at the current moment into a plurality of sub-filter coefficients with equal length;
(2) obtaining the average value and the variance of each sub-filter coefficient from back to front, and determining, as the active part, the first sub-filter coefficient whose average value is greater than a preset average value and whose variance is greater than a preset variance, together with all sub-filter coefficients before it;
adjusting the order of the adaptive filter coefficient at the current time comprises:
(3) and judging whether the first sub-filter coefficient is the last sub-filter coefficient, if so, increasing the order of the adaptive filter coefficient at the current moment, and otherwise, reducing the order of the adaptive filter coefficient at the current moment.
In the embodiment of the application, when identifying the active part of the adaptive filter coefficient at the current time, the electronic device first divides the adaptive filter coefficient at the current time into a plurality of sub-filter coefficients of equal length (each of length greater than 1). For example, the electronic device divides the adaptive filter coefficient W(k) = [w0, w1, w2, ..., wL-1]^T at the current time into M sub-filter coefficients of equal length, each of length L/M; the m-th sub-filter coefficient is then Wm = [wmL/M, wmL/M+1, wmL/M+2, ..., w(m+1)L/M-1]^T, where m ranges over [0, M−1].
Then, the electronic device obtains the average value and the variance of each sub-filter coefficient from back to front: it first obtains the average value and variance of the M-th sub-filter coefficient, then those of the (M−1)-th, and so on, until it finds the first sub-filter coefficient whose average value is greater than the preset average value and whose variance is greater than the preset variance. That sub-filter coefficient and all sub-filter coefficients before it are determined to be the active part of the adaptive filter coefficient at the current time.
The preset average value and the preset variance may be obtained by a person skilled in the art through experimental adjustment, which is not specifically limited in the embodiment of the present application, for example, in the embodiment of the present application, the preset average value may be 0.000065, and the preset variance may be 0.003.
In addition, when the order of the adaptive filter coefficient at the current time is adjusted, the electronic device may determine whether the first sub-filter coefficient is the last sub-filter coefficient, if so, it indicates that the order of the adaptive filter coefficient at the current time is insufficient, and increase the order of the adaptive filter coefficient at the current time, otherwise, it indicates that the order of the adaptive filter coefficient at the current time is sufficient, and may decrease the order of the adaptive filter coefficient at the current time.
In this embodiment, for the variation of increasing or decreasing the order, a person skilled in the art can take an empirical value according to actual needs, and the embodiment of the present application does not specifically limit this.
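A sketch of the active-part identification and order adjustment, assuming the example thresholds above (0.000065 and 0.003) and using one sub-filter length as the order step (the step size is an assumption; the text leaves it to empirical choice):

```python
import numpy as np

def identify_active_and_adjust(w: np.ndarray, M: int,
                               mean_th: float = 0.000065,
                               var_th: float = 0.003):
    """Scan the M equal-length sub-filters from back to front; the first one
    whose mean and variance exceed the thresholds, plus all sub-filters
    before it, forms the active part. Grow the order if that sub-filter is
    the last one, otherwise shrink it."""
    sub_len = len(w) // M
    for m in range(M - 1, -1, -1):
        seg = w[m * sub_len:(m + 1) * sub_len]
        if seg.mean() > mean_th and seg.var() > var_th:
            active = slice(0, (m + 1) * sub_len)
            if m == M - 1:                                   # order insufficient
                w = np.concatenate([w, np.zeros(sub_len)])   # increase order
            else:                                            # order sufficient
                w = w[:len(w) - sub_len]                     # decrease order
            return w, active
    return w, slice(0, 0)   # no active part found
```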
In an embodiment, the "performing beamforming processing on the two paths of audio data after echo cancellation to obtain enhanced audio data" includes:
and respectively carrying out beam forming processing on the two paths of audio data after echo cancellation at a plurality of preset angles by adopting a preset beam forming algorithm to obtain a plurality of enhanced audio data.
In the embodiment of the present application, a plurality of preset angles are provided for the microphones of the electronic device. For example, during voice interaction with users, the electronic device collects statistics on the incoming-wave angles of user speech and takes the incoming-wave angles whose usage probability reaches a preset probability as the plurality of preset angles.
Therefore, the electronic device can adopt a preset beamforming algorithm to perform beamforming processing on the two paths of echo-cancelled audio data at each of the plurality of preset angles, obtaining a plurality of enhanced audio data.
For example, assume that three preset angles are provided: θ1, θ2 and θ3. The GSC (Generalized Sidelobe Canceller) algorithm can be adopted for the beamforming processing. Since the GSC algorithm normally needs to estimate the beamforming angle in advance, the electronic device takes θ1, θ2 and θ3 directly as the beamforming angles that the GSC algorithm would otherwise estimate, and performs beamforming processing at θ1, θ2 and θ3 respectively, obtaining 3 channels of enhanced audio data.
As described above, in the embodiment of the present application, preset angles are used in place of estimated beamforming angles, so no time-consuming angle estimation is needed and the overall efficiency of beamforming is improved.
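The GSC algorithm itself is involved; as a stand-in, the sketch below loops over hypothetical preset angles with a simple two-microphone delay-and-sum beamformer, which illustrates the fixed-angle idea but is not the patent's GSC implementation:

```python
import numpy as np

SOUND_SPEED = 343.0   # speed of sound, m/s

def beamform_at(angle_deg: float, ch1: np.ndarray, ch2: np.ndarray,
                mic_distance: float = 0.15, fs: int = 16000) -> np.ndarray:
    """Steer toward a preset angle by delaying one channel, then average."""
    delay = mic_distance * np.cos(np.deg2rad(angle_deg)) / SOUND_SPEED
    shift = int(round(delay * fs))        # steering delay in samples
    return (ch1 + np.roll(ch2, shift)) / 2.0

preset_angles = [30.0, 90.0, 150.0]       # hypothetical preset angles
# enhanced = [beamform_at(a, ch1, ch2) for a in preset_angles]
```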
In one embodiment, "performing a primary check on text features and voiceprint features of the enhanced audio data" includes:
(1) extracting Mel frequency cepstrum coefficients of the enhanced audio data corresponding to each preset angle;
(2) calling a target voiceprint characteristic model related to a preset text to match the extracted mel frequency cepstrum coefficients;
(3) if the matched mel frequency cepstrum coefficient exists, judging that the primary check is passed;
the target voiceprint feature model is obtained by a Gaussian mixture general background model related to a preset text in a self-adaptive mode according to the Mel frequency cepstrum coefficient of the preset audio data, and the preset audio data are audio data of a preset text spoken by a preset user.
The first-order wake-up algorithm is explained below.
It should be noted that, in the embodiment of the present application, a gaussian mixture general background model related to a preset text is trained in advance. The preset text is the above mentioned preset wake-up word. For example, audio data of a plurality of people (e.g., 200 people) who speak a preset wake-up word may be collected in advance, mel-frequency cepstrum coefficients of the audio data are extracted respectively, and a gaussian mixture general background model related to a preset text (i.e., the preset wake-up word) is obtained through training according to the mel-frequency cepstrum coefficients of the audio data.
Then the Gaussian mixture universal background model is trained further: adaptive processing (for example, with adaptive algorithms such as maximum a posteriori (MAP) or maximum likelihood linear regression (MLLR)) is performed on the model according to the Mel-frequency cepstrum coefficients of the preset audio data, where the preset audio data is audio data of the preset user speaking the preset text (i.e., the preset wake-up word). In this way, each Gaussian distribution of the model moves toward the Mel-frequency cepstrum coefficients of the preset user, so that the model carries the voiceprint features of the preset user. The Gaussian mixture universal background model carrying the voiceprint features of the preset user is recorded as the target voiceprint feature model.
Therefore, when the electronic device performs the primary verification on the text features and the voiceprint features of the enhanced audio data, it extracts the Mel-frequency cepstrum coefficients of the enhanced audio data corresponding to each preset angle, and then calls the target voiceprint feature model related to the preset text to match each extracted set of Mel-frequency cepstrum coefficients. Specifically, the electronic device inputs the extracted Mel-frequency cepstrum coefficients into the target voiceprint feature model, which scores the input and outputs a score; when the output score reaches a preset threshold, the input Mel-frequency cepstrum coefficients are judged to match the target voiceprint feature model, and otherwise they do not match. For example, in the embodiment of the present application, the output score of the target voiceprint feature model lies in the interval [0, 1] and the preset threshold is configured as 0.28; that is, when the score corresponding to the input Mel-frequency cepstrum coefficients reaches 0.28, the electronic device determines that they match the target voiceprint feature model.
After the electronic equipment calls a target voiceprint feature model related to the preset text to match the extracted mel frequency cepstrum coefficients, if the matched mel frequency cepstrum coefficients exist, the electronic equipment judges that the primary check is passed.
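A sketch of this primary check, assuming librosa for MFCC extraction and scikit-learn GaussianMixture objects standing in for the adapted target model and the universal background model; the log-likelihood-ratio scoring and the squashing into [0, 1] are assumptions, since the text does not specify the scoring function:

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def primary_check(enhanced_signals, target_gmm: GaussianMixture,
                  ubm: GaussianMixture, sr: int = 16000,
                  threshold: float = 0.28) -> bool:
    """Pass if the enhanced audio of at least one preset angle matches."""
    for sig in enhanced_signals:                     # one signal per preset angle
        mfcc = librosa.feature.mfcc(y=sig, sr=sr, n_mfcc=13).T
        llr = np.mean(target_gmm.score_samples(mfcc)
                      - ubm.score_samples(mfcc))     # average log-likelihood ratio
        score = 1.0 / (1.0 + np.exp(-llr))           # squash into [0, 1] (assumed)
        if score >= threshold:
            return True
    return False
```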
In one embodiment, "performing a secondary check on text features and voiceprint features of enhanced audio data" includes:
(1) dividing the enhanced audio data corresponding to the preset angle into a plurality of sub audio data;
(2) extracting a voiceprint characteristic vector of each sub audio data according to a voiceprint characteristic extraction model related to a preset text;
(3) acquiring similarity between each voiceprint feature vector and a target voiceprint feature vector, wherein the target voiceprint feature vector is a voiceprint feature vector of preset audio data;
(4) according to the similarity corresponding to each sub audio data, verifying the text characteristic and the voiceprint characteristic of the enhanced audio data corresponding to the preset angle;
(5) and if the enhanced audio data corresponding to the preset angle passing the verification exists, judging that the secondary verification passes.
The secondary wake-up algorithm is explained below.
In the embodiment of the present application, it is considered that the enhanced audio data may contain more than just the preset wake-up word; for example, if the preset wake-up word is "Xiao Ou Xiao Ou", the enhanced audio data may be "Hello, Xiao Ou Xiao Ou". Therefore, in the embodiment of the present application, the speech part is divided into a plurality of sub audio data according to the length of the preset wake-up word, where the length of each sub audio data is greater than or equal to the length of the preset wake-up word and two adjacent sub audio data have an overlapping part. The length of the overlapping part may be set by a person skilled in the art according to actual needs; for example, in the embodiment of the present application it is set to 25% of the length of the sub audio data.
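A minimal sketch of this segmentation, assuming the sub audio length is given in samples and using the 25% overlap from the example above:

```python
import numpy as np

def split_into_sub_audio(audio: np.ndarray, seg_len: int,
                         overlap: float = 0.25) -> list:
    """Divide audio into segments of seg_len samples (>= the wake-word
    length), with adjacent segments sharing `overlap` of their length."""
    hop = max(1, int(seg_len * (1.0 - overlap)))
    return [audio[s:s + seg_len]
            for s in range(0, max(len(audio) - seg_len, 0) + 1, hop)]
```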
It should be noted that in the embodiment of the present application, a voiceprint feature extraction model related to the preset text (i.e., the preset wake-up word) is also trained in advance. For example, in the embodiment of the present application, a voiceprint feature extraction model based on a convolutional neural network is trained. As shown in fig. 3, audio data of multiple persons (for example, 200 persons) speaking the preset wake-up word is collected in advance; endpoint detection is performed on the audio data and the preset wake-up word part is segmented out; the segmented part is preprocessed (for example, high-pass filtered) and windowed, a Fourier transform (for example, a short-time Fourier transform) is applied, and the energy density is computed to generate a grayscale spectrogram (as shown in fig. 4, where the horizontal axis represents time, the vertical axis represents frequency, and the gray level represents the energy value). Finally, the generated spectrograms are used to train a convolutional neural network, producing the voiceprint feature extraction model related to the preset text. In addition, in the embodiment of the application, a spectrogram of audio data of the preset user speaking the preset wake-up word (that is, the preset text) is extracted and input into the trained voiceprint feature extraction model; after passing through its convolution layers, pooling layers and fully connected layers, the model outputs a corresponding group of feature vectors, recorded as the target voiceprint feature vector.
Correspondingly, after the electronic device divides the enhanced audio data corresponding to the preset angle into a plurality of sub audio data, the spectrogram of each sub audio data is respectively extracted. For how to extract the spectrogram, details are not repeated here, and specific reference may be made to the above related description. After extracting the spectrogram of the sub-audio data, the electronic device inputs the spectrogram of the sub-audio data into a previously trained voiceprint feature extraction model, so as to extract a voiceprint feature vector of each sub-audio data.
After extracting the voiceprint feature vectors of the sub audio data, the electronic device obtains the similarity between the voiceprint feature vector of each sub audio data and the target voiceprint feature vector, and then verifies the text feature and the voiceprint feature of the enhanced audio data corresponding to the preset angle according to the similarities corresponding to the sub audio data. For example, the electronic device may determine whether there is sub audio data whose voiceprint feature vector reaches a preset similarity to the target voiceprint feature vector (an empirical value may be taken by a person of ordinary skill in the art according to actual needs, for example 75%), and if so, determine that the text feature and the voiceprint feature of the enhanced audio data corresponding to the preset angle pass the verification.
After the electronic equipment completes the verification of the text features and the voiceprint features of the enhanced audio data corresponding to the preset angle, if the enhanced audio data corresponding to the preset angle passes the verification, the electronic equipment judges that the secondary verification passes.
In an embodiment, the verifying the text feature and the voiceprint feature of the enhanced audio data corresponding to the predetermined angle according to the similarity corresponding to each sub-audio data includes:
according to the similarity corresponding to each sub audio data and a preset identification function, verifying the text characteristic and the voiceprint characteristic of the enhanced audio data corresponding to the preset angle;
wherein the preset recognition function is γn = γn-1 + f(ln), where γn represents the state value of the recognition function corresponding to the n-th sub audio data, γn-1 represents the state value of the recognition function corresponding to the (n−1)-th sub audio data, f(ln) takes the value a when ln reaches the preset similarity b and −a otherwise, a is the correction value of the recognition function, b is the preset similarity, and ln is the similarity between the voiceprint feature vector of the n-th sub audio data and the target voiceprint feature vector. If there exists a state value γn greater than the preset recognition function state value, it is determined that the text feature and the voiceprint feature of the enhanced audio data corresponding to the preset angle pass the verification.
It should be noted that the value of a in the recognition function can be an empirical value according to actual needs by those skilled in the art, for example, a can be set to 1.
In addition, the value of b in the recognition function is positively correlated with the recognition rate of the voiceprint feature extraction model, and the value of b is determined according to the recognition rate of the voiceprint feature extraction model obtained through actual training.
In addition, the preset recognition function state value can also be obtained by a person skilled in the art according to actual needs, and the higher the value is, the higher the accuracy of verification on the voice part is.
Therefore, through the identification function, even if other information except the preset awakening words is included in the enhanced audio data, the enhanced audio data can be accurately verified.
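A sketch of running this recognition function over the per-segment similarities ln; the step form of f (+a when ln reaches b, −a otherwise) follows the reconstruction above, while the values of a, b, and the preset state value are illustrative assumptions:

```python
def recognition_function_check(similarities, a: float = 1.0,
                               b: float = 0.75,
                               preset_state: float = 2.0) -> bool:
    """Accumulate gamma_n = gamma_{n-1} + f(l_n); pass once some gamma_n
    exceeds the preset recognition function state value."""
    gamma = 0.0
    for l_n in similarities:
        gamma += a if l_n >= b else -a   # assumed step form of f(l_n)
        if gamma > preset_state:
            return True
    return False
```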
Optionally, when obtaining the similarity between the voiceprint feature vector of each sub audio data and the target voiceprint feature vector, the similarity may be calculated according to a dynamic time warping (DTW) algorithm.
Or, a feature distance between the voiceprint feature vector of each sub-audio data and the target voiceprint feature vector may be calculated as a similarity, and as to what feature distance is used to measure the similarity between the two vectors, no specific limitation is imposed in this embodiment of the application, for example, an euclidean distance may be used to measure the similarity between the voiceprint feature vector of the sub-audio data and the target voiceprint feature vector.
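For the Euclidean-distance option, a sketch of one way to map the feature distance into a similarity score (the mapping itself is an assumption; the text only says the feature distance serves as the similarity):

```python
import numpy as np

def euclidean_similarity(v1: np.ndarray, v2: np.ndarray) -> float:
    """Turn the Euclidean feature distance into a similarity in (0, 1]."""
    return 1.0 / (1.0 + np.linalg.norm(v1 - v2))
```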
Fig. 5 is another flowchart of an application wake-up method according to an embodiment of the present application. The application wake-up method is applied to the electronic device provided by the present application, where the electronic device includes two microphones, as shown in fig. 5, a flow of the application wake-up method provided by the embodiment of the present application may be as follows:
in 201, the electronic device determines whether the electronic device is in an audio/video playing state based on a processor, if so, the electronic device proceeds to 202, and if not, the electronic device proceeds to 206.
In the embodiment of the application, the electronic device first judges, based on the processor, whether it is in the audio/video playing state. For example, taking the Android system as an example, the electronic device receives an Android internal message based on the processor and judges whether it is in the audio/video playing state according to that message.
In 202, the electronic device acquires two paths of audio data through two microphones, and acquires background audio data played during audio acquisition.
For example, two microphones included in the electronic device are arranged back to back and separated by a preset distance, where the arrangement of the two microphones back to back means that sound pickup holes of the two microphones face opposite directions. For example, referring to fig. 2, the electronic device includes two microphones, which are a microphone 1 disposed on a lower side of the electronic device and a microphone 2 disposed on an upper side of the electronic device, respectively, wherein a sound-collecting hole of the microphone 1 faces downward, a sound-collecting hole of the microphone 2 faces upward, and a connection line between the microphone 2 and the microphone 1 is parallel to left/right sides of the electronic device. Furthermore, the two microphones included in the electronic device may be non-directional microphones (or, omni-directional microphones).
In the embodiment of the application, the electronic device can collect sound through the two back-to-back microphones while playing audio or video, thereby collecting two channels of audio data of the same duration. In addition, the electronic device may also obtain the audio data played during audio acquisition, which may be independent audio data, such as a played audio file or song, or audio data attached to video data. It should be noted that, to distinguish the audio data obtained by sound acquisition from the audio data played during audio acquisition, the audio data played during audio acquisition is referred to as background audio data in the present application.
At 203, the electronic device performs echo cancellation processing on the two paths of audio data based on the processor according to the background audio data to obtain two paths of audio data after echo cancellation.
It should be noted that, during playing audio and video, the electronic device performs sound collection through two microphones, and will collect and obtain the sound of the playing background audio data, that is, echo (or self-noise). In the application, in order to eliminate echoes in the two collected audio data, an echo cancellation algorithm is called based on the processor to perform echo cancellation processing on the two audio data further according to background audio data so as to eliminate echoes in the two audio data and obtain two audio data after echo cancellation. It should be noted that, in the embodiment of the present application, there is no particular limitation on what echo cancellation algorithm is used, and a person skilled in the art may select the echo cancellation algorithm according to actual needs.
For example, the electronic device may perform anti-phase processing on the background audio data based on the processor to obtain anti-phase background audio data, and then superimpose the anti-phase background audio data with the two paths of audio data respectively to eliminate echoes in the two paths of audio data, so as to obtain two paths of audio data after echo cancellation.
Put plainly, the echo cancellation processing performed above removes the self-noise carried in the audio data.
At 204, the electronic device performs beamforming processing on the two paths of audio data after echo cancellation based on the processor, so as to obtain enhanced audio data.
After the electronic device completes echo cancellation processing on the two paths of audio data to obtain two paths of audio data after echo cancellation, the electronic device further performs beam forming processing on the two paths of audio data after echo cancellation based on the processor to obtain one path of audio data with a higher signal-to-noise ratio, and the audio data is recorded as enhanced audio data.
In colloquial terms, the beamforming process performed above eliminates external noise carried in the audio data. Therefore, the electronic device obtains the enhanced audio data with self-noise and external noise removed through echo cancellation processing and beam forming processing of the two paths of acquired audio data.
In 205, the electronic device performs a primary check on the text feature and the voiceprint feature of the enhanced audio data based on the processor, performs a secondary check on the text feature and the voiceprint feature of the enhanced audio data based on the processor after the primary check is passed, and wakes up the voice interaction application based on the processor if the secondary check is passed.
As described above, the enhanced audio data eliminates self-noise and external noise compared to the collected original two-way audio data, which has a higher signal-to-noise ratio. At this time, the electronic device further performs two-stage verification on the text feature and the voiceprint feature of the enhanced audio data based on the processor, wherein the first wake-up algorithm is called based on the processor to perform one-stage verification on the text feature and the voiceprint feature of the enhanced audio data, and if the one-stage verification passes, the second wake-up algorithm is called based on the processor to perform two-stage verification on the text feature and the voiceprint feature of the enhanced audio data.
It should be noted that, in the embodiment of the present application, both the primary verification and the secondary verification of the text feature and the voiceprint feature of the enhanced audio data check whether the enhanced audio data includes a preset wake-up word spoken by a preset user (for example, the owner of the electronic device, or another user whom the owner authorizes to use the electronic device). If the enhanced audio data includes the preset wake-up word spoken by the preset user, the text feature and the voiceprint feature of the enhanced audio data pass the verification; otherwise the verification fails. For example, if the enhanced audio data includes the preset wake-up word and the preset wake-up word was spoken by the preset user, the text feature and the voiceprint feature of the enhanced audio data pass the verification. Conversely, when the enhanced audio data includes the preset wake-up word spoken by a user other than the preset user, or does not include the preset wake-up word at all, the verification fails.
In addition, it should be further noted that, in the embodiment of the present application, the first wake-up algorithm and the second wake-up algorithm adopted by the electronic device are different. For example, the first wake-up algorithm is a voice wake-up algorithm based on a Gaussian mixture model, and the second wake-up algorithm is a voice wake-up algorithm based on a neural network.
At 206, the electronic device acquires a channel of audio data through any of the microphones.
When the electronic equipment does not play audio and video, sound collection is carried out through any microphone, and one path of audio data is obtained.
In 207, the electronic device performs a primary verification on the one path of audio data based on the dedicated voice recognition chip, and performs a secondary verification on the one path of audio data based on the processor after the primary verification passes.
The dedicated voice recognition chip is a dedicated chip designed for voice recognition, such as a digital signal processing chip designed for voice, an application specific integrated circuit chip designed for voice, and the like, and has lower power consumption than a general-purpose processor.
After the electronic device acquires the one path of audio data, it calls a third wake-up algorithm based on the dedicated voice recognition chip to verify the audio data; here, the text features and voiceprint features of the audio data may be verified together, or only the text features may be verified.
For example, the electronic device may extract the mel-frequency cepstrum coefficient of the aforementioned audio data based on a dedicated speech recognition chip; then, calling a Gaussian mixture general background model related to a preset text based on a special voice recognition chip to match the extracted Mel frequency cepstrum coefficient; if the matching is successful, the text characteristic check of the path of audio data is judged to be passed.
After the primary verification of the one path of audio data passes, the electronic device further performs secondary verification on it based on the processor; when doing so, the electronic device calls the first wake-up algorithm or the second wake-up algorithm based on the processor to verify the text feature and the voiceprint feature of the audio data.
At 208, if the secondary verification passes, the electronic device wakes up the voice interaction application based on the processor.
When the secondary verification of the path of audio data passes, the electronic device wakes up the voice interaction application based on the processor, thereby enabling voice interaction between the electronic device and the user.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an application wake-up apparatus according to an embodiment of the present application. The application wake-up apparatus may be applied to an electronic device that includes two microphones. The application wake-up apparatus may include an audio acquisition module 401, an echo cancellation module 402, a beamforming module 403, an audio verification module 404, and an application wake-up module 405, wherein,
the audio acquisition module 401 is configured to acquire two paths of audio data through two microphones and acquire background audio data played during audio acquisition;
the echo cancellation module 402 is configured to perform echo cancellation processing on the two paths of audio data according to the background audio data to obtain two paths of audio data after echo cancellation;
a beam forming module 403, configured to perform beam forming processing on the two paths of audio data after echo cancellation to obtain enhanced audio data;
the audio verification module 404 is configured to perform primary verification on the text features and the voiceprint features of the enhanced audio data, and perform secondary verification on the text features and the voiceprint features of the enhanced audio data after the primary verification is passed;
and an application wake-up module 405, configured to wake up the voice interaction application when the secondary verification passes.
In an embodiment, when performing echo cancellation processing on the two paths of audio data according to the background audio data, the echo cancellation module 402 may be configured to:
obtaining an initial adaptive filter coefficient, and iteratively updating the initial adaptive filter coefficient according to the background audio data and the audio data to obtain a target adaptive filter coefficient;
and performing echo cancellation processing on the audio data according to the target adaptive filter coefficient.
In one embodiment, when iteratively updating the initial adaptive filter coefficients according to the background audio data and the audio data to obtain the target adaptive filter coefficients, the echo cancellation module 402 may be configured to:
obtaining the adaptive filter coefficient at the current moment according to the initial adaptive filter coefficient;
estimating echo audio data carried in the audio data and corresponding to the current moment according to the coefficient of the adaptive filter at the current moment;
acquiring error audio data at the current moment according to the background audio data and the echo audio data obtained by estimation;
and identifying the active part of the adaptive filter coefficient at the current moment, updating the active part of the adaptive filter coefficient at the current moment according to the error audio data at the current moment, and adjusting the order of the adaptive filter coefficient at the current moment to obtain the adaptive filter coefficient at the next moment.
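The iterative update above is not tied to a specific algorithm in this embodiment; a normalized LMS (NLMS) loop, sketched below under that assumption, is one common realization, with the filter order, step size, and regularization constant chosen purely for illustration.

```python
# NLMS sketch of the iterative coefficient update: at each moment, estimate
# the echo from the background (reference) audio, form the error audio data,
# and update the adaptive filter coefficients. Parameter values are assumed.
import numpy as np

def nlms_echo_cancel(mic: np.ndarray, ref: np.ndarray,
                     order: int = 256, mu: float = 0.5,
                     eps: float = 1e-8) -> np.ndarray:
    """Cancel the echo of `ref` (background audio) from `mic`; the returned
    error signal is the echo-cancelled audio."""
    w = np.zeros(order)                       # initial adaptive filter coefficients
    out = np.zeros(len(mic))
    for n in range(order, len(mic)):
        x = ref[n - order + 1:n + 1][::-1]    # most recent reference samples
        y_hat = w @ x                         # estimated echo at the current moment
        e = mic[n] - y_hat                    # error audio data at the current moment
        w += mu * e * x / (x @ x + eps)       # coefficient update for the next moment
        out[n] = e
    return out
```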
In an embodiment, in identifying the active portion of the adaptive filter coefficients at the current time, the echo cancellation module 402 may be configured to:
dividing the adaptive filter coefficient at the current moment into a plurality of sub-filter coefficients with equal length;
traversing the sub-filter coefficients from back to front to obtain the average value and the variance of each sub-filter coefficient, and determining the first sub-filter coefficient found in this traversal whose average value is greater than a preset average value and whose variance is greater than a preset variance, together with all sub-filter coefficients preceding it, as the active part;
while adjusting the order of the adaptive filter coefficients at the current time, the echo cancellation module 402 may be configured to:
and judging whether the first sub-filter coefficient is the last sub-filter coefficient, if so, increasing the order of the adaptive filter coefficient at the current moment, and otherwise, reducing the order of the adaptive filter coefficient at the current moment.
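As a sketch of the block partition and order adjustment just described (the block length, thresholds, and growth step below are illustrative assumptions, and a full implementation would additionally restrict the NLMS update to the active part):

```python
# Identify the active part of the filter and adjust its order: split the
# coefficients into equal-length sub-filters, scan from back to front for the
# first sub-filter whose mean and variance exceed the preset thresholds, then
# grow the order if that sub-filter is the last one, otherwise shrink it.
import numpy as np

def adjust_order(w: np.ndarray, block: int = 32,
                 mean_th: float = 1e-4, var_th: float = 1e-6,
                 step: int = 32) -> np.ndarray:
    blocks = [np.abs(w[i:i + block]) for i in range(0, len(w), block)]
    active_end = None
    for i in range(len(blocks) - 1, -1, -1):      # back-to-front traversal
        if blocks[i].mean() > mean_th and blocks[i].var() > var_th:
            active_end = i                        # first active sub-filter found
            break
    if active_end == len(blocks) - 1:             # tail still active: increase order
        return np.concatenate([w, np.zeros(step)])
    # no active tail (or none found at all): reduce the order
    return w[:max(block, len(w) - step)]
```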
In an embodiment, when performing beamforming on the two paths of audio data after echo cancellation to obtain enhanced audio data, the beamforming module 403 may be configured to:
and respectively carrying out beam forming processing on the two paths of audio data after echo cancellation at a plurality of preset angles by adopting a preset beam forming algorithm to obtain a plurality of enhanced audio data.
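One plausible choice of the preset beamforming algorithm is a two-microphone delay-and-sum beamformer, sketched below; the microphone spacing, sample rate, and set of preset angles are assumptions made for illustration.

```python
# Delay-and-sum sketch: for each preset angle, delay one echo-cancelled
# channel by the inter-microphone travel time for that direction and average
# the two channels to obtain the enhanced audio data.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(ch0: np.ndarray, ch1: np.ndarray, angle_deg: float,
                  mic_dist: float = 0.02, sr: int = 16000) -> np.ndarray:
    """Steer the two-mic array toward angle_deg and return the enhanced signal."""
    tau = mic_dist * np.cos(np.deg2rad(angle_deg)) / SPEED_OF_SOUND
    shift = int(round(tau * sr))   # steering delay, rounded to whole samples
    # np.roll wraps at the edges; acceptable for a short illustrative sketch
    return (ch0 + np.roll(ch1, -shift)) / 2.0

def enhance_all_angles(ch0, ch1, angles=(0, 45, 90, 135, 180)):
    """Beamform the echo-cancelled channels at each preset angle."""
    return {a: delay_and_sum(ch0, ch1, a) for a in angles}
```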
In one embodiment, in performing the primary verification on the text feature and the voiceprint feature of the enhanced audio data, the audio verification module 404 may be configured to:
extracting Mel frequency cepstrum coefficients of the enhanced audio data corresponding to each preset angle;
calling a target voiceprint characteristic model related to a preset text to match the extracted mel frequency cepstrum coefficients;
if the matched mel frequency cepstrum coefficient exists, judging that the primary check is passed;
the target voiceprint feature model is obtained by adaptively adjusting a Gaussian mixture universal background model related to the preset text according to the Mel-frequency cepstral coefficients of preset audio data, and the preset audio data is audio data of the preset text spoken by the preset user.
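Such adaptation is commonly realized as maximum a posteriori (MAP) adaptation of the universal background model's component means toward the enrolled user's features; the sketch below assumes that realization, with the relevance factor and the means-only adaptation being illustrative choices.

```python
# MAP mean-adaptation sketch: derive the target voiceprint feature model from
# the text-related GMM-UBM and the MFCCs of the preset audio data.
import copy
import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapt_means(ubm: GaussianMixture, feats: np.ndarray,
                    r: float = 16.0) -> GaussianMixture:
    """Return a copy of the UBM whose component means are adapted toward the
    enrollment features `feats` of shape (n_frames, n_dims)."""
    target = copy.deepcopy(ubm)
    post = ubm.predict_proba(feats)              # responsibilities per frame
    n_k = post.sum(axis=0)                       # soft frame counts per component
    new_means = (post.T @ feats) / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + r))[:, None]           # adaptation coefficients
    target.means_ = alpha * new_means + (1 - alpha) * ubm.means_
    return target
```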
In one embodiment, in performing the secondary verification on the text feature and the voiceprint feature of the enhanced audio data, the audio verification module 404 may be configured to:
dividing the enhanced audio data corresponding to the preset angle into a plurality of sub audio data;
extracting a voiceprint characteristic vector of each sub audio data according to a voiceprint characteristic extraction model related to a preset text;
acquiring similarity between each voiceprint feature vector and a target voiceprint feature vector, wherein the target voiceprint feature vector is a voiceprint feature vector of preset audio data;
according to the similarity corresponding to each sub audio data, verifying the text characteristic and the voiceprint characteristic of the enhanced audio data corresponding to the preset angle;
and if the enhanced audio data corresponding to the preset angle passing the verification exists, judging that the secondary verification passes.
In an embodiment, when the text feature and the voiceprint feature of the enhanced audio data corresponding to the preset angle are checked according to the similarity corresponding to each piece of sub audio data, the audio checking module 404 may be configured to:
according to the similarity corresponding to each sub audio data and a preset identification function, verifying the text characteristic and the voiceprint characteristic of the enhanced audio data corresponding to the preset angle;
wherein the preset recognition function is γ_n = γ_{n-1} + f(l_n), where γ_n represents the state value of the recognition function corresponding to the nth sub audio data, γ_{n-1} represents the state value of the recognition function corresponding to the (n-1)th sub audio data, f(l_n) is determined by a correction value a of the recognition function and a preset similarity b, and l_n is the similarity between the voiceprint feature vector of the nth sub audio data and the target voiceprint feature vector; and if there exists a recognition function state value γ_n greater than a preset state value, it is determined that the text feature and the voiceprint feature of the enhanced audio data corresponding to the preset angle pass the verification.
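The precise form of f is not given here; a natural choice, assumed in the sketch below, is that f adds the correction value a when the similarity l_n reaches the preset similarity b and subtracts a otherwise, with the initial state and the preset state value likewise assumed.

```python
# Recognition-function sketch: accumulate the state gamma_n = gamma_{n-1} +
# f(l_n) over the sub audio data and pass the check once the state exceeds
# the preset state value. The form of f and all constants are assumptions.
def verify_with_recognition_function(similarities, a: float = 1.0,
                                     b: float = 0.8,
                                     preset_state: float = 3.0) -> bool:
    gamma = 0.0                              # assumed initial state gamma_0
    for l_n in similarities:                 # one similarity per sub audio data
        gamma += a if l_n >= b else -a       # assumed piecewise form of f(l_n)
        if gamma > preset_state:
            return True                      # text and voiceprint features pass
    return False
```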
In an embodiment, when obtaining the similarity between the voiceprint feature vector of each sub audio data and the target voiceprint feature vector, the audio verification module 404 may be configured to:
calculating the similarity between the voiceprint feature vector of each sub audio data and the target voiceprint feature vector according to a dynamic time warping algorithm;
or, calculating a feature distance between the voiceprint feature vector of each sub-audio data and the target voiceprint feature vector as a similarity.
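A minimal dynamic time warping distance is sketched below; treating the voiceprint features as frame-vector sequences and converting the distance to a similarity via 1/(1+d) are illustrative assumptions.

```python
# DTW sketch: classic O(n*m) dynamic programming over two feature sequences,
# usable either as a feature distance or, inverted, as a similarity.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """DTW distance between two sequences of feature vectors (frames x dims)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def dtw_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 / (1.0 + dtw_distance(a, b))
```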
The embodiment of the present application provides a storage medium, on which an instruction execution program is stored, and when the stored instruction execution program is executed on an electronic device provided in the embodiment of the present application, the electronic device is caused to execute the steps in the application wake-up method provided in the embodiment of the present application. The storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.
Referring to fig. 7, the electronic device includes a processor 501, a memory 502, and a microphone 503.
The processor 501 in the present embodiment is a general purpose processor, such as an ARM architecture processor.
The memory 502 stores an instruction execution program and may be a high-speed random access memory or a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 502 may also include a memory controller to provide the processor 501 with access to the memory 502, so as to implement the following functions:
acquiring two paths of audio data through two microphones, and acquiring background audio data played in an audio acquisition period;
performing echo cancellation processing on the two paths of audio data according to the background audio data to obtain two paths of audio data after echo cancellation;
performing beam forming processing on the two paths of audio data after echo cancellation to obtain enhanced audio data;
performing primary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data, and performing secondary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data after the primary verification is passed;
and if the secondary verification passes, awakening the voice interactive application.
Referring to fig. 8, fig. 8 is another schematic structural diagram of the electronic device according to an embodiment of the present application. The difference from the electronic device shown in fig. 7 is that the electronic device further includes components such as an input unit 504 and an output unit 505.
The input unit 504 may be used to receive input digits, character information, or user characteristic information (such as fingerprints), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
The output unit 505 may be used to display information input by the user or information provided to the user, for example via a display screen.
In this embodiment of the present application, the processor 501 in the electronic device loads instructions corresponding to the processes of one or more computer programs into the memory 502, and the processor 501 runs the computer programs stored in the memory 502, so as to implement various functions as follows:
acquiring two paths of audio data through two microphones, and acquiring background audio data played in an audio acquisition period;
performing echo cancellation processing on the two paths of audio data according to the background audio data to obtain two paths of audio data after echo cancellation;
performing beam forming processing on the two paths of audio data after echo cancellation to obtain enhanced audio data;
performing primary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data, and performing secondary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data after the primary verification is passed;
and if the secondary verification passes, awakening the voice interactive application.
In an embodiment, when performing echo cancellation processing on the two paths of audio data according to the background audio data, the processor 501 may perform:
obtaining an initial adaptive filter coefficient, and iteratively updating the initial adaptive filter coefficient according to the background audio data and the audio data to obtain a target adaptive filter coefficient;
and performing echo cancellation processing on the audio data according to the target adaptive filter coefficient.
In one embodiment, when iteratively updating the initial adaptive filter coefficients according to the background audio data and the audio data to obtain the target adaptive filter coefficients, the processor 501 may perform:
obtaining the adaptive filter coefficient at the current moment according to the initial adaptive filter coefficient;
estimating echo audio data carried in the audio data and corresponding to the current moment according to the coefficient of the adaptive filter at the current moment;
acquiring error audio data at the current moment according to the background audio data and the echo audio data obtained by estimation;
and identifying the active part of the adaptive filter coefficient at the current moment, updating the active part of the adaptive filter coefficient at the current moment according to the error audio data at the current moment, and adjusting the order of the adaptive filter coefficient at the current moment to obtain the adaptive filter coefficient at the next moment.
In an embodiment, in identifying the active portion of the adaptive filter coefficients at the current time, processor 501 may perform:
dividing the adaptive filter coefficient at the current moment into a plurality of sub-filter coefficients with equal length;
traversing the sub-filter coefficients from back to front to obtain the average value and the variance of each sub-filter coefficient, and determining the first sub-filter coefficient found in this traversal whose average value is greater than a preset average value and whose variance is greater than a preset variance, together with all sub-filter coefficients preceding it, as the active part;
while adjusting the order of the adaptive filter coefficients at the current time, processor 501 may perform:
and judging whether the first sub-filter coefficient is the last sub-filter coefficient, if so, increasing the order of the adaptive filter coefficient at the current moment, and otherwise, reducing the order of the adaptive filter coefficient at the current moment.
In an embodiment, when performing beamforming processing on the two paths of audio data after echo cancellation to obtain enhanced audio data, the processor 501 may perform:
and respectively carrying out beam forming processing on the two paths of audio data after echo cancellation at a plurality of preset angles by adopting a preset beam forming algorithm to obtain a plurality of enhanced audio data.
In one embodiment, in performing a primary check on the text feature and the voiceprint feature of the enhanced audio data, the processor 501 may perform:
extracting Mel frequency cepstrum coefficients of the enhanced audio data corresponding to each preset angle;
calling a target voiceprint characteristic model related to a preset text to match the extracted mel frequency cepstrum coefficients;
if the matched mel frequency cepstrum coefficient exists, judging that the primary check is passed;
the target voiceprint feature model is obtained by adaptively adjusting a Gaussian mixture universal background model related to the preset text according to the Mel-frequency cepstral coefficients of preset audio data, and the preset audio data is audio data of the preset text spoken by the preset user.
In one embodiment, in performing the secondary verification on the text feature and the voiceprint feature of the enhanced audio data, the processor 501 may perform:
dividing the enhanced audio data corresponding to the preset angle into a plurality of sub audio data;
extracting a voiceprint characteristic vector of each sub audio data according to a voiceprint characteristic extraction model related to a preset text;
acquiring similarity between each voiceprint feature vector and a target voiceprint feature vector, wherein the target voiceprint feature vector is a voiceprint feature vector of preset audio data;
according to the similarity corresponding to each sub audio data, verifying the text characteristic and the voiceprint characteristic of the enhanced audio data corresponding to the preset angle;
and if the enhanced audio data corresponding to the preset angle passing the verification exists, judging that the secondary verification passes.
In an embodiment, when the text feature and the voiceprint feature of the enhanced audio data corresponding to the preset angle are checked according to the similarity corresponding to each sub audio data, the processor 501 may perform:
according to the similarity corresponding to each sub audio data and a preset identification function, verifying the text characteristic and the voiceprint characteristic of the enhanced audio data corresponding to the preset angle;
wherein the preset recognition function is γ_n = γ_{n-1} + f(l_n), where γ_n represents the state value of the recognition function corresponding to the nth sub audio data, γ_{n-1} represents the state value of the recognition function corresponding to the (n-1)th sub audio data, f(l_n) is determined by a correction value a of the recognition function and a preset similarity b, and l_n is the similarity between the voiceprint feature vector of the nth sub audio data and the target voiceprint feature vector; and if there exists a recognition function state value γ_n greater than a preset state value, it is determined that the text feature and the voiceprint feature of the enhanced audio data corresponding to the preset angle pass the verification.
In an embodiment, when obtaining the similarity between the voiceprint feature vector of each sub audio data and the target voiceprint feature vector, the processor 501 may perform:
calculating the similarity between the voiceprint feature vector of each sub audio data and the target voiceprint feature vector according to a dynamic time warping algorithm;
or, calculating a feature distance between the voiceprint feature vector of each sub-audio data and the target voiceprint feature vector as a similarity.
It should be noted that the electronic device provided in the embodiments of the present application and the application wake-up method in the foregoing embodiments belong to the same concept; any method provided in the embodiments of the application wake-up method may run on the electronic device, and its specific implementation process is described in detail in the embodiments of the application wake-up method and is not repeated here.
It should also be noted that, with respect to the application wake-up method of the embodiments of the present application, a person of ordinary skill in the art can understand that all or part of the process of implementing the application wake-up method may be completed by controlling related hardware through a computer program. The computer program may be stored in a computer-readable storage medium, such as a memory of an electronic device, and executed by a processor and a dedicated voice recognition chip in the electronic device, and the execution process may include the processes of the embodiments of the application wake-up method. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
The application wake-up method, the storage medium, and the electronic device provided in the embodiments of the present application are described in detail above, and specific examples are applied in the present application to explain the principles and implementations of the present application, and the description of the above embodiments is only used to help understand the method and core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
Claims (10)
1. An application wake-up method applied to an electronic device, wherein the electronic device comprises two microphones, the application wake-up method comprising:
acquiring two paths of audio data through the two microphones, and acquiring background audio data played in an audio acquisition period;
performing echo cancellation processing on the two paths of audio data according to the background audio data to obtain two paths of audio data after echo cancellation;
performing beam forming processing on the two paths of audio data after the echo cancellation to obtain enhanced audio data;
performing primary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data, and performing secondary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data after the primary verification is passed;
and if the secondary verification passes, awakening the voice interactive application.
2. The application wake-up method according to claim 1, wherein the performing echo cancellation processing on the two paths of audio data according to the background audio data comprises:
obtaining an initial adaptive filter coefficient, and iteratively updating the initial adaptive filter coefficient according to the background audio data and the audio data to obtain a target adaptive filter coefficient;
and performing echo cancellation processing on the audio data according to the target adaptive filter coefficient.
3. The application wake-up method according to claim 2, wherein iteratively updating the initial adaptive filter coefficients according to the background audio data and the audio data to obtain target adaptive filter coefficients comprises:
obtaining the adaptive filter coefficient of the current moment according to the initial adaptive filter coefficient;
estimating echo audio data carried in the audio data and corresponding to the current moment according to the adaptive filter coefficient of the current moment;
acquiring error audio data at the current moment according to the background audio data and the echo audio data;
and identifying the active part of the adaptive filter coefficient at the current moment, updating the active part according to the error audio data, and adjusting the order of the adaptive filter coefficient at the current moment to obtain the adaptive filter coefficient at the next moment.
4. The application wake-up method according to claim 3, wherein the identifying the active part of the adaptive filter coefficients for the current time instant comprises:
dividing the adaptive filter coefficient of the current moment into a plurality of sub-filter coefficients with equal length;
traversing the sub-filter coefficients from back to front to obtain the average value and the variance of each sub-filter coefficient, and determining the first sub-filter coefficient found in this traversal whose average value is greater than a preset average value and whose variance is greater than a preset variance, together with all sub-filter coefficients preceding it, as the active part;
the adjusting the order of the adaptive filter coefficient at the current time includes:
and judging whether the first sub-filter coefficient is the last sub-filter coefficient, if so, increasing the order of the adaptive filter coefficient at the current moment, and otherwise, reducing the order of the adaptive filter coefficient at the current moment.
5. The application wake-up method according to any one of claims 1 to 4, wherein the performing beamforming processing on the two paths of audio data after echo cancellation to obtain enhanced audio data comprises:
and respectively carrying out beam forming processing on the two paths of audio data after the echo cancellation at a plurality of preset angles by adopting a preset beam forming algorithm to obtain a plurality of enhanced audio data.
6. The application wake-up method according to claim 5, wherein the primary checking of the text feature and the voiceprint feature of the enhanced audio data comprises:
extracting Mel frequency cepstrum coefficients of the enhanced audio data corresponding to each preset angle;
calling a target voiceprint characteristic model related to a preset text to match the extracted mel frequency cepstrum coefficients;
if the matched mel frequency cepstrum coefficient exists, judging that the primary check is passed;
the target voiceprint feature model is obtained by a Gaussian mixture general background model related to a preset text in a self-adaptive mode according to a Mel frequency cepstrum coefficient of preset audio data, and the preset audio data are audio data of the preset text spoken by a preset user.
7. The application wake-up method according to claim 6, wherein the secondary verification of the text feature and the voiceprint feature of the enhanced audio data comprises:
dividing the enhanced audio data corresponding to the preset angle into a plurality of sub audio data;
extracting a voiceprint feature vector of each sub-audio data according to a voiceprint feature extraction model related to the preset text;
obtaining similarity between each voiceprint feature vector and a target voiceprint feature vector, wherein the target voiceprint feature vector is the voiceprint feature vector of the preset audio data;
according to the similarity corresponding to each sub audio data, verifying the text characteristic and the voiceprint characteristic of the enhanced audio data corresponding to the preset angle;
and if the enhanced audio data corresponding to the preset angle passing the verification exists, judging that the secondary verification passes.
8. An application wake-up apparatus applied to an electronic device, wherein the electronic device includes two microphones, the application wake-up apparatus comprising:
the audio acquisition module is used for acquiring two paths of audio data through the two microphones and acquiring background audio data played in an audio acquisition period;
the echo cancellation module is used for carrying out echo cancellation processing on the two paths of audio data according to the background audio data to obtain two paths of audio data after echo cancellation;
the beam forming module is used for carrying out beam forming processing on the two paths of audio data after the echo cancellation to obtain enhanced audio data;
the audio verification module is used for performing primary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data and performing secondary verification on the text characteristic and the voiceprint characteristic of the enhanced audio data after the primary verification is passed;
and the application awakening module is used for awakening the voice interaction application when the secondary verification is passed.
9. An electronic device, characterized in that the electronic device comprises a processor, a memory, and two microphones, the memory storing a computer program, wherein the processor is configured to execute the application wake-up method according to any one of claims 1 to 7 by invoking the computer program.
10. A storage medium, characterized in that, when a computer program stored in the storage medium is run on an electronic device comprising two microphones, the electronic device is caused to perform an application wake-up method according to any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910478400.6A CN110211599B (en) | 2019-06-03 | 2019-06-03 | Application awakening method and device, storage medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910478400.6A CN110211599B (en) | 2019-06-03 | 2019-06-03 | Application awakening method and device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110211599A true CN110211599A (en) | 2019-09-06 |
CN110211599B CN110211599B (en) | 2021-07-16 |
Family
ID=67790514
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910478400.6A Active CN110211599B (en) | 2019-06-03 | 2019-06-03 | Application awakening method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110211599B (en) |
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002374588A (en) * | 2001-06-15 | 2002-12-26 | Sony Corp | Device and method for reducing acoustic noise |
CN101763858A (en) * | 2009-10-19 | 2010-06-30 | 瑞声声学科技(深圳)有限公司 | Method for processing double-microphone signal |
CN101917527A (en) * | 2010-09-02 | 2010-12-15 | 杭州华三通信技术有限公司 | Method and device of echo elimination |
CN104520925A (en) * | 2012-08-01 | 2015-04-15 | 杜比实验室特许公司 | Percentile filtering of noise reduction gains |
CN103680515A (en) * | 2013-11-21 | 2014-03-26 | 苏州大学 | Proportional adaptive filter coefficient vector updating method using coefficient reusing |
CN105575395A (en) * | 2014-10-14 | 2016-05-11 | 中兴通讯股份有限公司 | Voice wake-up method and apparatus, terminal, and processing method thereof |
US9842606B2 (en) * | 2015-09-15 | 2017-12-12 | Samsung Electronics Co., Ltd. | Electronic device, method of cancelling acoustic echo thereof, and non-transitory computer readable medium |
CN105654959A (en) * | 2016-01-22 | 2016-06-08 | 韶关学院 | Self-adaptive filtering coefficient updating method and device |
CN107123430A (en) * | 2017-04-12 | 2017-09-01 | 广州视源电子科技股份有限公司 | Echo cancellation method, device, conference tablet and computer storage medium |
US10013995B1 (en) * | 2017-05-10 | 2018-07-03 | Cirrus Logic, Inc. | Combined reference signal for acoustic echo cancellation |
US20190074025A1 (en) * | 2017-09-01 | 2019-03-07 | Cirrus Logic International Semiconductor Ltd. | Acoustic echo cancellation (aec) rate adaptation |
CN107464565A (en) * | 2017-09-20 | 2017-12-12 | 百度在线网络技术(北京)有限公司 | A kind of far field voice awakening method and equipment |
US10194259B1 (en) * | 2018-02-28 | 2019-01-29 | Bose Corporation | Directional audio selection |
CN109218882A (en) * | 2018-08-16 | 2019-01-15 | 歌尔科技有限公司 | The ambient sound monitor method and earphone of earphone |
Non-Patent Citations (3)
Title |
---|
XIAOJIAN LU ET AL.: "A centralized acoustic echo canceller exploiting masking properties of the human ear", 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03) |
文昊翔 et al.: "自适应回声消除的初期迭代统计学模型及改进算法" (Initial-iteration statistical model of adaptive echo cancellation and an improved algorithm), 《数据采集与处理》 (Journal of Data Acquisition and Processing) |
王正腾 et al.: "基于预测残差和自适应阶数的回声消除方法研究" (Research on echo cancellation methods based on prediction residual and adaptive order), 《中国优秀硕士学位论文全文数据库(电子期刊)》 (China Masters' Theses Full-text Database (Electronic Journal)) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111048071A (en) * | 2019-11-11 | 2020-04-21 | 北京海益同展信息科技有限公司 | Voice data processing method and device, computer equipment and storage medium |
CN111179931A (en) * | 2020-01-03 | 2020-05-19 | 青岛海尔科技有限公司 | Method and device for voice interaction and household appliance |
CN111179931B (en) * | 2020-01-03 | 2023-07-21 | 青岛海尔科技有限公司 | Method and device for voice interaction and household appliance |
CN112307161A (en) * | 2020-02-26 | 2021-02-02 | 北京字节跳动网络技术有限公司 | Method and apparatus for playing audio |
CN112307161B (en) * | 2020-02-26 | 2022-11-22 | 北京字节跳动网络技术有限公司 | Method and apparatus for playing audio |
WO2021169711A1 (en) * | 2020-02-27 | 2021-09-02 | Oppo广东移动通信有限公司 | Instruction execution method and apparatus, storage medium, and electronic device |
CN111755002A (en) * | 2020-06-19 | 2020-10-09 | 北京百度网讯科技有限公司 | Speech recognition device, electronic apparatus, and speech recognition method |
CN112581972A (en) * | 2020-10-22 | 2021-03-30 | 广东美的白色家电技术创新中心有限公司 | Voice interaction method, related device and corresponding relation establishing method |
WO2022206602A1 (en) * | 2021-03-31 | 2022-10-06 | 华为技术有限公司 | Speech wakeup method and apparatus, and storage medium and system |
CN114333877A (en) * | 2021-12-20 | 2022-04-12 | 北京声智科技有限公司 | Voice processing method, device, equipment and storage medium |
CN115171703A (en) * | 2022-05-30 | 2022-10-11 | 青岛海尔科技有限公司 | Distributed voice awakening method and device, storage medium and electronic device |
CN115171703B (en) * | 2022-05-30 | 2024-05-24 | 青岛海尔科技有限公司 | Distributed voice awakening method and device, storage medium and electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN110211599B (en) | 2021-07-16 |
Similar Documents
Publication | Title |
---|---|
CN110211599B (en) | Application awakening method and device, storage medium and electronic equipment |
US11823679B2 (en) | Method and system of audio false keyphrase rejection using speaker recognition |
CN110021307B (en) | Audio verification method and device, storage medium and electronic equipment |
CN110400571B (en) | Audio processing method and device, storage medium and electronic equipment |
CN110310623B (en) | Sample generation method, model training method, device, medium, and electronic apparatus |
US20180374487A1 (en) | Detection of replay attack |
CN106663446B (en) | User environment aware acoustic noise reduction |
US20200227071A1 (en) | Analysing speech signals |
US9633652B2 (en) | Methods, systems, and circuits for speaker dependent voice recognition with a single lexicon |
CN102903360B (en) | Microphone array based speech recognition system and method |
CN110600048B (en) | Audio verification method and device, storage medium and electronic equipment |
CN110232933B (en) | Audio detection method and device, storage medium and electronic equipment |
CN110556103A (en) | Audio signal processing method, apparatus, system, device and storage medium |
EP0822539A2 (en) | Two-staged cohort selection for speaker verification system |
TW201419270A (en) | Method and apparatus for utterance verification |
CN110223687B (en) | Instruction execution method and device, storage medium and electronic equipment |
US11081115B2 (en) | Speaker recognition |
US9953633B2 (en) | Speaker dependent voiced sound pattern template mapping |
CN110689887B (en) | Audio verification method and device, storage medium and electronic equipment |
CN113823301A (en) | Training method and device of voice enhancement model and voice enhancement method and device |
CN111369992A (en) | Instruction execution method and device, storage medium and electronic equipment |
CN113889091A (en) | Voice recognition method and device, computer readable storage medium and electronic equipment |
CN110992977B (en) | Method and device for extracting target sound source |
WO2020015546A1 (en) | Far-field speech recognition method, speech recognition model training method, and server |
CN111192569B (en) | Double-microphone voice feature extraction method and device, computer equipment and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |