CN111968651A - Wavelet transform (WT) based voiceprint recognition method and system - Google Patents

Wavelet transform (WT) based voiceprint recognition method and system

Info

Publication number
CN111968651A
Authority
CN
China
Prior art keywords
voiceprint
signal
identified
wavelet
signals
Prior art date
Legal status (assumed; not a legal conclusion)
Withdrawn
Application number
CN202010865114.8A
Other languages
Chinese (zh)
Inventor
汪金玲
Current Assignee
Individual
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202010865114.8A priority Critical patent/CN111968651A/en
Publication of CN111968651A publication Critical patent/CN111968651A/en
Withdrawn legal-status Critical Current

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 — Speaker identification or verification
    • G10L17/02 — Preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; feature selection or extraction
    • G10L17/06 — Decision making techniques; pattern matching strategies


Abstract

The invention relates to the technical field of voiceprint recognition, and discloses a voiceprint recognition method based on the wavelet transform (WT), which comprises the following steps: collecting voiceprint signals to be identified by adopting a microphone array, and filtering the voiceprint signals to be identified by utilizing a sound source separation algorithm based on phase transformation; denoising the filtered voiceprint signal by using a denoising algorithm based on a wavelet threshold to obtain a denoised voiceprint signal; performing pre-emphasis and windowing preprocessing on the noise-reduced voiceprint signal to be identified; extracting the voiceprint features of the preprocessed voiceprint signals by utilizing a voiceprint feature extraction algorithm based on a hair cell function, to obtain the voiceprint features of the voiceprint signals to be recognized; and extracting energy parameters in the voiceprint features by using an energy operator, and identifying the voiceprint by using a log-likelihood ratio algorithm. The invention also provides a voiceprint recognition system based on the WT. The invention realizes the identification of the voiceprint.

Description

Wavelet transform (WT) based voiceprint recognition method and system
Technical Field
The invention relates to the technical field of voiceprint recognition, in particular to a voiceprint recognition method and system based on the wavelet transform (WT).
Background
Voice is the most direct, common and convenient carrier for information interaction between people; it bears and conveys various information resources and plays an important role in human-computer interaction and information transmission. However, because of excessive noise interference in real life, the quality of the speech used for voiceprint recognition is poor, making the speech hard for people to understand and making it difficult for human-computer equipment to obtain accurate information. How to effectively complete the voiceprint recognition task has therefore become a hot topic of current research.
Existing voiceprint recognition technologies focus only on the signal-to-noise ratio of the denoised speech, not on its overall quality. They may remove the weaker portions of the speech signal as noise, so that while the background noise is removed, components of the desired speech signal are also corrupted. Thus, compared with the signal before denoising, the auditory quality of the denoised speech signal may not be substantially improved and may even be degraded.
Meanwhile, in the field of speech feature parameters, the widely used features are: MFCC, perceptual linear prediction coefficients, and perceptual log-area-ratio coefficients. These features achieve good recognition accuracy in clean environments but cannot effectively adapt to noisy environments, especially at signal-to-noise ratios below 10 dB, and their performance decreases as the signal-to-noise ratio decreases.
In view of this, how to effectively denoise the voiceprint and extract the voiceprint features adaptable to the noise environment for voiceprint recognition is an urgent problem to be solved by those skilled in the art.
Disclosure of Invention
The invention provides a voiceprint recognition method based on the wavelet transform (WT). The voiceprint is denoised with a wavelet-transform-based denoising algorithm, the existing voiceprint feature extraction algorithm is improved, and the improved algorithm is used to extract and enhance the voiceprint features, finally realizing voiceprint recognition.
In order to achieve the above object, the present invention provides a WT-based voiceprint recognition method, including:
collecting voiceprint signals to be identified by adopting a microphone array;
filtering the voiceprint signal to be identified by utilizing a sound source separation algorithm based on phase transformation;
denoising the filtered voiceprint signal by using a denoising algorithm based on a wavelet threshold to obtain a denoised voiceprint signal;
pre-emphasis and windowing are carried out on the noise-reduced voiceprint signal to be identified;
extracting the voiceprint characteristics of the preprocessed voiceprint signals by utilizing a voiceprint characteristic extraction algorithm based on a hair cell function to obtain the voiceprint characteristics of the voiceprint signals to be recognized;
and extracting energy parameters in the voiceprint features by using an energy operator, and identifying the voiceprint by using a log-likelihood ratio algorithm.
Optionally, the acquiring the voiceprint signal by using a microphone array includes:
collecting voiceprint signals to be identified by adopting a microphone array, wherein the collected i-th channel signal x_i(t) is a noisy speech signal obtained by convolutive mixing of J unknown signal sources s_j(t), the convolutive mixing formula being:

x_i(t) = Σ_{j=1}^{J} a_{ij}(t) * s_j(t) + b_i(t)

wherein:
a_{ij}(t) is the impulse response from the j-th source to the i-th microphone, and * denotes convolution;
s_j(t) denotes the j-th unknown signal source;
J is the number of unknown signal sources;
b_i(t) is a noise signal;
x_i(t) is the i-th channel signal collected;
t represents a discrete time.
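As a concrete illustration of the convolutive mixing model above, the following Python sketch simulates a two-microphone, two-source capture. It is not the patent's code: the impulse responses a_ij, the noise level, and the signal length are made-up values.

```python
import numpy as np

# Illustrative sketch of the convolutive mixing model
# x_i(t) = sum_j (a_ij * s_j)(t) + b_i(t) for 2 microphones, J = 2 sources.
rng = np.random.default_rng(0)
T = 1000                                   # number of samples
sources = rng.standard_normal((2, T))      # s_j(t), the unknown sources

# a[i][j]: short made-up impulse responses (direct path plus one echo)
a = [[[1.0, 0.0, 0.3], [0.7, 0.2, 0.0]],
     [[0.6, 0.0, 0.1], [1.0, 0.4, 0.0]]]

def mix_channel(i, sources, a, noise_std=0.01):
    """x_i(t): convolve each source with a_ij, sum, and add noise b_i(t)."""
    J, T = sources.shape
    x = sum(np.convolve(sources[j], a[i][j])[:T] for j in range(J))
    return x + noise_std * rng.standard_normal(T)

x1 = mix_channel(0, sources, a)
x2 = mix_channel(1, sources, a)
```

Each channel is a differently filtered sum of the same sources plus independent noise, which is exactly the situation the phase-transformation separation step below is designed to handle.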
Optionally, the filtering the voiceprint signal to be recognized by using a sound source separation algorithm based on phase transformation includes:
1) calculating the cross-correlation relationship of any two voiceprint signals to be identified:

x_1(t) = α_1·s(t − τ_1) + n_1(t),  x_2(t) = α_2·s(t − τ_2) + n_2(t)

R_{x1x2}(τ′) = E[x_1(t)·x_2(t − τ′)]

wherein:
x_1(t), x_2(t) are any two collected voiceprint signals to be identified, and n_1(t), n_2(t) are the corresponding noise components;
α represents the attenuation of sound propagating from the sound source to the microphone;
s represents the signal emitted by an unknown signal source;
τ represents the time for sound to travel from the sound source to the two microphones;
2) from the cross-correlation relationship and the cross-spectrum relationship, it can be known that:

R_{x1x2}(τ′) = (1/2π) ∫ Φ_{x1x2}(ω)·e^{jωτ′} dω

wherein:
Φ_{x1x2}(ω) is the cross-power spectrum of the microphone received signals x_1(t) and x_2(t);
ω represents the angular frequency;
3) weighting the cross-power spectrum in the frequency domain to realize the filtering processing of the voiceprint signal to be identified, wherein the weighting formula is as follows:

R^{PHAT}_{x1x2}(τ′) = (1/2π) ∫ φ(ω)·X_1(ω)·X_2*(ω)·e^{jωτ′} dω

φ(ω) = 1 / |X_1(ω)·X_2*(ω)|

wherein:
φ(ω) represents the phase weighting function, and X_2*(ω) is the complex conjugate of X_2(ω);
X_1(ω), X_2(ω) represent the spectra of the two microphone received signals resulting from the Fourier transform;
Φ_{x1x2}(ω) = X_1(ω)·X_2*(ω) is the cross-power spectrum of x_1(t) and x_2(t);
ω represents the angular frequency.
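The three steps above amount to the classical GCC-PHAT delay estimator. A minimal Python sketch follows; the FFT-based implementation, the 5-sample delay, and the signal length are illustrative assumptions, not the patent's code.

```python
import numpy as np

# GCC-PHAT per steps 1)-3): weight the cross-power spectrum X1(w)X2*(w)
# by 1/|X1(w)X2*(w)| so only phase information remains, which sharpens
# the cross-correlation peak at the true inter-microphone delay.
def gcc_phat(a, b):
    """Estimated delay (in samples) of signal a relative to signal b."""
    n = len(a) + len(b)
    A = np.fft.rfft(a, n=n)
    B = np.fft.rfft(b, n=n)
    cross = A * np.conj(B)                      # cross-power spectrum
    cross /= np.maximum(np.abs(cross), 1e-12)   # PHAT weighting phi(w)
    cc = np.fft.irfft(cross, n=n)               # weighted cross-correlation
    shift = int(np.argmax(np.abs(cc)))
    return shift if shift < n // 2 else shift - n

rng = np.random.default_rng(1)
s = rng.standard_normal(2048)
x1 = s
x2 = np.concatenate([np.zeros(5), s[:-5]])   # x2 lags x1 by 5 samples
delay = gcc_phat(x2, x1)                     # positive: first arg lags second
```

The PHAT weighting trades amplitude information for robustness to reverberation, which is precisely the motivation given in the advantages section below.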
Optionally, the denoising the filtered voiceprint signal by using a wavelet threshold based denoising algorithm includes:
1) down-sampling the voiceprint signal to be identified by a factor of 2^b using a low-pass/high-pass filter bank, where b denotes the transform level of the wavelet transform, which is set to 2 by the present invention; the sampling expression is:

ŵ_l(c) = (1/√a) Σ_t f(t)·ψ((t − c)/a),  with scale a = 2^b

wherein:
a is the scale factor;
c is the displacement factor;
f(t) is the voiceprint signal to be identified;
t is the time of the voiceprint signal;
l represents the index of the sub-band of the wavelet transform;
according to the sampling expression, firstly decomposing a voiceprint signal f (t) to be identified into two subband signals which have equal length and respectively carry low-frequency and high-frequency information components, and then decomposing the signal of each subband into a next-level subband signal, namely four subband signals by applying the decomposition operation again;
2) setting a wavelet threshold λ:

λ = y·σ·√(2·ln N) / ln(j + 1)

wherein:
y is a variable parameter; when the noise is white noise, y = 1;
N is the signal length of the voiceprint signal to be identified;
σ is the standard deviation of the wavelet sub-band signal;
j is the number of wavelet sub-bands;
3) denoising the wavelet sub-bands reaching the wavelet threshold by using a threshold-based denoising function w_l = T(ŵ_l; λ, p, a, b), whose explicit form is given only as an equation image in the original document, wherein:

λ is the wavelet threshold;
p is a positive real number less than the wavelet threshold;
a is a wavelet parameter, which is set to 0.01 by the invention;
b represents the number of transform levels of the wavelet transform, which is set to 2 by the present invention;
l represents the index of the wavelet sub-band;
ŵ_l is the sampled signal based on the wavelet transform;
w_l is the noise-reduced signal of the l-th wavelet sub-band.
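A runnable sketch of the overall wavelet-threshold denoising idea is shown below. Because the patent's own threshold function is given only as an image, this sketch substitutes a hand-rolled two-level Haar transform (b = 2, as above) with the classical universal threshold σ√(2 ln N) and standard soft thresholding; it illustrates the pipeline, not the patent's exact function.

```python
import numpy as np

def haar_step(x):
    """One analysis level: split x into low- and high-frequency sub-bands."""
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)
    return lo, hi

def haar_inverse(lo, hi):
    """Invert haar_step exactly (perfect reconstruction)."""
    x = np.empty(2 * len(lo))
    x[0::2] = (lo + hi) / np.sqrt(2)
    x[1::2] = (lo - hi) / np.sqrt(2)
    return x

def denoise(x, levels=2):
    details, approx = [], x
    for _ in range(levels):                      # analysis: b = 2 levels
        approx, hi = haar_step(approx)
        details.append(hi)
    sigma = np.median(np.abs(details[0])) / 0.6745   # MAD noise estimate
    lam = sigma * np.sqrt(2 * np.log(len(x)))        # universal threshold
    # soft thresholding stands in for the patent's image-only function
    details = [np.sign(d) * np.maximum(np.abs(d) - lam, 0.0) for d in details]
    for hi in reversed(details):                 # synthesis
        approx = haar_inverse(approx, hi)
    return approx

rng = np.random.default_rng(2)
clean = np.sin(2 * np.pi * np.arange(1024) / 64)
noisy = clean + 0.3 * rng.standard_normal(1024)
out = denoise(noisy)
```

On this toy tone-in-noise signal, zeroing the small detail coefficients removes most of the noise energy while leaving the smooth signal in the approximation band.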
Optionally, the pre-processing of pre-emphasis and windowing on the noise-reduced voiceprint signal includes:
1) the voiceprint signal is boosted with a pre-emphasis function:
H(z) = 1 − a·z⁻¹

wherein:
z is the z-domain variable of the voiceprint signal to be identified;
a is the pre-emphasis coefficient, which is set to 0.912 by the invention;
2) windowing the voiceprint signal by using a Hamming window, wherein the time-domain expression of the Hamming window is as follows:

w(n) = 0.54 − 0.46·cos(2πn/(N − 1)),  0 ≤ n ≤ N − 1

wherein:
n is the sample index within a frame of the voiceprint signal to be identified;
N is the total number of samples in a frame of the voiceprint signal to be identified.
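The two preprocessing steps can be sketched as follows. H(z) = 1 − a·z⁻¹ corresponds in the time domain to y[n] = x[n] − a·x[n−1]; a = 0.912 is the patent's coefficient, while the frame length (400 samples) and frame shift (160 samples) are illustrative values the patent does not specify.

```python
import numpy as np

def preemphasize(x, a=0.912):
    """Apply H(z) = 1 - a z^-1, i.e. y[n] = x[n] - a*x[n-1]."""
    return np.append(x[0], x[1:] - a * x[:-1])

def frame_and_window(x, frame_len=400, frame_shift=160):
    """Slice the signal into overlapping frames and apply a Hamming window."""
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    window = np.hamming(frame_len)   # 0.54 - 0.46 cos(2 pi n / (N-1))
    frames = np.stack([x[i * frame_shift:i * frame_shift + frame_len]
                       for i in range(n_frames)])
    return frames * window

x = np.sin(2 * np.pi * np.arange(16000) / 100)   # 1 s of tone at 16 kHz
frames = frame_and_window(preemphasize(x))
```

Pre-emphasis boosts the high-frequency part of the spectrum before framing, and the Hamming window tapers each frame to reduce spectral leakage in the later feature extraction.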
Optionally, the extracting the voiceprint features of the preprocessed voiceprint signal by using a voiceprint feature extraction algorithm based on a hair cell function includes:
1) processing the preprocessed voiceprint signal with a hair cell function:

h(a, b) = [H(a, b)]²

wherein:
H(a, b) is the preprocessed voiceprint signal, and h(a, b) is its hair cell response;
2) responding to the hair cell function by using a filter to obtain the output of each hair cell:

S(i, j) = (1/d) Σ_{n=0}^{d−1} h(i, j·l + n)

wherein:
d represents the window length of the i-th frequency band hair cell function;
τ_i is the time length of the center frequency of the i-th filter;
l is the frame shift;
3) subjecting the output of the hair cell function to a cube-root scale transformation based on a loudness function, changing the energy value into the perceived loudness, with the calculation formula:

y(i, j) = [S(i, j)]^(1/3)

4) performing decorrelation processing by using the discrete cosine transform to obtain the voiceprint features:

f(i, n) = Σ_{m=1}^{M} y(m, i)·cos(πn(2m − 1)/(2M))

wherein:
M represents the number of filters and m the filter index;
i represents the i-th frequency band;
n represents the sampling of the voiceprint signal at point n.
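Steps 3) and 4), cube-root loudness compression followed by DCT decorrelation across the filter bands, can be sketched as below. The band energies are random stand-ins and the hair-cell filter bank itself is omitted; the band count (24), frame count (98), and coefficient count (12) are made-up values.

```python
import numpy as np

# Sketch: cube-root loudness y(i,j) = S(i,j)^(1/3), then a DCT-II over
# the band axis to decorrelate the M filter-bank channels.
def loudness_dct_features(S, n_coeffs=12):
    y = np.cbrt(S)                       # perceived loudness per band/frame
    M = y.shape[0]                       # number of filter bands
    m = np.arange(M) + 0.5               # (2m - 1)/2 for m = 1..M
    # DCT-II basis: cos(pi * n * (2m - 1) / (2M))
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), m) / M)
    return basis @ y                     # shape (n_coeffs, n_frames)

rng = np.random.default_rng(3)
S = rng.random((24, 98)) + 1e-6          # 24 bands x 98 frames, positive
feats = loudness_dct_features(S)
```

The DCT concentrates the correlated band energies into a few coefficients, mirroring the cepstral step of MFCC extraction.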
Optionally, the extracting of energy parameters in the voiceprint features by using an energy operator comprises the following steps:
extracting the energy parameters in the voiceprint features by using an energy operator, wherein the extraction formula of the energy parameters is as follows:

ψ[f(i, n)] = [f(i, n)]² − f(i, n+1)·f(i, n−1)

wherein:
i represents the i-th frequency band of the voiceprint signal;
n represents the sampling of the voiceprint signal at point n;
normalizing the energy parameters and taking the logarithm:

Ψ(i, n) = lg( ψ[f(i, n)] / max ψ[f(i, n)] )

wherein:
ψ[f(i, n)] represents the energy parameter at point n of the voiceprint signal of the i-th band.
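The extraction formula is the Teager energy operator. The sketch below applies it to a sinusoid; for a pure tone A·sin(ωn) the operator yields exactly A²·sin²(ω) at every interior sample, which is why it tracks instantaneous energy. The log-normalization here (divide by the maximum, then log10) is an assumption, since the patent gives its normalization only as an image.

```python
import numpy as np

# Teager energy operator psi[f(n)] = f(n)^2 - f(n+1)*f(n-1),
# vectorized over the interior samples of a 1-D sequence.
def teager(f):
    return f[1:-1] ** 2 - f[2:] * f[:-2]

def log_normalized_energy(psi, eps=1e-12):
    # Assumed normalization: divide by the maximum, then take log10.
    return np.log10(np.maximum(psi, eps) / (np.max(psi) + eps))

n = np.arange(1000)
f = np.sin(0.2 * n)        # unit-amplitude sinusoid, omega = 0.2 rad/sample
psi = teager(f)            # equals sin(0.2)^2 at every interior sample
norm = log_normalized_energy(psi)
```

Because the operator uses only three neighboring samples, it responds to energy changes almost instantaneously, which complements the frame-averaged static features above.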
Further, to achieve the above object, the present invention provides a WT-based voiceprint recognition system, including:
the voiceprint acquisition device is used for acquiring a voiceprint to be identified;
the voiceprint processor is used for filtering the voiceprint signal to be identified by utilizing a sound source separation algorithm based on phase transformation and carrying out noise reduction processing on the filtered voiceprint signal by utilizing a denoising algorithm based on a wavelet threshold;
the voiceprint recognition device is used for extracting the voiceprint features of the preprocessed voiceprint signals by utilizing a voiceprint feature extraction algorithm based on a hair cell function to obtain the voiceprint features of the voiceprint signals to be recognized, extracting energy parameters in the voiceprint features by utilizing an energy operator, and recognizing the voiceprint by utilizing a log-likelihood ratio algorithm.
Furthermore, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon voiceprint recognition program instructions executable by one or more processors to implement the steps of the WT-based voiceprint recognition method as described above.
Compared with the prior art, the invention provides a voiceprint recognition method based on WT, which has the following advantages:
Firstly, aiming at the noise signals existing in voiceprints, the invention provides a sound source separation algorithm based on phase transformation, which performs filtering processing on the voiceprint signals to be identified. The cross-correlation between the signals received by two microphones is calculated as

R_{x1x2}(τ′) = E[x_1(t)·x_2(t − τ′)]

wherein x_1(t), x_2(t) are any two acquired voiceprint signals to be identified, α represents the attenuation of sound from the sound source to the microphone, s represents the signal emitted by an unknown signal source, and τ represents the time of sound from the sound source to the two microphones. From the cross-correlation relationship and the cross-spectrum relationship, the cross-power spectrum Φ_{x1x2}(ω) of the received signals x_1(t) and x_2(t) is obtained. In an actual microphone array signal processing model, reverberation and noise make the peak of the cross-power spectrum Φ_{x1x2}(ω) less distinct, which reduces the precision of the time-delay estimation. In order to sharpen the peak, the cross-power spectrum is weighted in the frequency domain:

R^{PHAT}_{x1x2}(τ′) = (1/2π) ∫ φ(ω)·X_1(ω)·X_2*(ω)·e^{jωτ′} dω

wherein φ(ω) = 1/|X_1(ω)·X_2*(ω)| represents the phase weighting function, which is equivalent to filtering the data and emphasizes the spectral components of the source signal in the received signal; therefore, noise and reverberation interference can be suppressed, and higher delay-estimation precision can be obtained.
Meanwhile, the invention addresses the defects of the traditional threshold denoising algorithms: speech processed by a hard-threshold denoising algorithm oscillates, while speech processed by a soft-threshold denoising algorithm suffers larger distortion. The invention therefore provides a threshold function based on the wavelet threshold, in which λ is the wavelet threshold, p is a positive real number smaller than the wavelet threshold, a is a wavelet parameter (set to 0.01 by the invention), b represents the transform level of the wavelet transform, l represents the index of the wavelet sub-band, ŵ_l is the wavelet-transform sampled signal, and w_l is the noise-reduced signal of the l-th wavelet sub-band. The new threshold function considers that the attenuation of the noise wavelet-transform modulus follows an exponential law, is continuous at the threshold, and has adjustable parameters, so the algorithm can adapt to different noise intensities. It further reduces the coefficients of the noise signal and avoids the traditional treatment in which the threshold function is set directly to zero when a wavelet coefficient is smaller than the threshold, thus effectively enhancing the noise-reduction effect.
Finally, a single kind of feature generally contains only part of the speech information: the original feature parameters reflect the static characteristics of the speech signal, while dynamic feature parameters reflect its dynamic characteristics. The invention therefore combines dynamic and static feature parameters so that the dynamic and static information complement each other, better describing the dynamic and static characteristics of the speech. On the basis of the hair-cell-function voiceprint features, an energy operator reflecting energy conversion is added, so that the obtained energy not only represents the auditory perception characteristics of the human ear but also incorporates the instantaneous energy of the speech, and to a certain extent suppresses the influence of zero-mean noise on the voiceprint signal, thereby describing the characteristics of the voiceprint more completely.
Drawings
FIG. 1 is a flowchart illustrating a WT-based voiceprint recognition method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a WT-based voiceprint recognition system according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The voiceprint is subjected to noise reduction processing by using a denoising algorithm based on wavelet transformation, the existing voiceprint feature extraction algorithm is improved, and the improved voiceprint feature extraction algorithm is used for extracting and enhancing the voiceprint features, so that the voiceprint is finally identified. Referring to fig. 1, a diagram of a WT-based voiceprint recognition method according to an embodiment of the present invention is shown.
In this embodiment, the WT-based voiceprint recognition method includes:
and S1, collecting the voiceprint signals to be recognized by adopting a microphone array, and filtering the voiceprint signals to be recognized by utilizing a sound source separation algorithm based on phase transformation.
Firstly, the invention adopts a microphone array to collect the voiceprint signals to be identified, wherein the collected i-th channel signal x_i(t) is a noisy speech signal obtained by convolutive mixing of J unknown signal sources s_j(t), the convolutive mixing formula being:

x_i(t) = Σ_{j=1}^{J} a_{ij}(t) * s_j(t) + b_i(t)

wherein:
a_{ij}(t) is the impulse response from the j-th source to the i-th microphone, and * denotes convolution;
s_j(t) denotes the j-th unknown signal source;
J is the number of unknown signal sources;
b_i(t) is a noise signal;
x_i(t) is the i-th channel signal collected;
t represents a discrete time;
further, a sound source separation algorithm based on phase transformation is utilized to perform filtering processing on the voiceprint signal to be identified, and the filtering processing process comprises the following steps:
1) calculating the cross-correlation relationship of any two voiceprint signals to be identified:

x_1(t) = α_1·s(t − τ_1) + n_1(t),  x_2(t) = α_2·s(t − τ_2) + n_2(t)

R_{x1x2}(τ′) = E[x_1(t)·x_2(t − τ′)]

wherein:
x_1(t), x_2(t) are any two collected voiceprint signals to be identified, and n_1(t), n_2(t) are the corresponding noise components;
α represents the attenuation of sound propagating from the sound source to the microphone;
s represents the signal emitted by an unknown signal source;
τ represents the time for sound to travel from the sound source to the two microphones;
2) from the cross-correlation relationship and the cross-spectrum relationship, it can be known that:

R_{x1x2}(τ′) = (1/2π) ∫ Φ_{x1x2}(ω)·e^{jωτ′} dω

wherein:
Φ_{x1x2}(ω) is the cross-power spectrum of the microphone received signals x_1(t) and x_2(t);
ω represents the angular frequency;
3) weighting the cross-power spectrum in the frequency domain to realize the filtering processing of the voiceprint signal to be identified, wherein the weighting formula is as follows:

R^{PHAT}_{x1x2}(τ′) = (1/2π) ∫ φ(ω)·X_1(ω)·X_2*(ω)·e^{jωτ′} dω

φ(ω) = 1 / |X_1(ω)·X_2*(ω)|

wherein:
φ(ω) represents the phase weighting function, and X_2*(ω) is the complex conjugate of X_2(ω);
X_1(ω), X_2(ω) represent the spectra of the two microphone received signals resulting from the Fourier transform;
Φ_{x1x2}(ω) = X_1(ω)·X_2*(ω) is the cross-power spectrum of x_1(t) and x_2(t);
ω represents the angular frequency.
And S2, denoising the filtered voiceprint signal by using a denoising algorithm based on a wavelet threshold to obtain a denoised voiceprint signal.
Firstly, the invention takes the filtered voiceprint signals and performs noise-reduction processing on them using a denoising algorithm based on a wavelet threshold, wherein the flow of the wavelet-threshold denoising algorithm is as follows:
1) down-sampling the voiceprint signal to be identified by a factor of 2^b using a low-pass/high-pass filter bank, where b denotes the transform level of the wavelet transform, which is set to 2 by the present invention; the sampling expression is:

ŵ_l(c) = (1/√a) Σ_t f(t)·ψ((t − c)/a),  with scale a = 2^b

wherein:
a is the scale factor;
c is the displacement factor;
f(t) is the voiceprint signal to be identified;
t is the time of the voiceprint signal;
l represents the index of the sub-band of the wavelet transform;
according to the sampling expression, firstly decomposing a voiceprint signal f (t) to be identified into two subband signals which have equal length and respectively carry low-frequency and high-frequency information components, and then decomposing the signal of each subband into a next-level subband signal, namely four subband signals by applying the decomposition operation again;
2) setting a wavelet threshold λ:

λ = y·σ·√(2·ln N) / ln(j + 1)

wherein:
y is a variable parameter; when the noise is white noise, y = 1;
N is the signal length of the voiceprint signal to be identified;
σ is the standard deviation of the wavelet sub-band signal;
j is the number of wavelet sub-bands;
3) denoising the wavelet sub-bands reaching the wavelet threshold by using a threshold-based denoising function w_l = T(ŵ_l; λ, p, a, b), whose explicit form is given only as an equation image in the original document, wherein:

λ is the wavelet threshold;
p is a positive real number less than the wavelet threshold;
a is a wavelet parameter, which is set to 0.01 by the invention;
b represents the number of transform levels of the wavelet transform, which is set to 2 by the present invention;
l represents the index of the wavelet sub-band;
ŵ_l is the sampled signal based on the wavelet transform;
w_l is the noise-reduced signal of the l-th wavelet sub-band.
And S3, pre-emphasis and windowing pre-processing is carried out on the noise-reduced voiceprint signal to be recognized.
Further, the invention performs pre-emphasis and windowing processing on the noise-reduced voiceprint signal to be identified, wherein the pre-emphasis and windowing processing comprises the following steps:
1) the voiceprint signal is boosted with a pre-emphasis function:
H(z) = 1 − a·z⁻¹

wherein:
z is the z-domain variable of the voiceprint signal to be identified;
a is the pre-emphasis coefficient, which is set to 0.912 by the invention;
2) windowing the voiceprint signal by using a Hamming window, wherein the time-domain expression of the Hamming window is as follows:

w(n) = 0.54 − 0.46·cos(2πn/(N − 1)),  0 ≤ n ≤ N − 1

wherein:
n is the sample index within a frame of the voiceprint signal to be identified;
N is the total number of samples in a frame of the voiceprint signal to be identified.
S4, extracting the voiceprint characteristics of the preprocessed voiceprint signals by utilizing a voiceprint characteristic extraction algorithm based on a hair cell function to obtain the voiceprint characteristics of the voiceprint signals to be recognized.
Furthermore, the invention utilizes a voiceprint feature extraction algorithm based on a hair cell function to extract the features of the voiceprint signal to be identified, and the voiceprint feature extraction algorithm based on the hair cell function comprises the following steps:
1) processing the preprocessed voiceprint signal with a hair cell function:

h(a, b) = [H(a, b)]²

wherein:
H(a, b) is the preprocessed voiceprint signal, and h(a, b) is its hair cell response;
2) responding to the hair cell function by using a filter to obtain the output of each hair cell:

S(i, j) = (1/d) Σ_{n=0}^{d−1} h(i, j·l + n)

wherein:
d represents the window length of the i-th frequency band hair cell function;
τ_i is the time length of the center frequency of the i-th filter;
l is the frame shift;
3) subjecting the output of the hair cell function to a cube-root scale transformation based on a loudness function, changing the energy value into the perceived loudness, with the calculation formula:

y(i, j) = [S(i, j)]^(1/3)

4) performing decorrelation processing by using the discrete cosine transform to obtain the voiceprint features:

f(i, n) = Σ_{m=1}^{M} y(m, i)·cos(πn(2m − 1)/(2M))

wherein:
M represents the number of filters and m the filter index;
i represents the i-th frequency band;
n represents the sampling of the voiceprint signal at point n.
And S5, extracting energy parameters in the voiceprint features by using an energy operator, and identifying the voiceprint by using a log-likelihood ratio algorithm.
Further, for the voiceprint feature f (i, n), the energy parameter in the voiceprint feature is extracted by using an energy operator, and the extraction formula of the energy parameter is as follows:
ψ[f(i, n)] = [f(i, n)]² − f(i, n+1)·f(i, n−1)

wherein:
i represents the i-th frequency band of the voiceprint signal;
n represents the sampling of the voiceprint signal at point n;
Further, the invention normalizes the energy parameters and takes the logarithm:

Ψ(i, n) = lg( ψ[f(i, n)] / max ψ[f(i, n)] )

wherein:
ψ[f(i, n)] represents the energy parameter at point n of the voiceprint signal of the i-th band.
Further, the invention utilizes a log-likelihood ratio algorithm to compare the similarity of the energy parameters, the formula of the log-likelihood ratio being:

Λ(ŷ) = log p(ŷ | Hs) − log p(ŷ | Hd)

wherein:
Hs and Hd are respectively the hypotheses that the calculated voiceprint features do and do not match the voiceprint features of a speaker stored in the system data;
ŷ is the energy parameter of the voiceprint to be identified;
y is the voiceprint signal of the voiceprint to be identified;
and finally, if the similarity is smaller than a preset value, judging that the voice of the speaker is not in the data stored in advance.
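The decision rule can be sketched as below, with diagonal Gaussians standing in for whatever speaker and background models the patent assumes; all means, variances, and the zero threshold are made-up values.

```python
import numpy as np

# Log-likelihood-ratio decision: score the energy-parameter vector under
# a speaker hypothesis Hs and a background hypothesis Hd; accept only if
# the ratio exceeds a preset threshold.
def log_gauss(x, mean, var):
    """Log-density of x under a diagonal Gaussian N(mean, var)."""
    return -0.5 * float(np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

def accept(x, spk_mean, spk_var, bg_mean, bg_var, threshold=0.0):
    llr = log_gauss(x, spk_mean, spk_var) - log_gauss(x, bg_mean, bg_var)
    return bool(llr > threshold)   # False => "speaker not in stored data"

dim = 12
spk_mean, bg_mean = np.full(dim, 1.0), np.zeros(dim)
var = np.ones(dim)
close = np.full(dim, 0.9)          # sample near the speaker model
far = np.zeros(dim)                # sample near the background model
```

A score below the threshold corresponds to the "speaker not in the pre-stored data" outcome described above.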
The following describes embodiments of the present invention through an algorithm experiment and a test of the inventive processing method. The hardware test environment of the algorithm of the invention is as follows: the processor is an Intel(R) Core(TM) i5-8700 CPU (8 cores), the graphics card is a GeForce GTX 1060 with 8 GB of video memory, and the development environment is MATLAB; the comparison algorithms are the MFCC, GMM and DFCNN algorithms.
In the algorithm experiment of the invention, the experiment selects mixed voice signals actually collected by 2 microphones, the sampling frequency is 16kHz, the microphone spacing is 10cm, each sound source is at different positions about 1-2 m away from the microphone, 6 speakers (3 males and 3 females) are selected, mixed voices of 2 or 3 speakers are randomly formed, and 5 mixed voices are selected as the experimental voices. The invention identifies the experimental voice by the comparison algorithm and the algorithm provided by the invention, and takes the identification accuracy as the evaluation standard of the algorithm.
According to the experimental result, the accuracy of the voiceprint recognition result of the MFCC algorithm is 86.18%, the accuracy of the voiceprint recognition result of the GMM algorithm is 75.61%, the accuracy of the voiceprint recognition result of the DFCNN algorithm is 90.03%, the accuracy of the voiceprint recognition result of the algorithm provided by the invention is 93.12%, and compared with the comparison algorithm, the voice recognition method provided by the invention has higher voice recognition accuracy.
The invention also provides a WT-based voiceprint recognition system. Referring to fig. 2, a schematic diagram of an internal structure of a WT-based voiceprint recognition system according to an embodiment of the present invention is shown.
In this embodiment, the WT-based voiceprint recognition system 1 comprises at least a voiceprint acquisition device 11, a voiceprint processor 12, a voiceprint recognition device 13, a communications bus 14, and a network interface 15.
The voiceprint acquiring apparatus 11 may be a PC (Personal Computer), a terminal device such as a smart phone, a tablet Computer, and a mobile Computer, or may be a server.
The voiceprint processor 12 includes at least one type of readable storage media including flash memory, a hard disk, a multi-media card, card type memory (e.g., SD or DX memory, etc.), magnetic memory, a magnetic disk, an optical disk, and the like. The voiceprint processor 12 may in some embodiments be an internal storage unit of the WT based voiceprint recognition system 1, for example a hard disk of the WT based voiceprint recognition system 1. The voiceprint processor 12 can also be an external storage device of the WT-based voiceprint recognition system 1 in other embodiments, such as a plug-in hard disk provided on the WT-based voiceprint recognition system 1, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and so forth. Further, the voiceprint processor 12 can also include both an internal memory unit and an external memory device of the WT-based voiceprint recognition system 1. The voiceprint processor 12 can be used not only to store application software and various types of data installed in the WT-based voiceprint recognition system 1, but also to temporarily store data that has been output or is to be output.
The voiceprint recognition device 13 may be, in some embodiments, a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip for running program code stored in the voiceprint processor 12 or processing data, such as voiceprint recognition program instructions.
The communication bus 14 is used to enable connection communication between these components.
The network interface 15 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), and is typically used to establish a communication link between the system 1 and other electronic devices.
Optionally, the system 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, as appropriate, is used, among other things, to display information processed in the WT-based voiceprint recognition system 1 and to display a visual user interface.
While fig. 2 only shows the WT-based voiceprint recognition system 1 with components 11-15, those skilled in the art will appreciate that the configuration shown in fig. 2 does not constitute a limitation of the WT-based voiceprint recognition system 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
In the embodiment of the apparatus 1 shown in FIG. 2, the voiceprint processor 12 stores WT-based voiceprint recognition program instructions; the steps performed by the voiceprint recognition device 13 when executing these instructions are the same as those of the WT-based voiceprint recognition method described above and are not repeated here.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium having stored thereon voiceprint recognition program instructions, which are executable by one or more processors to implement the following operations:
collecting voiceprint signals to be identified by adopting a microphone array;
filtering the voiceprint signal to be identified by utilizing a sound source separation algorithm based on phase transformation;
denoising the filtered voiceprint signal by using a denoising algorithm based on a wavelet threshold to obtain a noise-reduced voiceprint signal;
pre-emphasis and windowing are carried out on the noise-reduced voiceprint signal to be identified;
extracting the voiceprint characteristics of the preprocessed voiceprint signals by utilizing a voiceprint characteristic extraction algorithm based on a hair cell function to obtain the voiceprint characteristics of the voiceprint signals to be recognized;
and extracting energy parameters in the voiceprint features by using an energy operator, and identifying the voiceprint by using a log-likelihood ratio algorithm.
It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. A WT-based voiceprint recognition method, the method comprising:
collecting voiceprint signals to be identified by adopting a microphone array;
filtering the voiceprint signal to be identified by utilizing a sound source separation algorithm based on phase transformation;
denoising the filtered voiceprint signal by using a denoising algorithm based on a wavelet threshold to obtain a noise-reduced voiceprint signal;
pre-emphasis and windowing are carried out on the noise-reduced voiceprint signal to be identified;
extracting the voiceprint characteristics of the preprocessed voiceprint signals by utilizing a voiceprint characteristic extraction algorithm based on a hair cell function to obtain the voiceprint characteristics of the voiceprint signals to be recognized;
and extracting energy parameters in the voiceprint features by using an energy operator, and identifying the voiceprint by using a log-likelihood ratio algorithm.
2. The WT-based voiceprint recognition method of claim 1 wherein said acquiring a voiceprint signal with a microphone array comprises:
carrying out voiceprint signal collection by adopting a microphone array, wherein the collected i-th channel signal is x_i(t), which is a convolutively mixed noisy speech signal derived from J unknown signal sources s_j(t), said convolution mixed formula being:
x_i(t) = Σ_{j=1}^{J} a_ij(t) * s_j(t) + b_i(t), where * denotes convolution
wherein:
a_ij(t) is the impulse response (mixing filter) from the j-th source to the i-th microphone;
s_j(t) denotes the j-th unknown signal source;
J is the number of unknown signal sources;
b_i(t) is a noise signal;
x_i(t) is the collected i-th channel signal;
t represents a discrete time.
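The convolutive mixing model of claim 2 can be sketched numerically. The following is a minimal numpy illustration, not the patent's implementation; the source signals and impulse responses a_ij are hypothetical values chosen only to show the form x_i(t) = Σ_j a_ij(t) * s_j(t) + b_i(t).

```python
import numpy as np

def convolutive_mixture(sources, impulse_responses, noise_std=0.01, seed=0):
    """One channel of the model x_i(t) = sum_j a_ij(t) * s_j(t) + b_i(t):
    convolve each source with its (hypothetical) impulse response, sum, add noise."""
    rng = np.random.default_rng(seed)
    J, T = sources.shape
    x = np.zeros(T)
    for j in range(J):
        x += np.convolve(sources[j], impulse_responses[j])[:T]  # truncate to length T
    return x + noise_std * rng.standard_normal(T)               # additive noise b_i(t)

T = 1000
t = np.arange(T)
# two toy sources: a sinusoid and a square wave (illustrative, not from the patent)
sources = np.stack([np.sin(2 * np.pi * 0.01 * t),
                    np.sign(np.sin(2 * np.pi * 0.003 * t))])
irs = [np.array([1.0, 0.5, 0.25]), np.array([0.8, 0.3])]        # hypothetical a_ij
x1 = convolutive_mixture(sources, irs)
```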
3. The WT-based voiceprint recognition method according to claim 2, wherein the filtering the voiceprint signal to be recognized by using the phase-transform-based sound source separation algorithm comprises:
1) calculating the cross-correlation relationship of any two voiceprint signals to be identified:
R_{x_1 x_2}(τ) = E[x_1(t)·x_2(t - τ)]
wherein:
x_1(t), x_2(t) are any two collected voiceprint signals to be identified;
α represents the attenuation of sound propagating from the sound source to the microphone;
s represents a signal emitted by an unknown signal source;
τ represents the difference between the times for sound to travel from the sound source to the two microphones;
2) from the cross-correlation relationship and the cross-frequency spectrum relationship, it can be known that:
R_{x_1 x_2}(τ) = ∫ G_{x_1 x_2}(ω)·e^{jωτ} dω
wherein:
G_{x_1 x_2}(ω) is the cross-power spectrum of the microphone received signals x_1(t) and x_2(t);
ω represents the angular frequency;
3) weighting the power spectrum in a frequency domain to realize filtering processing of a voiceprint signal to be identified, wherein the weighting formula is as follows:
ψ(ω) = 1/|X_1(ω)·X_2*(ω)| = 1/|G_{x_1 x_2}(ω)|
R_{x_1 x_2}^{PHAT}(τ) = ∫ ψ(ω)·G_{x_1 x_2}(ω)·e^{jωτ} dω
wherein:
ψ(ω) represents the phase weighting function;
X_1(ω), X_2(ω) represent the spectra of the two microphone received signals obtained by the Fourier transform;
G_{x_1 x_2}(ω) is the cross-power spectrum of the microphone received signals x_1(t) and x_2(t);
ω represents the angular frequency.
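The phase-transform weighting of claim 3 corresponds to the standard GCC-PHAT time-delay estimator. The sketch below is a generic numpy implementation under that assumption, not the patent's exact filtering code; `gcc_phat` and the test signals are illustrative.

```python
import numpy as np

def gcc_phat(x1, x2, fs=1.0):
    """Delay of x2 relative to x1 via the PHAT-weighted cross-power spectrum."""
    n = len(x1) + len(x2)                      # zero-pad so correlation is linear
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    G = X2 * np.conj(X1)                       # cross-power spectrum G(omega)
    G /= np.abs(G) + 1e-12                     # PHAT weighting psi = 1/|G|
    cc = np.fft.irfft(G, n=n)                  # weighted cross-correlation
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

rng = np.random.default_rng(1)
s = rng.standard_normal(4096)
x1 = s
x2 = np.concatenate((np.zeros(5), s[:-5]))     # x1 delayed by 5 samples
print(gcc_phat(x1, x2))                        # → 5.0
```

The PHAT weighting whitens the spectrum so the correlation collapses to a sharp peak at the true delay, which is why it is favored for reverberant microphone-array signals.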
4. The WT-based voiceprint recognition method according to claim 3, wherein the denoising the filtered voiceprint signal using the wavelet threshold-based denoising algorithm comprises:
1) sampling the voiceprint signal to be identified with a low-pass/high-pass filter pair at a scale of 2^b, where b denotes the transform level of the wavelet transform, which is set to 2 by the present invention; the sampling expression is:
W_f(a, c) = |a|^{-1/2} ∫ f(t)·ψ((t - c)/a) dt, with the dyadic scale a = 2^l for the l-th sub-band
wherein:
a is a scale factor;
c is a displacement factor;
f (t) is a voiceprint signal to be identified;
t is the time of the voiceprint signal;
l represents the number of sub-bands of the wavelet transform;
according to the sampling expression, the voiceprint signal f(t) to be identified is first decomposed into two subband signals of equal length that carry the low-frequency and high-frequency information components respectively, and the decomposition is then applied again to each subband signal to obtain the next-level subband signals, namely four subband signals;
2) setting a wavelet threshold lambda:
λ = y·σ·√(2·ln N)/ln(j + 1)
wherein:
y is a variable parameter, and when the noise is white noise, y is 1;
n is the signal length of the voiceprint signal to be identified;
sigma is the standard deviation of the wavelet subband signal;
j is the number of wavelet sub-bands;
3) denoising the wavelet sub-band reaching the wavelet threshold by using a denoising function based on a threshold, wherein the denoising function based on the threshold is as follows:
Figure FDA0002649491810000032
wherein:
λ is wavelet threshold;
p is a positive real number less than the wavelet threshold;
a is a wavelet parameter, which is set to 0.01 by the invention;
b represents the number of transform levels of the wavelet transform, which is set to 2 by the present invention;
l represents the number of sub-bands of the wavelet transform;
w̃_l is the sampling signal of the l-th sub-band based on the wavelet transformation;
w_l is the noise-reduced signal of the l-th wavelet sub-band.
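The wavelet-threshold denoising of claim 4 can be illustrated with a two-level Haar transform and soft thresholding at the universal threshold σ·√(2 ln N). This is a simplified sketch: the Haar wavelet, the MAD noise estimate, and the soft-threshold rule are assumptions, since the claim does not fix the wavelet or the exact threshold function.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar transform: approximation and detail."""
    x = x[: len(x) // 2 * 2]
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_idwt(a, d):
    """Inverse of haar_dwt."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def denoise(x, levels=2):
    """Two-level decomposition, universal threshold, soft thresholding of details."""
    coeffs, a = [], x
    for _ in range(levels):
        a, d = haar_dwt(a)
        coeffs.append(d)
    # noise std from the finest detail band via the median absolute deviation
    sigma = np.median(np.abs(coeffs[0])) / 0.6745
    lam = sigma * np.sqrt(2 * np.log(len(x)))    # universal threshold sigma*sqrt(2 ln N)
    coeffs = [np.sign(d) * np.maximum(np.abs(d) - lam, 0) for d in coeffs]
    for d in reversed(coeffs):
        a = haar_idwt(a, d)
    return a

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 1024)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.3 * rng.standard_normal(1024)
den = denoise(noisy)
```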
5. The WT-based voiceprint recognition method according to claim 4, wherein the pre-emphasis and windowing pre-processing of the noise-reduced voiceprint signal comprises:
1) the voiceprint signal is boosted with a pre-emphasis function:
H(z) = 1 - a·z^(-1)
wherein:
z is a voiceprint signal to be identified;
a is a pre-emphasis coefficient, which is set to 0.912 by the invention;
2) windowing the voiceprint signal by using a Hamming window, wherein the time domain expression of the Hamming window is as follows:
w(n) = 0.54 - 0.46·cos(2πn/(N - 1)), 0 ≤ n ≤ N - 1
wherein:
n is the frame number of the voiceprint signal to be identified;
and N is the total frame number of the voiceprint signals to be identified.
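The pre-emphasis filter H(z) = 1 - a·z^(-1) with a = 0.912 and the Hamming windowing of claim 5 translate directly into code. A minimal sketch; the frame length and hop size are illustrative choices not stated in the claim.

```python
import numpy as np

def pre_emphasis(x, a=0.912):
    """y[n] = x[n] - a*x[n-1], the time-domain form of H(z) = 1 - a*z^-1.
    a = 0.912 follows the coefficient stated in the claim."""
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - a * x[:-1]
    return y

def hamming_frames(x, frame_len=256, hop=128):
    """Split into overlapping frames and apply a Hamming window to each."""
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(frame_len) / (frame_len - 1))
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] * w for i in range(n_frames)])

rng = np.random.default_rng(3)
x = rng.standard_normal(2048)
frames = hamming_frames(pre_emphasis(x))
print(frames.shape)                # → (15, 256)
```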
6. The WT-based voiceprint recognition method according to claim 5, wherein the extracting the voiceprint features of the preprocessed voiceprint signal by the voiceprint feature extraction algorithm based on the hair cell function comprises:
1) processing the preprocessed voiceprint signal with a hair cell function:
h(a, b) = [H(a, b)]^2
wherein:
H(a, b) is the preprocessed voiceprint signal, and h(a, b) is the resulting hair-cell output;
2) responding to the hair cell function by using a filter to obtain the output of each hair cell:
S(i, j) = (1/d)·Σ_{t=j·l}^{j·l+d-1} h(i, t)
wherein:
d represents the window length of the ith frequency band hair cell function;
τ_i is the time length of the center frequency of the i-th filter;
l is a frame shift;
3) the output of the hair cell function is subjected to cubic root scale transformation based on a loudness function, the energy value is changed into the perceived loudness, and the calculation formula is as follows:
y(i, j) = [S(i, j)]^(1/3)
4) performing decorrelation processing by using discrete cosine transform to obtain voiceprint characteristics as follows:
C(n, j) = Σ_{i=1}^{M} y(i, j)·cos(πn(i - 0.5)/M)
wherein:
m represents the number of filters;
i represents the ith frequency band;
n represents a sampling of the voiceprint signal at point n.
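Steps 3) and 4) of claim 6, cube-root loudness scaling followed by decorrelation, can be sketched with a DCT-II across the filter bands, assuming the usual MFCC-style cosine basis; the 24-band energies below are made-up values.

```python
import numpy as np

def decorrelate(loudness):
    """Cube-root loudness y = S^(1/3), then a DCT-II over the M filter bands."""
    M = len(loudness)
    y = np.cbrt(loudness)                      # perceived-loudness scale
    i = np.arange(1, M + 1)
    # DCT-II basis cos(pi*n*(i-0.5)/M) decorrelates the band energies
    return np.array([np.sum(y * np.cos(np.pi * n * (i - 0.5) / M))
                     for n in range(M)])

energies = np.linspace(1.0, 8.0, 24)           # hypothetical 24-band filter outputs
f = decorrelate(energies)
```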
7. The WT-based voiceprint recognition method according to claim 6, wherein said extracting energy parameters from the voiceprint features using an energy operator comprises:
extracting energy parameters in the voiceprint features by using an energy operator, wherein the extraction formula of the energy parameters is as follows:
ψ[f(i, n)] = [f(i, n)]^2 - f(i, n+1)·f(i, n-1)
wherein:
i represents the ith frequency band of the voiceprint signal;
n represents the sampling of the voiceprint signal at n points;
normalizing the energy parameters and taking logarithm:
ψ̂[f(i, n)] = lg(ψ[f(i, n)] / Σ_n ψ[f(i, n)])
wherein:
ψ[f(i, n)] represents the energy parameter at point n of the i-th band of the voiceprint signal.
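The extraction formula of claim 7 is the discrete Teager energy operator, ψ[x](n) = x(n)^2 - x(n+1)·x(n-1). A small numpy sketch; the normalization in `log_normalized_energy` is one plausible reading of the claim's normalize-then-log step, not the patent's exact formula.

```python
import numpy as np

def teager_energy(x):
    """psi[x](n) = x(n)^2 - x(n+1)*x(n-1), evaluated for n = 1 .. len(x)-2."""
    return x[1:-1] ** 2 - x[2:] * x[:-2]

def log_normalized_energy(x, eps=1e-12):
    """Normalize the energy track to unit sum, then take the logarithm
    (an assumed interpretation of the claim's normalization step)."""
    psi = np.abs(teager_energy(x))
    return np.log(psi / (psi.sum() + eps) + eps)

# for x(n) = sin(omega*n) the operator is exactly constant: sin(omega)^2,
# i.e. it tracks both amplitude and frequency of the oscillation
n = np.arange(1000)
psi = teager_energy(np.sin(0.2 * n))
```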
8. A WT-based voiceprint recognition system, said system comprising:
the voiceprint acquisition device is used for acquiring a voiceprint to be identified;
the voiceprint processor is used for filtering the voiceprint signal to be identified by utilizing a sound source separation algorithm based on phase transformation and carrying out noise reduction processing on the filtered voiceprint signal by utilizing a denoising algorithm based on a wavelet threshold;
the voiceprint recognition device is used for extracting the voiceprint features of the preprocessed voiceprint signals by utilizing a voiceprint feature extraction algorithm based on a hair cell function to obtain the voiceprint features of the voiceprint signals to be recognized, extracting energy parameters in the voiceprint features by utilizing an energy operator, and recognizing the voiceprint by utilizing a log-likelihood ratio algorithm.
9. A computer readable storage medium having stored thereon voiceprint recognition program instructions executable by one or more processors to perform the steps of a method of implementing WT based voiceprint recognition according to any one of claims 1 to 7.
CN202010865114.8A 2020-08-25 2020-08-25 WT (WT) -based voiceprint recognition method and system Withdrawn CN111968651A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010865114.8A CN111968651A (en) 2020-08-25 2020-08-25 WT (WT) -based voiceprint recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010865114.8A CN111968651A (en) 2020-08-25 2020-08-25 WT (WT) -based voiceprint recognition method and system

Publications (1)

Publication Number Publication Date
CN111968651A true CN111968651A (en) 2020-11-20

Family

ID=73390335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010865114.8A Withdrawn CN111968651A (en) 2020-08-25 2020-08-25 WT (WT) -based voiceprint recognition method and system

Country Status (1)

Country Link
CN (1) CN111968651A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113205022A (en) * 2021-04-23 2021-08-03 湖南万脉医疗科技有限公司 Respiratory anomaly monitoring method and system based on wavelet analysis
WO2023070874A1 (en) * 2021-10-28 2023-05-04 中国科学院深圳先进技术研究院 Voiceprint recognition method

Similar Documents

Publication Publication Date Title
US10504539B2 (en) Voice activity detection systems and methods
CN107945815B (en) Voice signal noise reduction method and device
EP2316118B1 (en) Method to facilitate determining signal bounding frequencies
CN108198545B (en) Speech recognition method based on wavelet transformation
CN108847253B (en) Vehicle model identification method, device, computer equipment and storage medium
CN108597505A (en) Audio recognition method, device and terminal device
CN109147798B (en) Speech recognition method, device, electronic equipment and readable storage medium
CN110111769B (en) Electronic cochlea control method and device, readable storage medium and electronic cochlea
CN111968651A (en) WT (WT) -based voiceprint recognition method and system
US20070055519A1 (en) Robust bandwith extension of narrowband signals
US11594239B1 (en) Detection and removal of wind noise
Sanam et al. Enhancement of noisy speech based on a custom thresholding function with a statistically determined threshold
TWI594232B (en) Method and apparatus for processing of audio signals
Chu et al. A noise-robust FFT-based auditory spectrum with application in audio classification
Maganti et al. Auditory processing-based features for improving speech recognition in adverse acoustic conditions
CN116312561A (en) Method, system and device for voice print recognition, authentication, noise reduction and voice enhancement of personnel in power dispatching system
CN112863517B (en) Speech recognition method based on perceptual spectrum convergence rate
CN113948088A (en) Voice recognition method and device based on waveform simulation
CN110795996B (en) Method, device, equipment and storage medium for classifying heart sound signals
CN113593599A (en) Method for removing noise signal in voice signal
CN113593604A (en) Method, device and storage medium for detecting audio quality
EP2063420A1 (en) Method and assembly to enhance the intelligibility of speech
CN112908340A (en) Global-local windowing-based sound feature rapid extraction method
CN113611321B (en) Voice enhancement method and system
TWI749547B (en) Speech enhancement system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20201120)