CN111968651A

CN111968651A - WT (WT) -based voiceprint recognition method and system

Info

Publication number: CN111968651A
Application number: CN202010865114.8A
Authority: CN
Inventors: 汪金玲
Original assignee: Individual
Current assignee: Individual
Priority date: 2020-08-25
Filing date: 2020-08-25
Publication date: 2020-11-20

Abstract

The invention relates to the technical field of voiceprint recognition, and discloses a voiceprint recognition method based on WT (WT), which comprises the following steps: collecting voiceprint signals to be identified by adopting a microphone array, and filtering the voiceprint signals to be identified by utilizing a sound source separation algorithm based on phase transformation; denoising the filtered vocal print signal by using a denoising algorithm based on a wavelet threshold to obtain a denoised vocal print signal; pre-emphasis and windowing are carried out on the noise-reduced voiceprint signal to be identified; extracting the voiceprint characteristics of the preprocessed voiceprint signals by utilizing a voiceprint characteristic extraction algorithm based on a hair cell function to obtain the voiceprint characteristics of the voiceprint signals to be recognized; and extracting energy parameters in the voiceprint features by using an energy operator, and identifying the voiceprint by using a log-likelihood ratio algorithm. The invention also provides a voiceprint recognition system based on the WT. The invention realizes the identification of the voiceprint.

Description

WT (WT) -based voiceprint recognition method and system

Technical Field

The invention relates to the technical field of voiceprint recognition, in particular to a voiceprint recognition method and system based on WT.

Background

The voice is the most direct, most common and most convenient carrier for realizing information interaction between people, bears and conveys various information resources, and plays an important role in man-machine interaction and information transmission. However, due to the interference of too much noise in real life, the voice quality identified by the voiceprint is poor, so that the voice is not easy to understand by people, the human-computer equipment is difficult to obtain accurate information, and how to effectively complete the voiceprint identification task becomes a hot topic of current research.

In the existing voiceprint recognition technology, only the signal-to-noise ratio of the denoised voice is focused, and the overall quality of the denoised voice is not focused. These voiceprint recognition techniques may remove the weaker portion of the speech signal as noise, so that the background noise is removed and the components of the desired speech signal are also corrupted. Thus, the quality of hearing may not be substantially improved or even reversed when the denoised speech signal is compared to the signal before denoising.

Meanwhile, in the field of speech feature parameters, the widely used features are: MFCC, perceptual linear prediction coefficients, and perceptual log-area ratio coefficients. These features can achieve good recognition accuracy in clean environments, but cannot effectively adapt to noisy environments, especially under noise conditions of less than 10dB, and their performance decreases with decreasing signal-to-noise ratio.

In view of this, how to effectively denoise the voiceprint and extract the voiceprint features adaptable to the noise environment for voiceprint recognition is an urgent problem to be solved by those skilled in the art.

Disclosure of Invention

The invention provides a voiceprint recognition method based on WT (wavelet transform). A voiceprint is subjected to denoising treatment by using a denoising algorithm based on wavelet transform, the existing voiceprint feature extraction algorithm is improved, and the improved voiceprint feature extraction algorithm is used for extracting and enhancing the voiceprint features, so that the voiceprint recognition is finally realized.

In order to achieve the above object, the present invention provides a WT-based voiceprint recognition method, including:

collecting voiceprint signals to be identified by adopting a microphone array;

filtering the voiceprint signal to be identified by utilizing a sound source separation algorithm based on phase transformation;

denoising the filtered vocal print signal by using a denoising algorithm based on a wavelet threshold to obtain a denoised vocal print signal;

pre-emphasis and windowing are carried out on the noise-reduced voiceprint signal to be identified;

extracting the voiceprint characteristics of the preprocessed voiceprint signals by utilizing a voiceprint characteristic extraction algorithm based on a hair cell function to obtain the voiceprint characteristics of the voiceprint signals to be recognized;

and extracting energy parameters in the voiceprint features by using an energy operator, and identifying the voiceprint by using a log-likelihood ratio algorithm.

Optionally, the acquiring the voiceprint signal by using a microphone array includes:

collecting voiceprint signals to be identified by adopting a microphone array, wherein the collected ith channel signal is x_i(t) which is derived from J unknown signal sources s_j(t) a convolution mixed noisy speech signal, said convolution mixed formula being:

wherein:

a (t) is an impulse response;

s_j(t) denotes the jth unknown signal source;

j is the number of unknown signal sources;

b_i(t) is a noise signal;

x_i(t) is the i-th channel signal collected;

t represents a discrete time.

Optionally, the filtering the voiceprint signal to be recognized by using a sound source separation algorithm based on phase transformation includes:

1) calculating the cross-correlation relationship of any two voiceprint signals to be identified:

wherein:

x₁(t)，x₂(t) any two collected voiceprint signals to be identified;

α represents the attenuation of sound propagating from the sound source to the microphone;

s represents a signal emitted by an unknown signal source;

τ represents the time for sound to travel from the sound source to the two microphones;

2) from the cross-correlation relationship and the cross-frequency spectrum relationship, it can be known that:

wherein:

receiving a signal x for a microphone₁(t) and x₂(t) cross-power spectrum;

ω represents the power phase;

3) weighting the power spectrum in a frequency domain to realize filtering processing of a voiceprint signal to be identified, wherein the weighting formula is as follows:

wherein:

representing a phase weighting function;

X₁(ω)，X₂(ω) represents the power spectra of the two microphone received signals resulting from the fourier transform;

receiving a signal x for a microphone₁(t) and x₂(t) cross-power spectrum;

ω represents the power phase.

Optionally, the denoising the filtered voiceprint signal by using a wavelet threshold based denoising algorithm includes:

1) using a low-pass/high-pass filter to scale the voiceprint signal to be identified by a factor of 2^bWhere b denotes the transform level of the wavelet transform, which is set to 2 by the present invention, the sampling expression is:

wherein:

a is a scale factor;

c is a displacement factor;

f (t) is a voiceprint signal to be identified;

t is the time of the voiceprint signal;

l represents the number of sub-bands of the wavelet transform;

according to the sampling expression, firstly decomposing a voiceprint signal f (t) to be identified into two subband signals which have equal length and respectively carry low-frequency and high-frequency information components, and then decomposing the signal of each subband into a next-level subband signal, namely four subband signals by applying the decomposition operation again;

2) setting a wavelet threshold lambda:

wherein:

y is a variable parameter, and when the noise is white noise, y is 1;

n is the signal length of the voiceprint signal to be identified;

sigma is the standard deviation of the wavelet subband signal;

j is the number of wavelet sub-bands;

3) denoising the wavelet sub-band reaching the wavelet threshold by using a denoising function based on a threshold, wherein the denoising function based on the threshold is as follows:

wherein:

λ is wavelet threshold;

p is a positive real number less than the wavelet threshold;

a is a wavelet parameter, which is set to 0.01 by the invention;

b represents the number of transform levels of the wavelet transform, which is set to 2 by the present invention;

l represents the number of sub-bands of the wavelet transform;

is a sampling signal based on wavelet transformation;

w_lthe noise reduction signal of the ith wavelet sub-band.

Optionally, the pre-processing of pre-emphasis and windowing on the noise-reduced voiceprint signal includes:

1) the voiceprint signal is boosted with a pre-emphasis function:

H(z)＝1-az^-1

wherein:

z is a voiceprint signal to be identified;

a is a pre-emphasis coefficient, which is set to 0.912 by the invention;

2) windowing the voiceprint signal by using a Hamming window, wherein the time domain expression of the Hamming window is as follows:

wherein:

n is the frame number of the voiceprint signal to be identified;

and N is the total frame number of the voiceprint signals to be identified.

Optionally, the extracting the voiceprint features of the preprocessed voiceprint signal by using a voiceprint feature extraction algorithm based on a hair cell function includes:

1) processing the preprocessed voiceprint signal with a hair cell function:

h(a，b)＝[H(a，b)]²

wherein:

h (a, b) is a preprocessed voiceprint signal;

2) responding to the hair cell function by using a filter to obtain the output of each hair cell:

wherein:

d represents the window length of the ith frequency band hair cell function;

τ_ithe time length of the center frequency of the ith filter;

l is a frame shift;

3) the output of the hair cell function is subjected to cubic root scale transformation based on a loudness function, the energy value is changed into the perceived loudness, and the calculation formula is as follows:

y(i，j)＝[S(i，j)]^1/3

4) performing decorrelation processing by using discrete cosine transform to obtain voiceprint characteristics as follows:

wherein:

m represents the number of filters;

i represents the ith frequency band;

n represents a sampling of the voiceprint signal at point n.

The extracting the energy parameter in the voiceprint feature by using the energy operator comprises the following steps:

extracting energy parameters in the voiceprint features by using an energy operator, wherein the extraction formula of the energy parameters is as follows:

ψ[f(i，n)]＝[f(i，n)]²-f(i，n+1)f(i，n-1)

wherein:

i represents the ith frequency band of the voiceprint signal;

n represents the sampling of the voiceprint signal at n points;

normalizing the energy parameters and taking logarithm:

wherein:

ψ [ f (i, n) ] represents an energy parameter at n points of the voiceprint signal of the i-th band.

Further, to achieve the above object, the present invention provides a WT-based voiceprint recognition system, including:

the voiceprint acquisition device is used for acquiring a voiceprint to be identified;

the voiceprint processor is used for filtering the voiceprint signal to be identified by utilizing a sound source separation algorithm based on phase transformation and carrying out noise reduction processing on the filtered voiceprint signal by utilizing a denoising algorithm based on a wavelet threshold;

the voiceprint recognition device is used for extracting the voiceprint features of the preprocessed voiceprint signals by utilizing a voiceprint feature extraction algorithm based on a hair cell function to obtain the voiceprint features of the voiceprint signals to be recognized, extracting energy parameters in the voiceprint features by utilizing an energy operator, and recognizing the voiceprint by utilizing a log-likelihood ratio algorithm.

Furthermore, to achieve the above object, the present invention also provides a computer readable storage medium having stored thereon voiceprint recognition program instructions executable by one or more processors to implement the steps of the WT based voiceprint recognition implementation method as described above.

Compared with the prior art, the invention provides a voiceprint recognition method based on WT, which has the following advantages:

firstly, aiming at noise signals existing in voiceprints, the invention provides a sound source separation algorithm based on phase transformation, which carries out filtering processing on voiceprint signals to be identified and calculates the cross-correlation relationship between signals received by two microphones

Wherein x₁(t)，x₂(t) any two acquired voiceprint signals to be identified, alpha represents attenuation of sound from a sound source to a microphone, s represents a signal emitted by an unknown signal source, tau represents time of sound from the sound source to the two microphones, and cross-frequency spectra of the voiceprint signals are obtained from a cross-correlation relationship and a cross-frequency spectrum relationship

Wherein

Receiving a signal x for a microphone₁(t) and x₂The cross frequency spectrum of (t) can cause the influence of reverberation and noise in the actual microphone array signal processing model

The peak value of the cross-power spectrum is not obvious, the precision of the time delay estimation is reduced, in order to sharpen the peak value, the cross-power spectrum is weighted in the frequency domain,

wherein

Represents a phase weighting function, which is equivalent to filtering the data, emphasizes the spectral components of the source signal in the received signal,therefore, noise and reverberation interference can be suppressed, and higher delay estimation precision can be obtained.

Meanwhile, aiming at the defects that in the traditional threshold denoising algorithm, the voice processed by the hard threshold denoising algorithm generates oscillation, and the voice processed by the soft threshold denoising algorithm has larger distortion. The invention provides a threshold function based on a wavelet threshold value, which comprises the following steps:

wherein lambda is wavelet threshold, p is positive real number smaller than wavelet threshold, a is wavelet parameter, which is set as 0.01 by the invention, b represents transformation series of wavelet transformation, l represents number of sub-bands of wavelet transformation,

for sampling signals based on wavelet transform, w_lFor the noise reduction signal of the l wavelet sub-band, the new threshold function algorithm considers the rule that the attenuation of the noise wavelet transformation modulus value conforms to the index, is continuous at the threshold, and can adjust parameters, so that the algorithm can be more suitable for different noise intensities, further reduces the coefficient of the noise signal, avoids the traditional processing method that the threshold function is directly set to zero when the wavelet coefficient is smaller than the threshold, and effectively enhances the noise reduction effect.

Finally, because a certain kind of features generally only contains partial voice information, and the original feature parameters reflect the static characteristics of the voice signals, the dynamic feature parameters can reflect the dynamic characteristics of the voice signals. Therefore, the invention adopts the combination of the dynamic and static characteristic parameters to make the dynamic information and the static information form complementation, thereby better describing the dynamic and static characteristics of the voice, and adds an energy operator reflecting energy conversion on the basis of the voiceprint characteristics based on the hair cell function, so that the obtained energy not only represents the auditory perception characteristics of human ears, but also combines the characteristics of the voice instantaneous energy, and also inhibits the influence of zero-mean noise on the voiceprint signals to a certain extent, thereby more completely describing the characteristics of the voiceprint.

Drawings

FIG. 1 is a flowchart illustrating a WT-based voiceprint recognition method according to an embodiment of the present invention;

FIG. 2 is a block diagram of a WT-based voiceprint recognition system according to an embodiment of the present invention;

the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The voiceprint is subjected to noise reduction processing by using a denoising algorithm based on wavelet transformation, the existing voiceprint feature extraction algorithm is improved, and the improved voiceprint feature extraction algorithm is used for extracting and enhancing the voiceprint features, so that the voiceprint is finally identified. Referring to fig. 1, a diagram of a WT-based voiceprint recognition method according to an embodiment of the present invention is shown.

In this embodiment, the WT-based voiceprint recognition method includes:

and S1, collecting the voiceprint signals to be recognized by adopting a microphone array, and filtering the voiceprint signals to be recognized by utilizing a sound source separation algorithm based on phase transformation.

Firstly, the invention adopts a microphone array to collect the voiceprint signals to be identified, wherein the collected ith channel signal is x_i(t) which is derived from J unknown signal sources s_j(t) a convolution mixed noisy speech signal, said convolution mixed formula being:

wherein:

a (t) is an impulse response;

s_j(t) denotes the jth unknown signal source;

j is the number of unknown signal sources;

b_i(t) is a noise signal;

x_i(t) is the i-th channel signal collected;

t represents a discrete time;

further, a sound source separation algorithm based on phase transformation is utilized to perform filtering processing on the voiceprint signal to be identified, and the filtering processing process comprises the following steps:

wherein:

x₁(t)，x₂(t) any two collected voiceprint signals to be identified;

s represents a signal emitted by an unknown signal source;

wherein:

receiving a signal x for a microphone₁(t) and x₂(t) cross-power spectrum;

ω represents the power phase;

wherein:

representing a phase weighting function;

X_{incense stick}(ω)，X₂(ω) represents the power spectra of the two microphone received signals resulting from the fourier transform;

receiving a signal x for a microphone₁(t) and x₂(t) cross-power spectrum;

ω represents the power phase.

And S2, denoising the filtered vocal print signal by using a denoising algorithm based on a wavelet threshold to obtain a denoised vocal print signal.

Firstly, the invention obtains the voiceprint signals removed by filtering, and carries out noise reduction processing on the voiceprint signals by using a denoising algorithm based on a wavelet threshold, wherein the algorithm flow of the denoising algorithm based on the wavelet threshold is as follows:

wherein:

a is a scale factor;

c is a displacement factor;

f (t) is a voiceprint signal to be identified;

t is the time of the voiceprint signal;

l represents the number of sub-bands of the wavelet transform;

2) setting a wavelet threshold lambda:

wherein:

y is a variable parameter, and when the noise is white noise, y is 1;

n is the signal length of the voiceprint signal to be identified;

sigma is the standard deviation of the wavelet subband signal;

j is the number of wavelet sub-bands;

wherein:

λ is wavelet threshold;

p is a positive real number less than the wavelet threshold;

a is a wavelet parameter, which is set to 0.01 by the invention;

l represents the number of sub-bands of the wavelet transform;

is a sampling signal based on wavelet transformation;

w_lthe noise reduction signal of the ith wavelet sub-band.

And S3, pre-emphasis and windowing pre-processing is carried out on the noise-reduced voiceprint signal to be recognized.

Further, the invention performs pre-emphasis and windowing processing on the noise-reduced voiceprint signal to be identified, wherein the pre-emphasis and windowing processing comprises the following steps:

1) the voiceprint signal is boosted with a pre-emphasis function:

H(z)＝1-az^-1

wherein:

z is a voiceprint signal to be identified;

a is a pre-emphasis coefficient, which is set to 0.912 by the invention;

wherein:

n is the frame number of the voiceprint signal to be identified;

and N is the total frame number of the voiceprint signals to be identified.

S4, extracting the voiceprint characteristics of the preprocessed voiceprint signals by utilizing a voiceprint characteristic extraction algorithm based on a hair cell function to obtain the voiceprint characteristics of the voiceprint signals to be recognized.

Furthermore, the invention utilizes a voiceprint feature extraction algorithm based on a hair cell function to extract the features of the voiceprint signal to be identified, and the voiceprint feature extraction algorithm based on the hair cell function comprises the following steps:

1) processing the preprocessed voiceprint signal with a hair cell function:

h(a，b)＝[H(a，b)]²

wherein:

h (a, b) is a preprocessed voiceprint signal;

wherein:

d represents the window length of the ith frequency band hair cell function;

τ_ithe time length of the center frequency of the ith filter;

l is a frame shift;

y(i，j)＝[S(i，j)]^1/3

wherein:

m represents the number of filters;

i represents the ith frequency band;

n represents a sampling of the voiceprint signal at point n.

And S5, extracting energy parameters in the voiceprint features by using an energy operator, and identifying the voiceprint by using a log-likelihood ratio algorithm.

Further, for the voiceprint feature f (i, n), the energy parameter in the voiceprint feature is extracted by using an energy operator, and the extraction formula of the energy parameter is as follows:

ψ[f(i，n)]＝[f(i，n)]²-f(i，n+1)f(i，n-1)

wherein:

i represents the ith frequency band of the voiceprint signal;

n represents the sampling of the voiceprint signal at n points;

further, the invention normalizes the energy parameters and takes logarithm:

wherein:

Further, the invention utilizes log-likelihood ratio algorithm to compare energy parameter similarity, and the formula of the log-likelihood ratio is as follows:

wherein:

hs and Hd are the voiceprint characteristics obtained by calculation and the voiceprint characteristics of the speaker stored in the system data respectively;

is an energy parameter of the voiceprint to be identified;

y is a voiceprint signal of a voiceprint to be identified;

and finally, if the similarity is smaller than a preset value, judging that the voice of the speaker is not in the data stored in advance.

The following describes embodiments of the present invention through an algorithmic experiment and tests of the inventive treatment method. The hardware test environment of the algorithm of the invention is as follows: the processor is an Intel (R) core (TM) i5-8700 CPU 8 core, the display card is GeForce GTX1060, the display memory is 8G, and the development environment is matlab; the comparison algorithms are MFCC, GMM and DFCNN algorithms.

In the algorithm experiment of the invention, the experiment selects mixed voice signals actually collected by 2 microphones, the sampling frequency is 16kHz, the microphone spacing is 10cm, each sound source is at different positions about 1-2 m away from the microphone, 6 speakers (3 males and 3 females) are selected, mixed voices of 2 or 3 speakers are randomly formed, and 5 mixed voices are selected as the experimental voices. The invention identifies the experimental voice by the comparison algorithm and the algorithm provided by the invention, and takes the identification accuracy as the evaluation standard of the algorithm.

According to the experimental result, the accuracy of the voiceprint recognition result of the MFCC algorithm is 86.18%, the accuracy of the voiceprint recognition result of the GMM algorithm is 75.61%, the accuracy of the voiceprint recognition result of the DFCNN algorithm is 90.03%, the accuracy of the voiceprint recognition result of the algorithm provided by the invention is 93.12%, and compared with the comparison algorithm, the voice recognition method provided by the invention has higher voice recognition accuracy.

The invention also provides a WT-based voiceprint recognition system. Referring to fig. 2, a schematic diagram of an internal structure of a WT-based voiceprint recognition system according to an embodiment of the present invention is shown.

In this embodiment, the WT-based voiceprint recognition system 1 comprises at least a voiceprint acquisition device 11, a voiceprint processor 12, a voiceprint recognition device 13, a communications bus 14, and a network interface 15.

The voiceprint acquiring apparatus 11 may be a PC (Personal Computer), a terminal device such as a smart phone, a tablet Computer, and a mobile Computer, or may be a server.

The voiceprint processor 12 includes at least one type of readable storage media including flash memory, a hard disk, a multi-media card, card type memory (e.g., SD or DX memory, etc.), magnetic memory, a magnetic disk, an optical disk, and the like. The voiceprint processor 12 may in some embodiments be an internal storage unit of the WT based voiceprint recognition system 1, for example a hard disk of the WT based voiceprint recognition system 1. The voiceprint processor 12 can also be an external storage device of the WT-based voiceprint recognition system 1 in other embodiments, such as a plug-in hard disk provided on the WT-based voiceprint recognition system 1, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and so forth. Further, the voiceprint processor 12 can also include both an internal memory unit and an external memory device of the WT-based voiceprint recognition system 1. The voiceprint processor 12 can be used not only to store application software and various types of data installed in the WT-based voiceprint recognition system 1, but also to temporarily store data that has been output or is to be output.

The voiceprint recognition device 13 may be, in some embodiments, a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data Processing chip for running program code stored in the voiceprint processor 12 or Processing data, such as voiceprint recognition program instructions.

The communication bus 14 is used to enable connection communication between these components.

The network interface 15 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), and is typically used to establish a communication link between the system 1 and other electronic devices.

Optionally, the system 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, as appropriate, is used, among other things, to display information processed in the WT-based voiceprint recognition system 1 and to display a visual user interface.

While fig. 2 only shows the system 1 with components 11-15 and WT-based voiceprint recognition, those skilled in the art will appreciate that the configuration shown in fig. 1 does not constitute a limitation of WT-based voiceprint recognition system 1 and may include fewer or more components than shown, or combine certain components, or a different arrangement of components.

In the apparatus 1 embodiment shown in FIG. 2, stored in voiceprint processor 12 are WT-based voiceprint recognition program instructions; the steps of the voiceprint recognition means 13 executing the voiceprint recognition program instructions stored in the voiceprint processor 12 are the same as the method of implementing the WT-based voiceprint recognition method and will not be described here.

Furthermore, an embodiment of the present invention also provides a computer-readable storage medium having stored thereon voiceprint recognition program instructions, which are executable by one or more processors to implement the following operations:

collecting voiceprint signals to be identified by adopting a microphone array;

It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A WT-based voiceprint recognition method, the method comprising:

collecting voiceprint signals to be identified by adopting a microphone array;

2. The WT-based voiceprint recognition method of claim 1 wherein said acquiring a voiceprint signal with a microphone array comprises:

carrying out voiceprint signal identification by adopting microphone arrayCollecting, wherein the collected ith channel signal is x_i(t) which is derived from J unknown signal sources s_j(t) a convolution mixed noisy speech signal, said convolution mixed formula being:

wherein:

a (t) is an impulse response;

s_j(t) denotes the jth unknown signal source;

j is the number of unknown signal sources;

b_i(t) is a noise signal;

x_i(t) is the i-th channel signal collected;

t represents a discrete time.

3. The WT-based voiceprint recognition method according to claim 2, wherein the filtering the voiceprint signal to be recognized by using the phase-transform-based sound source separation algorithm comprises:

wherein:

x₁(t)，x₂(t) any two collected voiceprint signals to be identified;

s represents a signal emitted by an unknown signal source;

wherein:

receiving a signal x for a microphone₁(t) and x₂(t) cross-power spectrum;

ω represents the power phase;

wherein:

representing a phase weighting function;

receiving a signal x for a microphone₁(t) and x₂(t) cross-power spectrum;

ω represents the power phase.

4. The WT-based voiceprint recognition method according to claim 3, wherein the denoising the filtered voiceprint signal using the wavelet threshold-based denoising algorithm comprises:

wherein:

a is a scale factor;

c is a displacement factor;

f (t) is a voiceprint signal to be identified;

t is the time of the voiceprint signal;

l represents the number of sub-bands of the wavelet transform;

2) setting a wavelet threshold lambda:

wherein:

y is a variable parameter, and when the noise is white noise, y is 1;

n is the signal length of the voiceprint signal to be identified;

sigma is the standard deviation of the wavelet subband signal;

j is the number of wavelet sub-bands;

wherein:

λ is wavelet threshold;

p is a positive real number less than the wavelet threshold;

a is a wavelet parameter, which is set to 0.01 by the invention;

l represents the number of sub-bands of the wavelet transform;

is a sampling signal based on wavelet transformation;

w_lthe noise reduction signal of the ith wavelet sub-band.

5. The WT-based voiceprint recognition method according to claim 4, wherein the pre-emphasis and windowing pre-processing of the noise-reduced voiceprint signal comprises:

1) the voiceprint signal is boosted with a pre-emphasis function:

H(z)＝1-az^-1

wherein:

z is a voiceprint signal to be identified;

a is a pre-emphasis coefficient, which is set to 0.912 by the invention;

wherein:

n is the frame number of the voiceprint signal to be identified;

and N is the total frame number of the voiceprint signals to be identified.

6. The WT-based voiceprint recognition method according to claim 5, wherein the extracting the voiceprint features of the preprocessed voiceprint signal by the voiceprint feature extraction algorithm based on the hair cell function comprises:

1) processing the preprocessed voiceprint signal with a hair cell function:

h(a，b)＝[H(a，b)]²

wherein:

h (a, b) is a preprocessed voiceprint signal;

wherein:

d represents the window length of the ith frequency band hair cell function;

τ_ithe time length of the center frequency of the ith filter;

l is a frame shift;

y(i，j)＝[S(i，j)]^1/3

wherein:

m represents the number of filters;

i represents the ith frequency band;

n represents a sampling of the voiceprint signal at point n.

7. The WT-based voiceprint recognition method according to claim 6, wherein said extracting energy parameters from the voiceprint features using an energy operator comprises:

ψ[f(i，n)]＝[f(i，n)]²-f(i，n+1)f(i，n-1)

wherein:

i represents the ith frequency band of the voiceprint signal;

n represents the sampling of the voiceprint signal at n points;

normalizing the energy parameters and taking logarithm:

wherein:

8. A WT-based voiceprint recognition system, said system comprising:

9. A computer readable storage medium having stored thereon voiceprint recognition program instructions executable by one or more processors to perform the steps of a method of implementing WT based voiceprint recognition according to any one of claims 1 to 7.