CN116312561A

CN116312561A - Method, system and device for voice print recognition, authentication, noise reduction and voice enhancement of personnel in power dispatching system

Info

Publication number: CN116312561A
Application number: CN202310297886.XA
Authority: CN
Inventors: 崔兆阳; 衷宇清; 张雄威; 凌健文; 徐武华; 蒋盛智; 彭丽文; 周上; 罗慕尧; 骆雅菲; 刘晨辉; 孔嘉麟; 陈文文; 张思敏; 周菲; 吴若迪; 冯雅雯
Original assignee: Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Current assignee: Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date: 2023-03-23
Filing date: 2023-03-23
Publication date: 2023-06-23

Abstract

The invention provides a method, system and device for voiceprint recognition, authentication, noise reduction and voice enhancement of power dispatching system personnel. The method comprises the following steps: a calling user sends an operation request and a voice signal to a dispatcher by telephone; the calling user's voice signal is separated from the mixed speech of the calling user and the dispatcher; noise reduction is performed on the calling user's voice signal; voice enhancement is performed on the calling user's voice signal; the power dispatching system uses a trained voiceprint recognition model to match the calling user's voice signal against voice signals recorded in advance by personnel who hold authority for the operation; if the match succeeds, the calling user is allowed to perform the operation, and if it fails, the calling user is not allowed to. The invention can accurately recognize the user's voice even when the signal is subject to electrical current interference and noise.

Description

Method, system and device for voice print recognition, authentication, noise reduction and voice enhancement of personnel in power dispatching system

Technical Field

The invention relates to the technical field of artificial intelligence, specifically to voiceprint recognition, and more particularly to a method, system and device for voiceprint recognition, authentication, noise reduction and voice enhancement of personnel in a power dispatching system.

Background

In power dispatching systems, telephone dispatching is a common and fundamental mode of operation. When a dispatching instruction is received from a calling party by telephone, verifying the identity and the authority of the calling party is central to improving the security and reliability of the dispatching system.

Voiceprint recognition based on the voice signal of the dispatch applicant, who acts as the calling party, is one feasible way to authenticate that party.

Carrying this out involves a series of processes: telephone voice extraction, voice signal preprocessing, training of a deep neural network on voice sample signals, and identity judgment and authentication based on the calling party's actual dispatching voice signal.

One factor with a large influence on the success rate and reliability of voiceprint recognition for power dispatching personnel is the quality of, and the interference in, the voice signal extracted from the dispatching telephone.

When power system dispatching personnel talk over the dispatching telephone, noise from the working environment and noise from the telephone channel are inevitably encountered. How to suppress these two kinds of noise effectively and apply targeted speech enhancement is therefore critical to improving system performance.

Disclosure of Invention

The invention aims to provide a method, system and device for voiceprint recognition, authentication, noise reduction and voice enhancement of power dispatching system personnel.

A voice print recognition, authentication, noise reduction and voice enhancement method for power dispatching system personnel comprises the following steps:

the calling user sends operation request and voice signal to dispatcher through telephone;

separating a calling user voice signal from mixed voices of a calling user and a dispatcher;

noise reduction is carried out on the voice signal of the calling user;

performing voice enhancement on a voice signal of a calling user;

the power dispatching system matches the voice signal of the calling user with the voice signal which is recorded in advance by the personnel with the operation authority by using a trained voiceprint recognition model;

and if the matching is successful, allowing the calling user to operate, and if the matching is unsuccessful, not allowing the calling user to operate.

Separating the caller speech signal from the mixed speech of the caller and dispatcher comprises:

acquiring a first voice signal from the telephone terminal;

adding a sidetone cancellation circuit to the transmission line of the power dispatching system, and acquiring a second voice signal from the telephone receiver end;

using the short-time zero-crossing rate, endpoint detection and the voice energy spectrum to perform voice signal strength analysis and signal comparison on the first voice signal and the second voice signal, and separating out the voice signal of the calling user;

the four voice signals affected by different noises are obtained after separation:

noise reduction of the caller's speech signal includes:

noise reduction is carried out on the voice signal of the calling user by adopting a relevant characteristic method:

assuming that the calling user's voice signal is uncorrelated with both the calling user's environmental noise and the telephone transmission channel noise, autocorrelation processing is performed on the noisy signal to obtain an autocorrelation frame sequence approximating that of the noise-free voice signal:

where $s(t)$ is the clean speech signal, $n(t)$ is the noise signal, $w(t)$ is the window function applied to achieve short-time stationarity, and $R_y(\tau)$ and $R_s(\tau)$ are the autocorrelation functions of the calling user's voice signal with and without noise, respectively;

noise reduction is carried out on the voice signal by adopting a wiener filtering method:

the output $s'(t)$ of the noisy speech signal after passing through the Wiener filter minimizes the mean-square error $E[|s'(t) - s(t)|^2]$. The Wiener filtering method is premised on the speech signal being short-time stationary, and gives the following Wiener filter:

$$H(\omega) = \frac{P_s(\omega)}{P_s(\omega) + P_n(\omega)}$$

in the above formula, $H(\omega)$ is the frequency-domain response of the Wiener filter, and $P_s(\omega)$ and $P_n(\omega)$ are the signal power spectrum and the noise power spectrum, respectively;

$$S_O(\omega) = H(\omega) \cdot Y(\omega)$$

in the above formula, $S_O(\omega)$ is the output signal spectrum of the Wiener filter and $Y(\omega)$ is the spectrum of the calling user's noisy telephone speech signal.
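The Wiener filtering step can be illustrated with a short NumPy sketch. It assumes the common frequency-domain gain H(ω) = P_s(ω) / (P_s(ω) + P_n(ω)) and, as a further assumption not stated in the source, estimates P_s(ω) by subtracting a noise power estimate from the noisy power spectrum. The function names and the `eps` guard are illustrative only.

```python
import numpy as np

def wiener_gain(p_s, p_n, eps=1e-12):
    """Per-bin gain H(w) = P_s(w) / (P_s(w) + P_n(w)); eps avoids 0/0."""
    p_s = np.asarray(p_s, dtype=float)
    p_n = np.asarray(p_n, dtype=float)
    return p_s / (p_s + p_n + eps)

def wiener_filter_frame(y, p_n):
    """Apply S_O(w) = H(w) * Y(w) to one frame y.

    p_n is a noise power spectrum estimate with len(y) // 2 + 1 bins.
    Assumption: P_s is estimated by power subtraction, floored at zero.
    """
    y = np.asarray(y, dtype=float)
    Y = np.fft.rfft(y)
    p_s = np.maximum(np.abs(Y) ** 2 - p_n, 0.0)
    return np.fft.irfft(wiener_gain(p_s, p_n) * Y, n=len(y))
```

With a zero noise estimate the gain is close to one in every bin that carries signal, so the frame passes through essentially unchanged; as the noise estimate grows, low-SNR bins are attenuated more strongly.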

The voice enhancement of the calling user voice signal comprises:

the cepstral mean normalization (CMN) noise reduction method is used to remove noise components from the cepstrum of a telephone voice signal carrying non-additive noise; the enhanced speech cepstrum obtained by the CMN method is expressed as:

$$\hat{C}_s(t) = C_{sn}(t) - \bar{C}$$

where $\hat{C}_s(t)$ is the cepstrum of the enhanced speech, $C_{sn}(t)$ is the cepstrum of the noisy speech, $C_s(t)$ is the cepstrum of the clean speech, and $\bar{C}$ is the cepstral mean of the speech segment collected from the calling user.
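As a minimal sketch of cepstral mean normalization, assuming the usual form in which the mean cepstrum over the collected segment is subtracted from every frame, the step reduces to one line of NumPy. The array layout (rows are frames, columns are cepstral coefficients) and the function name are assumptions made for illustration.

```python
import numpy as np

def cepstral_mean_normalize(cepstra):
    """Subtract the per-coefficient mean taken over all frames.

    cepstra: array of shape (n_frames, n_coeffs).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```

A fixed channel contributes a constant offset in the cepstral domain, so the same normalized result is obtained with or without that offset, which is why the method suits convolutional (non-additive) telephone channel distortion.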

Using the short-time zero-crossing rate, the end point detection and the voice energy spectrum to perform voice signal intensity analysis and signal comparison on the first voice signal and the second voice signal, and separating the voice signal of the calling user comprises the following steps:

detecting unvoiced sound by using a short-time zero-crossing rate detection algorithm combining short-time energy and zero-crossing rate detection, and detecting voiced sound by using short-time energy;

selecting a corresponding unvoiced model and a corresponding voiced model according to the voiced and unvoiced sounds of the voice signal to detect the voice signal end points so as to obtain the voice signal of the calling user;

the selecting the corresponding unvoiced model and the corresponding voiced model according to the voiced and unvoiced of the voice signal to perform voice signal endpoint detection, thereby obtaining the voice signal of the calling user includes:

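for unvoiced sound, the corresponding excitation model is random white noise: a sequence with zero mean and unit variance that is white in both time and amplitude is used;

for voiced sound, the excitation is an intermittent pulse train in which each pulse is an oblique triangular wave with the mathematical expression:

$$g(n) = \begin{cases} \frac{1}{2}\left[1 - \cos\left(\frac{\pi n}{N_1}\right)\right], & 0 \le n \le N_1 \\ \cos\left[\frac{\pi (n - N_1)}{2 N_2}\right], & N_1 \le n \le N_1 + N_2 \\ 0, & \text{otherwise} \end{cases}$$

in the above formula, $N_1$ is the duration of the rising part of the oblique triangular wave, and $N_2$ is the duration of its falling part;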

after the speech signal is divided into frames, the energy of the $n$-th frame of the speech signal $x_n(m)$ can be expressed as:

$$E_n = \sum_{m=0}^{N-1} x_n^2(m)$$

where $N$ is the frame length; the short-time zero-crossing rate is the number of times the speech waveform within one frame crosses the horizontal axis, i.e. the zero level, and can be expressed as:

$$Z_n = \frac{1}{2} \sum_{m=1}^{N-1} \left| \operatorname{sgn}[x_n(m)] - \operatorname{sgn}[x_n(m-1)] \right|$$

where $\operatorname{sgn}(\cdot)$ is the sign function; the number of zero crossings is counted by checking whether the sign of the waveform changes between the current sample and the previous sample;

energy spectrum estimation is carried out on the voice signal of the calling user, using the frame energy $E_n$ of the $n$-th frame $x_n(m)$ as defined above;

the voice signal of the calling user is extracted by the autocorrelation method:

the short-time autocorrelation function $R_n(k)$ of the speech signal $x_n(m)$ can be expressed as:

$$R_n(k) = \sum_{m=0}^{N-1-k} x_n(m)\, x_n(m+k), \quad 0 \le k \le K$$

where $K$ is the maximum delay in samples. If the speech sequence is periodic, its autocorrelation function is also periodic with the same period, so for a voiced signal the autocorrelation function can be used to obtain the pitch period of the speech waveform. The peak amplitudes of the autocorrelation functions of a noise signal and of noisy speech differ greatly, so a threshold is set according to the noise level and used to determine the endpoints of the voice signal.
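The short-time energy and zero-crossing measures used above can be sketched as follows, assuming the usual definitions: frame energy is the sum of squared samples, and the zero-crossing count is half the summed absolute difference of consecutive sample signs. Treating a sample of exactly zero as positive, and the helper names themselves, are illustrative assumptions.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames (rows); requires len(x) >= frame_len."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """E_n = sum over m of x_n(m)^2, one value per frame."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_count(frames):
    """Z_n = 0.5 * sum |sgn(x_n(m)) - sgn(x_n(m-1))|, one value per frame."""
    signs = np.where(frames >= 0, 1, -1)  # assumption: sgn(0) = +1
    return 0.5 * np.sum(np.abs(np.diff(signs, axis=1)), axis=1)
```

For a frame that alternates in sign on every sample the count is the frame length minus one, while a frame of constant sign gives zero, which matches the intuition that noise-like unvoiced sound has a high zero-crossing rate.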

The voiceprint recognition model is formed by connecting a convolutional neural network (CNN) and a long short-term memory (LSTM) network in series.

Before the power dispatching system uses the trained voiceprint recognition model to match the received user voice signal against the voiceprint information recorded in advance, the method further comprises training the voiceprint recognition model, specifically:

dividing the preprocessed voice signals into a training set and a testing set;

inputting the training set into a voiceprint recognition model;

outputting a judging result of the voice signal by the voiceprint recognition model;

and iteratively training the voiceprint recognition model until the error rate is smaller than a preset value.

The power dispatching system matches the received user voice signal with the voice signal pre-recorded by the personnel with the operation authority by using a trained voiceprint recognition model, which comprises the following steps:

performing fast Fourier transform on the noise-reduced calling user voice signals to obtain frequency spectrum characteristics corresponding to each sound source signal;

filtering the spectrum characteristics by a Mel filter and then taking the logarithm to obtain a Mel frequency logarithm energy spectrum corresponding to the telephone voice signal of the calling user;

performing a discrete cosine transform on the Mel frequency logarithmic energy spectrum to obtain a Mel coefficient spectrum corresponding to the voice signal of the calling user;

and carrying out voiceprint recognition processing based on the corresponding Mel coefficient spectrum, judging the identity of the calling user and authenticating.

A power dispatching system personnel voiceprint recognition authentication noise reduction and voice enhancement system, comprising:

the receiving module is used for receiving an operation request and a voice signal sent by a calling user to a dispatcher through a telephone;

the first data processing module is used for separating calling user voice signals from mixed voices of the calling user and the dispatcher;

the second data processing module is used for matching, by means of a trained voiceprint recognition model, the voice signal of the calling user against the voice signals recorded in advance by personnel with authority for the operation;

and the result output module is used for allowing the user to operate if the matching is successful, and not allowing the user to operate if the matching is unsuccessful.

A power dispatching system personnel voiceprint recognition, authentication, noise reduction and voice enhancement device, connected to the power dispatching system personnel voiceprint recognition and authentication system through a data transmission path so that the device carries out the power dispatching system personnel voiceprint recognition, authentication, noise reduction and voice enhancement method, comprises:

the data acquisition unit is used for receiving an operation request and a voice signal sent by a calling user to a dispatcher through a telephone;

the data processing unit is used for separating calling user voice signals from mixed voices of the calling user and the dispatcher;

the judging unit is used for matching, by means of a trained voiceprint recognition model, the voice signal of the calling user against the voice signals recorded in advance by personnel with authority for the operation;

and the output unit is used for allowing the user to operate if the matching is successful, and not allowing the user to operate if the matching is unsuccessful.

According to the invention, a calling user sends an operation request and a voice signal to a dispatcher by telephone; the calling user's voice signal is separated from the mixed speech of the calling user and the dispatcher; noise reduction and then voice enhancement are performed on the calling user's voice signal; the power dispatching system uses a trained voiceprint recognition model to match the calling user's voice signal against voice signals recorded in advance by personnel with authority for the operation; if the match succeeds the calling user is allowed to operate, and otherwise is not. Telephone voice signals can be extracted simultaneously from the input end and the microphone end of the dispatching telephone, and speech that does not belong to the calling party is removed by comparing the signals from the two ends, which improves how cleanly the user's voice signal is isolated. The processed voice signal allows the voiceprint recognition model to judge the user's voice information more accurately, reducing the dispatcher's workload and improving dispatching efficiency.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a flow chart of the method for obtaining clean user speech signals according to the present invention;

FIG. 3 is a flow chart of the voiceprint recognition model training of the present invention;

FIG. 4 is a flowchart illustrating the operation of the voiceprint recognition model of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be noted that all directional indicators (such as up, down, left, right, front, and rear … …) in the embodiments of the present invention are merely used to explain the relative positional relationship, movement, etc. between the components in a particular posture (as shown in the drawings), and if the particular posture is changed, the directional indicator is changed accordingly.

Furthermore, descriptions such as "first" and "second" in this disclosure are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when a combination is contradictory or cannot be realized, it should be considered not to exist and falls outside the scope of protection claimed by the invention.

Dispatching speech is the most direct way for a dispatcher to issue dispatching commands and the most common carrier of dispatching information. As the level of artificial intelligence rises, an intelligent dispatching speech processing platform is increasingly needed to recognize, analyze and diagnose all kinds of dispatching speech information and to help the dispatcher respond promptly, judge accurately and analyze efficiently. Time-frequency analysis is a common approach in acoustic signal processing. However, the acoustic signals of a dispatcher at work are inevitably affected by electrical current interference, noise and the like, so the signals monitored at different times vary and exhibit broadband, non-stationary characteristics. Their time-frequency characteristics are correspondingly complex, and it is difficult to distinguish the dispatcher's different working states by analyzing the signals directly. How to improve the accuracy of identifying the dispatcher's working state is a problem to be solved.

Neural-network-based speech recognition methods are easily disturbed by external environmental noise and by other speakers' voices, which makes recognition results inaccurate. The present method removes this interference to obtain a clean signal of the target speaker, improving the recognition accuracy of the voiceprint recognition model. In addition, the features extracted by a single convolutional network model are limited, which also leads to inaccurate recognition; the voiceprint recognition model here therefore combines a convolutional neural network with a long short-term memory network, which greatly improves recognition accuracy.

Example 1

S100, a calling user sends an operation request and a voice signal to a dispatcher by telephone;

S200, the calling user's voice signal is separated from the mixed speech of the calling user and the dispatcher;

S300, noise reduction is performed on the calling user's voice signal;

S400, voice enhancement is performed on the calling user's voice signal;

S500, the power dispatching system uses a trained voiceprint recognition model to match the calling user's voice signal against the voice signals recorded in advance by personnel with authority for the operation;

S600, if the match succeeds, the calling user is allowed to operate, and if the match fails, the calling user is not allowed to operate.
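The allow-or-deny decision in S500 and S600 can be sketched as below. The source does not say how the match score is computed, so this example assumes, purely for illustration, that each speaker is represented by a fixed-length feature vector and that matching uses cosine similarity against a threshold. The names `authorize`, `enrolled` and the threshold value 0.8 are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def authorize(caller_vec, enrolled, threshold=0.8):
    """Return (matched_name, allowed) for the best-scoring enrolled speaker.

    enrolled: dict mapping a person's name to a pre-recorded feature vector.
    Assumption: a score at or above the threshold counts as a match.
    """
    scores = {name: cosine_similarity(caller_vec, vec) for name, vec in enrolled.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best, True
    return None, False
```

In practice the threshold would be tuned on held-out recordings to balance false acceptance against false rejection; the value here is only a placeholder.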

S200, separating the calling user voice signal from the mixed voice of the calling user and the dispatcher comprises the following steps:

S201, a first voice signal is acquired from the telephone terminal;

S202, a sidetone cancellation circuit is added to the transmission line of the power dispatching system, and a second voice signal is acquired from the telephone receiver end;

S203, voice signal strength analysis and signal comparison are performed on the first voice signal and the second voice signal using the short-time zero-crossing rate, endpoint detection and the voice energy spectrum, and the calling user's voice signal is separated out;

Taking the noise characteristics of the system into account, the signals can be amplified appropriately according to the strength of their energy spectra, so that the signal strengths of the calling and called parties at the different acquisition ends are close. The telephone noise can be extracted by cancelling the calling party's signal at the telephone terminal against the calling party's signal at the receiver end, and likewise by cancelling the called party's signal at the telephone terminal against the called party's signal at the receiver end. The telephone noise signal extracted in this way can then be removed, and the noise during silent periods of the current call, $n_{\text{caller ambient noise + telephone transmission channel noise}}$, can be obtained. By comparison with the calling party's signal at the telephone terminal, the noise effect can be decomposed

The method can greatly improve the noise suppression characteristic of the telephone collected voice signals and improve the accuracy of subsequent voiceprint recognition of the calling person.

The endpoint detection may be developed based on a number of different methods, such as a dual-threshold method, an autocorrelation method, a spectral entropy method, a scaling method, and a logarithmic spectral distance method.

Double-threshold method: short-time energy detection distinguishes voiced sound from silence well. Unvoiced sound, however, has low energy and is misjudged as silence because it falls below the energy threshold; short-time zero-crossing rate detection can distinguish silence from unvoiced sound. Combining the two makes it possible to detect speech segments and silent segments.
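A simplified, frame-wise version of the double-threshold idea can be sketched as follows. It assumes per-frame energy and zero-crossing values have already been computed, labels a frame as speech when its energy exceeds an energy threshold (voiced) or, failing that, when its zero-crossing rate exceeds a second threshold (unvoiced), and then groups consecutive speech frames into segments. Classical implementations also extend segment boundaries outward with a lower energy threshold; that refinement is omitted here, and all names and thresholds are illustrative.

```python
import numpy as np

def classify_frames(energy, zcr, energy_thresh, zcr_thresh):
    """Boolean speech flag per frame from two thresholds."""
    energy = np.asarray(energy, dtype=float)
    zcr = np.asarray(zcr, dtype=float)
    voiced = energy > energy_thresh
    unvoiced = ~voiced & (zcr > zcr_thresh)
    return voiced | unvoiced

def speech_segments(is_speech):
    """Return (start, end) frame index pairs for each speech run, end exclusive."""
    flags = np.concatenate(([False], np.asarray(is_speech, dtype=bool), [False]))
    change = np.flatnonzero(flags[1:] != flags[:-1])
    return list(zip(change[::2].tolist(), change[1::2].tolist()))
```

The segment boundaries returned by `speech_segments` are the detected endpoints, expressed in frame indices.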

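Autocorrelation method: the short-time autocorrelation function $R_n(k)$ of the speech signal $x_n(m)$ can be expressed as:

$$R_n(k) = \sum_{m=0}^{N-1-k} x_n(m)\, x_n(m+k), \quad 0 \le k \le K$$

where $N$ is the frame length and $K$ is the maximum delay in samples.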

The autocorrelation function of a speech sequence is also a periodic function of the same period, assuming that the speech sequence has periodicity. The autocorrelation function may be used to find the pitch period of the speech waveform sequence for a voiced signal. The autocorrelation function of the noise signal and the noise-containing voice has a large difference in peak amplitude, a proper threshold is set according to the size of the noise, whether the corresponding voice signal exists or not is judged, and the endpoint of the voice signal is determined.
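The pitch-period use of the autocorrelation function can be sketched as below, assuming the usual short-time definition in which R(k) is the sum of products of the frame with itself shifted by k samples. The lag search range `min_lag` to `max_lag` is an assumed parameter that would in practice be chosen from the expected pitch range and the sampling rate.

```python
import numpy as np

def short_time_autocorr(frame, max_lag):
    """R(k) = sum_{m=0}^{N-1-k} x(m) x(m+k) for k = 0..max_lag."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(max_lag + 1)])

def estimate_pitch_period(frame, min_lag, max_lag):
    """Lag (in samples) of the largest autocorrelation peak in [min_lag, max_lag]."""
    r = short_time_autocorr(frame, max_lag)
    return min_lag + int(np.argmax(r[min_lag:]))
```

For a sinusoid with a period of 20 samples the largest peak in the range 10 to 40 falls at lag 20; for noise-only frames the peak is much lower, which is what the threshold test in the text relies on.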

Log spectral distance method: let the noisy speech signal be $x(n)$, and let $x_i(m)$ be the $i$-th frame obtained after windowing and framing, with frame length $N$. Applying the FFT (fast Fourier transform) to $x_i(m)$ gives:

$$X_i(k) = \sum_{m=0}^{N-1} x_i(m) \exp\left(-j \frac{2\pi k m}{N}\right), \quad 0 \le k \le N-1$$

taking the modulus of the spectrum $X_i(k)$ and then the logarithm gives:

$$\hat{X}_i(k) = 20 \log_{10} |X_i(k)|$$

Because the energy spectra of the noise signal and the noisy speech signal differ significantly (the noise energy spectrum is much lower than that of the noisy speech), the endpoint of the speech signal can be determined from the log spectral difference between the two frames.
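A minimal sketch of the log spectral distance follows. It assumes the log spectrum is 20·log10 of the FFT magnitude and, as one common choice not specified in the source, measures the distance as the root-mean-square difference across frequency bins between a frame and a reference noise log spectrum. The `eps` guard and function names are illustrative.

```python
import numpy as np

def log_spectrum(frame, eps=1e-10):
    """20 * log10 |X_i(k)| for one windowed frame; eps avoids log(0)."""
    return 20.0 * np.log10(np.abs(np.fft.fft(frame)) + eps)

def log_spectral_distance(frame, noise_log_spec):
    """RMS difference in dB between a frame's log spectrum and a noise reference."""
    diff = log_spectrum(frame) - noise_log_spec
    return float(np.sqrt(np.mean(diff ** 2)))
```

A frame whose distance to the averaged noise reference exceeds a chosen threshold would be treated as speech; a frame that is simply the noise scaled by a factor of ten sits exactly 20 dB away.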

By combining the short-time zero-crossing rate, endpoint detection and energy spectrum judgment, and comparing the voice signals extracted by the two different methods, the calling party's voice signal in the power dispatching system can be extracted effectively. It is then used for training the deep learning neural network for voiceprint recognition and for judging and authenticating the calling party's identity by voiceprint.

During processing, because of the telephone's sidetone cancellation circuit, the voice signal obtained at the receiver interface of the telephone handset shows a clear difference in strength between the calling party and the called party, so the two parties' signals can be segmented and extracted effectively by combining the short-time zero-crossing rate with endpoint detection.

S300, performing noise reduction on the voice signal of the calling user, specifically comprises:

assuming that the calling user's voice signal is uncorrelated with both the calling user's environmental noise and the telephone transmission channel noise, autocorrelation processing is performed on the noisy signal to obtain an autocorrelation frame sequence approximating that of the noise-free voice signal:

$$S_O(\omega) = H(\omega) \cdot Y(\omega)$$

For the voiceprint recognition system of the power dispatching telephone, noise is introduced by the dispatching personnel's working environment, by the dispatching telephone's transmission channel and by the telephone itself. As a result of this degradation and interference, the calling party's voice signal collected at recognition time differs considerably in background and interference noise from the voice signals used when training on large-scale voice samples, which greatly reduces the voiceprint recognition rate.

In order to effectively improve the success rate of voiceprint recognition of the system, it is necessary to reduce as much as possible the background interference in the caller's voice signal, the interference of the telephone transmission channel, and the interference introduced by the telephone itself.

The available noise reduction and speech enhancement methods are as follows:

Active noise reduction: this method is based on the superposition principle of sound waves, i.e. noise is removed by mutual cancellation of sound waves. A sound with the same spectrum as the noise to be cancelled but with opposite phase is generated and superimposed on it, thereby cancelling the noise. The difficulty is that the noise spectrum overlaps the spectrum of the speech signal, which makes it hard to produce a sound of exactly opposite phase and to carry out the subsequent noise cancellation.

Feature extraction methods for speaker identification can be classified as follows. Feature extraction methods without noise compensation are grouped by: high-level versus low-level features, type of transform, speech production or auditory system modelled, type of feature extraction technique, time variability, and speech processing technique. Feature extraction methods with noise compensation are divided into noise-masking features, feature normalization methods and feature compensation methods.

Non-negative matrix factorization (NMF) algorithm based on sparse constraints: combining the commonly used magnitude spectrum or Mel spectrum features with the principle of non-negative matrix factorization, a sparsely constrained NMF algorithm is applied with the Mel spectrum as the data matrix. Existing sparsely constrained NMF algorithms use fixed noise and speech dictionaries, so denoising performance degrades when the noise in the noisy speech does not match the noise dictionary.

Spectral subtraction combined with the ideal binary mask (IBM) algorithm: the speech to be enhanced is first masked, and the noise is then removed by spectral subtraction.

The noise reduction process separates the environmental audio signal, the telephone channel signal, the telephone interference signal and the speaker's voice signal, so as to obtain cleaner voice information of the calling party.

After noise reduction is performed on the collected voice information of the calling party, voiceprint recognition matching is carried out between it and the pre-recorded speaker audio signals.
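The mask-then-subtract combination can be sketched per frequency bin as below. This is only an illustration under stated assumptions: the ideal binary mask keeps bins whose local speech-to-noise ratio exceeds a threshold (0 dB here), which requires oracle knowledge of the speech and noise powers, and spectral subtraction is taken as magnitude subtraction floored at zero. The function names and the order of operations in `mask_then_subtract` are one reading of the text, not a confirmed implementation.

```python
import numpy as np

def ideal_binary_mask(speech_power, noise_power, threshold_db=0.0):
    """1.0 where the local SNR in dB exceeds the threshold, else 0.0."""
    snr_db = 10.0 * np.log10((speech_power + 1e-12) / (noise_power + 1e-12))
    return (snr_db > threshold_db).astype(float)

def spectral_subtract(noisy_mag, noise_mag, floor=0.0):
    """Magnitude spectral subtraction with a lower floor."""
    return np.maximum(noisy_mag - noise_mag, floor)

def mask_then_subtract(noisy_mag, noise_mag, mask):
    """Apply the binary mask first, then subtract the noise magnitude estimate."""
    return spectral_subtract(mask * noisy_mag, noise_mag)
```

Bins dominated by noise are zeroed by the mask before subtraction, so the residual noise that plain spectral subtraction tends to leave behind in those bins is suppressed.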

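S400, performing voice enhancement on the voice signal of the calling user, comprises:

$$\hat{C}_s(t) = C_{sn}(t) - \bar{C}$$

where $\hat{C}_s(t)$ is the cepstrum of the enhanced speech, $C_{sn}(t)$ is the cepstrum of the noisy speech, and $\bar{C}$ is the cepstral mean of the speech segment collected from the calling party;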

Homomorphic filtering method: additive noise can be handled by linear processing, whereas non-additive noise can be handled by homomorphic filtering. Because cepstral signals are widely used in speech signal processing, noise reduction can be achieved through cepstral processing. After a convolved signal passes through a homomorphic filter, the convolution becomes a sum of complex cepstra, so multiplicative noise can be separated. Finally, pitch parameters are extracted from the complex cepstrum and the corresponding formants are obtained by spectral analysis, from which the noise-reduced voice signal can be obtained. Noise components in the cepstrum of the calling party's telephone voice signal carrying non-additive noise can be removed with cepstral mean normalization (CMN), which improves voice quality.

S203, performing voice signal strength analysis and signal comparison on the first voice signal and the second voice signal by using the short-time zero-crossing rate, the end point detection and the voice energy spectrum, and separating the voice signal of the calling user comprises:

S2031, detecting unvoiced sound with a short-time zero-crossing rate detection algorithm that combines short-time energy and zero-crossing rate detection, and detecting voiced sound with short-time energy;

S2032, selecting the corresponding unvoiced model or voiced model according to whether the voice signal is unvoiced or voiced, and performing voice signal endpoint detection so as to obtain the voice signal of the calling user.

S2032 selects a corresponding unvoiced model and a corresponding voiced model according to voiced and unvoiced sounds of the voice signal, and performing voice signal endpoint detection to obtain the voice signal of the calling user includes:

S500 before the power dispatching system uses the trained voiceprint recognition model to match the received user voice signal and the voiceprint information which is input in advance, S410 is also included to train the voiceprint recognition model, specifically:

S411, dividing the preprocessed voice signals into a training set and a testing set;

S412, inputting the training set into the voiceprint recognition model;

S413, outputting a judgment result of the voice signal by the voiceprint recognition model;

S414, iteratively training the voiceprint recognition model until the error rate is smaller than a preset value.
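The control flow of S411 to S414 can be sketched as below. Note the assumptions: a small logistic-regression classifier is swapped in for the voiceprint recognition model purely to keep the example short, the two-dimensional features are synthetic, the error rate is measured on the testing set, and every name and hyperparameter is hypothetical.

```python
import math
import random


def predict(weights, bias, x):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def error_rate(weights, bias, data):
    wrong = sum(1 for x, y in data
                if (predict(weights, bias, x) >= 0.5) != (y == 1))
    return wrong / len(data)


def train_until(train, test, max_error=0.05, lr=0.1, max_epochs=500):
    weights, bias = [0.0] * len(train[0][0]), 0.0
    err = 1.0
    for epoch in range(1, max_epochs + 1):
        for x, y in train:                      # S412: feed the training set
            p = predict(weights, bias, x)       # S413: model judgment
            grad = p - y
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
        err = error_rate(weights, bias, test)
        if err < max_error:                     # S414: stop below preset value
            return weights, bias, epoch, err
    return weights, bias, max_epochs, err


# Synthetic stand-in features (assumed): label 1 = authorized speaker.
rng = random.Random(0)
samples = [([rng.gauss(c, 0.5), rng.gauss(c, 0.5)], label)
           for label, c in ((0, -2.0), (1, 2.0)) for _ in range(50)]
rng.shuffle(samples)
split = int(0.8 * len(samples))                 # S411: training / testing split
train_set, test_set = samples[:split], samples[split:]

weights, bias, epochs, final_error = train_until(train_set, test_set)
```

The `max_epochs` cap is a safeguard added here so that the loop terminates even if the preset error rate is never reached.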

S500, the power dispatching system matches the received user voice signal with the voice signal pre-recorded by the personnel with the operation authority by using a trained voiceprint recognition model, which comprises the following steps:

S501, performing fast Fourier transform on the noise-reduced calling user voice signals to obtain frequency spectrum characteristics corresponding to each sound source signal;

Because speech is a non-stationary process, the standard Fourier transform, which applies to periodic, transient or stationary random signals, cannot directly represent the speech signal; the spectrum of the speech signal should instead be processed with the short-time Fourier transform. The resulting spectrum is called the short-time spectrum.

S502, filtering the frequency spectrum characteristics by a Mel filter and then taking the logarithm to obtain a Mel frequency logarithm energy spectrum corresponding to the telephone voice signal of the calling user;

S503, carrying out discrete cosine transform on the Mel frequency logarithmic energy spectrum to obtain a Mel coefficient spectrum corresponding to the calling user voice signal;

S504, voiceprint recognition processing is carried out based on the corresponding Mel coefficient spectrum, and the identity of the calling user is judged and authenticated.
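The feature extraction in S501 to S503 can be sketched for a single short-time frame as follows. The sketch makes several assumptions not stated in the patent: a Hamming window, a naive DFT in place of the FFT (it yields the same spectrum, only more slowly), 20 triangular Mel filters, 12 output coefficients, and an 8000 Hz sample rate; all function names are hypothetical. The recognition step S504 is not shown.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """S501: windowed short-time power spectrum of one frame (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + (high - low) * i / (n_filters + 1)
            for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mels]
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def dct2(values, n_out):
    """S503: unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values)) for k in range(n_out)]


def mel_coefficients(frame, sample_rate, n_filters=20, n_coeffs=12):
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # S502: Mel filtering followed by the logarithm (small floor avoids log 0).
    log_energies = [math.log(sum(f * p for f, p in zip(filt, spec)) + 1e-10)
                    for filt in bank]
    return dct2(log_energies, n_coeffs)


# Two synthetic test tones (assumed), 256 samples each.
rate = 8000
tone_a = [math.sin(2 * math.pi * 440 * n / rate) for n in range(256)]
tone_b = [math.sin(2 * math.pi * 1200 * n / rate) for n in range(256)]
coeffs_a = mel_coefficients(tone_a, rate)
coeffs_b = mel_coefficients(tone_b, rate)
```

For a full utterance, the same computation would be repeated over overlapping frames to give the short-time sequence of coefficient vectors that a recognition model consumes.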

Example 2

Example 3

The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. The voice print recognition, authentication, noise reduction and voice enhancement method for the personnel of the power dispatching system is characterized by comprising the following steps of:

noise reduction is carried out on the voice signal of the calling subscriber;

performing voice enhancement on a voice signal of a calling user;

2. The method of claim 1, wherein the step of separating the caller's voice signal from the mixed voice of the caller and the dispatcher comprises:

a first voice signal obtained from a telephone terminal;

3. The method for voice print recognition, authentication, noise reduction and voice enhancement of power dispatching system personnel according to claim 1, wherein the noise reduction of the voice signal of the calling user comprises:

S_O(ω) = H(ω)·Y(ω)

4. The method for voice print recognition, authentication, noise reduction and voice enhancement of power dispatching system personnel according to claim 1, wherein the voice enhancement of the calling user voice signal comprises:

wherein the symbols denote, respectively: the cepstrum of the enhanced speech; C_sn(t), the cepstrum of the noisy speech; C_s(t), the cepstrum of the clean speech; and the cepstral mean of the speech segments collected from the calling user.

5. The method for voice print recognition, authentication, noise reduction and voice enhancement of power dispatching system personnel according to claim 2, wherein the steps of performing voice signal strength analysis and signal comparison on the first voice signal and the second voice signal by using short-time zero-crossing rate, end point detection and voice energy spectrum, and separating the voice signal of the calling party include:

6. The method for voice print recognition, authentication, noise reduction and voice enhancement of power dispatching system personnel according to claim 1, wherein the voice print recognition model is formed by connecting a convolutional neural network (CNN) and a long short-term memory (LSTM) network in series.

7. The method for voice print recognition, authentication, noise reduction and voice enhancement of personnel in a power dispatching system according to claim 1, wherein before the power dispatching system matches the received user voice signal with the voice print information recorded in advance by using a trained voice print recognition model, the method further comprises training the voice print recognition model, specifically comprises the following steps:

dividing the preprocessed voice signals into a training set and a testing set;

inputting the training set into a voiceprint recognition model;

8. The method for voice print recognition, authentication, noise reduction and voice enhancement of personnel in a power dispatching system according to claim 1, wherein the step of matching the received user voice signal with a voice signal pre-recorded by the personnel with the authority of the operation by using a trained voice print recognition model comprises the following steps:

9. A power dispatching system personnel voiceprint recognition authentication noise reduction and voice enhancement system, comprising:

10. The power dispatching system personnel voiceprint recognition, authentication, noise reduction and voice enhancement device is characterized in that the device is connected with a power dispatching system personnel voiceprint recognition, authentication system through a data transmission path, so that the power dispatching system personnel voiceprint recognition, authentication device executes the power dispatching system personnel voiceprint recognition, authentication, noise reduction and voice enhancement method in claims 1-8, and the method comprises the following steps: